Window Function#
先分举报量分层取各个举报原因对应的举报次数,然后通过开窗函数取每个举报量分层中举报次数排第一的举报原因
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
|
SELECT
DISTINCT report_cnt_range,
report_reason_name,
report_cnt,
ROW_NUMBER () OVER (
PARTITION BY report_cnt_range
ORDER BY
report_cnt
) as rank
FROM
(
SELECT
DISTINCT user_range.report_cnt_range as report_cnt_range,
report_detail.report_reason_name as report_reason_name,
COUNT (DISTINCT report_detail.object_id) as report_cnt
FROM
XXX
)
WHERE
rank = 1
|
WR-A Sampling#
Use log to retain precision
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
|
SELECT
room_id,
log(rand) *(1 /(vv / vv_all)) AS key
FROM
(
SELECT
type,
room_id,
vv,
sum(vv) OVER(partition by type) AS all_vv,
rand() AS rand
FROM
(
SELECT
'all' AS type,
room_id,
vv
FROM
room_id_vv
)
)
ORDER BY
key DESC
LIMIT
100
|
COLLECT_SET and SORT_ARRAY#
如果需要array中的元素保持一定的顺序,可以使用sort_array
函数来实现的:
sort_array(Array<T> a)
根据数组元素的自然顺序按升序对输入数组排序并返回它
select sort_array(collect_set(col))