Window Function

先分举报量分层取各个举报原因对应的举报次数,然后通过开窗函数取每个举报量分层中举报次数排第一的举报原因

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
SELECT
    DISTINCT report_cnt_range,
    report_reason_name,
    report_cnt,
    ROW_NUMBER () OVER (
        PARTITION BY report_cnt_range
        ORDER BY
            report_cnt
    ) as rank
FROM
    (
        SELECT
            DISTINCT user_range.report_cnt_range as report_cnt_range,
            report_detail.report_reason_name as report_reason_name,
            COUNT (DISTINCT report_detail.object_id) as report_cnt
        FROM
            XXX
    )
WHERE
    rank = 1

WR-A Sampling

Use log to retain precision

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
SELECT
    room_id,
    log(rand) *(1 /(vv / vv_all)) AS key
FROM
    (
        SELECT
            type,
            room_id,
            vv,
            sum(vv) OVER(partition by type) AS all_vv,
            rand() AS rand
        FROM
            (
                SELECT
                    'all' AS type,
                    room_id,
                    vv
                FROM
                    room_id_vv
            )
    )
ORDER BY
    key DESC
LIMIT
    100

COLLECT_SET and SORT_ARRAY

如果需要array中的元素保持一定的顺序,可以使用sort_array函数来实现的:

sort_array(Array<T> a) 根据数组元素的自然顺序按升序对输入数组排序并返回它

select sort_array(collect_set(col))