[Stratascratch - HARD] Top Percentile Fraud (윈도우함수 NTILE()로 지정된 크기의 버킷/집합 생성하기)

Netflix Interview Question

Problem

  • Difficulty level : Hard (다른 Hard level대비 아주 간단한 문제)

To identify the top 5 percentile of claims from each state.Your output should be policy number, state, claim cost, and fraud score.

Table : fraud_score

policy_num varchar
state varchar
claim_cost int
fraud_score int

My Solution

WITH CTE AS (
    SELECT 
        *,
        -- PERCENTILE of FRAUDULENT claims from each STATE
        NTILE(100) OVER (PARTITION BY state ORDER BY fraud_score DESC) AS pct
    FROM fraud_score
)

-- output : policy number, state, claim cost, fraud score.
SELECT 
    policy_num, 
    state, 
    claim_cost, 
    fraud_score
FROM CTE

-- top 5 pct
WHERE pct >= 5

Lessons learned

고정된 수의 버킷으로 나누고, 정렬할 때 사용하는 윈도우 함수 NTILE() OVER()

NTILE(number_expression) OVER(PARTITION BY .. ORDER BY ..)

  • number_expression 만큼 등분하여 출력하는 함수
  • assigns a number_expression for every row in a group, to which the row belongs.
NTILE(number_expression) OVER (
   [PARTITION BY partition_expression ]
   ORDER BY sort_expression [ASC | DESC]
)