Problem
- Difficulty level : Hard (다른 Hard level대비 아주 간단한 문제)
To identify the top 5 percentile of claims from each state.Your output should be policy number, state, claim cost, and fraud score.
Table : fraud_score
policy_num | varchar |
---|---|
state | varchar |
claim_cost | int |
fraud_score | int |
My Solution
WITH CTE AS (
SELECT
*,
-- PERCENTILE of FRAUDULENT claims from each STATE
NTILE(100) OVER (PARTITION BY state ORDER BY fraud_score DESC) AS pct
FROM fraud_score
)
-- output : policy number, state, claim cost, fraud score.
SELECT
policy_num,
state,
claim_cost,
fraud_score
FROM CTE
-- top 5 pct
WHERE pct >= 5
Lessons learned
고정된 수의 버킷으로 나누고, 정렬할 때 사용하는 윈도우 함수 NTILE() OVER()
NTILE(number_expression) OVER(PARTITION BY .. ORDER BY ..)
number_expression
만큼 등분하여 출력하는 함수- assigns a number_expression for every row in a group, to which the row belongs.
NTILE(number_expression) OVER (
[PARTITION BY partition_expression ]
ORDER BY sort_expression [ASC | DESC]
)