Stratascratch Meta/Facebook Interview Question - Rank Variance Per Country
Problem
- Difficulty level : Hard
- Find the countries have risen in the rankings based on the number of comments between Dec 2019 vs Jan 2020? (Avoid gaps between ranks when ranking countries.)
Table : fb_comments_count
user_id | int |
---|---|
created_at | datetime |
number_of_comments | int |
Table : fb_active_users
user_id | int |
---|---|
name | varchar |
status | varchar |
country | varchar |
My Solution
1. 3 CTEs + 2 JOINs
- 두 테이블을 조인하면서 필요한 정보만 필터링한 기본 테이블
CTE1
- 두 개의 다른 시기의 랭킹 변화를 구하기 위해 각 시기의 랭킹을 구하는
CTE2
CTE3
CTE2
CTE3
를 조인 후,CTE3
의 정보를 기준으로 랭킹이 증가한 국가를 출력한다.- 랭킹이 증가한 국가 조건을
WHERE
절에 작성했다.CTE3
에서만 등장한 국가 혹은CTE2
에서는 등장하지 않은 국가인 경우CTE3
과CTE2
ranking
변수의 차이가 0보다 큰 국가인 경우
( 1의 조건에 해당하는 레코드가 있는 경우를 대비해 NULL값에도 수식이 성립하도록COALESCE(CTE3.ranking,0)
)
- 랭킹이 증가한 국가 조건을
-- 1. LEFT JOIN `fb_comments_count` <- `fb_active_users`
-- 2. Filter records Dec 2019 - Jan 2020
-- 3. RANKING WINDOW FUNCTIONS - FOR DEC2019 JAN2020 SEPARATELY
-- 4. FIND THE COUNTRIES THAT HAVE RISEN IN RANKINGS (2CASES)
WITH CTE1 AS (
SELECT
u.country,
-- EXTRACT THE YEAR-MONTH FROM DATE COLUMN
EXTRACT(YEAR_MONTH FROM created_at) AS `period`,
-- AGGREGATE : SUM BY country & period(year-month)
SUM(number_of_comments) AS `number_of_comments`
FROM fb_comments_count c
-- JOIN TYPE : INNER JOIN
-- fb_active_users : there are records which don't have country information
JOIN fb_active_users u
ON c.user_id = u.user_id
-- Filter the records : Dec 2019, Jan 2020
WHERE created_at between '2019-12-01' AND '2020-01-31'
GROUP BY 1,2
)
-- 201912 RANKING INFORMATION
, CTE2 AS (
SELECT
country,
-- Avoid gaps between ranks when ranking countries : DENSE_RANK()
DENSE_RANK() OVER (ORDER BY number_of_comments DESC) AS ranking
FROM CTE1
WHERE period="201912"
)
-- 202001 RANKING INFORMATION
, CTE3 AS (
SELECT
country,
-- Avoid gaps between ranks when ranking countries : DENSE_RANK()
DENSE_RANK() OVER (ORDER BY number_of_comments DESC) AS ranking
FROM CTE1
WHERE period="202001"
)
-- RIGHT JOIN : WE NEED THE CHANGE ON RANKING >> CTE3
-- CTE3.country (NOT CTE2.country)
SELECT CTE3.country
FROM CTE2 -- 201912ranking
RIGHT JOIN CTE3 -- 202001ranking
ON CTE2.country = CTE3.country
-- countries which have RISEN in the rankings
-- NO RANKING INFORMATION ON 201912 PERIOD
-- OR CHANGE ON RANKING IS GREATER THAN 0
WHERE ISNULL(CTE2.ranking)
OR COALESCE(CTE3.ranking,0) - CTE2.ranking < 0
✔️ CTE를 3개를 사용하는 쿼리보다 효율적인 쿼리를 고민해봐야할 것 같다.
2. 2 CTEs + 3 JOINs
WITH CTE1 AS (
SELECT
country,
-- AGGREGATE : count(*) FOR SPECIFIC PERIOD
DENSE_RANK() OVER (ORDER BY count(*) DESC) AS `ranking`
FROM fb_comments_count c
-- JOIN TYPE : INNER JOIN
-- fb_active_users : there are records which don't have country information
JOIN fb_active_users u
ON c.user_id = u.user_id
-- Filter the records : Jan 2020
-- EXTRACT THE YEAR-MONTH FROM DATE COLUMN : `EXTRACT(YEAR_MONTH FROM created_at)`
WHERE EXTRACT(YEAR_MONTH FROM created_at) = "201912"
-- TO REMOVE DUPLICATE ROWS
GROUP BY 1
), CTE2 AS (
SELECT
country,
-- EXTRACT THE YEAR-MONTH FROM DATE COLUMN : `EXTRACT(YEAR_MONTH FROM created_at)`
-- AGGREGATE : count(*) FOR SPECIFIC PERIOD
DENSE_RANK() OVER (ORDER BY count(*) DESC) AS `ranking`
FROM fb_comments_count c
-- JOIN TYPE : INNER JOIN
-- fb_active_users : there are records which don't have country information
JOIN fb_active_users u
ON c.user_id = u.user_id
-- Filter the records : Jan 2020
WHERE EXTRACT(YEAR_MONTH FROM created_at) = "202001"
-- TO REMOVE DUPLICATE ROWS
GROUP BY 1
)
SELECT CTE2.country
FROM CTE2 -- 202001ranking
LEFT JOIN CTE1 -- 201912ranking
ON CTE2.country = CTE1.country
WHERE ISNULL(CTE1.ranking)
OR CTE2.ranking - COALESCE(CTE1.ranking, 0) < 0
- Submission
Lessons Learned
NULL값이 연산이 안될 경우를 대비하기 COALESCE()
WHERE ISNULL(CTE1.ranking) OR CTE2.ranking - COALESCE(CTE1.ranking, 0) < 0
To avoid "gaps" between ranks : DENSE_RANK()
RANK()
vs. DENSE_RANK()
테이블 조인타입 → 누락된 정보 확인
fb_comments_count
와 fb_active_users
를 LEFT JOIN
했다. fb_active_users
테이블에 유저 정보 특히 country
가 없는 경우가 있었고, country
변수로 집계하는 문제이므로, 해당 정보가 없는 경우는 의미가 없기 때문이다. (문제에서 특별한 조건이 없는 경우)
-- there are records which don't have country information
SELECT *
FROM fb_comments_count c
LEFT JOIN fb_active_users u
ON c.user_id = u.user_id
수식 실수 주의하기
랭킹이 더 높아진 경우를 출력하는 문제이다. 랭킹이 높아졌다 : 후자의 랭킹 - 전자의 랭킹 < 0
COALESCE(CTE3.ranking,0) - CTE2.ranking < 0