Stratascratch Microsoft Interview Question - Premium vs Freemium
Problem
Difficulty 분류 : HARD
Find the total number of downloads for paying and non-paying users by date.
- ONLY records where non-paying customers have more downloads than paying customers.
- OUTPUT
- sorted by earliest date first
- 3 COL :
columns date
,non-paying downloads
,paying downloads
.
MySolution
My Logic 3 key points : multiple JOINs, CASE statement, Subquery/CTEs
- multiple JOINs - JOIN 3 Tables
- categorize paying / non-paying
- aggreagate number of downloads - paying & nonpaying
- filter the records - non-paying downloads > paying downloads
1. CTE
/*
ms_user_dimension : user_id, acc_id
ms_acc_dimension : acc_id, paying_customer
ms_download_facts : date, user_id, downloads
OUTPUT COL : date | non-paying downloads | paying downloads |
*/
WITH CTE AS (
SELECT
date,
-- total number of downloads by paying and non-paying users
SUM(CASE WHEN paying_customer = 'yes' THEN downloads END) AS `paying`,
SUM(CASE WHEN paying_customer = 'no' THEN downloads END) AS `non_paying`
FROM ms_user_dimension u
LEFT JOIN ms_acc_dimension a
ON u.acc_id = a.acc_id
LEFT JOIN ms_download_facts d
ON u.user_id = d.user_id
GROUP BY date -- SUM by date
ORDER BY date -- ASC sorted by earliest date first
)
SELECT
date,
non_paying,
paying
FROM CTE
-- Include only records where non-paying customers have more downloads than paying customers.
WHERE non_paying > paying
2. Subquery
서브쿼리로도 작성가능하다. 단, MySQL 에서는 서브쿼리에도 Alias를 작성해줘야함을 잊지말 것!
SELECT date, non_paying, paying
FROM (
SELECT
date,
SUM(CASE WHEN paying_customer = 'yes' THEN downloads END) AS `paying`,
SUM(CASE WHEN paying_customer = 'no' THEN downloads END) AS `non_paying`
FROM ms_user_dimension u
LEFT JOIN ms_acc_dimension a
ON u.acc_id = a.acc_id
LEFT JOIN ms_download_facts d
ON u.user_id = d.user_id
GROUP BY date -- DAILY SUM
ORDER BY date -- ASC
) subqueryAlias
WHERE non_paying > paying
Lessons Learned
CASE 문으로 OUTPUT 변수 만들기
수정 전
user_type이라는 변수를 만들어 다시 집계할 생각으로 쿼리를 작성했다.
CASE WHEN paying_customer = "yes" THEN paying
ELSE nonpaying
END
AS user_type,
수정 후
집계함수 SUM
와 CASE
문 함께 사용해 보다 효율적인 쿼리를 작성했다.
-- total number of downloads* by paying and non-paying users
SUM(CASE WHEN paying_customer = 'yes' THEN downloads END) AS `paying`,
SUM(CASE WHEN paying_customer = 'no' THEN downloads END) AS `non-paying`,
..
GROUP BY date -- SUM by date