[Stratascratch-HARD] Premium vs Freemium (다중 테이블 조인, CASE문으로 변수 만들기, 서브쿼리/CTE)

Stratascratch Microsoft Interview Question - Premium vs Freemium

Problem

Difficulty 분류 : HARD

Find the total number of downloads for paying and non-paying users by date.

  • ONLY records where non-paying customers have more downloads than paying customers.
  • OUTPUT
    • sorted by earliest date first
    • 3 COL : columns date, non-paying downloads, paying downloads.

MySolution

My Logic 3 key points : multiple JOINs, CASE statement, Subquery/CTEs

  1. multiple JOINs - JOIN 3 Tables
  2. categorize paying / non-paying
  3. aggreagate number of downloads - paying & nonpaying
  4. filter the records - non-paying downloads > paying downloads

1. CTE

/*
ms_user_dimension : user_id, acc_id
ms_acc_dimension : acc_id, paying_customer
ms_download_facts : date, user_id, downloads

OUTPUT COL : date | non-paying downloads | paying downloads |
*/
WITH CTE AS (
    SELECT 
        date,
        -- total number of downloads by paying and non-paying users 
        SUM(CASE WHEN paying_customer = 'yes' THEN downloads END) AS `paying`,
        SUM(CASE WHEN paying_customer = 'no' THEN downloads END) AS `non_paying`
    FROM ms_user_dimension u
    LEFT JOIN ms_acc_dimension a
        ON u.acc_id = a.acc_id
    LEFT JOIN ms_download_facts d
        ON u.user_id = d.user_id
    GROUP BY date -- SUM by date
    ORDER BY date -- ASC sorted by earliest date first 
)

SELECT 
    date, 
    non_paying, 
    paying
FROM CTE

-- Include only records where non-paying customers have more downloads than paying customers.
WHERE non_paying > paying

2. Subquery

서브쿼리로도 작성가능하다. 단, MySQL 에서는 서브쿼리에도 Alias를 작성해줘야함을 잊지말 것!

SELECT date, non_paying, paying
FROM (
    SELECT 
        date,
        SUM(CASE WHEN paying_customer = 'yes' THEN downloads END) AS `paying`,
        SUM(CASE WHEN paying_customer = 'no' THEN downloads END) AS `non_paying`
    FROM ms_user_dimension u

    LEFT JOIN ms_acc_dimension a
        ON u.acc_id = a.acc_id
    LEFT JOIN ms_download_facts d
        ON u.user_id = d.user_id

    GROUP BY date -- DAILY SUM 
    ORDER BY date -- ASC 

) subqueryAlias

WHERE non_paying > paying

Lessons Learned

CASE 문으로 OUTPUT 변수 만들기

수정 전

user_type이라는 변수를 만들어 다시 집계할 생각으로 쿼리를 작성했다.

CASE WHEN paying_customer = "yes" THEN paying 
        ELSE nonpaying
        END
        AS user_type,

수정 후

집계함수 SUMCASE 문 함께 사용해 보다 효율적인 쿼리를 작성했다.

    -- total number of downloads* by paying and non-paying users
SUM(CASE WHEN paying_customer = 'yes' THEN downloads END) AS `paying`,
SUM(CASE WHEN paying_customer = 'no' THEN downloads END) AS `non-paying`,
..
GROUP BY date -- SUM by date