[Stratascratch - HARD] Top 5 States With 5 Star Businesses (Top N 구하기, CTE+SUBQUERY)

Yelp Interview Question

Problem

  • Difficulty level : Hard, Interview Question Date : March 2020

Find the top 5 states with the most 5 star businesses.

  • Output the state name along with the number of 5-star businesses and order records by the number of 5-star businesses in descending order. If two states have the same result, sort them in alphabetical order.
  • In case there are ties in the number of businesses, return all the unique states.

Table : yelp_business

business_id varchar
name varchar
neighborhood varchar
address varchar
city varchar
state varchar
postal_code varchar
latitude float
longitude float
stars float
review_count int
is_open int
categories varchar

My Solution

My Logic

  1. state별 5-star bussiness_id를 구한 변수 n_businesses를 포함한 CTE1을 만든다.
  2. CTE1에서 n_businesses를 기준으로 TOP5 state를 구해 CTE2를 만든다.
    • 순위 윈도우함수의 결과가 5이하인 경우의 state만 top_five 변수에 담는다.
    • CASE WHEN THEN END
    • (이때, 순위 5초과인 경우는 EMPTY로 출력이 된다. )
  3. CTE1에서 top_five에 해당하는 state 에 대해 state, n_businesses 출력한다.
    • SUBQUERY를 사용해 CTE2에서 top_five 정보를 참조한다.
    • 문제의 정렬조건을 따른다.
/*
Find the top 5 states with the most 5 star businesses. 
*/

-- COUNT THE NUMBER OF 5-star BUSSINESSES BY STATE
WITH CTE1 AS(
    SELECT 
        state,
        COUNT(business_id) AS `n_businesses`
    FROM yelp_business
    WHERE stars = 5
    GROUP BY 1
) 

-- GET THE RANK BY n_businesses
, CTE2 AS (
    SELECT
        -- In case there are ties in the number of businesses, return all the unique states. 
        CASE 
            WHEN RANK() OVER(ORDER BY n_businesses DESC) <=5 
            THEN state 
            END AS `top_five`
    FROM CTE1
)

-- Output the state name along with the number of 5-star businesses
SELECT state, n_businesses
FROM CTE1
-- SUBQUERY : GET THE TOP 5 STATES WITH THE MOST n_businesses
WHERE state 
        IN (
            SELECT top_five 
            FROM CTE2 
            WHERE top_five IS NOT NULL
            )
ORDER BY 
    -- Order records by the number of 5-star businesses in descending order. 
    2 DESC, 
    -- If two states have the same result, sort them in alphabetical order.
    1 ASC    

Lessons learned

IN 서브쿼리에서 윈도우함수를 사용할 수 없음 오류

내가 처음 작성한 코드

CTE2 를 별도로 생성하지 않고 서브쿼리내에서 "윈도우함수" 조건문을 작성했었다.

WHERE state 
        IN (
        -- GET THE TOP 5 STATES WITH THE MOST n_businesses
        SELECT
            -- In case there are ties in the number of businesses, return all the unique states. 
            CASE 
                WHEN RANK() OVER(ORDER BY n_businesses DESC) <= 5  
                THEN state 
                END AS `state`
        FROM CTE1
        )

SQL error

(pymysql.err.NotSupportedError) (1235, "This version of MySQL doesn't yet support 'the combination of this ALL/ANY/SOME/IN subquery with this comparison operator and with contained window functions'")

결과 출력에 필요한 필드 선택 (SELECT * 지양하기)

이번 문제는 필드가 많은 테이블이 주어져서, 출력 변수에 필요한 변수들을 제일 먼저 필터링한 결과를 CTE1에 담았다.