Problem
- Difficulty level : Hard, Interview Question Date : March 2020
Find the top 5 states with the most 5 star businesses.
- Output the state name along with the number of 5-star businesses and order records by the number of 5-star businesses in descending order. If two states have the same result, sort them in alphabetical order.
- In case there are ties in the number of businesses, return all the unique states.
Table : yelp_business
business_id | varchar |
---|---|
name | varchar |
neighborhood | varchar |
address | varchar |
city | varchar |
state | varchar |
postal_code | varchar |
latitude | float |
longitude | float |
stars | float |
review_count | int |
is_open | int |
categories | varchar |
My Solution
My Logic
state
별 5-starbussiness_id
를 구한 변수n_businesses
를 포함한CTE1
을 만든다.CTE1
에서n_businesses
를 기준으로 TOP5 state를 구해CTE2
를 만든다.- 순위 윈도우함수의 결과가 5이하인 경우의 state만
top_five
변수에 담는다. CASE WHEN THEN END
- (이때, 순위 5초과인 경우는 EMPTY로 출력이 된다. )
- 순위 윈도우함수의 결과가 5이하인 경우의 state만
CTE1
에서top_five
에 해당하는state
에 대해state, n_businesses
출력한다.- SUBQUERY를 사용해
CTE2
에서top_five
정보를 참조한다. - 문제의 정렬조건을 따른다.
- SUBQUERY를 사용해
/*
Find the top 5 states with the most 5 star businesses.
*/
-- COUNT THE NUMBER OF 5-star BUSSINESSES BY STATE
WITH CTE1 AS(
SELECT
state,
COUNT(business_id) AS `n_businesses`
FROM yelp_business
WHERE stars = 5
GROUP BY 1
)
-- GET THE RANK BY n_businesses
, CTE2 AS (
SELECT
-- In case there are ties in the number of businesses, return all the unique states.
CASE
WHEN RANK() OVER(ORDER BY n_businesses DESC) <=5
THEN state
END AS `top_five`
FROM CTE1
)
-- Output the state name along with the number of 5-star businesses
SELECT state, n_businesses
FROM CTE1
-- SUBQUERY : GET THE TOP 5 STATES WITH THE MOST n_businesses
WHERE state
IN (
SELECT top_five
FROM CTE2
WHERE top_five IS NOT NULL
)
ORDER BY
-- Order records by the number of 5-star businesses in descending order.
2 DESC,
-- If two states have the same result, sort them in alphabetical order.
1 ASC
Lessons learned
IN 서브쿼리에서 윈도우함수를 사용할 수 없음 오류
내가 처음 작성한 코드
CTE2
를 별도로 생성하지 않고 서브쿼리내에서 "윈도우함수" 조건문을 작성했었다.
WHERE state
IN (
-- GET THE TOP 5 STATES WITH THE MOST n_businesses
SELECT
-- In case there are ties in the number of businesses, return all the unique states.
CASE
WHEN RANK() OVER(ORDER BY n_businesses DESC) <= 5
THEN state
END AS `state`
FROM CTE1
)
SQL error
(pymysql.err.NotSupportedError) (1235, "This version of MySQL doesn't yet support 'the combination of this ALL/ANY/SOME/IN subquery with this comparison operator and with contained window functions'")
결과 출력에 필요한 필드 선택 (SELECT *
지양하기)
이번 문제는 필드가 많은 테이블이 주어져서, 출력 변수에 필요한 변수들을 제일 먼저 필터링한 결과를 CTE1
에 담았다.