- Difficulty level : Hard
- SQL window functions for time series analysis
Table : google_gmail_emails
id | int |
---|---|
from_user | varchar |
to_user | varchar |
day | int |
- Email activity rank = total number of emails sent
- Output Columns : from_user | total_emails | activity_rank
ORDER BY total_emails DESC, from_user -- ASC
- unique rank value →
RANK()
My Solution
- Get a total_emails by user
- Get the rank - 2 ordering steps : total_emails DESC, from_user ASC
- Sort the result : total_emails DESC, from_user ASC
SELECT
from_user,
-- the number of emails by from_user with GROUP BY
count(*) AS total_emails,
-- rank window function
-- two disctinct ordering steps : total_emails DESC, from_user ASC
RANK() OVER (ORDER BY count(*) DESC, from_user) as `activity_rank`
FROM google_gmail_emails
-- aggregate the number of emails by from_user
GROUP BY 1
-- sort the result
ORDER BY 2 DESC, 1