이커머스 유저 행동 데이터 (세션 로그 데이터)를 활용해서 퍼널 분석 과정에서 필요한 MySQL 쿼리, 분석 결과를 정리한 글입니다.
데이터 분석가는 Conversion Rate, Bounce Rate등의 지표를 기준으로 개선이 필요한 퍼널을 찾는다. 웹 페이지 퍼포먼스를 분석하기 위한 다양한 예시들을 통해 쿼리를 작성하고 결과를 해석해보자.
웹 페이지 퍼포먼스 분석
목적 : 개선이 필요한 퍼널을 찾기 위함
- TOP PAGES
- TOP ENTRY PAGES
- BOUNCE RATE
- LANDING PAGE TEST
- LANDING PAGE TREND
- CONVERSION FUNNEL
- CONVERSION FUNNEL TEST
1. TOP PAGES : 트래픽이 많은 페이지 파악
- 조건 : most-viewed website page, rank by session volume, date < 2012.06.09
- STEPS : url별 세션 수 > rank() over ()
with a as
(
# url 별 조회수(세션수)
select pageview_url url,
count(website_session_id) volume
from website_pageviews
where created_at < '2012-06-09'
group by 1
)
select url,
volume,
# grouped reference인 경우 partition by 하지 않아도 됨
rank() over (order by volume desc) as volume_rank
from a
order by volume_rank ;
url | volume | volume_rank |
---|---|---|
/home | 10301 | 1 |
/products | 4188 | 2 |
/the-original-mr-fuzzy | 3002 | 3 |
/cart | 1292 | 4 |
/shipping | 860 | 5 |
/billing | 709 | 6 |
/thank-you-for-your-order | 304 | 7 |
▶ 홈 > 상품 > 카트 > 배송 > 구매 > 구매완료 페이지 순서의 퍼널 확인 가능
2. TOP ENTRY PAGES : 유입량이 많은 페이지 파악
- 동일한 session_id 내 pageview_id 리스트 중 첫번째 page_view_id 를
entry page
# CTE
# all entry pages, rank by session volume, date < 2012.06.09
# with + row_number & CASE when page_url else null
with a as(
select
case when row_number() over ( # 페이지뷰리스트
partition by website_session_id # 동일한세션
order by website_pageview_id
) = 1 # 첫번째
then pageview_url
else null end landing_page
from website_pageviews
where created_at < '2012-06-09'
)
select landing_page ,
# count(session_id) group by entry
count(landing_page) sessions_hitting_this_lander
from a
# filter out entry null
where landing_page is not null
group by 1 ;
landing_page | sessionssessions_hitting_this_lander |
---|---|
/home | 10301 |
▶ /home 웹의 첫번째 홈 페이지로 제일 많이 유입된다.
3. BOUNCE RATE : Pageview별 Bounce Rate 파악
# 1. CTE
# 1) Landing Page by session
# 2) Pageview & Bounce by session
# 3) Total sessions & Bounced sessions by Landing Page & % of Sessions which Bounced
with a as (
select
website_pageview_id,
website_session_id,
# 1) 동일한 session_id내 session_pageview_id 마다 pageview_seq 순서 표시
row_number() over(
partition by website_session_id
order by website_pageview_id ) as pageview_seq
from website_pageviews
where created_at <'2012-06-09'
)
, b as (
select website_session_id,
# pageview_seq의 `최대`가 1인 경우 bounced session
case when max(pageview_seq) = 1 then 1 else 0 end as bounce
from a
group by 1
)
select
# Total Sessions
count(*) sessions ,
# Bounced Sessions
sum(case when bounce = 1 then 1 else 0 end ) as bounced_sessions ,
# % of Bounced Sessions
concat((sum(case when bounce = 1 then 1 else 0 end )/count(*)*100),'%') as bounce_rate
from b
# 2. Temporary Table
CREATE temporary table first_pageview
SELECT website_session_id,
min(website_pageview_id) as min_pv_id
FROM website_pageviews
where created_at < '2012-06-09'
group by 1;
# sessions_landing_page
CREATE TEMPORARY TABLE sessions_landing_page
select f.website_session_id,
w.pageview_url as landing_page
from first_pageview f
left join website_pageviews w on w.website_pageview_id = f.min_pv_id
where pageview_url = '/home' ;
# bounce_sessions
CREATE TEMPORARY TABLE bounce_sessions
select l.website_session_id,
l.landing_page,
count(w.website_pageview_id) as cnt_viewed
from sessions_landing_page l
left join website_pageviews w
on w.website_session_id=l.website_session_id
group by 1,2
having count(w.website_pageview_id) = 1; -- BOUNCE SESSION
select
count(distinct l.website_session_id) as sessions,
count(distinct b.website_session_id) as bounced_sessions,
count(distinct b.website_session_id)/count(distinct l.website_session_id) as bounce_rate
from sessions_landing_page l
left join bounce_sessions b
on l.website_session_id = b.website_session_id ;
sessions | bounced_sessions | bounce_rate |
---|---|---|
10301 | 6113 | 59.3438% |
▶ 전체 세션수의 약 60%가 랜딩페이지에서 Bouced됨. 유저의 액션을 이끌어내지 못하고 그대로 유출, 해당 페이지에 대한 테스트가 필요해보임.
4. LANDING PAGE TEST : 랜딩 페이지 Bounce rate 테스트
- Bounce rate 가 높은 entry page 에 대해, new custom landing page (/lander-1) 테스트 필요
- 50/50 Bounce rate comparison (/lander-1) vs (/home)
# 1. CTE
WITH a as (
select
w.website_session_id,
p.website_pageview_id ,
pageview_url landing_page,
# first website_pageview_id for relevant sessions
row_number() over(
partition by website_session_id
order by website_pageview_id ) as pageview_seq
from website_sessions w
left join website_pageviews p
on p.website_session_id = w.website_session_id
where utm_source = 'gsearch'
and utm_campaign = 'nonbrand'
# when the new page lander launched
and w.created_at between '2012-06-19' and '2012-07-28'
)
, b as (
select
website_session_id,
# landing page of each session
landing_page,
case when max(pageview_seq) = 1 then 1 else 0 end as bounce
from a
group by 1
)
, c as(
# summarizing total sessions and bounced sessions
select landing_page,
count(distinct website_session_id) total_sessions,
sum(bounce) bounce_sessions,
concat(sum(bounce)/count(distinct website_session_id)*100,'%')
from b
group by 1
)
select *
from c
where landing_page in ('/lander-1','/home')
;
# 2. Temporary Table
-- 1) find out when the new page lander launched
select min(created_at), min(website_pageview_id) from website_pageviews where pageview_url = '/lander-1' and created_at is not null ;
CREATE temporary table first_test -- pageviews
SELECT w.website_session_id,
min(website_pageview_id) as min_pv_id
-- 2) finding the first website_pageview_id for relevant sessions
FROM website_pageviews p
inner join website_sessions w
on w.website_session_id = p.website_session_id
and w.created_at < '2012-07-28' and p.website_pageview_id > 23504 -- after 1) min(website_pageview_id) of lander-1
and utm_source = 'gsearch' and utm_campaign = 'nonbrand'
group by 1;
create temporary table nonbrand_test_landing_page
select f.website_session_id,
p.pageview_url as landing_page
from first_test f
left join website_pageviews p on p.website_pageview_id = f.min_pv_id
-- 3) identifying the landing page of each session
where p.pageview_url in ('/lander-1','/home');
create temporary table nonbrand_test_bounced
select l.website_session_id,
l.landing_page,
count(p.website_pageview_id) as cnt_viewed
-- 4) counting pageviews for each session to identify bounces
from nonbrand_test_landing_page l left join website_pageviews p on p.website_session_id = l.website_session_id
group by 1,2
having count(p.website_pageview_id) = 1 ; -- count(website_pageview_id) =1 :: BOUNCED
-- 5) summarizing total sessions and bounced sessions by LP
select l.landing_page,
COUNT(DISTINCT l.website_session_id) as sessions,
COUNT(DISTINCT b.website_session_id) as bounced_sessions,
COUNT(DISTINCT b.website_session_id) / COUNT(DISTINCT l.website_session_id) as bounce_rate
FROM nonbrand_test_landing_page l
left join nonbrand_test_bounced b on l.website_session_id = b.website_session_id
group by 1;
landing_page | total_sessions | bounce_sessions | bounce_rate |
---|---|---|---|
/home | 2300 | 1338 | 58.1739% |
/lander-1 | 2268 | 1206 | 53.1746% |
▶ 동일 기간 비교, 새로운 랜딩페이지인 lander-1이 기존 랜딩페이지 home 보다 bounce rate이 더 낮다.
5. LANDING PAGE TREND** Pageview별 Bounce Rate 트렌드 파악
조건 : Bounce rate trended weekly (paid search nonbrand traffic landing on /home and /lander-1)
STEPS
- 1) finding the first website_pageview_id for relevant sessions
- 2) identifying the landing page of each session >> home or lander-1
- 3) counting pageviews for each session & identify bounces
- 4) summarizing by week :: bounce rate, sessions to each lander
# CTE
with landing_page as(
select w.website_session_id,
w.created_at, -- session created_at
p.website_pageview_id,
p.pageview_url,
case when row_number() over (partition by w.website_session_id order by w.website_session_id, p.website_pageview_id) = 1 then 1 else 0 end as landing
from website_sessions w
left join website_pageviews p on w.website_session_id = p.website_session_id
where utm_campaign = 'nonbrand' and w.created_at < '2012-08-31'
) , bounce as(
select website_session_id,
case when count(distinct website_pageview_id) = 1 then 1 else 0 end bounced -- count of pageviewid = 1 : BOUNCED
from landing_page
group by website_session_id
)
select min(date(created_at)) as week_start_date,
concat(sum(bounced)/ count(distinct l.website_session_id)*100,'%') bounce_rate,
sum(case when pageview_url='/home' then 1 else 0 end) home_sessions,
sum(case when pageview_url='/lander-1' then 1 else 0 end) lander_sessions
from landing_page l left join bounce b on b.website_session_id = l.website_session_id
group by yearweek(created_at);
2012-03-19 | 62.9586% | 845 | 0 |
---|---|---|---|
2012-03-25 | 58.8421% | 950 | 0 |
2012-04-01 | 63.6522% | 1150 | 0 |
2012-06-10 | 61.4943% | 870 | 0 |
2012-06-17 | 55.9763% | 515 | 330 |
2012-06-24 | 58.5106% | 367 | 385 |
2012-07-15 | 54.0793% | 431 | 427 |
2012-07-22 | 50.8906% | 396 | 390 |
2012-07-29 | 49.7570% | 57 | 972 |
2012-08-05 | 54.7729% | 0 | 1079 |
2012-08-12 | 50.8595% | 0 | 989 |
2012-08-19 | 51.4286% | 0 | 1330 |
2012-08-26 | 55.1655% | 0 | 997 |
▶ new custom landing page인 lander-1으로 전환되어 Lower Bounce rate 보인다. (improved)
6. CONVERSION FUNNEL : 고객의 상품 구매 여정을 개선/최적화
Common path/flow, large abondonment point, conversion rate at each step
Define Paths : Bring sessions in the relavant pageviews and flag each session as having made it to certain funnel
STEPS
- 1) select all pageviews for relevant sessions
- 2) identify each pageview as the specific funnel step
- 3) create the session-level conversion funnel view
- 4) aggregate the data to assess funnel performance
# 1. CTE
# WITH + total sessions & sessions by pageview_url_ + CASE >Pivot
with total as(
select p.pageview_url, w.website_session_id
from website_sessions w inner join website_pageviews p on w.website_session_id=p.website_session_id
where w.created_at > '2012-08-05' and w.created_at < '2012-09-05'
and w.utm_source ='gsearch' and w.utm_campaign='nonbrand'
and pageview_url in ('/lander-1','/products','/the-original-mr-fuzzy','/cart','/shipping','/billing','/thank-you-for-your-order')
-- define the funnel paths
group by 1, 2
)
, page as (
select count(distinct website_session_id) sessions,
/**analyzing how many customers make it to each step**/
sum(case when pageview_url = '/products' then 1 else 0 end) as to_products,
sum(case when pageview_url = '/the-original-mr-fuzzy' then 1 else 0 end) as to_mrfuzzy,
sum(case when pageview_url = '/cart' then 1 else 0 end) as to_cart,
sum(case when pageview_url = '/shipping' then 1 else 0 end) as to_shipping,
sum(case when pageview_url = '/billing' then 1 else 0 end) as to_billing,
sum(case when pageview_url = '/thank-you-for-your-order' then 1 else 0 end) as to_thankyou
from total
) select sessions,
to_products/sessions *100 as lander_clickthrough_rate,
to_mrfuzzy/to_products *100 as products_clickthrough_rate,
to_cart/to_mrfuzzy *100 as mrfuzzy_clickthrough_rate,
to_shipping/to_cart *100 as cart_clickthrough_rate,
to_billing/to_shipping *100 as shipping_clickthrough_rate,
to_thankyou/to_billing *100 as billing_clickthrough_rate
from page;
# 2. Sub-Query
# Create Funnel Table
create temporary table session_level_funnel -- 세션별 funnel 테이블 생성
select website_session_id,
max(products) to_products,
max(mrfuzzy) to_mrfuzzy,
max(cart) to_cart,
max(shipping) to_shipping,
max(billing) to_billing,
max(thankyou) to_thankyou
from (
select w.website_session_id, p.pageview_url, p.created_at as pageview_created_at, -- created at of each pageview session
case when pageview_url = '/products' then 1 else 0 end as products,
case when pageview_url = '/the-original-mr-fuzzy' then 1 else 0 end as mrfuzzy,
case when pageview_url = '/cart' then 1 else 0 end as cart,
case when pageview_url = '/shipping' then 1 else 0 end as shipping,
case when pageview_url = '/billing' then 1 else 0 end as billing,
case when pageview_url = '/thank-you-for-your-order' then 1 else 0 end as thankyou
from website_sessions w inner join website_pageviews p on w.website_session_id=p.website_session_id
where w.created_at > '2012-08-05' and w.created_at < '2012-09-05'
and w.utm_source ='gsearch' and w.utm_campaign='nonbrand'
and pageview_url in ('/lander-1','/products','/the-original-mr-fuzzy','/cart','/shipping','/billing','/thank-you-for-your-order')
order by 1 , 2 -- GROUP BY session_id
) as pageview_level
group by website_session_id ; -- aggregate by website session id
select count(distinct website_session_id) sessions,
count(distinct case when to_products =1 then website_session_id else null end)
/count(distinct website_session_id) *100 as lander_clickthrough_rate,
count(distinct case when to_mrfuzzy =1 then website_session_id else null end)
/count(distinct case when to_products =1 then website_session_id else null end) *100 as product_clickthrough_rate,
count(distinct case when to_cart =1 then website_session_id else null end)
/count(distinct website_session_id) *100 as mrfuzzy_clickthrough_rate,
count(distinct case when to_shipping =1 then website_session_id else null end)
/count(distinct case when to_cart =1 then website_session_id else null end)*100 as cart_clickthrough_rate,
count(distinct case when to_billing =1 then website_session_id else null end)
/count(distinct case when to_shipping =1 then website_session_id else null end) *100 as shipping_clickthrough_rate,
count(distinct case when to_thankyou =1 then website_session_id else null end)
/count(distinct case when to_billing =1 then website_session_id else null end) *100 as billing_clickthrough_rate
from session_level_funnel;
sessions | lander_clickthrough_rate | products_clickthrough_rate | mrfuzzy_clickthrough_rate | cart_clickthrough_rate | shipping_clickthrough_rate | billing_clickthrough_rate |
---|---|---|---|---|---|---|
4420 | 47.0814 | 74.1470 | 43.5515 | 66.2202 | 79.3258 | 43.3428 |
▶ 페이지 뷰 단계별로 click through rate(클릭전환율)산출시, 50% 미만으로 개선이 필요하다.
7. CONVERSION FUNNEL TEST
# CTE
# WITH sessions by pageview_url_ + CASE >Pivot
with billing as(
select
p.website_session_id,
/** aggregate by website_session_id **/
sum(case when pageview_url ='/billing' then 1 else 0 end) as billing,
sum(case when pageview_url ='/billing-2' then 1 else 0 end)as billing2,
sum(case when order_id is not null then 1 else 0 end) as orders
-- TO FILTER OUT order_id null이면 실제 주문으로 이어지지 않은 것
from website_pageviews p
left join orders o on p.website_session_id=o.website_session_id
where p.created_at < '2012-11-10'
and p.website_pageview_id >= 53550
and pageview_url in ('/billing','/billing-2')
group by website_session_id
)
select
/** Pivot **/
case when billing = 1 then '/billing' else '/billing-2' end as billing_version_seen,
count(distinct website_session_id) sessions,
count(distinct case when orders =1 then website_session_id else null end) orders,
count(distinct case when orders =1 then website_session_id else null end)
/ count(distinct website_session_id) *100 as billing_to_order_rt
from billing
group by 1 ;
# 2. Sub-Query
select
billing_version_seen,
count(distinct website_session_id) as sessions,
count(distinct order_id) as orders,
count(distinct order_id)/count(distinct website_session_id) * 100 as billing_to_order_rt
from
(
select
p.website_session_id,
p.pageview_url as billing_version_seen,
o.order_id
from website_pageviews p
/** JOIN ORDER TABLE :: Order table내 order_id 기준 filter out **/
left join orders o on p.website_session_id=o.website_session_id
where p.created_at < '2012-11-10' -- analysis date
and website_pageview_id >= 53550
and pageview_url in ('/billing','/billing-2')
) billing
group by 1
billing_version_seen | sessions | orders | CONVERSION RATE |
---|---|---|---|
/billing | 650 | 299 | 46.0000 |
/billing-2 | 645 | 404 | 62.6357 |
▶ Improvement on Billing page : 신규 랜딩페이지에illing-2에서 더 높은 전환율을 보인다.
'Data Science > Product analytics' 카테고리의 다른 글
[Paper Review] 링크드인 사례 - Detecting interference: An A/B test of A/B tests (0) | 2021.08.11 |
---|---|
[데이터분석 실무] 로그 분석 (0) | 2021.07.17 |
[데이터 분석 실무] 고객 리텐션 분석 (Spark SQL) (0) | 2021.04.02 |
[ 데이터 분석 실무 ] A/B Testing (0) | 2021.03.26 |
[데이터 분석 실무] Cohort Analysis, 코호트 분석 Python (0) | 2021.03.20 |