This case study focuses on using subscription style digital data to answer important business questions.
The Database and the entire case study is available at the following link: https://8weeksqlchallenge.com , in this page we will focus directly to answer the business questions.
THE DATATABASE
Just a quick overview of the database, we have 2 tables, connected by “plan_id”.
Let’s jump directly to answer the business questions
How many customers has Foodie-Fi ever had?
SELECT
COUNT(DISTINCT customer_id) AS total_customers
FROM subscriptions;
————————————————————
What is the monthly distribution of trial
plan start_date
values for our dataset ?
SELECT
YEAR(start_date) AS year_,
MONTH(start_date) AS m_,
MONTHNAME(start_date) AS month_,
COUNT(plan_name) AS trial_plans
FROM plans p
LEFT JOIN subscriptions s
ON p.plan_id = s.plan_id
WHERE plan_name = "trial"
GROUP BY 1,2,3
ORDER BY 1,2;
————————————————————
What plan start_date
values occur after the year 2020 for our dataset?
SELECT
p.plan_name,
COUNT(s.customer_id)
FROM plans p
LEFT JOIN subscriptions s
ON p.plan_id = s.plan_id
WHERE start_date > "2020-12-31"
GROUP BY 1;
————————————————————
What is the customer count and percentage of customers who have churned rounded to 1 decimal place?
SELECT
COUNT(DISTINCT customer_id) AS total_customers,
COUNT(CASE WHEN plan_id = 4 THEN 1 ELSE NULL END) AS churned_customers,
ROUND(COUNT(CASE WHEN plan_id = 4 THEN 1 ELSE NULL END) /
COUNT(DISTINCT customer_id) * 100,1) AS pct_churn
FROM subscriptions;
————————————————————
How many customers have churned straight after their initial free trial – what percentage is this rounded to the nearest whole number?
To answer this question I run before distinct queries and then combine all of them in a single query with a subquery, the idea was to generate a list with all customers after 1 week, and join this list with the subscriptions table by customer_id so that we can obtain their plan status after 1 week and then just calculate the total and the percentage.
It’s a subquery, so we’ll start to read from the bottom, I left comments with the order on how to read the query, from 1 to 3
SELECT -- 3 count the number of total customers, and the churned customers (plan_id = 4)
COUNT(customer_id) AS total_customers,
COUNT(CASE WHEN plan_id = 4 THEN 1 ELSE NULL END) AS church_after_trial,
CEILING(COUNT(CASE WHEN plan_id = 4 THEN 1 ELSE NULL END) / COUNT(customer_id) * 100) AS pct_of_church_after_trial
FROM
(
SELECT -- 2 we join the list(1) with the subscriptions table so that we can see their status
a.customer_id,
a.after_1_week,
s.plan_id
FROM
(
SELECT -- 1 generate a list with all customers after 1 week
customer_id AS customer_id,
DATE_ADD(MIN(start_date), INTERVAL 7 DAY) AS after_1_week
FROM subscriptions s
LEFT JOIN plans p
ON s.plan_id = s.plan_id
GROUP BY 1
ORDER BY 1
) AS a
LEFT JOIN subscriptions s
ON a.customer_id = s.customer_id AND
a.after_1_week = s.start_date) AS b;
————————————————————
What is the number and percentage of customer plans after their initial free trial?
We can use the same query used above to obtain all the plan id of the customers after the end of the trial period and use it in a subquery to calculate the percentage.
SELECT
COUNT(CASE WHEN plan_id = 1 THEN 1 ELSE NULL END) AS basic_monthly,
ROUND((COUNT(CASE WHEN plan_id = 1 THEN 1 ELSE NULL END) / COUNT(customer_id)) * 100,2) AS pct_basic_monthly,
COUNT(CASE WHEN plan_id = 2 THEN 1 ELSE NULL END) AS pro_monthly,
ROUND((COUNT(CASE WHEN plan_id = 2 THEN 1 ELSE NULL END) / COUNT(customer_id)) * 100,2) AS pct_pro_monthly,
COUNT(CASE WHEN plan_id = 3 THEN 1 ELSE NULL END) AS pro_annual,
ROUND((COUNT(CASE WHEN plan_id = 3 THEN 1 ELSE NULL END) / COUNT(customer_id)) * 100,2) AS pct_pro_annual,
COUNT(CASE WHEN plan_id = 4 THEN 1 ELSE NULL END) AS churn,
ROUND((COUNT(CASE WHEN plan_id = 4 THEN 1 ELSE NULL END) / COUNT(customer_id)) * 100,2) AS pct_churn
FROM
(
SELECT
a.customer_id,
a.after_1_week,
s.plan_id
FROM
(
SELECT
customer_id AS customer_id,
DATE_ADD(MIN(start_date), INTERVAL 7 DAY) AS after_1_week
FROM subscriptions s
LEFT JOIN plans p
ON s.plan_id = s.plan_id
GROUP BY 1
ORDER BY 1
) AS a
LEFT JOIN subscriptions s
ON a.customer_id = s.customer_id AND
a.after_1_week = s.start_date) AS b;
————————————————————
What is the customer count and percentage breakdown of all 5 plan_name
values at 2020-12-31
?
The process it’s very similar to the previous one, we generate a list of all customers after their trial period and before the end of 2020-12-31 with the MAX date (so the last update of their subscription status), then we join the result with the subscriptions table to have their status and finally we calculate the number of customers and percentage.
SELECT
COUNT(CASE WHEN plan_id = 0 THEN 1 ELSE NULL END) AS trial,
ROUND((COUNT(CASE WHEN plan_id = 0 THEN 1 ELSE NULL END) / COUNT(customer_id)) * 100,2) AS pct_trial,
COUNT(CASE WHEN plan_id = 1 THEN 1 ELSE NULL END) AS basic_monthly,
ROUND((COUNT(CASE WHEN plan_id = 1 THEN 1 ELSE NULL END) / COUNT(customer_id)) * 100,2) AS pct_basic_monthly,
COUNT(CASE WHEN plan_id = 2 THEN 1 ELSE NULL END) AS pro_monthly,
ROUND((COUNT(CASE WHEN plan_id = 2 THEN 1 ELSE NULL END) / COUNT(customer_id)) * 100,2) AS pct_pro_monthly,
COUNT(CASE WHEN plan_id = 3 THEN 1 ELSE NULL END) AS pro_annual,
ROUND((COUNT(CASE WHEN plan_id = 3 THEN 1 ELSE NULL END) / COUNT(customer_id)) * 100,2) AS pct_pro_annual,
COUNT(CASE WHEN plan_id = 4 THEN 1 ELSE NULL END) AS churn,
ROUND((COUNT(CASE WHEN plan_id = 4 THEN 1 ELSE NULL END) / COUNT(customer_id)) * 100,2) AS pct_churn
FROM(
SELECT -- 2 we join the list(1) with the subscriptions table so that we can see their status
a.customer_id,
a.after_1_week,
s.plan_id
FROM (
SELECT -- 1 generate a list with all customers after 1 week and before the 2020-12-31
customer_id AS customer_id,
MAX(start_date) AS after_1_week
FROM subscriptions s
LEFT JOIN plans p
ON s.plan_id = s.plan_id
WHERE start_date < "2020-12-31"
GROUP BY 1
ORDER BY 1)
AS a
LEFT JOIN subscriptions s
ON a.customer_id = s.customer_id AND
a.after_1_week = s.start_date) AS b;
How many customers have upgraded to an annual plan in 2020?
SELECT
COUNT(plan_id) AS total_pro_annual_2020
FROM subscriptions
WHERE plan_id = 3
AND start_date <= "2020-12-31";
————————————————————
How many days on average does it take for a customer to an annual plan from the day they join Foodie-Fi?
I solved this one in 2 ways, one was to create 2 temporary tables, one with the only trial customers and the other one with the pro customers, then subtract the dates and calculate the average.
The other solution was using CTE, I’ll show you the CTE version, we’ll use the temporary tables for the next questions.
WITH trial_users AS (
SELECT
s.customer_id,
s.plan_id,
s.start_date,
p.plan_name
FROM subscriptions s
LEFT JOIN plans p
ON s.plan_id = p.plan_id
WHERE plan_name = "trial"
ORDER BY 1),
pro_users AS (
SELECT
customer_id,
plan_id,
start_date,
plan_name
FROM (
SELECT
s.customer_id,
s.plan_id,
s.start_date,
p.plan_name
FROM subscriptions s
LEFT JOIN plans p
ON s.plan_id = p.plan_id
ORDER BY 1) AS a
WHERE plan_name = "pro annual"
)
SELECT
CEILING(AVG(days_to_became_pro)) AS average_days_to_pro
FROM (
SELECT
pu.customer_id,
DATEDIFF(pu.start_date, tu.start_date) AS days_to_became_pro
FROM pro_users pu
LEFT JOIN trial_users tu
ON pu.customer_id = tu.customer_id) AS f;
————————————————————
Can you further breakdown this average value into 30 day periods (i.e. 0-30 days, 31-60 days etc)
I created 2 temporary tables as anticipated in the previous question, one with the PRO customers and the other one with the TRIAL customers, then join the 2 table and insert them in an inner query and calculate the average for each single period
-- Create Temporary Table TRIAL customers
CREATE TEMPORARY TABLE trial_customers
SELECT
customer_id,
plan_id,
start_date,
plan_name
FROM (
SELECT
s.customer_id,
s.plan_id,
s.start_date,
p.plan_name
FROM subscriptions s
LEFT JOIN plans p
ON s.plan_id = p.plan_id
ORDER BY 1) AS a
WHERE plan_name = "trial";
-- Create Temporary Table PRO Customers
CREATE TEMPORARY TABLE pro_annual_customers
SELECT
customer_id,
plan_id,
start_date,
plan_name
FROM (
SELECT
s.customer_id,
s.plan_id,
s.start_date,
p.plan_name
FROM subscriptions s
LEFT JOIN plans p
ON s.plan_id = p.plan_id
ORDER BY 1) AS a
WHERE plan_name = "pro annual";
-- Join the 2 temporary table, insert them in a inner query and calculate the average for each period
SELECT
CEILING(AVG(period_0_30)) AS avg_0_30,
CEILING(AVG(period_31_60)) AS avg_31_60,
CEILING(AVG(period_61_120)) avg_61_120,
CEILING(AVG(period_more_than_120)) AS avg_over_120
FROM (
SELECT
CASE WHEN DATEDIFF(pro.start_date, trial.start_date) <= 30 THEN DATEDIFF(pro.start_date, trial.start_date) ELSE NULL END AS period_0_30,
CASE WHEN DATEDIFF(pro.start_date, trial.start_date) > 30 AND DATEDIFF(pro.start_date, trial.start_date) <= 60 THEN DATEDIFF(pro.start_date, trial.start_date) ELSE NULL END AS period_31_60,
CASE WHEN DATEDIFF(pro.start_date, trial.start_date) > 60 AND DATEDIFF(pro.start_date, trial.start_date) <= 120 THEN DATEDIFF(pro.start_date, trial.start_date) ELSE NULL END AS period_61_120,
CASE WHEN DATEDIFF(pro.start_date, trial.start_date) > 120 THEN DATEDIFF(pro.start_date, trial.start_date) ELSE NULL END AS period_more_than_120
FROM pro_annual_customers pro
LEFT JOIN trial_customers trial
ON pro.customer_id = trial.customer_id) AS a;
————————————————————
How many customers downgraded from a pro monthly to a basic monthly plan in 2020?
I created 2 new different temporary tables, one with the PRO monthly plans, and the other one with the BASIC monthly plans where the date is greater than in the previous table.
-- create temporary table PRO MONTHLY PLANS
CREATE TEMPORARY TABLE pro_monthly
SELECT
customer_id,
plan_id,
start_date,
plan_name
FROM (
SELECT
s.customer_id,
s.plan_id,
s.start_date,
p.plan_name
FROM subscriptions s
LEFT JOIN plans p
ON s.plan_id = p.plan_id
ORDER BY 1) AS a
WHERE plan_name = "pro monthly";
-- create temporary table BASIC MONTHLY PLANS
SELECT
customer_id,
plan_id,
start_date,
plan_name
FROM (
SELECT
s.customer_id,
s.plan_id,
s.start_date,
p.plan_name
FROM subscriptions s
LEFT JOIN plans p
ON s.plan_id = p.plan_id
ORDER BY 1) AS b
WHERE plan_name = "basic monthly";
-- combine the 2 tables and calculate
SELECT
COUNT(pro.customer_id) AS downgraded_customers
FROM pro_monthly pro
INNER JOIN basic_monthly basic
ON pro.customer_id = basic.customer_id
WHERE basic.start_date > pro.start_date;
Thanks for your time.
Giulio Pulino.
Back to portfolio