Case Study FOODIE – FI

This case study focuses on using subscription style digital data to answer important business questions.

The Database and the entire case study is available at the following link: https://8weeksqlchallenge.com , in this page we will focus directly to answer the business questions.

THE DATATABASE

Just a quick overview of the database, we have 2 tables, connected by “plan_id”.

Let’s jump directly to answer the business questions

How many customers has Foodie-Fi ever had?

SELECT 
	COUNT(DISTINCT customer_id) AS total_customers
FROM subscriptions;
Answer: There are 1000 unique customers.

————————————————————

What is the monthly distribution of trial plan start_date values for our dataset ?

SELECT
	YEAR(start_date) AS year_,
    MONTH(start_date) AS m_,
	MONTHNAME(start_date) AS month_,
	COUNT(plan_name) AS trial_plans
FROM plans p
	LEFT JOIN subscriptions s
		ON p.plan_id = s.plan_id
WHERE plan_name = "trial"
GROUP BY 1,2,3
ORDER BY 1,2;

————————————————————

What plan start_date values occur after the year 2020 for our dataset? 

SELECT
		p.plan_name,
        COUNT(s.customer_id)
	FROM plans p
		LEFT JOIN subscriptions s
			ON p.plan_id = s.plan_id
	WHERE start_date > "2020-12-31"
    GROUP BY 1;

————————————————————

What is the customer count and percentage of customers who have churned rounded to 1 decimal place?

SELECT
	COUNT(DISTINCT customer_id) AS total_customers,
    COUNT(CASE WHEN plan_id = 4 THEN 1 ELSE NULL END) AS churned_customers,
    ROUND(COUNT(CASE WHEN plan_id = 4 THEN 1 ELSE NULL END) /
		COUNT(DISTINCT customer_id) * 100,1) AS pct_churn
FROM subscriptions;

————————————————————

How many customers have churned straight after their initial free trial – what percentage is this rounded to the nearest whole number?

To answer this question I run before distinct queries and then combine all of them in a single query with a subquery, the idea was to generate a list with all customers after 1 week, and join this list with the subscriptions table by customer_id so that we can obtain their plan status after 1 week and then just calculate the total and the percentage.

It’s a subquery, so we’ll start to read from the bottom, I left comments with the order on how to read the query, from 1 to 3

SELECT -- 3 count the number of total customers, and the churned customers (plan_id = 4)
	COUNT(customer_id) AS total_customers,
    COUNT(CASE WHEN plan_id = 4 THEN 1 ELSE NULL END) AS church_after_trial,
    CEILING(COUNT(CASE WHEN plan_id = 4 THEN 1 ELSE NULL END) / COUNT(customer_id) * 100) AS pct_of_church_after_trial
FROM 
(
SELECT -- 2 we join the list(1) with the subscriptions table so that we can see their status
	a.customer_id,
    a.after_1_week,
    s.plan_id
FROM 
(
SELECT -- 1 generate a list with all customers after 1 week 
	customer_id AS customer_id,
	DATE_ADD(MIN(start_date), INTERVAL 7 DAY) AS after_1_week
FROM subscriptions s
	LEFT JOIN plans p
		ON s.plan_id = s.plan_id
GROUP BY 1
ORDER BY 1
) AS a
	LEFT JOIN subscriptions s
		ON a.customer_id = s.customer_id AND
			a.after_1_week = s.start_date) AS b;

————————————————————

What is the number and percentage of customer plans after their initial free trial?

We can use the same query used above to obtain all the plan id of the customers after the end of the trial period and use it in a subquery to calculate the percentage.

SELECT
	COUNT(CASE WHEN plan_id = 1 THEN 1 ELSE NULL END) AS basic_monthly,
    ROUND((COUNT(CASE WHEN plan_id = 1 THEN 1 ELSE NULL END) / COUNT(customer_id)) * 100,2) AS pct_basic_monthly,
	COUNT(CASE WHEN plan_id = 2 THEN 1 ELSE NULL END) AS pro_monthly,
    ROUND((COUNT(CASE WHEN plan_id = 2 THEN 1 ELSE NULL END) / COUNT(customer_id)) * 100,2) AS pct_pro_monthly,
	COUNT(CASE WHEN plan_id = 3 THEN 1 ELSE NULL END) AS pro_annual,
    ROUND((COUNT(CASE WHEN plan_id = 3 THEN 1 ELSE NULL END) / COUNT(customer_id)) * 100,2) AS pct_pro_annual,
	COUNT(CASE WHEN plan_id = 4 THEN 1 ELSE NULL END) AS churn,
    ROUND((COUNT(CASE WHEN plan_id = 4 THEN 1 ELSE NULL END) / COUNT(customer_id)) * 100,2) AS pct_churn
FROM
(
SELECT
	a.customer_id,
    a.after_1_week,
    s.plan_id
FROM 
(
SELECT 
	customer_id AS customer_id,
	DATE_ADD(MIN(start_date), INTERVAL 7 DAY) AS after_1_week
FROM subscriptions s
	LEFT JOIN plans p
		ON s.plan_id = s.plan_id
GROUP BY 1
ORDER BY 1
) AS a
	LEFT JOIN subscriptions s
		ON a.customer_id = s.customer_id AND
			a.after_1_week = s.start_date) AS b; 

————————————————————

What is the customer count and percentage breakdown of all 5 plan_name values at 2020-12-31?

The process it’s very similar to the previous one, we generate a list of all customers after their trial period and before the end of 2020-12-31 with the MAX date (so the last update of their subscription status), then we join the result with the subscriptions table to have their status and finally we calculate the number of customers and percentage.

SELECT
	COUNT(CASE WHEN plan_id = 0 THEN 1 ELSE NULL END) AS trial,
    ROUND((COUNT(CASE WHEN plan_id = 0 THEN 1 ELSE NULL END) / COUNT(customer_id)) * 100,2) AS pct_trial,
    COUNT(CASE WHEN plan_id = 1 THEN 1 ELSE NULL END) AS basic_monthly,
    ROUND((COUNT(CASE WHEN plan_id = 1 THEN 1 ELSE NULL END) / COUNT(customer_id)) * 100,2) AS pct_basic_monthly,
    COUNT(CASE WHEN plan_id = 2 THEN 1 ELSE NULL END) AS pro_monthly,
    ROUND((COUNT(CASE WHEN plan_id = 2 THEN 1 ELSE NULL END) / COUNT(customer_id)) * 100,2) AS pct_pro_monthly,
    COUNT(CASE WHEN plan_id = 3 THEN 1 ELSE NULL END) AS pro_annual,
    ROUND((COUNT(CASE WHEN plan_id = 3 THEN 1 ELSE NULL END) / COUNT(customer_id)) * 100,2) AS pct_pro_annual,
    COUNT(CASE WHEN plan_id = 4 THEN 1 ELSE NULL END) AS churn,
    ROUND((COUNT(CASE WHEN plan_id = 4 THEN 1 ELSE NULL END) / COUNT(customer_id)) * 100,2) AS pct_churn
FROM(

SELECT -- 2 we join the list(1) with the subscriptions table so that we can see their status
    a.customer_id,
    a.after_1_week,
    s.plan_id
FROM (
SELECT -- 1 generate a list with all customers after 1 week and before the 2020-12-31
	customer_id AS customer_id,
    MAX(start_date) AS after_1_week
FROM subscriptions s
    LEFT JOIN plans p
        ON s.plan_id = s.plan_id
WHERE start_date < "2020-12-31"
GROUP BY 1
ORDER BY 1)
AS a
    LEFT JOIN subscriptions s
        ON a.customer_id = s.customer_id AND
            a.after_1_week = s.start_date) AS b;

How many customers have upgraded to an annual plan in 2020?

SELECT
	COUNT(plan_id) AS total_pro_annual_2020
FROM subscriptions
WHERE plan_id = 3
	AND start_date <= "2020-12-31";

————————————————————

How many days on average does it take for a customer to an annual plan from the day they join Foodie-Fi?

I solved this one in 2 ways, one was to create 2 temporary tables, one with the only trial customers and the other one with the pro customers, then subtract the dates and calculate the average.

The other solution was using CTE, I’ll show you the CTE version, we’ll use the temporary tables for the next questions.

WITH trial_users AS (
SELECT 
	s.customer_id,
    s.plan_id,
    s.start_date,
    p.plan_name
FROM subscriptions s
	LEFT JOIN plans p
		ON s.plan_id = p.plan_id
WHERE plan_name = "trial"
ORDER BY 1),
pro_users AS (
SELECT
	customer_id,
    plan_id,
    start_date,
    plan_name
FROM (
SELECT 
	s.customer_id,
    s.plan_id,
    s.start_date,
    p.plan_name
FROM subscriptions s
	LEFT JOIN plans p
		ON s.plan_id = p.plan_id
ORDER BY 1) AS a
WHERE plan_name = "pro annual"
)
SELECT
	CEILING(AVG(days_to_became_pro)) AS average_days_to_pro
FROM (
	SELECT 
		pu.customer_id,
        DATEDIFF(pu.start_date, tu.start_date) AS days_to_became_pro
	FROM pro_users pu
		LEFT JOIN trial_users tu
			ON pu.customer_id = tu.customer_id) AS f;

————————————————————

Can you further breakdown this average value into 30 day periods (i.e. 0-30 days, 31-60 days etc)

I created 2 temporary tables as anticipated in the previous question, one with the PRO customers and the other one with the TRIAL customers, then join the 2 table and insert them in an inner query and calculate the average for each single period

-- Create Temporary Table TRIAL customers
CREATE TEMPORARY TABLE trial_customers
SELECT
	customer_id,
    plan_id,
    start_date,
    plan_name
FROM (
SELECT 
	s.customer_id,
    s.plan_id,
    s.start_date,
    p.plan_name
FROM subscriptions s
	LEFT JOIN plans p
		ON s.plan_id = p.plan_id
ORDER BY 1) AS a
WHERE plan_name = "trial";


-- Create Temporary Table PRO Customers
CREATE TEMPORARY TABLE pro_annual_customers
SELECT
	customer_id,
    plan_id,
    start_date,
    plan_name
FROM (
SELECT 
	s.customer_id,
    s.plan_id,
    s.start_date,
    p.plan_name
FROM subscriptions s
	LEFT JOIN plans p
		ON s.plan_id = p.plan_id
ORDER BY 1) AS a
WHERE plan_name = "pro annual";

-- Join the 2 temporary table, insert them in a inner query and calculate the average for each period
SELECT
	CEILING(AVG(period_0_30)) AS avg_0_30,
    CEILING(AVG(period_31_60)) AS avg_31_60,
    CEILING(AVG(period_61_120)) avg_61_120,
    CEILING(AVG(period_more_than_120)) AS avg_over_120
FROM (
SELECT
	CASE WHEN DATEDIFF(pro.start_date, trial.start_date) <= 30 THEN DATEDIFF(pro.start_date, trial.start_date) ELSE NULL END AS period_0_30,
    CASE WHEN DATEDIFF(pro.start_date, trial.start_date) > 30 AND DATEDIFF(pro.start_date, trial.start_date) <= 60 THEN DATEDIFF(pro.start_date, trial.start_date) ELSE NULL END AS period_31_60,
    CASE WHEN DATEDIFF(pro.start_date, trial.start_date) > 60 AND DATEDIFF(pro.start_date, trial.start_date) <= 120 THEN DATEDIFF(pro.start_date, trial.start_date) ELSE NULL END AS period_61_120,
    CASE WHEN DATEDIFF(pro.start_date, trial.start_date) > 120 THEN DATEDIFF(pro.start_date, trial.start_date) ELSE NULL END AS period_more_than_120
FROM pro_annual_customers pro
	LEFT JOIN trial_customers trial
		ON pro.customer_id = trial.customer_id) AS a;

————————————————————

How many customers downgraded from a pro monthly to a basic monthly plan in 2020?

I created 2 new different temporary tables, one with the PRO monthly plans, and the other one with the BASIC monthly plans where the date is greater than in the previous table.

-- create temporary table PRO MONTHLY PLANS
CREATE TEMPORARY TABLE pro_monthly
SELECT
	customer_id,
    plan_id,
    start_date,
    plan_name
FROM (
SELECT 
	s.customer_id,
    s.plan_id,
    s.start_date,
    p.plan_name
FROM subscriptions s
	LEFT JOIN plans p
		ON s.plan_id = p.plan_id
ORDER BY 1) AS a
WHERE plan_name = "pro monthly";

-- create temporary table BASIC MONTHLY PLANS
SELECT 
	customer_id,
    plan_id,
    start_date,
    plan_name
FROM (
SELECT 
	s.customer_id,
    s.plan_id,
    s.start_date,
    p.plan_name
FROM subscriptions s
	LEFT JOIN plans p
		ON s.plan_id = p.plan_id
ORDER BY 1) AS b
WHERE plan_name = "basic monthly";


-- combine the 2 tables and calculate

SELECT
	COUNT(pro.customer_id) AS downgraded_customers
FROM pro_monthly pro
	INNER JOIN basic_monthly basic
		ON pro.customer_id = basic.customer_id
			WHERE basic.start_date > pro.start_date;

Thanks for your time.

Giulio Pulino.

Back to portfolio