T-SQL で顧客データをデシル分析

 
MS SQL Server サンプルデータの AdventureWorksDW2019 スキーマを使って T-SQL を覚えるメモ。

ファクトテーブルの dbo.FactInternetSales と関連ディメンションテーブルを結合して、累計購入価格を基準にユーザーを10個のグループに分けるデシル分析をする。

declare @startdate date = '2013-01-01';
declare @enddate date = '2013-12-31';
with t1 as (
    SELECT
        s.[ProductKey],
        s.[CustomerKey],
        c.[CustomerAlternateKey],
        s.[OrderDate],
        s.[OrderDateKey],
        c.[Gender],
        c.[BirthDate],
        convert(int, format(c.[BirthDate], 'yyyyMMdd')) as i_birthdate,
        floor((s.[OrderDateKey] - convert(int, format(c.[BirthDate], 'yyyyMMdd'))) / 10000) as age,
        c.[DateFirstPurchase],
        c.[GeographyKey],
        g.[CountryRegionCode],
        d.[FullDateAlternateKey],
        convert(varchar(7), format(d.[FullDateAlternateKey], 'yyyy-MM')) as year_month,
        s.[SalesOrderNumber],
        s.[SalesOrderLineNumber],
        s.[OrderQuantity] * s.[UnitPrice] as revenue,
        pc.[EnglishProductCategoryName] as category,
        psc.[EnglishProductSubcategoryName] as subcategory,
        p.[EnglishProductName] as productname
    FROM
        [dbo].[FactInternetSales] as s
        inner join [dbo].[DimDate] as d on s.[OrderDateKey] = d.[DateKey]
        inner join [dbo].[DimCustomer] as c on s.[CustomerKey] = c.[CustomerKey]
        inner join [dbo].[DimGeography] as g on c.[GeographyKey] = g.[GeographyKey]
        inner join [dbo].[DimProduct] as p on s.[ProductKey] = p.[ProductKey]
        left join [dbo].[DimProductSubcategory] as psc on p.[ProductSubcategoryKey] = psc.[ProductSubcategoryKey]
        left join [dbo].[DimProductCategory] as pc on psc.[ProductCategoryKey] = pc.[ProductCategoryKey]
    where
        s.[OrderDate] between @startdate and @enddate
),
t2 as (
    select
        t1.[CustomerAlternateKey],
        sum(t1.[revenue]) as revenue
    from
        t1
    group by
        t1.[CustomerAlternateKey]
),
t3 as (
    select
        t2.[CustomerAlternateKey],
        t2.[revenue],
        ntile(10) over(order by t2.[revenue] desc) as decile
    from
        t2
),
t4 as (
    select
        t3.[decile],
        sum(t3.[revenue]) as revenue,
        avg(t3.[revenue]) as avg_revenue,
        sum(sum(t3.[revenue])) over( order by decile) as stacked_revenue,
        sum(sum(t3.[revenue])) over() as total_revenue
    from
        t3
    group by
        t3.[decile]
)
select
    t4.[decile],
    t4.[revenue],
    t4.[avg_revenue],
    100.0 * t4.[revenue] / t4.[total_revenue] as total_rate,
    100.0 * t4.[stacked_revenue] / t4.[total_revenue] as stacked_rate
from
    t4;

 
結果はこうなる。

T-SQL で顧客データをデシル分析

 
PowerBI で表示するとこうなる。

Power BI でデシル分析を可視化

 

参考図書

ビッグデータ分析・活用のためのSQLレシピ
SQL Server 2016の教科書 開発編