MS SQL Server サンプルデータの AdventureWorksDW2019 スキーマを使って T-SQL を覚えるメモ。
ファクトテーブルの dbo.FactInternetSales と関連ディメンションテーブルを結合して、累計購入価格を基準にユーザーを10個のグループに分けるデシル分析をする。
declare @startdate date = '2013-01-01'; declare @enddate date = '2013-12-31'; with t1 as ( SELECT s.[ProductKey], s.[CustomerKey], c.[CustomerAlternateKey], s.[OrderDate], s.[OrderDateKey], c.[Gender], c.[BirthDate], convert(int, format(c.[BirthDate], 'yyyyMMdd')) as i_birthdate, floor((s.[OrderDateKey] - convert(int, format(c.[BirthDate], 'yyyyMMdd'))) / 10000) as age, c.[DateFirstPurchase], c.[GeographyKey], g.[CountryRegionCode], d.[FullDateAlternateKey], convert(varchar(7), format(d.[FullDateAlternateKey], 'yyyy-MM')) as year_month, s.[SalesOrderNumber], s.[SalesOrderLineNumber], s.[OrderQuantity] * s.[UnitPrice] as revenue, pc.[EnglishProductCategoryName] as category, psc.[EnglishProductSubcategoryName] as subcategory, p.[EnglishProductName] as productname FROM [dbo].[FactInternetSales] as s inner join [dbo].[DimDate] as d on s.[OrderDateKey] = d.[DateKey] inner join [dbo].[DimCustomer] as c on s.[CustomerKey] = c.[CustomerKey] inner join [dbo].[DimGeography] as g on c.[GeographyKey] = g.[GeographyKey] inner join [dbo].[DimProduct] as p on s.[ProductKey] = p.[ProductKey] left join [dbo].[DimProductSubcategory] as psc on p.[ProductSubcategoryKey] = psc.[ProductSubcategoryKey] left join [dbo].[DimProductCategory] as pc on psc.[ProductCategoryKey] = pc.[ProductCategoryKey] where s.[OrderDate] between @startdate and @enddate ), t2 as ( select t1.[CustomerAlternateKey], sum(t1.[revenue]) as revenue from t1 group by t1.[CustomerAlternateKey] ), t3 as ( select t2.[CustomerAlternateKey], t2.[revenue], ntile(10) over(order by t2.[revenue] desc) as decile from t2 ), t4 as ( select t3.[decile], sum(t3.[revenue]) as revenue, avg(t3.[revenue]) as avg_revenue, sum(sum(t3.[revenue])) over( order by decile) as stacked_revenue, sum(sum(t3.[revenue])) over() as total_revenue from t3 group by t3.[decile] ) select t4.[decile], t4.[revenue], t4.[avg_revenue], 100.0 * t4.[revenue] / t4.[total_revenue] as total_rate, 100.0 * t4.[stacked_revenue] / t4.[total_revenue] as stacked_rate from t4;
結果はこうなる。
PowerBI で表示するとこうなる。
参考図書
ビッグデータ分析・活用のためのSQLレシピ
SQL Server 2016の教科書 開発編