This blog post will give a simple introduction to using a recursive SQL statement to generate a temporary result set which can be used as the basis for a subsequent query. Recursive SQLs use Common Table Expressions (CTE).
Suppose you have been asked to create a report of total sales per day for a given month, including those days where there were no sales. At first sight this might seem like an easy problem to solve, simply summarise the sales data by day for the required month and you’re done. However, this doesn’t satisfy the requirement to include those days when there were no sales. One solution is to use a recursive SQL to “construct” a result set with one row for each day of the month then LEFT JOIN to the sales summary data.
Create Some Test Data
Before we start, let’s create a table with some dummy sales data that we can use in our example. As date columns, especially in older database designs, are very often defined as DECIMAL(8, 0) instead of real DATE-type fields, I’ve also made the dates as decimal fields in the sample data.
declare global temporary table SALES_DATA as ( select ORDER_NUMBER, dec(ORDER_DATE, 8, 0) as ORDER_DATE, ORDER_QUANTITY from( values(1, 20160701, 4), (2, 20160701, 400), (3, 20160701, 1340), (4, 20160702, 654), (5, 20160703, 865), (6, 20160704, 632), (7, 20160704, 976), (8, 20160705, 345), (9, 20160707, 9767), (10, 20160707, 67) ) SALES_DATA(ORDER_NUMBER, ORDER_DATE, ORDER_QUANTITY) ) with data with replace
The result of this is a table SALES_DATA in QTEMP with the following rows.
Creating a Summary of Sales Data
Our first attempt to solve the problem would most likely be to summarise the sales data for the required month by date as follows.
select ORDER_DATE, sum(ORDER_QUANTITY) as ORDER_QUANTITY, count(*) as NUM_ORDERS from SALES_DATA where int(ORDER_DATE / 100) = 201607 group by ORDER_DATE
However, as we’ve already mentioned, this doesn’t satisfy the original requirement to include all dates for the month even if there are no sales.
Creating a Calendar for the Month
One solution is to use a recursive SQL to generate a temporary result set with one row for every day of the month, which we can then use as the basis for our query. In the example below I have hard-coded the selected month (‘2016-07’) but this would normally be passed as a parameter.
with CALENDAR_MONTH ( CALENDAR_DATE ) as ( /* Priming select. */ select date(CALENDAR_DATE) from(values('2016-07' concat '-01')) x(CALENDAR_DATE) union all select CALENDAR_DATE + 1 day from CALENDAR_MONTH where month(CALENDAR_DATE + 1 day) = month(CALENDAR_DATE) ) select * from CALENDAR_MONTH
This SQL works by taking the result of the first “priming” SELECT inside the CTE as the initial row in the result. By combining the selected month ‘2016-07’ with ‘-01’ we create a new DATE-type field for the first day of the month. The second SELECT then selects from the CTE result itself, it finds the first row and adds one day to it. This new row is included in the result, which now contains the priming row and the new row. The new row is then selected and one day added to that and so process continues recursively with the second SELECT always reading the last row added to the result set until adding one day starts a new month, at which point the recursion stops and the result is complete. The final result is a calendar of days for the month specified:
Combining the Sales Summary Data
Now we have a complete list of the dates in the month we can join this to our sales data summary to produce the final report. I prefer to use a second CTE instead of a sub-query to summarise the sales data, which is then used in the final selection. As the date column in our test data is numeric I’ve included the conversion from the DATE-type column in the CALENDAR data to DECIMAL(8, 0) in the JOIN to the sales summary. I’ve also used COALESCE to replace the NULL values that occur when there are no sales with zeros.
with CALENDAR_MONTH ( CALENDAR_DATE ) as ( select date(CALENDAR_DATE) from(values(&MONTH concat '-01')) x(CALENDAR_DATE) union all select CALENDAR_DATE + 1 day from CALENDAR_MONTH where month(CALENDAR_DATE + 1 day) = month(CALENDAR_DATE) ), SALES_SUMMARY as ( select ORDER_DATE, sum(ORDER_QUANTITY) as ORDER_QUANTITY, count(*) as NUM_ORDERS from SALES_DATA group by ORDER_DATE ) select CALENDAR_DATE, coalesce(ORDER_QUANTITY, 0) as ORDER_QUANTITY, coalesce(NUM_ORDERS, 0) as NUM_ORDERS from CALENDAR_MONTH left join SALES_SUMMARY on ORDER_DATE = dec(replace(char(CALENDAR_DATE, ISO), '-', ''), 8, 0)
This gives the final report below: