Using a Recursive SQL to Generate a Temporary Result Set

Introduction

This blog post will give a simple introduction to using a recursive SQL statement to generate a temporary result set which can be used as the basis for a subsequent query. Recursive SQLs use Common Table Expressions (CTE).

Problem

Suppose you have been asked to create a report of total sales per day for a given month, including those days where there were no sales. At first sight this might seem like an easy problem to solve, simply summarise the sales data by day for the required month and you’re done. However, this doesn’t satisfy the requirement to include those days when there were no sales. One solution is to use a recursive SQL to “construct” a result set with one row for each day of the month then LEFT JOIN to the sales summary data.

Create Some Test Data

Before we start, let’s create a table with some dummy sales data that we can use in our example. As date columns, especially in older database designs, are very often defined as DECIMAL(8, 0) instead of real DATE-type fields, I’ve also made the dates as decimal fields in the sample data.

declare global temporary table SALES_DATA as (
    select ORDER_NUMBER,
           dec(ORDER_DATE, 8, 0) as ORDER_DATE,
           ORDER_QUANTITY
    from(
        values(1, 20160701, 4),
              (2, 20160701, 400),
              (3, 20160701, 1340),
              (4, 20160702, 654),
              (5, 20160703, 865),
              (6, 20160704, 632),
              (7, 20160704, 976),
              (8, 20160705, 345),
              (9, 20160707, 9767),
              (10, 20160707, 67)
    ) SALES_DATA(ORDER_NUMBER, ORDER_DATE, ORDER_QUANTITY)
) with data with replace

The result of this is a table SALES_DATA in QTEMP with the following rows.

Capture

Creating a Summary of Sales Data

Our first attempt to solve the problem would most likely be to summarise the sales data for the required month by date as follows.

select ORDER_DATE,
       sum(ORDER_QUANTITY) as ORDER_QUANTITY,
       count(*) as NUM_ORDERS

from SALES_DATA

where int(ORDER_DATE / 100) = 201607

group by ORDER_DATE

Capture

However, as we’ve already mentioned, this doesn’t satisfy the original requirement to include all dates for the month even if there are no sales.

Creating a Calendar for the Month

One solution is to use a recursive SQL to generate a temporary result set with one row for every day of the month, which we can then use as the basis for our query. In the example below I have hard-coded the selected month (‘2016-07’) but this would normally be passed as a parameter.

with CALENDAR_MONTH (
    CALENDAR_DATE
)
as (
    /* Priming select. */
    select date(CALENDAR_DATE)
    from(values('2016-07' concat '-01')) x(CALENDAR_DATE)

    union all

    select CALENDAR_DATE + 1 day
    from CALENDAR_MONTH
    where month(CALENDAR_DATE + 1 day) = month(CALENDAR_DATE)
)

select * from CALENDAR_MONTH

This SQL works by taking the result of the first “priming” SELECT inside the CTE as the initial row in the result. By combining the selected month ‘2016-07’ with ‘-01’ we create a new DATE-type field for the first day of the month. The second SELECT then selects from the CTE result itself, it finds the first row and adds one day to it. This new row is included in the result, which now contains the priming row and the new row. The new row is then selected and one day added to that and so process continues recursively with the second SELECT always reading the last row added to the result set until adding one day starts a new month, at which point the recursion stops and the result is complete. The final result is a calendar of days for the month specified:

Capture

Combining the Sales Summary Data

Now we have a complete list of the dates in the month we can join this to our sales data summary to produce the final report. I prefer to use a second CTE instead of a sub-query to summarise the sales data, which is then used in the final selection. As the date column in our test data is numeric I’ve included the conversion from the DATE-type column in the CALENDAR data to DECIMAL(8, 0) in the JOIN to the sales summary. I’ve also used COALESCE to replace the NULL values that occur when there are no sales with zeros.

with CALENDAR_MONTH (
    CALENDAR_DATE
)
as (
    select date(CALENDAR_DATE)
    from(values(&MONTH concat '-01')) x(CALENDAR_DATE)

    union all

    select CALENDAR_DATE + 1 day
    from CALENDAR_MONTH
    where month(CALENDAR_DATE + 1 day) = month(CALENDAR_DATE)
),

SALES_SUMMARY as (
    select ORDER_DATE,
           sum(ORDER_QUANTITY) as ORDER_QUANTITY,
           count(*) as NUM_ORDERS

    from SALES_DATA

    group by ORDER_DATE
)

select CALENDAR_DATE,
       coalesce(ORDER_QUANTITY, 0) as ORDER_QUANTITY,
       coalesce(NUM_ORDERS, 0) as NUM_ORDERS

from CALENDAR_MONTH

left join SALES_SUMMARY
    on ORDER_DATE = dec(replace(char(CALENDAR_DATE, ISO), '-', ''), 8, 0)

This gives the final report below:

Capture

Advertisements

Using SQL’s LIKE predicate against a list of search strings

A colleague asked me recently if it was possible to select a set of rows using the LIKE predicate against a potentially large list of search strings, without repeating the LIKE for each string. It turns out it is possible using the two methods below.

Common Table Expressions (CTEs)

Before we get stuck into the examples, a word on Common Table Expressions (using the WITH clause). I’m a big fan of using Common Table Expressions to try to make my SQLs more readable, and in the examples below I have used them to “lift” the subqueries out of the main SQL body, hopefully making everything a bit more readable. However, they are not necessary for the techniques to work and the contents of the CTEs could just as easily be embedded directly where the CTE is referenced or replaced by tables if the search criteria were table driven. At the end of this post are some examples that don’t use CTEs.

Test Data

First let’s create a simple test table with some data to experiment with. We’ll scan this table for countries that match a list of specified search strings.

/* Create some test data. */
 declare global temporary table COUNTRIES as (
     select *
     from(
         values('England'),
               ('Ireland'),
               ('Scotland'),
               ('Wales'),
               ('France'),
               ('Germany'),
               ('Italy'),
               ('Spain'),
               ('Austria'),
               ('Belgium'),
               ('Holland')
     ) x(COUNTRY)
 ) with data

 

Method 1 – Using JOIN with LIKE

As I mentioned, I use a Common Table Expression (CTE), named SEARCH_STRINGS, to generate the list of search strings but they could just as easily be embedded as a subquery in the join or come from another table. The CTE is JOINed to the table to be scanned using the LIKE predicate on the column to be scanned, as follows:

/* Scan for countries whose name is LIKE any of the search strings. */
with SEARCH_STRINGS as (
    select '%' concat SEARCH_STRING concat '%' as SEARCH_STRING 
    from(
        values('EN'), 
              ('LY'), 
              ('LAND'), 
              ('SOUTH')
    ) x(SEARCH_STRING)
)

select * 
from COUNTRIES 
join SEARCH_STRINGS 
    on upper(COUNTRY) like SEARCH_STRING

This has the benefit of showing which countries matched which search string, but at the expense of showing duplicates where more than one of the search strings matches.

Capture

For our single column table this can easily be solved with the DISTINCT keyword.

/* Scan for countries whose name is LIKE any of the search strings. */
 with SEARCH_STRINGS as (
     select '%' concat SEARCH_STRING concat '%' as SEARCH_STRING
     from(
         values('EN'),
               ('LY'),
               ('LAND'),
               ('SOUTH')
     ) x(SEARCH_STRING)
 )

select distinct(COUNTRY)
from COUNTRIES
join SEARCH_STRINGS
     on upper(COUNTRY) like SEARCH_STRING

Capture

However, a better way of selecting distinct matches is to use Method 2 below.

Method 2 – Using EXISTS with a Subquery

This time the CTE of search strings is used as part of an EXISTS subquery which uses the LIKE predicate against the outer SQL’s scan column (COUNTRY), together with the SEARCH_STRING from the CTE of search strings. This results in only one row being selected from the COUNTRIES table when any of the search strings match.

with SEARCH_STRINGS as (
    select '%' concat SEARCH_STRING concat '%' as SEARCH_STRING
    from(
        values('EN'),
              ('LY'),
              ('LAND'),
              ('SOUTH')
    ) x(SEARCH_STRING)
)

select *
from COUNTRIES
where exists (
    select *
    from SEARCH_STRINGS
    where upper(COUNTRY) like SEARCH_STRING
)

Capture

Of course, we can negate the WHERE clause to find all countries that do not match any of the search strings:

select *
from COUNTRIES
where not exists (
    select *
    from SEARCH_STRINGS
    where upper(COUNTRY) like SEARCH_STRING
)

Capture

Examples of Both Methods Without CTEs

Method 1

select *
from COUNTRIES
join (
    select '%' concat SEARCH_STRING concat '%' as SEARCH_STRING
    from(
        values('EN'),
              ('LY'),
              ('LAND'),
              ('SOUTH')
    ) x(SEARCH_STRING)
) SEARCH_STRING
    on upper(COUNTRY) like SEARCH_STRING

Method 2

select *
from COUNTRIES
where exists (
    select *
    from(
        values('EN'),
              ('LY'),
              ('LAND'),
              ('SOUTH')
    ) x(SEARCH_STRING)
    where upper(COUNTRY) like '%' concat SEARCH_STRING concat '%'
)