It’s pretty common to use SQL to extract sales summaries from tables that
represent sales orders as rows. These kinds of queries often use the
AVG()
aggregate function to summarise the sales volume. There is a major pitfall when
doing this, though!
We might run a sub-query like this to get daily revenue totals from the orders
table:
SELECT order_date,
SUM(order_total) AS total_revenue
FROM orders
GROUP BY order_date
Those aggregated revenue query results might look something like this:
+------------+---------------+
| date | total_revenue |
+------------+---------------+
| 2021-08-01 | 940.00 |
| 2021-08-02 | 795.00 |
| 2021-08-05 | 795.00 |
| 2021-08-06 | 1860.00 |
| 2021-08-08 | 495.00 |
+----------+-----------------+
If we want to calculate a daily average over a period like this, we might use
AVG(total_revenue) . In this case with these five rows that would be 977.00 .
This looks like it might be right, but it’s not.
The issue is that there are days with no orders in that range. 2021-08-03 ,
2021-08-04 and 2021-08-07 sadly had no sales. They should be in the
aggregate query results with a total_revenue of 0.00 , like this:
+------------+---------------+
| date | total_revenue |
+------------+---------------+
| 2021-08-01 | 940.00 |
| 2021-08-02 | 795.00 |
| 2021-08-03 | 0.00 |
| 2021-08-04 | 0.00 |
| 2021-08-05 | 795.00 |
| 2021-08-06 | 1860.00 |
| 2021-08-07 | 0.00 |
| 2021-08-08 | 495.00 |
+----------+-----------------+
With those empty days included, the result of AVG(total_revenue) is now
610.625 , which is the real average daily revenue in that range of dates.
This is a common pitfall when grouping by date in SQL queries, as it’s easy to
forget that if there are no representative rows in the source table, then those
rows will be absent in the aggregated results, instead of appearing as a zero
value like they should.
So how can we ensure that we get rows for all the days, even if some of them had
no sales?
We can re-arrange the query to be based on a date range, and then join the
orders table on that, so that no absent days are skipped.
In Postgres, you can achieve this by selecting from the
generate_series
function which makes this relatively easy.
What about MySQL? It’s a bit harder to
generate a date range in MySQL,
but doable.
We end up with a query like this:
SELECT
date_series.dates AS date,
IF(SUM(orders.order_total) IS NULL, 0.00, SUM(orders.order_total)) AS total_revenue
FROM (
SELECT series.dates
FROM (
SELECT DATE('2021-08-08') - INTERVAL units.pos DAY AS dates
FROM (SELECT 0 AS pos UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS units
) series
WHERE series.dates >= DATE('2021-08-08') - INTERVAL 7 DAY
) date_series
LEFT JOIN orders
ON DATE(orders.created_at) = date_series.dates
GROUP BY date
Which gives results like this:
+------------+---------------+
| date | total_revenue |
+------------+---------------+
| 2021-08-01 | 940.00 |
| 2021-08-02 | 795.00 |
| 2021-08-03 | 0.00 |
| 2021-08-04 | 0.00 |
| 2021-08-05 | 795.00 |
| 2021-08-06 | 1860.00 |
| 2021-08-07 | 0.00 |
| 2021-08-08 | 495.00 |
+----------+-----------------+
Notice that the days without any orders still get represented with a row with
total_revenue as 0.00 . It’s then safe to run an AVG() across that:
SELECT AVG(total_revenue)
FROM (
SELECT
date_series.dates AS date,
IF(SUM(orders.order_total) IS NULL, 0.00, SUM(orders.order_total)) AS total_revenue
FROM (
SELECT series.dates
FROM (
SELECT DATE('2021-08-26') - INTERVAL units.pos DAY AS dates
FROM (SELECT 0 AS pos UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS units
) series
WHERE series.dates >= DATE('2021-08-26') - INTERVAL 7 DAY
) date_series
LEFT JOIN orders
ON DATE(orders.created_at) = date_series.dates
GROUP BY date
) AS daily_totals
We then get the correct average daily revenue in the period of 610.625 .
By the way, if you need help with SQL databases in your business, you can hire
me as a
freelance database developer.
View post:
Beware of SQL averages over dates with empty days
|