# Difference between revisions of "Window LAG"

## COVID-19 Data

Notes on the data: This data was assembled based on work done by Rodrigo Pombo based on John Hopkins University, based on World Health Organisation. The data was assembled 21st April 2020 - there are no plans to keep this data set up to date.

## Introducing the `covid` table

The example uses a WHERE clause to show the cases in 'Italy' in March.

Modify the query to show data from Spain

```SELECT name, DAY(whn),
confirmed, deaths, recovered
FROM covid
WHERE name = 'Italy'
AND MONTH(whn) = 3
ORDER BY whn
```
```SELECT name, DAY(whn),
confirmed, deaths, recovered
FROM covid
WHERE name = 'Spain'
AND MONTH(whn) = 3
ORDER BY whn
```

## Introducing the LAG function

The LAG function is used to show data from the preceding row or the table. When lining up rows the data is partitioned by country name and ordered by the data whn. That means that only data from Italy is considered.

Modify the query to show confirmed for the day before.

```SELECT name, DAY(whn), confirmed,
LAG(whn, 1) OVER (PARTITION BY name ORDER BY whn)
FROM covid
WHERE name = 'Italy'
AND MONTH(whn) = 3
ORDER BY whn
```
```SELECT name, DAY(whn), confirmed,
LAG(confirmed, 1) OVER (partition by name ORDER BY whn) AS dbf
FROM covid
WHERE name = 'Italy'
AND MONTH(whn) = 3
ORDER BY whn
```

## LAG operation

Here is the correct query showing the cases for the day before:

```SELECT name, DAY(whn), confirmed,
LAG(confirmed, 1) OVER (partition by name ORDER BY whn) AS lag
FROM covid
WHERE name = 'Italy'
AND MONTH(whn) = 3
ORDER BY whn
```

Notice how the values in the LAG column match the value of the row diagonally above and to the left.

nameDAY(whn)confirmeddbf
Italy11694null
Italy220361694
Italy325022036
Italy430892502
Italy538583089
Italy646363858
Italy758834636
Italy873755883
Italy991727375
Italy10101499172
...

## Number of new cases

The number of confirmed case is cumulative - but we can use LAG to recover the number of new cases reported for each day.

Show the number of new cases for each day, for Italy, for March.

```SELECT name, DAY(whn), confirmed,
LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY whn)
FROM covid
WHERE name = 'Italy'
AND MONTH(whn) = 3
ORDER BY whn
```
```SELECT name, DAY(whn), confirmed -
LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY whn) as new
FROM covid
WHERE name = 'Italy'
AND MONTH(whn) = 3
ORDER BY whn
```

## Weekly changes

The data gathered are necessarily estimates and are inaccurate. However by taking a longer time span we can mitigate some of the effects.

You can filter the data to view only Monday's figures WHERE WEEKDAY(whn) = 0.

Show the number of new cases in Italy for each week - show Monday only.

```SELECT name, DATE_FORMAT(whn,'%Y-%m-%d'), confirmed
FROM covid
WHERE name = 'Italy'
AND WEEKDAY(whn) = 0
ORDER BY whn
```
```SELECT name,DATE_FORMAT(whn,'%Y-%m-%d'),
confirmed-LAG(confirmed,1) OVER (ORDER BY whn) "new this week"
FROM covid
WHERE name='Italy' and WEEKDAY(whn) = 0
```

## LAG using a JOIN

You can JOIN a table using DATE arithmetic. This will give different results if data is missing.

Show the number of new cases in Italy for each week - show Monday only.

In the sample query we JOIN this week tw with last week lw using the DATE_ADD function.

```SELECT tw.name, DATE_FORMAT(tw.whn,'%Y-%m-%d'),
tw.confirmed, lw.confirmed
FROM covid tw LEFT JOIN covid lw ON
DATE_ADD(lw.whn, INTERVAL 1 WEEK) = tw.whn
AND tw.name=lw.name
WHERE tw.name = 'Italy'
ORDER BY tw.whn
```
```SELECT tw.name, DATE_FORMAT(tw.whn,'%Y-%m-%d'),
tw.confirmed - lw.confirmed
FROM covid tw LEFT JOIN covid lw ON
DATE_ADD(lw.whn, INTERVAL 1 WEEK) = tw.whn
AND tw.name=lw.name
WHERE tw.name = 'Italy'
AND WEEKDAY(tw.whn) = 0
ORDER BY tw.whn
```