14 Fire and Smoke Lab (Alberta-first)

This lab chapter turns the book’s running thread into a repeatable workflow:

- find station metadata (where are the monitors?)
- pull a long-running PM2.5 time series
- define “smoke episodes” as event windows
- produce a small set of visuals that you can reuse in later modelling

The point is not to build the final “best” model. The point is to create a disciplined dataset and evaluation story that the mathematics in Chapters 1–8 can attach to.

14.1 Data access (OpenAQ)

OpenAQ is a convenient aggregator for public air-quality monitoring data, but it requires a free API key. Set it as an environment variable:

OPENAQ_API_KEY=...

If the key is missing, this chapter will still render using a small synthetic demo dataset so the figures exist and the workflow is legible.
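A minimal sketch of the key check and fallback, assuming the OpenAQ v3 REST API at `https://api.openaq.org/v3` with the key sent in an `X-API-Key` header (confirm both against the current OpenAQ docs). The helper names and the shape of the synthetic series are illustrative assumptions; this is not the book's own demo generator.

```python
import datetime as dt
import math
import os
import random
import urllib.parse
import urllib.request

# Assumption: OpenAQ v3 base URL; confirm against the current API docs.
OPENAQ_BASE = "https://api.openaq.org/v3"


def openaq_request(path, params=None, api_key=None):
    """Build (but do not send) an authenticated OpenAQ request."""
    key = api_key or os.environ.get("OPENAQ_API_KEY")
    if not key:
        raise RuntimeError("OPENAQ_API_KEY is not set")
    query = "?" + urllib.parse.urlencode(params) if params else ""
    req = urllib.request.Request(f"{OPENAQ_BASE}{path}{query}")
    req.add_header("X-API-Key", key)  # assumed v3 auth header
    return req


def synthetic_daily_pm25(start, days=365, seed=42):
    """Hypothetical demo series: seasonal background plus random summer spikes."""
    rng = random.Random(seed)
    series = []
    for i in range(days):
        day = start + dt.timedelta(days=i)
        doy = day.timetuple().tm_yday
        # Background peaks around mid-January and is lowest in mid-July.
        base = 8 + 3 * math.cos(2 * math.pi * (doy - 15) / 365.25)
        value = max(0.0, base + rng.gauss(0, 2))
        # Occasionally add a smoke-like spike in July or August.
        if day.month in (7, 8) and rng.random() < 0.08:
            value += rng.uniform(20, 60)
        series.append((day, round(value, 2)))
    return series


def load_daily_pm25(start, days=365):
    """Return (source_label, series); falls back to synthetic data without a key."""
    if not os.environ.get("OPENAQ_API_KEY"):
        return "synthetic", synthetic_daily_pm25(start, days)
    # Live path not sketched: send openaq_request(...) via urllib.request.urlopen
    # and parse the JSON response for the station and date range you need.
    raise NotImplementedError("live OpenAQ fetch is left to the reader")
```

Returning a source label alongside the series lets every figure and the data card state plainly whether they were drawn from live or synthetic data.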

14.2 Station map (context first)

14.3 Pull a daily PM2.5 time series

For a first pass, daily aggregates are enough to reveal seasonality and smoke episodes without drowning in detail.
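If the raw pull is hourly, the daily series has to be built. A minimal sketch, assuming a list of `(timestamp, value)` pairs whose timestamps are already in local time (for example America/Edmonton), so that "day" means a local calendar day rather than a UTC one. The 18-of-24-hours completeness rule is an illustrative choice, not one this chapter prescribes; days below it are marked missing instead of averaged from a few hours.

```python
import datetime as dt
from collections import defaultdict


def daily_means(hourly, min_hours=18):
    """Aggregate (local_timestamp, value) pairs into {date: mean or None}.

    A day with fewer than `min_hours` non-missing readings is reported as
    None (missing) instead of a mean drawn from a few hours.
    """
    valid = defaultdict(list)
    days_seen = set()
    for ts, value in hourly:
        days_seen.add(ts.date())
        if value is not None:
            valid[ts.date()].append(value)
    result = {}
    for day in sorted(days_seen):
        vals = valid[day]
        result[day] = sum(vals) / len(vals) if len(vals) >= min_hours else None
    return result


# Usage: one complete day, then a day with only 10 valid hours.
start = dt.datetime(2025, 8, 14)
hourly = [(start + dt.timedelta(hours=h), 10.0) for h in range(24)]
hourly += [(start + dt.timedelta(days=1, hours=h), 40.0 if h < 10 else None)
           for h in range(24)]
daily = daily_means(hourly)
```

A day with no rows at all never appears in the result, so reindex against a full calendar before counting missing days.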

14.4 A reusable “data card”

Use this as the book’s shared context when you write about the smoke project. It forces you to say what you used, what you excluded, and what is unknown.


::: {.callout-note}
## Smoke data card (draft)

- Station: **Calgary (demo)**
- Window: **2025-04-16 → 2026-04-15** (365 days)
- Missing PM2.5 values: **0**
- Threshold: **25 µg/m³**
- Days above threshold: **10** (2.7%)

Top detected episodes (by peak):

|   episode_id | start      | end        |   peak (µg/m³) |   days |
|-------------:|:-----------|:-----------|---------------:|-------:|
|            1 | 2025-08-14 | 2025-08-14 | 67.4727 |      1 |
|            2 | 2025-08-15 | 2025-08-15 | 57.0567 |      1 |
|            7 | 2025-12-22 | 2025-12-22 | 50.2258 |      1 |
|            3 | 2025-08-16 | 2025-08-16 | 47.1319 |      1 |
|            8 | 2025-12-23 | 2025-12-23 | 43.7008 |      1 |
:::
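The card's headline numbers and episode table come from a little bookkeeping over the daily series. A minimal sketch, assuming `(date, value)` pairs with `None` for a missing day and the card's 25 µg/m³ threshold. The episode rule here (a run of consecutive days strictly above the threshold, ended by any clean day, missing day, or calendar gap) is an illustrative assumption; other rules, such as bridging one-day gaps, will group days differently. Function names are hypothetical.

```python
import datetime as dt


def card_stats(series, threshold=25.0):
    """Headline numbers for the data card from (date, value-or-None) pairs."""
    observed = [v for _, v in series if v is not None]
    above = sum(1 for v in observed if v > threshold)
    return {
        "start": series[0][0],
        "end": series[-1][0],
        "days": len(series),
        "missing": len(series) - len(observed),
        "days_above": above,
        # Share of observed days, so missing data cannot dilute it.
        "pct_above": 100.0 * above / len(observed) if observed else 0.0,
    }


def detect_episodes(series, threshold=25.0):
    """Group consecutive days above `threshold` into episodes.

    A missing value, a day at or below the threshold, or a calendar gap
    ends the open episode. Episodes are numbered in time order, then
    returned sorted by peak (highest first).
    """
    episodes, current = [], None
    one_day = dt.timedelta(days=1)
    for day, value in series:
        smoky = value is not None and value > threshold
        if current is not None and (not smoky or day - current["end"] > one_day):
            episodes.append(current)
            current = None
        if smoky:
            if current is None:
                current = {"start": day, "end": day, "peak": value, "days": 1}
            else:
                current["end"] = day
                current["peak"] = max(current["peak"], value)
                current["days"] += 1
    if current is not None:
        episodes.append(current)
    for i, ep in enumerate(episodes, start=1):
        ep["episode_id"] = i
    return sorted(episodes, key=lambda ep: ep["peak"], reverse=True)
```

Writing the episode rule down as code turns it into something the data card can cite, rather than a choice hidden inside a plot.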

14.5 Episode summaries (a first “event lens”)

Event-structure visuals make the later evaluation story easier to tell. A first step is to summarise how many smoke days occur in each month, and how peaks cluster.
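A minimal sketch of that monthly summary, assuming the same `(date, value)` daily pairs and a 25 µg/m³ smoke-day threshold (names hypothetical). It keys on `(year, month)` rather than month alone because a window like the card's (April to April) would otherwise merge two different Aprils.

```python
import datetime as dt
from collections import defaultdict


def monthly_smoke_summary(series, threshold=25.0):
    """Per (year, month): smoke-day count, observed days, and monthly peak."""
    summary = defaultdict(lambda: {"smoke_days": 0, "observed": 0, "peak": None})
    for day, value in series:
        row = summary[(day.year, day.month)]
        if value is None:
            continue  # the month still appears, but missing days are not counted
        row["observed"] += 1
        if value > threshold:
            row["smoke_days"] += 1
        row["peak"] = value if row["peak"] is None else max(row["peak"], value)
    return dict(sorted(summary.items()))


# Usage with toy values (not real observations).
aug = [(dt.date(2025, 8, 1) + dt.timedelta(days=i), v) for i, v in enumerate([30, 10, 50])]
sep = [(dt.date(2025, 9, 1), None), (dt.date(2025, 9, 2), 12)]
summary = monthly_smoke_summary(aug + sep)
```

Keeping the observed-day count next to the smoke-day count lets a bar chart show when a quiet month is really a month with little data.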

14.6 Smoke-alert calibration (a decision-facing visual)

A smoke alert is a decision, not a number. If you publish probabilities for a next-day smoke event, they need to be calibrated: of all the days forecast at 30%, about 30% should turn out to be smoke days.
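Calibration can be checked with a reliability table: sort forecasts into probability bins, then compare each bin's mean forecast with the fraction of its days that were actually smoke days. A calibrated model has the two roughly equal in every well-populated bin. A minimal sketch with ten equal-width bins (an illustrative choice; `sklearn.calibration.calibration_curve` implements the same idea with uniform or quantile bins). The function name is hypothetical.

```python
def reliability_table(probs, outcomes, n_bins=10):
    """Compare mean forecast probability with observed frequency per bin.

    Returns rows (bin_low, bin_high, count, mean_forecast, observed_rate);
    empty bins are skipped. Outcomes are 0/1 smoke-day labels.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((p, y))
    rows = []
    for i, members in enumerate(bins):
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        rate = sum(y for _, y in members) / len(members)
        rows.append((i / n_bins, (i + 1) / n_bins, len(members), mean_p, rate))
    return rows


# Usage with toy forecasts (not real model output).
table = reliability_table([0.32, 0.34, 0.36, 0.91, 1.0], [1, 0, 0, 1, 1])
for low, high, n, mean_p, rate in table:
    print(f"{low:.1f}-{high:.1f}  n={n}  forecast={mean_p:.2f}  observed={rate:.2f}")
```

Printing the bin count next to each row matters: with only a handful of smoke days per year, high-probability bins are often too thin to judge.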

The calibration plot in this section is a minimal baseline: a simple logistic model predicts the probability that a day is a smoke day from the previous day’s PM2.5 and seasonality terms. The point is not the model. The point is the evaluation discipline.
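A minimal sketch of a baseline of this shape, in plain Python so no modelling library is assumed. The features are the previous day's PM2.5 (standardised) plus sin/cos day-of-year terms, the label is whether the day exceeds 25 µg/m³, and the fit is unregularised batch gradient ascent. The feature set, step size, and 80/20 chronological split are illustrative assumptions rather than the book's model; in practice a library fit such as scikit-learn's `LogisticRegression` would replace the hand-rolled loop.

```python
import datetime as dt
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_rows(series, threshold=25.0):
    """(features, label) pairs: day t-1's PM2.5 and day t's season -> smoke at t.

    Assumes consecutive entries are consecutive calendar days; pairs with
    a missing value on either day are dropped.
    """
    rows = []
    for (_, prev), (day, today) in zip(series, series[1:]):
        if prev is None or today is None:
            continue
        angle = 2 * math.pi * day.timetuple().tm_yday / 365.25
        rows.append(([prev, math.sin(angle), math.cos(angle)], int(today > threshold)))
    return rows


def fit_logistic(rows, lr=0.1, steps=1000):
    """Unregularised logistic regression by batch gradient ascent.

    The lag feature is standardised with the training mean/sd, which are
    returned so prediction applies the same scaling.
    """
    lag = [x[0] for x, _ in rows]
    mu = sum(lag) / len(lag)
    sd = math.sqrt(sum((v - mu) ** 2 for v in lag) / len(lag)) or 1.0
    design = [[1.0, (x[0] - mu) / sd, x[1], x[2]] for x, _ in rows]
    labels = [y for _, y in rows]
    w = [0.0] * 4
    for _ in range(steps):
        grad = [0.0] * 4
        for xi, yi in zip(design, labels):
            err = yi - sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j in range(4):
                grad[j] += err * xi[j]
        w = [wj + lr * g / len(design) for wj, g in zip(w, grad)]
    return w, mu, sd


def predict_proba(model, x):
    """Smoke-day probability for one feature list [lag_pm25, sin, cos]."""
    w, mu, sd = model
    return sigmoid(w[0] + w[1] * (x[0] - mu) / sd + w[2] * x[1] + w[3] * x[2])


# Usage with a toy repeating pattern (not real observations).
start = dt.date(2025, 7, 1)
pattern = [40, 40, 40, 10, 10, 10]
toy = [(start + dt.timedelta(days=i), pattern[i % 6]) for i in range(60)]
rows = make_rows(toy)
cut = int(0.8 * len(rows))  # chronological split: never shuffle across time
model = fit_logistic(rows[:cut])
probs = [predict_proba(model, x) for x, _ in rows[cut:]]
```

Splitting by time keeps the held-out probabilities honest: the model never sees the days it is later scored on, which is what a calibration plot needs.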

14.7 Where to take this next

- Add weather covariates and plot “PM2.5 conditional on wind direction” to see transport patterns.
- Add a fire-activity proxy and test whether it helps predict episodes without simply overfitting a particular year.
- Move from station-only to population-weighted exposure once your station evaluation story is honest.
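One way to test a fire-activity proxy without letting a single season dominate is to hold out whole years. A minimal sketch of grouped-by-year folds, assuming a multi-year daily series (the one-year demo window would yield a single fold); the function name is hypothetical, and scikit-learn's `LeaveOneGroupOut` with years as groups does the same job. Calendar-year grouping keeps each summer smoke season inside one fold; if winter episodes matter, a July-to-June "smoke year" may be a better grouping.

```python
import datetime as dt
from collections import defaultdict


def leave_one_year_out(dates):
    """Yield (held_out_year, train_idx, test_idx), one fold per calendar year.

    All rows from a year share a fold, so one extreme smoke season can never
    sit in both the training and the test data.
    """
    by_year = defaultdict(list)
    for i, d in enumerate(dates):
        by_year[d.year].append(i)
    for year in sorted(by_year):
        test_idx = by_year[year]
        held_out = set(test_idx)
        train_idx = [i for i in range(len(dates)) if i not in held_out]
        yield year, train_idx, test_idx


# Usage with toy dates: three years give three folds.
dates = [dt.date(2023, 7, 1), dt.date(2023, 8, 1), dt.date(2024, 7, 1), dt.date(2025, 8, 1)]
for year, train_idx, test_idx in leave_one_year_out(dates):
    print(year, train_idx, test_idx)
# 2023 [2, 3] [0, 1]
# 2024 [0, 1, 3] [2]
# 2025 [0, 1, 2] [3]
```

If the proxy only helps in the fold where its own year is held out, that is the overfitting this list warns about.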