When is the air the dirtiest in the Pacific Northwest? An air pollution module for teachers

In this blog post, we set out to answer two questions:

  1. When is the air the dirtiest in the Pacific Northwest (or PNW)?
  2. Has the timing of air pollution changed in the PNW over the last decade?

This blog post is aimed at anyone interested in air quality in the PNW, but especially teachers who may find this module useful in their classrooms.    We will start with our questions (see above), use guided inquiry, identify the datasets and finally work with the data using Excel to answer our questions.

Guided inquiry

To answer these questions I am going to use data from Seattle WA.   Here is a nice pic of downtown Seattle on a clear day from the Space Needle webcam (June 15, 2019).  You can see Mt. Rainier in the background.

We’re going to focus on PM2.5.   This is particulate matter with diameters less than 2.5 µm.   These are really small particles, with a diameter about 1/20th of a human hair.  But these are the particles that are most important from a human health perspective, as they are inhaled deep into our lungs (see figure).  These particles are mostly solids, but sometimes also include liquid aerosol particles as well.    Gases (like carbon monoxide, carbon dioxide, etc) are in a different category and not included in PM2.5.

While these data are collected by state and local agencies, and are usually available thru state agency website, it is reported to the EPA and I find that’s the simplest place to get it.   The data will always have the same format from EPA, no matter what state you are looking at:

https://www.epa.gov/outdoor-air-quality-data/air-data-concentration-plot

Choose the pollutant (PM2.5), time frame (1999-2023) and location  (Seattle-Tacoma-Bellevue, WA).   The regions are based on U.S. Census Bureau designations.   You will get a list of every site in the region according to their specific ID code (called AQS code).    Start by choosing “All sites”.  When you hit “Plot data” it will take a few minutes but eventually you will get a plot of every official PM2.5 measurement in the region.   Here you can choose which site you want to look at.   Some sites have very short data records.   Be sure to choose a site that has enough data for your analysis.   To keep this simple, lets just look at one site in the region.   For Seattle, I am going to use the data from the “Seattle-Beacon Hill” site, which is one of the longest running air stations in the region.  Go to the plot and find the AQS ID code for Beacon Hill (530330080).   Now go back up to the list of sites by ID code and choose the code for Beacon Hill.   Now it will plot the data from just that one site.  But there is one final trick in getting the data.  Sometimes a site will have multiple ways they measure PM2.5.   Each different method is given its own “Parameter Code” or POC, so you may see multiple plots from one site.   That’s ok.  We’ll deal with that later.   So at this point you should see all the PM2.5 data for that one site, no other sites.  After you see the plots for only the one site you are interested in, scroll to the bottom and choose “Download CSV” and the datafile will be saved to your local computer. 

For this analysis we will use Microsoft Excel.   This is a powerful spreadsheet program that can be used to do calculations (even complex calculations) and make graphics.    It is one of the most common tools used for calculations and plotting.    Other programs could be used as well.   For example, Google Sheets is a free program that can do similar tasks.    But if you have access to Excel (through a school or home computer), I recommend you use that.     (Send me an email if you want us to convert these instructions for Google Sheets!)  

Find the CSV file on your computer.  A CSV file is a simple datafile format that can be read into many different computer programs.   If you double click on the file, it may open directly with Excel, but maybe not.  It depends on how your computer is configured.   If it doesn’t open directly, then open Excel first and drag the CSV file onto a blank spreadsheet page.   You should see a lot of data.   On the first row, you will see the data labels starting with “date” in cell A1 and “AQSval” in cell V1.    We will only use the PM2.5 measurement column, which is labeled “Daily Mean PM2.5 Concentration”.    This is a daily 24 hour average, from midnight to midnight local time.   For this analysis, we need to add in two columns for year and month.  To do this, highlight or right click on the first column (date) and then “insert” and everything will move over to the right.  Do this three more times and this will leave you with four blank columns.   Type “Year” in A1, “Month” in B1, “Season” in C1 and “DOY” in cell D1. In cells A2 type “=Year(D2)” and in cell B2, “=Month(D2)”.   This tells excel to read the data in column D2 and return the year or month, respectively.   In cell C2, type “=IF(B2<3.5,1,IF(B2<6.5,2,IF(B2<9.5,3,IF(B2<12.5,4,-999))))”.  This tells Excel to read the month and return the season as a number between 1-4.    Finally, DOY stands for day of year, which is a number that goes from 1-365 (or 366 in leap years).  The formula for this is “=F2-DATE(A2,1,1)+1”  Now use your mouse to copy all four formulas down to the bottom of the spreadsheet.    This copies the formula so that Excel is reading from the date column and returning the correct year, month, season and DOY in columns A-D. At this point your spreadsheet should look like this:

YearMonthSeasonDOYDateSourceSite IDPOCDaily Mean PM2.5 Concentration
19991121/2/99AQS530330080111
19991131/3/99AQS530330080120.8
19991161/6/99AQS530330080114.1

There are more columns to the right and a lot more data rows below.  Now we can start to play with the data to answer our questions.    In Excel the easiest way to do this is using a pivot table.   Start by using your mouse to highlight all of the columns from A to Y and all way down to the bottom of the spreadsheet.   While the sheet is highlighted, click “Insert” then “Pivot table”.    Put the pivot table on a new worksheet. 

Lets start with some easy plots using the pivot table.   First, we will plot the annual average and maximum daily value for each year.     Click on the pivot table and then choose “Show field list”.   This is the main controls on your pivot table.   Put the “Year” variable in the rows box and “daily mean PM2.5 concentration” in the values box.    Actually, put “daily mean PM2.5 concentration”  into the ”values” box twice.   From the Pivot table fields box, you can choose exactly what is being calculated.   For the first “daily mean PM2.5 concentration” click on it and choose “Value field settings” and then “average”.  For the second one, choose “maximum”.   Your pivot table should now look like this:

Row LabelsAverage of Daily Mean PM2.5 ConcentrationMax of Daily Mean PM2.5 Concentration
19999.02934782627.2
20009.08395833331.1
20018.38771551737.9
20028.4334008131.1
20038.10432043225.6
20048.75809443538.9
20058.1742180933.3
20067.99185185258.1
20077.65487465249.8
20086.56959314825.1
20097.02313769837.1
20105.69700315522.3
20115.91444043317.4
20126.0186594227.5
20136.45103092825.7
20145.94315789527.1
20156.39894514833.1
20165.38123711316.2
20176.8867768656.8
20186.62442105398.1
20195.27422037415.5
20206.131954023179.1
20214.36645702345.3
20226.876445396110.4
20235.79846938850
Grand Total7.062420794179.1

By the way, remember there are multiple measurements on some days.  By using the pivot table and averaging for every date, we simply averaged all available measurements at that station.    If you are super gung-ho, you can look at how well the different methods compare, but that’s not our focus here……

Pivot tables are super useful for slicing and dicing the data, but they don’t let you make a good graph.   For this, you have to copy the pivot table values onto a separate part of the spreadsheet and make your graph from there.    Here is a plot of these data:

Wow, there were some really high PM2.5 concentrations on some days in 2018, 2020 and 2022 (watch out 2024!).    No surprise that these were due to wildfire smoke.   Here are a couple of pictures from the Seattle Space Needle webcam.    September 14th, 2020 was the worst air quality day that Seattle has ever experienced!     A bad day to breathe.  Notice that I did not plot the last row of data (Grand total) as this would not make sense on this graph. 

Now while this graph is ok, lets make it better.  For starters, there are no units given and this is an important part of any scientific measurement.   Also, the annual average of daily means, the blue line, can’t be read or interpreted because the scale is too large.   Lets make a better graph by adding a second axis and labeling our axes with units.    Make sure each y axis is clearly labeled. 

Ok. I’m happy with this graph.  It shows two important things, first the average PM2.5 in Seattle has steadily been going down until about 2016.   After that it varied, depending mostly on how bad the wildfire season was in that year.    This is a great success story as it shows that our laws and regulations on industry, cars, trucks, etc have made good progress in reducing PM2.5 pollution.  On the other hand, wildfire smoke has now  become a major driver of poor air quality in Seattle and many other western U.S. locations.  In 2017, 2018, 2020 and 2022, some days have had very unhealthy air quality.

Lets move on to our questions:

  1. When is the air the dirtiest in the Pacific Northwest?
  2. Has the timing of air pollution changed in the PNW over the last decade?

There are various ways to answer these questions.  I want to use the simplest approach that gives approximately correct answers.   To start, go back to the main datasheet in Excel and right click on the first row of the spreadsheet and insert a new blank row at the top of the spreadsheet.   Then click on the E column (date) and insert another new blank column.   Be sure the entire column is highlighted before you do this.   In the now blank cell E3 type “=IF(J3>$E$1,1,0)”.   This will put a 1 in this cell if the PM2.5 exceeds the value in cell E1.   Now what value should we put in cell E1?   I am going to use two levels, first lets look at days greater than 35 µg m-3.  Later we can look at a second level.  So type 35 in cell E1. This corresponds to an Air Quality Index of 100.  PM2.5 concentrations over this correspond to air that is “Unhealthy for sensitive groups” according to the EPA (See https://www.airnow.gov/aqi/aqi-basics/).   (To see how to convert from scientific units (µg m-3) to AQI values, go here:  https://www.airnow.gov/aqi/aqi-calculator/).  Copy the formula in cell E3 to the bottom of the spreadsheet.

Now go to the pivot table and first right click anywhere in the table and hit “Refresh”.   This will refresh the data and add the new column to the analysis.   To get the days each year over 35 µg m-3, I set up my pivot table with both “Season” and “DOY” in the rows box. Be sure to put Season on top and DOY underneath in the rows box.   Put “Year” in the column box and then “Average of Exceedance” in the value.     Be sure to use Average here, not one of the other calculations in the pivot table.   Your pivot table should now look like this:

Average of ExceedanceColumn Labels
Row Labels199920002001200220032004
1000000
100000
2000000
300000
40000
500000
600000
700000

Where the first 1 (in blue) indicates the season (winter) and the numbers 1-7 indicate the day of year.   The years should continue over all the way to the right til 2023 and the DOYs should continue down until 366, which is only filled in for leap years.   To give yourself a bit more room on the sheet, right click on row 1 and insert at least 5 new  blank rows.    Now in the new space at the top of this sheet, you can easily add up all the days over 35 µg m-3 by year and by season. In row B2, you can type “=SUM(B13:B103)”, which will sum all the days in the first year(1999) for winter.    Use similar formulas to sum all the days in each season (winter, spring, summer and fall) for 1999.   Then just highlight and copy the formulas you already have for 1999, and paste into the cells for the other years.   The formulas will automatically update for the appropriate cells.   At this point, the top of this sheet will look like this:

199920002001200220032004
Winter000000
Spring000000
Summer000000
Fall002000.25

So this is telling us that in 2001, there were two days in fall that had more than 35 µg m-3 of PM2.5.   Ok then why is the value in 2004 a fraction 0.25?  What does that mean?    Well, this is due to the fact that we included multiple PM2.5 measurements for every day that were reported to the EPA.    So its likely that on this day, there were four measurements of PM2.5 but only one was above 35 µg m-3.    Looking down the pivot table, I see that this occurred on DOY 310 (in 2004) and that on this day, there were four measurements of PM2.5 with values of 22.2, 24.5, 38.9 and 29.4 µg m-3.  Now why were the values so different, I don’t know.   We could investigate but keep in mind all measurements have some degree of uncertainty.    Lets keep this exercise simple, so we wont go into this in any more detail.    We can just use the value of 0.25 in our calcs and it wont the change the overall results.  Now lets graph the results.

I highlighted everything from “Winter” to the end of 2023 and then chose my graph type (X-Y scatter) and here is what I got:

Lets make this graph a bit nicer by making the fonts darker, adding a title and changing the line colors.

Since we did this in the spreadsheet, its really easy to redo this for a different threshold.  Lets choose to look at days greater 15 µg m-3, which corresponds to “moderate” air quality, per the same EPA website we used before.   Do this by going back to your main data page and typing 15 into cell E1.   All the values will update.  In the pivot table hit “Refresh” again so the results will update for 15 µg m-3.   The graph should automatically update, but be sure to change the title in the graph to keep things clear.

That’s it on the calculations.  Now we have enough information to answer our two questions:

  1. When is the air the dirtiest in the Pacific Northwest?
  2. Has the timing of air pollution changed in the PNW over the last decade?

As we saw earlier, the annual average PM2.5 concentrations were declining until about 2016.  Since then, they have gone up and down, largely depending on the wildfire season.  The two graphs above show how the seasonal air quality has changed since 1999.   Unhealthy air (PM2.5 > 35 µg m-3) was relatively rare at the start of this period, with just a few days in Fall of 2001 (Nov. 10 and 11, 2001).   These late fall days are usually associated with high pressure and temperature inversions that trap local air pollution and can lead to unhealthy air.   Here is a picture that shows a good examples of a temperature inversion in Seattle.   I wont explain “inversion” here.  Google it!   On this day, PM2.5 was significantly elevated (19 µg m-3).

Seattle on January 22, 2013 (PM2.5 = 19 µg m-3).  Image by Long B. Nguyen. (https://www.flickr.com/photos/lbn/8412166831)

But besides these few high PM2.5 days in 2001, over 35 µg m-3, essentially all the bad air quality days are in summer and occurred in the bad fire years of 2017, 2018, and 2020.  Note that in 2022, bad air quality days occurred in fall, but early fall:  October 15, 19 and 20th.   These were also wildfire smoke events.   So our definition of fall (October-December) can include some very different pollution sources; both wildfire smoke (in early fall) and inversions, that trap local air pollution (in late fall).

Looking at the data for the 15 µg m-3 threshold, you can see that the number of days exceeding this level overall has gone done, mainly in fall, as local air pollution sources have been more tightly controlled.  But at the same time, the number of days over this threshold in summer (and some in fall) has clearly ramped up with the frequent occurrent of massive fire seasons in the PNW. 

So in summary:

  1. When is the air the dirtiest in the Pacific Northwest?

The air in the PNW west is usually dirtiest in the summer or early fall due to the frequent wildfires that occur in those time frames.

  • Has the timing of air pollution changed in the PNW over the last decade?

Yes!   In the early 2000s, air tended to be dirtiest in late fall (November-December) due to occasional strong inversions that trapped local pollution.   As regulations and emissions controls have ramped up, sources such as industry and vehicles have been reduced significantly.   Now, in most years, wildfires are often the most important air pollution source in the PNW.

TEACHERS:  

Access the raw datafile for this exercise here.

Access the complete Excel spreadsheet here.