.. EE 508 / DS 537: Data Science for Conservation Decisions
.. _lab1-1:
Countries and threatened species with QGIS
==========================================
In this lab, we explore how the average "richness" (i.e., the count of overlapping ranges) of threatened species varies across countries and administrative regions. Along the way, we will learn how to create maps and process data using :gui:`QGIS`, a free and open-source geographic information system.
.. admonition:: Deliverable
A map showing the global average threatened species richness (for birds, amphibians, and mammals) by first-level country subdivisions (states, provinces, etc.)
.. admonition:: Due date
Tuesday, September 9, 2025, by 6:00 p.m.
Get the vector and raster data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Political boundaries
--------------------
`Natural Earth `_ provides high-quality vector data of political boundaries at coarse scales (best suited for global maps).
- On their `1:10m Cultural Vectors `_ page, find **Admin 0 – Countries** and pick the version "without boundary lakes" (4.8 MB):
:file:`ne_10m_admin_0_countries_lakes.zip`
- Find **Admin 1 – States, Provinces** and pick the version "without large lakes" (14 MB):
:file:`ne_10m_admin_1_states_provinces_lakes.zip`
.. tip::
If you need more precise vector data for fine-scale projects: `GADM `_ (the Database of Global Administrative Areas) has high-quality vector data of country and country subdivision boundaries down to four levels of subdivision (e.g., country > states > counties > towns in the US). The full layer is 537 MB. GADM does *not* exclude large inland lakes (e.g., the Great Lakes).
Many countries provide higher-quality vector data for national mapping purposes. In the United States, an excellent source for administrative and census-level subdivisions is `NHGIS `_, where you can download full-country shapefiles for counties (230 MB), subdivisions (580 MB), census tracts (723 MB) and more, with oceans and Great Lakes excluded.
Richness of threatened species
------------------------------
Clinton Jenkins maintains `BiodiversityMapping.org `_, where you can `download `_ raster data (GeoTIFFs) of aggregate species richness for different groups of species.
- We will use his global terrestrial maps in the GeoTIFF format. The original data is here:
:file:`BiodiversityMapping_TIFFs_2019_03d14_revised.zip`
- You can download the original file `here `_. However, the data **contains empty values and has convoluted file names**. To save you the renaming and zero-filling (this lab is long), I provide zero-filled data on the EE 508 drive. Download here:
`/data/processed/lab1/part1/ `_
- Rasters for threatened species have the suffix :file:`_thr`.
.. tip::
Directory structures in this course are inspired by the `Cookiecutter Data Science `_ project, a community template created by `Carl Boettiger `_ and colleagues at Berkeley.
Download and unzip
------------------
Download all data files into a folder for this lab, e.g. :file:`~/ee508/data/external/lab1/part1/`.
Unzip the :file:`.zip` files from Natural Earth.
- Write access and unzipping are important, because we will use :gui:`QGIS` to edit the files and add new columns to the vector data.
.. attention::
If this is the first time you are working with vector data in the *ESRI shapefile* format (:file:`.shp`), note that :file:`.shp` files become useless without their companion files of the same name (:file:`.cpg`, :file:`.dbf`, :file:`.prj`, :file:`.shx` - a caboodle?). You have to keep these files in the same folder, copy them as a group, etc. If you don't plan to edit them, it's easiest to leave them in their :file:`.zip` file, which :gui:`QGIS` and Python's ``geopandas`` can read *as is*.
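
As a rough sketch of the companion-file rule, the following Python snippet reports which companion files are missing for each :file:`.shp` inside a :file:`.zip` archive. It uses only the standard library. The helper name ``missing_sidecars`` and the decision to check all four extensions listed above are assumptions made for illustration; they are not part of QGIS or ``geopandas``.

.. code-block:: python

    import zipfile
    from pathlib import PurePosixPath

    # Companion extensions named in this lab (assumption: check all four).
    SIDECARS = (".shx", ".dbf", ".prj", ".cpg")


    def missing_sidecars(zip_path):
        """Map each .shp in the archive to the companion extensions it lacks."""
        with zipfile.ZipFile(zip_path) as archive:
            # Lowercase the names so "X.SHP" and "x.dbf" still match up.
            names = {name.lower() for name in archive.namelist()}
        report = {}
        for name in sorted(names):
            path = PurePosixPath(name)
            if path.suffix == ".shp":
                report[name] = [
                    ext for ext in SIDECARS
                    if str(path.with_suffix(ext)) not in names
                ]
        return report

Calling ``missing_sidecars("ne_10m_admin_0_countries_lakes.zip")`` on a downloaded archive returns a dictionary; an empty list for every entry means no companion file is missing.
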
Visualize the data
~~~~~~~~~~~~~~~~~~
Open :gui:`QGIS` and start a :gui:`New Empty Project`.
- Save it, e.g. :file:`~/ee508/reports/lab1/part1/world.qgz`.
Drag-and-drop the two political boundary layers into your :gui:`QGIS` project.
.. important::
QGIS automatically assigns the coordinate reference system (CRS) of the first imported layer to the entire project.
In essence, the CRS defines how we go from a 3D world (reality) to a 2D map. Because many of our operations end up happening in 2D (screens, geometric operations, etc.), knowing, choosing, and tracking CRS are essential tasks for geospatial data specialists. Read more: `What is a CRS? `_.
Natural Earth uses the most common CRS for global mapping: latitude and longitude based on the WGS84 ellipsoid. This CRS is used by Google Maps, GPS, etc. and identified by its unique `EPSG `_ code **4326**. It is now also the CRS for your QGIS project. You can change the project CRS anytime in the menu :gui:`Project` > :gui:`Properties...` > :gui:`CRS`.
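
To see why the CRS matters for measurements, consider that EPSG:4326 coordinates are angles, not distances. The sketch below approximates the ground length of one degree of longitude at a few latitudes. It assumes a spherical Earth with a mean radius of 6,371 km (a simplification, since WGS84 is an ellipsoid), and the function name is made up for this example.

.. code-block:: python

    import math

    EARTH_RADIUS_KM = 6371.0  # mean radius; spherical approximation (assumption)


    def km_per_degree_longitude(latitude_deg):
        """Approximate east-west length of one degree of longitude, in km."""
        circumference_km = 2 * math.pi * EARTH_RADIUS_KM
        return circumference_km / 360 * math.cos(math.radians(latitude_deg))


    for lat in (0, 42, 60):
        print(f"{lat:>2} deg latitude: {km_per_degree_longitude(lat):6.1f} km")

Under these assumptions, one degree of longitude spans roughly 111 km at the equator and about half that at 60 degrees latitude, which is why a raster cell defined in degrees covers less ground toward the poles.
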
Make the fill color of the political boundary layers transparent, so only outlines remain:
- :gui:`Layers` panel > right click on the vector layer > :gui:`Properties...`
- In :gui:`Symbology` > click on the :gui:`Color` bar > set :gui:`Opacity` to :input:`0.00%`. The color bar will show a grey-white checkerboard in the right half.
To better distinguish countries from subdivisions, you can give the latter a thinner outline or increase its transparency:
- In :gui:`Symbology` > click :gui:`Simple Fill` > reduce :gui:`Stroke Width` to :input:`0.05`.
- Alternative: click :gui:`Simple Fill` > click on the :gui:`Stroke color` > set :gui:`Opacity` to :input:`25%`.
- Alternative: click :gui:`Layer Rendering` > set :gui:`Opacity` to :input:`25%`. This makes the entire layer more transparent (including potential fill colors, labels, symbols, etc.).
Drag-and-drop the threatened species richness layers (:file:`_thr`) into your QGIS project.
Give all layers short and meaningful names:
- :gui:`Layers` panel > right-click on layer > :gui:`Rename Layer`.
- Alternative (macOS): :gui:`Layers` panel > left-click on layer > press Enter.
Make sure the political boundary layers are shown on top of the rasters by dragging and dropping them in the :gui:`Layers` panel until both political boundary layers are listed first.
Change the color mapping of the species raster data:
- Right-click on species raster layer listed on top > :gui:`Properties...`
- Under :gui:`Symbology` > :gui:`Band Rendering` > :gui:`Render type`, choose :input:`Singleband pseudocolor`.
- To make sure the visualization includes all values, open :gui:`Min / Max Value Settings`, make sure the :gui:`Min / max` option is selected and the :gui:`Statistics extent` is :input:`Whole Raster`. Set :gui:`Accuracy` to :input:`Actual (slower)`.
- Pick a :gui:`Color ramp` you like.
- In :gui:`Mode`, choose :input:`Equal Interval` (with 5 classes). Click :gui:`Classify`.
- If you do not like how the colors are distributed along the gradient, you can manually change the :gui:`Value` of each color step in the color ramp.
- Close the dialog box with :gui:`OK`.
- Repeat for each raster layer with a different color gradient.
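
The :input:`Equal Interval` mode splits the value range into classes of equal width. Here is a minimal sketch of that idea in Python, with a hypothetical helper name and made-up minimum and maximum values. QGIS places its color stops depending on the interpolation setting, so the values shown in the dialog may not match these exactly.

.. code-block:: python

    def equal_interval_breaks(vmin, vmax, n_classes=5):
        """Return the upper bound of each of n_classes equal-width classes."""
        if n_classes < 1 or vmax <= vmin:
            raise ValueError("need n_classes >= 1 and vmax > vmin")
        width = (vmax - vmin) / n_classes
        return [vmin + width * i for i in range(1, n_classes + 1)]


    # Hypothetical raster with richness values from 0 to 40 species.
    print(equal_interval_breaks(0, 40))  # [8.0, 16.0, 24.0, 32.0, 40.0]

Because the classes depend only on the minimum and maximum, a few extreme cells can leave most of the map in one or two classes, which is one reason to adjust the values by hand.
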
Add basemaps
~~~~~~~~~~~~
In the :gui:`Browser` panel, right-click on :gui:`XYZ Tiles` and choose :gui:`New Connection...`
- If the :gui:`Browser` panel is not visible: Menu > :gui:`View` > :gui:`Panels` > :gui:`Browser`
- In :gui:`Name`, enter :input:`Google Satellite`.
- In :gui:`URL`, copy and paste the following URL, then click :gui:`OK`:
:input:`https://mt1.google.com/vt/lyrs=s&x={x}&y={y}&z={z}`
- Drag-and-drop the Google Satellite layer from :gui:`Browser` into the :gui:`Layers` panel. You should see the familiar Google Satellite view. (If the view appears to be incorrectly projected, zoom in a bit, and it will self-correct.)
- Many other layers are made available by different providers in that way (e.g. Google Road, OpenStreetMap, etc.). See `this post `_ for the corresponding links.
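
The ``{x}``, ``{y}``, and ``{z}`` placeholders in an XYZ URL stand for tile column, tile row, and zoom level; QGIS fills them in as you pan and zoom. The sketch below shows the widely used Web Mercator tiling arithmetic that such services generally follow. The function names and the example coordinates are illustrative assumptions, and the snippet only builds a URL string without requesting anything.

.. code-block:: python

    import math


    def lonlat_to_tile(lon_deg, lat_deg, zoom):
        """Convert WGS84 longitude/latitude to XYZ tile indices (Web Mercator)."""
        n = 2 ** zoom  # number of tiles along each axis at this zoom level
        x = int((lon_deg + 180.0) / 360.0 * n)
        lat_rad = math.radians(lat_deg)
        y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
        # Clamp so that edge coordinates (e.g. lon = 180) stay inside the grid.
        return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


    def fill_template(template, lon_deg, lat_deg, zoom):
        """Substitute tile indices into a URL template with {x}, {y}, {z}."""
        x, y = lonlat_to_tile(lon_deg, lat_deg, zoom)
        return template.format(x=x, y=y, z=zoom)


    # The template is the one from this lab; the location is an arbitrary example.
    template = "https://mt1.google.com/vt/lyrs=s&x={x}&y={y}&z={z}"
    print(fill_template(template, 0.0, 0.0, 1))

Each zoom level doubles the number of tiles along both axes, so the same location maps to different ``x`` and ``y`` values as you zoom in.
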
.. admonition:: Explore
Zoom in a bit. Switch the various layers on and off.
How do global distributions of threatened species ranges differ?
In absolute counts and in spatial patterns?
What might be an explanation for that?
Calculate species richness stats by subdivision and species class
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Open the :gui:`Processing Toolbox`: Menu > :gui:`Processing` > :gui:`Toolbox`.
- Search for :input:`zonal`.
- Open the tool :gui:`Raster analysis` > :gui:`Zonal statistics` (double-click).
- In :gui:`Input layer`, choose the subdivisions (states/provinces) polygon layer.
- In :gui:`Raster layer`, choose one of your species raster layers.
- Choose an :gui:`Output column prefix` that identifies the species group you picked (e.g., :input:`amph_`, :input:`bird_`, or :input:`mamm_`).
- Next to :gui:`Statistics to calculate`, click the :gui:`...` button, uncheck everything except :gui:`Mean`, and close.
- Run the algorithm.
- Oh no! If you used the original files from Natural Earth, you will get an error:
*Feature (439) [...] from ne_10m_admin_1_states_provinces_lakes has invalid geometry. Please fix the geometry or change the "Invalid features filtering" option for this input or globally in Processing settings.*
- This means that the polygon layer has errors. This happens surprisingly often in practice.
- Fix it with the :gui:`Fix geometries` tool:
- Menu > :gui:`Processing` > :gui:`Toolbox` > search for :input:`fix` > :gui:`Fix geometries`
- Pick your layer and save to a new file. Run.
- Use the fixed vector dataset from here onwards.
- My recommendation is to delete the old one, so you don't mix them up.
- :gui:`QGIS` will create a new temporary vector layer, named :gui:`Zonal Statistics`, that has the same geometries and data as the original vector layer (subdivisions), with a new data column that contains the results from :gui:`Zonal Statistics`.
- You can examine the data by right-clicking on the layer > :gui:`Open Attribute Table`.
- Rename the layer :gui:`Zonal Statistics`, so that it is uniquely identified (we will run the algorithm two more times, and it tends to use the same name for the output layer).
- Create one vector layer with three columns of species richness, one for each species group by repeating the above steps. You can accumulate columns by using the output vector layer of each :gui:`Zonal Statistics` run as the :gui:`Input layer` for the next run.
- Once you're done, visualize the zonal statistics results with a choropleth map:
- :gui:`Layers` panel > right-click on the vector layer > :gui:`Properties...`
- In :gui:`Symbology`, choose :gui:`Graduated` from the top-level dropdown menu
- In :gui:`Value`, pick the data column you'd like to visualize. The ones you just generated should be at the bottom of the list, with the suffix :gui:`_mean`.
- In :gui:`Mode`, choose :gui:`Natural Breaks (Jenks)`. Set :gui:`Classes` to :input:`10`.
- Click the :gui:`Classify` button, then :gui:`OK`.
.. admonition:: "Choro... what?"
"Choro-" derives from the Ancient Greek χώρα (chōra), meaning "place, region, or area."
"-pleth" comes from the Ancient Greek πληθής (plēthēs), meaning "multitude" or "full, filled", and is related to πλῆθος (plēthos), meaning "a multitude, quantity, or crowd."
A Choropleth map is a map where regions (chōra) are filled (plēth-) with colors or patterns according to the values of a particular variable (like population density, temperature, election results, etc.).
The term was coined in English in the early 20th century by geographers and cartographers, based on these Greek roots, to describe this specific type of thematic map.
- Repeat the above steps for the other two species groups so you end up with all three columns (e.g., ``amph_mean``, ``bird_mean``, and ``mamm_mean`` if you used the prefixes suggested above) in the attribute table.
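
Conceptually, :gui:`Zonal statistics` groups raster cells by the polygon they fall in and summarizes each group. The sketch below imitates that for the mean, using two small grids of equal shape: one with richness values and one with a zone id per cell. It is a simplification (QGIS works with polygon geometries rather than a pre-rasterized zone grid), and all names and numbers are invented for illustration.

.. code-block:: python

    from collections import defaultdict


    def zonal_mean(values, zones):
        """Mean of `values` per zone id; both are equally shaped 2D lists."""
        totals = defaultdict(float)
        counts = defaultdict(int)
        for value_row, zone_row in zip(values, zones):
            for value, zone in zip(value_row, zone_row):
                totals[zone] += value
                counts[zone] += 1
        return {zone: totals[zone] / counts[zone] for zone in sorted(totals)}


    # Toy richness grid (species counts) and matching zone ids (1 and 2).
    richness = [[0, 2, 4],
                [1, 3, 5]]
    zone_ids = [[1, 1, 2],
                [1, 2, 2]]
    print(zonal_mean(richness, zone_ids))  # {1: 1.0, 2: 4.0}

Each key in the result corresponds to one polygon, and each value to what would end up in that polygon's ``_mean`` column.
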
.. admonition:: Question
Why do you think some state polygons get dropped when you visualize the amphibian species richness?
Short questions like this one will appear throughout EE 508 labs. Their purpose is to invite you to pause and reflect on what you're doing. Unless otherwise noted, **you don't have to write up an answer** to these questions! Write-up instructions usually appear at the end of labs and tend to address bigger-picture items.
Calculate average threatened species richness across all classes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- :gui:`Layers` panel > right-click on the state/provinces vector layer > :gui:`Open Attribute Table`.
.. image:: img/lab1-1_field_calculator.png
:alt: Field Calculator
:align: right
- Open the :gui:`Field Calculator` with Ctrl + I (Command + I on macOS) or by clicking the menu button shown in the image.
- Choose an :gui:`Output field name` (e.g., :input:`thr_mean`).
- In :gui:`Output field type`, choose the correct variable type (you have to know this).
- The middle window contains a list of functions and field names you can use.
- Click on :gui:`Fields and Values`, then click on one of the three fields you just calculated. Notice how helpful information is displayed on the right-hand side.
- Double-click on each field name to add it to the :gui:`Expression` window. Notice that you need to use double quotation marks to refer to your fields in the :gui:`Field Calculator`. Also notice that an error message (*Expression is invalid*) appears at the bottom.
- Still in the :gui:`Expression` window, add :input:`+` signs between the three field names in quotes to signal that you would like to sum up the values. When your formula is done, press :gui:`OK`.
- Inspect the values. What happened to the values in the subdivisions for which we had no data for amphibians?
- Let's re-compute threatened species richness across groups - but set empty input values to :input:`0`.
- Re-open the :gui:`Field Calculator`. You can find your previous formula at the bottom of the middle window, under :gui:`Recent (fieldcalc)`. Double click on it.
- Replace each field name in the summation by
:input:`if("fieldname" is NULL, 0, "fieldname")`
where :input:`"fieldname"` is the name of the field you would like to add.
- This will set any empty values to zero.
- Your final expression should be a sum of the three :input:`if(...)` statements.
- Choose an :gui:`Output field name` and :gui:`Output field type` and press :gui:`OK`.
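
The same logic can be written in Python, where ``None`` plays the role of NULL. This is only an analogy to make the behavior concrete: the function name and the sample values are hypothetical, and the :gui:`Field Calculator` itself uses the QGIS expression shown above.

.. code-block:: python

    def sum_with_null_as_zero(*values):
        """Add values, treating None (NULL) as 0, like if(x is NULL, 0, x)."""
        return sum(0 if value is None else value for value in values)


    # Hypothetical subdivision with no amphibian data (None).
    amph, bird, mamm = None, 3.5, 1.25
    print(sum_with_null_as_zero(amph, bird, mamm))  # 4.75

QGIS expressions also provide ``coalesce("fieldname", 0)``, which should give the same result as the :input:`if(...)` form with less typing.
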
Make a map
~~~~~~~~~~
Visualize the new column. Choose a :input:`Natural Breaks (Jenks)` classification with 10 classes.
.. caution::
Make sure that the new classification is based on your new variable and covers its full range. The easiest way to guarantee that is to click the :gui:`Classify` button after switching the variable you want to classify. If you don’t do this and erroneously keep the classes of another variable, polygons with values outside the range of classes might not get displayed.
Take some time to inspect the results:
- Compare with the columns for individual species groups. Does one species group dominate the overall pattern? Why might that be the case?
- Do you consider this computation appropriate to visualize differences in threatened species richness across administrative units?
Save your map:
- Leave only the state/provinces layer visible, displaying average threatened species richness by state/province, summed across all three groups.
- Export your map as a PNG image to your class folder.
- Menu :gui:`Project` > :gui:`Import/Export` > :gui:`Export Map to Image...`
- In :gui:`Extent`, open the :gui:`Calculate from` :input:`Layer` dropdown, and select the layer.
- Increase the :gui:`Resolution` to :input:`200 dpi`.
- Click :gui:`Save` to save it as:
:file:`~/ee508/reports/lab1/1_threatened_species_richness_by_state.png`
.. attention::
Please use this exact directory structure and filename here and for the rest of the course (replacing :file:`~/ee508/` with your project folder). This helps me automate parts of the evaluation process, which in turn allows me to offer this class, with individualized feedback, to so many students.
**Nice job!**
You have already learned a few useful skills in QGIS:
- Visualizing raster and vector layers.
- Finding tools in the Processing toolbox and using them.
- Computing new values in vector data.
- Saving a map.
Reflect
~~~~~~~
- In a short paragraph (approx. half a page), consider this question:
Now that we have this raster data on species richness, (how) can we use it to identify priority areas where more (or less) conservation actions should occur, e.g. new regulations, land acquisitions, or conservation payments?
This is not about being correct. Use this opportunity to think about it and share your own thoughts (not those of an AI). Spend no more than about 10 minutes on it.
- Save your writeup:
:file:`~/ee508/reports/lab1/1_writeup.docx`
Wrap up
~~~~~~~
Your folder :file:`~/ee508/reports/lab1` should now contain both deliverables:
| :file:`1_threatened_species_richness_by_state.png`
| :file:`1_writeup.docx`
- Compress the two files into a single :file:`.zip` archive.
- Find the Google Assignment with the lab title on the `Blackboard course website `_:
- Upload your :file:`.zip` archive.
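
If you prefer to script the compression step, here is a minimal sketch using Python's standard library. The archive name :file:`lab1.zip` and the function name are assumptions (the lab does not prescribe them); the two file names are the ones listed above.

.. code-block:: python

    import zipfile
    from pathlib import Path

    DELIVERABLES = (
        "1_threatened_species_richness_by_state.png",
        "1_writeup.docx",
    )


    def zip_deliverables(folder, archive_name="lab1.zip"):
        """Write the two lab deliverables into one .zip inside `folder`."""
        folder = Path(folder).expanduser()
        archive_path = folder / archive_name
        with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
            for filename in DELIVERABLES:
                # arcname keeps the archive flat (no parent directories inside).
                archive.write(folder / filename, arcname=filename)
        return archive_path

For example, ``zip_deliverables("~/ee508/reports/lab1")`` would create the archive next to the two files and raise an error if either file is missing.
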
.. admonition:: You're done!
That's it.
I hope you enjoyed your first steps in QGIS.
Bring your questions and opinions to class!
We'll continue in Python. Is your environment ready? If not, see :doc:`../../getting-started/environment`.