.. _lab1-5:

Spatial analytics exercise: ecosystem value mapping
===================================================

You just learned how to work with tabular, vector, and raster data in Python. Let's practice your new skills by applying them to a new problem. Pick **one** of these three challenges and solve it. I'll give you a head start, but you need to figure out the rest by yourself. If you come across any issues, post on Piazza!

.. admonition:: Due date

   Tuesday, September 30, 2025, by 6:00 p.m.

Option 1: how much forest carbon was lost in the Colombian Amazon 2000-2024?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. admonition:: Deliverable

   - A raster map of estimated loss of carbon due to deforestation in the Colombian Amazon 2000-2024, showing the quantity of carbon lost in each pixel:

     :file:`~/ee508/reports/figures/lab1/5_carbon_stock_loss.png`

     You can use per-pixel or per-hectare units. Just make sure the unit of the color mapping is explained on the plot. If using per-pixel estimates, name the size of the pixel (original, coarsened, figure pixel).

   - An estimate of total loss of carbon stock in the same region and period. Name the number and its unit, together with any explanations you'd like to add, in a brief summary:

     :file:`~/ee508/reports/figures/lab1/5_carbon_stock_loss_writeup.docx`

     If you compute intermediary numbers, also mention them in your writeup.

   - A Jupyter notebook containing the code you used to generate the above deliverables. The notebook should be self-contained: a user with access to the respective data folders (which can contain products from any earlier exercise) can start a new Jupyter session, run all cells from your notebook, and end up with the deliverables. Before you deliver this notebook, you **must demonstrate that the notebook is self-contained** by running it from top to bottom (:gui:`Kernel` > :gui:`Restart & Run All Cells...`).
     Once the run has finished successfully, save the notebook together with all cell outputs. The carbon stock map needs to be visible below one of your notebook cells. This allows me to help you troubleshoot any issues in your code without extra delays.

     :file:`~/ee508/reports/figures/lab1/5_carbon_stock_loss_code.ipynb`

Getting started
---------------

The **geographic extent** for this exercise is the bounding box (rectangle) that contains the eight Colombian departments that contain part of Colombia's Amazon rainforest.

- You can reuse the ``subdivisions`` layer from Lab 1.3 and select on the ``'iso_3166_2'`` index the departments with the following values:

  ::

     ISO_3166_2_AMAZON = [
         'CO-AMA', 'CO-CAQ', 'CO-GUA', 'CO-GUV',
         'CO-MET', 'CO-PUT', 'CO-VAU', 'CO-VID'
     ]

- When you have a ``GeoDataFrame`` that only contains the selected departments (let's call it ``departments``), plot it to be sure it looks right.

- Get its bounds (and thus, the desired raster extent) with:

  ::

     departments.total_bounds

Our **estimates of carbon stocks** are based on Alessandro Baccini's layer of `Aboveground Live Woody Biomass Density `_ for the year 2000 at 30 m resolution from Global Forest Watch. You can download the data by tile:

- Follow the above link and click on one of the squares in the map (id).
- Copy the URL (https://...) next to :gui:`Mg_ha_1_download` (unit: Mg/ha) or :gui:`Mg_px_1_download` (unit: Mg/px - you pick) and paste it in a new browser tab.
- Download the resulting data to :file:`~/ee508/data/external/lab1/part5`.
- Download all four tiles that cover parts of the Colombian Amazon (if in doubt, compare with your ``departments`` map).
- You will need a good Internet connection, as these files require several GB of space. You can delete them after you're done with the exercise.

Our estimates of forest loss will be based on Hansen's layer of `Global Forest Change `_:

- In the `download section `_, study the :gui:`Dataset Details`.
- We'll need the ``lossyear`` raster, which contains the year of loss in each cell.
- Scroll down to :gui:`Download Instructions`.
- Download all tiles ("granules") for the Colombian Amazon.

The forest loss raster files contain values that are either :input:`0` (no loss) or :input:`1`-:input:`24` (loss year). We'll need to convert them to :input:`1`/:input:`0` values (loss/no loss).

- Load each :gui:`lossyear` raster into a numpy array (using ``r_open``).
- Set all values >0 to 1: ``r[r > 0] = 1``
- Save the raster with ``r_open`` (see :ref:`Lab 1.4 `).

Next, we need to mosaic the rasters of each group into a single raster that is limited to the (smaller) study region. ``gdalwarp`` lets you mosaic (combine) and coarsen/reproject all tiles in one go with a single command, by passing not one, but *several* ``srcfile`` (source file) arguments.

::

   command = [
       'gdalwarp',
       # ... other argument strings ...
       from_filepath1,
       from_filepath2,
       from_filepath3,
       to_filepath,
   ]

If you do not want to enter filenames manually, you can create a list of file paths with a list comprehension, then add that list to the list of arguments.

::

   from_filepaths = [... your list comprehension goes here ...]
   command = [
       'gdalwarp',
       # ... other argument strings ...
   ] + from_filepaths + [to_filepath]

Make sure your forest stock and forest loss rasters have the exact same extent.

We'll make several **simplifying assumptions**:

- If a pixel is marked as "lost" by Hansen et al., all its carbon stocks are gone.

  .. important::

     This assumption does not hold up in a real-life policy-making context (e.g. carbon credits). Every remote sensing data product is an estimate and has errors and biases. To estimate deforested areas correctly, a sampling-based approach is needed (`Olofsson et al. 2014 `_).

- 50% of biomass is carbon (C).
- If you want to convert to carbon dioxide emissions: carbon has a molar mass of 12, gaseous oxygen (O\ :sub:`2`\ ) of 32.
- For the total carbon loss estimate, I will simplify the steps and let you report the total loss across *the entire bounding box* that contains the Amazon departments (i.e., the extent rectangle) instead of only the loss *inside* the polygons. Make sure you report which area the estimate refers to!

- If you want to limit the estimate to the exact area of the departments, you can either use ``zonal_stats`` to get the sum of pixel values (or similar), or you can use ``gdal_rasterize.py`` with ``-burn 1`` (`docs `_) to create a raster that's ``0`` outside and ``1`` inside any polygon, and then use that layer as a mask (e.g. via multiplication: ``r_agb_lost * r_amazon_mask``).

- If you need to convert between per-area and per-pixel biomass estimates (depends on your setup), for the purpose of this lab, it's okay to assume that the 27.83 m x 27.64 m pixel size at the Equator applies to the entire study area rectangle.

- Trying to process the entire raster dataset in memory is possible, but only on machines with lots of memory. To be efficient, you will likely have to coarsen both rasters in a meaningful way, or process the rasters tile-by-tile (ideally only for the relevant subregion, by adapting the extents accordingly).

  .. caution::

     When coarsening, watch out for the loss of information that comes with coarsening integer rasters in integer space. If you coarsen an integer raster to compute averages of integer values (e.g. average loss within a larger pixel), make sure this computation occurs in ``float`` (not ``int``) space. Otherwise, you get 0 and 1 values where you actually want percentages of pixels lost. If you use ``gdalwarp``, you can force the output type to be float using the argument ``-ot Float32`` (see `documentation `_).

I'll let you take it from here!

.. note::

   On Windows, I have found I need to turn off Dropbox's automatic synchronizing. Otherwise, I run into errors while running :gui:`gdal` commands.

.. only:: updates_2026

   Lots of students struggle with this.

   - Almost nobody can process the entire raster dataset in memory, so they are either coarsening or processing them tile-by-tile. Provide basic instructions for either.
   - Students are definitely confused about working with the pixel values correctly (what's in the raster). A visual that shows how rasters are transformed through coarsening might be very helpful. And perhaps an introduction to rasters, clarifying what each pixel refers to.
   - Perhaps you can recommend (``r > 0``) for the conversion of the raster.

.. only:: solution

   157,453,552 hectares in Amazon box. 59,295,085 pixels were lost @ 0.00025 degree resolution.

   - 4,561,104 ha @ 27.83 m / 27.64 m
   - 5,336,558 ha @ 30 m / 30 m

   Per-pixel math:

   | 1,077,603,712 Mg biomass lost.
   | 538,801,856 Mg carbon lost.
   | 1,975,606,805 Mg (tons) of carbon dioxide.
   | Assuming area lost: 4,561,104 hectares.
   | Average carbon density on lost pixels: 236.259 Mg/ha

   Per-hectare math:

   | Average carbon density on lost pixels: 238.428 Mg/ha
   | Assuming area lost: 4,561,104 hectares.
   | ~1,087,495,584 Mg biomass lost.
   | ~543,747,792 Mg carbon lost.
   | ~1,993,741,905 t CO2 emitted (if fully converted).
   | Assuming area lost: 5,336,558 hectares.
   | ~1,272,385,662 Mg biomass lost.
   | ~636,192,831 Mg carbon lost.
   | ~2,332,707,048 t CO2 emitted (if fully converted).

.. _lab1-5-option2:

Option 2: how do species distribution models affect conservation priorities?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This exercise builds on what we did in Lab 1.4 and asks:

.. admonition:: Question

   How would our results differ if we had attempted to account for the availability of habitat when we made the species distribution maps?

We will contrast the BioModelos maps we used in Lab 1.4 with the official maps of the **Red Book of Threatened Birds in Colombia** (Volume 1 and 2): they intersected estimated ranges with satellite-derived land cover maps to estimate the presence of habitat.
The books are an impressive piece of work. The PDFs are published by the Colombian Ministry of the Environment:

- `Volumen I `_
- `Volumen II `_

I have paper copies in my office that you are welcome to borrow (just ask).

.. admonition:: Deliverable

   A panel of three maps, arranged horizontally:

   - A richness map based on the BioModelos data, with the same specifications as the first map of Lab 1.4 (:file:`4_richness_equal_weights.png`), but considering only species for which we have both BioModelos and Libro Rojo data.
   - A richness map based on the Libro Rojo data (pixels with value :input:`2`, see below), with the same specifications as the first map.
   - A map of the differences between the two maps. If you have ``numpy`` arrays ``r1`` and ``r2``, the difference is: ``r1 - r2``.

   :file:`~/ee508/reports/figures/lab1/5_bird_richness_biomodelos_vs_librorojo.png`

   - A Jupyter notebook containing the code you used to generate the above deliverables. The notebook should be self-contained: a user with access to the respective data folders (which can contain products from any earlier exercise) can start a new Jupyter session, run all cells from your notebook, and end up with the deliverables. Before you deliver this notebook, you **must demonstrate that the notebook is self-contained** by running it from top to bottom (:gui:`Kernel` > :gui:`Restart & Run All Cells...`). Once the run has finished successfully, save the notebook together with all cell outputs. The priority map panel needs to be visible below one of your notebook cells. This allows me to help you troubleshoot any issues in your code without extra delays.

     :file:`~/ee508/reports/figures/lab1/5_bird_richness_biomodelos_vs_librorojo_code.ipynb`

Getting started
---------------

You can find the Libro Rojo data in: `~/ee508/data/external/lab1/part5/libro_rojo_de_colombia.zip `_

.. caution::

   **Please do not share this data.** The authors gave me permission to use it in this class, but do not want it to be distributed publicly.

The Excel file :file:`Aves de Colombia Libro Rojo 2017.xlsx` contains all the information about the species in the first sheet. In particular, note that it helps you link species names to filenames in the two subfolders. You can read it into a ``DataFrame`` with ``pd.read_excel('path/to/file.xlsx')``.

The distribution data is provided as :file:`.img` files, which can be read by ``rasterio`` in exactly the same way you read the :file:`.tif` files.

A first challenge is to prepare the data so that we can limit the comparison to the set of species that are contained in both datasets. If we keep the species identical, we can separate the effect of the spatial distribution modeling from differences in the sets of species that are covered.

We already know a few ways to find shared sets. Examples:

- Use ``df1.join(df2)`` to join one ``DataFrame`` (``df2``) to another (``df1``) based on the values in their indices (recall you can set indices with ``.set_index()``). If you add the argument ``how='inner'``, the resulting ``DataFrame`` contains only index values that exist in both DataFrames.

- Alternatively, you can use ``set`` operations to find the shared set of scientific species names, e.g.:

  ::

     shared_names = sorted(
         set(species_data1['speciesname']) & set(species_data2['speciesname'])
     )

  This command obtains the columns to compare from each ``DataFrame``, casts each to a ``set``, and then uses the set operator ``&`` (intersection) to return the shared values of both sets. Set operations (intersections, unions, differences) are very useful for comparing index values. Learn more about them `here `_. The ``sorted()`` command turns the ``set`` back into a (sorted) ``list``. You need a list, because pandas' ``.loc`` accessor won't accept sets.
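As a toy sketch of this approach (the species names and filenames below are made up for illustration), the intersection comes out as a sorted list:

```python
import pandas as pd

# Two hypothetical species tables (stand-ins for the BioModelos and
# Libro Rojo metadata); names and filenames are illustrative only.
species_data1 = pd.DataFrame({
    'speciesname': ['Crax alberti', 'Ara militaris', 'Vultur gryphus'],
    'file': ['crax.tif', 'ara.tif', 'vultur.tif'],
})
species_data2 = pd.DataFrame({
    'speciesname': ['Ara militaris', 'Crax alberti', 'Ognorhynchus icterotis'],
    'file': ['ara.img', 'crax.img', 'ognorhynchus.img'],
})

# Intersect the two name columns; sorted() turns the set into a list.
shared_names = sorted(
    set(species_data1['speciesname']) & set(species_data2['speciesname'])
)
print(shared_names)  # ['Ara militaris', 'Crax alberti']

# After setting the index, .loc subsets each table to the shared species.
shared1 = species_data1.set_index('speciesname').loc[shared_names]
```

The same ``.loc`` subsetting works on either table once its index is set to the species name.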
::

   species_data1.loc[shared_names]

The Libro Rojo rasters are in a different projection than the BioModelos rasters. The values in the Libro Rojo rasters also have different meanings. We need to reclassify them:

- **No data** or empty: this area is not covered by the analysis.
- **0**: the species is believed to not exist in this location.
- **1**: the species could exist here, but has no habitat, as determined by satellite-derived land cover maps (here: forest).
- **2**: the species could exist here and has habitat.
- **3**: cloud cover makes it impossible to verify if the species has habitat.

This is how the maps will look when you drag-and-drop them into :gui:`QGIS`:

.. image:: img/lab1-5_libro_rojo_map.png
   :alt: Example map from the Libro Rojo dataset
   :align: right

However, if you ``read()`` the raster into Python as a ``numpy`` array and plot it with ``ax.imshow()``, it will not display the same map.

- Why? Plot the colorbar (as in previous labs) and look at the scale.

If you read a raster with rasterio's ``.read()``, the nodata/empty values will not be automatically replaced with ``np.nan`` (or other representations of "nothing here"). Instead, they will keep the default nodata value of the raster.

- What is the nodata value of this raster? Recall ``r_file.nodata``. Knowing this allows you to change the value of nodata cells after reading the raster.

For the purpose of this analysis, we will only consider **2** as evidence of presence. To make the Libro Rojo maps comparable to the BioModelos rasters, you will therefore need to reclassify the values in the raster array from ``nodata, 0, 1, 2, 3`` to ``0, 0, 0, 1, 0``. One way to do this (after loading the raster ``r`` to a numpy array):

::

   # Set cells with values other than 2 to zero
   r[r != 2] = 0
   # Set cells that contain the value 2 to 1.
   r[r == 2] = 1

If you want to double-check whether your conversion of cells has worked, you can plot a histogram of the raster values:

::

   plt.hist(r.ravel())

I'll let you take it from here!

Option 3: help the national cadaster agency of Guatemala (policy brief)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is a variation of Option 1. You can help the Guatemalan cadaster agency (Registro de Información Catastral de Guatemala, RIC) map historical deforestation patterns inside `Aguateca `_, a Mayan site and protected area in the tropical Petén department (home to Tikal, many Mayan sites, and astonishing biodiversity).

This request for help comes from Juan Manuel Cano Herrera, a RIC official. We met when my PhD student Ana Reboredo Segovia and remote sensing colleague Eric Bullock completed a consultancy for the Inter-American Development Bank (IDB). The bank had given a grant to Guatemala for demarcating and clarifying land use rights in protected areas. Both IDB and RIC were curious whether these activities had affected deforestation patterns (answer: hard to say). I had the chance to meet and catch up with Juan Manuel in Guatemala City in January 2025.

Juan Manuel wrote on Sep 22, 2025, 10pm (less than 24h ago as I write this):

.. container:: italic-section

   | Buena noche, Christoph.
   | Recientemente visité el Área Protegida Monumento Cultural Aguateca y, con mucha tristeza, tengo que contarte que el área está perdiendo cada vez más cobertura boscosa (te comparto fotos).
   | Quisiera pedirte tu ayuda: quiero evaluar la pérdida de cobertura desde 1995, año de creación del Área Protegida, hasta 2025, 30 años después.
   | Quizá me puedas ayudar, pues dispones de imágenes. Espero tus comentarios.

In English:

.. container:: italic-section

   | Good evening, Christoph.
   | I recently visited the Protected Area Monumento Cultural Aguateca, and with great sadness I must tell you that the area is losing more and more forest cover (I'm sharing photos).
   | I'd like to ask for your help: I want to assess the loss of cover from 1995, the year the Protected Area was established, up to 2025, 30 years later.
   | Perhaps you can help me since you have access to imagery. I look forward to your thoughts.

.. image:: img/lab1-5_aguateca.jpg
   :alt: Photos of recent deforestation in Aguateca.
   :align: right

.. admonition:: Deliverable

   - A (raster-based) map of estimated loss of carbon due to deforestation in the *Monumento Cultural Aguateca* 2000-2024:

     :file:`~/ee508/reports/figures/lab1/5_carbon_stock_loss_aguateca.png`

   - A barplot of deforested area, by year:

     :file:`~/ee508/reports/figures/lab1/5_carbon_stock_loss_by_year_aguateca.png`

   - An estimate of total loss of carbon stock in the same region and period. Name the number and its unit, together with any explanations you'd like to add, in a brief summary:

     :file:`~/ee508/reports/figures/lab1/5_carbon_stock_loss_writeup_aguateca.docx`

   - A Jupyter notebook containing the code you used to generate the above deliverables. The notebook should be self-contained: a user with access to the respective data folders (which can contain products from any earlier exercise) can start a new Jupyter session, run all cells from your notebook, and end up with the deliverables. Before you deliver this notebook, you **must demonstrate that the notebook is self-contained** by running it from top to bottom (:gui:`Kernel` > :gui:`Restart & Run All Cells...`). Once the run has finished successfully, save the notebook together with all cell outputs. The carbon stock map needs to be visible below one of your notebook cells. This allows me to help you troubleshoot any issues in your code without extra delays.
     :file:`~/ee508/reports/figures/lab1/5_carbon_stock_loss_code_aguateca.ipynb`

Getting started
---------------

You can find the shapefile of the Monumento Cultural Aguateca here: `~/ee508/data/external/lab1/part5/monumento_cultural_aguateca.zip `_

Refer to `Option 1: how much forest carbon was lost in the Colombian Amazon 2000-2024?`_ for instructions on:

- How to access Baccini et al.'s *Aboveground Live Woody Biomass Density* data
- How to access Hansen et al.'s *Global Forest Change* data (2000-2024)
- Simplifying assumptions for the carbon stock math.

Extending the timeframe
-----------------------

This is optional if you work alone (see note below).

- If you want to go back to 1995, *Global Forest Change* (GFC) is not enough. You have to use the European Union's `Tropical Moist Forest Deforestation layer `_ (TMF, 1990-2024). You can download the data by tile in the `Data Download `_ section.

  - Scroll down to "Download process" and click on the tile you need. Choose "Degradation year".

- If you want to include the most recent observations (2025), neither GFC nor TMF is enough. Instead, you need to obtain the rasters from the `Global Land Analysis and Discovery (GLAD) Deforestation Alerts `_, updated almost weekly:

  - The :gui:`DOWNLOAD` menu link is in the upper right corner. Click on the tile you need.

.. important::

   Please write me a quick email (`chrnolte@bu.edu `_) if you pick this project and let me know which change dataset (GFC, TMF, GLAD) you're most interested in. If there is more than one person interested, we can combine your powers: everyone can learn to work with a different dataset (GFC, TMF, GLAD) and you can author the policy brief to RIC as a group.
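For the barplot-by-year deliverable, a minimal sketch of the per-year tally looks as follows. The random ``lossyear`` array here is a placeholder for your Hansen raster already clipped/masked to Aguateca, and the 30 m x 30 m pixel area is an assumption you should replace with the actual pixel size of your setup:

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # render without a display
import matplotlib.pyplot as plt

# Hypothetical stand-in for a `lossyear` array clipped to Aguateca:
# 0 = no loss, 1-24 = year of loss (2001-2024).
rng = np.random.default_rng(42)
lossyear = rng.integers(0, 25, size=(400, 400))

# Assumed nominal pixel size of 30 m x 30 m, converted to hectares.
PIXEL_HA = 30 * 30 / 10_000

# Count lost pixels per year code and convert the counts to hectares.
years = np.arange(1, 25)
area_ha = np.array([(lossyear == y).sum() for y in years]) * PIXEL_HA

fig, ax = plt.subplots()
ax.bar(2000 + years, area_ha)
ax.set_xlabel('Year of loss')
ax.set_ylabel('Deforested area (ha)')
```

From here, ``fig.savefig()`` writes the barplot to the deliverable path, and summing ``area_ha`` gives a total-area cross-check for your writeup.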