.. _lab1-4:

.. only:: notes

   Include an example to:

   - Estimate areas (similar to Lab 1-5 carbon stocks)
   - Explain coarsening, especially in integer space / averaging, ideally with an image.
   - Separate the running ``Popen`` from the definition of the command. Some students didn't get that ``Popen`` is what **runs** the code.

Species priorities in Colombia: rasterio, numpy, and GDAL
=========================================================

This exercise will get us started with geo-referenced raster data (e.g., satellite images) in Python, including reading, writing, visualizing, stacking, and coarsening.

.. admonition:: Deliverables

   Four maps of species richness in Colombia, each assigning different weights (priority) to each individual species (equally, by threat, by rarity, etc.)

.. admonition:: Due date

   Friday, September 26, 2025, by 6:00 p.m.

Create a new notebook
~~~~~~~~~~~~~~~~~~~~~

Start Jupyter, open a new notebook (:file:`~/ee508/notebooks/lab1/lab1-4.ipynb`), and run the following ``import`` commands:

::

    %matplotlib inline

    import os
    import subprocess
    from pathlib import Path

    import geopandas as gpd
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from rasterio import open as r_open
    from rasterio.plot import show as r_show

Adapt these filepaths to your filesystem:

::

    DIR_EE508 = Path("~").expanduser() / 'Dropbox' / 'Outputs' / 'Courses' / 'ee508'
    DIR_LAB1_1 = DIR_EE508 / 'data' / 'external' / 'lab1' / 'part1'
    DIR_LAB1_4 = DIR_EE508 / 'data' / 'external' / 'lab1' / 'part4'
    DIR_LAB1_4_PROCESSED = DIR_EE508 / 'data' / 'processed' / 'lab1' / 'part4'
    DIR_FIGURES_LAB1 = DIR_EE508 / 'reports' / 'figures' / 'lab1'

    PATH_NE_COUNTRIES = (
        DIR_LAB1_1
        / 'ne_10m_admin_0_countries_lakes'
        / 'ne_10m_admin_0_countries_lakes.shp'
    )

Get the species distribution data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We will be working with modeled species distribution raster data for 238 threatened, near-threatened, and range-restricted vertebrate species in
Colombia. The data is produced by `BioModelos `_, a collaborative mapping project led by the `Humboldt Institute `_, Colombia's leading public biodiversity research institute.

The maps are created using `MAXENT `_, a popular species distribution mapping algorithm. It predicts the likelihood that a species occurs in any given area by combining point observations of the species with rasters of environmental variables (e.g., climate, land cover).

Download the :file:`.zip` archive of the maps (23.5M) from the EE508 drive: `/data/processed/lab1/part4 `_

Unzip the archive. You get 238 :file:`.tif` files and a :file:`.csv` file.

You can find information about each species in :file:`species.csv`. Read the :file:`.csv` table into a DataFrame called ``species``. Set the column ``'name'`` as its index. Use ``.head()`` and ``.value_counts()`` to find out how many species there are of each ``class``, and what their threat ``category`` is, according to the International Union for Conservation of Nature (IUCN).

The species in this dataset were selected because:

- they had a BioModelos map (which means: sufficient observations to make a model), and because they were either

  - classified as threatened or near-threatened by the IUCN, or
  - had an IUCN range size of less than 50,000 km\ :sup:`2` (see column ``km2``).

In choosing these species, we're applying the same criteria as the researchers from whom we got the data:

    Velásquez-Tibatá J, Salaman P, Graham CH (2013) Effects of climate change on species distribution, community structure, and conservation of birds in protected areas in Colombia. *Reg Environ Chang* 13(2):235–248. `Download here `_

.. admonition:: Question

   Can you find out how many species were included for which reason?

The function ``pd.crosstab`` allows you to cross-tabulate two variables (similarly to ``table`` in R):

::

    pd.crosstab(species['category'], species['km2'].lt(50000))

The variable ``category`` is ordered alphabetically because it is provided as strings.
You can let ``pandas`` know that these strings have a sorting order, ranking from most to least threatened. You do so by casting the ``Series`` into a different variable type: an "ordered ``Categorical``\ ". After that, the output of ``pd.crosstab`` will be ordered automatically as well:

::

    species['category'] = pd.Categorical(
        species['category'],
        ordered=True,
        categories=['CR', 'EN', 'VU', 'NT', 'LC', 'DD'],
    )
    pd.crosstab(species['category'], species['km2'].lt(50000))

Note, for instance, that there are no "least concern" (LC) species with an IUCN range larger than 50,000 km\ :sup:`2` in the dataset. (I excluded them.)

Instead of counting the numbers in each category, you can also answer the question by turning the ``category`` series into a True/False condition:

::

    pd.crosstab(
        species['category'].isin(['CR', 'EN', 'VU', 'NT']),
        species['km2'].lt(50000)
    )

Explore the raster data
~~~~~~~~~~~~~~~~~~~~~~~

Choose a species whose modeled distribution you'd like to visualize. For simplicity, we'll pick the first one here. If you're curious, Google a few species names and pick one that intrigues you (check out *Saguinus leucopus*).

``rasterio``'s ``open`` command, which we imported at the beginning and gave the alias ``r_open``, creates a **file connection** to a raster file. You need this file connection for reading and writing. It allows you to get (in the case of reading) or define (in the case of writing) vital information about the raster file, such as projection, resolution, format, compression, etc.

::

    r_file = r_open(DIR_LAB1_4 / 'Aburria_aburri_10p_cut.tif')

What do the following commands return?

- ``r_file.crs``
- ``r_file.shape``
- ``r_file.bounds``
- ``r_file.count``

Guess their meaning or look it up in the ``rasterio`` documentation: `Python Quickstart `_.

Also note ``r_file.transform``.
This is an ``Affine`` object: a special kind of matrix that maps the coordinates of each raster pixel (by column and row, starting in the upper left corner) to the coordinates of the projection (x, y).

- For instance, you can obtain the coordinates (here: in lat/long) of any pixel for which you know the row and column number:

  ::

      r_file.transform * (col, row)

.. admonition:: Question

   What are the CRS coordinates of the upper left pixel ``(0, 0)``?

The raster file has only one band (see ``r_file.count``). The command ``.read(n)`` reads the n-th band.

::

    r = r_file.read(1)

What kind of object is ``r``? Find out: ``type(r)``

``r`` is a ``numpy`` array (the class is called ``ndarray``). ``numpy`` is the most popular package for matrix computing in Python. It is the standard for processing arrays and matrices, and it forms the backbone of ``pandas`` and ``geopandas``. ``numpy`` has a vast library of functions, which you can apply to any raster array imported via ``rasterio``.

.. caution::

   The imported ``numpy`` array ``r`` does not carry any CRS information, nor any other information needed to geo-locate the array correctly (extent, resolution, etc.). The geospatial metadata always has to be obtained from the ``rasterio`` file connection (``r_file``).

What dimensions (numbers of rows and columns) does ``r`` have? Find out with ``r.shape``.

- This should enable you to compute the number of pixels in the raster. A little Python trick: ``r.shape`` returns a tuple with two entries. You can unpack a tuple (or a list) to set multiple variables at once:

  ::

      n_rows, n_cols = r.shape
      n_rows * n_cols

.. tip::

   Did you know? You can use ``a, b = b, a`` to swap the values of two variables.

.. caution::

   Pay close attention to the order in which different packages return or expect the two dimensions of a grid (x and y, rows and columns).

   By default, ``r.shape`` returns the lengths of the axes in **row / column** order: ``(n_rows, n_cols)``. This is common in matrix computation. Note that this is also the order in which ``pandas``' ``.loc`` receives the dimensions (index, columns).

   In contrast, the order expected by most plotting functions is **first x, then y**. This is also the order expected by the above transformation matrix.

   Mixing up the two dimensions can happen **very easily**. Watch out for this issue, and always double-check your work.

What is the distribution of values in the raster? Inspect the histogram:

::

    plt.hist(r.ravel(), bins=50)

- ``.ravel()`` returns a 1-dimensional ("flattened") view of a ``numpy`` array. It is needed because our raster data is 2-dimensional, but ``.hist()`` expects a 1D sequence of values.

Visualize the raster data
~~~~~~~~~~~~~~~~~~~~~~~~~

Let's make a map of the raster you just imported. You have two options:

- Using ``rasterio``'s own ``show`` function (which we imported above as ``r_show``):

  ::

      r_show(r)

- Or using ``matplotlib``'s ``ax.imshow`` command:

  ::

      fig, ax = plt.subplots()
      ax.imshow(r)

Both functions should show exactly the same map, because ``r_show`` calls ``ax.imshow`` internally.

``r_show`` has the nice property that passing a transformation matrix will display the correct coordinates. This is good for quick mapping:

::

    r_show(r, transform=r_file.transform)

The same can be achieved by passing the raster extent (the coordinates of its outer boundaries) to ``ax.imshow``. Doing so requires reading and reordering the bounding box info provided by ``r_file``:

::

    xmin, ymin, xmax, ymax = r_file.bounds

    fig, ax = plt.subplots()
    ax.imshow(r, extent=(xmin, xmax, ymin, ymax))

.. caution::

   Once more, pay attention to the order in which different packages return or expect the four bounds of a raster extent (``xmin, xmax, ymin, ymax`` or ``left, right, bottom, top``). The first and last are usually in the same position, but the middle ones often switch positions.

   ``r_file.bounds`` returns the four bounds in the order ``xmin, ymin, xmax, ymax``. The ``extent`` argument of ``ax.imshow`` expects the four bounds in the order ``xmin, xmax, ymin, ymax``.

   Again, it's very easy to mix up these different notations. Watch out for this issue.

You can now modify and save this plot like any other ``matplotlib`` plot.

- Pass a different colormap to ``imshow`` with the ``cmap`` argument.
- Change the size: ``fig.set_size_inches(15, 15)``
- Add a title: ``ax.set_title('Aburria aburri', pad=15, fontsize=25)``

You can also add a colorbar legend. To do so, save the ``AxesImage`` object returned by ``ax.imshow`` and pass it on to ``plt.colorbar()``:

::

    img = ax.imshow(r, extent=(xmin, xmax, ymin, ymax))
    plt.colorbar(img)

Using what you learned in :ref:`Lab 1.3 `, import the boundary of Colombia from the Natural Earth countries layer and plot it on top of the distribution map, so that the boundaries are visible against the raster image but don't hide it (no fill).

- Good news: the vector layer and the raster already have the same CRS, so we don't need to worry about reprojection at this stage.

Because the boundary of Colombia also includes its Caribbean islands (the species raster layers do not cover the islands), ``matplotlib`` automatically extends the upper boundaries of the plot. You need to reset the extent of the plot to the raster extent by passing the correct end points:

::

    ax.set_xlim(xmin, xmax)
    ax.set_ylim(ymin, ymax)

The first line is actually not needed here, because the addition of Colombia only affected the y-axis.

Assessing memory usage
~~~~~~~~~~~~~~~~~~~~~~

How much space does the file we just imported require on your hard drive? Find out in your :gui:`Explorer` / :gui:`Finder`.

In contrast, how much memory does the loaded raster need (as the current ``numpy`` array)? You can find out with the command ``r.nbytes``.

Given what you know about the number of pixels in this raster, can you find out how many bytes of memory ``numpy`` needs for every single pixel?
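To see how the data type alone determines the per-pixel memory cost, here is a minimal sketch using a small synthetic array (``demo`` is made up for illustration; it is not the actual species raster):

```python
import numpy as np

# A small stand-in for a raster array (values 0/1, like the species maps)
demo = np.zeros((100, 200), dtype=np.float64)

n_pixels = demo.shape[0] * demo.shape[1]

# Total memory divided by the pixel count gives the per-pixel byte cost,
# which is simply the itemsize of the array's dtype
print(demo.nbytes // n_pixels)                      # 8 bytes per float64 pixel
print(demo.astype(np.float32).nbytes // n_pixels)   # 4 bytes per float32 pixel
```

The same arithmetic applies to ``r``: divide ``r.nbytes`` by the number of pixels you computed earlier.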
The loaded raster needs more than 50x as much space in memory as on the hard drive! Why is that? Two reasons:

- The saved raster file was compressed. The compression works well here, because most of the values are ``0``, and the remainder are ``1``. Whenever a raster has large contiguous areas with the exact same value, compression tends to reduce file size considerably.
- The loaded raster uses a data type that consumes a lot of space. Which data type? Find out: ``r.dtype``

You can save memory by casting the raster into a data type that doesn't require 8 bytes per pixel. Given that all values are ``1`` or ``0``, you could cast it into a boolean or byte data type, which consumes far less memory:

::

    r.astype(np.bool_).nbytes

In this exercise, we will be working with fractional numbers, so let's keep our ``numpy`` arrays in a ``float`` data type that needs only four bytes per pixel:

::

    r = r.astype(np.float32)
    r.nbytes

.. note::

   ``np.float16`` would require only 2 bytes per pixel. However, some versions of ``matplotlib`` throw an error if ``imshow`` is called with rasters of data type ``np.float16``. If it works for you, feel free to use it.

.. admonition:: Question

   How much space would we need if we loaded all 238 rasters into memory and cast them to the ``np.float32`` data type, given that they all have the same dimensions? Calculate that number.

Oops, that's quite a bit of memory. We have to coarsen the rasters before we run zonal statistics, so we don't spend too much time waiting for those rasters to process. In the process, we'll learn how to use the powerful **GDAL** command line tools.

Coarsen rasters with GDAL
~~~~~~~~~~~~~~~~~~~~~~~~~

Coarsening a raster is a common geospatial operation. As with most common operations, we have multiple options to get it done. For instance, we can use :gui:`QGIS` to do the reprojection. However, when :gui:`QGIS` reprojects a raster, it does so by calling GDAL.
Instead of going through :gui:`QGIS`, we can call GDAL directly.

`GDAL `_ is the **Geospatial Data Abstraction Library**, the core data access and translation library that enables most spatial open-source software to read, write, and transform a vast range of raster and vector formats through a unified interface. It underpins tools like QGIS, Google Earth Engine, R and Python geospatial packages (e.g., ``rasterio``, ``geopandas``), and even parts of ArcGIS. Virtually all geospatial file reading and writing in Python uses GDAL under the hood.

GDAL's fast C/C++ library (``libgdal-core``) is already part of your ``ee508`` environment, because the other spatial packages need it. You can therefore use GDAL from the command line (:gui:`Terminal` or :gui:`Anaconda Prompt`) whenever your ``ee508`` environment is activated.

Look up GDAL's `"Traditional applications" `_ to get an overview of the range of raster utilities at your disposal. In `Raster programs `_, skim the explanations for the following popular tools. Aim for a brief overview of what each tool can do; there's no need yet to understand the dozens of arguments:

- ``gdalwarp``
- ``gdal_rasterize``
- ``gdal_merge``
- ``gdal_proximity``

GDAL not only processes raster data. It also has powerful `vector programs `_. For instance, check out the documentation for `ogr2ogr `_.

``gdalwarp`` lets us reproject ("warp") rasters. Review its `documentation `_. Try to figure out what the following options mean. We'll use them in a minute!

- ``-t_srs``
- ``-te``
- ``-tr``
- ``-r``
- ``-of``
- ``-co``
- ``-overwrite``
- ``srcfile*``
- ``dstfile``

.. admonition:: New in GDAL 3.11

   GDAL is transitioning the syntax of its tools from the "Traditional applications" to the `"gdal" application `_. The latter has a sleeker syntax similar to popular developer tools (``git``, ``docker``, etc.):

   - ``gdalwarp`` > ``gdal raster reproject``
   - ``gdal_rasterize`` > ``gdal raster rasterize``
   - ``gdal_merge.py`` > ``gdal raster merge``
   - ``gdal_proximity.py`` > ``gdal raster proximity``

   This change started with GDAL 3.11. However, QGIS still uses earlier versions of GDAL, and so does the EE508 environment (mine resolves to ``libgdal-core=3.10.3``). Therefore, this lab sticks with the older syntax. You are welcome to install a separate environment that supports GDAL 3.11+ to play with the new syntax.

Run GDAL in QGIS
----------------

If you want to see what a fully written GDAL command looks like, you can run the same operation through QGIS and inspect the GDAL command that QGIS uses. Let's do this for ``gdalwarp``:

- Open QGIS and start a New Project.
- Drag-and-drop the same raster file we just worked with into the new project.
- Menu > :gui:`Raster` > :gui:`Projections` > :gui:`Warp (Reproject)`
- In :gui:`Input layer`, select the raster you just imported.
- Set the :gui:`Resampling method to use` to :gui:`Average`.
- In :gui:`Output file resolution in target georeferenced units`, choose :input:`0.025`.
- Open :gui:`Advanced parameters`, if it's not already open.
- In :gui:`Georeferenced extents of output file to be created [optional]`, enter :input:`-79.05, -66.75, -4.25, 12.5`. Note that these are in the order ``xmin, xmax, ymin, ymax``.

  .. note::

     The coordinates form a rectangle around Colombia's boundaries, leaving a tiny buffer around the edges. If you want to come up with your own set of coordinates for a raster image, you can import the Natural Earth country shapefile into :gui:`QGIS` and zoom to Colombia. Notice how the coordinates of your mouse cursor are displayed in the bar below the map. This allows you to quickly identify a suitable extent for the rectangle you want to display.

- You can also select the :gui:`CRS` of the target raster extent: :gui:`EPSG:4326`. This is optional: as we're not changing the CRS, the input file CRS will be used by default.
- In :gui:`Reprojected`, select a filename to save the result to.

Note that, as you go through these steps, QGIS builds a :gui:`GDAL/OGR console call` at the bottom of the dialog box. This is exactly the command that QGIS sends to GDAL for execution. Examine how ``-tr``, ``-r``, ``-te``, and ``-of`` are set. Also note the relative positions of the input and output file paths.

- :gui:`Run` the command.

You can now see a new (coarser) reprojected raster in your QGIS map. In the :gui:`Log` window, QGIS displays the command that you just executed. It should start with ``gdalwarp -overwrite -tr 0.025 0.025 -r average -te -79.05 -4.25 -66.75 12.5 -te_srs EPSG:4326 -of GTiff``, followed by two filenames.

.. caution::

   Look closer. Recall the different ordering conventions for the bounds of a raster / image / figure? Here we have another case of this issue: the :gui:`QGIS` dialog box receives the extent coordinates in a different order than ``gdalwarp``: the second and third arguments have switched sides. This is very easy to miss.

- Copy the entire ``gdalwarp`` command, including the filepaths (right-click > :gui:`Copy`).

Run GDAL in the Terminal
------------------------

You can take the GDAL command you just copied and run it verbatim in your conda environment (optional).

- Delete the coarsened file you just created (overwriting is still off).
- Open a new :gui:`Terminal` (or :gui:`Anaconda Prompt`) window.
- Activate your Anaconda environment in that window.
- Copy and paste the GDAL command into the :gui:`Terminal`.
- Run it.

You should get the same output as you got through QGIS. When QGIS reprojects a raster, it just calls GDAL!
That means we might as well skip QGIS and call GDAL directly. This will come in handy when we need to coarsen **hundreds** of rasters. We can let Python build a command for each file and run it as an independent process.

Python's ``subprocess`` package allows us to run programs much as you would run them from the :gui:`Terminal`. We can call its function ``Popen`` to execute processes. ``Popen`` expects the command to be passed as a list of strings, with each string representing a part of the command that would otherwise be separated by spaces. For instance, ``'conda activate gis'`` becomes ``['conda', 'activate', 'gis']``.

Now you have all the information you need to run a raster reprojection from Jupyter. Read this code and make sure you understand it. Then try it out.

::

    from_path = DIR_LAB1_4 / 'Aburria_aburri_10p_cut.tif'
    to_path = DIR_LAB1_4_PROCESSED / 'Aburria_aburri_coarse.tif'

    # Create the output directory if it doesn't exist.
    if not to_path.parent.exists():
        to_path.parent.mkdir(parents=True)

    # Build the GDAL command. Note: this only *defines* the command;
    # nothing is executed yet.
    command = [
        'gdalwarp',
        '-tr', '0.025', '0.025',
        '-te', '-79.05', '-4.25', '-66.75', '12.5',
        '-r', 'average',
        '-of', 'GTiff',
        '-overwrite',
        from_path,
        to_path,
    ]

    # `Popen` is what actually *runs* the command.
    proc = subprocess.Popen(
        command,
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    for line in proc.stdout:
        print(line, end='')
    proc.wait()

GDAL will run the raster reprojection and print status updates below the Jupyter cell (because we added ``stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True``). If the command executes successfully, the last command (``.wait()``) returns ``0``. If it fails, it returns ``1``.

.. note::

   Did you notice? The command we passed doesn't include all of the parameters that QGIS used. This is because GDAL doesn't require the CRS parameters (``-t_srs`` and ``-te_srs``) if we stay in the same CRS as the original raster file. ``_srs`` stands for Spatial Reference System, an alternative way to refer to a CRS.
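If you don't need the live, line-by-line status updates, the simpler ``subprocess.run`` waits for the command to finish and collects its output in one go. A minimal sketch, using a harmless ``echo`` stand-in instead of ``gdalwarp`` (on your machine you would pass the ``command`` list built above):

```python
import subprocess

# subprocess.run executes the command, waits for it to finish, and
# returns a CompletedProcess object. The command list here is a
# placeholder; substitute your gdalwarp command list.
result = subprocess.run(
    ['echo', 'gdalwarp finished'],
    stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
)

print(result.stdout, end='')   # everything the command printed
print(result.returncode)       # 0 on success, non-zero on failure
```

Checking ``result.returncode`` after each call is a convenient way to catch failed reprojections inside a loop.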
The added option ``-overwrite`` ensures that you do not have to delete the output files before you re-run the command.

How large is the GeoTIFF (:file:`.tif`) file we just created? Open your :gui:`Finder` or :gui:`Explorer` to find out.

The file is **larger** (2.51M) than the original one (889K)! That's because ``gdalwarp`` doesn't compress files by default. To compress the output, we have to add another argument to the ``gdalwarp`` command list (e.g., before ``from_path``):

::

    '-co', 'compress=LZW',

What does this parameter do? Look up the ``-co`` argument in the `gdalwarp `_ documentation to find out.

Run the command with the new parameter added. How large is the file you just created? It should be much smaller than the previous one. Open it with ``rasterio`` or in QGIS. Does it look good? Fantastic.

Coarsen all rasters
-------------------

You now have all the tools at your disposal to coarsen the 238 rasters. Go ahead and code this up on your own. A few tips:

- Probably the most straightforward approach is to create a ``for`` loop that iterates through ``species.index`` (e.g., ``for name in species.index:``).
- You need to create the correct filenames in each iteration of the loop. If you used the original species names, they still have spaces (``' '``) where the filename has an underscore (``'_'``). Here's one way to build the filename dynamically:

  ::

      name_underscore = name.replace(' ', '_')
      from_path = DIR_LAB1_4 / f'{name_underscore}_10p_cut.tif'

- Because running the full loop takes a while (1-2 minutes), you probably want to first test the loop on a subset of rasters until you're happy with it (``for name in species.index[:5]:``). If you want to make sure your loop is still running, you can add a ``print`` statement to the beginning of the loop (e.g., ``print('.', end='')``).
- Once you are done, you should have 238 coarsened raster files in your processed data folder, all of which have exactly the same projection and transformation.
This makes the next steps easy and quick.

Create maps of weighted species richness
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Sum raster values pixel-by-pixel
--------------------------------

Once you have imported rasters as arrays, aggregating them (e.g., adding them) is easy. This code reads and adds all coarsened rasters:

::

    r_sum = None

    for name in species.index:

        # See progress
        print('.', end='')

        # Define the path of the coarsened raster
        path = DIR_LAB1_4_PROCESSED / (name.replace(' ', '_') + '_coarse.tif')

        # Read the first band of the raster and cast it to float32
        r = r_open(path).read(1).astype(np.float32)

        if r_sum is None:
            # First iteration of the `for` loop: initialize r_sum with r
            r_sum = r
        else:
            # Later iterations: add the new raster array to the cumulative sum
            r_sum = r_sum + r

An alternative approach is to initialize an empty raster with the correct shape. The code below assumes that you already have some raster array ``r_ref`` with the right ``shape`` (a tuple of the numbers of rows and columns) in memory. If you don't, just load one of the coarsened rasters with ``rasterio``'s ``read()``.

::

    # Initialize an empty raster with the same shape as the loaded array `r_ref`
    r_sum = np.zeros(r_ref.shape)

    for name in species.index:

        # See progress
        print('.', end='')

        # Define the path of the coarsened raster
        path = DIR_LAB1_4_PROCESSED / (name.replace(' ', '_') + '_coarse.tif')

        # Read the first band of the raster and cast it to float32
        r = r_open(path).read(1).astype(np.float32)

        r_sum += r

Note the last lines: ``r_sum = r_sum + r`` and ``r_sum += r`` are elegant (short and sweet) ``numpy`` notations for adding the values of one array to those of another cell-by-cell (and here, therefore: pixel-by-pixel).

Visualize the array ``r_sum`` with ``ax.imshow()``. Does this look like a map of species richness (for threatened, near-threatened, and range-restricted species)?
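The cell-by-cell addition can be illustrated with tiny toy arrays (a sketch with made-up values, not the actual rasters):

```python
import numpy as np

# Two tiny "presence" rasters: 1 = species present, 0 = absent
r_a = np.array([[1, 0],
                [1, 1]], dtype=np.float32)
r_b = np.array([[1, 1],
                [0, 1]], dtype=np.float32)

# Adding them cell-by-cell yields a (toy) species richness map:
# each pixel counts how many species are present there
r_sum = r_a + r_b
print(r_sum)   # [[2. 1.]
               #  [1. 2.]]
```

With 238 real rasters, each pixel of ``r_sum`` counts how many of the 238 species are modeled as present in that pixel.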
If you would like to see a histogram of the (pixel) values of the array, you can pass a flattened (i.e., one-dimensional) version of your array to ``matplotlib``'s ``hist`` function: ``plt.hist(r_sum.ravel())``

Save the richness raster
------------------------

When you create a new raster file using ``rasterio``, you need to pass all the parameters required to define and geo-reference a raster dataset: file format, height, width, number of bands, data type, CRS, transformation, and compression.

- Height, width, number of bands, and data type are also metadata of the ``numpy`` array you want to save.
- We can copy the CRS and the transformation from a reference raster that already has the geospatial metadata we need. I will use ``path`` to refer to the path of a reference raster.
- Note that, right after running the above loop, the path of the last coarsened raster is still saved in the object ``path``, and that raster has the correct geospatial metadata. We can use it *as is*:

::

    out_path = DIR_LAB1_4_PROCESSED / 'richness' / 'richness_equal_weights.tif'

    # Create the output directory if it doesn't exist
    out_path.parent.mkdir(parents=True, exist_ok=True)

    r_file_ref = r_open(path)

    r_open(
        out_path,
        'w',
        driver='GTiff',
        height=r_sum.shape[0],
        width=r_sum.shape[1],
        count=1,
        dtype=r_sum.dtype.name,
        crs=r_file_ref.crs,
        transform=r_file_ref.transform,
        compress='lzw',
    ).write(r_sum, 1)

Apply alternative weighting formula
-----------------------------------

What if we weighted each species differently, as a function of its threat status or the size of its range? You can find out by modifying your above code just slightly: simply multiply each imported raster (``numpy`` array) by a (constant) species-specific weight (e.g., ``r *= 10``) before adding it to ``r_sum``.
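To see how a dictionary can translate categories into numeric weights via ``Series.map``, here is a minimal, self-contained sketch (the species names and the two-row table are made up for illustration):

```python
import pandas as pd

# Toy data: two hypothetical species with IUCN-style threat categories
species_demo = pd.DataFrame(
    {'category': ['CR', 'NT']},
    index=['Species one', 'Species two'],
)

# Map each category string to a numeric weight via a dictionary
weights = {'CR': 10, 'NT': 0.5}
species_demo['weight'] = species_demo['category'].map(weights)

print(species_demo['weight'].tolist())   # [10.0, 0.5]
```

The same pattern gives you a ``'weight'`` column in ``species`` that the ``for`` loop can look up with ``.loc``.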
First, create an aggregate raster where each species is assigned a weight as a function of its threat status, as follows:

+-------------+--------+
| IUCN Status | Weight |
+=============+========+
| CR          | 10     |
+-------------+--------+
| EN          | 4      |
+-------------+--------+
| VU          | 1      |
+-------------+--------+
| NT          | 0.5    |
+-------------+--------+
| LC          | 0      |
+-------------+--------+
| DD          | 0      |
+-------------+--------+

- Remember that, in the ``for`` loop, you can obtain information about each species with ``species.loc[name]['variable_name']`` or ``species.loc[name, 'variable_name']``.
- You can set those species weights in the ``for`` loop with multiple ``if`` statements. However, more elegantly, you can create an additional column in ``species`` that contains the species weights. Figure out how to do this by looking up the help page for `pandas.Series.map `_ and notice how you can replace values with a dictionary. Once you have added that column, you can obtain the weight within the ``for`` loop with ``species.loc[name, 'name_of_weight_column']``.
- Make sure you understand each step of the above ``for`` loop well before running it. Sometimes, students forget to weight the first raster.

How does the raster look? Very different from the previous one?

Second, create an aggregate raster where each species is assigned a weight as a function of its IUCN range. Use the inverse of the range area (1 divided by the range area in km\ :sup:`2`) as the weight.

- How does the map look? Are there any new locations highlighted that weren't quite as prominent in the previous maps?

.. note::

   This measure of rarity-weighted species richness is very similar to the one used by Jenkins et al. (2015). However, we're not considering baseline protection. There are good spatial protection datasets for Colombia, though, in case you want to pursue a more in-depth study of Colombia as part of your project.

Finally, create an aggregate raster where each species is assigned a weight inversely proportional to the size of its range *as modeled by BioModelos within our raster extent*.

- A simple way to do this is to divide each raster by the mean of its cell values, e.g., ``r = r / r.mean()`` or ``r /= r.mean()``.
- How does the map look? Are there any new locations highlighted that weren't quite as prominent in the previous maps?

If you did everything correctly, there should be one site that really stands out. `This article `_ might tell you a bit more about this special place. It is based on an analysis, not too dissimilar from ours, that was published in **Science** in 2013.

Wrap up
~~~~~~~

Time to deliver! Use ``ax.imshow()`` and the ``matplotlib`` commands from previous exercises to save a high-resolution map of each raster.

- Make sure each map has a meaningful title and shows the coordinates in projection units. Use the colormap ``'Spectral_r'``. Add a colorbar. Add the boundaries of continental Colombia in a way that is visible against the rasters but does **not** extend the plotting extent to the Caribbean.

Save the figures under the following names:

- :file:`~/ee508/reports/figures/lab1/4_richness_equal_weights.png`
- :file:`~/ee508/reports/figures/lab1/4_richness_weighted_by_threat.png`
- :file:`~/ee508/reports/figures/lab1/4_richness_weighted_by_range_iucn.png`
- :file:`~/ee508/reports/figures/lab1/4_richness_weighted_by_range_biomodelos.png`

Do not delete the 238 coarsened raster files. We will need them later. If you need space, you can now delete the original rasters (226M unzipped).
Write up a short summary (~300 words) of your results: how does the analysts' choice of weighting and species data affect which areas are identified as priorities?

:file:`~/ee508/reports/lab1/4_writeup.docx`

Copy all five deliverables into the same folder and compress them into a :file:`.zip` archive. Find the Assignment with the lab title on the `Blackboard course website `_ and upload your :file:`.zip` archive.

**Congratulations!** You made substantial progress in your knowledge of raster processing in Python. You used ``rasterio`` to import, save, and visualize rasters, learned how to use the GDAL command line utilities, and used simple ``numpy`` additions and multiplications to create aggregate rasters. These skills now allow you to quickly verify how species priority maps change as a result of changes in weights.

.. admonition:: How good are these distribution maps?

   It is hard to tell without digging further into the methods and data behind each distribution model. Like many distribution maps, they might be biased towards locations where ecologists and other nature lovers like to travel to collect data. Because some locations might be difficult to access (for reasons of safety or missing infrastructure) or just less pleasant to visit, species presence is often systematically underreported in remote or challenging landscapes.