.. EE 508 / DS 537: Data Science for Conservation Decisions
.. _lab2-1:
Getting to know Marxan: the case of Tasmania
============================================
`Marxan `_ is one of the most widely used conservation planning software tools globally.
It helps solve a classic conservation planning problem.
.. admonition:: Question
How can I create a protected area network that **represents every species** (e.g. by minimum range area, habitat area, population size, or similar) while minimizing the **opportunity cost** of conservation (disallowing benefits from alternative private or public uses)?
Mathematically, this is the **minimum set** problem. It consider **complementarity**: the only acceptable solutions are those that cover all species targets.
Marxan finds **near-optimal** solutions to the minimum set problem using `simulated annealing `_, an optimization algorithm inspired by the physical process of cooling metals. Simulated annealing searches for a global optimum by allowing both improvements and occasional "worse" moves, which help escape local optima. Over time, the probability of accepting worse solutions decreases, mimicking gradual cooling, and guiding the system toward a near-optimal or optimal solution.
In Lab 2, we will learn to implement a complete Marxan conservation planning workflow. First, we get to know how Marxan works, using a prepared dataset for Tasmania. Then we build our own Marxan dataset for Colombia, conduct a conservation planning analysis, and test the sensitivity of our results to relevant assumptions. On the way, we learn new functions for spatial data processing, e.g. how to access individual geometries of a ``GeoDataFrame`` for geoprocessing.
.. admonition:: Deliverable
Choropleth map of selection frequency of conservation planning units in Tasmania, generated with ``geopandas``.
.. admonition:: Due date
Tuesday, October 7, 2025, by 6:00 p.m.
Learn how Marxan works
~~~~~~~~~~~~~~~~~~~~~~
Before you start this lab, read the Introduction of the `Marxan User Manual `_.
- Within as little text as necessary, Sections 1.1 - 1.8 explain key terms, the minimum set problem, the math of the objective function / score calculation, and limitations of Marxan. This is required knowledge to interpret the lab results and do well on the midterm.
- If you need a reference or prefer a different introduction, you can refer to the author of Marxan himself: `Ball et al. (2009) Marxan and relatives: Software for spatial conservation prioritization `_
Check out the official Marxan website (`marxansolutions.org `_) to learn where and how Marxan is used.
- Explore conservation planning case studies using Marxan around the world:
`https://marxansolutions.org/community `_.
If you find one that interests you, perhaps you can request access to their data for your :ref:`project `?
- Play the conservation planning game at the bottom of the "Getting Started" page. How long does it take you to find a solution that comes within $2000 of Marxan's?
`https://marxansolutions.org/getting-started `_
It might seem silly at first, but it brings home exactly how Marxan works (and why it beats a human search). I recommend it.
.. _lab2-1-marxan-inputs:
Examine Marxan's inputs
~~~~~~~~~~~~~~~~~~~~~~~
If we want to use Marxan for a new conservation planning project, we first need to understand how Marxan datasets are structured.
Marxan expects the input data to be provided in a very specific format (and, unfortunately, is sensitive to mistakes).
Let's examine a dataset from a Marxan case study of Tasmania.
Download a copy of the Tasmania dataset from the class folders:
`~/ee508/data/external/lab2/tasmania.zip `_
Unpack the archive to a folder of your choice. What filenames and folders does it contain?
Most of the data is stored in text files with the extension :file:`.dat`. You can open and edit them with any :ref:`plain-text editor `. You can also read the data tables into a ``DataFrame``.
Launch your ``ee508`` environment and open a new Jupyter notebook. Import ``pandas``, ``geopandas``, and ``matplotlib.pyplot`` to their conventional aliases. Let ``DIR_TASMANIA`` be the ``pathlib.Path`` to your Tasmania dataset.
Examine the contents of the configuration file, :file:`input.dat`.
- It's easiest to open it in a :ref:`plain-text editor `. This also allows you to edit the parameters manually.
- Alternatively, use this code snippet to open the file, read its content, and display it:
::
with open(DIR_TASMANIA / 'input.dat') as file:
print(file.read())
You will note that :file:`input.dat` has a list of input parameters. Parameter names are written in all-caps letters, followed by their value. Marxan will ignore lines without all-caps letters in this file.
We will keep most of the parameters at their default. Note ``BLM``, ``NUMREPS``, and ``Input Files``. Note that all filenames and directories in the dataset are listed in this file. Find ``bound.dat`` and ``pu.dat``. For a full list of the parameters and their meaning, refer to the Marxan Manual.
Note that ``NUMREPS`` is set to 10. We'll want more repetitions than that. To change the value, open :file:`input.dat` in a :ref:`plain-text editor ` and set that value to 100. You can also use the function ``change_marxan_parameters()`` provided in the `Appendix`_ to change Marxan parameter values from Python.
Let's look at the input files. These are comma-separated value (CSV) text files, although they lack the :file:`.csv` extension. We can import them as ``pandas`` DataFrames:
::
pd.read_csv(DIR_TASMANIA / 'input' / 'pu.dat')
pd.read_csv(DIR_TASMANIA / 'input' / 'spec.dat')
pd.read_csv(DIR_TASMANIA / 'input' / 'puvspr.dat')
pd.read_csv(DIR_TASMANIA / 'input' / 'bound.dat')
The columns in the imported tables have the following meaning.
.. list-table::
:header-rows: 1
:widths: 26 72 100
* - **Filename (Parameter)**
- **Table**
- **Column names and meaning**
* - :file:`pu.dat`
(PUNAME)
- Planning Unit (polygon)
- **id**: Planning unit ID
**cost**: Cost to protect this planning unit
**status**: Protection status
- 0 if currently unprotected
- 2 if already protected
- 3 if impossible to protect
* - :file:`spec.dat`
(SPECNAME)
- Species (or Conservation Features)
- **id**: Species ID
**prop**: Protection target: proportion of species presence to be protected (as measured by **amount** in :file:`puvspr.dat`)
**spf**: Species penalty factor (for not achieving target)
**name**: Name of species / feature
* - :file:`puvspr.dat`
(PUVSPRNAME)
- Species presence in each planning unit (*ordered by planning unit ID*)
- **species**: Species ID
**pu**: Planning unit ID
**amount**: Amount of species presence in planning unit (many units are possible: range, size, presence/absence)
* - :file:`bound.dat`
(BOUNDNAME)
- Shared boundary
- **id1**: ID of first planning unit
**id2**: ID of second planning unit
**boundary**: length of boundary shared by the two planning units
The dataset comes with a shapefile of planning units: :file:`pulayer/pulayer.shp`, and an outline of Tasmania: :file:`pulayer/outline.shp`. These shapefiles are *not required* to run Marxan. Any information that Marxan needs to run its optimization is contained in the four files listed in the table. However, the vector geometries will be useful when we want to map the results.
Import the planning unit shapefile as a GeoDataFrame. What columns does it have?
We only need ``'puid'`` and ``'geometry'``.
Run Marxan from the Terminal
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Once you have a dataset compiled in the required format (:file:`input.dat`, :file:`input` and :file:`output` folders, and input files as specified), running Marxan is straightforward. Simply copy the small Marxan program into the dataset folder and run it.
Download Marxan 4.0.6 from the official website (`marxansolutions.org/software `_ - you need to register) or the class folder:
`~/ee508/software `_
- Make sure you pick the right version for your OS and chip (Mac: x86 or M1).
Copy the program into the folder in which you keep the Tasmania dataset (in the same folder as :file:`input.dat`).
Open :gui:`Terminal` or :gui:`Command Prompt` and navigate to the folder of the Tasmania dataset.
Run Marxan:
- On Windows, simply type the program name:
::
Marxan_x64.exe
You can also double-click on the file in the :gui:`Explorer`, but the windows will close after running, so you don't get to inspect the outputs.
- On Mac, prefix the program name with "``./``", as in:
::
./marxan # if downloaded from Marxan website
./marxan_osx11_m1 # if downloaded from course website
Mac users: if you get a permission error (I did with the ``marxan`` executables from the Marxan website), use ``chmod +x marxan`` in :gui:`Terminal` (after changing to the folder that contains the file) to ensure the file is executable.
Marxan will run for a few seconds and produce a number of output files both in the root folder and its :file:`output` subfolder.
And yes, as you can call Marxan from :gui:`Terminal`, you can also call it from Jupyter. Scroll down to the `Appendix`_ for a handy function.
Examine Marxan's outputs
~~~~~~~~~~~~~~~~~~~~~~~~
Import the following files into Jupyter with ``pd.read_csv`` and have a look at them:
- :file:`output/output_ssoln.csv` gives you the selection frequency for each planning unit in the ``'number'`` column (``'planning_unit'`` is the ``'PUID'``).
- :file:`output/output_best.csv`: this is the best solution among all runs. The ``SOLUTION`` column contains a 1 for protected parcels, and 0 for unprotected parcels. (Note that Marxan capitalizes the column with the ID of the planning unit in the output file: ``'PUID'``)
- :file:`output/output_mvbest.csv`: for the best solution, this file gives you the information on what targets have been reached. The column ``'Target'`` indicates the target (in the units of the amount column in :file:`puvspr.dat`). The column ``'Amount Held'`` indicates how much of each feature is contained in the best solution.
- :file:`output/output_r0000XX.csv` and :file:`output/output_mv0000XX.csv` give you the solution and target information for all runs.
You *could* compute the selection frequency using these files. Thankfully, Marxan already does this for you (:file:`output/output_ssoln.csv`).
- :file:`output/output_sum.csv` contains aggregate information for each run, including ``'Score'`` (the value of the objective function: the lower, the better) and ``'Cost'`` (the cost of protecting these planning units). Can you find the best solution?
Map selection frequencies (independent coding)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: images/lab2_tasmania_selection_frequency.png
:alt: Selection frequency in Tasmania conservation planning
:align: right
Using what you've learned about the Tasmania dataset and mapping with ``geopandas``, reproduce this selection frequency map. To practice your mapping skills, make the map look as similar to this one as possible. Specifically:
- Planning units that were already protected should be clearly distinguishable from all others. (Recall: they have a ``'status'`` of ``2`` in :file:`pu.dat`.)
- The outline of Tasmania should be visible, but the outlines of planning units should not be.
- Your map also needs to show the coordinates (in the reference system of the provided shapefiles), have a meaningful title, a legend (colorbar), and be saved as a high-resolution version (≥150 dpi, 8 x 6 inches).
A few suggestions:
- ``pandas`` offers two functions for joining DataFrames and GeoDataFrames on columns: ``join`` and ``merge``. They are similar, but their usage is slightly different. Either will work for you.
- ``join`` joins on indices by default. The easiest way to join two DataFrames or GeoDataFrames is to give both the correct index (with ``set_index()``), and then join them with ``df = df1.join(df2)``.
- ``merge`` joins on columns by default. You have to provide column names as arguments. If the column names are identical, you can pass the name as the argument ``on``, otherwise, you need to pass the two column names as ``left_on`` and ``right_on``:
::
df = df1.merge(df2, left_on='planning_unit', right_on='pu_id')
- Recall that you can plot any subselection of a ``GeoDataFrame`` (select rows, then ``plot``). This is useful when you want to apply different color schemes to different subsets of polygons (e.g. for planning units that are already protected).
- To plot all polygons of a GeoDataFrame with the same color, call the ``plot`` function without a variable name, and add the argument ``color``.
- ``color`` accepts colors in a `wide variety of formats `_: single-letter codes (``'c'``), hex strings (``'#4adfcf'``), a tuple of RGB values (0-1) (``(0.5, 0.5, 1)``), or legal html names (``'lightgrey'``). Try out this last one. Does it look like a color you can find on the map?
- The shaded color map of the map above does not exist as a default colormap in Python. However, you can reproduce it:
::
from matplotlib.colors import LinearSegmentedColormap
CMAP_DICT = {
'red': [(0.0, 1.0, 1.0), (1.0, 0.0, 0.0)],
'green': [(0.0, 1.0, 1.0), (1.0, 0.6, 0.6)],
'blue': [(0.0, 1.0, 1.0), (1.0, 0.2, 0.2)],
}
my_cmap = LinearSegmentedColormap('', CMAP_DICT)
This colormap can be passed to any plotting function that accepts a ``cmap`` (``cmap=my_cmap``).
.. tip::
Understanding the ``CMAP_DICT`` code block (optional) allows you to create any color gradient:
``CMAP_DICT`` is a dictionary that contains, for each base color used in additive color mixing (``'red', 'green', 'blue'``), a ``list`` of **positions** (stops) on the color gradient.
Each position contains a ``tuple`` of three values:
- position on the gradient (value between ``0.0`` - ``1.0``)
- intensity of base color *before* the position (value between ``0.0`` - ``1.0``)
- intensity of base color *after* the position (value between ``0.0`` - ``1.0``)
In other words, the dictionary is interpreted as follows:
For ``red``, start at the beginning of the color scale (position ``0.0``) with 100% intensity (``1.0``), then move to the end of the color scale (position ``1.0``), gradually reducing ``red`` intensity to 0% (``0.0``). Repeat for green and blue, but with different end values. The resulting RGB values at the beginning of the gradient will be white (``(1.0, 1.0, 1.0)``), and the final RGB values will be ``(0.0, 0.6, 0.2)`` – which represent the color of the darkest green in the above map.
For more information, check out the reference page for `LinearSegmentedColormap `_.
Wrap up
~~~~~~~
Does your map look like the map above? Great!
Don't forget a title, coordinates, and the legend for the selection frequency (e.g., the default colorbar).
Save the high-resolution version in your class folder as:
:file:`~/ee508/reports/lab2/1_selection_frequency_tasmania.png`
Submit the file to the assignment on Blackboard, and you are done!
.. _lab2-1-appendix:
Appendix
~~~~~~~~
Here are two little helper functions to help you skip ahead.
``run_marxan``
--------------
You can run Marxan from Jupyter with this function. It accepts the filepath of the Marxan project, runs Marxan in that folder (assuming that the :gui:`Marxan` executable is already in the project folder), and prints outputs below the cell.
::
def run_marxan(dir_marxan, silent=False):
"""Run Marxan
Parameters
----------
dir_marxan : str or pathlib.Path
Root directory of Marxan project (with data and executable)
silent : bool
If False, prints all Marxan outputs.
Set to True to silence outputs when running lots of jobs.
"""
import os
import platform
import subprocess
# Pick the correct executable for OSX and chip
if platform.system() == 'Darwin':
if platform.processor() == 'i386':
marxan_command = './marxan_osx10_x86'
elif platform.processor() == 'arm':
marxan_command = './marxan_osx11_m1'
elif platform.system() == 'Windows':
marxan_command = 'Marxan_x64.exe'
elif platform.system() == 'Linux':
marxan_command = 'Marxan_x64'
# Change into Marxan folder
os.chdir(dir_marxan)
# Run Marxan
if silent:
proc = subprocess.Popen(
marxan_command,
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
)
proc.wait()
else:
# Run with output printed under Jupyter cell
proc = subprocess.Popen(
marxan_command,
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
text=True,
)
for line in proc.stdout:
print(line, end='')
proc.wait()
# Usage example
run_marxan(DIR_TASMANIA)
``change_marxan_parameters``
----------------------------
You can change Marxan parameters with this function. It accepts the path to the parameter file :file:`input.dat` and a dictionary of parameters with new values.
::
def change_marxan_parameters(path, change_dict):
"""
Function to change the parameters in a Marxan parameter file
Parameters
----------
path : pathlib.Path
Complete path of the parameter file ('input.dat')
change_dict : dict
Dictionary of parameter values to be changed
Example: {'BLM': 0.1, 'NUMREPS': 100}
"""
import os
import re
# Check if configuration file exists
if not path.exists():
print(path, 'not found')
# Read configuration file
with open(path) as file:
old_text = file.read()
# Copy the text, keep the old one
new_text = old_text
# Loop through each dictionary item
for key, value in change_dict.items():
# Define REGEX search pattern string
search_regex = '\n' + key.upper() + ' +[0-9A-Za-z. -]+\n'
# If the parameter can be found, print warning and skip to next item
if re.search(search_regex, new_text) is None:
print('Parameter ' + key + ' not found')
continue
# Define replacement string
if type(value) == float and value < 0.001:
value = '{0:.10f}'.format(value)
replace_str = '\n' + key.upper() + ' ' + str(value) + '\n'
# Replace strings
new_text = re.sub(search_regex, replace_str, new_text)
# If a change has been made, write the file
if new_text != old_text:
with open(path, 'w') as file:
file.write(new_text)
print('Saved new parameters to ' + path.parts[-1])