2.2. Building a Marxan analysis from scratch

In this lab, you will build your own Marxan dataset, run Marxan to find representative reserve networks among Colombian municipalities, and check the robustness of your results against plausible modifications to the underlying model assumptions. Along the way, you will pick up new skills in DataFrame filtering, indexing, pivoting, and geometry processing.

Deliverables

One selection frequency map for Colombia:

  • Municipalities as planning units, constant cost per area.

A Jupyter notebook with the code needed to build the Marxan dataset and generate the selection frequency map.

Due date

Tuesday, Oct 14, 2025, by 6:00 p.m.

Even if the due date is many days away, I recommend starting the lab as early as you can and advancing steadily, in line with class conversations.

Students from previous cohorts asked me to tell you: do not take progress on reading lab instructions as an indicator of how much work remains. Two bigger tasks are waiting for you in Lab 2.3: I will ask you to re-run all steps, but with some variations in data inputs.

You can reduce the impact of these extra runs on your time if you use this lab section to develop a smoothly running, reproducible, understandable workflow (frequent Kernel > Restart Kernel and Run All Cells…). That way, by the time you hand in Lab 2.2, the two added variations won’t take too much extra coding time (just some extra processing time).

2.2.1. Prepare data, folder structure, and notebook

Download colombia_municipios.zip from:

Create an empty folder for the new Marxan dataset for Colombia:

~/ee508/data/processed/lab2/colombia_municipios

  • This folder will contain the new input dataset for our first Marxan run in Colombia. The planning units are municipalities (municipios in Spanish).

  • I’ll use the shorthand <colombia_municipios> to refer to this folder for the rest of this lab.

Within <colombia_municipios>:

  • Create the subfolders

    • input

    • output

    • pulayer

    • temp

  • Make a copy of the input.dat configuration file (from the Tasmania dataset or the class folders, link below) and drop it in the project folder.

    ~/ee508/data/external/lab2/input.dat

Start a new Jupyter notebook

  • Import pandas, geopandas and matplotlib.pyplot to their usual aliases.

  • Import rasterstats.zonal_stats and subprocess as we did in Lab 1.4, along with rasterio’s open function (aliased to r_open in the code below).

Take some time to revisit the table of required Marxan datasets and their columns in Lab 2.1: Examine Marxan inputs.

Question

Can you wrap your head around how to create each of these tables?

2.2.2. Create the planning unit files

For the purpose of this exercise, our planning units are Colombia’s 1,120 mainland municipalities.

In real-world policy scenarios, you will not find situations in which decision makers make “yes/no” protection decisions for entire municipalities. Planning units are often much smaller, down to parcels. However, we can still ask the question: which combinations of municipalities, if protected, would cover representative habitat for all species most cost-effectively? Later in the lab, we will switch planning units from municipalities to hexagons for a “smoother” planning output.

We will use colombia_municipios.shp to create three files:

  • <colombia_municipios>/input/pu.dat

    A table of planning units (municipalities) that contains the columns: 'id', 'cost', and 'status'.

  • <colombia_municipios>/pulayer/pulayer.gpkg

    A geopackage of planning units (municipalities) that contains the same 'id' as pu.dat.

  • If you want to use zonal_stats (and not rasterized zonals - see below):

    <colombia_municipios>/temp/pulayer_4326_simplified.shp

    A shapefile of planning units (municipalities) projected into EPSG:4326, with slightly simplified geometries for faster zonal statistics.

In your Jupyter notebook, read the external file colombia_municipios.shp and save it as the GeoDataFrame mun. Make sure to use encoding='utf-8', so Spanish characters are correctly displayed.

Plot the layer. How does it look?

The GeoDataFrame mun still contains marine areas and islands. We do not have species distribution raster data for these areas. We need to remove these polygons from mun before proceeding. This is a good occasion to practice filtering with boolean indexing in geopandas.

Remove the unwanted rows from mun. There are different options to identify the unwanted rows.

Inspect the results of the following filters. You can use the ~ (tilde sign) operator to invert any boolean Series (flip False and True).

  • Marine areas: mun['NOMBRE_ENT'].eq('MARINO') or mun['COD_DEPART'].isnull().

    • .ne() and .notnull() give you the respective inverted Series.

  • Islands: mun['DEPARTAMEN'].str.contains('SAN ANDRÉS') or mun['COD_DEPART'].eq('88').

  • Both marine areas and islands: mun['COD_DEPART'].isin(['88', None]).
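If the boolean mechanics feel abstract, here is a minimal, self-contained sketch on a made-up table (the column names mimic mun, but the values are invented):

```python
import pandas as pd

# Made-up stand-in for mun's attribute table
df = pd.DataFrame({
    'DEPARTAMEN': ['ANTIOQUIA', None, 'SAN ANDRÉS', 'BOGOTÁ D.C.'],
    'COD_DEPART': ['05', None, '88', '11'],
})

# Boolean Series marking marine areas and islands in one go
drop = df['COD_DEPART'].isin(['88', None])

# ~ flips the Series, so df[~drop] keeps everything NOT marked
kept = df[~drop]
print(kept['COD_DEPART'].tolist())  # ['05', '11']
```

The same pattern works on mun: build a boolean Series with .eq(), .isnull(), .str.contains(), or .isin(), then index the GeoDataFrame with it (inverted with ~ if needed).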

Once you have filtered out the unwanted rows, plot the GeoDataFrame again. Do you get a map of Colombia without islands and marine areas? Excellent.

Inspect the columns in mun. The column 'MUN' seems to contain a unique identifier of the municipality. Does it have duplicates?

Yes, the column 'MUN' has duplicates.

This is because each row refers to a polygon, but some municipalities have multiple polygons (e.g. mainland and small islands). If we want one row per municipality, we need to dissolve them before we proceed:

mun = mun.dissolve(by='MUN')

Consult the help for mun.dissolve (using help()). What does the function do with the attribute data by default?

  • It retains the attributes from the first row of each group. You can change that by passing a different function (aggfunc). We won’t need to do that here.

You should be left with 1,120 planning units. Double-check with: len(mun)

Marxan needs an index for planning units that is named 'id', and that contains consecutive integer numbers starting at 1 (careful: not 0). How can you create such an index?

Python’s range gives you a sequence of consecutive integers:

range(1, len(mun) + 1)

Note

len(), when applied to a DataFrame, returns the number of rows.

To turn this into the index of your DataFrame, you can either create a new column first and use it as the index (two lines):

mun['id'] = range(1, len(mun) + 1)
mun = mun.set_index('id')

Or you can create it directly:

mun.index = pd.Index(range(1, len(mun) + 1), name='id')

Marxan also needs a 'cost' column. Unfortunately, there is no good cost layer for Colombia. For simplicity, we will start by ignoring differences in land value: we assume that the cost per area is constant and, therefore, that the cost of protecting each municipality is proportional to its size. Notice how this is similar to Jenkins’s priority map in PNAS? We will modify these assumptions later.

Go ahead and create a new column named 'km2' that contains the area of each municipality in square kilometers (km2). Use the equal-area projection we used in Lab 1.3 for computing areas and recall in which units the areas are returned.

Create a new 'cost' column that contains the area in km2 rounded to 2 decimal places: mun['cost'] = mun['km2'].round(2).

Check: the 'cost' value of the first row (municipality) should be 372.63 (km2).

Finally, Marxan requires a 'status' column that identifies planning units that are already protected (status 2) or cannot be protected (status 3). For simplicity, we ignore existing protected areas and pretend that we start from scratch, so every planning unit gets status 0 (available).

mun['status'] = 0

We are now ready to save the above files.

Use to_csv() to save the columns ['cost', 'status'] alongside its index ('id') to:

<colombia_municipios>/input/pu.dat

Note that to_csv() (on a pandas dataframe) automatically saves the index as the first column in the output file, unless you pass the argument index=False.
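A quick way to check the format without touching the disk: with no path argument, to_csv() returns the CSV text as a string. A sketch with made-up numbers:

```python
import pandas as pd

# Two-row stand-in for the pu table (values invented)
pu = pd.DataFrame(
    {'cost': [372.63, 510.3], 'status': [0, 0]},
    index=pd.Index([1, 2], name='id'),
)

# Named index 'id' is written automatically as the first column
csv_text = pu[['cost', 'status']].to_csv()
print(csv_text)
# id,cost,status
# 1,372.63,0
# 2,510.3,0
```

Once the string looks right, pass the real path instead to write the file.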

Use to_file() to save mun as a GeoPackage to:

<colombia_municipios>/pulayer/pulayer.gpkg

mun[['geometry']].to_file(...)

Note that to_file() (on a GeoDataFrame) will only save the index as a column 1) if it is named, 2) if it is not an integer index, or 3) if it is a MultiIndex. If your index does not fulfil any of these conditions, you need to pass the argument index=True or use reset_index() before saving to ensure that geopandas saves the index column.

2.2.3. Create the species (features) file

Marxan needs a file that defines what conservation features (species) exist, what protection targets we have for each, and what penalty we incur for any target amount we miss (higher species penalty factors mean that you are more certain the target will be reached).

This should be easy, because we already worked with the species list in Lab 1.4.

  • Import the file into the DataFrame species.

  • Create an index with the name 'id' that contains consecutive integers, starting with 1.

  • Add a 'prop' column. Set it to 0.3 for all species.

  • Add a 'spf' column. Set it to 1 for all species.

Use to_csv() to save the columns 'prop', 'spf', and 'name' to:

<colombia_municipios>/input/spec.dat

As before, the index 'id' will be saved automatically as the first column. Make sure not to include any other columns, as extra columns can lead to errors.
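Putting the steps together on a made-up two-species list (the real list comes from Lab 1.4):

```python
import pandas as pd

# Hypothetical species list; the real one is imported from Lab 1.4
species = pd.DataFrame({'name': ['Aburria aburri', 'Aglaiocercus coelestis']})
species.index = pd.Index(range(1, len(species) + 1), name='id')

# Assigning a scalar broadcasts it to every row
species['prop'] = 0.3
species['spf'] = 1

print(species[['prop', 'spf', 'name']].to_csv())
# id,prop,spf,name
# 1,0.3,1,Aburria aburri
# 2,0.3,1,Aglaiocercus coelestis
```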

2.2.4. Create the planning-unit × species file

Marxan needs to know the amount of each conservation feature (species) in each planning unit.

The definition of “amount” depends on the units in which your targets are defined for each conservation feature (species): it can be the size of a range, the total number of individuals, the summed probability of occurrence across pixels, etc. Here, we use range size.

There are different ways to build this file. One straightforward approach is to build a DataFrame with a row for each planning unit and a column for each species (we’ll call it pu_x_sp here):

    species  1                                2
    pu
    1        “amount” of species 1 in PU 1    “amount” of species 2 in PU 1
    2        “amount” of species 1 in PU 2    “amount” of species 2 in PU 2
    3        …                                …

Once you have this DataFrame, you can pivot (or stack) it with a simple command into the format that Marxan expects (we’ll do that later):

    species  pu  amount
    1        1   “amount” of species 1 in planning unit 1
    2        1   “amount” of species 2 in planning unit 1
    1        2   “amount” of species 1 in planning unit 2
    2        2   “amount” of species 2 in planning unit 2
Create an empty DataFrame that uses the index from mun as an index, and the index from species as columns. We can already give these indices the same name that their columns will have in the final puvspr.dat file ('pu' and 'species'). Naming the indices also reduces the risk of mixing up planning units and species, both of which have numerical indices starting at 1.

pu_x_sp = pd.DataFrame(
    index=mun.index.rename('pu'), columns=species.index.rename('species')
)
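With made-up stand-ins for the two indices, you can verify that the resulting frame has the right shape and axis names:

```python
import pandas as pd

# Stand-ins for mun.index (3 planning units) and species.index (2 species)
mun_index = pd.Index([1, 2, 3], name='id')
species_index = pd.Index([1, 2], name='id')

# Empty frame: one row per planning unit, one column per species
pu_x_sp = pd.DataFrame(
    index=mun_index.rename('pu'), columns=species_index.rename('species')
)
print(pu_x_sp.shape)                              # (3, 2)
print(pu_x_sp.index.name, pu_x_sp.columns.name)   # pu species
```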

2.2.4.1. Rasterize the planning unit layer

We will use rasterized zonal statistics to calculate how much of each species’ range falls within each municipality. This approach is much faster than traditional vector-based zonal statistics (seconds vs. ~45 minutes) because it works entirely with raster data.

The key idea: instead of overlaying individual vector polygons on rasters repeatedly, we rasterize the planning units once to the same specifications as the species rasters, then perform grouped statistics on the aligned rasters, grouping the values of one raster (species range) by the unique values (planning unit IDs) of the other.

First, we need to create a raster where each pixel contains the ID of the planning unit (municipality) it belongs to. This raster must have exactly the same extent, resolution, and coordinate system as our species rasters.

# Save the planning unit layer in the raster projection (EPSG:4326)
pu_4326_path = DIR_COLOMBIA_MUNICIPIOS / 'temp' / 'pulayer_4326.gpkg'
mun.to_crs('epsg:4326').to_file(pu_4326_path)

# Get the exact raster specifications from the first species raster
_r_path = DIR_LAB1_4_PROCESSED / (
    species['name'].iloc[0].replace(' ', '_') + '_coarse.tif'
)
with r_open(_r_path) as r_file:
    r_xmin, r_ymin, r_xmax, r_ymax = r_file.bounds
    r_res = r_file.res[0]

# Rasterize the 'id' column of the planning unit layer with the exact same
# raster specifications as the species range rasters
r_puid_path = DIR_COLOMBIA_MUNICIPIOS / 'temp' / 'puid.tif'
command = [
    'gdal_rasterize',
    # Burn the 'id' attribute into raster
    '-a',
    'id',
    # Extent
    '-te',
    str(r_xmin),
    str(r_ymin),
    str(r_xmax),
    str(r_ymax),
    # Resolution
    '-tr',
    str(r_res),
    str(r_res),
    # Data type (supports up to 65,535 PUs)
    '-ot',
    'uint16',
    # Compression
    '-co',
    'COMPRESS=LZW',
    pu_4326_path,
    r_puid_path,
]

# Run gdal_rasterize with output printed under Jupyter cell
proc = subprocess.Popen(
    command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
)
for line in proc.stdout:
    print(line, end='')
proc.wait()

# Read the new raster into memory (and close the file again)
with r_open(r_puid_path) as r_file:
    r_puid = r_file.read(1)

# Sanity check: visualize the rasterized planning units
fig, ax = plt.subplots(figsize=(10, 8))
im = ax.imshow(r_puid, cmap='tab20')
plt.colorbar(im, ax=ax, label='Planning Unit ID')
ax.set_title('Rasterized Planning Units')
plt.show()

Note

The r_puid array is now loaded in memory and contains the planning unit ID for every pixel. Pixels outside planning units have a value of 0.

2.2.4.2. Loop through species and calculate range per planning unit

Now we’ll create a for loop that iterates through each species, loads its range raster, and calculates the total range size within each planning unit using rasterized zonal statistics.

You can pick one of three different approaches to iterate over species IDs and names in a for loop:

  • pandas DataFrames have the iterator iterrows(). For every row in a DataFrame, the command returns a tuple with 1) the index of that row (i.e., a single value, also called a scalar) and 2) the content of the row (as a Series). Try this code, and think about how you can apply it to this problem:

    for species_id, species_data in species.iterrows():
        print(species_id, species_data['name'])
    

    Do you remember that plt.subplots() returns a tuple of the figure and the axes, and we assigned them, respectively, to the objects we named fig, ax with a single equal sign?

    The same thing happens here: in every iteration, the two elements of the tuple returned by species.iterrows() get directly assigned to the two variables that we named species_id and species_data, respectively.

  • Instead of the above, we can select the column 'name' as a Series, and use the Series iterator items():

    for species_id, species_name in species['name'].items():
        print(species_id, species_name)
    
  • A third alternative is to iterate through the index and select the row with the index:

    for species_id in species.index:
        print(species_id, species.loc[species_id, 'name'])
    

You have two options to compute the rasterized zonal statistics.

  • Using a DataFrame and groupby (more familiar from prior labs).

    # Create a DataFrame with two columns: one containing the (flattened)
    # IDs of the planning units ('puid'), the other the values from the
    # species range raster ('species_range')
    df = pd.DataFrame({'puid': r_puid.flatten(), 'species_range': r_species.flatten()})
    
    # Select only pixels that contain a planning unit
    df = df[df['puid'] > 0]
    
    # Group DataFrame by 'puid' column, then compute the sum of the (pixel)
    # values of the 'species_range' column for every group
    result = df.groupby('puid')['species_range'].sum()
    
  • Using numpy and bincount (marginally faster; requires import numpy as np).

    # Use only valid pixels (those inside a planning unit)
    valid_mask = r_puid > 0
    puid_valid = r_puid[valid_mask]
    species_valid = r_species[valid_mask]
    
    # Sum by planning unit
    sums = np.bincount(puid_valid, weights=species_valid)
    
    # Store directly (indices align with planning unit IDs)
    result = pd.Series(sums[1:], index=range(1, len(sums)))
    
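To convince yourself that the two approaches agree, here is a self-contained toy on 3×3 arrays (all values invented):

```python
import numpy as np
import pandas as pd

# Toy rasters: planning unit IDs (0 = outside) and a species range layer
r_puid = np.array([[1, 1, 2], [1, 2, 2], [0, 0, 2]])
r_species = np.array([[10.0, 5.0, 1.0], [0.0, 2.0, 3.0], [9.0, 9.0, 4.0]])

# Approach 1: DataFrame + groupby
df = pd.DataFrame({'puid': r_puid.flatten(), 'species_range': r_species.flatten()})
df = df[df['puid'] > 0]
by_group = df.groupby('puid')['species_range'].sum()

# Approach 2: numpy bincount
valid_mask = r_puid > 0
sums = np.bincount(r_puid[valid_mask], weights=r_species[valid_mask])
by_bincount = pd.Series(sums[1:], index=range(1, len(sums)))

print(by_group.tolist())     # [15.0, 10.0]
print(by_bincount.tolist())  # [15.0, 10.0]
```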

The result object is a Series with planning unit IDs as its index and summed species range values as its values. You need to assign these values to the column in pu_x_sp that corresponds to species_id. Remember that pu_x_sp uses planning unit IDs as row indices and species IDs as column names.

If you get any NaN values you can fill them with zeroes:

pu_x_sp = pu_x_sp.fillna(0)

Fallback: rasterstats

If you encounter issues with the rasterized approach or want to compare results, you can use traditional vector-based zonal statistics with rasterstats.zonal_stats. This approach is slower (~15-20 minutes), but you already know it.

If you go this route, I recommend simplifying the municipality polygons before running zonal_stats:

  • Create a copy of mun with mun.copy() and reproject it to the CRS of the species rasters (EPSG:4326). Call it mun_4326_simplified.

  • Simplify the geometries with a tolerance of 0.005.

    mun_4326_simplified['geometry'] = mun_4326_simplified['geometry'].simplify_coverage(0.005)
    
  • Tolerance is given in CRS units. In EPSG:4326, that means degrees. For our case, 0.005 is a good trade-off between precision and the speed of zonal_stats.

  • Use the GeoDataFrame mun_4326_simplified for zonal_stats, or save it as

    <colombia_municipios>/temp/pulayer_4326_simplified.shp

    and refer to the file.

Important

.simplify_coverage() simplifies geometries such that adjacent polygons (that share a boundary) continue to share a boundary. The topology is preserved. This is essential here.

.simplify() also exists, but does not provide that guarantee. Adjacent polygons can turn into partially overlapping polygons with gaps.

To make sure your for loop works as intended, you can test it with the first few species, by using .iloc:

for species_id, species_data in species.iloc[0:2].iterrows():
    print(species_id, species_data['name'])
    # your code here

  • Recall that .iloc uses row and column numbers for selection, whereas .loc uses row and column labels. This can be confusing when the labels are numbers, as is the case here. However, row numbers (.iloc) always start at 0. Our labels start at 1.
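A two-line illustration of the difference, using a toy Series whose labels start at 1 like ours:

```python
import pandas as pd

# Toy Series: positions are 0, 1, 2; labels are 1, 2, 3
s = pd.Series(['first', 'second', 'third'], index=[1, 2, 3])

print(s.iloc[0])  # 'first' -> position 0
print(s.loc[1])   # 'first' -> label 1 (the same row!)
```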

With the information you have, put together the complete for loop and make it run.

Map the result. The line below should give you a map of the amount of the first species’ range in each municipality:

mun.join(pu_x_sp).plot(1, figsize=(7, 7))

If everything worked, this map will show the total size/amount of the species range in each municipality. It will look different from a visualization of the mean: other things equal, larger municipalities are more likely to have high values.

If you are happy with the results, save them in a temporary file with to_csv(), so you don’t lose them accidentally:

<colombia_municipios>/temp/pu_x_sp.csv

If you ever need to re-import this file with read_csv(), use set_index() to set the index back to 'pu', and make sure the name of the column index is 'species':

pu_x_sp.columns = pu_x_sp.columns.rename('species')
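A round-trip sketch with an in-memory CSV standing in for pu_x_sp.csv. One gotcha: read_csv() brings the species column labels back as strings, so convert them if you need integers:

```python
import io
import pandas as pd

# In-memory stand-in for <colombia_municipios>/temp/pu_x_sp.csv
csv_text = 'pu,1,2\n1,39.9,0.0\n2,12.5,46.3\n'

pu_x_sp = pd.read_csv(io.StringIO(csv_text)).set_index('pu')

# Column labels come back as the strings '1', '2';
# restore integers and the axis name
pu_x_sp.columns = pu_x_sp.columns.astype(int).rename('species')

print(pu_x_sp.columns.tolist())  # [1, 2]
```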

You’re almost done! Now, you only need to stack() the DataFrame:

puvspr = pu_x_sp.stack()

Take some time to examine what stack() and unstack() do to this DataFrame. Both are very useful, similar to pivot tables in Excel.

  • stack() returns a Series with a MultiIndex. After renaming the Series and resetting its index, you can create a DataFrame that looks very similar to the one Marxan needs.

puvspr = pu_x_sp.stack().rename('amount').reset_index()

Did you notice that stack() automatically created the right column names for 'species' and 'pu'? That’s because we assigned these names to the index and columns of pu_x_sp earlier.
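Here is the full stack-and-reset on a 2×2 toy (values invented), so you can see the shape change from wide to long:

```python
import pandas as pd

# Toy pu_x_sp: 2 planning units x 2 species, axis names already set
pu_x_sp = pd.DataFrame(
    [[39.9, 0.0], [12.5, 46.3]],
    index=pd.Index([1, 2], name='pu'),
    columns=pd.Index([1, 2], name='species'),
)

# Wide -> long: one row per (pu, species) pair
puvspr = pu_x_sp.stack().rename('amount').reset_index()

print(puvspr.columns.tolist())  # ['pu', 'species', 'amount']
print(len(puvspr))              # 4
```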

After a few additional cleanup steps, you are almost done with this file.

  • Reorder the columns to the order in which Marxan wishes to receive them. You can simply select them in the desired order (['species', 'pu', 'amount']).

  • Round the 'amount' column to one decimal place (see above for the .round() function)

    If this fails, it is possible that the 'amount' column contains non-float values. You can fix that by converting the column to float before rounding: .astype(float)

  • Remove all rows where the 'amount' is zero.

  • Use .to_csv() to save your dataframe to:

    <colombia_municipios>/input/puvspr.dat

    Make sure the index is not saved (index=False). Marxan will otherwise fail to read the file.
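The cleanup chain can be sketched end-to-end on a made-up three-row frame:

```python
import pandas as pd

# Made-up long-format frame with the issues described above
puvspr = pd.DataFrame({
    'pu': [1, 1, 2],
    'species': [1, 2, 1],
    'amount': ['39.94', '0.0', '46.28'],  # strings, and a zero row
})

puvspr = puvspr[['species', 'pu', 'amount']]                # Marxan's column order
puvspr['amount'] = puvspr['amount'].astype(float).round(1)  # floats, 1 decimal
puvspr = puvspr[puvspr['amount'] > 0]                       # drop zero rows

print(puvspr.to_csv(index=False))
# species,pu,amount
# 1,1,39.9
# 1,2,46.3
```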

2.2.5. Create the boundary file

The last file we need to create is bound.dat.

For each pair of adjacent planning units, this file contains the length of the boundary shared by the pair. Marxan uses this information to compute the outer boundary length of the resulting protected area system. The file is essential if you want Marxan to consider the cost of boundaries. As we have seen before, assigning costs to boundaries tends to create more spatially aggregated (clumped) solutions.

The process of creating this file provides a good opportunity for learning more about how to work with vector geometries in geopandas and shapely.

If you have polygon layers that are topologically correct – in that adjacent polygons have exactly the same boundary lines – computing boundary lengths is straightforward, because the intersection of two polygons that share a boundary is that boundary (a line). If we want to compute the length of a boundary, we need to learn to access polygon geometries in a GeoDataFrame, intersect them, and then compute the length of resulting line.

Caution

As mentioned, the input GeoDataFrame for the process outlined below has to be topologically correct: shared boundaries of adjacent planning unit polygons should be exactly the same. In other words, don’t run this exercise with a GeoDataFrame that has gaps and overlaps (e.g. from using .simplify() instead of .simplify_coverage()). If you followed the instructions so far without changing anything, mun should still contain the GeoDataFrame with topologically correct geometries.

mun is currently in EPSG:4686 (MAGNA-SIRGAS, the geographic coordinate system standard for Colombia) and will return lengths in degrees. To compute lengths in meters, you need to reproject it to a projection that preserves distances well. Here, we’ll choose EPSG:21818, a UTM projection for Colombia (UTM Zone 18N).

Caution

Reprojections can mess up the topology of a vector file. Use them carefully. Here, the reprojection does not lead to topology errors.

As you know, the geometry of a GeoDataFrame is stored in the geometry column. If you want to select a single polygon, you need to provide its row and column as follows:

mun.iloc[0]['geometry']

By default, this command displays the geometry in Jupyter. What type of variable is it?

type(mun.iloc[0]['geometry'])

The geometry is a shapely Polygon. shapely is the Python module responsible for storing and processing 2D geometries in geopandas. Geometries have useful functions, such as set-theoretic operations union(), difference(), symmetric_difference(), and intersection(), but also buffer(), simplify(), and more. For more info, consult the Shapely User Manual.

To get the length of the shared boundary, we need to intersect polygons. Let’s try this:

geom1 = mun.iloc[0]['geometry']
geom2 = mun.iloc[1]['geometry']
geom_new = geom1.intersection(geom2)
geom_new

Hm… Jupyter doesn’t seem to display anything. Why is that?

geom_new.length

The length of the intersection is 0. How can that be?

geom_new.is_empty

The geometry is empty. This is the expected result of an intersection of geometries that do not intersect.

Try the same command with two geometries that do intersect. If you haven’t changed the order of the municipalities, mun.iloc[0] and mun.iloc[7] should have a shared boundary. Run the intersection.

  • What is the type() of the result? It’s not a polygon. (If it were a polygon, it would mean that the input polygons overlap.)

  • What is the area of the result of the intersection? geom_new.area

  • What is its length? geom_new.length. That is the value we are looking for!
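A self-contained shapely sketch with two invented unit squares that share an edge, mirroring what you just did with the municipality polygons:

```python
from shapely.geometry import Polygon

# Two invented 1x1 squares sharing the edge x = 1
a = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
b = Polygon([(1, 0), (2, 0), (2, 1), (1, 1)])

shared = a.intersection(b)
print(shared.geom_type)  # LineString: a shared boundary, not an overlap
print(shared.area)       # 0.0
print(shared.length)     # 1.0  <- the shared boundary length
```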

2.2.5.1. Spatial indexing

You now have all the knowledge you need to create the boundary file. You will need to write an algorithm that 1) intersects every polygon in mun with every other polygon in mun, and 2) if the two share a boundary, saves the length of the boundary together with the ids of the pair of planning units. Go ahead and code it up. Here are a few useful hints:

The naive approach would use a nested for loop or combinations to check every possible pair of polygons. However, that approach has O(n²) complexity: if you double the number of planning units, you quadruple the processing time. For large datasets like ours, we can do much better using a spatial index.

A spatial index (specifically, an R-tree) allows geopandas to quickly identify which polygons are near each other without checking every single pair. Instead of testing all \(\frac{n(n - 1)}{2}\) combinations, you only test pairs whose bounding boxes actually overlap—typically reducing the number of checks by 10-100x.

Here’s how to implement this:

First, build the spatial index (geopandas does this automatically):

sindex = mun.sindex

Then, iterate through each planning unit and use the spatial index to find only its nearby neighbors:

bound_list = []

for idx, row in mun.iterrows():
    # Find only nearby polygons using spatial index
    possible_matches_idx = list(sindex.intersection(row.geometry.bounds))
    possible_matches = mun.iloc[possible_matches_idx]

    for other_idx, other_row in possible_matches.iterrows():
        if idx >= other_idx:  # Avoid duplicates and self-intersection
            continue

        boundary = row.geometry.intersection(other_row.geometry)

        if boundary.length > 0:
            bound_list.append([idx, other_idx, boundary.length])

bound = pd.DataFrame(bound_list, columns=['id1', 'id2', 'boundary'])

  • sindex.intersection(row.geometry.bounds) returns only the indices of polygons whose bounding boxes overlap with the current polygon’s bounding box. For most polygons, this eliminates 95%+ of potential neighbors from consideration. The if idx >= other_idx check ensures we don’t count each boundary twice and don’t intersect a polygon with itself.

  • The entire for loop should take a few seconds with the spatial index, compared to 5-15 minutes without it.

When you’re satisfied with the results, round the boundary column to whole numbers (zero decimal places), and save the DataFrame to <colombia_municipios>/input/bound.dat. You won’t need to save the index (use index=False).

If you ever create your own Marxan dataset with very large planning unit counts, remember that spatial indices are your friend - they’re the key to making spatial operations tractable at scale.

2.2.6. Check your datasets

You are done creating the datasets!

Check whether everything looks right by opening the four files in a text editor and verifying that their first rows look like this:

pu.dat

id,cost,status
1,372.63,0
2,510.3,0
3,292.49,0

spec.dat

id,prop,spf,name
1,0.3,1,Aburria aburri
2,0.3,1,Aglaiocercus coelestis
3,0.3,1,Akodon affinis

puvspr.dat

species,pu,amount
1,1,39.9
3,1,46.3
5,1,47.0

bound.dat

id1,id2,boundary
1,8,4111.0
1,19,22094.0
1,41,2819.0

2.2.7. Run Marxan

We still need the input.dat file. Fortunately, we have modeled the folder structure and file names after the Tasmania example. Therefore, you only need to make a copy of input.dat from the Tasmania folder and put it into the <colombia_municipios> folder (skip this if you already copied it in 2.2.1).

If you want the next steps to run a little faster, set NUMREPS in input.dat to a lower number than before, e.g. 10.

In <colombia_municipios>, create the folder output (if you did not already do so in 2.2.1).

Make a copy of the Marxan program you used in Lab 2.1 (Tasmania) and put it in the <colombia_municipios> folder.

Run Marxan from Terminal / Prompt as you did in Lab 2.1. Does it work?

Great. You are now ready to run your analysis.

If we want to automate running Marxan optimizations and testing different parameters, we need to call it from Jupyter. Copy the run_marxan function from the Appendix of Lab 2.1, then run:

run_marxan(DIR_COLOMBIA_MUNICIPIOS)

2.2.8. Map selection frequencies

Using your code from Lab 2.1 (Tasmania), make a map of the selection frequency and the best solution.

Does it look either too clustered or too fragmented? If so, consider modifying the boundary length modifier (BLM).

To get a sense of how the BLM affects results, write a for loop that varies the BLM across a wide range of values (here: 10⁻⁵ to 10⁰), runs a single Marxan run for each, and plots the solution.

This code example requires the functions change_marxan_parameters and run_marxan provided in the Appendix of Lab 2.1, as well as the my_cmap colormap and the colombia outline layer from your earlier mapping code.

for blm_exp in range(-5, 1):
    blm = 10**blm_exp
    print(blm)

    change_marxan_parameters(
        DIR_COLOMBIA_MUNICIPIOS / 'input.dat', {'BLM': blm, 'NUMREPS': 1}
    )

    run_marxan(DIR_COLOMBIA_MUNICIPIOS, silent=True)

    sol = pd.read_csv(
        DIR_COLOMBIA_MUNICIPIOS / 'output' / 'output_best.csv'
    ).set_index('PUID')

    fig, ax = plt.subplots(figsize=(7, 7))
    mun.join(sol).plot(
        'SOLUTION',
        ax=ax,
        cmap=my_cmap,
        edgecolor='black',
        linewidth=0.1,
    )
    ax.set_title('BLM: ' + str(blm), fontsize=15, pad=10)
    colombia.plot(ax=ax, color='none', edgecolor='black', linewidth=0.5)
    plt.savefig(
        DIR_REPORTS_LAB2 / 'colombia_municipios' / f'solution_blm{blm}.png'
    )

For our first selection frequency map, let’s pick a BLM of 0.001. This pick is arbitrary for this “toy” problem, in which the relative importance of the boundary is not formally defined. However, picking a specific number makes our results comparable and gives you a reference for Lab 2.3.

Choose a NUMREPS value that gives you a good set of solutions (≥25). Run Marxan once more.

Create a high-resolution map of selection frequency from the results of your last run. Make sure that:

  • Your plot has a title.

  • The boundaries of mainland Colombia are visible but not the islands.

  • The legend of the classification scheme is shown (legend=True).

Save it as:

~/ee508/reports/lab2/2_selection_frequency_colombia.png

Save your code for this part of the lab as a Jupyter notebook that you can run from top to bottom without error and that creates the above deliverable. If you are not sure whether your code fulfills that expectation, click Kernel > Restart Kernel and Run All Cells…. Save the notebook as:

~/ee508/reports/lab2/2_code.ipynb

Copy the folder with the Marxan input dataset into your report folder, keeping just input.dat and the subfolders input and output with their contents:

~/ee508/reports/lab2/2_dataset/

This helps me understand where you went wrong if your map is incorrect and assign partial credit.

2.2.9. Wrap up

Submit to the Blackboard Google Assignment a zipped archive of:

~/ee508/reports/lab2/

It should contain:

2_selection_frequency_colombia.png
2_code.ipynb
2_dataset/ - only input.dat and the subfolders input and output with contents.

Congratulations!

You just conducted your first Marxan analysis on a dataset that you created from scratch. In the process, you practiced data filtering, learned how to dissolve geometries, how to iterate through DataFrames, how to stack and unstack them, how to conduct geometric manipulations with shapely geometries, how to speed up pairwise geometric operations with a spatial index, and how to quickly find a good BLM.

Do these results hold up? Find out in Lab 2.3.