.. EE 508 / DS 537: Data Science for Conservation Decisions

Installing the environment
==========================

Create a project folder
~~~~~~~~~~~~~~~~~~~~~~~

Create a project folder for EE 508/DS 537.

- Across EE 508, I will refer to this project folder as :file:`~/ee508/`. Replace as needed.
  :file:`~` can be replaced by the user folder in Python via ``os.path.expanduser``.
- Make sure you have write access. We will edit the files using :gui:`QGIS`, :gui:`Python`, and :gui:`R`.

.. tip::

   Directory and file structures in this course will (loosely) follow the convention of the `Cookiecutter Data Science `_ project, a community template created by `Carl Boettiger `_ *et al.* at Berkeley.

.. admonition:: Note for Windows users

   Windows users: EE 508 uses OSX/Linux notation for directories and files. Python usually handles the conversion from slashes :file:`/` to backslashes :file:`\\` in filepaths, but errors can still occur (e.g. if you inadvertently mix both in a filepath). Throughout this course:

   - When using a filepath provided on this course companion website, think :file:`\\` instead of :file:`/`. If you replace the slashes, note that the backslash is the default string escape character, so you have to use double backslashes :file:`\\\\`. A better-looking alternative is to add an ``r`` in front of the string. This allows you to use single backslashes :file:`\\` and saves you editing time when copy & pasting a filepath from somewhere else (e.g. your File Explorer).

Install Miniconda
~~~~~~~~~~~~~~~~~

:gui:`Miniconda` is a lightweight Python package and environment manager. Using a package manager ensures that the versions of the dozens of Python packages we use in EE 508 will be compatible with each other. It also helps keep our EE 508 Python environment separate from our system's Python, avoiding interference.
You're welcome to use a different tool for the job (such as the heavier Anaconda or the lightweight ``micromamba``), as long as you take care of any environment-related troubleshooting.

Download the :file:`-latest-` Miniconda version for your operating system and CPU: ``_

Mac
---

- Make sure to pick the right file for your CPU (Intel: :file:`x86_64`, Apple M-series: :file:`arm64`, etc.).
- The simplest way is to use the self-installing :file:`.pkg` files.
- Alternatively, download the :file:`.sh` files, open :gui:`Terminal`, navigate to the folder that contains the file, and run this command, using the name of the file you downloaded:

  .. code-block:: bash

     bash Miniconda3-latest-MacOSX-x86_64.sh

Windows
-------

- Find the executable in the :gui:`File Explorer` and run it (double-click): :file:`Miniconda3-latest-Windows-x86_64.exe`
- Open :gui:`Quick search` to find your freshly installed :gui:`Anaconda Prompt`. Consider :gui:`Pin to Start` or :gui:`Pin to taskbar`, as we will use this application every time we start working on a lab (instead of :gui:`Command Prompt` or :gui:`PowerShell`).
- For the remainder of this course, whenever I suggest commands for the :gui:`Terminal`, use the :gui:`Anaconda Prompt`.

Linux
-----

- Open :gui:`Terminal` and run this command, using the name of the file you downloaded:

  .. code-block:: bash

     bash Miniconda3-latest-Linux-x86_64.sh

Terms of Service
----------------

In :gui:`Terminal`, read and accept the Terms of Service:

.. code-block:: bash

   conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
   conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r
   conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/msys2

Create an empty environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~

In :gui:`Terminal`, initialize your Anaconda environment for EE 508:

.. code-block:: bash

   conda create -n ee508 -c conda-forge -y mamba

- ``-n ee508`` sets the name of your Anaconda environment. Give it any name you like.
- ``-c conda-forge`` defines the Anaconda channel from which to pull the packages.
- ``-y`` automatically confirms the list of packages (remove to review and confirm the package list before installing everything).

We request only one package at this stage: ``mamba``. This solver is much faster than Anaconda's default.

Activate the environment
~~~~~~~~~~~~~~~~~~~~~~~~

Still in :gui:`Terminal`, activate your environment:

.. code-block:: bash

   conda activate ee508

Replace ``ee508`` with the name of your environment, if you picked a different name.

.. important::

   Remember this command! Activating the environment will be the first thing you'll do every time you work with Python in this course.

The prompt should now start with ``(ee508)``. This tells you that the environment is active.

Install packages
~~~~~~~~~~~~~~~~

Meet your required packages for EE 508, written in the `YAML `_ syntax:

.. code-block:: yaml

   name: ee508
   channels:
     - conda-forge
   dependencies:
     # Jupyter ecosystem
     - jupyter                    # Jupyter notebooks
     - jupyterlab_code_formatter  # Auto-format code in JupyterLab
     - jupyterlab_execute_time    # Show cell execution times
     # Code quality tools
     - black                      # Code formatter
     - isort                      # Import sorter
     - ruff                       # Fast linter
     # Core data science libraries
     - numpy                      # Arrays
     - pandas                     # Tabular data
     - pyarrow                    # Fast columnar data format
     # Geospatial analysis
     - geopandas                  # Vector data
     - shapely                    # Geometric operations
     - rtree                      # Spatial indexing
     - pyproj                     # Coordinate transformations
     - rasterio                   # Raster data I/O
     - rioxarray                  # Xarray integration for rasterio
     - rio-cogeo                  # Cloud Optimized GeoTIFF tools
     - rasterstats                # Zonal statistics
     - pyogrio                    # Fast vector I/O
     # Machine learning
     - statsmodels                # Statistical modeling
     - scikit-learn               # General ML library
     - xgboost                    # Gradient boosting
     - lightgbm                   # Microsoft's gradient boosting
     - catboost                   # Yandex's gradient boosting
     # Visualization
     - matplotlib                 # Basic plotting
     - plotly                     # Interactive plots
     - altair                     # Grammar of graphics
     - folium                     # Interactive maps
     # File I/O
     - openpyxl                   # Excel files

In :gui:`Terminal`, use this single-line install command to install all packages with ``mamba``. Your environment (``ee508``) must be active.

.. code-block:: bash

   mamba install -c conda-forge --override-channels jupyter jupyterlab_code_formatter jupyterlab_execute_time black isort ruff numpy pandas pyarrow geopandas shapely rtree pyproj rasterio rioxarray rio-cogeo rasterstats pyogrio statsmodels scikit-learn xgboost lightgbm catboost matplotlib plotly altair folium openpyxl

After resolving package dependencies, mamba will ask you whether you agree with the list. Confirm with :input:`y` (yes) and :input:`Enter`, or skip the prompt by adding ``-y`` to the command. Installation of the packages can take a while, as mamba downloads and installs about 5.7 GB.

Once you have installed your packages, I recommend saving your fully resolved environment as another YAML file:

.. code-block:: bash

   conda env export > ~/ee508/environment.yml

Keep this file in your project folder. It will allow you to re-create the exact same environment, e.g. if you break yours, or get a new machine:

.. code-block:: bash

   conda env create -f ~/ee508/environment.yml

If you want to use the modern code formatter ``ruff`` (recommended), you also need to pip-install ``jupyter-ruff``, which makes ``ruff`` accessible in Jupyter.

.. code-block:: bash

   pip install jupyter-ruff

Launch and sanity-check Jupyter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Create a folder where you'd like to keep all of your Jupyter notebooks for EE 508, e.g.: :file:`~/ee508/notebooks`

In :gui:`Terminal`, navigate to the folder (e.g., ``cd ~/ee508/notebooks``). With your ``ee508`` environment active, launch Jupyter notebook:

.. code-block:: bash

   jupyter notebook

After a little wait, :gui:`Terminal` indicates that a Jupyter notebook is active and shares its URLs (you can copy & paste them into your browser).
Your default browser will open, showing a directory listing of the folder from which you just called the Jupyter notebook (if you just created the folder, there's not much to see). If you lose the tab or close the browser, find the URL in :gui:`Terminal` and paste it in your browser.

In the upper right corner of the Jupyter directory listing, select :gui:`New` > :gui:`Python 3 (ipykernel)`. A new browser tab will open, showing an empty and "Untitled" notebook.

The notebook you see in the browser is also a **file**. Or rather, you are seeing an (interactive) HTML website generated from instructions in a JSON (text) file saved with the file extension :file:`.ipynb` (IPython notebook - IPython is the engine behind Jupyter). Look for it in the folder from which you called Jupyter (:file:`~/ee508/notebooks`). The title of the notebook in your website is the filename: you change one, and the other changes, too.

As you open the notebook, Jupyter also starts a new Python process (kernel) in the background that should now be active and is waiting for your input. Type the following code into the first cell and run it:

.. code-block:: python

   import geopandas as gpd
   gpd.__version__

The text to the left of the cell should change from ``[ ]`` to ``[*]`` while Python executes your command. The first time you run this code will take a bit, as Python initializes your environment. After a short wait, the text should change to ``[1]`` and print the version of ``geopandas`` you installed. Your environment is now ready for your input.

Fine-tune Jupyter
~~~~~~~~~~~~~~~~~

Return to the directory view (:gui:`Home`) by selecting its browser tab (if still open) or by clicking the Jupyter icon in any notebook. In the menu, find :gui:`Settings` > :gui:`Settings Editor`.

The :gui:`Settings Editor` is where you can fine-tune how the Jupyter interface in your browser appears and reacts.

- **Code formatting**: your environment includes the code formatters ``isort`` and ``black``. We will use both code formatters religiously throughout the course: it makes both code and diffs (code comparisons) more readable.

  - Click :gui:`Jupyterlab Code Formatter`. You should see both :input:`isort` and :input:`black` listed as :gui:`default_formatter`.
  - Let's leave the settings as they are. We'll accept the 88-character line length preferred by ``black``, which breaks `PEP8 `_ convention (79 characters).

- **Rulers**: I like to have two rulers in my code cells, so I can see how much space I have before the line break (72 characters for comments / docstrings, 88 for code).

  - Click :gui:`Notebook` in the left sidebar menu.
  - In the main window, find :gui:`Rulers`. :gui:`Add` one at :input:`72` for comments, one at :input:`88` for code.
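
To extend the single-package ``geopandas`` check from the sanity-check section to more of the environment, the sketch below looks up installed versions with the standard-library ``importlib.metadata``. The ``PACKAGES`` subset and the helper name ``installed_versions`` are illustrative choices, not part of the course material. The lookup reads package metadata only, so it assumes each package ships standard metadata and does not prove that importing it works.

.. code-block:: python

   from importlib.metadata import PackageNotFoundError, version

   # Illustrative subset of the names used in the install command;
   # extend the list as needed.
   PACKAGES = ["numpy", "pandas", "geopandas", "rasterio", "scikit-learn"]


   def installed_versions(names):
       """Map each distribution name to its installed version, or None if missing."""
       found = {}
       for name in names:
           try:
               found[name] = version(name)
           except PackageNotFoundError:
               # No metadata found for this name in the active environment.
               found[name] = None
       return found


   report = installed_versions(PACKAGES)
   for name, ver in report.items():
       print(f"{name:<15} {ver or 'NOT INSTALLED'}")

Any package reported as ``NOT INSTALLED`` is a candidate for re-running the install command with the ``ee508`` environment active.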