Automating Workflows in GRASS GIS with PythonAutomating geospatial workflows can save hours of repetitive work, improve reproducibility, and make complex analyses tractable. GRASS GIS (Geographic Resources Analysis Support System) is a powerful open-source GIS with a comprehensive set of raster, vector, and temporal tools. Python, with its readability and strong ecosystem, is an excellent language for automating GRASS workflows—from simple batch tasks to complex pipelines that integrate multiple data sources and analyses. This article covers the why, what, and how of automation in GRASS GIS using Python, with practical examples, best practices, and troubleshooting tips.
Why automate GRASS GIS workflows?
- Reproducibility: Scripts provide a record of every step and parameter, allowing analyses to be rerun exactly.
- Efficiency: Batch processing large datasets or repeating tasks across regions/time saves time.
- Scalability: Automation enables leveraging powerful compute resources and integrating GRASS into larger processing pipelines.
- Integration: Python allows integration with other libraries (NumPy, pandas, rasterio, geopandas), web services, and task schedulers.
Getting started: installation and environment
Prerequisites:
- GRASS GIS installed (7.x or 8.x recommended).
- Python 3 (GRASS ships with its own Python; using the GRASS Python environment or configuring your system Python to work with GRASS is necessary).
- Optional useful libraries: numpy, pandas, rasterio, geopandas, matplotlib, shapely.
Starting GRASS from a shell and entering its Python environment:
- On Linux/macOS, start GRASS and open a Python console with
grass78
(orgrass
with version) and thenpython3
. - To run scripts outside the GRASS interactive session, use the GRASS Python startup script to set environment variables (see examples below).
Launching GRASS in a script (headless mode)
- GRASS needs a GISDBASE (directory for maps), LOCATION (coordinate system), and MAPSET (workspace). When scripting, you either create these beforehand or create them programmatically.
- Use the helper shell script
grass
with--text
or--exec
options, or programmatically set the environment variables and import GRASS Python libraries.
Example wrapper to start GRASS from a Python script (Linux/macOS):
#!/bin/bash # run_grass_script.sh GRASS_BIN=/usr/bin/grass78 LOCATION_PATH=/path/to/grassdata MAPSET=my_mapset $GRASS_BIN -c EPSG:4326 $LOCATION_PATH/$LOCATION -e >/dev/null 2>&1 $GRASS_BIN $LOCATION_PATH/$LOCATION/$MAPSET --exec python3 my_script.py
(Adjust paths and versions accordingly.)
Alternatively, inside Python, use the GRASS Python API by setting environment variables and importing grass.script and grass.script.setup.
Core Python APIs for GRASS
Two primary Python interfaces:
grass.script
(often imported asgscript
): a high-level wrapper to call GRASS modules and manage inputs/outputs. It’s the most commonly used for automation.grass.script.core
andgrass.script.setup
: lower-level functions to handle environment and session setup.pygrass
: an object-oriented Python API for GRASS (useful for more complex programmatic data manipulation).
Common patterns:
- Call modules with
gscript.run_command('r.mapcalc', expression='out = a + b')
orgscript.mapcalc('out = a + b')
. - Read lists of maps with
gscript.list_strings('rast')
orgscript.list_strings('vector')
. - Use
gscript.read_command
to capture module output as text, orgscript.parse_command
to get dictionaries from modules that print key=value pairs.
Example minimal script structure:
import os import grass.script as gscript import grass.script.setup as gsetup gisdb = '/path/to/grassdata' location = 'myloc' mapset = 'PERMANENT' gsetup.init(gisdb, location, mapset) # Example: list rasters rasters = gscript.list_strings(type='rast') print(rasters) # Run a GRASS module gscript.run_command('r.neighbors', input='elevation', output='elev_smooth', method='average', size=3)
Example workflows
Below are several real-world automation examples, from simple batch tasks to integrated pipelines.
1) Batch raster processing: smoothing and hillshade
Goal: For a directory of elevation rasters, import into GRASS, compute a 3×3 mean, generate slope and hillshade, then export results.
Outline:
- Loop through files in a directory.
- Import with
r.import
(orr.in.gdal
). - Smooth with
r.neighbors
. - Compute slope with
r.slope.aspect
. - Create hillshade with
r.hillshade
. - Export with
r.out.gdal
.
Key snippets:
import os import glob import grass.script as gscript from grass.script import run_command input_dir = '/data/elev' for fp in glob.glob(os.path.join(input_dir, '*.tif')): name = os.path.splitext(os.path.basename(fp))[0] run_command('r.import', input=fp, output=name, overwrite=True) run_command('r.neighbors', input=name, output=f'{name}_sm', size=3, method='average', overwrite=True) run_command('r.slope.aspect', elevation=f'{name}_sm', slope=f'{name}_slope', aspect=f'{name}_aspect', overwrite=True) run_command('r.hillshade', elevation=f'{name}_sm', shade=f'{name}_hill', overwrite=True) run_command('r.out.gdal', input=f'{name}_hill', output=f'/out/{name}_hill.tif', format='GTiff', overwrite=True)
2) Time-series analysis with space-time raster datasets (STRDS)
Goal: Automate import of a series of rasters as a space-time raster dataset, compute a temporal mean, and export.
Outline:
- Use
t.create
to create STRDS. - Import rasters with
t.register
ort.import
. - Use
t.rast.series
to compute mean. - Export result.
Snippet:
# create STRDS gscript.run_command('t.create', type='strds', temporaltype='absolute', output='elev_series', title='Elevation series') # register rasters with time stamps (assuming filenames include YYYYMMDD) for fp in sorted(glob.glob('/data/series/*.tif')): timestamp = extract_timestamp_from_filename(fp) # implement parsing gscript.run_command('t.register', input='elev_series', file=fp, start=time_str, increment='1 days', overwrite=True) gscript.run_command('t.rast.series', input='elev_series', output='elev_mean', method='average', overwrite=True) gscript.run_command('r.out.gdal', input='elev_mean', output='/out/elev_mean.tif', format='GTiff', overwrite=True)
3) Vector processing: batch clipping and attribute updates
Goal: Clip a set of vector layers to an administrative boundary and compute area attributes.
Outline:
- Import or list vectors.
- For each vector, use
v.overlay
orv.clip
to clip to boundary. - Add area column with
v.db.addcolumn
and fill withv.to.db
orv.report
.
Snippet:
boundary = 'admin_boundary' vectors = gscript.list_strings(type='vector') for v in vectors: out = f'{v}_clipped' gscript.run_command('v.overlay', ainput=v, binput=boundary, operator='and', output=out, overwrite=True) # add area column (sq meters) gscript.run_command('v.db.addcolumn', map=out, columns='area double precision') gscript.run_command('v.to.db', map=out, option='area', columns='area', unit='meters')
4) Integrating GRASS with geopandas/rasterio
Sometimes you want to combine GRASS processing with libraries like geopandas or rasterio for specialized tasks (e.g., advanced plotting, machine learning inputs).
Pattern:
- Export GRASS layers to temporary GeoTIFF/Shapefile with
r.out.gdal
/v.out.ogr
. - Read into geopandas/rasterio, perform operations, and optionally write back to GRASS.
Example:
# export vector to GeoPackage and read into geopandas gscript.run_command('v.out.ogr', input='roads', output='/tmp/roads.gpkg', format='GPKG', layer='roads', overwrite=True) import geopandas as gpd roads = gpd.read_file('/tmp/roads.gpkg') # perform geopandas operations...
Best practices for scripting
- Use
overwrite=True
consciously to avoid accidental data loss. - Organize scripts into modular functions (import, preprocess, analyze, export).
- Log operations and errors (use Python logging).
- Use temporary mapsets or workspaces for intermediate products, then clean up.
- Pin coordinate reference systems and check reprojection steps explicitly.
- Test scripts on a small dataset before scaling up.
Error handling and debugging
- Capture module output with
gscript.read_command
to inspect messages. - Check GRASS environment variables (GISBASE, GISDBASE, LOCATION_NAME, MAPSET).
- Use
try/except
around GRASS calls; GRASS-specific exceptions may be raised by the Python API. - Common error sources: missing projections, mismatched extents, null/NA values, memory limits for large rasters.
Example error handling pattern:
import logging logger = logging.getLogger('grass_script') try: gscript.run_command('r.mapcalc', expression='out=a+b', overwrite=True) except gscript.CalledModuleError as e: logger.error('GRASS module failed: %s', e) raise
Performance tips
- Set the computational region (
g.region
) tightly to the area of interest; many GRASS modules respect the region and process faster with smaller extents and coarser resolutions. - Use streaming/tiling approaches for very large rasters.
- Prefer GRASS native modules for heavy raster math—they’re optimized in C.
- When possible, use integer/categorical data types to reduce memory footprint compared to floating point.
- Use multiprocessing carefully: GRASS modules are not always thread-safe; launch separate GRASS sessions/processes for parallel tasks.
Packaging and deployment
- Turn scripts into command-line tools with argparse.
- Use containerization (Docker) to package GRASS, Python environment, and dependencies for reproducible deployments. A simple Dockerfile can install GRASS and required Python packages, then run your automation script.
- Schedule regular jobs using cron, Airflow, or other schedulers; ensure environment activation and GISDBASE paths are configured.
Minimal Dockerfile example:
FROM osgeo/grass:8.2 COPY requirements.txt /tmp/ RUN pip install -r /tmp/requirements.txt COPY my_script.py /app/ ENTRYPOINT ["python3", "/app/my_script.py"]
Example end-to-end script
A compact full example: import a directory of DEMs, create smoothed hillshades, and export.
#!/usr/bin/env python3 import os, glob import grass.script as gscript import grass.script.setup as gsetup GISDB = '/srv/grassdata' LOCATION = 'dem_loc' MAPSET = 'USER' gsetup.init(GISDB, LOCATION, MAPSET) in_dir = '/data/dems' out_dir = '/out/hill' os.makedirs(out_dir, exist_ok=True) for fp in glob.glob(os.path.join(in_dir, '*.tif')): name = os.path.splitext(os.path.basename(fp))[0] gscript.run_command('r.import', input=fp, output=name, overwrite=True) gscript.run_command('r.neighbors', input=name, output=f'{name}_sm', size=3, method='average', overwrite=True) gscript.run_command('r.slope.aspect', elevation=f'{name}_sm', slope=f'{name}_slope', aspect=f'{name}_aspect', overwrite=True) gscript.run_command('r.hillshade', elevation=f'{name}_sm', shade=f'{name}_hill', overwrite=True) gscript.run_command('r.out.gdal', input=f'{name}_hill', output=os.path.join(out_dir, f'{name}_hill.tif'), format='GTiff', createopt=['COMPRESS=LZW'], overwrite=True)
Troubleshooting common pitfalls
- Projection mismatches: ensure imported data uses the location CRS or reproject on import.
- Extent mismatches: use
g.region raster=name
to align region to a reference raster. - Large file I/O: prefer GDAL-backed modules (r.import, r.out.gdal) and consider compression.
- Permission issues: ensure GRASSDB directories are writable by the script user.
Further resources
- GRASS GIS manual pages for modules (r., v., t.*).
- GRASS Python API documentation and pygrass docs.
- Community mailing lists and GIS Stack Exchange for use-case specific help.
Automating workflows in GRASS GIS with Python unlocks faster, more reproducible geospatial analysis. Start with small scripts, adopt modular design and logging, and scale up with containers and schedulers when needed.