Top 10 RRDtool Commands Every Admin Should Know

This tutorial covers:

  • What RRDtool is and when to use it
  • Key concepts and components
  • Installing RRDtool
  • Creating an RRD database
  • Feeding data into an RRD
  • Retrieving and visualizing data (graphing)
  • Typical use cases and best practices
  • Troubleshooting and tips

What is RRDtool and when to use it

RRDtool stores time-series data (metrics measured at time points) in a round-robin fashion: the file is allocated at a fixed size when it is created, incoming data is consolidated into archives, and the oldest entries in each archive are overwritten once that archive is full. This design keeps storage constant and efficient.

Use RRDtool when you need:

  • Compact, bounded storage for long-running metrics
  • Built-in consolidation (average, min, max, last) over time
  • Fast graph generation and simple command-line usage
  • Integration with monitoring systems (MRTG, Cacti, collectd, Munin, and many others)

Key concepts

  • RRD (Round-Robin Database): the file where data and configuration are stored.
  • DS (Data Source): a single metric definition (name, type, heartbeat, min/max).
    • Common DS types: GAUGE, COUNTER, DERIVE, ABSOLUTE.
  • RRA (Round-Robin Archive): stores consolidated data points for a specific resolution and consolidation function (AVERAGE, MIN, MAX, LAST).
  • Step: primary time resolution in seconds for incoming data.
  • Heartbeat: maximum allowed interval between updates before data is considered unknown (NaN).
  • Consolidation: how values are aggregated when moving from higher resolution to lower (longer) resolution.
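
To make the consolidation idea concrete, here is a minimal Python sketch that folds primary data points into lower-resolution rows. It is a simplified model, not RRDtool's actual implementation: the function name consolidate is hypothetical, unknown points are represented as NaN, and the 0.5 threshold mirrors the xff value used in the RRA definitions later in this tutorial (a row is treated as unknown when more than that fraction of its points is unknown).

```python
import math

def consolidate(pdps, steps, cf="AVERAGE", xff=0.5):
    """Fold primary data points into consolidated rows, `steps` points per row.

    `pdps` is a list of floats; math.nan marks an unknown point.
    A row becomes unknown when more than `xff` of its points are unknown.
    """
    funcs = {
        "AVERAGE": lambda xs: sum(xs) / len(xs),
        "MIN": min,
        "MAX": max,
        "LAST": lambda xs: xs[-1],  # simplification: last *known* point
    }
    rows = []
    # Only complete groups are consolidated; a trailing partial group is skipped.
    for i in range(0, len(pdps) - steps + 1, steps):
        group = pdps[i:i + steps]
        known = [v for v in group if not math.isnan(v)]
        unknown = len(group) - len(known)
        if not known or unknown > steps * xff:
            rows.append(math.nan)
        else:
            rows.append(funcs[cf](known))
    return rows

samples = [10.0, 20.0, 30.0, 40.0, 50.0,
           math.nan, math.nan, math.nan, math.nan, 90.0]
print(consolidate(samples, 5, "AVERAGE"))
print(consolidate(samples, 5, "MAX"))
```

The first group of five is fully known and is averaged normally; the second group has four unknown points out of five, which exceeds the 0.5 threshold, so the whole row is unknown.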

Installing RRDtool

On Debian/Ubuntu:

sudo apt update
sudo apt install rrdtool

On CentOS/RHEL:

sudo yum install epel-release
sudo yum install rrdtool

On macOS (Homebrew):

brew install rrdtool 

Bindings are available for many languages (Perl, Python, PHP, Ruby). For Python:

pip install pyrrd
# or use the rrdtool Python bindings if packaged for your platform:
pip install rrdtool

Creating an RRD database

Design decisions:

  • Choose a step (e.g., 60s for per-minute samples).
  • Define DS entries for each metric.
  • Define RRAs to keep multiple resolutions (e.g., 1-minute for 1 day, 5-minute for 7 days, hourly for months).

Example: create a database for a single gauge (e.g., CPU usage) sampled every 60 seconds with a heartbeat of 120 seconds, storing:

  • 1-minute resolution for 1 day (1440 rows),
  • 5-minute resolution for 7 days,
  • 1-hour resolution for 1 year.

Command:

rrdtool create cpu.rrd \
  --step 60 \
  DS:cpu:GAUGE:120:0:100 \
  RRA:AVERAGE:0.5:1:1440 \
  RRA:AVERAGE:0.5:5:2016 \
  RRA:AVERAGE:0.5:60:8760

Explanation:

  • DS:cpu:GAUGE:120:0:100 defines a GAUGE named cpu, heartbeat 120s, min 0, max 100.
  • RRA:AVERAGE:0.5:1:1440 stores 1440 primary values at 1-step resolution.
  • RRA:AVERAGE:0.5:5:2016 stores 2016 rows where each row is average of 5 primary values (5-minute resolution).
  • RRA:AVERAGE:0.5:60:8760 stores 8760 rows where each row is average of 60 primary values (hourly resolution).
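
The retention of each archive follows from simple arithmetic: step × steps-per-row gives the resolution, and resolution × rows gives the time span covered. The short Python sketch below checks the three RRAs above; the helper name rra_retention is hypothetical and the parsing assumes the RRA:CF:xff:steps:rows form shown in the command.

```python
def rra_retention(step, rra):
    """Return (resolution_seconds, retention_seconds) for an RRA spec string."""
    _, cf, xff, steps, rows = rra.split(":")
    resolution = step * int(steps)
    return resolution, resolution * int(rows)

STEP = 60  # matches --step 60 in the create command
for spec in ("RRA:AVERAGE:0.5:1:1440",
             "RRA:AVERAGE:0.5:5:2016",
             "RRA:AVERAGE:0.5:60:8760"):
    resolution, retention = rra_retention(STEP, spec)
    print(f"{spec}: one row per {resolution}s, covers {retention / 86400:g} days")
```

With a 60-second step this reports 1, 7, and 365 days, matching the design goals listed earlier.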

Feeding data into an RRD

Use rrdtool update to add samples. Each update has the form timestamp:value. The timestamp is usually given as Unix epoch seconds, or as N for the current time.

Example single update (current time):

rrdtool update cpu.rrd N:23.5 

Example with explicit timestamp:

rrdtool update cpu.rrd 1693500000:18.2 

Batch updates: Create a file updates.txt:

1693499940:20.1
1693500000:23.5
1693500060:22.0

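Then pass the entries to rrdtool update as arguments (it takes samples on the command line rather than reading them from standard input), for example with xargs:

xargs rrdtool update cpu.rrd --template cpu < updates.txt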
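
A batch like this can also be assembled from a script. The following is a minimal Python sketch, assuming the file holds whitespace-separated timestamp:value entries; the helper name build_update_args is hypothetical, and the actual call to rrdtool is left commented out because it requires rrdtool and an existing cpu.rrd.

```python
import subprocess

def build_update_args(rrd_path, text, template=None):
    """Turn whitespace-separated timestamp:value entries into an argv list."""
    entries = text.split()
    # rrdtool rejects updates that are not newer than the last one,
    # so sort by the numeric timestamp before sending.
    entries.sort(key=lambda e: int(e.split(":", 1)[0]))
    args = ["rrdtool", "update", rrd_path]
    if template:
        args += ["--template", template]
    return args + entries

sample = "1693500000:23.5\n1693499940:20.1\n1693500060:22.0\n"
argv = build_update_args("cpu.rrd", sample, template="cpu")
print(" ".join(argv))
# To apply the updates for real:
# subprocess.run(argv, check=True)
```

Passing the arguments as a list avoids shell quoting issues, and sorting up front keeps one out-of-order line from aborting the whole batch.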

Notes:

  • Use the --template option to name the data sources being updated, and their order, when an RRD holds more than one DS.
  • If the interval between updates exceeds the heartbeat, the data for that interval is stored as unknown (NaN).
  • Counter types: for COUNTER and DERIVE, rrdtool stores the per-second rate of change rather than the raw value; ensure you understand wrap/overflow behavior and set min/max appropriately.
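
The rate calculation and the heartbeat rule can be illustrated with a small Python model. This is a simplified sketch, not RRDtool's code: the function name counter_rate is hypothetical, it handles a single wrap at a fixed counter width (32 bits by default), and it returns None where RRDtool would store an unknown value.

```python
def counter_rate(prev_ts, prev_val, ts, val, heartbeat, bits=32):
    """Per-second rate between two COUNTER readings, or None if unknown."""
    interval = ts - prev_ts
    if interval <= 0 or interval > heartbeat:
        return None  # out-of-order update, or gap longer than the heartbeat
    delta = val - prev_val
    if delta < 0:
        delta += 2 ** bits  # assume the counter wrapped exactly once
    return delta / interval

# Normal case: 6000 units in 60 seconds.
print(counter_rate(0, 1000, 60, 7000, heartbeat=120))
# 32-bit wrap: the counter rolled over between the two readings.
print(counter_rate(0, 4294967000, 50, 704, heartbeat=120))
# Gap longer than the heartbeat: the interval is unknown.
print(counter_rate(0, 1000, 300, 7000, heartbeat=120))
```

Note that a counter reset (for example after a reboot) looks identical to a wrap in this model and produces a large bogus rate, which is why sensible min/max bounds matter.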

Retrieving data and graphing

RRDtool’s graphing is powerful and scriptable. Graphs are created with rrdtool graph and support DEF, CDEF, VDEF, LINE, AREA, GPRINT and many other directives.

Example: simple CPU usage graph for last 24 hours:

rrdtool graph cpu-day.png \
  --start -86400 --end now \
  --title "CPU Usage - Last 24 Hours" \
  --vertical-label "%" \
  DEF:cpu=cpu.rrd:cpu:AVERAGE \
  LINE2:cpu#00FF00:"CPU usage" \
  GPRINT:cpu:AVERAGE:"Avg\: %6.2lf %%"

Explanation:

  • DEF:cpu=cpu.rrd:cpu:AVERAGE reads the AVERAGE consolidation for the cpu DS.
  • LINE2 draws a line with thickness 2 and color.
  • GPRINT prints a computed statistic (here the average) on the graph. In format strings, write a literal colon as \: and a literal percent sign as %%.

Use CDEF to compute derived values. Example: convert bytes to bits, assuming a DEF named bytes:

CDEF:bits=bytes,8,*

(For arithmetic, CDEF uses Reverse Polish Notation.)
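
If Reverse Polish Notation is unfamiliar, a toy evaluator shows how the comma-separated tokens are processed: operands are pushed onto a stack and each operator consumes the top two values. The Python sketch below is an illustration only; eval_rpn is a hypothetical helper that supports just the four basic arithmetic operators, not RRDtool's full RPN operator set.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def eval_rpn(expr, variables):
    """Evaluate a comma-separated RPN expression for a single sample."""
    stack = []
    for token in expr.split(","):
        if token in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[token](left, right))
        elif token in variables:
            stack.append(variables[token])
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError(f"malformed expression: {expr!r}")
    return stack[0]

# bytes -> bits: push the value, push 8, multiply.
print(eval_rpn("bytes,8,*", {"bytes": 1000}))
# Sum two sources first, then scale: (in + out) * 8.
print(eval_rpn("in,out,+,8,*", {"in": 10, "out": 5}))
```

Reading an expression left to right and tracking the stack by hand is usually the quickest way to debug a CDEF that produces unexpected values.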

Multiple data sources and stacked areas:

DEF:in=net.rrd:in:AVERAGE
DEF:out=net.rrd:out:AVERAGE
AREA:in#00FF00:"In traffic"
AREA:out#0000FF:"Out traffic":STACK

Annotations, thresholds, and custom ticks are supported. Example: draw a red horizontal line at 80%:

HRULE:80#FF0000:"80% threshold" 
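
When graphs are generated from scripts, it can help to assemble the directives programmatically. The following Python sketch builds an argument list for rrdtool graph from the DEF, LINE2, and HRULE directives shown above. The function name graph_args and its parameters are hypothetical, and the sketch only constructs the command; running it requires rrdtool and an existing RRD file.

```python
def graph_args(png, rrd, ds, start="-86400", threshold=None, color="#00FF00"):
    """Assemble an argv list for `rrdtool graph` drawing one data source."""
    args = [
        "rrdtool", "graph", png,
        "--start", start, "--end", "now",
        f"DEF:{ds}={rrd}:{ds}:AVERAGE",   # read the AVERAGE archive for this DS
        f"LINE2:{ds}{color}:{ds}",        # 2-pixel line, legend = DS name
    ]
    if threshold is not None:
        # Optional red horizontal rule marking a threshold value.
        args.append(f"HRULE:{threshold}#FF0000:{threshold}% threshold")
    return args

for arg in graph_args("cpu-day.png", "cpu.rrd", "cpu", threshold=80):
    print(arg)
```

Because each directive is a separate list element, the list can be handed to subprocess.run without any shell quoting of the legend text.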

Typical use cases and integrations

  • Network bandwidth monitoring (MRTG and Cacti have historically used RRDtool).
  • System metrics (CPU, memory, disk I/O) collected by collectd or Munin.
  • Application-specific metrics where bounded storage and predictable performance are desired.
  • Combining with cron, SNMP polls, or agent daemons to feed data.

Integrations:

  • collectd has a native RRDtool plugin.
  • RRD files can be read by many graphing layers, or exported for use elsewhere.
  • Web front-ends like Cacti or LibreNMS simplify graph templates and dashboards.

Best practices

  • Plan RRAs to match retention needs: high resolution for recent history, consolidated for long-term trends.
  • Choose a heartbeat larger than your expected collection interval (commonly 2x the step) so that a single late update does not produce unknown data.
  • Use DS types appropriately: GAUGE for instantaneous values, COUNTER for monotonically increasing counters.
  • Set sensible min/max to catch anomalies; use U (unknown) for unbounded where appropriate.
  • Use filesystem snapshots or backups if you need to archive historical detail before RRD overwrites it (RRD is fixed-size).
  • Keep time sources synchronized (NTP) to avoid spurious spikes or UNKNOWN intervals.
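
Planning RRAs from retention needs is mostly arithmetic, so it can be scripted. The Python sketch below derives an RRA definition from a desired resolution and retention period; plan_rra is a hypothetical helper, and the default xff of 0.5 simply mirrors the value used in this tutorial's examples.

```python
import math

def plan_rra(step, resolution, retention, cf="AVERAGE", xff=0.5):
    """Build an RRA definition for a desired resolution and retention (seconds)."""
    if resolution % step:
        raise ValueError("resolution must be a multiple of the step")
    steps = resolution // step                 # primary points per row
    rows = math.ceil(retention / resolution)   # rows needed to cover the period
    return f"RRA:{cf}:{xff}:{steps}:{rows}"

DAY = 86400
print(plan_rra(60, 60, DAY))          # 1-minute rows for 1 day
print(plan_rra(60, 300, 7 * DAY))     # 5-minute rows for 7 days
print(plan_rra(60, 3600, 365 * DAY))  # hourly rows for 1 year
```

With a 60-second step, these three calls reproduce the RRA definitions used in the cpu.rrd example.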

Troubleshooting & tips

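  • If graphs show UNKNOWN values, compare the time between updates with the heartbeat and make sure each timestamp is strictly newer than the previous one.
  • For counter wrap (32-bit counters): COUNTER treats a decrease as a wrap, so a counter reset can show up as a large spike; DERIVE with a minimum of 0 records a decrease as unknown instead. Use 64-bit counters if available.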
  • Use rrdtool dump to export an RRD to XML for inspection or migration:
    
    rrdtool dump cpu.rrd > cpu.xml 
  • To restore or migrate, rebuild the RRD from that XML with rrdtool restore cpu.xml cpu.rrd.
  • Test graph commands interactively; small syntax errors in DEF/CDEF are common sources of broken graphs.
  • If performance is an issue with many RRD files, batch graph generation or aggregate metrics upstream.
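
A quick way to investigate UNKNOWN values is to scan the update timestamps you have been sending. This Python sketch flags updates that are not newer than their predecessor and gaps that exceed the heartbeat. The function name find_problems is hypothetical, and it assumes you have the timestamps available as a list (for example, from your collector's log).

```python
def find_problems(timestamps, heartbeat):
    """Return (index, reason) pairs for updates that would be rejected
    or would leave unknown data behind."""
    problems = []
    for i in range(1, len(timestamps)):
        gap = timestamps[i] - timestamps[i - 1]
        if gap <= 0:
            problems.append((i, "not newer than previous update"))
        elif gap > heartbeat:
            problems.append((i, f"gap of {gap}s exceeds heartbeat"))
    return problems

stamps = [1693499940, 1693500000, 1693500000, 1693500300]
for index, reason in find_problems(stamps, heartbeat=120):
    print(f"update #{index} ({stamps[index]}): {reason}")
```

In this sample, the third update repeats a timestamp and the fourth arrives 300 seconds later, well past a 120-second heartbeat.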

Example end-to-end workflow

  1. Create RRD:
    
    rrdtool create server.rrd --step 60 \
      DS:cpu:GAUGE:120:0:100 \
      DS:mem:GAUGE:120:0:U \
      RRA:AVERAGE:0.5:1:1440 \
      RRA:AVERAGE:0.5:5:2016 \
      RRA:MAX:0.5:60:8760
  2. Feed data (cron or agent):
    
    rrdtool update server.rrd N:12.3:45.6 
  3. Generate daily graph:
    
    rrdtool graph server-day.png --start -86400 \
      DEF:cpu=server.rrd:cpu:AVERAGE \
      DEF:mem=server.rrd:mem:AVERAGE \
      AREA:mem#0000FF:"Memory" \
      LINE2:cpu#FF0000:"CPU"

RRDtool remains a reliable choice when you need predictable storage, efficient archival of metrics, and scriptable graphing. Its learning curve centers on understanding DS/RRA design, the step/heartbeat model, and RPN-based CDEF expressions. Once you grasp those, RRDtool is a powerful component for monitoring pipelines.
