Turn your operational procedures into executable, reproducible Markdown workflows.
From incident response to infrastructure automation, Spry keeps your runbooks reliable and your systems healthy.
Traditional runbooks sit in wikis, slowly becoming outdated with every system change.
Spry transforms operational procedures into living documentation that executes, validates, and proves itself every time it runs.
Each step documents why it exists, what it checks, and how to interpret results.
Dependencies ensure proper execution order. Everything is Git-versioned and auditable.
Runbooks that execute regularly expose drift as soon as a step fails, so they stay current. Every procedure validates itself against real systems.
Every team member runs procedures the same way. No more "works on my machine" or forgotten manual steps.
Every execution is logged and traceable. Compliance, post-mortems, and process improvements become straightforward.
Spry works across the entire DevOps lifecycle. With built-in task orchestration, you control execution flow with dependencies, parallel processing, and conditional logic.
Monitor & alert
Gather evidence
Apply fixes
Confirm resolution
Auto-captured
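Use the `--dep` flag on a code fence to declare dependencies between tasks. In the example below, `--dep=none` marks a task with no prerequisites, and `--dep=1,2,3` lists the numbered tasks that must finish first.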
Spry handles parallel execution for independent operations and ensures proper ordering for dependent ones.
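For intuition about what this scheduling amounts to, here is a rough hand-written equivalent in plain shell. It is not Spry syntax and not a description of how Spry is implemented: independent checks start as background jobs, and the dependent step waits for all of them. The `check_*` functions and `run_checks` are hypothetical stand-ins; real probes such as `curl` or `pg_isready` would go in their place.

```bash
# Hypothetical stand-ins for real probes (curl, pg_isready, redis-cli).
check_api()   { return 0; }
check_db()    { return 0; }
check_cache() { return 0; }

run_checks() {
  local failed=0 pid pid_api pid_db pid_cache

  # The checks do not depend on each other, so start them all at once.
  check_api &   pid_api=$!
  check_db &    pid_db=$!
  check_cache & pid_cache=$!

  # The report depends on every check: wait for each one and count failures.
  for pid in "$pid_api" "$pid_db" "$pid_cache"; do
    wait "$pid" || failed=$((failed + 1))
  done

  if [ "$failed" -eq 0 ]; then
    echo "All systems operational"
  else
    echo "$failed check(s) failed"
    return 1
  fi
}

run_checks   # prints "All systems operational"
```

Declaring the same relationships with `--dep` keeps this process bookkeeping out of the runbook itself.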
# Parallel health checks (no dependencies)
## Check API
```bash --dep=none
curl -f https://api.example.com/health
```
## Check database
```bash --dep=none
pg_isready -h db.example.com
```
## Check cache
```bash --dep=none
redis-cli ping
```
## Generate status report (depends on all checks)
```bash --dep=1,2,3
echo "All systems operational"
```
Automated diagnosis and remediation steps for common incidents. Reduce MTTR with proven procedures.
Regular system health validation with automatic evidence capture and alerting.
Standardized release processes with pre-flight checks, rollout steps, and rollback procedures.
Scheduled backups, vacuum operations, index rebuilds, and migration scripts with validation.
Document and execute infrastructure setup with Terraform, cloud CLIs, and configuration management.
Automated security audits, vulnerability scans, and compliance validation with evidence trails.
Tested recovery runbooks that work when you need them most. No surprises during incidents.
First-responder guides that execute diagnostics and suggest remediation steps automatically.
Write once, run anywhere. No proprietary formats.
Every change tracked. Roll back with confidence.
Open source. Your runbooks, your control.
Bash, SQL, Python, JSON. Use the right tool.
Here's how a real incident response runbook looks in Spry. Notice how documentation, diagnostics, and remediation live together:
# Database Performance Degradation Runbook
**Alert**: Query response times > 5s for 5+ minutes
**Owner**: Database SRE Team
**Last Updated**: 2025-01-15
## Step 1: Check current load
```sql --dep=none
SELECT
COUNT(*) as active_queries,
MAX(now() - query_start) as longest_running
FROM pg_stat_activity
WHERE state = 'active';
```
## Step 2: Identify slow queries
```sql --dep=1
SELECT pid, usename, query, now() - query_start as duration
FROM pg_stat_activity
WHERE state = 'active'
AND now() - query_start > interval '5 seconds'
ORDER BY duration DESC
LIMIT 10;
```
## Step 3: Check for locks
```sql --dep=1
-- Pair each blocked session with the sessions blocking it.
-- pg_blocking_pids() is available in PostgreSQL 9.6 and later.
SELECT blocked.pid AS blocked_pid,
blocking.pid AS blocking_pid,
blocked.query AS blocked_query
FROM pg_stat_activity AS blocked
JOIN pg_stat_activity AS blocking
ON blocking.pid = ANY(pg_blocking_pids(blocked.pid));
```
## Step 4: List the largest tables (bloat candidates)
```sql --dep=1
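-- Size alone does not prove bloat; treat large tables as candidates to inspect.
-- format('%I.%I', ...) quotes identifiers so mixed-case names resolve correctly.
SELECT schemaname, tablename,
pg_size_pretty(pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass)) AS size
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass) DESC
LIMIT 10;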
```
## Step 5: Generate remediation report
```bash --dep=2,3,4
echo "=== Incident Summary ==="
echo "Time: $(date)"
echo "Investigation complete. Review above results."
echo ""
echo "Common remediations:"
echo "- VACUUM ANALYZE for bloated tables"
echo "- Terminate blocking queries if safe"
echo "- Add missing indexes if query patterns show scans"
```
## Post-Incident Actions
- [ ] Document root cause in incident log
- [ ] Update monitoring thresholds if needed
- [ ] Schedule index maintenance if identified
- [ ] Review query patterns for optimization
Pro Tip: Run this runbook periodically in staging to ensure it stays valid. When an incident happens, you'll know exactly what to do.
Executable runbooks mean faster incident resolution. No fumbling with outdated procedures during critical moments.
Operational procedures become team assets, not individual expertise. Onboard new team members with confidence.
Every runbook execution creates an audit trail. Demonstrate process adherence for security and regulatory requirements.
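As a rough picture of what an audit-trail entry needs to record (when it ran, who ran it, what ran, how it exited, and what it printed), here is a minimal plain-shell sketch. `run_with_evidence`, `EVIDENCE_LOG`, and the record layout are illustrative assumptions and do not represent Spry's actual logging format.

```bash
# Hypothetical log location; override EVIDENCE_LOG to write elsewhere.
EVIDENCE_LOG="${EVIDENCE_LOG:-${TMPDIR:-/tmp}/runbook-evidence.log}"

# Run a command, append a record of it to the log, and pass through its exit code.
run_with_evidence() {
  local started output status
  started="$(date -u +%Y-%m-%dT%H:%M:%SZ)"

  # Capture stdout and stderr together so the record shows what the operator saw.
  output="$("$@" 2>&1)"
  status=$?

  {
    echo "--- $started user=$(id -un) exit=$status"
    echo "command: $*"
    echo "$output"
  } >> "$EVIDENCE_LOG"

  return "$status"
}

run_with_evidence echo "disk check ok"
```

Because the function returns the wrapped command's exit code, it can sit in front of any step without changing how failures propagate.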