Spry for DevOps & SRE

Operational Runbooks That Execute, Test, and Never Go Stale

Turn your operational procedures into executable, reproducible Markdown workflows.
From incident response to infrastructure automation, Spry keeps your runbooks reliable and your systems healthy.

What Are Executable Runbooks?

Traditional runbooks sit in wikis, slowly becoming outdated with every system change.
Spry transforms operational procedures into living documentation that executes, validates, and proves itself every time it runs.

Example: Incident Response Runbook

# Production API Health Check

## 1. Check service status

```bash --dep=none
curl -f https://api.example.com/health
```

## 2. Diagnose database connections

```sql --dep=1
SELECT COUNT(*) FROM pg_stat_activity;
```

## 3. Check error logs

```bash --dep=1
tail -n 100 /var/log/app/error.log
```

## 4. Verify remediation

```bash --dep=2,3
systemctl status app && curl -f https://api.example.com/health
```

Each step documents why it exists, what it checks, and how to interpret results.
Dependencies ensure proper execution order. Everything is Git-versioned and auditable.
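
One way to picture how the `--dep` declarations above become an execution order is a topological sort. The sketch below is not Spry's implementation; it only feeds the four steps' dependencies to the standard `tsort` utility, which is assumed to be installed (it ships with GNU coreutils and the BSDs). The function name `execution_order` is hypothetical.

```bash
#!/usr/bin/env bash
# Illustration only: derive a valid execution order for the example runbook.
# Each "A B" pair means "step A must finish before step B starts".
execution_order() {
  tsort <<'EOF'
1 2
1 3
2 4
3 4
EOF
}

execution_order
```

Step 1 always comes first and step 4 always comes last. Steps 2 and 3 may appear in either order, because neither depends on the other.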

Always Tested

Runbooks that execute regularly expose drift the moment a step fails, so they stay current. Every procedure validates itself against real systems.

Always Reproducible

Every team member runs procedures the same way. No more "works on my machine" or forgotten manual steps.

Always Auditable

Every execution is logged and traceable. Compliance, post-mortems, and process improvements become straightforward.
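
To make the idea of a traceable execution concrete, here is a minimal sketch of what a per-step audit record could look like. It does not show Spry's own logging; `run_logged` and `AUDIT_LOG` are hypothetical names chosen for illustration, and the log format is an assumption.

```bash
#!/usr/bin/env bash
# Illustration only: wrap a runbook step so every execution leaves a record.
# AUDIT_LOG and run_logged are hypothetical names, not Spry features.
AUDIT_LOG="${AUDIT_LOG:-./runbook-audit.log}"

run_logged() {
  local step="$1"; shift
  local started status=0
  started="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  # Run the command, append its output to the log, and keep its exit code.
  "$@" >>"$AUDIT_LOG" 2>&1 || status=$?
  # Record who ran which step, when, and how it ended.
  printf '%s user=%s step="%s" exit=%d cmd="%s"\n' \
    "$started" "${USER:-unknown}" "$step" "$status" "$*" >>"$AUDIT_LOG"
  return "$status"
}

run_logged "Check service status" echo "health check placeholder"
```

Because the wrapper returns the wrapped command's exit code, it can sit in front of any step without changing whether that step counts as a success or a failure.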

From Incident Response to Infrastructure as Code

Spry works across the entire DevOps lifecycle. With built-in task orchestration, you control execution flow with dependencies, parallel processing, and conditional logic.

1. Detect: Monitor & alert
2. Diagnose: Gather evidence
3. Remediate: Apply fixes
4. Verify: Confirm resolution
5. Document: Auto-captured

Task Orchestration with Dependencies

Use the --dep flag to declare dependencies between tasks.
Spry handles parallel execution for independent operations and ensures proper ordering for dependent ones.

# Parallel health checks (no dependencies)
## Check API
```bash --dep=none
curl -f https://api.example.com/health
```

## Check database
```bash --dep=none
pg_isready -h db.example.com
```

## Check cache
```bash --dep=none
redis-cli ping
```

## Generate status report (depends on all checks)
```bash --dep=1,2,3
echo "All systems operational"
```
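
For readers who want to see the underlying pattern, the following plain-shell sketch runs independent checks concurrently and gates the report on all of them succeeding. It is an approximation of the behavior described above, not Spry's scheduler. The `check_*` functions are hypothetical stand-ins for the `curl`, `pg_isready`, and `redis-cli` commands, and `run_parallel` is a name invented for this example.

```bash
#!/usr/bin/env bash
# Illustration only: run independent checks in parallel, then report.
# The check_* functions are placeholders for the real commands above.
check_api()   { true; }
check_db()    { true; }
check_cache() { true; }

run_parallel() {
  local pids=() failed=0 check pid
  for check in "$@"; do
    "$check" &                 # no dependencies, so start in the background
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || failed=1    # wait returns that job's exit status
  done
  return "$failed"
}

# The report depends on all three checks, so it runs only after they finish.
if run_parallel check_api check_db check_cache; then
  echo "All systems operational"
else
  echo "One or more checks failed" >&2
fi
```

Waiting on each process ID separately means a single failed check is enough to suppress the "operational" message.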


Built for Every Operational Scenario

Incident Response Runbooks

Automated diagnosis and remediation steps for common incidents. Reduce MTTR with proven procedures.

Health Check Automation

Regular system health validation with automatic evidence capture and alerting.

Deployment Procedures

Standardized release processes with pre-flight checks, rollout steps, and rollback procedures.

Database Maintenance

Scheduled backups, vacuum operations, index rebuilds, and migration scripts with validation.

Infrastructure Provisioning

Document and execute infrastructure setup with Terraform, cloud CLIs, and configuration management.

Security Compliance Checks

Automated security audits, vulnerability scans, and compliance validation with evidence trails.

Disaster Recovery Procedures

Tested recovery runbooks that work when you need them most. No surprises during incidents.

On-Call Playbooks

First-responder guides that execute diagnostics and suggest remediation steps automatically.
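
As one illustration of the Deployment Procedures scenario above, the sketch below shows the control flow of a release with a pre-flight gate and rollback on failure. Every function in it (`preflight`, `rollout`, `verify`, `rollback`, `deploy`) is a hypothetical placeholder that only prints a message; real steps would call your own tooling.

```bash
#!/usr/bin/env bash
# Illustration only: pre-flight checks, rollout, and rollback on failure.
# All functions are placeholders and do not touch any real system.
preflight() { echo "pre-flight checks passed"; }
rollout()   { echo "rolling out release"; }
verify()    { echo "verifying release"; }
rollback()  { echo "rolling back" >&2; }

deploy() {
  # Stop before changing anything if the pre-flight checks fail.
  preflight || { echo "pre-flight failed; nothing deployed" >&2; return 1; }
  if rollout && verify; then
    echo "deployment complete"
  else
    rollback
    return 1
  fi
}

deploy
```

Keeping the rollback in the same procedure as the rollout means it gets exercised whenever the procedure is tested, rather than for the first time during a failed release.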

Why SREs and DevOps Teams Choose Spry

Traditional Manual Runbooks

✗ Wiki pages that go stale
✗ Copy-paste errors in critical moments
✗ No validation until production fails
✗ Tribal knowledge, not team knowledge

Spry Executable Runbooks

✓ Self-validating, always current
✓ Execute with confidence, every time
✓ Test in staging, run in production
✓ Git-versioned, reviewable, auditable

Markdown-Native

Write once, run anywhere. No proprietary formats.

Git-Versioned

Every change tracked. Roll back with confidence.

No Vendor Lock-in

Open source. Your runbooks, your control.

Multi-Language

Bash, SQL, Python, JSON. Use the right tool.

Realistic Example: Database Performance Incident

Here's how a real incident response runbook looks in Spry. Notice how documentation, diagnostics, and remediation live together:

# Database Performance Degradation Runbook

**Alert**: Query response times > 5s for 5+ minutes
**Owner**: Database SRE Team
**Last Updated**: 2025-01-15

## Step 1: Check current load

```sql --dep=none
SELECT 
  COUNT(*) as active_queries,
  MAX(now() - query_start) as longest_running
FROM pg_stat_activity 
WHERE state = 'active';
```

## Step 2: Identify slow queries

```sql --dep=1
SELECT pid, usename, query, now() - query_start as duration
FROM pg_stat_activity
WHERE state = 'active' 
  AND now() - query_start > interval '5 seconds'
ORDER BY duration DESC
LIMIT 10;
```

## Step 3: Check for locks

```sql --dep=1
-- pg_blocking_pids() requires PostgreSQL 9.6 or later.
-- It returns the PIDs of the sessions that hold the locks a session waits on.
SELECT blocked.pid AS blocked_pid,
       blocking.pid AS blocking_pid,
       blocked.query AS blocked_query
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
  ON blocking.pid = ANY(pg_blocking_pids(blocked.pid));
```

## Step 4: Find the largest tables (bloat candidates)

```sql --dep=1
-- format('%I.%I', ...) quotes identifiers so mixed-case table names resolve.
SELECT schemaname, tablename,
       pg_size_pretty(pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass)) AS size
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass) DESC
LIMIT 10;
```

## Step 5: Generate remediation report

```bash --dep=2,3,4
echo "=== Incident Summary ==="
echo "Time: $(date)"
echo "Investigation complete. Review above results."
echo ""
echo "Common remediations:"
echo "- VACUUM ANALYZE for bloated tables"
echo "- Terminate blocking queries if safe"
echo "- Add missing indexes if query patterns show scans"
```

## Post-Incident Actions

- [ ] Document root cause in incident log
- [ ] Update monitoring thresholds if needed
- [ ] Schedule index maintenance if identified
- [ ] Review query patterns for optimization


Pro Tip: Run this runbook periodically in staging to ensure it stays valid. When an incident happens, you'll know exactly what to do.

Built for Operational Excellence

Reduce MTTR

Executable runbooks mean faster incident resolution. No fumbling with outdated procedures during critical moments.

Enable Knowledge Sharing

Operational procedures become team assets, not individual expertise. Onboard new team members with confidence.

Prove Compliance

Every runbook execution creates an audit trail. Demonstrate process adherence for security and regulatory requirements.

Build Reliable Operations with Spry

Transform your operational procedures into executable, reproducible workflows.
Start with one runbook today.
