When a resource fails, pragma captures the failure details and gives you the tools to diagnose, fix, and retry. This guide walks through identifying failures, understanding their causes, and getting your infrastructure back to a healthy state.

Identifying Failed Resources

Resources in a failed state appear with [FAILED] when listed:
pragma resources list
The output shows the lifecycle state of each resource:
gcp/storage/data-lake [READY]
agno/agent/assistant [FAILED]
agno/team/my-team [PENDING]
To see details about a specific failed resource:
pragma resources describe agno/agent assistant
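If you have many resources, you may want to script this lookup. The following is a minimal sketch, not part of the pragma CLI: the helper names failed_resources and describe_failed are hypothetical. It assumes the list output has the "<identifier> [STATE]" shape shown above, and that a listed identifier such as agno/agent/assistant maps to the describe arguments agno/agent assistant, as the two examples in this section suggest.

```shell
# Print the identifier of every resource whose listed state is [FAILED].
# Reads `pragma resources list` output on stdin; assumes one
# "<identifier> [STATE]" pair per line, as in the sample output.
failed_resources() {
  awk '$NF == "[FAILED]" { print $1 }'
}

# Describe each failed resource in turn (hypothetical helper).
describe_failed() {
  pragma resources list | failed_resources | while IFS= read -r id; do
    # Split "provider/type/name" into "provider/type" and "name".
    # stdin is redirected so the CLI cannot consume the loop's input.
    pragma resources describe "${id%/*}" "${id##*/}" < /dev/null
  done
}

# Usage: describe_failed
```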

Automatic Recovery

Stuck Resource Detection

pragma-os automatically detects resources stuck in the processing state. If a provider fails to respond within the expected timeframe, the platform:
  1. Detects the unresponsive operation
  2. Retries the operation automatically
  3. If retries are exhausted, moves the resource to failed state with an error message
You don’t need to manually intervene for transient failures — the platform handles retries for you.
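To make the detect, retry, then fail sequence concrete, here is a generic bounded-retry sketch in shell. It is purely illustrative and is not how pragma-os implements retries internally: the function name retry_with_limit and the attempt limit are assumptions, and a real system would normally add a delay or backoff between attempts.

```shell
# Run a command up to max_attempts times. Returns 0 on the first success,
# or 1 once retries are exhausted (the point at which the platform would
# move a resource to the failed state). Illustrative only.
retry_with_limit() {
  max_attempts=$1
  shift
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt of $max_attempts failed" >&2
    attempt=$((attempt + 1))
  done
  echo "retries exhausted" >&2
  return 1
}

# Usage: retry_with_limit 3 some_command arg1 arg2
```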

Idempotent Apply

Re-applying a resource with the same configuration is safe. If the resource is already in the ready state with an identical configuration, pragma-os returns the existing resource without re-processing it. You can therefore re-run pragma resources apply in scripts without side effects.
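Because apply is idempotent, a deployment script can simply re-apply everything on each run. The sketch below assumes a hypothetical layout with one resource definition per .yaml file in a directory; the helper name apply_all is also an assumption, not a pragma feature.

```shell
# Apply every YAML file in a directory, stopping at the first failure.
# Re-running is safe: resources that are already ready with an identical
# configuration are returned as-is rather than re-processed.
apply_all() {
  dir=$1
  for file in "$dir"/*.yaml; do
    [ -e "$file" ] || continue   # no matches: the glob stays literal
    pragma resources apply "$file" || return 1
  done
}

# Usage: apply_all ./resources
```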

Common Failure Scenarios

Configuration Errors

The most common cause of failure is invalid configuration:
  • Missing required fields — A required config value is missing
  • Invalid values — A value doesn’t match the expected format or constraints
  • Permission errors — The provider doesn’t have access to create or modify the resource
Recovery: Fix the configuration in your YAML file and re-apply:
pragma resources apply fixed-resource.yaml

Dependency Failures

A resource can fail or wait if its dependencies aren’t satisfied:
  • Missing dependency — A referenced resource doesn’t exist yet (resource waits in pending)
  • Dependency failed — A dependency exists but is in failed state
  • Invalid field reference — A FieldReference points to a field that doesn’t exist in the dependency’s outputs
Recovery: Ensure all dependencies are in ready state:
# Check dependency status
pragma resources describe agno/anthropic-model claude

# If dependency is failed, fix and re-apply
pragma resources apply model.yaml
Once the dependency reaches ready, pending dependents are automatically triggered.
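In a script, you may want to block until a dependency is ready before continuing. The following polling sketch relies only on pragma resources list. The helper names, the default attempt count and delay, and the identifier form agno/anthropic-model/claude are assumptions based on the sample output earlier in this guide.

```shell
# Print the state of one resource, e.g. "READY".
# Assumes `pragma resources list` prints "<identifier> [STATE]" lines.
resource_state() {
  pragma resources list |
    awk -v id="$1" '$1 == id { print substr($NF, 2, length($NF) - 2) }'
}

# Poll until the resource is READY (returns 0), FAILED (returns 1),
# or the attempts run out (returns 2).
wait_for_ready() {
  id=$1
  tries=${2:-30}    # number of polls (assumed default)
  delay=${3:-10}    # seconds between polls (assumed default)
  while [ "$tries" -gt 0 ]; do
    state="$(resource_state "$id")"
    case "$state" in
      READY)  return 0 ;;
      FAILED) echo "$id is FAILED; fix and re-apply" >&2; return 1 ;;
    esac
    tries=$((tries - 1))
    sleep "$delay"
  done
  echo "timed out waiting for $id (last state: ${state:-unknown})" >&2
  return 2
}

# Usage: wait_for_ready agno/anthropic-model/claude
```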

Provider Errors

Sometimes the underlying provider rejects the operation:
  • Quota exceeded — You’ve hit a service limit
  • Resource conflicts — A resource with that name already exists outside pragma
  • Service unavailable — Temporary provider outage
Recovery: Address the provider-specific issue, then retry the resource by re-applying it or by retrying its event in the dead letter queue.

Using the Dead Letter Queue

When a resource operation fails after retries, it moves to the dead letter queue. This prevents failed operations from blocking other work and preserves the failure details for investigation.

List Failed Events

See all failed events:
pragma ops dead-letter list
The output is a table of event details:
Event ID    Provider   Resource Type   Resource Name   Error Message              Failed At
evt_abc123  agno       agent           assistant       Permission denied: ...     2026-01-15 10:30:00
evt_def456  gcp        storage         backup          Quota exceeded: ...        2026-01-15 10:32:00
Filter by provider to focus on specific failures:
pragma ops dead-letter list --provider agno
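For scripting or monitoring, the event IDs can be pulled out of this table. The sketch below assumes the plain-text layout shown above, where the ID is the first whitespace-separated column and starts with evt_. The helper names are hypothetical, and the real output format may differ.

```shell
# Print one event ID per line from `pragma ops dead-letter list` output
# on stdin. The header row is skipped because it does not start with "evt_".
dead_letter_ids() {
  awk '$1 ~ /^evt_/ { print $1 }'
}

# Print the number of events in the queue.
dead_letter_count() {
  dead_letter_ids | wc -l | tr -d ' '
}

# Usage: pragma ops dead-letter list | dead_letter_count
```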

Inspect Event Details

Get the full error message and context:
pragma ops dead-letter show evt_abc123
This returns the complete event data including:
  • The resource that failed
  • The full error message
  • When the failure occurred
  • The operation that was attempted

Retry Failed Events

After fixing the underlying issue, retry the failed operation:
pragma ops dead-letter retry evt_abc123
Or retry all failed events at once:
pragma ops dead-letter retry --all
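This guide shows --provider on list and delete but does not say whether retry accepts it. If you want to retry only one provider's events, one option is to combine the documented list and retry commands, as in this sketch. The helper name retry_provider_events is hypothetical, and the parsing assumes event IDs appear in the first column and start with evt_.

```shell
# Retry every dead letter event for a single provider, one event at a time.
retry_provider_events() {
  provider=$1
  pragma ops dead-letter list --provider "$provider" |
    awk '$1 ~ /^evt_/ { print $1 }' |
    while IFS= read -r event_id; do
      # stdin is redirected so the CLI cannot consume the loop's input.
      pragma ops dead-letter retry "$event_id" < /dev/null ||
        echo "retry failed for $event_id" >&2
    done
}

# Usage: retry_provider_events agno
```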

Clear Resolved Events

Once you’ve addressed failures (or decided to abandon them), remove events from the queue:
# Delete a single event
pragma ops dead-letter delete evt_abc123

# Delete all events for a provider
pragma ops dead-letter delete --provider gcp

# Delete all events
pragma ops dead-letter delete --all

Dependency Failure Cascades

When a resource fails, it affects downstream resources:
  1. Failed resources stay failed: they are not retried automatically. Resources stuck in processing are a separate case and are detected and retried by the platform.
  2. Pending dependents wait: resources waiting for a failed dependency remain in pending until the dependency becomes ready.
  3. Dependents are notified on recovery: when you fix a dependency and it reaches ready, all its dependents are automatically re-processed.
Consider this dependency chain:
secret (READY) → model (FAILED) → agent (PENDING)
The agent resource can’t proceed because model is failed. To recover:
  1. Fix the model configuration
  2. Re-apply the model resource to retry it
  3. Once model reaches ready, agent automatically proceeds
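The general rule is to fix the most upstream resource that is not ready, since everything after it is blocked. As a small illustration, the sketch below reads a chain as "name STATE" lines, ordered from upstream to downstream, and prints the first blocker. The input format and the helper name first_blocker are hypothetical; pragma does not emit chains in this form.

```shell
# Print the first resource in the chain that is not READY.
# Input: one "name STATE" pair per line, upstream first.
first_blocker() {
  awk '$2 != "READY" { print $1, $2; exit }'
}

# Usage: printf '%s\n' 'secret READY' 'model FAILED' 'agent PENDING' | first_blocker
```

For the chain above, this points at model: agent is only pending because of it.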

Recovery Workflow

When you encounter failures, follow this workflow:
1. Identify failures
pragma resources list
pragma ops dead-letter list

2. Investigate root cause
pragma resources describe <provider>/<resource> <name>
pragma ops dead-letter show <event-id>

3. Fix the issue
Update your YAML configuration, fix permissions, or address provider limits.

4. Retry
pragma resources apply fixed-resource.yaml
pragma ops dead-letter retry <event-id>

5. Verify
pragma resources describe <provider>/<resource> <name>
Confirm the resource reaches ready state.

Preventing Failures

Reduce failures by:
  • Using draft mode — Apply with --draft first to validate configuration, then re-apply without --draft to deploy
  • Checking dependencies — Use pragma resources describe to verify dependencies are ready
  • Applying in bulk — Apply all resources in a single YAML file; pragma-os resolves the dependency order automatically
  • Monitoring the dead letter queue regularly for early warning of issues
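The draft-then-deploy habit can be wrapped in a small helper. This is a sketch under two assumptions that this guide does not confirm: that --draft may be placed directly after apply, and that a failed draft validation exits with a non-zero status. The helper name validate_then_apply is hypothetical.

```shell
# Validate a file in draft mode, then deploy it only if validation passed.
validate_then_apply() {
  file=$1
  # Assumption: --draft exits non-zero when the configuration is invalid.
  if ! pragma resources apply --draft "$file"; then
    echo "draft validation failed for $file; not deploying" >&2
    return 1
  fi
  pragma resources apply "$file"
}

# Usage: validate_then_apply model.yaml
```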

Next Steps

Common Issues

Solutions to frequent problems.

Resource Lifecycle

Understanding resource states.