Guardrails
Safety controls that ensure your AI behaves responsibly and stays compliant.
Guardrails are safety controls that ensure your AI behaves responsibly. They run automatically to protect against misuse and ensure compliance with your organization's policies.
Why Guardrails Matter
- Prevent harmful or inappropriate outputs
- Protect against prompt injection attacks
- Ensure compliance with regulations
- Maintain brand safety and consistency
Guardrail Types
Filters
Block inappropriate or harmful content in both inputs and outputs.
| Filter Type | What It Does |
|---|---|
| Toxicity | Blocks harmful, offensive, or abusive content |
| PII Detection | Identifies and redacts personal information |
| Profanity | Filters inappropriate language |
| Custom Rules | Your own content policies |
Prompt Attack Protection
Detect and prevent prompt injection attempts:
- Jailbreak detection
- Instruction override attempts
- Malicious prompt patterns
- System prompt extraction attempts
Denied Topics
Restrict discussion of specific topics:
- Competitors
- Legal advice
- Medical diagnoses
- Custom restricted topics
Grounding
Verify AI responses against trusted sources:
- Fact-checking against your knowledge base
- Citation requirements
- Source verification
- Hallucination detection
Safety Layers
Guardrails operate at multiple stages:
| Layer | When It Runs |
|---|---|
| Pre-Processing | Before input reaches the AI model |
| Runtime | During model execution |
| Post-Processing | Before output is returned |
| Audit | Logging all safety checks |
Configuring Guardrails
Set guardrails at different levels:
- Workflow level — Apply to specific workflows
- Workspace level — Default for all workflows in a workspace
- Organization level — Policies across all workspaces