Guardrails

Safety controls that ensure your AI behaves responsibly and stays compliant.

Guardrails are safety controls that ensure your AI behaves responsibly. They run automatically to protect against misuse and ensure compliance with your organization's policies.

Why Guardrails Matter

Prevent harmful or inappropriate outputs
Protect against prompt injection attacks
Ensure compliance with regulations
Maintain brand safety and consistency

Guardrail Types

Filters

Block inappropriate or harmful content in both inputs and outputs.

Filter Type	What It Does
Toxicity	Blocks harmful, offensive, or abusive content
PII Detection	Identifies and redacts personal information
Profanity	Filters inappropriate language
Custom Rules	Your own content policies

Prompt Attack Protection

Detect and prevent prompt injection attempts:

Jailbreak detection
Instruction override attempts
Malicious prompt patterns
System prompt extraction attempts

Denied Topics

Restrict discussion of specific topics:

Competitors
Legal advice
Medical diagnoses
Custom restricted topics

Grounding

Verify AI responses against trusted sources:

Fact-checking against your knowledge base
Citation requirements
Source verification
Hallucination detection

Safety Layers

Guardrails operate at multiple stages:

Layer	When It Runs
Pre-Processing	Before input reaches the AI model
Runtime	During model execution
Post-Processing	Before output is returned
Audit	Logging all safety checks

Configuring Guardrails

Set guardrails at different levels:

Workflow level — Apply to specific workflows
Workspace level — Default for all workflows in a workspace
Organization level — Policies across all workspaces

PreviousConfigurable Components

NextConfig & Secrets