Guardrails

Safety controls that ensure your AI behaves responsibly and stays compliant.

Guardrails run automatically on every interaction, protecting against misuse and enforcing your organization's policies without any action from the end user.

Why Guardrails Matter

  • Prevent harmful or inappropriate outputs
  • Protect against prompt injection attacks
  • Ensure compliance with regulations
  • Maintain brand safety and consistency

Guardrail Types

Filters

Block inappropriate or harmful content in both inputs and outputs.

Filter Type      What It Does
Toxicity         Blocks harmful, offensive, or abusive content
PII Detection    Identifies and redacts personal information
Profanity        Filters inappropriate language
Custom Rules     Enforces your own content policies
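As an illustration of the PII Detection filter, redaction can be sketched with regular expressions. The patterns and the `redact_pii` helper below are hypothetical, simplified stand-ins; production PII detection typically uses ML-based entity recognition rather than a fixed pattern list.

```python
import re

# Hypothetical PII patterns for illustration only.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact_pii(text: str) -> str:
    """Replace each detected PII span with a typed placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

print(redact_pii("Reach me at jane@example.com or 555-867-5309."))
# → Reach me at [EMAIL REDACTED] or [PHONE REDACTED].
```

Typed placeholders (rather than blanking the text) preserve readability while still removing the sensitive value.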

Prompt Attack Protection

Detect and prevent prompt injection attempts:

  • Jailbreak detection
  • Instruction override attempts
  • Malicious prompt patterns
  • System prompt extraction attempts

Denied Topics

Restrict discussion of specific topics:

  • Competitors
  • Legal advice
  • Medical diagnoses
  • Custom restricted topics

Grounding

Verify AI responses against trusted sources:

  • Fact-checking against your knowledge base
  • Citation requirements
  • Source verification
  • Hallucination detection
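To make the idea concrete, here is a rough grounding heuristic: flag a response as potentially hallucinated when too few of its content words appear in the retrieved source passages. This word-overlap score is an assumption for illustration; production grounding checks typically use entailment models, not lexical overlap.

```python
import re

def grounding_score(response: str, sources: list[str]) -> float:
    """Fraction of the response's content words found in the sources."""
    stopwords = {"the", "a", "an", "is", "are", "of", "to", "in", "and"}

    def tokenize(text: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", text.lower())) - stopwords

    resp_words = tokenize(response)
    src_words = tokenize(" ".join(sources))
    return len(resp_words & src_words) / len(resp_words) if resp_words else 1.0

score = grounding_score(
    "refunds are processed in 5 days",
    ["Refunds are processed in 5 business days."],
)
print(score)  # → 1.0 (fully grounded)
```

A response scoring below a configured threshold would be blocked or routed for review.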

Safety Layers

Guardrails operate at multiple stages:

Layer              When It Runs
Pre-Processing     Before input reaches the AI model
Runtime            During model execution
Post-Processing    Before output is returned
Audit              Logging all safety checks
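The layered flow above can be sketched as a wrapper around the model call. The `model` callable, the check functions, and the audit log shape are all hypothetical; this shows the control flow, not the product's internals.

```python
from typing import Callable

def is_blocked_input(text: str) -> bool:
    """Stub pre-processing check for illustration."""
    return "attack" in text

def is_blocked_output(text: str) -> bool:
    """Stub post-processing check for illustration."""
    return False

def guarded_call(model: Callable[[str], str], user_input: str,
                 audit_log: list) -> str:
    # Pre-processing: screen the input before it reaches the model
    if is_blocked_input(user_input):
        audit_log.append(("pre", "blocked"))
        return "Request blocked by input guardrails."
    audit_log.append(("pre", "passed"))

    # Runtime: the model executes (streaming checks would hook in here)
    output = model(user_input)

    # Post-processing: screen the output before it is returned
    if is_blocked_output(output):
        audit_log.append(("post", "blocked"))
        return "Response withheld by output guardrails."
    audit_log.append(("post", "passed"))
    return output

log: list = []
print(guarded_call(lambda x: x.upper(), "hello", log))  # → HELLO
```

Note that every stage appends to the audit log whether it blocks or passes, which is what makes the Audit layer possible.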

Configuring Guardrails

Set guardrails at different levels:

  • Workflow level — Apply to specific workflows
  • Workspace level — Default for all workflows in a workspace
  • Organization level — Policies across all workspaces
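A minimal sketch of how these levels might compose, assuming narrower scopes override broader defaults (workflow over workspace over organization). That precedence, the setting names, and the `resolve_guardrails` helper are assumptions for illustration; consult your organization's policy rules for actual override behavior.

```python
def resolve_guardrails(org: dict, workspace: dict, workflow: dict) -> dict:
    """Merge guardrail settings; later (narrower) scopes win."""
    return {**org, **workspace, **workflow}

settings = resolve_guardrails(
    org={"toxicity": "block", "pii": "redact"},
    workspace={"pii": "block"},       # workspace tightens the org default
    workflow={"denied_topics": ["competitors"]},
)
print(settings)
```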