Amazon Bedrock Guardrails

Implement safeguards customized to your application requirements and responsible AI policies

Build responsible AI applications with Guardrails

Amazon Bedrock Guardrails provides configurable safeguards to help safely build generative AI applications at scale. With a consistent and standard approach used across all supported foundation models (FMs), Guardrails delivers industry-leading safety protections:

  • Uses Automated Reasoning to help prevent factual errors from hallucinations, the first and only generative AI safeguard to do so
  • Blocks up to 85% more undesirable and harmful content
  • Filters over 75% of hallucinated responses from models for Retrieval Augmented Generation (RAG) and summarization use cases

Bring a consistent level of safety across gen AI applications

Guardrails is the only responsible AI capability offered by a major cloud provider that helps you build and customize safety, privacy, and truthfulness safeguards for your generative AI applications within a single solution. Guardrails helps evaluate user inputs and model responses based on use case–specific policies and provides an additional layer of safeguards on top of those natively provided by FMs. Guardrails works with a wide range of models, including FMs supported in Amazon Bedrock, fine-tuned models, and self-hosted models outside of Amazon Bedrock. For third-party and self-hosted models, user inputs and model outputs can be evaluated independently using the ApplyGuardrail API. Guardrails can also be integrated with Amazon Bedrock Agents and Amazon Bedrock Knowledge Bases to build safer and more secure generative AI applications aligned with responsible AI policies.
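
Below is a minimal sketch of calling the ApplyGuardrail API with the boto3 SDK to evaluate a response from a self-hosted or third-party model independently of model invocation; the region, guardrail identifier, version, and text are placeholders.

```python
import boto3

# Runtime client for Amazon Bedrock (region is an example placeholder)
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Evaluate a response from a self-hosted or third-party model
# against an existing guardrail, without invoking a Bedrock model.
response = bedrock_runtime.apply_guardrail(
    guardrailIdentifier="your-guardrail-id",  # placeholder
    guardrailVersion="1",                     # placeholder
    source="OUTPUT",                          # "INPUT" for user prompts, "OUTPUT" for model responses
    content=[{"text": {"text": "Model response text to evaluate"}}],
)

# "GUARDRAIL_INTERVENED" indicates the content was blocked or modified.
print(response["action"])
print(response.get("outputs", []))
```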

Detect hallucinations in model responses using contextual grounding checks

Customers need to deploy truthful and trustworthy generative AI applications to maintain and grow users’ trust. However, FMs can generate incorrect information due to hallucinations, i.e., deviating from the source information, conflating multiple pieces of information, or inventing new information. Amazon Bedrock Guardrails supports contextual grounding checks to help detect and filter hallucinations when responses are not grounded in the source information (for example, they are factually inaccurate or introduce new information) or are irrelevant to a user’s query or instruction. Contextual grounding checks can help detect hallucinations in RAG, summarization, and conversational applications, where the source information can be used as a reference to validate the model response.
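
Grounding and relevance thresholds are configured on the guardrail itself (its contextual grounding policy); at request time, a RAG application can then pass the retrieved source text, the user query, and the model response together for evaluation. The boto3 sketch below illustrates this pattern; the identifiers and text are placeholders.

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Check whether a RAG answer is grounded in the retrieved source
# and relevant to the user's query (guardrail ID/version are placeholders).
response = bedrock_runtime.apply_guardrail(
    guardrailIdentifier="your-guardrail-id",
    guardrailVersion="1",
    source="OUTPUT",
    content=[
        # Reference text the response must be grounded in
        {"text": {"text": "Retrieved passage from your knowledge base...",
                  "qualifiers": ["grounding_source"]}},
        # The user's original question
        {"text": {"text": "What is the claim filing deadline?",
                  "qualifiers": ["query"]}},
        # The model response to validate
        {"text": {"text": "Claims must be filed within 30 days.",
                  "qualifiers": ["guard_content"]}},
    ],
)

# Grounding and relevance assessments appear when thresholds are configured.
print(response["action"])
```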

Automated Reasoning checks help prevent factual errors from hallucinations and offer verifiable accuracy

Automated Reasoning checks (preview) in Amazon Bedrock Guardrails are the first and only generative AI safeguard that helps prevent factual errors from hallucinations, using logically accurate and verifiable reasoning that explains why responses are correct. The checks apply sound mathematical techniques to verify, correct, and logically explain generated information, ensuring that outputs align with known facts and are not based on fabricated or inconsistent data. Developers create an Automated Reasoning policy by uploading an existing document that defines the correct solution space, such as an HR guideline or an operational manual. Amazon Bedrock then generates a unique Automated Reasoning policy and guides users through testing and refining it. To validate generated content against an Automated Reasoning policy, users enable the policy in Guardrails and configure it with the policy’s Amazon Resource Names (ARNs). These checks deliver provably truthful responses from generative AI models, enabling software vendors to enhance the reliability of their applications for use cases in HR, finance, legal, compliance, and more.
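
As a rough illustration only, the snippet below sketches how a guardrail might be configured with the ARN of an Automated Reasoning policy using boto3. Because this capability is in preview, the parameter name and shape shown (automatedReasoningPolicyConfig) are assumptions and may not match the actual API; the policy ARN is a placeholder for a policy you have already created and tested.

```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# HYPOTHETICAL sketch: attach an existing Automated Reasoning policy to a guardrail.
# The parameter name and shape below are assumptions for the preview and may differ.
guardrail = bedrock.create_guardrail(
    name="hr-policy-guardrail",
    automatedReasoningPolicyConfig={
        "policies": [
            "arn:aws:bedrock:us-east-1:111122223333:automated-reasoning-policy/EXAMPLE"
        ],
    },
    blockedInputMessaging="Sorry, I can't help with that.",
    blockedOutputsMessaging="Sorry, I can't help with that.",
)
print(guardrail["guardrailId"])
```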

Block undesirable topics in gen AI applications

Organizational leaders recognize the need to manage interactions within generative AI applications for a relevant and safe user experience. They want to further customize interactions so they remain focused on topics relevant to their business and aligned with company policies. Using a short natural language description, Guardrails helps you define a set of topics to avoid within the context of your application. Guardrails helps detect and block user inputs and FM responses that fall into the restricted topics. For example, a banking assistant can be designed to avoid topics related to investment advice.
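
A minimal boto3 sketch of a guardrail with a denied topic for the banking assistant example; the topic definition, example utterance, and blocked messaging are illustrative placeholders.

```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Define a denied topic with a short natural language description.
guardrail = bedrock.create_guardrail(
    name="banking-assistant-guardrail",
    topicPolicyConfig={
        "topicsConfig": [
            {
                "name": "Investment advice",
                "definition": "Recommendations or guidance about investing money, "
                              "such as which stocks, bonds, or funds to buy or sell.",
                "examples": ["Which stocks should I invest in right now?"],
                "type": "DENY",
            }
        ]
    },
    blockedInputMessaging="Sorry, I can't discuss investment advice.",
    blockedOutputsMessaging="Sorry, I can't discuss investment advice.",
)
print(guardrail["guardrailId"], guardrail["version"])
```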

Filter harmful multimodal content based on your responsible AI policies

Guardrails provides content filters with configurable thresholds for toxic text and image content. The safeguard helps filter harmful content containing topics such as hate speech, insults, sexual content, violence, and misconduct (including criminal activity), and helps protect against prompt attacks (prompt injection and jailbreaks). The capability to detect and filter out undesirable and potentially harmful image content is currently available in preview for the hate, insults, sexual, and violence categories, and is supported for all FMs in Amazon Bedrock that support images, including fine-tuned FMs. Content filters automatically evaluate both user inputs and model responses to detect and help prevent undesirable and potentially harmful content. For example, an ecommerce site can design its online assistant to avoid using inappropriate language, such as hate speech or insults.
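
A sketch of a text content filter configuration using boto3; the per-category strengths and messaging are illustrative, and the preview image-filtering options are omitted here since their configuration may differ.

```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Configure content filters with per-category strengths (illustrative values).
guardrail = bedrock.create_guardrail(
    name="ecommerce-assistant-guardrail",
    contentPolicyConfig={
        "filtersConfig": [
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "INSULTS", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "SEXUAL", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "VIOLENCE", "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
            {"type": "MISCONDUCT", "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
            # Prompt attacks (injection/jailbreak) are filtered on user input only.
            {"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"},
        ]
    },
    blockedInputMessaging="Sorry, I can't respond to that.",
    blockedOutputsMessaging="Sorry, I can't respond to that.",
)
print(guardrail["guardrailId"])
```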

Redact sensitive information such as PII to protect privacy

Guardrails helps you detect sensitive content such as personally identifiable information (PII) in user inputs and FM responses. You can select from a list of predefined PII types or define a custom sensitive information type using regular expressions (regex). Based on the use case, you can selectively reject inputs containing sensitive information or redact it in FM responses. For example, you can redact users’ personal information while generating summaries from customer and agent conversation transcripts in a call center.
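
A sketch of a sensitive information policy using boto3, following the call center summarization example; the entity types, regex pattern, and messaging are illustrative placeholders.

```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Redact predefined PII types in model responses and block a custom pattern.
guardrail = bedrock.create_guardrail(
    name="call-center-summary-guardrail",
    sensitiveInformationPolicyConfig={
        "piiEntitiesConfig": [
            {"type": "NAME", "action": "ANONYMIZE"},
            {"type": "EMAIL", "action": "ANONYMIZE"},
            {"type": "PHONE", "action": "ANONYMIZE"},
        ],
        "regexesConfig": [
            {
                "name": "internal-account-id",
                "description": "Example internal account identifier",
                "pattern": "ACCT-[0-9]{8}",
                "action": "BLOCK",
            }
        ],
    },
    blockedInputMessaging="Sorry, I can't process that request.",
    blockedOutputsMessaging="Sorry, I can't process that request.",
)
print(guardrail["guardrailId"])
```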
