Guardrails in mcp-scan
The mcp-scan proxy
command supports dynamic guardrailing for MCP servers, letting you enforce safety rules during tool use. It comes with a set of default guardrails and allows you to define custom behavior through a configuration file.
This chapter covers how to structure guardrail configuration files, write custom rules, and apply enforcement at the client, server, and tool levels.
Note
By default, the configuration file is located at ~/.mcp-scan/guardrails-config.yml
.
Get Started Directly
Just looking to get started quickly? The following example provides a quick way to get started with writing your very own config file. Just copy it into your config file and replace the client and server names.
<client-name>: # your client's shorthand (e.g., cursor, claude, windsurf)
<server-name>: # your server's name according to the mcp config (e.g., whatsapp-mcp)
guardrails:
secrets: block # block calls/results with secrets
custom_guardrails:
# define a rule using Invariant Guardrails, https://explorer.invariantlabs.ai/docs/guardrails/
- name: "Filter tool results with 'error'"
id: "error_filter_guardrail"
action: block # or 'log'
content: |
raise "An error was found." if:
(msg: ToolOutput)
"error" in msg.content
File structure
The configuration file defines guardrailing behavior hierarchically, scoped by client, server, and tool. Below is a structured overview of the YAML format:
<client-name>:
<server-name>:
guardrails:
<default-guardrail-name>: <guardrail-action>
...
custom_guardrails:
- name: <guardrail-name>
id: <guardrail-id>
action: <guardrail-action>
content: |
<guardrail-content>
...
tools:
<tool-name>:
<default-guardrail-name>: <guardrail-action>
...
enabled: <boolean>
...
...
Each configuration block defines a set of default, custom, and tool-level guardrails for a specific <client> / <server>
combination (e.g., a client like Cursor and a server like a Git MCP instance).
The ellipses (...
) in the schema indicate that multiple entries can be added at each level:
-
Multiple servers under a single client.
-
Multiple guardrails under each guardrails section.
-
Multiple tools under a server.
This structure supports flexible, fine-grained control across different environments and toolchains. The sections of the file are described below.
Default guardrails
Default guardrails are pre-configured and run by default with the log
action. Alternatively, their behavior can be overridden by specifying <default-guardrail-name>: <guardrail-action>
as shown above. The following default guardrails exist and can be configured:
Name | Requirements | Description |
---|---|---|
links |
None |
Detects links. |
secrets |
None |
Detects secrets, such as tokens or api keys. |
pii |
presidio , transformers |
Detects personally identifiable information, such as names, emails, or credit card numbers. |
moderated |
ENV: OPENAI_API_KEY |
Detects content that should be moderated, such as hate speech or explicit content. |
Note
Some default guardrails require optional extra packages to run and are disabled if not installed. You can install them with the argument --install-extras [list of extras]
to install a specific set of extras or --install-extras all
to install all extras.
Example: Overriding a default guardrail.
Custom guardrails
Custom guardrails allow you to define rules tailored to specific workflows, data patterns, or business logic. These guardrails offer flexible semantics and can detect complex or domain-specific behaviors that aren't covered by the built-in defaults.
To get started writing custom rules, refer to the rule writing reference to get started quickly with writing guardrails, or explore the rest of this documentation to learn about the concepts in depth, perhaps starting with this introduction.
Schema
A custom guardrail is defined using the following fields:
Name | Type | Description |
---|---|---|
name |
string |
A name for the guardrail. |
id |
string |
A unique identifier for the guardrail. |
action |
Guardrail Action |
The action to take when this guardrail is triggered. See below. |
content |
string |
A multiline string defining the semantics of this guardrailing rule. Please refer to the rest of this documentation to see how to write guardrails. |
Example: Defining a custom guardrail.
Below is a yaml
fragment that shows how a custom guardrail can be defined to ensure emails are only sent to trusted recipients.
...
custom_guardrails:
- name: "Trusted Recipient Email"
id: "trusted_email_check"
action: block
content: |
raise "Untrusted email recipient" if:
(call: ToolCall)
call is tool:send_email
not match(".*@company.com", call.function.arguments.recipient)
...
Tool-specific guardrails
Tool-specific guardrails allow you to override or refine guardrailing behavior for individual tools within a given server. This is useful when certain tools require stricter enforcement, relaxed rules, or even complete disabling.
Schema
The following fields are available for each tool
:
Name | Type | Description |
---|---|---|
<default-guardrail> |
Guardrail Action |
Overrides the behavior of <default-guardrail> for this tool specifically. |
enabled |
boolean |
Disables tool when set to False. If this field is not provided, it defaults to True. |
Note
While the tools section does not allow for defining custom guardrails, you can always use the call is tool:<tool_name>
syntax to select a specific tool, as shown in the example from the previous section.
Example: Disabling and overriding behavior for tools.
The yaml
fragment below demonstrates how to disable the send_message
tool and override the behavior for the secrets
default guardrail to be block
specifically for the read_messages
tool.
Guardrail actions
Guardrail actions define how the system responds when a guardrail is triggered. These actions let you tailor enforcement severity across different contexts - whether to silently monitor behavior, stop execution entirely, or pause enforcement temporarily.
The following actions are available:
Name | Description |
---|---|
block |
This will block the client outright, preventing it from executing what triggered the guardrail. |
log |
This lets the client execute what triggered the guardrail, but logs it in the console, so you can monitor the violation patterns without blocking the client's workflow. |
paused |
This pauses the enforcement of the guardrail. |
These actions are available for both default and custom guardrails.
Priority of rules
When multiple guardrail rules are defined for the same context (e.g., a default guardrail set at both the server and tool level), the system follows a strict priority hierarchy to determine which rule takes effect.
This ensures predictable behavior and allows for precise control, especially when configuring environments with overlapping scopes.
Note
Custom guardrails are not subject to this priority system. Once defined, they are always active and enforced with their specified action, regardless of other configurations.
Rule Priority (from highest to lowest)
- Tool-specific guardrails
Guardrails defined for a specific tool override all other configurations. - Client/server-specific guardrails
Guardrails set at the server level for a particular client are used unless overridden by a tool-specific rule. - Implicit default guardrails
If no explicit rule is defined, the default guardrails apply automatically with thelog
action.
Example: A simple config file with overrides.
To see how this hierarchy of precedence works, consider the following example configuration:
The resulting behavior of this configuration is:
-
secrets
is set to block for thetool
, because tool-specific guardrails have the highest priority. -
For all other tools on
server
, secrets is set topaused
, as defined at the client/server level. -
For any other client/server combinations not defined in the config,
secrets
will use the implicit default and be set tolog
. -
pii
is set to block across all tools underserver
due to the client/server setting. For all other combinations, it falls back to the implicit default (log
).
This precedence ensures that you can enforce strict guardrails where needed, while still benefiting from broad default coverage elsewhere.
Example file
Below is a complete example of a guardrail configuration file. It demonstrates how to define default and custom guardrails for specific clients and servers, apply tool-level overrides, and selectively enable or disable tools
cursor:
email-mcp-server:
# Customize the guardrailing for this specific server
guardrails:
pii: block
moderated: paused
# Define multiple custom guardrails
custom_guardrails:
- name: "Trusted Recipient Email"
id: "untrustsed_email_gr_1"
action: block
# Guardrail to ensure that we know all recipients
content: |
raise "Untrusted email recipient" if:
(call: ToolCall)
call is tool:send_email
not match(".*@company.com", call.function.arguments.recipient)
# Guardrail to ensure an email is not sent after
# a prompt injection is detected in the inbox
- name: "PII Email"
id: "untrustsed_email_gr_2"
action: log
content: |
from invariant.detectors import prompt_injection
raise "Suspicious email before send" if:
(inbox: ToolOutput) -> (call: ToolCall)
inbox is tool:get_inbox
call is tool:send_email
prompt_injection(inbox.content)
# Specify the behavior of individual tools
tools:
send_message:
enabled: false
read_messages:
secrets: block
weather:
guardrails:
moderated: paused
# Separate configurations on a per client/server basis
claude:
git-mcp-server:
tools:
commit-tool:
links: paused