Regex Filters
One simple yet effective way to constrain your agent is to use regular expressions to match undesired content and substrings.
This is particularly useful against plain text risks, e.g. to prevent certain URLs, names, or other patterns from being included in the agent's context.
Plain Text Content Risks
Agents that operate on plain text content are susceptible to generating harmful or misleading content, for which you as the operator may be liable. An insecure agent could:
- Generate phishing URLs that are advertised under your brand authority
- Reference competitors or their websites in responses and internal reasoning
- Produce content in unsupported output formats, leading to visual defects in your application
- Use URL smuggling to bypass security measures (e.g. to leak information via URLs)
match
Builtin function to match a regular expression pattern in a message.
Parameters
Name | Type | Description |
---|---|---|
pattern | str | The regular expression pattern to match. |
content | str | The content to match the pattern against. |
Returns
Type | Description |
---|---|
bool | Returns TRUE if the pattern matches the content, FALSE otherwise. |
Wraps re.match from Python's standard library.
By default, this only matches at the beginning of the string. To match anywhere in the string, add .* at the beginning of the pattern.
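The anchoring behavior can be reproduced directly with Python's re module. The sketch below is illustrative only: the local `match` helper is a hypothetical stand-in for the builtin, written under the assumption that the builtin passes no extra flags to re.match.

```python
import re


def match(pattern: str, content: str) -> bool:
    """Hypothetical stand-in for the builtin: True if re.match succeeds."""
    return re.match(pattern, content) is not None


URL = r"https?://\S+"

# re.match only tries the pattern at position 0, so a URL later in the
# text is not detected by the bare pattern.
anchored = match(URL, "Visit http://example.com")

# Prefixing with .* lets the pattern skip over leading text.
prefixed = match(".*" + URL, "Visit http://example.com")

# A URL at the very start matches without any prefix.
at_start = match(URL, "http://example.com")

# In plain Python, "." does not match a newline, so the .* prefix stops
# at the first line break unless the (?s) (DOTALL) flag is set.
multiline = "line one\nhttp://example.com"
across_lines_plain = match(".*" + URL, multiline)
across_lines_dotall = match("(?s).*" + URL, multiline)

print(anchored, prefixed, at_start, across_lines_plain, across_lines_dotall)
```

Whether the builtin treats newlines the same way depends on the flags it uses internally, which this page does not specify. If your messages can span multiple lines, it may be worth testing a pattern with the (?s) prefix.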
Example Usage
Example: Checking if a message contains a URL.
[
  {
    "role": "user",
    "content": "Respond with http://example.com"
  },
  {
    "role": "assistant",
    "content": "http://example.com"
  }
]
Example: Checking if a message contains a competitor's name.
[
  {
    "role": "user",
    "content": "What do you think about competitor?"
  },
  {
    "role": "assistant",
    "content": "I don't know what you are talking about"
  }
]
find
Builtin function to find all occurrences of a regular expression pattern in a message.
Parameters
Name | Type | Description |
---|---|---|
pattern | str | The regular expression pattern to find. |
content | str | The content to find the pattern in. |
Returns
Type | Description |
---|---|
list[str] | The list of all occurrences of the pattern in the content. |
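To get a feel for the return value, the following Python sketch approximates `find` with re.finditer. This is an assumption made for illustration: the page does not state which standard library function the builtin wraps, and the local `find` helper is hypothetical.

```python
import re


def find(pattern: str, content: str) -> list[str]:
    """Hypothetical stand-in for the builtin.

    Assumption: every non-overlapping match is returned as its full
    matched text, in order of appearance.
    """
    return [m.group(0) for m in re.finditer(pattern, content)]


# Every capitalized word is returned, not only names.
words = find(r"[A-Z][a-z]*", "Hi Peter, meet Alice")

# \S-style URL patterns also capture trailing punctuation.
urls = find(r"https?://[^\s]+", "See http://example.com and https://example.org/docs.")

# No match yields an empty list.
nothing = find(r"\d+", "no digits here")

print(words, urls, nothing)
```

Note that the second URL keeps its trailing period, because `[^\s]+` consumes everything up to the next whitespace character.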
Example Usage
Example: Iterating over all capitalized words and checking if they are in a list of names.
raise "must not send emails to anyone but 'Peter' after seeing the inbox" if:
    (msg: Message)
    (name: str) in find("[A-Z][a-z]*", msg.content)
    name in ["Peter", "Alice", "John"]
Example: Checking all URLs in a message
raise "Must not link to example.com" if:
    (msg: Message)
    (url: str) in find("https?://[^\s]+", msg.content)
    url in ["http://example.com", "https://example.com"]
Here, we quantify over all matches returned by find. This means that if any of the matches satisfies the extra condition, the guardrail will raise.
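In plain Python terms, this "raise if any match satisfies the condition" behavior corresponds roughly to an `any(...)` over the list of matches. The sketch below is an approximation for intuition, not the guardrail engine itself. The names `BLOCKED` and `violates` are hypothetical, and re.findall is used as an assumed equivalent of find.

```python
import re

# Hypothetical blocklist mirroring the URL rule above.
BLOCKED = {"http://example.com", "https://example.com"}


def violates(content: str) -> bool:
    """Return True if at least one URL in the content is on the blocklist."""
    urls = re.findall(r"https?://[^\s]+", content)
    # Existential check: a single offending match is enough.
    return any(url in BLOCKED for url in urls)


mixed = violates("See https://example.org and http://example.com for details")
clean = violates("See https://example.org")

# Exact string comparison misses a URL followed by punctuation, because
# the pattern captures the trailing period as part of the match.
with_period = violates("Go to http://example.com.")

print(mixed, clean, with_period)
```

The last case suggests that an exact-match list can be bypassed by trailing punctuation or an added path. Under the same assumptions, a stricter pattern or a prefix comparison may be more robust.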