Sentence similarity
Detect semantically similar sentences.
Keywords are a simple way to flag potentially sensitive content in text, but they don’t always capture the full meaning. In cases where you need a deeper understanding of the content, semantic similarity is more effective.
is_similar provides fuzzy matching between strings, using sentence embedding models to detect whether two pieces of text are semantically alike.
is_similar
def is_similar(
    data: str | list[str],
    target: str | list[str],
    threshold: float | Literal["might_resemble", "same_topic", "very_similar"] = "might_resemble",
) -> bool:
Name | Type | Description |
---|---|---|
data | str or list[str] | Text to analyze. |
target | str or list[str] | Target text to compare against. |
threshold | float or "might_resemble" or "same_topic" or "very_similar" | Threshold used to decide whether two texts are similar. Use one of the three named presets, or pass a custom float. |
Returns
Type | Description |
---|---|
bool | Returns True if at least one data string is sufficiently similar to at least one target string, based on the threshold. |
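
As a rough illustration of the "at least one data string against at least one target string" behavior described above, here is a minimal sketch. It is not the library's implementation: `is_similar_sketch` is a hypothetical name, a toy bag-of-words vector stands in for a real sentence embedding model, and the numeric cutoffs attached to the three named presets are invented for the example, since this page does not state the actual values.

```python
import math
from collections import Counter

# Hypothetical cutoffs: the real values behind the named presets are not documented here.
NAMED_THRESHOLDS = {"might_resemble": 0.3, "same_topic": 0.5, "very_similar": 0.8}


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_similar_sketch(data, target, threshold="might_resemble") -> bool:
    """Return True if any data string scores at or above the cutoff against any target string."""
    data = [data] if isinstance(data, str) else list(data)
    target = [target] if isinstance(target, str) else list(target)
    # A string threshold is looked up as a named preset; anything else is used as a float.
    cutoff = NAMED_THRESHOLDS[threshold] if isinstance(threshold, str) else float(threshold)
    return any(cosine(embed(d), embed(t)) >= cutoff for d in data for t in target)
```

Because the toy embedding only counts shared words, it cannot recognize paraphrases; a real sentence embedding model is what makes the actual detector semantic rather than lexical.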
Detect Topic
One use of is_similar is topic detection. For instance, suppose our employees tend to talk a lot about cats, but we would like this to stay confidential: no emails about cats should be sent.
Example: Detect emails about cats.
from invariant.detectors import is_similar

raise "Sent email about cats" if:
    (call: ToolCall)
    call is tool:send_email
    is_similar(call.function.arguments.body, "cats", threshold="might_resemble")
[
  {
    "role": "user",
    "content": "Summarize the meeting_notes.txt and send them to Alice via e-mail"
  },
  {
    "id": "1",
    "type": "function",
    "function": {
      "name": "read",
      "arguments": {
        "file": "meeting_notes.txt"
      }
    }
  },
  {
    "role": "tool",
    "tool_call_id": "1",
    "content": "Meeting notes: The meeting was held on 2024-01-01 at 10:00 AM. The attendees from our company were Alice, Bob and Charlie. We talked about our cats. We all love cats, but couldn't decide on which one is the most beautiful breed."
  },
  {
    "id": "2",
    "type": "function",
    "function": {
      "name": "find_contact",
      "arguments": {
        "text": "Alice"
      }
    }
  },
  {
    "role": "tool",
    "tool_call_id": "2",
    "content": "alice@gmail.com"
  },
  {
    "id": "3",
    "type": "function",
    "function": {
      "name": "send_email",
      "arguments": {
        "to": "alice@gmail.com",
        "body": "The meeting was about cats. It is not clear which one is the most beautiful breed."
      }
    }
  }
]
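
To make explicit which part of such a trace the rule inspects, the sketch below walks a trace in the format shown above and collects the body argument of every send_email tool call. These strings are what the rule passes to is_similar as data. The helper name `send_email_bodies` is hypothetical and the trace is abbreviated to two events; this is plain Python for illustration, not how the policy engine itself matches tool calls.

```python
def send_email_bodies(trace: list[dict]) -> list[str]:
    """Collect the `body` argument of every send_email tool call in a trace."""
    bodies = []
    for event in trace:
        function = event.get("function")
        # Tool calls in this trace format carry "type": "function" and a "function" object.
        if event.get("type") == "function" and function and function.get("name") == "send_email":
            body = function.get("arguments", {}).get("body")
            if body is not None:
                bodies.append(body)
    return bodies


# Abbreviated version of the example trace: the user request and the final tool call.
trace = [
    {
        "role": "user",
        "content": "Summarize the meeting_notes.txt and send them to Alice via e-mail",
    },
    {
        "id": "3",
        "type": "function",
        "function": {
            "name": "send_email",
            "arguments": {
                "to": "alice@gmail.com",
                "body": "The meeting was about cats. It is not clear which one is the most beautiful breed.",
            },
        },
    },
]

print(send_email_bodies(trace))
```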