Writing Tests
testing allows you to write tests for your AI agent just like you write unit tests for traditional software. To make debugging and error localization easier, it also provides a few additional features that make it especially well suited to testing agentic AI systems.
This chapter first discusses how localized assertions work and then provides examples of how to write tests using testing, including soft and hard assertions.
Trace Context and Localized Assertions
A test case in testing looks a lot like a regular unit test, except that it always makes use of a Trace object and the corresponding .as_context() method. This is required to enable localized assertions, which map assertions to specific ranges in the provided trace:
from invariant.testing import assert_true, Trace

def test_assert():
    # obtain some trace
    trace = Trace(
        trace=[
            {"role": "user", "content": "Could you kindly show me the list of files in tmp directory in my file system including the hidden one?"},
            {"role": "assistant", "content": "In the current directory, there is one file: **report.txt**. There are no hidden files listed."},
        ]
    )

    # run the test in the context of the trace
    with trace.as_context():
        # get the second message from the trace
        msg = trace.messages(1)

        # a hard assertion on the message content
        assert_true(
            msg["content"].contains("current"),
            "Assistant message content should contain the word 'current'",
        )
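To make the idea of localization concrete, here is a minimal, self-contained sketch of how a check can remember where in a trace its input came from. It uses plain Python only and is not how testing is implemented; LocatedValue, LocatedCheck, and locate are hypothetical names introduced purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class LocatedCheck:
    """Result of a check plus the place in the trace it refers to (hypothetical)."""
    ok: bool
    address: str                     # e.g. "1.content" = message 1, field "content"
    span: Optional[Tuple[int, int]]  # character range of the match, if any


@dataclass(frozen=True)
class LocatedValue:
    """A string value that remembers which trace field it was read from (hypothetical)."""
    value: str
    address: str

    def contains(self, needle: str) -> LocatedCheck:
        start = self.value.find(needle)
        ok = start != -1
        # Keep the matched character range so a report could highlight it.
        span = (start, start + len(needle)) if ok else None
        return LocatedCheck(ok, self.address, span)


def locate(trace: list, index: int, field: str) -> LocatedValue:
    """Read one field of one message and record its address in the trace."""
    return LocatedValue(trace[index][field], f"{index}.{field}")
```

With a two-message trace like the one above, locate(trace, 1, "content").contains("current") yields a result that carries the address 1.content and the character range of the match. This is the kind of information a localized assertion can attach to a pass or failure.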
Assertions
To make hard assertions (assertions that cause the test to fail), you can use the assert_true, assert_false, assert_that, and assert_equals functions. These functions are similar to the ones you might know from unit testing frameworks like unittest or pytest, but they add support for localization.
from invariant.testing import assert_true, assert_equals, assert_that, Trace, IsSimilar

def test_hard_assert():
    # obtain some trace
    trace = Trace(
        trace=[
            {"role": "user", "content": "Could you kindly show me the list of files in tmp directory in my file system including the hidden one?"},
            {"role": "assistant", "content": "In the current directory, there is one file: **report.txt**. There are no hidden files listed."},
        ]
    )

    with trace.as_context():
        # get the second message from the trace
        msg = trace.messages(1)

        # assert_equals compares two values
        assert_equals("assistant", msg["role"])

        # assert_true checks for a boolean condition
        assert_true(
            msg["content"].contains("current"),
            "Assistant message content should contain the word 'current'",
        )

        # assert_that uses a fuzzy matcher to check the message content
        assert_that(
            msg["content"],
            IsSimilar("current directory one file: **report.txt**. There are no hidden files listed.", threshold=0.8),
            "Message content should be similar",
        )
This snippet demonstrates three of the four types of hard assertions. All four are described below:
assert_equals
def assert_equals(
    expected_value: InvariantValue,
    actual_value: InvariantValue,
    message: str = None
)
assert_equals compares two values for equality (same string, number, etc.).
assert_true
assert_true checks for a boolean condition (true or false), e.g. the result of a .contains(...) check.
assert_false
Just like assert_true, assert_false checks for a boolean condition, but expects the condition to be false.
assert_that
assert_that uses a designated Matcher class to check the message content. In this case, IsSimilar is used to compare the message content to some expected value with a given threshold for maximum allowed difference.
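As a rough illustration of threshold-based matching, the following sketch builds a small matcher on top of difflib from the Python standard library. It is not the implementation of IsSimilar: SimilarTo is a hypothetical class, the similarity measure is difflib's ratio, and treating the threshold as a minimum similarity score is an assumption of this sketch.

```python
from difflib import SequenceMatcher


class SimilarTo:
    """Hypothetical matcher: passes when two strings are similar enough."""

    def __init__(self, expected: str, threshold: float = 0.8):
        self.expected = expected
        # Assumption: threshold is the minimum similarity score required to pass.
        self.threshold = threshold

    def score(self, actual: str) -> float:
        # ratio() returns a float in [0, 1]; 1.0 means the strings are identical.
        return SequenceMatcher(None, self.expected, actual).ratio()

    def matches(self, actual: str) -> bool:
        return self.score(actual) >= self.threshold
```

A matcher object like this separates what counts as a match from how the result is reported, which is why the same matcher can in principle be passed to either a hard or a soft assertion.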
Expectations (Soft Assertions)
In addition to hard assertions, testing also supports soft assertions that do not lead to test failure.
Instead, they are only logged as warnings. Use them to check non-functional agent properties (e.g. number of tool calls, runtime, etc.) that are not critical for functional correctness but are still important to monitor.
from invariant.testing import expect_equals, Trace

def test_soft_assert():
    # obtain some trace
    trace = Trace(
        trace=[
            {"role": "user", "content": "Could you kindly show me the list of files in tmp directory in my file system including the hidden one?"},
            {"role": "assistant", "content": "In the current directory, there is one file: **report.txt**. There are no hidden files listed."},
        ]
    )

    with trace.as_context():
        # get the second message from the trace
        msg = trace.messages(1)

        # expect_equals compares two values; a mismatch is logged as a warning
        expect_equals("assistant", msg["role"])
Just like with hard assertions, there are four types of soft assertions:
expect_equals
def expect_equals(
    expected_value: InvariantValue,
    actual_value: InvariantValue,
    message: str = None
)
expect_equals compares two values for equality (same string, number, etc.).
expect_true
expect_true checks for a boolean condition (true or false), e.g. the result of a .contains(...) check.
expect_false
Just like expect_true, expect_false checks for a boolean condition, but expects the condition to be false.
expect_that
expect_that uses a designated Matcher class to check the message content. For example, IsSimilar can be used to compare the message content to some expected value with a given threshold for maximum allowed difference.
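The difference between the two families of checks described in this chapter can be sketched in a few lines of plain Python. This is only a conceptual illustration under simplifying assumptions, not how testing reports results: hard_check and soft_check are hypothetical helpers, and the soft variant uses the standard warnings module as a stand-in for the library's own logging.

```python
import warnings


def hard_check(condition: bool, message: str) -> None:
    """Hard assertion: a failed check raises, so the test fails."""
    if not condition:
        raise AssertionError(message)


def soft_check(condition: bool, message: str) -> bool:
    """Soft assertion: a failed check only emits a warning and execution continues."""
    if not condition:
        warnings.warn(message, stacklevel=2)
    return condition
```

In a test, a failing soft_check(tool_calls <= 3, "too many tool calls") would surface as a warning while the remaining checks still run, whereas a failing hard_check stops the test at that point. Here tool_calls stands for whatever count your test derives from the trace.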