Writing Tests
testing allows you to write tests for your AI agent just like you write unit tests for traditional software. However, to make debugging and error localization easier, testing provides a few additional features that makes it especially well-suited for testing agentic AI systems.
This chapter first discusses how localized assertions work and then provides examples of how to write tests using testing, including soft and hard assertions.
Trace Context and Localized Assertions
A test case in testing looks a lot like a regular unit test, except that it always makes use of a Trace object and the corresponding .as_context() method. This is required to enable localized assertions, that maps assertions to specific ranges in the provided trace:
from invariant_testing.testing import assert_true, Trace
def test_assert():
# obtain some trace
trace = Trace(
trace=[
{"role": "user", "content": "Could you kindly show me the list of files in tmp directory in my file system including the hidden one?"},
{"role": "assistant", "content": "In the current directory, there is one file: **report.txt**. There are no hidden files listed."},
]
)
# run the test in the context of the trace
with trace.as_context():
# get the second message from the trace
msg = trace.messages(1)
# a hard assertion on the message content
assert_true(
msg["content"].contains("current"),
"Assistant message content should contain the word 'current'",
)
Assertions
To make hard (leading to test failure) assertions, you can use the assert_true, assert_that, and assert_equals functions. These functions are similar to the ones you might know from unit testing frameworks like unittest or pytest, but they add support for localization.
from invariant_testing.testing import assert_true, assert_equals, assert_that, Trace, IsSimilar
def test_hard_assert():
# obtain some trace
trace = Trace(
trace=[
{"role": "user", "content": "Could you kindly show me the list of files in tmp directory in my file system including the hidden one?"},
{"role": "assistant", "content": "In the current directory, there is one file: **report.txt**. There are no hidden files listed."},
]
)
with trace.as_context():
# get the second message from the trace
msg = trace.messages(1)
# assert_equals compares two values
assert_equals("assistant", msg["role"])
# assert_true checks for a boolean condition
assert_true(
msg["content"].contains("current"),
"Assistant message content should contain the word 'current'",
)
# assert_that uses a fuzzy matcher to check the message content
assert_that(msg["content"], IsSimilar("current directory one file: **report.txt**. There are no hidden files listed.", threshold=0.8), "Message content should be similar")
This snippet demonstrates three types of hard assertions:
assert_equals
def assert_equals(
expected_value: InvariantValue,
actual_value: InvariantValue,
message: str = None
)
assert_equalscompares two values for equality (same string, number, etc.).
assert_true
assert_truechecks for a boolean condition (true or false), e.g. the result of a.contains(...)check.
assert_false
- Just like
assert_true,assert_falsechecks for a boolean condition, but expects the condition to be false.
assert_that
assert_thatuses a designatedMatcherclass to check the message content. In this case,IsSimilaris used to compare the message content to some expected value with a given threshold for maximum allowed difference.
Expectations (Soft Assertions)
Next to hard assertions, testing also supports soft assertions that do not lead to test failure.
Instead, they are logged as warnings only and can be used to check (non-functional) agent properties that may not be critical to ensure functional correctness (e.g. number of tool calls, runtime, etc.), but are still important to monitor.
from invariant_testing.testing import expect_equals, Trace, IsSimilar
def test_soft_assert():
# obtain some trace
trace = Trace(
trace=[
{"role": "user", "content": "Could you kindly show me the list of files in tmp directory in my file system including the hidden one?"},
{"role": "assistant", "content": "In the current directory, there is one file: **report.txt**. There are no hidden files listed."},
]
)
with trace.as_context():
# get the second message from the trace
msg = trace.messages(1)
# expect_equals compares two values
expect_equals("assistant", msg["role"])
Just like with hard assertions, there are four types of soft assertions:
expect_equals
def expect_equals(
expected_value: InvariantValue,
actual_value: InvariantValue,
message: str = None
)
expect_equalscompares two values for equality (same string, number, etc.).
expect_true
expect_truechecks for a boolean condition (true or false), e.g. the result of a.contains(...)check.
expect_false
- Just like
expect_true,expect_falsechecks for a boolean condition, but expects the condition to be false.
expect_that
expect_thatuses a designatedMatcherclass to check the message content. In this case,IsSimilaris used to compare the message content to some expected value with a given threshold for maximum allowed difference.