Arroyo is a new distributed stream processing engine written in Rust that allows users to build and execute stateful, real-time computations on high-volume streaming data. As is becoming increasingly standard, users of Arroyo build out their pipelines by writing SQL queries against their streaming data. Today I want to dive into the techniques we use to test our SQL processing logic.
After Arroyo parses and plans a SQL query, it generates Rust code that is then compiled into the final pipeline binary. This is a pattern seen in a number of other systems, including Spark, Presto, and Calcite. Code generation results in wickedly fast computation, especially with a compiler as powerful as Rust's.
But it also introduces challenges for developing and testing our engine code. Because much of the code is generated at runtime, we can't take advantage of Rust's many compile-time correctness checks. In the worst case we may end up generating invalid code, leading to compilation errors when a user tries to run a query.
Fortunately, Rust has another trick up its sleeve: a flexible macro system which allows us to generate test cases at compile time from our generated code. By leveraging a procedural macro, we're able to easily write tests to validate that our generated code compiles and behaves correctly.
An Example Test
Let's look at one of the 92 different tests we currently have committed in arroyo-sql-testing, one_modulo_nullable_two:
```rust
single_test_codegen!(
    "one_modulo_nullable_two",
    "non_nullable_i32 % nullable_i64",
    arroyo_sql::TestStruct {
        non_nullable_i32: 1,
        nullable_i64: Some(2),
        ..Default::default()
    },
    Some(1i64)
);
```
This tests how our SQL engine handles the modulo operator % between a non-nullable and a nullable input.
The single_test_codegen macro goes through several steps to turn this into a useful test case.
Parsing the macro arguments
As a procedural macro, single_test_codegen takes a TokenStream, processes it, and returns a modified TokenStream. Initially, it extracts four input arguments:
- test_name: The name of the test case to be generated.
- calculation_string: A string containing the SQL expression whose implementation we wish to test.
- input_value: The row the generated code should be executed against. For our tests we use TestStruct, which has nullable and non-nullable fields for most of our data types.
- expected_result: A Rust expression that should equal the result of calculation_string evaluated on the input value.
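To make this concrete, here is a minimal sketch of what that argument parsing could look like with the syn crate. The TestCase struct and its layout are our illustration, not necessarily Arroyo's actual implementation:

```rust
use syn::parse::{Parse, ParseStream};
use syn::{Expr, LitStr, Token};

// Illustrative only: the four positional, comma-separated macro arguments.
struct TestCase {
    test_name: LitStr,
    calculation_string: LitStr,
    input_value: Expr,
    expected_result: Expr,
}

impl Parse for TestCase {
    fn parse(input: ParseStream) -> syn::Result<Self> {
        let test_name = input.parse()?;
        input.parse::<Token![,]>()?;
        let calculation_string = input.parse()?;
        input.parse::<Token![,]>()?;
        let input_value = input.parse()?;
        input.parse::<Token![,]>()?;
        let expected_result = input.parse()?;
        Ok(TestCase { test_name, calculation_string, input_value, expected_result })
    }
}
```

With a parser like this, the macro entry point can call parse_macro_input!(input as TestCase) and work with typed fields instead of raw tokens.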
Creating an artificial SQL query
The macro calls get_test_expression(), which creates an artificial SQL context and plans a generated query using DataFusion. We then pull out the DataFusion expression for further processing.
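As a rough sketch of the shape of this step, one way to extract a single expression is to wrap it in a SELECT, plan it, and take the projection's lone expression. The table name here is invented, and DataFusion's API has shifted between versions, so treat this as illustrative rather than Arroyo's exact code:

```rust
use datafusion::error::Result;
use datafusion::logical_expr::{Expr, LogicalPlan};
use datafusion::prelude::SessionContext;

async fn get_test_expression(calculation: &str) -> Result<Expr> {
    let ctx = SessionContext::new();
    // A table whose schema matches TestStruct would be registered here (elided).
    let df = ctx
        .sql(&format!("SELECT {calculation} FROM test_table"))
        .await?;
    // The planned query is a projection over the test table; its single
    // projected expression is the one we want to compile.
    match df.logical_plan() {
        LogicalPlan::Projection(projection) => Ok(projection.expr[0].clone()),
        other => unreachable!("expected a projection, got {other:?}"),
    }
}
```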
Generating The Rust Code
The extracted expression is converted to an Arroyo Expression, from which we generate Rust code:
```rust
arg.nullable_i64
    .clone()
    .map(|right| arg.non_nullable_i32.clone() as i64 % right)
```
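Code generation like this is typically done with the quote crate, which splices token fragments into Rust source. Here is a hedged sketch of how a binary operation with a nullable right-hand side might be emitted; the helper and its signature are invented for illustration:

```rust
use proc_macro2::TokenStream;
use quote::quote;

// Illustrative only: emit `left op right` where the right-hand side is an
// Option. Mapping over the Option applies the operation only when a value
// is present, so a NULL input yields None.
fn nullable_binop(left: TokenStream, op: TokenStream, right: TokenStream) -> TokenStream {
    quote! {
        #right.clone().map(|right| #left.clone() #op right)
    }
}
```

In the real generated code above, the left side also carries an `as i64` cast to align the operand types; that coercion logic lives elsewhere in the expression generator.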
Constructing The Test Case
Finally, the generated code is used to build a test function, using the input_value as the initial context for the expression.
```rust
#[test]
fn generated_test_one_modulo_nullable_two() {
    assert_eq!({
        let arg = arroyo_sql::TestStruct {
            non_nullable_i32: 1,
            nullable_i64: Some(2),
            ..Default::default()
        };
        arg.nullable_i64
            .clone()
            .map(|right| arg.non_nullable_i32.clone() as i64 % right)
    }, Some(1i64));
}
```
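Assembling that function is again a natural fit for quote!. A minimal sketch, with invented names, assuming the pieces produced by the earlier steps:

```rust
use proc_macro2::TokenStream;
use quote::{format_ident, quote};

// Illustrative only: stitch the parsed test name, the input struct literal,
// the generated expression code, and the expected value into a #[test] fn.
fn build_test_fn(
    test_name: &str,
    input_value: TokenStream,
    expression_code: TokenStream,
    expected: TokenStream,
) -> TokenStream {
    let fn_name = format_ident!("generated_test_{}", test_name);
    quote! {
        #[test]
        fn #fn_name() {
            assert_eq!({
                let arg = #input_value;
                #expression_code
            }, #expected);
        }
    }
}
```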
Conclusion
Despite the intricacy of this approach, it has several practical advantages. First, writing these tests is very easy, which increases the likelihood that good tests actually get written. Second, because the tests are agnostic to the underlying implementation, we can make significant changes to the engine without extensive test rewrites. Lastly, because each test exercises so much of the query path, it can detect regressions from unexpected places. For instance, this test caught a regression when I upgraded us to DataFusion 23.0.