Build real-time pipelines on Confluent with SQL
Deploy stateful streaming pipelines on top of your Confluent Kafka streams, with millisecond latency. Power real-time applications, compute ML features, and build real-time dashboards with SQL.
Why Arroyo
Easy integration
Integrating Arroyo with your Confluent cluster is as easy as entering your cluster information and clicking "Create." See the integration guide.
Arroyo includes full support for Avro and JSON, and can consume and publish to the Confluent Schema Registry.
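Messages written with Confluent's Schema Registry serializers use a small framing header. Byte 0 is a magic byte (always 0). Bytes 1 to 4 hold the registered schema ID as a big-endian 32-bit integer. The encoded Avro or JSON payload follows. The Python sketch below shows only that framing. The helper names are hypothetical and are not part of Arroyo's API. It assumes Avro or JSON Schema payloads; Protobuf adds extra message-index bytes after the ID, which this sketch does not handle.

```python
import struct

MAGIC_BYTE = 0  # first byte of every Schema Registry-framed message


def split_confluent_frame(message: bytes) -> tuple[int, bytes]:
    """Split a Schema Registry-framed Kafka value into (schema_id, payload)."""
    if len(message) < 5:
        raise ValueError("message too short for Schema Registry framing")
    if message[0] != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte {message[0]}")
    # Bytes 1-4: schema ID, big-endian 32-bit integer.
    (schema_id,) = struct.unpack(">I", message[1:5])
    # Everything after the header is the encoded record (e.g. Avro binary).
    return schema_id, message[5:]


def make_confluent_frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix an already-encoded payload with the magic byte and schema ID."""
    return bytes([MAGIC_BYTE]) + struct.pack(">I", schema_id) + payload


# Hypothetical usage: b"payload" stands in for real Avro-encoded bytes.
frame = make_confluent_frame(42, b"payload")
print(split_confluent_frame(frame))  # (42, b'payload')
```

A consumer uses the schema ID to fetch the writer's schema from the registry before decoding the payload. This is why a connection needs the registry configured to read these topics.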
SQL that just works
Arroyo is optimized from the SQL planner down to the storage layer for excellent, unsurprising SQL support. Build reliable, efficient streaming pipelines without specialized streaming knowledge.
Arroyo supports powerful analytical SQL features like joins, windows, aggregations, and UDFs. See the SQL guide for full details.
Stateful and consistent
Join, window, and aggregate your streams with event-time semantics. This means you get correct results even when your data arrives late or out of order.
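The Python sketch below illustrates what event-time semantics means for a windowed count. It is a toy model, not Arroyo's implementation. Each event is assigned to a window by its own timestamp rather than by when it arrives. A watermark decides when a window is complete. Several choices are illustrative assumptions, not Arroyo defaults:

- 60-second tumbling windows.
- A watermark that trails the largest event time seen by 10 seconds.
- Events that arrive after their window has closed are dropped.

```python
from collections import defaultdict

WINDOW_SECS = 60        # tumbling window width (illustrative assumption)
ALLOWED_LATENESS = 10   # watermark = max event time seen - this (assumption)


def count_by_event_time(arrivals):
    """Count events per (key, window_start) using event time.

    `arrivals` is a list of (event_time, key) pairs in arrival order, which
    may differ from event-time order. Returns (emitted, dropped).
    """
    open_windows = defaultdict(int)
    emitted, dropped = [], []
    watermark = float("-inf")

    def close_ready(wm):
        # Emit every open window whose end is at or before the watermark.
        for key, start in sorted(open_windows, key=lambda ks: (ks[1], ks[0])):
            if start + WINDOW_SECS <= wm:
                emitted.append((key, start, open_windows.pop((key, start))))

    for event_time, key in arrivals:
        start = event_time - event_time % WINDOW_SECS  # window chosen by event time
        if start + WINDOW_SECS <= watermark:
            dropped.append((event_time, key))  # its window was already emitted
            continue
        open_windows[(key, start)] += 1
        watermark = max(watermark, event_time - ALLOWED_LATENESS)
        close_ready(watermark)

    close_ready(float("inf"))  # end of input: flush any remaining windows
    return emitted, dropped


# The event at t=58 arrives after t=62 but is still counted in window [0, 60).
emitted, dropped = count_by_event_time([(5, "a"), (62, "a"), (58, "a"), (75, "a"), (30, "a")])
print(emitted, dropped)  # [('a', 0, 2), ('a', 60, 2)] [(30, 'a')]
```

The watermark is the trade-off knob. Letting it trail further behind tolerates later data but delays results.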
With Kafka, you can also enable exactly-once processing, so your output has no missed events or duplicates.
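A common way stream processors reach exactly-once output with Kafka is to tie output commits to checkpoints. Results are staged in a transaction that commits together with the checkpointed input offset. After a failure, uncommitted output is discarded and reading resumes from the last checkpoint. The Python toy model below illustrates only that general technique; it is not Arroyo's implementation. The function, its parameters, and the simulated crash are all hypothetical.

```python
def run_with_checkpoints(source, checkpoint_every, crash_at=None):
    """Toy model of checkpoint-aligned transactional output.

    Output is staged and only appended to the committed log when a
    checkpoint succeeds, atomically with the input offset. On a crash,
    staged output is discarded and processing restarts from the offset
    saved in the last checkpoint.
    """
    committed_output = []
    checkpoint_offset = 0
    crashed = False
    while True:
        staged = []
        offset = checkpoint_offset  # resume from the last committed offset
        try:
            while offset < len(source):
                if crash_at is not None and offset == crash_at and not crashed:
                    crashed = True
                    raise RuntimeError("simulated worker failure")
                staged.append(source[offset] * 10)  # any deterministic transform
                offset += 1
                if offset % checkpoint_every == 0 or offset == len(source):
                    committed_output.extend(staged)  # commit the "transaction"...
                    checkpoint_offset = offset       # ...together with the offset
                    staged = []
            return committed_output
        except RuntimeError:
            continue  # staged output is dropped; replay from checkpoint_offset


# A crash at offset 5 replays offsets 3 and 4, yet nothing is duplicated or lost.
print(run_with_checkpoints(list(range(7)), checkpoint_every=3, crash_at=5))
# [0, 10, 20, 30, 40, 50, 60]
```

If output were written immediately instead of staged, the replay would emit 30 and 40 twice (at-least-once delivery).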
Built for every scale
From tens of events per second to tens of millions, Arroyo scales with your data. Arroyo is designed for modern cloud environments, and can seamlessly and consistently rescale pipelines in seconds.
Arroyo is fully open-source: you can self-host it in your cloud with Kubernetes, or use Arroyo's serverless offering.
5 minutes to streaming
1. Run locally
$ docker run -p 5115:5115 \
    ghcr.io/arroyosystems/arroyo:latest
2. Create a Confluent connection
Connecting Arroyo to your Confluent cluster is easy. With the Docker image running, open the Arroyo Web UI at http://localhost:5115 and navigate to Create Connection / Confluent.
In the first step, set up the connection to your Confluent cluster, including the Kafka bootstrap servers and the authentication information. You can also configure your Confluent Schema Registry here.
Next, create a new connection table, which lets you read from and write to Kafka topics. You'll need to select a topic and, optionally, load its schema from the Confluent Schema Registry. Give your connection a name and validate that everything is set up correctly.
See the integration guide for a complete walk-through.
3. Write a query
Now you're ready to write your first query! Navigate to Create Pipeline.
In SQL, you'll refer to your connection by the name you gave it. Connections are available as tables in your SQL queries.
Here's a query that counts the orders with an amount greater than 10 for each store, over a 5-minute sliding window that advances every 5 seconds.
SELECT store_id, count(*) as count
FROM orders
WHERE amount > 10
GROUP BY
    store_id,
    hop(interval '5 seconds', interval '5 minutes');
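The Python sketch below models what the query computes; it is not how Arroyo evaluates it. It takes the `hop` arguments as (slide, width), matching the query's 5-second slide and 5-minute width. Under that reading, windows start every 5 seconds and last 300 seconds, so each event falls into 300 / 5 = 60 overlapping windows. The function names and the (event_time, store_id, amount) tuple layout are hypothetical. Copying every event into all 60 windows is the simplest way to show the semantics; a production engine would typically aggregate more efficiently.

```python
from collections import Counter


def hop_windows(event_time: int, slide: int = 5, width: int = 300):
    """Return the [start, end) windows, in seconds, that contain event_time.

    Windows start at every multiple of `slide` and are `width` seconds long.
    """
    last_start = event_time - event_time % slide  # latest window containing t
    first_start = last_start - width + slide      # earliest window containing t
    return [(start, start + width) for start in range(first_start, last_start + 1, slide)]


def hop_counts(orders, slide: int = 5, width: int = 300):
    """Count orders with amount > 10 per (store_id, window), mirroring the query."""
    counts = Counter()
    for event_time, store_id, amount in orders:
        if amount > 10:  # WHERE amount > 10
            for window in hop_windows(event_time, slide, width):
                counts[(store_id, window)] += 1  # GROUP BY store_id, hop(...)
    return counts


ws = hop_windows(302)
print(len(ws), ws[0], ws[-1])  # 60 (5, 305) (300, 600)

orders = [(302, "store-1", 20), (303, "store-1", 5), (304, "store-1", 15)]
print(hop_counts(orders)[("store-1", (300, 600))])  # 2
```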
See the SQL guide for full details on Arroyo SQL.