⚡️ Arroyo 0.15.0 is now available with 🧊 Iceberg support

cloud-native
stream processing

Transform, filter, aggregate, and join data streams by writing SQL, with sub-second results.
Scale from zero to millions of events per second.
No ops team required.

Install open-source Read the docs

Get started

Arroyo ships as a single, compact binary. Run it locally on macOS or Linux for development, then deploy to production with Docker or Kubernetes.

$ curl -LsSf https://arroyo.dev/install.sh | sh
$ arroyo cluster

Release 0.15.0 | December 1, 2025 | Apache 2.0 License

Arroyo is a new kind of stream processing engine, built from the ground up to make real‑time easier than batch.

Analytical SQL that just works

Arroyo was designed from the start so that anyone with SQL experience can build reliable, efficient, and correct streaming pipelines.

Data scientists and engineers can build end-to-end real-time applications... without a separate team of streaming experts.

=> SQL docs

CREATE VIEW tags AS (
    SELECT btrim(unnest(tags), '"') AS tag FROM (
        SELECT extract_json(value, '$.tags[*].name') AS tags
        FROM mastodon)
);

SELECT * FROM (
    SELECT *, ROW_NUMBER() OVER (
        PARTITION BY window
        ORDER BY count DESC) AS row_num
    FROM (
        SELECT count(*) AS count,
            tag,
            hop(interval '5 seconds', interval '15 minutes') AS window
        FROM tags
        GROUP BY tag, window)
) WHERE row_num <= 5;

Designed for the modern cloud

Your streaming pipelines shouldn't page someone just because Kubernetes decided to reschedule your pods. Arroyo is built to run in modern, elastic cloud environments, from simple container runtimes like Fargate to large, distributed deployments on Kubernetes.

In short: Arroyo is a stateful stream processing engine that behaves like a stateless one.

=> Deployment docs

Scales easily to any workload

Arroyo is for everyone who needs to process data in real-time. Small use-cases can run with just a few MBs of RAM and a fractional vCPU.

For larger streams, Arroyo can rescale vertically and horizontally to process tens of millions of events per second while maintaining exactly-once semantics.

Incredible performance

Arroyo is fast. Really, really fast. It is written in Rust, a high-performance systems language, and built around the Arrow in-memory analytics format, and its performance exceeds that of similar systems like Apache Flink by 5x or more.

Features

Time windows

Process data using sliding, tumbling, and session windows with watermark processing to determine when all data for a window has arrived.
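As a rough sketch of how the three window types might look in a query, assuming the page_views table defined in the suspicious-users example on this page. The tumble and hop functions appear elsewhere on this page; session and its gap argument are an assumption here, so check the SQL docs for the exact signature.

```sql
-- Tumbling: fixed, non-overlapping 1-minute windows.
SELECT state, tumble(interval '1 minute') AS window, count(*) AS views
FROM page_views
GROUP BY state, window;

-- Sliding (hopping): 15-minute windows that advance every 5 seconds.
SELECT state,
    hop(interval '5 seconds', interval '15 minutes') AS window,
    count(*) AS views
FROM page_views
GROUP BY state, window;

-- Session (assumed signature): one window per user, closed after
-- 30 seconds without a new event for that user.
SELECT "userId", session(interval '30 seconds') AS window, count(*) AS views
FROM page_views
GROUP BY "userId", window;
```

In each case the result for a window is emitted only once the watermark has passed the end of that window.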

Joins

Arroyo SQL covers a full set of streaming joins, including inner, left, right, and full joins, which can be windowed or operate over updating data.
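A windowed join could be sketched roughly as follows. The purchases table, its columns, and its topic name are hypothetical and exist only for illustration; page_views is the table from the suspicious-users example on this page. Both sides are aggregated over the same tumbling window and then joined on the user and the window.

```sql
-- Hypothetical second source, mirroring the page_views definition on this page.
CREATE TABLE purchases (
    userId INT,
    amount FLOAT
) WITH (
    connector = 'kafka',
    type = 'source',
    bootstrap_servers = 'localhost:9092',
    topic = 'purchase_events',
    format = 'json'
);

-- Windowed left join: per-user views and orders in the same 1-hour window.
SELECT v."userId", v.window, v.views, p.orders
FROM (
    SELECT "userId", tumble(interval '1 hour') AS window, count(*) AS views
    FROM page_views
    GROUP BY 1, 2) v
LEFT JOIN (
    SELECT "userId", tumble(interval '1 hour') AS window, count(*) AS orders
    FROM purchases
    GROUP BY 1, 2) p
ON v."userId" = p."userId" AND v.window = p.window;
```

Users with views but no purchases in a given hour still appear, with a NULL in the orders column.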

SQL Functions

Arroyo ships with over 300 SQL window, aggregate, and scalar functions, covering math, arrays, regex, JSON, and more.

Exactly-once

Exactly-once processing means no duplicated or dropped events, even with out-of-order data and machine failures.

Formats

Arroyo can natively read and write JSON, Avro, Parquet, and raw text and binary. Custom formats can be implemented with UDFs.

UDFs

Extend the built-in SQL by writing Rust user-defined scalar, aggregate, and async functions, with Python coming soon.

Web UI

Manage connections, develop and test SQL queries, and monitor pipelines from the powerful Arroyo Web UI.

REST API

Pipelines can be created, operated, and managed with the REST API, offering declarative orchestration at scale.

Real-time with Arroyo

With Arroyo, you can build streaming pipelines by writing the same analytical SQL queries you are already running in your data warehouse.

Mastodon is a federated microblogging platform, similar to Twitter. The first query below reads the stream of all public Mastodon posts via its Server-Sent Events API and finds the top 5 hashtags in each 15-minute window.

See the full tutorial.

The second query finds potentially fraudulent users by detecting accounts that appear in more than four states within a single day.

CREATE TABLE mastodon (
    value TEXT
) WITH (
    connector = 'sse',
    format = 'raw_string',
    endpoint = 'http://mastodon.arroyo.dev/api/v1/streaming/public',
    events = 'update'
);

CREATE VIEW tags AS (
    SELECT btrim(unnest(tags), '"') AS tag FROM (
        SELECT extract_json(value, '$.tags[*].name') AS tags
        FROM mastodon)
);

SELECT * FROM (
    SELECT *, ROW_NUMBER() OVER (
        PARTITION BY window
        ORDER BY count DESC) AS row_num
    FROM (
        SELECT count(*) AS count,
            tag,
            hop(interval '5 seconds', interval '15 minutes') AS window
        FROM tags
        GROUP BY tag, window)
) WHERE row_num <= 5;

CREATE TABLE page_views (
    userId INT,
    state TEXT
)  WITH (
    connector = 'kafka',
    type = 'source',
    bootstrap_servers = 'localhost:9092',
    topic = 'page_view_events',
    format = 'json'
);

CREATE TABLE suspicious (
    user_id INT
) WITH (
    connector = 'kafka',
    type = 'sink',
    bootstrap_servers = 'localhost:9092',
    topic = 'suspicious',
    format = 'json'
);

INSERT INTO suspicious
    SELECT "userId" AS suspicious_id
    FROM (
        SELECT "userId",
            tumble(interval '1 day') AS window,
            COUNT(DISTINCT state) AS states
        FROM page_views
        GROUP BY 1, 2)
    WHERE states > 4;

Well connected

Blackhole

Delta Lake

Fluvio

Kafka

MQTT

NATS

Polling HTTP

Redis

Server-Sent Events

Webhook

Confluent Cloud

FileSystem

Impulse

Kinesis

MySQL

Nexmark

Postgres

Redpanda

StdOut

Websocket

Recent posts from the blog

Announcing Arroyo 0.15.0

Arroyo is joining Cloudflare

Announcing Arroyo 0.14.0
