
Blog

Updates from the Arroyo team

announcement

Arroyo is joining Cloudflare

Arroyo has been acquired by Cloudflare to bring serverless SQL stream processing to the Cloudflare Developer Platform, integrated with Queues, Workers, and R2. The Arroyo Engine will remain open-source and self-hostable.

Micah Wylde, CEO of Arroyo

Announcing Arroyo 0.14.0

announcement

Arroyo 0.14 is now available! This release introduces support for lookup joins, more powerful updating SQL, new syntax, structs in DDL, and more!

Micah Wylde, CEO of Arroyo

Fast columnar JSON decoding with arrow-rs

engineering

JSON is the most common serialization format used in streaming pipelines, so it pays to be able to deserialize it fast. This post covers in detail how the arrow-json library works to perform very efficient columnar JSON decoding, and the additions we've made for streaming use cases.

Micah Wylde, CEO of Arroyo

Building a near-real-time data lake with the LOAD stack

tutorial

The LOAD stack (log storage/object storage/Arroyo/DuckDB) makes it easy to build an affordable real-time data lake with minimal operational overhead. This tutorial will guide you through the process of setting up a complete system on AWS.

Micah Wylde, CEO of Arroyo

Announcing Arroyo 0.13.0

announcement

Arroyo 0.13 is now available! This release introduces support for reading source metadata, a RabbitMQ connector, improved CDC support, and operator chaining, along with many other improvements.

Micah Wylde, CEO of Arroyo

Talk: Latency, Throughput, Fault Tolerance

arroyo

Arroyo creator Micah Wylde recently spoke at P99Conf, discussing how Arroyo achieves low latency and high throughput while maintaining fault tolerance and fast recovery times.

Micah Wylde, CEO of Arroyo

Announcing Arroyo 0.12.0

announcement

Arroyo 0.12 is now available! This release introduces Python UDFs, Protobuf ingestion, JSON syntax, custom state TTLs, and many other features, improvements, and fixes.

Micah Wylde, CEO of Arroyo

Serverless Arroyo pipelines on Fly.io

tutorial

Arroyo is the easiest way to build real-time data pipelines, and Fly.io is the easiest way to run them. This tutorial shows how to use the new pipeline cluster feature in Arroyo 0.11 to build a streaming pipeline and a web app that consumes it, all running on Fly's serverless infrastructure.

Micah Wylde, CEO of Arroyo

Announcing Arroyo 0.11.0

announcement

Arroyo 0.11 is now available! This release introduces pipeline clusters for lightweight, self-contained job execution, and SQLite support for simplified deployments. It also brings a new configuration system, improved UI for pipeline creation and previewing, SQL enhancements, and more.

Micah Wylde, CEO of Arroyo

How to build a plugin system in Rust

engineering

Software used by businesses often needs to be extensible. For Arroyo, a real-time SQL engine, that means supporting user-defined functions (UDFs). But how can we support dynamic, user-written code in a static language like Rust? This post dives deep into the technical details of building a dynamically-linked, FFI-based plugin system in Rust.

Micah Wylde, CEO of Arroyo

Announcing Arroyo 0.10.0

announcement

Arroyo 0.10 is now available! This is our biggest release ever, featuring an entirely new SQL engine that's >3x faster and ships as a single binary. Plus NATS and MQTT connectors, more SQL features, and more.

Micah Wylde, CEO of Arroyo

We built a new SQL Engine on Arrow and DataFusion

announcement

Arroyo 0.10 has an entirely new SQL engine built with Apache Arrow and DataFusion. It's much faster, smaller, and easier to run. Read on for why and how we're making this change.

Micah Wylde, CEO of Arroyo

Confluent & Arroyo: Partnering to Bring Real-time SQL to Kafka

announcement

We are excited to announce that Arroyo is now a Connect with Confluent Partner, making it easier than ever for Confluent customers to integrate with the Arroyo platform. Arroyo extends Kafka with powerful stateful stream processing support, enabling businesses to analyze their data in real-time using SQL.

Micah Wylde, CEO of Arroyo

What is stateful stream processing?

explainer

Arroyo is a stateful stream processing engine—which means that it's able to remember information about previously seen events, enabling features like joins, windows, and aggregations. When should you choose a stateless or a stateful streaming system? And how do stateful engines like Arroyo and Flink mitigate the difficulty of dealing with large amounts of state?

Micah Wylde, CEO of Arroyo

Announcing Arroyo 0.9.0

announcement

Arroyo 0.9 is now available! This release introduces async UDFs, which allow users to use databases, services, and models from within their pipelines. It also brings support for joining update tables, more control over bad data handling, a redesigned connection profile editor, and more.

Micah Wylde, CEO of Arroyo

2023 Year in Review

arroyo

It's been a big year for Arroyo! We launched the company, open-sourced the engine, and did 8 releases. Here's a look back at our very exciting 2023.

Micah Wylde, CEO of Arroyo

Using Kafka with Rust

explainer

Apache Kafka is a distributed log that's a great fit for streaming applications, microservice architectures, and more. In this post, we will learn how to use Kafka with applications written in Rust.

Micah Wylde, CEO of Arroyo

Parsing custom formats with UDFs

tutorial

User-defined functions (UDFs) allow users to extend Arroyo with new functionality by writing Rust code. In this tutorial, we'll walk through how to use UDFs to parse a custom data format: the Common Log Format used by Apache HTTP and other web servers.

Micah Wylde, CEO of Arroyo

Announcing Arroyo 0.8.0

announcement

Arroyo 0.8 is now available, with a new FileSystem source, Delta Lake sink, Redis sink, Avro support, global UDFs, and more.

Micah Wylde, CEO of Arroyo

What is streaming SQL?

explainer

What does it mean to apply SQL—a batch-oriented query language—to streams of data that are never complete? Read on for a deep dive into streaming SQL in Arroyo and other engines.

Micah Wylde, CEO of Arroyo

Running Arroyo on EKS

tutorial

The easiest way to run a highly-scaled production Arroyo cluster is on Kubernetes. Setting up a Kubernetes cluster used to be a daunting task, but services like Amazon EKS have made it much easier. This post will walk through how to set up an EKS cluster and deploy Arroyo to it.

Micah Wylde, CEO of Arroyo

Can you replace Prometheus with a stream processor?

arroyo

Recent versions of Arroyo have added support for HTTP sources and for treating individual lines of a response as streaming messages. So I wondered: could we use Arroyo to directly process metrics?

Micah Wylde, CEO of Arroyo

Announcing Arroyo 0.7.0

announcement

Arroyo 0.7.0 is now available, with custom partitioning for S3 writes, message framing, unnest, union, state compaction, and more.

Micah Wylde, CEO of Arroyo

Rust is the best language for data infra

engineering

Arroyo is written in Rust, a modern systems language. We think it's become the best choice for writing high-performance systems like databases and stream processing engines. Read on for why we chose Rust, and what we've learned along the way.

Micah Wylde, CEO of Arroyo

Announcing Arroyo 0.6.0

announcement

Arroyo 0.6 brings support for Google Cloud Storage, user-defined aggregate functions, SQL correctness tests, and more.

Micah Wylde, CEO of Arroyo

Streaming data to S3 is surprisingly hard

engineering

Arroyo 0.5 added the FileSystem connector, a high-performance, transactional sink that lets you write pipeline outputs to file systems and object stores like S3—and makes Arroyo a great tool for performing real-time ETL. This turns out to be surprisingly tricky to do well. Read on for a deep dive into how Arroyo solved this with a new checkpointing strategy and some clever Parquet tricks.

Jackson Newhouse, CTO of Arroyo

Real-time Web Analytics with Arroyo

tutorial

Working with real-time data can be daunting. We're working to solve that by building a new stream processing engine that's easy enough for anyone to use. So how easy is it to solve real-world streaming problems with Arroyo today? I decided to find out.

Micah Wylde, CEO of Arroyo

Arroyo + WarpStream

arroyo

At Arroyo we're building a new stream processing engine to replace legacy Java systems like Flink and KSQL. So we were excited to see a project that's doing the same thing for Kafka. It's called WarpStream, and they're building a replacement for Kafka that's backed directly by S3.

Micah Wylde, CEO of Arroyo

Announcing Arroyo 0.5.0

announcement

Release 0.5 of Arroyo is all about connectors. We've added a high-performance transactional FileSystem sink, exactly-once Kafka support, a Kinesis connector, and more.

Micah Wylde, CEO of Arroyo

Why Not Flink?

arroyo

Flink is a mature and powerful streaming engine. So why didn't we build Arroyo on top of it?

Micah Wylde, CEO of Arroyo

Announcing Arroyo 0.4.0

announcement

With the 0.4.0 release we've added Debezium support and a new REST API, and made the process of contributing connectors much easier.

Micah Wylde, CEO of Arroyo

Announcing Arroyo 0.3.0

announcement

The Arroyo 0.3.0 release adds UDFs, DDL statements, custom event time and watermarks, web UI improvements, and more.

Micah Wylde, CEO of Arroyo

End-to-end SQL tests with Rust proc macros

engineering

Testing a complex system like Arroyo is hard. But with Rust's powerful proc macros, we're able to easily produce end-to-end tests of our SQL features.

Jackson Newhouse, CTO of Arroyo

Announcing Arroyo 0.2.0

announcement

Arroyo 0.2.0 brings native Kubernetes support, new SQL features, and many other fixes and improvements.

Micah Wylde, CEO of Arroyo

Open-sourcing the Arroyo Streaming Engine

announcement

After launching our state-of-the-art cloud real-time data processor, we're opening up the technology that powers it: the Arroyo streaming engine.

Micah Wylde, CEO of Arroyo

10x faster sliding windows: how our Rust streaming engine beats Flink

engineering

Arroyo's Rust-based stream processing engine outperforms Apache Flink on sliding window queries, thanks to efficient algorithms that maintain near-constant throughput even with smaller slides and larger windows.

Jackson Newhouse, CTO of Arroyo