A Major Step Forward for Apache Pulsar: New Top-Level Apache Project

September 25, 2018

Matteo Merli

Karthik Ramasamy

The news that Apache Pulsar has graduated from the Apache Incubator to become a top-level project at The Apache Software Foundation marks a milestone in Pulsar’s journey.

When the team at Yahoo! and I (Matteo) started work a number of years ago on what would ultimately become Apache Pulsar, we didn’t set out to create a new open source platform. We didn’t even intend to build something new at all. We set out to solve a problem of our own, one that we could see was only going to get worse.

Our group was responsible for the messaging services relied on by a broad range of applications at Yahoo!, everything from Mail to Finance, Flickr, the Yahoo! ad serving systems, and more. Those applications looked to our messaging services for reliable, low-latency access to data. As the team providing that internal service, we needed it to be scalable to keep up with growing amounts of data, resilient so that always-on applications could rely on it, manageable so that our operations team could maintain their sanity, and, last but not least, able to never, ever lose data.

The problem was that our existing technologies, a combination of custom-developed and open source software, were starting to fall behind on those requirements as we grew. We looked at the latest and greatest options on the market at the time, both ones that you’ve heard of and ones that you probably haven’t; after all, there is no shortage of messaging technologies, both open source and proprietary. However, we couldn’t find anything that provided the scale, performance, and features we required: some couldn’t scale to the number of topics we needed, others required too much expensive hardware to meet our throughput needs, and others put management burdens on our operations team when it came to resiliency, replication, scaling, multiple sharded clusters, and so on. In short, the available messaging options had become too complex, fragile, and rigid for what we needed.

That’s what led us to create Pulsar. Starting from a clean slate and building on both our own experiences and learnings from the existing technologies, we were able to build a new messaging solution, based on a different architecture, that could meet those needs. You can read more about the technology details that went into that on the Yahoo! Blog.

After deploying and operating Pulsar at Yahoo! for multiple years, we realized that there was a broader community of people who needed the same things that Yahoo! needed. That led us to contribute Pulsar to the Apache Incubator, the start of the process that led to Pulsar’s graduation to a top-level Apache project today. We saw that making Pulsar an open source project would not only enable broader adoption, but also accelerate innovation based on Pulsar’s core architecture.

That realization is also what led both of us and the other founders to create Streamlio. Working together with the growing community of Apache Pulsar contributors, we’ve helped to drive the rapid innovation in Pulsar since it first joined the Apache Incubator. Pulsar has evolved to become more than just a messaging system; it has become a modern, integrated platform for data in motion. Building on its unique architecture, Pulsar not only provides publish-subscribe messaging and queuing but also natively provides stream processing and stream storage. That meets the needs of modern applications, which need not only a way to connect and move data (the focus of traditional messaging solutions) but also a way to process and transform that data as it moves from sources to applications, among applications, and to users.

Having that set of capabilities in a single, scalable, high-performance solution opens up a wide array of possibilities. From building a data fabric that connects data from the edge to the cloud to the datacenter on a common platform, to enabling real-time interactions with customers and partners, to handling demanding low-latency data processing and analytics on market data and transactions, Apache Pulsar is proving itself in companies both big and small. We’re excited to keep working with the Apache community to innovate and drive Pulsar forward.