Perspectives, tutorials, and technical deep dives on technology for streaming data
June 6, 2018
A technical look at how topic compaction works in Apache Pulsar to provide consumers an efficient way to read current state for message keys.
June 6, 2018
A dive into the new, and better, way for producers and consumers to keep current on the schema of data flowing through Pulsar.
May 31, 2018
Streamlio welcomes with open arms the most ambitious Pulsar release yet, replete with essential features and enhancements
May 17, 2018
Though typically seen as a pub-sub messaging system, Apache Pulsar’s powerful durable storage capabilities make it ideal as a message queue
April 20, 2018
A look at Apache Heron’s support for HashiCorp Nomad as a scheduler, and how to get up and running with Apache Heron + Nomad
March 26, 2018
Pulsar Functions is a stream processing capability for Apache Pulsar which makes it very easy for developers to write applications that need fast data processing.
March 16, 2018
Creating and running a new cross-platform messaging benchmark workload from scratch
March 9, 2018
At the recent Apache Pulsar meetup, Yahoo’s Joe Francis talked about how they use Pulsar. See what was discussed and watch the videos from the event.
March 6, 2018
An introduction to Pulsar Functions, the lightweight stream-native processing capability in Apache Pulsar
February 23, 2018
In this post we examine different delivery guarantees and explain how they are provided by Apache Pulsar (incubating), the messaging system at the heart of Streamlio’s intelligent platform.
February 22, 2018
In this post we talk with Josh Fischer of 1904Labs about how he got involved with the Heron community.
February 20, 2018
“Exactly-once” is a controversial term in the messaging landscape. In this post we’ll offer a detailed look at effectively-once delivery semantics in Apache Pulsar and how this is achieved without sacrificing performance.
February 13, 2018
This is the next post in our series about microservices and the enterprise. Microservices expert JJ Jeyappragash discusses the evolution of data in enterprise systems from systems of record to systems of action, as well as the importance of both the data layer and event layer for microservices-based applications.
January 23, 2018
Adding and removing nodes is a common operation when managing a distributed system. In this post and accompanying video we’ll show how easy it is to add and remove nodes in Apache Pulsar without the data rebalancing that other messaging systems, such as Apache Kafka, require.
January 18, 2018
In the past, architects and developers have needed to learn and deploy separate systems for queuing and streaming. Now, thanks to Apache Pulsar, the next-generation distributed pub-sub system developed and open sourced by Yahoo, there is a single system that can span both use cases, making it easy for enterprises to both develop and operate at scale. In this post and accompanying videos, Sanjeev Kulkarni discusses the difference between queuing and streaming, and provides use case examples of when a developer would choose a work queue vs a stream.
January 10, 2018
Let’s face it: microservices are hot! Enterprises are increasingly looking to break up monolithic applications and adopt microservice-based architectures à la Netflix, Google, and Twitter. But there remain a lot of misperceptions and knowledge gaps to fill in. In this short post (with accompanying videos) JJ Jeyappragash explains what microservices are, offers some guidelines on how to do microservices right, and talks about security from a microservice perspective.
January 8, 2018
In this post and accompanying video tutorial, Streamlio engineer and co-founder Sijie Guo shows you how to migrate an Apache Kafka application to Apache Pulsar with no code changes using the Kafka API wrapper.
January 4, 2018
Single-tenant systems simply don’t cut it in large enterprises with exacting requirements and dozens of teams working on massive applications. In this post I’ll talk about what multi-tenancy is, why it’s so important, and why Apache Pulsar provides an extremely compelling model of multi-tenancy.
December 18, 2017
In this festive time of year, pour yourself a glass of eggnog, read this post, watch the videos, and learn how to code a queuing and streaming application using Apache Pulsar.
December 14, 2017
Total Order Atomic Broadcast (TOAB) is an essential property of any distributed system that wants to provide effectively-once semantics. Many systems can serve as a foundation for TOAB, but here we’ll see why BookKeeper is by far the most scalable.
December 12, 2017
A new higher-level API for creating Heron topologies.
December 5, 2017
In this second post in a series, Sijie Guo compares Apache Pulsar and Apache Kafka’s storage and failure recovery models.
December 1, 2017
In this first post in a series, Sijie Guo compares Apache Pulsar and Apache Kafka’s messaging consumption, queuing, and streaming models.
November 28, 2017
How to deploy a Heron topology on a GKE cluster powered by Kubernetes
November 8, 2017
Examining experimental results from integrating the Heron stream processing engine with the InfiniBand network architecture
November 6, 2017
In this second of two blog posts on Apache BookKeeper, Sijie Guo, co-founder of Streamlio, talks about the features that provide I/O isolation, data distribution, scalability, and operability.
November 3, 2017
In this post and accompanying video, Ivan Kelly, PMC member for Apache BookKeeper and engineer at Streamlio, discusses how the Apache Pulsar messaging system uses durably stored cursors (an enhanced form of offset) to provide fault-tolerant messaging.
October 24, 2017
In this first video in a 3-part series of a presentation given at Strata last month, Streamlio engineer Matteo Merli gives an overview of the Pulsar messaging system and what distinguishes it from other messaging systems from an enterprise standpoint, focusing on Pulsar’s scalability, multi-tenancy features, and geo-replication capabilities.
October 23, 2017
Deploy a multi-region Pulsar installation on Google Cloud Platform in just a few commands using an Ansible playbook.
October 20, 2017
Taking the network layer beyond Ethernet can yield major gains in distributed processing systems
October 13, 2017
In this post, we will take a look at different processing semantics for stream processing engines. We will examine what exactly-once processing semantics actually guarantees and the differences in the implementations of exactly-once processing semantics. Here at Streamlio, we have standardized on ‘effectively-once’ as our terminology, and we’ll explain why.
October 10, 2017
An overview of common multi-datacenter geo-replication patterns for the Pulsar messaging system (full-mesh, active-active, active-standby, and aggregation) as well as recommended practices for system monitoring, capacity planning, and throttling.
October 9, 2017
Streamlio has joined with industry leaders to announce OpenMessaging, a new project within the Linux Foundation to deliver standards-based messaging interoperability to accelerate data-driven infrastructure.
October 5, 2017
Enterprises rely on product managers to come up with ideas and turn those ideas into revenue-generating applications to spur the next leg of growth and to create shareholder value. That’s a lot of pressure! It’s more essential now than ever before that product managers have the correct software stack to realize those ideas, which, with the world moving toward ever-faster performance and response times, means embracing an end-to-end real-time solution.
October 2, 2017
This post looks at the architecture of Heron, the real-time stream processing engine developed at and open-sourced by Twitter. The various modules in the system communicate with one another using well-defined communication protocols. As a consequence, Heron is highly extensible and allows the application developer, system administrator, or Heron contributor to create a new implementation of a specific module and plug it into the system without affecting other modules or the communication mechanisms between them.
September 27, 2017
A look at another enterprise-grade feature of Apache Pulsar: geo-replication. It is essential for enterprises today to have disaster avoidance and recovery strategies in place. Whereas other messaging systems can only replicate between two datacenters, Pulsar can scale out to as many as are needed. We will also look at the two types of geo-replication supported in Pulsar, synchronous and asynchronous, and when to use one vs the other.
September 26, 2017
How Apache Pulsar provides a messaging system fit for large, multifaceted enterprises
September 25, 2017
The real-time journey of Apache Pulsar, DistributedLog, and BookKeeper
September 18, 2017
Containerization makes all the difference
September 14, 2017
Why durable messaging, compute, and stream storage need to be understood as an inseparable whole
September 12, 2017
An in-depth overview of some of the compelling features and guarantees provided by BookKeeper
September 8, 2017
A scalable, fault-tolerant, and low-latency log storage service optimized for real-time workloads
September 5, 2017
A continued look at the compelling, enterprise-grade features provided by Heron, the open source stream processing system originally developed by Twitter.
September 1, 2017
An overview of the compelling, enterprise-grade features that open source Heron provides.
August 28, 2017
An overview of the key enterprise-grade features that Pulsar provides out of the box.
August 25, 2017
An overview of the key enterprise-grade features that Pulsar provides out of the box.
August 22, 2017
A high-level overview of Pulsar, a recently open-sourced pub-sub messaging platform built for massive scale and zero data loss.
August 15, 2017
We’re excited to announce the launch of Streamlio, with Series A funding from Lightspeed Venture Partners. As the co-creators of best-of-breed open source technologies proven at Twitter and Yahoo, we’re building an enterprise-grade, unified, end-to-end real-time platform.
August 8, 2017
Heron is a real-time stream processing engine first deployed at Twitter. This post provides an overview of Heron for architects and developers interested in learning about its architecture and terminology.
May 5, 2017
Data processing guarantees in streaming systems fall into three categories: at-most-once, at-least-once, and exactly-once. This blog post outlines the requirements for exactly-once guarantees and previews them in Heron, the next-generation real-time stream processing engine.