Streamlio Blog

Perspectives, tutorials, and technical deep dives on technology for streaming data

Access Patterns and Tiered Storage in Apache Pulsar

October 15, 2018

Ivan Kelly

In this post we look at how Apache Pulsar handles the common access patterns of messaging and how that enables tiered storage.

Configuring Apache Pulsar Tiered Storage with Amazon S3

October 12, 2018

Jia Zhai, Ivan Kelly

This post walks through the steps in configuring Apache Pulsar tiered storage to offload data to Amazon S3.

Simple Event Processing with Apache Pulsar Functions

October 9, 2018

David Kjerrumgaard

Part 2 of our blog series on event-driven architecture covers simple event processing techniques and best practices using Apache Pulsar Functions

Event-Driven Architecture Using Streamlio

October 2, 2018

David Kjerrumgaard

Part 1 of our Event-Driven Architecture Blog series introduces a reference event-driven architecture based on technologies within the Streamlio platform.

A Major Step Forward for Apache Pulsar: New Top-Level Apache Project

September 25, 2018

Matteo Merli, Karthik Ramasamy

A look at the journey of Apache Pulsar, from solving problems at Yahoo! to a new solution for streaming and messaging.

Creating Work Queues with Apache Kafka and Apache Pulsar

August 29, 2018

Jesse Anderson

Jesse Anderson of Big Data Institute takes a look at use cases for work queues and a comparison of using Apache Pulsar and Apache Kafka to implement work queues.

Introducing Pulsar IO

August 20, 2018

Jerry Peng

Jerry provides an overview of Pulsar IO, a framework for moving data into and out of Apache Pulsar, explaining how it works and how to build connectors using Pulsar IO.

Building Data-Driven Applications with Apache Pulsar at STICORP

August 14, 2018

Daniel Ferreira Jorge

This guest post by Daniel Ferreira Jorge of STICORP explains why he and his team chose Apache Pulsar to support their data application.

Tiered Storage in Apache Pulsar

August 2, 2018

Ivan Kelly

A closer look at tiered storage, a new feature in Apache Pulsar 2.1 that allows Pulsar to offload cold data to low-cost external storage systems.

Solving the Challenges of IoT Analytics

July 12, 2018

Jon Bock

Success with IoT analytics projects can be elusive without the right technology solution for IoT data pipeline and processing. A look at how better technology can help.

Pulsar topic compaction

June 6, 2018

Ivan Kelly

A technical look at how topic compaction works in Apache Pulsar to provide consumers an efficient way to read current state for message keys.

The Pulsar schema registry

June 6, 2018

Dave Rusek

A dive into the new–and better–way for producers and consumers to keep current on the schema of data flowing through Pulsar.

Apache Pulsar reaches 2.0

May 31, 2018

Matteo Merli

Streamlio welcomes with open arms the most ambitious Pulsar release yet, replete with essential features and enhancements

Using Apache Pulsar as a message queue

May 17, 2018

Luc Perkins

Though typically seen as a pub-sub messaging system, Apache Pulsar’s powerful durable storage capabilities make it ideal as a message queue

Apache Heron on Nomad

April 20, 2018

Jerry Peng

A look at Apache Heron’s support for HashiCorp Nomad as a scheduler for Heron, and how to get up and running with Apache Heron + Nomad

A developer's introduction to Apache Pulsar Functions

March 26, 2018

Sanjeev Kulkarni

Pulsar Functions is a stream processing capability for Apache Pulsar which makes it very easy for developers to write applications that need fast data processing.

The extensibility of OpenMessaging benchmarks

March 16, 2018

Matteo Merli, Luc Perkins

Creating and running a new cross-platform messaging benchmark workload from scratch

Apache Pulsar at Yahoo!

March 9, 2018

Christian Hasker

At the recent Apache Pulsar meetup, Yahoo’s Joe Francis talked about how they use Pulsar. See what was discussed and watch the videos from the event.

Introducing Pulsar Functions

March 6, 2018

Jerry Peng, Sanjeev Kulkarni, Sijie Guo

An introduction to Pulsar Functions, the lightweight stream-native processing capability in Apache Pulsar

How Apache Pulsar ensures no messages lost and no messages duplicated

February 23, 2018

Ivan Kelly

In this post we examine different delivery guarantees and explain how they are provided by Apache Pulsar, the messaging system at the heart of Streamlio’s intelligent platform.

Friend of the Heron Community: an interview with Joshua Fischer

February 22, 2018

Christian Hasker

In this post we talk with Josh Fischer of 1904Labs about how he got involved with the Heron community.

Effectively-once semantics in Apache Pulsar

February 20, 2018

Matteo Merli

“Exactly-once” is a controversial term in the messaging landscape. In this post we’ll offer a detailed look at effectively-once delivery semantics in Apache Pulsar and how this is achieved without sacrificing performance.

Microservices: from system of record to system of action

February 13, 2018

Christian Hasker, JJ Jeyappragash

This is the next post in our series about microservices and the enterprise. Microservices expert JJ Jeyappragash discusses the evolution of data in enterprise systems from systems of record to systems of action, as well as the importance of both the data layer and event layer for microservices-based applications.

How to add and remove nodes in Apache Pulsar

January 23, 2018

Sijie Guo

Adding and removing nodes is a common operation when managing a distributed system. In this post and accompanying video we’ll show how easy it is to add and remove nodes in Apache Pulsar without the data rebalancing that other messaging systems, such as Apache Kafka, require.

An introduction to unified queuing and streaming

January 18, 2018

Christian Hasker, Sanjeev Kulkarni

In the past, architects and developers have needed to learn and deploy separate systems for queuing and streaming. Now, thanks to Apache Pulsar, the next-generation distributed pub-sub system developed and open sourced by Yahoo, there is a single system that can span both use cases, making it easy for enterprises to both develop and operate at scale. In this post and videos Sanjeev Kulkarni discusses the difference between queuing and streaming, and provides use case examples of when a developer would choose a work queue vs a stream.

Microservices: perceptions and myths

January 10, 2018

Christian Hasker, JJ Jeyappragash

Let’s face it: microservices are hot! Enterprises are increasingly looking to break up monolithic applications and adopt microservice-based architectures à la Netflix, Google, and Twitter. But there remain a lot of misperceptions and knowledge gaps to fill in. In this short post (with accompanying videos) JJ Jeyappragash explains what microservices are, offers some guidelines on how to do microservices right, and talks about security from a microservice perspective.

How to migrate Apache Kafka applications to Apache Pulsar

January 8, 2018

Sijie Guo

In this post and accompanying video tutorial, Streamlio engineer and co-founder Sijie Guo shows you how to migrate an Apache Kafka application to Apache Pulsar with no code changes using the Kafka API wrapper.

Why you should adopt a multi-tenant solution for real-time applications

January 4, 2018

Luc Perkins

Single-tenant systems simply don’t cut it in large enterprises with exacting requirements and dozens of teams working on massive applications. In this post I’ll talk about what multi-tenancy is, why it’s so important, and why Apache Pulsar provides an extremely compelling model of multi-tenancy.

How Apache Pulsar saved the holidays: A story for the most magical time of the year

December 18, 2017

Sanjeev Kulkarni

In this festive time of year, pour yourself a glass of eggnog, read this post, watch the videos, and learn how to code a queuing and streaming application using Apache Pulsar.

Scaling out Total Order Atomic Broadcast with Apache BookKeeper

December 14, 2017

Ivan Kelly

Total Order Atomic Broadcast (TOAB) is an essential property of any distributed system that wants to provide effectively-once semantics. Many systems can serve as a foundation for TOAB but here we’ll see why BookKeeper is by far the most scalable.

Introducing Heron Streamlets

December 12, 2017

Jerry Peng, Luc Perkins

A new higher-level API for creating Heron topologies.

Comparing Pulsar and Kafka: how a segment-based architecture delivers better performance, scalability, and resilience

December 5, 2017

Sijie Guo

In this second post in a series, Sijie Guo compares Apache Pulsar and Apache Kafka’s storage and failure recovery models.

Comparing Pulsar and Kafka: unified queuing and streaming

December 1, 2017

Sijie Guo

In this first post in a series, Sijie Guo compares Apache Pulsar and Apache Kafka’s messaging consumption, queuing, and streaming models.

The Heron Stream Processing Engine on Google Kubernetes Engine

November 28, 2017

Chris Kellogg

How to deploy a Heron topology on a GKE cluster powered by Kubernetes

Low-latency streaming with Heron on InfiniBand (part 2 of 2)

November 8, 2017

Supun Kamburugamuve

Examining experimental results from integrating the Heron stream processing engine with the InfiniBand network architecture

Why Apache BookKeeper? Part 2

November 6, 2017

Sijie Guo

In this second of two blog posts on Apache BookKeeper, Sijie Guo, co-founder of Streamlio, talks about the features that provide I/O isolation, data distribution, scalability, and operability.

Cursors in Apache Pulsar

November 3, 2017

Ivan Kelly

In this post and accompanying video, Ivan Kelly, PMC member for Apache BookKeeper and engineer at Streamlio, discusses how the Apache Pulsar messaging system uses durably stored cursors (an enhanced form of offset) to provide fault-tolerant messaging.

Strata Presentation: “Messaging, storage, or both” (Part 1)

October 24, 2017

Matteo Merli

In this first video in a 3-part series of a presentation given at Strata last month, Streamlio engineer Matteo Merli gives an overview of the Pulsar messaging system and what distinguishes it from other messaging systems from an enterprise standpoint, focusing on Pulsar’s scalability, multi-tenancy features, and geo-replication capabilities.

Globally distributed Apache Pulsar quickstart

October 23, 2017

Ivan Kelly

Deploy a multi-region Pulsar installation on Google Cloud Platform in just a few commands using an Ansible playbook.

Low-latency streaming with Heron on InfiniBand (part 1 of 2)

October 20, 2017

Supun Kamburugamuve

Taking the network layer beyond Ethernet can yield major gains in distributed processing systems

Exactly once is NOT exactly the same

October 13, 2017

Jerry Peng

In this post, we will take a look at different processing semantics for stream processing engines. We will examine what exactly-once processing semantics actually guarantees and the differences in the implementations of exactly-once processing semantics. Here at Streamlio, we have standardized on ‘effectively-once’ as our terminology, and we’ll explain why.

Geo-replication in Apache Pulsar, part 2: patterns and practices

October 10, 2017

Sijie Guo

An overview of common multi-datacenter geo-replication patterns for the Pulsar messaging system (full-mesh, active-active, active-standby, and aggregation) as well as recommended practices for system monitoring, capacity planning, and throttling.

Announcing Support for OpenMessaging

October 9, 2017

Matteo Merli, Lewis Kaneshiro

Streamlio has joined with industry leaders to announce OpenMessaging, a new project within the Linux Foundation to deliver standards-based messaging interoperability to accelerate data-driven infrastructure.

Helping Product Managers deliver real-time product innovations

October 5, 2017

Lewis Kaneshiro

Enterprises rely on product managers to come up with ideas and turn those ideas into revenue-generating applications to spur the next leg of growth and to create shareholder value. That’s a lot of pressure! It’s more essential now than ever before that product managers have the correct software stack to realize those ideas, which, with the world moving toward ever-faster performance and response times, means embracing an end-to-end real-time solution.

Heron: the extensible streaming engine

October 2, 2017

Karthik Ramasamy

This post looks at the architecture of Heron, the real-time stream processing engine developed at and open-sourced by Twitter. The various modules in the system communicate with one another using well-defined communication protocols. As a consequence, Heron is highly extensible and allows the application developer, system administrator, or Heron contributor to create a new implementation of a specific module and plug it into the system without affecting other modules or the communication mechanisms between them.

Geo-replication in Apache Pulsar, part 1: concepts and features

September 27, 2017

Sijie Guo

A look at another enterprise-grade feature of Apache Pulsar: geo-replication. It is essential for enterprises today to have disaster avoidance and recovery strategies in place. Whereas other messaging systems can only replicate between two datacenters, Pulsar can scale out to as many as are needed. We will also look at the two types of geo-replication supported in Pulsar, synchronous and asynchronous, and when to use one vs the other.

Multi-tenant messaging with Apache Pulsar

September 26, 2017

Matteo Merli, Sijie Guo

How Apache Pulsar provides a messaging system fit for large, multifaceted enterprises

Messaging, storage, or both?

September 25, 2017

Sijie Guo

The real-time journey of Apache Pulsar, DistributedLog, and BookKeeper

Experiences porting Heron to Kubernetes

September 18, 2017

John Crawford from ndustrial.io

Containerization makes all the difference

Why choose a unified real-time platform?

September 14, 2017

Luc Perkins

Why durable messaging, compute, and stream storage need to be understood as an inseparable whole

Why Apache BookKeeper? Part 1: consistency, durability, availability

September 12, 2017

Sijie Guo

An in-depth overview of some of the compelling features and guarantees provided by BookKeeper

Introduction to Apache BookKeeper

September 8, 2017

Sijie Guo

A scalable, fault-tolerant, and low-latency log storage service optimized for real-time workloads

Why Heron? Part 2

September 5, 2017

Karthik Ramasamy

Continuation of the key compelling and enterprise-grade features provided by Heron, the open source stream processing system originally developed by Twitter.

Why Heron? Part 1

September 1, 2017

Karthik Ramasamy

An overview of the key compelling and enterprise-grade features that Heron provides as open source.

Why Apache Pulsar? Part 2

August 28, 2017

Matteo Merli, Karthik Ramasamy

An overview of the key enterprise-grade features that Pulsar provides out of the box.

Why Apache Pulsar? Part 1

August 25, 2017

Matteo Merli, Karthik Ramasamy

An overview of the key enterprise-grade features that Pulsar provides out of the box.

Introduction to the Apache Pulsar pub-sub messaging platform

August 22, 2017

Matteo Merli, Karthik Ramasamy

A high-level overview of Pulsar, a recently open-sourced pub-sub messaging platform built for massive scale and zero data loss.

Announcing Streamlio

August 15, 2017

Lewis Kaneshiro

We’re excited to announce the launch of Streamlio, with Series A funding from Lightspeed Venture Partners. We’re building an enterprise-grade, unified, end-to-end real-time platform, created by the co-creators of best-of-breed open source technologies proven at Twitter and Yahoo.

Introduction to Heron

August 8, 2017

Karthik Ramasamy

Heron is a real-time stream processing engine first deployed at Twitter. This post provides an overview of Heron for architects and developers interested in learning about its architecture and terminology.

Heron going exactly-once

May 5, 2017

Sanjeev Kulkarni

Data processing guarantees in streaming systems fall into three categories: at-most-once, at-least-once, and exactly-once. This blog post outlines the requirements and previews exactly-once guarantees in Heron, the next-generation real-time stream processing engine.