Streamlio Blog

Perspectives, tutorials, and technical deep dives on technology for streaming data

Blog Library

September 16, 2019

Jon Bock

Introducing Streamlio Cloud Preview

Streamlio is happy to announce Streamlio Cloud Preview, the easiest way to try out the capabilities of Streamlio Cloud and Apache Pulsar.

August 26, 2019

Jon Bock

Streamlio Cloud Awarded Silver in Network Products Guide’s 2019 IT World Awards

Streamlio continues to be recognized for innovation. In this post we discuss the most recent award for Streamlio Cloud from Network Products Guide.

July 18, 2019

Jerry Peng

Sentiment Analysis of Tweets using Apache Pulsar

A walk-through on how to use Pulsar and Pulsar Functions to create an application to perform sentiment analysis on Twitter streams.

June 17, 2019

Jerry Peng

Debugging Pulsar Functions in Java

How to test and debug Pulsar Functions in your local environment.

May 29, 2019

Karthik Ramasamy, David Kjerrumgaard

Introducing Streamlio Cloud, Apache Pulsar® as a Service

The simplest, fastest, and most cost-effective way to run Apache Pulsar in the cloud.

May 3, 2019

Karthik Ramasamy

Streamlio Awarded Tech Startup of the Year Silver Award in American Business Awards

A reflection on Streamlio’s journey from idea to innovation in fast data to receiving a Silver Stevie Award.

March 19, 2019

David Kjerrumgaard

Announcing NiFi Processors for Apache Pulsar

An introduction to the newly released NiFi processors for Apache Pulsar, and how this new functionality enables ingesting data from multiple source systems into Apache Pulsar

February 13, 2019

Jesse Anderson

Saving Money with Apache Pulsar Tiered Storage

A comparison of event and message retention in Apache Kafka and Apache Pulsar and how Pulsar’s tiered storage feature provides significant cost savings.

December 19, 2018

Sijie Guo

Apache Pulsar Architecture: Designing for Streaming Performance and Scalability

A look at Pulsar’s multi-layer architecture and how it is fundamental to making Pulsar ideal for the performance, scalability and availability requirements of streaming use cases

November 13, 2018

David Kjerrumgaard

Real-Time Analytics with Pulsar Functions

Part 4 of our event-driven architecture blog series covers real-time analytics capabilities with Apache Pulsar Functions

November 7, 2018

Ivan Kelly

Comparing LogDevice and Apache Pulsar

In this post we take a technical look at how LogDevice’s distributed log compares to the distributed log used in Apache Pulsar.

October 31, 2018

Jerry Peng

Querying Data Streams with Apache Pulsar SQL

A look at the motivation, architecture and use cases for SQL querying of streaming event data in Apache Pulsar.

October 25, 2018

David Kjerrumgaard

Event Processing Design Patterns with Pulsar Functions

Part 3 of our Event-Driven Architecture series covers common event processing design patterns with Apache Pulsar Functions.

October 15, 2018

Ivan Kelly

Access Patterns and Tiered Storage in Apache Pulsar

In this post we look at how Apache Pulsar handles the common access patterns of messaging and how that enables tiered storage.

October 12, 2018

Jia Zhai, Ivan Kelly

Configuring Apache Pulsar Tiered Storage with Amazon S3

This post walks through the steps in configuring Apache Pulsar tiered storage to offload data to Amazon S3.

October 9, 2018

David Kjerrumgaard

Simple Event Processing with Apache Pulsar Functions

Part 2 of our blog series on event-driven architecture covers simple event processing techniques and best practices using Apache Pulsar Functions

October 2, 2018

David Kjerrumgaard

Event-Driven Architecture Using Streamlio

Part 1 of our Event-Driven Architecture Blog series introduces a reference event-driven architecture based on technologies within the Streamlio platform.

September 25, 2018

Matteo Merli, Karthik Ramasamy

A Major Step Forward for Apache Pulsar: New Top-Level Apache Project

A look at the journey of Apache Pulsar, from solving problems at Yahoo! to a new solution for streaming and messaging.

August 29, 2018

Jesse Anderson

Creating Work Queues with Apache Kafka and Apache Pulsar

Jesse Anderson of Big Data Institute takes a look at use cases for work queues and a comparison of using Apache Pulsar and Apache Kafka to implement work queues.

August 20, 2018

Jerry Peng

Introducing Pulsar IO

Jerry provides an overview of Pulsar IO, a framework for moving data into and out of Apache Pulsar, explaining how it works and how to build connectors using Pulsar IO.

August 14, 2018

Daniel Ferreira Jorge

Building Data-Driven Applications with Apache Pulsar at STICORP

This guest post by Daniel Ferreira Jorge of STICORP explains why he and his team chose Apache Pulsar to support their data application.

August 2, 2018

Ivan Kelly

Tiered Storage in Apache Pulsar

A closer look at tiered storage, a new feature in Apache Pulsar 2.1 that allows Pulsar to offload cold data to low-cost external storage systems.

July 12, 2018

Jon Bock

Solving the Challenges of IoT Analytics

Success with IoT analytics projects can be elusive without the right technology solution for IoT data pipeline and processing. A look at how better technology can help.

June 6, 2018

Ivan Kelly

Pulsar topic compaction

A technical look at how topic compaction works in Apache Pulsar to provide consumers an efficient way to read current state for message keys.

June 6, 2018

Dave Rusek

The Pulsar schema registry

A dive into the new–and better–way for producers and consumers to keep current on the schema of data flowing through Pulsar.

May 31, 2018

Matteo Merli

Apache Pulsar reaches 2.0

Streamlio welcomes with open arms the most ambitious Pulsar release yet, replete with essential features and enhancements

May 17, 2018

Luc Perkins

Using Apache Pulsar as a message queue

Though typically seen as a pub-sub messaging system, Apache Pulsar’s powerful durable storage capabilities make it ideal as a message queue

April 20, 2018

Jerry Peng

Apache Heron on Nomad

A look at Apache Heron’s support for HashiCorp Nomad as a scheduler for Heron, and how to get up and running with Apache Heron + Nomad

March 26, 2018

Sanjeev Kulkarni

A developer's introduction to Apache Pulsar Functions

Pulsar Functions is a stream processing capability for Apache Pulsar which makes it very easy for developers to write applications that need fast data processing.

March 16, 2018

Matteo Merli, Luc Perkins

The extensibility of OpenMessaging benchmarks

Creating and running a new cross-platform messaging benchmark workload from scratch

March 9, 2018

Christian Hasker

Apache Pulsar at Yahoo!

At the recent Apache Pulsar meetup, Yahoo’s Joe Francis talked about how they use Pulsar. See what was discussed and watch the videos from the event.

March 6, 2018

Jerry Peng, Sanjeev Kulkarni, Sijie Guo

Introducing Pulsar Functions

An introduction to Pulsar Functions, the lightweight stream-native processing capability in Apache Pulsar

February 23, 2018

Ivan Kelly

How Apache Pulsar ensures no messages lost and no messages duplicated

In this post we examine different delivery guarantees and explain how they are provided by Apache Pulsar, the messaging system at the heart of Streamlio’s intelligent platform.

February 22, 2018

Christian Hasker

Friend of the Heron Community: an interview with Joshua Fischer

In this post we talk with Josh Fischer of 1904Labs about how he got involved with the Heron community.

February 20, 2018

Matteo Merli

Effectively-once semantics in Apache Pulsar

“Exactly-once” is a controversial term in the messaging landscape. In this post we’ll offer a detailed look at effectively-once delivery semantics in Apache Pulsar and how this is achieved without sacrificing performance.

February 13, 2018

Christian Hasker, JJ Jeyappragash

Microservices: from system of record to system of action

This is the next post in our series about microservices and the enterprise. Microservices expert JJ Jeyappragash discusses the evolution of data in enterprise systems from systems of record to systems of action, as well as the importance of both the data layer and event layer for microservices-based applications.

January 23, 2018

Sijie Guo

How to add and remove nodes in Apache Pulsar

Adding and removing nodes is a common operation when managing a distributed system. In this post and accompanying video we’ll show how easy it is to add and remove nodes in Apache Pulsar without the need to rebalance data like other messaging systems such as Apache Kafka.

January 10, 2018

Christian Hasker, JJ Jeyappragash

Microservices: perceptions and myths

Let’s face it: microservices are hot! Enterprises are increasingly looking to break up monolithic applications and adopt microservice-based architectures à la Netflix, Google, and Twitter. But there remain a lot of misperceptions and knowledge gaps to fill in. In this short post (with accompanying videos) JJ Jeyappragash explains what microservices are, offers some guidelines on how to do microservices right, and talks about security from a microservice perspective.

January 8, 2018

Sijie Guo

How to migrate Apache Kafka applications to Apache Pulsar

In this post and accompanying video tutorial, Streamlio engineer and co-founder Sijie Guo shows you how to migrate an Apache Kafka application to Apache Pulsar with no code changes using the Kafka API wrapper.

January 4, 2018

Luc Perkins

Why you should adopt a multi-tenant solution for real-time applications

Single-tenant systems simply don’t cut it in large enterprises with exacting requirements and dozens of teams working on massive applications. In this post I’ll talk about what multi-tenancy is, why it’s so important, and why Apache Pulsar provides an extremely compelling model of multi-tenancy.

December 18, 2017

Sanjeev Kulkarni

How Apache Pulsar saved the holidays: A story for the most magical time of the year

In this festive time of year, pour yourself a glass of eggnog, read this post, watch the videos, and learn how to code a queuing and streaming application using Apache Pulsar.

December 14, 2017

Ivan Kelly

Scaling out Total Order Atomic Broadcast with Apache BookKeeper

Total Order Atomic Broadcast (TOAB) is an essential property of any distributed system that wants to provide effectively-once semantics. Many systems can serve as a foundation for TOAB but here we’ll see why BookKeeper is by far the most scalable.

December 12, 2017

Jerry Peng, Luc Perkins

Introducing Heron Streamlets

A new higher-level API for creating Heron topologies.

December 5, 2017

Sijie Guo

Comparing Pulsar and Kafka: how a segment-based architecture delivers better performance, scalability, and resilience

In this second post in a series, Sijie Guo compares Apache Pulsar and Apache Kafka’s storage and failure recovery models.

December 1, 2017

Sijie Guo

Comparing Pulsar and Kafka: unified queuing and streaming

In this first post in a series, Sijie Guo compares Apache Pulsar and Apache Kafka’s messaging consumption, queuing, and streaming models.

November 28, 2017

Chris Kellogg

The Heron Stream Processing Engine on Google Kubernetes Engine

How to deploy a Heron topology on a GKE cluster powered by Kubernetes

November 8, 2017

Supun Kamburugamuve

Low-latency streaming with Heron on InfiniBand (part 2 of 2)

Examining experimental results from integrating the Heron stream processing engine with the InfiniBand network architecture

November 6, 2017

Sijie Guo

Why Apache BookKeeper? Part 2

In this second of two blog posts on Apache BookKeeper, Sijie Guo, co-founder of Streamlio, talks about the features that provide I/O isolation, data distribution, scalability and operability.

November 3, 2017

Ivan Kelly

Cursors in Apache Pulsar

In this post and accompanying video, Ivan Kelly, PMC member for Apache BookKeeper and engineer at Streamlio, discusses how the Apache Pulsar messaging system uses durably stored cursors (an enhanced form of offset) to provide fault-tolerant messaging.

October 24, 2017

Matteo Merli

Strata Presentation: “Messaging, storage, or both” (Part 1)

In this first video in a 3-part series of a presentation given at Strata last month, Streamlio engineer Matteo Merli gives an overview of the Pulsar messaging system and what distinguishes it from other messaging systems from an enterprise standpoint, focusing on Pulsar’s scalability, multi-tenancy features, and geo-replication capabilities.

October 23, 2017

Ivan Kelly

Globally distributed Apache Pulsar quickstart

Deploy a multi-region Pulsar installation on Google Cloud Platform in just a few commands using an Ansible playbook.

October 20, 2017

Supun Kamburugamuve

Low-latency streaming with Heron on InfiniBand (part 1 of 2)

Taking the network layer beyond Ethernet can yield major gains in distributed processing systems

October 13, 2017

Jerry Peng

Exactly once is NOT exactly the same

In this post, we will take a look at different processing semantics for stream processing engines. We will examine what exactly-once processing semantics actually guarantees and the differences in the implementations of exactly-once processing semantics. Here at Streamlio, we have standardized on ‘effectively-once’ as our terminology, and we’ll explain why.

October 10, 2017

Sijie Guo

Geo-replication in Apache Pulsar, part 2: patterns and practices

An overview of common multi-datacenter geo-replication patterns for the Pulsar messaging system (full-mesh, active-active, active-standby, and aggregation) as well as recommended practices for system monitoring, capacity planning, and throttling.

October 9, 2017

Matteo Merli, Lewis Kaneshiro

Announcing Support for OpenMessaging

Streamlio has joined with industry leaders to announce OpenMessaging, a new project within the Linux Foundation to deliver standards-based messaging interoperability to accelerate data-driven infrastructure.

October 5, 2017

Lewis Kaneshiro

Helping Product Managers deliver real-time product innovations

Enterprises rely on product managers to come up with ideas and turn those ideas into revenue-generating applications to spur the next leg of growth and to create shareholder value. That’s a lot of pressure! It’s more essential now than ever before that product managers have the correct software stack to realize those ideas, which, with the world moving toward ever-faster performance and response times, means embracing an end-to-end real-time solution.

October 2, 2017

Karthik Ramasamy

Heron: the extensible streaming engine

This post looks at the architecture of Heron, the real-time stream processing engine developed at and open-sourced by Twitter. The various modules in the system communicate with one another using well-defined communication protocols. As a consequence, Heron is highly extensible and allows the application developer, system administrator, or Heron contributor to create a new implementation of a specific module and plug it into the system without affecting other modules or the communication mechanisms between them.

September 27, 2017

Sijie Guo

Geo-replication in Apache Pulsar, part 1: concepts and features

A look at another enterprise-grade feature of Apache Pulsar: geo-replication. It is essential for enterprises today to have disaster avoidance and recovery strategies in place. Whereas other messaging systems can only replicate between two datacenters, Pulsar can scale out to as many as are needed. We will also look at the two types of geo-replication supported in Pulsar, synchronous and asynchronous, and when to use one vs the other.

September 26, 2017

Matteo Merli, Sijie Guo

Multi-tenant messaging with Apache Pulsar

How Apache Pulsar provides a messaging system fit for large, multifaceted enterprises

September 25, 2017

Sijie Guo

Messaging, storage, or both?

The real-time journey of Apache Pulsar, DistributedLog, and BookKeeper

September 18, 2017

John Crawford from ndustrial.io

Experiences porting Heron to Kubernetes

Containerization makes all the difference

September 14, 2017

Luc Perkins

Why choose a unified real-time platform?

Why durable messaging, compute, and stream storage need to be understood as an inseparable whole

September 12, 2017

Sijie Guo

Why Apache BookKeeper? Part 1: consistency, durability, availability

An in-depth overview of some of the compelling features and guarantees provided by BookKeeper

September 8, 2017

Sijie Guo

Introduction to Apache BookKeeper

A scalable, fault-tolerant, and low-latency log storage service optimized for real-time workloads

September 5, 2017

Karthik Ramasamy

Why Heron? Part 2

Continuation of the key compelling and enterprise-grade features provided by Heron, the open source stream processing system originally developed by Twitter.

September 1, 2017

Karthik Ramasamy

Why Heron? Part 1

An overview of the key compelling and enterprise-grade features that Heron provides in the open source.

August 28, 2017

Matteo Merli, Karthik Ramasamy

Why Apache Pulsar? Part 2

An overview of the key enterprise-grade features that Pulsar provides out of the box.

August 25, 2017

Matteo Merli, Karthik Ramasamy

Why Apache Pulsar? Part 1

An overview of the key enterprise-grade features that Pulsar provides out of the box.

August 22, 2017

Matteo Merli, Karthik Ramasamy

Introduction to the Apache Pulsar pub-sub messaging platform

A high-level overview of Pulsar, a recently open-sourced pub-sub messaging platform built for massive scale and zero data loss.

August 15, 2017

Lewis Kaneshiro

Announcing Streamlio

We’re excited to announce the launch of Streamlio, with Series A funding from Lightspeed Venture Partners. We’re building an enterprise-grade, unified, end-to-end real-time platform, by the co-creators of best-of-breed open source technologies proven at Twitter and Yahoo.

August 8, 2017

Karthik Ramasamy

Introduction to Heron

Heron is a real-time stream processing engine first deployed at Twitter. This post provides an overview of Heron for architects and developers interested in learning about its architecture and terminology.

May 5, 2017

Sanjeev Kulkarni

Heron going exactly-once

Data processing guarantees in streaming systems fall into three categories: at-most-once, at-least-once, and exactly-once. This blog post outlines the requirements and previews exactly-once guarantees in Heron, the next-generation real-time stream processing engine.