Event-Driven Architecture Using Streamlio

October 2, 2018

David Kjerrumgaard

An event-driven architecture can be used to implement many real-world streaming use cases, such as e-commerce, fraud detection, cybersecurity, financial trading, IoT, and preventive maintenance.

In this inaugural blog post in our solution architecture series, I will present an event-driven architecture based upon the core technologies within the Streamlio platform. I will cover in detail the role each component plays in event processing, and show you how Streamlio provides a complete event processing framework.

What is Event-Driven Architecture (EDA)?

An Event-Driven Architecture (EDA) is a software architecture pattern built upon the production, detection, consumption of, and reaction to events. An event-driven architecture consists of event producers that generate events and event consumers that consume and react to them. An event is anything that occurs at a clearly defined time and can be recorded, e.g. a customer places an order, a user clicks on a webpage, or a sensor records a measurement.

Events are streamed in near real-time, so consumers can respond to events as soon as they occur. In an event-driven architecture, event producers and consumers are decoupled from one another, which allows for asynchronous processing. Your event processing framework serves as the platform that connects the event producers to the event consumers.

Figure 1: Event-Driven Architecture

Since events come from multiple types of sources, an event processing framework needs to support both a streaming model and a pub/sub model, depending on the source of your events.

  • Event Stream Model: Events are published to a central log and are strictly ordered by arrival time within a partition. Event consumers can read from any point in the log, but are responsible for keeping track of which events they have processed. An event stream is a continuous, time-ordered flow of events.

  • Pub/Sub Model: In this model a broker keeps track of the subscriptions, and when an event is published, sends the event to each of the subscribers. Events are not retained, and cannot be replayed, nor can new subscribers see “old” events.
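To make the distinction concrete, here is a minimal, framework-agnostic Python sketch of the two models (this is illustrative pseudocode for the concepts, not Pulsar code; all class and event names are made up):

```python
class EventLog:
    """Event stream model: an append-only log; each consumer tracks its own offset."""
    def __init__(self):
        self.events = []

    def publish(self, event):
        self.events.append(event)

    def read_from(self, offset):
        # Any consumer may read from any offset; the log retains all events.
        return self.events[offset:]


class PubSubBroker:
    """Pub/sub model: the broker pushes each event to current subscribers; nothing is retained."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, event):
        for callback in self.subscribers:
            callback(event)


log = EventLog()
log.publish("order-1")
log.publish("order-2")
print(log.read_from(0))   # a late-arriving consumer can still replay the whole stream

broker = PubSubBroker()
received = []
broker.publish("order-1")            # no subscribers yet: this event is lost
broker.subscribe(received.append)
broker.publish("order-2")
print(received)                      # only events published after subscribing arrive
```

The key behavioral difference is visible at the end: the log replays everything for a new reader, while the broker delivers nothing published before the subscription existed.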

An event processing framework should provide the flexibility to support the development of both simple and complex event processing logic. While no formal definitions exist for these terms, we categorize these as follows:

  • Simple Event Processing: The arrival of an event immediately triggers an action within the event consumer. These actions are typically either stateless, performing all of their logic based only on the event's contents, or stateful, retaining information across invocations in order to perform slightly more complex logic.

  • Complex Event Processing: This type of event consumer processes a series of events to identify meaningful patterns or relationships, detecting correlation, causality, or timing among events. Typical use cases that require complex event processing include e-commerce, fraud detection, cybersecurity, financial trading, and others that demand immediate responses.
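As an illustration of the stateless/stateful split in simple event processing, consider the following Python sketch (the order-amount logic is a made-up example, not from any real system):

```python
# Stateless: the action depends only on the contents of the current event.
def flag_large_order(event):
    return {"alert": True, **event} if event["amount"] > 1000 else event

# Stateful: information is retained across invocations.
class RunningAverage:
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def process(self, event):
        self.total += event["amount"]
        self.count += 1
        return self.total / self.count


print(flag_large_order({"amount": 1500}))   # {'alert': True, 'amount': 1500}
avg = RunningAverage()
print(avg.process({"amount": 10.0}))        # 10.0
print(avg.process({"amount": 20.0}))        # 15.0
```

The stateless function can be scaled out freely; the stateful one must keep its accumulated state consistent across invocations, which is exactly what a framework like Pulsar Functions manages for you.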

An EDA Reference Architecture

Implementing an event-driven architecture requires much more than simply installing an event messaging framework such as Apache Pulsar, or using serverless functions and microservices. An enterprise-grade event-driven architecture must provide a set of capabilities that enable you to emit and process events, as shown below in Figure 2.

These capabilities fall into the following categories:

  • Input/Output: This layer of the architecture serves as the interaction point for external systems that generate events such as sensors, devices, applications, etc. It also provides connectors to traditional databases, data warehouses, or legacy applications that consume and/or update their internal state based upon the content of the events to which they are subscribed.

  • Messaging: Messaging use cases can be divided into two categories.

    • Queueing — Unordered, point-to-point, or shared messaging. These use cases are usually found in conjunction with stateless applications, which don't care about ordering but do require the ability to acknowledge or remove individual messages, as well as the ability to scale consumption parallelism as much as possible.

    • Streaming — Strictly ordered, exclusive messaging, meaning there is only one consumer per partition or topic. These use cases are usually associated with stateful applications, for which ordering and state have a direct impact on the correctness of the processing logic.
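The difference in consumption semantics can be sketched as follows (an illustrative simulation of the two dispatch patterns, not Pulsar's actual dispatch logic):

```python
import itertools

def shared_dispatch(messages, num_consumers):
    """Queueing: each message goes to exactly one of several consumers,
    maximizing parallelism at the cost of any global ordering guarantee."""
    assignments = [[] for _ in range(num_consumers)]
    for msg, consumer in zip(messages, itertools.cycle(range(num_consumers))):
        assignments[consumer].append(msg)
    return assignments

def exclusive_dispatch(messages):
    """Streaming: a single consumer per topic/partition sees every message
    in publication order, preserving correctness for stateful logic."""
    return list(messages)


msgs = ["m1", "m2", "m3", "m4"]
print(shared_dispatch(msgs, 2))   # [['m1', 'm3'], ['m2', 'm4']]
print(exclusive_dispatch(msgs))   # ['m1', 'm2', 'm3', 'm4']
```

With shared dispatch, no single consumer observes the full order; with exclusive dispatch, order is total but parallelism is capped at one consumer per partition.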

  • Event Processing: This layer encompasses all of the computational infrastructure necessary to perform both simple and complex event processing.

Figure 2: Event-Driven Reference Architecture

  • Stream Store: A real-time, high-throughput, fault-tolerant, low-latency distributed transaction log used to record events as they enter the system. The "log stream" is the fundamental storage abstraction for this layer of the architecture, and the stream store is responsible for storing and serving these log streams.

  • Analytics: In this architecture, the stream store serves as the distributed transaction log that tracks every change, while the various analytical engines, such as distributed key-value databases, machine learning model repositories, and distributed SQL query engines, become materialized views of this giant distributed log.
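As a toy illustration of the "materialized view" idea, replaying a log of key-value events into a latest-value-per-key view might look like this (all names are hypothetical):

```python
def build_key_value_view(events):
    """Replay the log to materialize the latest value per key, the way a
    key-value store can act as a view over the distributed log."""
    view = {}
    for event in events:
        view[event["key"]] = event["value"]
    return view


events = [
    {"key": "user-1", "value": "bronze"},
    {"key": "user-2", "value": "silver"},
    {"key": "user-1", "value": "gold"},   # a later event supersedes the earlier one
]
print(build_key_value_view(events))  # {'user-1': 'gold', 'user-2': 'silver'}
```

Because the log is the source of truth, any such view can be rebuilt from scratch, or a different view (an index, a model, a SQL table) can be derived from the same events.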

  • Tiered Storage: A modern EDA must retain event data indefinitely, in its native format, while simultaneously providing a cost-effective way to do so. To meet both of these requirements, the stream storage component must provide built-in support for tiered storage, allowing you to seamlessly move data onto and off of lower-cost storage platforms.

The Streamlio Reference EDA

While you may not necessarily need all of these capabilities for every application you develop, it is important to build upon a framework that provides them. The Streamlio platform provides all of the core capabilities required for an event-driven architecture, as shown below in Figure 3.

Figure 3: Streamlio’s Event-Driven Reference Architecture

These capabilities are provided by the following software components:

  • Input/Output: This layer of the architecture is provided by the Apache Pulsar IO framework, which is an extensible framework for connecting Apache Pulsar with other systems and protocols. Currently, the bundle includes support for Aerospike, Cassandra, Amazon Kinesis, RabbitMQ, and more. We cover how to write your own Pulsar connectors in another blog post in this series.

  • Messaging: Apache Pulsar is a next-generation distributed pub-sub messaging system with enterprise features including multi-tenancy, multi-datacenter replication, and strong durability guarantees. It is also a very flexible system, supporting both queuing and streaming. For an introduction to Apache Pulsar, please check out this post.

  • Event Processing:

    • Simple Event Processing: Apache Pulsar Functions is a stream-native processing capability for Apache Pulsar. Pulsar Functions allows developers to create fast, flexible data processing tasks that operate on data in motion as it passes through Pulsar, without requiring external systems or add-ons.
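In its simplest ("native") Python form, a Pulsar Function is just a function that receives each message's content and returns the value to publish to the output topic; the input and output topics are configured at deployment time, not in code. The transformation below is a made-up example:

```python
# A minimal native Python Pulsar Function: Pulsar invokes process() once
# per message, passing the message content, and publishes the return value
# to the configured output topic.
def process(input):
    # hypothetical enrichment: normalize whitespace and case
    return input.strip().lower()
```

Because the function is plain Python, it can also be called and unit-tested directly, without a running Pulsar cluster.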

    • Complex Event Processing: Apache Heron is a real-time stream processing engine first deployed at Twitter that allows you to develop real-time workflows and analytics use cases to identify more complex patterns of interest, and can be used to automate or trigger more sophisticated prescriptive actions.

  • Stream Storage: Apache BookKeeper, together with Apache DistributedLog, forms the highly scalable log stream storage for real-time streaming log and event data. BookKeeper combines the ability to store past data and propagate future data within a single storage system. For an introduction to Apache BookKeeper, please check out this blog post.

  • Analytics: Apache Pulsar integrates with the Presto SQL query engine to allow SQL queries directly against data residing in Pulsar, thereby eliminating the requirement for a complex ETL process to move the data to another platform that supports SQL queries. The elimination of this additional step dramatically reduces the time analysts must wait before they can query the data.

  • Tiered Storage: Apache BookKeeper includes support for tiered storage, currently to Amazon S3 and Google Cloud Storage, with more destinations being added.
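For example, offloading older topic data to S3 can be enabled with Pulsar broker settings along these lines (the bucket name and region below are placeholders; check the Pulsar documentation for the exact option names in your version):

```properties
# broker.conf — tiered storage offload to Amazon S3 (illustrative values)
managedLedgerOffloadDriver=aws-s3
s3ManagedLedgerOffloadBucket=my-pulsar-offload-bucket
s3ManagedLedgerOffloadRegion=us-west-2
```

Once configured, older ledger segments can be moved to the cheaper storage tier while remaining transparently readable by consumers.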

Conclusion

In this blog post we reviewed the event-driven architecture pattern, and presented a reference event-driven architecture based upon Streamlio core technologies. Hopefully, you can see that the Streamlio platform provides all of the capabilities you need for an event processing framework.

Over the course of this blog series, we will show how to utilize each of these capabilities within the Streamlio platform to deliver an event-driven application. In subsequent posts, we will provide design patterns, best practices, and architectural guidance for implementing solutions on top of these core technologies.