Apache Pulsar 101

Learn how to build a multi-region cryptocurrency quoting application

Description

In this course you will be introduced to Apache Pulsar, a next-generation distributed pub-sub system that was developed at Yahoo. It has run in production at massive scale for more than three years, powering applications such as Yahoo! Mail, Yahoo! Finance, Yahoo! Sports, Flickr, the Gemini Ads platform, and Sherpa, Yahoo’s distributed key-value store. You will learn how to deploy Apache Pulsar to AWS and Google Cloud, and how to use Apache Pulsar to build your own multi-currency quoting application that feeds pricing information to a cryptocurrency trading platform that is deployed around the globe.

Given the volatility of cryptocurrency prices, sub-second message latency is critical to traders. Equally important is ensuring that consistent quotes are available in all geographical locations; for example, the price of Bitcoin shown to a trader in the USA should be the same as the price shown to a trader in Hong Kong.

Please note that here at Streamlio we use this demo application only to teach Apache Pulsar. We do not certify the application for actual cryptocurrency trading.

Additional resources

Before getting started with this course please check out the following resources:

Course contents

This overview covers the content of the seven segments that make up this course. By the end of this course you will have a working multi-region cryptocurrency demo application up and running. The course is designed to be followed in sequence and provides everything you need to know. However, if you already know the material in a given segment, feel free to proceed directly to a different segment. Timings are a guideline, so if a segment takes a little longer, don’t worry; keep going. The payoff is worth it.

Segment 1: Understanding the use case

In this use case a developer has been tasked with building a multi-currency quoting application that feeds pricing information to a cryptocurrency trading platform that is deployed around the globe.

  • The first phase will attempt to prove the capability of the platform to identify and react to cryptocurrency arbitrage opportunities across the various cryptocurrency trading platforms, and exploit them for a profit. Due to financial regulations, each of the platform's locations can only execute trades on exchanges within its own geographical region. However, there will be arbitrage opportunities between exchanges in different geographical regions, and this platform must be able to exploit those opportunities as well.
  • In the second phase, in order to minimize risk, the company does not want to retain any open positions (long or short) in a particular cryptocurrency. Therefore all arbitrage trades must result in a net zero position. This means that each arbitrage trade will consist of two trades (the buy and the sell) that must be completed concurrently.

We will review the technical specifications needed to achieve these two phases.
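The two phases above can be illustrated with a minimal sketch. It assumes a simplified quote model (one bid and one ask per exchange) and ignores fees, transfer costs, and the regional trading restrictions; the names `Quote`, `Leg`, `find_arbitrage`, `net_position`, and `profit` are hypothetical and are not taken from the course code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Quote:
    exchange: str
    bid: float  # highest price a buyer on this exchange will pay
    ask: float  # lowest price a seller on this exchange will accept


@dataclass(frozen=True)
class Leg:
    exchange: str
    quantity: float  # positive = buy, negative = sell
    price: float


def find_arbitrage(a: Quote, b: Quote, quantity: float) -> Optional[Tuple[Leg, Leg]]:
    """Return a (buy, sell) pair if buying on one exchange and selling
    on the other is profitable before fees, otherwise None."""
    for buy, sell in ((a, b), (b, a)):
        if sell.bid > buy.ask:
            return (
                Leg(buy.exchange, quantity, buy.ask),
                Leg(sell.exchange, -quantity, sell.bid),
            )
    return None


def net_position(legs: Tuple[Leg, Leg]) -> float:
    # Phase two requires this to be zero: the buy and the sell cancel out.
    return sum(leg.quantity for leg in legs)


def profit(legs: Tuple[Leg, Leg]) -> float:
    # Cash paid for buys is negative cash flow, cash received for sells is positive.
    return -sum(leg.quantity * leg.price for leg in legs)
```

Because both legs carry the same quantity with opposite signs, the net position is zero by construction; the harder problem, which this sketch does not address, is making sure both legs actually execute concurrently.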

Segment 2: Deploying Apache Pulsar on your cloud platform of choice

In this segment you will learn how to install Apache Pulsar and its necessary components on bare metal on either Google Cloud or AWS, and how to get a cluster up and running.

  • Segment 2a: Getting started with Apache Pulsar on Google Cloud
  • Segment 2b: Getting started with Apache Pulsar on AWS

Segment 3: Preparing Apache Pulsar—properties, namespaces, topics and subscriptions

In this segment you will learn how to SSH into a Pulsar broker pod, and how to use the pulsar-admin CLI tool to create clusters and namespaces. We will also use pulsar-admin to display information about the Pulsar environment.
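As a rough preview of the kind of pulsar-admin calls involved, the sketch below only builds the command lines; it does not run them. It assumes Pulsar 2.x subcommand names, where "properties" are called tenants, and the tenant, namespace, and cluster names are hypothetical placeholders rather than the values used in the course.

```python
import shlex
from typing import List


def namespace_setup_commands(tenant: str, namespace: str, clusters: List[str]) -> List[List[str]]:
    """Build pulsar-admin argument lists for a geo-replicated namespace.

    Assumes Pulsar 2.x naming (tenants instead of properties).
    """
    ns = f"{tenant}/{namespace}"
    cluster_csv = ",".join(clusters)
    return [
        # Show which clusters the broker already knows about.
        ["pulsar-admin", "clusters", "list"],
        # Create the tenant and allow it to use the listed clusters.
        ["pulsar-admin", "tenants", "create", tenant, "--allowed-clusters", cluster_csv],
        # Create the namespace, then replicate it across the clusters.
        ["pulsar-admin", "namespaces", "create", ns],
        ["pulsar-admin", "namespaces", "set-clusters", ns, "--clusters", cluster_csv],
        # Display the namespaces that now exist under the tenant.
        ["pulsar-admin", "namespaces", "list", tenant],
    ]


def render(command: List[str]) -> str:
    """Format an argument list as a shell-safe command line."""
    return " ".join(shlex.quote(arg) for arg in command)


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in namespace_setup_commands("crypto", "quotes", ["us-west", "asia-east"]):
        print(render(cmd))
```

On a broker pod each printed line could be pasted into the shell, or the argument lists could be passed to `subprocess.run`.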

If you are not familiar with Apache Pulsar terminology and concepts you should watch this quick tutorial first:

Segment 4: Getting source data: cryptocurrency data feeds

In this segment we will take a deep dive into the cryptocurrency data feeds and teach you how to configure them to get the data you need to drive the cryptocurrency quoting platform. We will be using the CryptoCompare API to provide live, real-time cryptocurrency pricing and trade data for this demo. It is the only data provider that has licensed its data under a Creative Commons license. We will review how to create a microservice to capture the cryptocurrency price data, and run the Docker image locally to confirm it is working as expected.
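To make the capture step more concrete, here is a minimal sketch of the parsing half of such a microservice. It assumes the CryptoCompare single-price endpoint with `fsym` and `tsyms` query parameters, and a JSON response that maps each quote currency to a price; both should be checked against the current API documentation. The function names and the output record layout are hypothetical, and no network call is made.

```python
import json
from typing import Dict, List, Union
from urllib.parse import urlencode

# Assumed endpoint; verify against the CryptoCompare documentation.
BASE_URL = "https://min-api.cryptocompare.com/data/price"


def price_url(symbol: str, currencies: List[str]) -> str:
    """Build the request URL for one coin priced in several currencies."""
    query = urlencode({"fsym": symbol, "tsyms": ",".join(currencies)}, safe=",")
    return f"{BASE_URL}?{query}"


def parse_quotes(symbol: str, body: str, timestamp: int) -> List[Dict[str, Union[str, float, int]]]:
    """Turn a response such as {"USD": 1.0, "EUR": 2.0} into one record
    per currency, ready to be published as individual messages."""
    records = []
    for currency, price in json.loads(body).items():
        # Skip anything that is not a plain number (for example error fields).
        if isinstance(price, bool) or not isinstance(price, (int, float)):
            continue
        records.append(
            {"symbol": symbol, "currency": currency, "price": price, "timestamp": timestamp}
        )
    return records
```

A polling loop would fetch `price_url(...)` on an interval, pass the body to `parse_quotes`, and hand each record to the next stage of the pipeline.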

Segment 5: Ingesting data into Pulsar using Apache NiFi processors

In this segment you will learn how to use Apache NiFi to load the cryptocurrency price data into Pulsar. We will cover the steps necessary to create a Kubernetes pod based on the NiFi Docker image, and walk through the data flow from the cryptocurrency feeds into Pulsar.

Segment 6: Introduction to basic stream processing with Apache Pulsar functions

In this segment you will learn about the basics of Pulsar functions. We will review the code for the exponentially weighted moving average (EWMA) P-function, and build and deploy the jar file. You will then learn how to launch the P-function k8s pod and how to configure the P-function’s input and output Pulsar topics.
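For readers who have not met the EWMA before, the calculation itself is small. The sketch below is plain Python rather than the course's actual P-function (which is packaged as a jar), and the class name `Ewma` and the smoothing factor `alpha` are illustrative assumptions. Each new price is blended with the previous average as `alpha * price + (1 - alpha) * previous`.

```python
from typing import Optional


class Ewma:
    """Exponentially weighted moving average over a stream of prices."""

    def __init__(self, alpha: float) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in the range (0, 1]")
        self.alpha = alpha
        self.value: Optional[float] = None

    def update(self, price: float) -> float:
        """Fold one new price into the average and return the new average."""
        if self.value is None:
            # The first observation seeds the average.
            self.value = price
        else:
            self.value = self.alpha * price + (1.0 - self.alpha) * self.value
        return self.value


if __name__ == "__main__":
    ewma = Ewma(alpha=0.5)
    for price in (100.0, 110.0, 90.0):
        print(ewma.update(price))
```

A larger `alpha` makes the average follow recent prices more closely; a smaller one smooths out short-lived spikes. In a stream-processing function, one such object would typically be kept per currency pair, with `update` called for every incoming message.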

Segment 7: Visualization through a cryptocurrency trading dashboard

In this segment we will teach you how to use the WebSocket API for Pulsar, and how to configure subscriptions to topics. We will then deploy the dashboard UI and wrap up with a tour of the user interface. Once you complete this last segment you will have a fully functioning multi-region cryptocurrency quoting platform.
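As a rough sketch of what a dashboard client has to do, the helpers below build a consumer endpoint URL and decode one message frame. They assume the Pulsar 2.x WebSocket URL layout (`/ws/v2/consumer/persistent/...`), the default port 8080, and a JSON frame with a base64-encoded `payload` and a `messageId` that must be echoed back as an acknowledgement. The host, tenant, namespace, topic, and subscription names are hypothetical, and the actual socket connection is left out.

```python
import base64
import json
from typing import Tuple
from urllib.parse import quote


def consumer_url(host: str, tenant: str, namespace: str, topic: str,
                 subscription: str, port: int = 8080) -> str:
    """Build the WebSocket endpoint for consuming from a persistent topic."""
    path = "/".join(quote(part, safe="") for part in (tenant, namespace, topic, subscription))
    return f"ws://{host}:{port}/ws/v2/consumer/persistent/{path}"


def decode_frame(frame: str) -> Tuple[str, str]:
    """Decode one consumer frame.

    Returns the message payload as text and the JSON acknowledgement
    that should be sent back on the same socket.
    """
    message = json.loads(frame)
    payload = base64.b64decode(message["payload"]).decode("utf-8")
    ack = json.dumps({"messageId": message["messageId"]})
    return payload, ack
```

A client built on any WebSocket library would connect to `consumer_url(...)`, call `decode_frame` on each received frame, render the payload, and send the acknowledgement so the subscription can advance.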

Wrap-up

In this course we introduced you to Apache Pulsar. You now know how to:

  • deploy Apache Pulsar to AWS or Google Cloud Platform
  • use Apache Pulsar to build your own multi-currency quoting application