How to migrate Apache Kafka applications to Apache Pulsar

January 8, 2018

Sijie Guo

Evaluating a next-generation distributed pub-sub system can be a time-consuming affair. The Apache Pulsar community has been working on simplifying the process of installing and deploying Pulsar to make sure that it’s easier to experiment with and evaluate Pulsar as well as migrating existing applications from one pub-sub system to another. The Apache Pulsar community introduced a Kafka API-compatible Java library beginning with Pulsar version 1.20.0, which was released in October 2017.

The Kafka API-compatible Pulsar client

The Kafka API-compatible Pulsar client for Java is comatible with Kafka 0.10.2.1. This Pulsar client library is available at Maven Central and can be installed using Maven, Gradle, and other build tools.

If you’re using Maven, you can replace the Kafka client dependency with the following Pulsar client dependency:

<dependency>
    <groupId>org.apache.pulsar</groupId>
    <artifactId>pulsar-client-kafka</artifactId>
    <version>1.21.0-incubating</version>
</dependency>

If you’re using Gradle, you can replace the Kafka client dependency with the following Pulsar client dependency:

dependencies {
    compile "org.apache.pulsar:pulsar-client-kafka:1.21.0-incubating"
}

Migration example: a log collection pipeline

A very common use case for Apache Kafka is as a log collection pipeline. A log collection pipeline is illustrated below:

In this diagram:

  • Applications → Kafka — Logs are sent from web servers, applications, and various systems and published to Kafka topics. This can be done in many ways. In the example diagram above, we’re using a Kafka SLF4J appender to append logs to Kafka topics.
  • Kafka → Elasticsearch — Logs are read from Kafka topics and sent to Elasticsearch for building indexes. In this example, we’re using Kafka Connect to connect Kafka topics and Elasticsearch indexes.

Through the remainder of this blog post, I’ll show you how to migrate your Kafka applications—such as the log collection pipeline illustrated above—to Pulsar. The demo consists of two parts:

  1. Modifying the application so that it sends log data to Apache Pulsar by using the pulsar-client-kafka library.
  2. Using Kafka Connect to collect log data from Apache Pulsar, rather than from Kafka, by replacing the Kafka client’s jar with the pulsar-client-kafka jar.

You can follow the steps below or watch this video:

Migration instructions

All of the repos used in this tutorial are available on GitHub:

Migrate the Kafka logger

The Kafka logger demo application is available on GitHub:

$ git clone https://github.com/merlimat/kafka-logger-demo

Initially, this Kafka logger demo application is configured to append log data to Kafka topics.

The pom.xml configuration file, for example, is configured to append log data to Kafka topics.

<dependencies>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
        <version>1.7.24</version>
    </dependency>

    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-core</artifactId>
        <version>2.9.1</version>
    </dependency>

    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-slf4j-impl</artifactId>
        <version>2.9.1</version>
    </dependency>

    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>1.0.0</version>
    </dependency>
</dependencies>

In addition, in src/main/resources/log4j2.xml, the application is configured to use a Kafka appender:

<Configuration status="INFO">
    <Appenders>
        <Console name="Console" target="SYSTEM_OUT">
            <PatternLayout pattern="%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} - %msg%n" />
        </Console>
        <Kafka name="Kafka" topic="log-test">
            <JsonLayout />
            <Property name="bootstrap.servers">localhost:9092</Property>
        </Kafka>
    </Appenders>
    <Loggers>
        <Root level="info">
            <AppenderRef ref="Console" />
        </Root>
        <Logger name="demo" level="info">
            <AppenderRef ref="Kafka" />
        </Logger>
    </Loggers>
</Configuration>

Migrating the Kafka logger application is pretty simple. You can change the pom.xml dependency to the pulsar-client-kafka library and configure log4j2 to point to your Pulsar cluster. Here’s the updated pom.xml file:

<dependencies>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
        <version>1.7.24</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-core</artifactId>
        <version>2.9.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-slf4j-impl</artifactId>
        <version>2.9.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.pulsar</groupId>
        <artifactId>pulsar-client-kafka</artifactId>
        <version>1.21.0-incubating</version>
    </dependency>
</dependencies>

And here’s the updated log4n2.xml file:

<Configuration status="INFO">
    <Appenders>
        <Console name="Console" target="SYSTEM_OUT">
            <PatternLayout pattern="%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} - %msg%n" />
        </Console>
        <Kafka name="Kafka" topic="persistent://sample/standalone/ns1/log-test">
            <JsonLayout />
            <Property name="bootstrap.servers">pulsar://localhost:6650</Property>
        </Kafka>
    </Appenders>
    <Loggers>
        <Root level="info">
            <AppenderRef ref="Console" />
        </Root>
        <Logger name="demo" level="info">
            <AppenderRef ref="Kafka" />
        </Logger>
    </Loggers>
</Configuration>

Migrate Kafka Connect

The Kafka Connect Elasticsearch connector is available on GitHub:

$ git clone https://github.com/merlimat/kafka-connect-elasticsearch/tree/0.10.0.0

Similar to migrating the Kafka logger application, you’ll need to:

  • Delete the kafka-clients library from your Kafka connect lib directory and add the pulsar-client-kafka library instead.
  • Update your Kafka Connect configuration to point the bootstrap.servers to your Pulsar cluster

    # Comment out Kafka bootstrap servers
    # bootstrap.servers=localhost:9092
    
    # Add Pulsar bootstrap servers
    bootstrap.servers=pulsar://localhost:6650
  • Update your kafka-elasticsearch configuration to read from Pulsar topics instead of from Kafka topics

    # Comment out Kafka topic
    # topics=log-test
    
    # Add Pulsar topics
    topics=persistent://sample/standalone/ns1/log-test
    topic.index.map=persistent://sample/standalone/ns1/log-test:log-test

More on Apache Pulsar

If you want to learn more about Apache Pulsar, please visit the official website at https://pulsar.apache.org.

If you want to learn more about the differences between Apache Pulsar and Apache Kafka, please check out our series of comparison blog posts:

You can also participate in the Pulsar community via:

The Pulsar slack channel. You can self-register at https://apache-pulsar.herokuapp.com. The Pulsar email list.

For getting the latest updates about Pulsar, you can follow the projects on Twitter @apache_pulsar.