Apache Doris

Apache Doris is an MPP-based, high-performance, real-time analytical database known for its speed and ease of use, delivering sub-second responses to queries over massive datasets. It supports both high-concurrency point queries and high-throughput complex analysis. Apache Doris is well suited to report analysis, ad-hoc queries, unified data warehouse construction, and accelerating federated queries over data lakes. Users can leverage it to build applications such as user behavior analysis, A/B testing platforms, log retrieval and analysis, user profiling, and order analysis.

This article will guide you on how to use Apache Doris Routine Load to import data from AutoMQ into Apache Doris. For a comprehensive understanding of Routine Load, please refer to the Routine Load Principles documentation.

Environment Preparation

Prepare Apache Doris and Test Data

Ensure a functional Apache Doris cluster is ready. For demonstration purposes, we have deployed a test Apache Doris environment on Linux following the Docker Deployment for Doris documentation.
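Once the cluster is running, you can connect to the Doris FE with any MySQL-compatible client. The command below is a minimal sketch that assumes the default FE query port (9030) and the root account with an empty password; adjust the host, port, and credentials to match your deployment.

# Connect to the Doris FE (assumes default query port 9030 and the root user with no password)
mysql -h 127.0.0.1 -P 9030 -u root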

Create databases and test tables:



CREATE DATABASE automq_db;
CREATE TABLE automq_db.users (
    id bigint NOT NULL,
    name string NOT NULL,
    timestamp string NULL,
    status string NULL
) DISTRIBUTED BY HASH(id) PROPERTIES ('replication_num' = '1');
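To double-check the result, you can optionally inspect the table definition right after creating it:

-- Optional: inspect the definition of the table just created
SHOW CREATE TABLE automq_db.users;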

Prepare Kafka Command Line Tools

Download the latest TGZ package from AutoMQ Releases and extract it. Assume the extraction directory is $AUTOMQ_HOME. In this article, we will use the tools found under $AUTOMQ_HOME/bin to create topics and generate test data.
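For example, assuming the downloaded archive is named automq-1.1.0.tgz (a placeholder; use the file name of the release you actually downloaded), extracting it and setting $AUTOMQ_HOME might look like this:

# Archive and directory names below are placeholders for the release you downloaded
tar -xzf automq-1.1.0.tgz
export AUTOMQ_HOME=$(pwd)/automq-1.1.0
$AUTOMQ_HOME/bin/kafka-topics.sh --version   # quick sanity check that the CLI runs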

Prepare AutoMQ and Test Data

Refer to the official AutoMQ deployment documentation to set up a functional cluster, ensuring network connectivity between AutoMQ and Apache Doris.
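Once the cluster is up, a quick way to confirm that the command-line tools can reach it is to list topics; the bootstrap address below is the example address used throughout this article, so substitute your own:

# Verify connectivity from the CLI host to the AutoMQ cluster
$AUTOMQ_HOME/bin/kafka-topics.sh --list --bootstrap-server 127.0.0.1:9092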

Follow the steps below to quickly create a topic named example_topic in AutoMQ and write test JSON data into it.

Create Topic

Use the Apache Kafka command-line tool to create the topic. You need access to a Kafka environment, and the Kafka service must be running. Here is a sample command for creating the topic:


$AUTOMQ_HOME/bin/kafka-topics.sh --create --topic example_topic --bootstrap-server 127.0.0.1:9092 --partitions 1 --replication-factor 1

When executing the command, replace topic and bootstrap-server with the actual topic name and AutoMQ Bootstrap Server address.

After creating the topic, use the following command to confirm that it was created successfully.


$AUTOMQ_HOME/bin/kafka-topics.sh --describe --topic example_topic --bootstrap-server 127.0.0.1:9092

Generate Test Data

Prepare a sample JSON record that matches the table created earlier.


{
  "id": 1,
  "name": "Test User",
  "timestamp": "2023-11-10T12:00:00",
  "status": "active"
}

Insert Test Data

Use the Apache Kafka command-line tool or a programmatic approach to write test data into the topic named example_topic. Here is an example using the command-line tool:


echo '{"id": 1, "name": "Test User", "timestamp": "2023-11-10T12:00:00", "status": "active"}' | sh $AUTOMQ_HOME/bin/kafka-console-producer.sh --broker-list 127.0.0.1:9092 --topic example_topic
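The query results shown later in this article contain a second row; you can produce it the same way (the values below are chosen only to match that sample output):

echo '{"id": 2, "name": "Test User", "timestamp": "2023-11-10T12:00:00", "status": "active"}' | sh $AUTOMQ_HOME/bin/kafka-console-producer.sh --broker-list 127.0.0.1:9092 --topic example_topic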

To view the recently inserted Topic data, use the following command:


sh $AUTOMQ_HOME/bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic example_topic --from-beginning

When executing the command, replace topic and bootstrap-server with the actual topic name and AutoMQ Bootstrap Server address.

Create a Routine Load Import Job

In the Apache Doris command line, switch to the automq_db database (USE automq_db;) and create a Routine Load job to continuously import JSON data from the AutoMQ Kafka topic.


CREATE ROUTINE LOAD automq_example_load ON users
COLUMNS(id, name, timestamp, status)
PROPERTIES
(
"format" = "json",
"jsonpaths" = "[\"$.id\",\"$.name\",\"$.timestamp\",\"$.status\"]"
)
FROM KAFKA
(
"kafka_broker_list" = "127.0.0.1:9092",
"kafka_topic" = "example_topic",
"property.kafka_default_offsets" = "OFFSET_BEGINNING"
);

For detailed parameters of the Routine Load, please consult the Doris Routine Load documentation.

When issuing the command, replace kafka_broker_list with the actual address of the AutoMQ Bootstrap Server.

To verify the data import, first check the status of the Routine Load job to confirm that it is active.


show routine load\G

Next, examine the relevant tables within the Apache Doris database to verify that the data has been successfully imported.


select * from users;
+------+--------------+---------------------+--------+
| id   | name         | timestamp           | status |
+------+--------------+---------------------+--------+
|    1 | Test User    | 2023-11-10T12:00:00 | active |
|    2 | Test User    | 2023-11-10T12:00:00 | active |
+------+--------------+---------------------+--------+
2 rows in set (0.01 sec)
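
Finally, if you need to pause, resume, or permanently stop the import job later, Doris provides dedicated statements for managing Routine Load jobs; the sketch below uses the job name created above, and the full syntax is covered in the Routine Load documentation:

-- Pause, resume, or permanently stop the Routine Load job
PAUSE ROUTINE LOAD FOR automq_example_load;
RESUME ROUTINE LOAD FOR automq_example_load;
STOP ROUTINE LOAD FOR automq_example_load;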