StarRocks

StarRocks is a high-performance analytical data warehouse that leverages a vectorized execution engine, an MPP architecture, a cost-based optimizer (CBO), intelligent materialized views, and a columnar storage engine with support for real-time updates. It is designed to handle multidimensional, real-time, and high-concurrency data analysis efficiently.

This article provides a step-by-step guide to using StarRocks Routine Load to import data from AutoMQ into StarRocks. For an in-depth understanding of Routine Load, please consult the Routine Load Fundamental Principles documentation.

Environment Preparation

Prepare StarRocks and Test Data

Ensure that a functional StarRocks cluster is available. For demonstration purposes, we follow Deploying StarRocks Using Docker to install a cluster on a Linux machine.
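
If you do not already have a cluster, a minimal all-in-one deployment can be started with Docker. This is a sketch based on the StarRocks quick start; the starrocks/allin1-ubuntu image name, container name, and port mappings may differ for your version:

docker run -p 9030:9030 -p 8030:8030 -p 8040:8040 -itd --name quickstart starrocks/allin1-ubuntu

Once the container is up, connect with any MySQL-compatible client, for example: mysql -P 9030 -h 127.0.0.1 -u root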

Create a database and a test table that uses the Primary Key model:



CREATE DATABASE automq_db;
USE automq_db;
CREATE TABLE users (
    id BIGINT NOT NULL,
    name STRING NOT NULL,
    timestamp STRING NULL,
    status STRING NULL
) PRIMARY KEY (id)
DISTRIBUTED BY HASH(id)
PROPERTIES (
    "replication_num" = "1",
    "enable_persistent_index" = "true"
);
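
Optionally, verify the table definition before loading any data:

DESC users;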

Prepare AutoMQ and Test Data

Consult Deploy Locally to set up AutoMQ, ensuring it can connect to StarRocks over the network.

Create a topic named example_topic in AutoMQ and write some test JSON data to it by following these steps.

Create Topic

Use the Apache Kafka® command line tools to create the topic. Ensure you have access to a Kafka environment and that the Kafka service is active. Here's an example command to create the topic:


./kafka-topics.sh --create --topic example_topic --bootstrap-server 10.0.96.4:9092 --partitions 1 --replication-factor 1

When executing the command, replace the bootstrap-server address with that of your actual Kafka server.

Once the topic is created, use the following command to confirm that the topic has been successfully established.


./kafka-topics.sh --describe --topic example_topic --bootstrap-server 10.0.96.4:9092

Generate Test Data

Generate JSON-formatted test data that matches the table mentioned earlier.


{
  "id": 1,
  "name": "Test User",
  "timestamp": "2023-11-10T12:00:00",
  "status": "active"
}
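
A single record is enough for a smoke test. If you want a batch, a small shell loop can generate records of the same shape (a sketch; test_data.jsonl is a hypothetical file name):

for i in $(seq 1 10); do
  echo "{\"id\": $i, \"name\": \"Test User\", \"timestamp\": \"2023-11-10T12:00:00\", \"status\": \"active\"}"
done > test_data.jsonl

The resulting file can then be redirected into the console producer shown in the next step.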

Write Test Data

Use Kafka's command line tool or a programmatic approach to write test data into a topic named example_topic. Here is an example using the command line tool:


echo '{"id": 1, "name": "Test User", "timestamp": "2023-11-10T12:00:00", "status": "active"}' | ./kafka-console-producer.sh --bootstrap-server 10.0.96.4:9092 --topic example_topic

To view the data just written to the topic, use the following command:


./kafka-console-consumer.sh --bootstrap-server 10.0.96.4:9092 --topic example_topic --from-beginning

When executing the command, replace the bootstrap-server address with that of your actual Kafka server.

Creating a Routine Load Job

In the StarRocks command line, create a Routine Load job to continuously import data from the AutoMQ Kafka topic.


CREATE ROUTINE LOAD automq_example_load ON users
COLUMNS(id, name, timestamp, status)
PROPERTIES
(
    "desired_concurrent_number" = "5",
    "format" = "json",
    "jsonpaths" = "[\"$.id\",\"$.name\",\"$.timestamp\",\"$.status\"]"
)
FROM KAFKA
(
    "kafka_broker_list" = "10.0.96.4:9092",
    "kafka_topic" = "example_topic",
    "kafka_partitions" = "0",
    "property.kafka_default_offsets" = "OFFSET_BEGINNING"
);

When executing the command, replace kafka_broker_list with the actual Kafka server addresses being used.

Parameter Description

Data Format

The data format must be specified as JSON by setting "format" = "json" in the PROPERTIES clause.

Data Extraction and Transformation

To define the mapping and transformation between the source data and the destination table, configure the COLUMNS and jsonpaths parameters. The jsonpaths parameter extracts the required fields from each JSON record, much like generating new CSV data; the COLUMNS parameter then names the extracted fields in the order jsonpaths produced them, and those names must match columns in the destination table. For further details on data transformation, please refer to the Data Transformation during Import documentation.
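
For example, suppose a source record carried an extra field that the table does not need (a hypothetical record for illustration):

{"id": 2, "name": "Another User", "timestamp": "2023-11-10T12:05:00", "status": "active", "source": "demo"}

The jsonpaths expression above extracts only id, name, timestamp, and status; the extra source field is never extracted, so it never reaches the destination table.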

If each JSON object's keys match the names and count of the destination table's columns (order does not matter), the COLUMNS parameter can be omitted entirely, as in the simplified job below.
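
For instance, since the test records in this guide use exactly the keys id, name, timestamp, and status, the job could be written without COLUMNS and jsonpaths (a sketch; automq_example_load_simple is an illustrative job name):

CREATE ROUTINE LOAD automq_example_load_simple ON users
PROPERTIES
(
    "format" = "json"
)
FROM KAFKA
(
    "kafka_broker_list" = "10.0.96.4:9092",
    "kafka_topic" = "example_topic",
    "property.kafka_default_offsets" = "OFFSET_BEGINNING"
);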

Validate Data Import

Begin by checking the status of the Routine Load import job to confirm that it is operational.


SHOW ROUTINE LOAD\G
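
To inspect a single job rather than all jobs, filter by name and confirm that the State field reports RUNNING:

SHOW ROUTINE LOAD FOR automq_example_load\G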

Querying the table in StarRocks then shows that the data has been imported successfully.


StarRocks > select * from users;
+------+-----------+---------------------+--------+
| id   | name      | timestamp           | status |
+------+-----------+---------------------+--------+
|    1 | Test User | 2023-11-10T12:00:00 | active |
|    2 | Test User | 2023-11-10T12:00:00 | active |
+------+-----------+---------------------+--------+
2 rows in set (0.01 sec)
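
Once testing is complete, the import job can be paused, resumed, or stopped permanently (a stopped job cannot be recovered):

PAUSE ROUTINE LOAD FOR automq_example_load;
RESUME ROUTINE LOAD FOR automq_example_load;
STOP ROUTINE LOAD FOR automq_example_load;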