StarRocks is a high-performance analytical data warehouse that leverages advanced technologies such as vectorized execution, an MPP architecture, a cost-based optimizer (CBO), intelligent materialized views, and a real-time, updatable columnar storage engine. It supports multidimensional, real-time, and high-concurrency data analysis. This article introduces how to use StarRocks Routine Load to import data from AutoMQ into StarRocks. For a detailed understanding of the basics of Routine Load, refer to the Routine Load Basic Principles documentation.
Prerequisites
Prepare StarRocks and Test Data
Ensure that a usable StarRocks cluster is available. For demonstration purposes, we refer to Deploy StarRocks with Docker to install a demo cluster on a Linux machine, then create a test database and a Primary Key model table:
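For illustration, here is a minimal sketch of the DDL. The database name `example_db`, table name `example_tbl`, and the (id, name, event_time, status) schema are assumptions for this walkthrough; adapt them to your actual data:

```sql
CREATE DATABASE IF NOT EXISTS example_db;

USE example_db;

-- Primary Key model table: the key column(s) must be declared first
CREATE TABLE IF NOT EXISTS example_tbl (
    `id` BIGINT NOT NULL COMMENT "record id",
    `name` STRING COMMENT "user name",
    `event_time` DATETIME COMMENT "event time",
    `status` STRING COMMENT "event status"
)
PRIMARY KEY (`id`)
DISTRIBUTED BY HASH(`id`);
```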
Prepare AutoMQ and Test Data

Refer to Deploy Multi-Nodes Cluster on Linux to deploy AutoMQ, and ensure network connectivity between AutoMQ and StarRocks. Then create a topic named example_topic in AutoMQ and write test JSON data to it by following the steps below.
Create Topic
Use the Apache Kafka® command-line tool to create a topic. Ensure that you have access to a Kafka environment and that the Kafka service is running. Below is an example command for creating a topic:
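A minimal sketch of such a command, assuming the Kafka CLI tools are on the PATH and the AutoMQ broker listens on localhost:9092 (both assumptions; adjust to your environment):

```shell
kafka-topics.sh --create \
  --topic example_topic \
  --bootstrap-server localhost:9092 \
  --partitions 1 \
  --replication-factor 1
```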
Generate Test Data

Generate test data in JSON format that matches the table created earlier.
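As a sketch, a single JSON record matching the hypothetical (id, name, event_time, status) schema can be written to a file for the produce step that follows (the file name is an assumption):

```shell
# One JSON record per line, matching the assumed (id, name, event_time, status) schema
echo '{"id": 1, "name": "testuser", "event_time": "2023-11-10 12:00:00", "status": "active"}' > test_data.json
cat test_data.json
```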
Writing Test Data

Use Kafka's command-line tools or a client library to write the test data into the topic named example_topic. Here is an example using the command-line tool:
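A self-contained sketch, again assuming the Kafka CLI tools are on the PATH and the AutoMQ broker is at localhost:9092:

```shell
# Pipe one JSON record into the console producer (broker address is an assumption)
echo '{"id": 1, "name": "testuser", "event_time": "2023-11-10 12:00:00", "status": "active"}' | \
  kafka-console-producer.sh --bootstrap-server localhost:9092 --topic example_topic
```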
Creating Routine Load Import Job
Create a Routine Load job from the StarRocks command line to continuously import data from the AutoMQ Kafka topic:
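A hedged sketch of such a job, assuming the `example_db.example_tbl` table and (id, name, event_time, status) schema used in the earlier steps, and an AutoMQ broker at localhost:9092:

```sql
CREATE ROUTINE LOAD example_db.example_load_job ON example_tbl
COLUMNS (id, name, event_time, status)
PROPERTIES
(
    "format" = "json",
    "jsonpaths" = "[\"$.id\",\"$.name\",\"$.event_time\",\"$.status\"]"
)
FROM KAFKA
(
    "kafka_broker_list" = "localhost:9092",
    "kafka_topic" = "example_topic",
    "property.kafka_default_offsets" = "OFFSET_BEGINNING"
);
```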
Parameter Description

Data Format
The data format must be specified as JSON by setting "format" = "json" in the PROPERTIES clause.
Data Extraction and Transformation
If you need to specify mapping and transformation relationships between the source data and the target table columns, configure the COLUMNS and jsonpaths parameters. In COLUMNS, the column names correspond to those of the target table, and the order of the columns corresponds to the order of the fields in the source data. The jsonpaths parameter extracts the required fields from the JSON data, much like generating new CSV data; the COLUMNS parameter then names the extracted fields in the order given by jsonpaths. For more information on data transformation, refer to Data Conversion Implementation During Import.
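As a purely hypothetical illustration of this mapping, suppose the source JSON uses field names `uid` and `uname` while the target table columns are `id` and `name`; jsonpaths extracts the fields and COLUMNS assigns them positionally (all names here are assumptions, not from the original):

```sql
-- jsonpaths pulls $.uid and $.uname out of the JSON in that order;
-- COLUMNS maps them, by position, onto the target columns id and name.
CREATE ROUTINE LOAD example_db.example_load_mapped ON example_tbl
COLUMNS (id, name)
PROPERTIES
(
    "format" = "json",
    "jsonpaths" = "[\"$.uid\",\"$.uname\"]"
)
FROM KAFKA
(
    "kafka_broker_list" = "localhost:9092",
    "kafka_topic" = "example_topic"
);
```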