Databend

Databend is a next-generation cloud-native data warehouse built in Rust. It leverages object storage to give enterprises a unified lakehouse architecture: a big data analytics platform with separated compute and storage.

This article describes how to ingest data from AutoMQ into Databend using bend-ingest-kafka.

Prerequisites

Prepare Databend Cloud and Test Data

First, go to Databend Cloud to activate a Warehouse, and create a database and a test table in the worksheet.


create database automq_db;
create table users (
    id bigint NOT NULL,
    name string NOT NULL,
    ts timestamp,
    status string
);

Prepare AutoMQ and Test Data

Refer to Deploy Locally to deploy AutoMQ, ensuring network connectivity between AutoMQ and Databend.

Quickly create a Topic named example_topic in AutoMQ and write test JSON data to it, following the steps below.

Create Topic

To create a topic using the Apache Kafka® command-line tool, ensure that you have access to a Kafka environment and that the Kafka service is running. Here is an example command to create a topic:


./kafka-topics.sh --create --topic example_topic --bootstrap-server 10.0.96.4:9092 --partitions 1 --replication-factor 1

When executing the command, replace the --topic and --bootstrap-server values with your actual topic name and Kafka server address.

After creating the topic, you can use the following command to verify that the topic has been successfully created.


./kafka-topics.sh --describe --topic example_topic --bootstrap-server 10.0.96.4:9092

Generate Test Data

Generate JSON-formatted test data that matches the table created earlier; the field names should match the table's column names.


{
    "id": 1,
    "name": "test user",
    "ts": "2023-11-10T12:00:00",
    "status": "active"
}

Write Test Data

Write the test data to the Topic named example_topic using the Kafka command-line tools or a program. Below is an example using the command-line tools:


echo '{"id": 1, "name": "test user", "ts": "2023-11-10T12:00:00", "status": "active"}' | ./kafka-console-producer.sh --bootstrap-server 10.0.96.4:9092 --topic example_topic

When executing the command, replace the --topic and --bootstrap-server values with your actual topic name and Kafka server address.

Use the following command to view the data just written to the topic:


./kafka-console-consumer.sh --bootstrap-server 10.0.96.4:9092 --topic example_topic --from-beginning

Create a bend-ingest-kafka Job

bend-ingest-kafka monitors Kafka and writes data to the Databend table in batches. After deploying bend-ingest-kafka, you can start the data import job:


bend-ingest-kafka --kafka-bootstrap-servers="localhost:9094" --kafka-topic="example_topic" --kafka-consumer-group="Consumer Group" --databend-dsn="https://cloudapp:password@host:443" --databend-table="automq_db.users" --data-format="json" --batch-size=5 --batch-max-interval=30s

When executing the command, replace kafka-bootstrap-servers with your actual Kafka server address and databend-dsn with your own connection string.

Parameter Description

databend-dsn

Databend Cloud provides a DSN for connecting to the warehouse; refer to the Databend Cloud documentation for how to obtain it.
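
For illustration only, a Databend Cloud DSN generally takes the shape below; the password, host, and warehouse name are placeholders, so copy the real values from your Databend Cloud console:

databend://cloudapp:<password>@<your-host>:443/automq_db?warehouse=<warehouse-name>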

batch-size

bend-ingest-kafka accumulates records until the batch size is reached before triggering a data synchronization; the companion batch-max-interval flag (30s in the example above) caps how long a partial batch waits before it is written.

Validate Data Import

Navigate to the Databend Cloud worksheet and query the automq_db.users table. You will see that the data has been synchronized from AutoMQ to the Databend table.
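
For example, a simple query like the following should return the record written to example_topic earlier:

select * from automq_db.users;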