Resource Allocation Management
Recommendation 1: Set the Topic partition count appropriately to avoid throughput bottlenecks and wastage
The number of partitions in a Kafka Topic determines the production and consumption throughput that Topic can support. To preserve message order, messages with the same partition key are sent to the same partition, and each partition can be processed by only one consumer within a group. AutoMQ is built on object storage and, compared with Apache Kafka's local-file architecture, can support several times the partition performance at the same cluster scale. AutoMQ also does not charge for partitions, so cost is not a factor when allocating them. You only need to size the partition count against the expected throughput (AutoMQ's single-partition write throughput limit is 4 MB/s) so that you do not have to expand partitions later as business volume grows. For example, a Topic expected to ingest 100 MB/s needs at least 100 / 4 = 25 partitions, plus headroom. The partition count does not need to be a multiple of the node count, because AutoMQ balances partitions automatically.

Recommendation 2: For platform-based application scenarios, enable ACL to achieve strict upstream and downstream access control
In platform-based scenarios, such as real-time computing platforms, it is recommended to enable ACL when using Kafka to achieve fine-grained resource control. Once access control is enabled, Kafka clients must authenticate before accessing specific Topics and Groups. The benefits of this approach are as follows:
- It prevents business parties from arbitrarily creating new Topics and other resources, which can lead to resource abuse and governance problems.
- It helps avoid subscription chaos and impacts on load balancing caused by different business parties sharing Consumer Groups.
- Identifying Topics and Consumer Groups makes it easier to locate the related subscribers and upstream and downstream business parties, facilitating business governance.
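Once ACL is enabled, each business party's client must present credentials before touching its Topics and Groups. A minimal sketch of the client-side settings involved, using librdkafka-style (confluent-kafka) property names; the address, username, and password are placeholders, not real values:

```python
# Client configuration a business party would need once ACL is enforced.
# Keys follow librdkafka/confluent-kafka naming; all values are placeholders.
client_config = {
    "bootstrap.servers": "broker.example.com:9092",  # placeholder address
    "security.protocol": "SASL_PLAINTEXT",           # authenticate over SASL
    "sasl.mechanism": "PLAIN",
    "sasl.username": "order-service",                # identity granted ACLs
    "sasl.password": "secret",                       # placeholder credential
}

# With ACLs in place, this identity can only access the Topics and
# Consumer Groups it has been explicitly granted.
```

Java clients express the same credentials through `sasl.jaas.config` instead of separate username/password keys.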
Producer Application
Recommendation 1: For Kafka Producer clients version 2.1 or lower, set the retry count
Producer applications should check the SDK version in use. If the version is lower than 2.1, the retry count must be set manually to ensure messages are retried automatically when sending fails; this avoids failures caused by server maintenance, self-balancing, and similar events. (In client version 2.1 and later, retries defaults to a very large value, so sends are retried automatically.) The relevant parameters are retries and retry.backoff.ms.

Recommendation 2: Optimize the batch parameters of the Producer to avoid excessive QPS consumption by fragmented requests
Kafka is a stream storage system designed for high-throughput scenarios, and its most typical usage improves the efficiency and throughput of data transmission through batching. When sending messages, the Producer should set the batching parameters appropriately to avoid the following situations:
- Sending only one message per request: if the Producer sends only one message per request, it generates a large number of Produce requests, consuming server-side CPU and degrading cluster performance.
- Avoid setting excessively long batching wait times: When setting batch parameters, the Producer needs to set a reasonable wait time to avoid delays in sending messages due to incomplete batching in low-traffic scenarios.
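The retry and batching recommendations above can be sketched as a single set of producer properties. The keys are standard Kafka producer configuration names; the values are illustrative starting points to tune per workload, not recommended defaults:

```python
# Producer settings covering the retry and batching recommendations above.
# Keys are standard Kafka producer configuration; values are illustrative.
producer_config = {
    # Retries: set explicitly on clients older than 2.1, where the default is 0.
    "retries": 3,             # retry failed sends instead of failing immediately
    "retry.backoff.ms": 100,  # wait between retry attempts

    # Batching: group messages into larger requests instead of one per request.
    "batch.size": 131072,     # bytes per batch (128 KB)
    "linger.ms": 10,          # short wait to fill a batch; keep this small so
                              # low-traffic periods do not add visible latency
}

# A batch is sent when it reaches batch.size or when linger.ms elapses,
# whichever comes first.
```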
Recommendation 3: Set acks=all to ensure message durability before responding
Producers can adjust the acks parameter to balance data durability against send latency:
- acks=all (default value): the server responds to the client only after the data has been persisted to cloud storage. In the event of a server crash, successfully acknowledged messages will not be lost.
- acks=1: in alignment with Apache Kafka®, the AutoMQ server responds to the client as soon as the message is received in memory. If the server crashes, messages that have not yet been persisted are lost.
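A minimal sketch of the durability/latency trade-off behind the two settings above; "acks" is the standard Kafka producer configuration key, while the helper function itself is illustrative and not part of any client API:

```python
def choose_acks(require_durability: bool) -> dict:
    """Pick an acks setting for a producer config (illustrative helper)."""
    if require_durability:
        # Broker responds only after the record is persisted to cloud
        # storage; acknowledged records survive a broker crash.
        return {"acks": "all"}
    # Broker responds as soon as the record is in memory; records that
    # were not yet persisted are lost if the broker crashes.
    return {"acks": "1"}
```

For production workloads the durable setting is almost always the right choice; the latency saved by acks=1 rarely justifies losing acknowledged messages.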
Consumer Application
Recommendation 1: For consumers using the Assign mode, upgrade to version 3.2 or above to ensure offset commit success
When a Kafka Consumer application consumes messages in Assign mode, i.e., with self-assigned partitions rather than group-managed load balancing, make sure the SDK is upgraded to version 3.2 or above. Earlier versions have a defect in Assign mode that prevents committed consumer offsets from being updated promptly, so consumers fail to commit offsets in a timely manner. For details, see Kafka issue KAFKA-13563.

Recommendation 2: Control the Consumer heartbeat timeout and messages polled per poll to avoid frequent rebalancing
Kafka Consumers that share the same Group Id are grouped together and load-balanced. If a single consumer's heartbeat times out, it is expelled from the consumer group, and the remaining consumers rebalance to reassign partitions. In production, avoid parameter mistakes that cause unexpected heartbeat timeouts and rebalancing; otherwise the consumer group keeps changing and cannot consume messages reliably. Causes of unexpected rebalancing:
- Client versions before v0.10.2: consumers did not have an independent heartbeat thread; heartbeats were coupled to the poll interface. If message processing was slow, the consumer's heartbeat timed out, triggering a rebalance.
- Client versions v0.10.2 and later: if processing is too slow and the consumer does not poll again within a certain period (set by max.poll.interval.ms, default 5 minutes), the client voluntarily leaves the group, triggering a rebalance.
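The heartbeat and poll controls above map onto a handful of standard Kafka consumer configuration keys. A sketch with illustrative values (not tuned defaults); the usual rule of thumb is to keep the heartbeat interval at a third or less of the session timeout:

```python
# Consumer settings that guard against unexpected rebalances.
# Keys are standard Kafka consumer configuration; values are illustrative.
consumer_config = {
    "session.timeout.ms": 45000,     # heartbeat timeout before eviction
    "heartbeat.interval.ms": 15000,  # keep at <= session.timeout.ms / 3
    "max.poll.interval.ms": 300000,  # max gap between poll() calls (5 min)
    "max.poll.records": 100,         # fewer records per poll, so each batch
                                     # finishes well inside the poll interval
}
```

If per-message processing is slow, lower max.poll.records or raise max.poll.interval.ms so that one poll's worth of work always completes before the interval expires.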
Recommendation 3: In production scenarios, commit the Consumer offset, but avoid committing offsets too frequently
When consuming messages with Kafka, whether through the standard Kafka Consumer SDK or frameworks such as Flink Connector, it is advisable to commit consumer offsets so that consumption lag can be monitored and risks mitigated. Offsets can be committed automatically or manually; the controlling parameters are:
- enable.auto.commit: whether to use the automatic offset commit mechanism. The default value is true, meaning automatic commit is used by default.
- auto.commit.interval.ms: The interval for auto-committing offsets. The default value is 1000, which is 1 second.
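When committing manually, the advice above cuts both ways: commit regularly enough to track lag, but not on every message. A sketch of a throttle that commits after a batch of messages or a time interval, whichever comes first; the commit callback is a stand-in for your client's commit call (e.g. consumer.commit()):

```python
import time

def make_commit_throttle(commit, min_interval_s=5.0, max_uncommitted=500):
    """Return a callback that invokes commit() at most every min_interval_s
    seconds, or sooner once max_uncommitted messages have been processed."""
    state = {"pending": 0, "last": time.monotonic()}

    def record_processed():
        state["pending"] += 1
        now = time.monotonic()
        if (state["pending"] >= max_uncommitted
                or now - state["last"] >= min_interval_s):
            commit()                       # stand-in for consumer.commit()
            state["pending"] = 0
            state["last"] = now

    return record_processed
```

Call record_processed() once per handled message; the throttle batches the actual offset commits so the broker is not hit with one commit request per message.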
Recommendation 4: Avoid consumption blocking and accumulation in production scenarios
Kafka consumers process messages within a partition in order. If a message blocks consumption because of a business-logic issue, it delays all subsequent messages in that partition. Therefore, in production environments, ensure that the consumption logic cannot block permanently. If unexpected blocking does occur, proceed as follows:
- If the abnormal message can be skipped: stop the consumer first, then reset the consumption offset to the next message in the AutoMQ Console to skip it.
- If it cannot be skipped: fix the consumption logic so that it handles the abnormal message.
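The steps above can be sketched as a processing wrapper that keeps one bad message from blocking a partition: retry a few times, then park the message for later repair instead of looping forever. The process function and dead-letter sink are placeholders for your own logic, not any client API:

```python
def consume_with_skip(messages, process, max_retries=3, dead_letter=None):
    """Process messages in partition order; divert poison messages to a
    dead-letter list instead of blocking subsequent messages forever."""
    dead_letter = dead_letter if dead_letter is not None else []
    for msg in messages:
        for attempt in range(max_retries):
            try:
                process(msg)               # placeholder for business logic
                break
            except Exception:
                if attempt == max_retries - 1:
                    # Give up on this message and move on, mirroring an
                    # offset reset to the next message in the console.
                    dead_letter.append(msg)
    return dead_letter
```

Messages collected in the dead-letter list can be replayed once the consumption logic has been fixed.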