Issue with bulk ingestion: a few query nodes hitting the maximum memory limit #28682
Hi team, we are doing bulk ingestion of around 1 billion vectors for some scale testing. Below are the configurations we used. Milvus details: version 2.3.1, Milvus cluster deployed on Kubernetes, using external S3 for indexing and external Kafka (AWS MSK). (We are able to reproduce the same issue on version 2.3.3.)

We referred to the Milvus sizing tool for this setup; screenshot attached below. Based on that, we scaled up to 54 query nodes with 12-core CPU / 64 GB memory each and 6 data nodes with 8-core CPU / 16 GB memory each. Note that we took a buffer and provisioned more nodes than the tool suggested. We are hitting memory limits and seeing interruptions in ingestion, with the error below on the query nodes:

The error above states that the memory quota is not enough, but I can see plenty of free capacity in Kubernetes. One query node reaches 95% memory consumption and throws the errors above, while all the other query nodes sit at around 62% memory consumption.

When we scaled up to more query nodes (54 to 60), after about 5 minutes I noticed the memory utilization on the over-utilized node dropping, and ingestion continued.
The primary key is not auto-generated: did you assign a different, unique value to 'reviewer_id' for each entity when you called insert() to do the bulk ingestion?
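For reference, here is a minimal pymilvus sketch of what unique primary keys look like at insert time. The collection name "reviews", the 768-dimension "embedding" field, and the connection parameters are assumptions for illustration; only the 'reviewer_id' field name comes from this thread.

```python
import random
from pymilvus import connections, Collection

# Assumed schema: reviewer_id (Int64 primary key, auto_id=False) and
# embedding (FloatVector, dim=768) in a collection named "reviews".
connections.connect(host="localhost", port="19530")
collection = Collection("reviews")

batch_size = 10_000
start_id = 0  # advance this between batches so primary keys never repeat

# Column-based insert: one list per schema field, in schema order.
reviewer_ids = list(range(start_id, start_id + batch_size))  # all values unique
embeddings = [[random.random() for _ in range(768)] for _ in reviewer_ids]

collection.insert([reviewer_ids, embeddings])
collection.flush()  # flush occasionally, not after every small batch
```

(If generating unique IDs on the client side is inconvenient, declaring the primary key field with auto_id=True lets Milvus assign unique IDs itself, in which case the ID column is omitted from insert().)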
If shard_num = 4, there will be 4 "data channels" for the collection, and 4 query nodes act as "leaders", each consuming data from one channel.
When you insert data, the proxy node hashes each "reviewer_id" to an integer value. That hash value is taken modulo 4 to determine which channel the entity belongs to.
If all the "reviewer_id" values are the same, all of the data is consumed by the same channel. That means only one channel is busy while the others are idle.
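As an illustration only (this is not Milvus's internal hash function, just a sketch of the hash-then-modulo idea), unique keys spread across all shards while a repeated key always lands on the same one:

```python
SHARD_NUM = 4  # matches shard_num = 4 above

def channel_of(reviewer_id: int) -> int:
    # Stand-in for the proxy's hashing; Milvus uses its own hash internally.
    return hash(reviewer_id) % SHARD_NUM

# Unique keys land on all 4 channels, so all shard leaders share the load:
print({channel_of(i) for i in range(10_000)})   # {0, 1, 2, 3}

# A single repeated key maps to one channel, so one query node takes all the load:
print({channel_of(42) for _ in range(10_000)})  # always the same single channel
```

That kind of skew would concentrate the ingestion load on the one query node leading that channel, which matches the 95% vs. 62% memory imbalance described above.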