Reproduction
Run tgi docker container
model="Qwen/Qwen2.5-72B-Instruct"
volume=/media/data_drive_0 # share a volume with the Docker container to avoid downloading weights every run
token=$HF_TOKEN
shards=4
docker run --gpus all --shm-size 1g \
-e HUGGING_FACE_HUB_TOKEN=${token} \
-e CUDA_VISIBLE_DEVICES=0,1,2,3 \
-p 8084:80 \
-v $volume:/data \
ghcr.io/huggingface/text-generation-inference:3.0.0 \
--model-id ${model} \
--huggingface-hub-cache /data/.cache/huggingface/hub \
--validation-workers ${shards} \
--num-shard ${shards} \
--sharded true \
--max-input-length 32000 \
--max-total-tokens 32768 \
--rope-scaling dynamic \
--rope-factor 1 \
--cuda-memory-fraction 0.8 \
--dtype bfloat16
Send request
$ curl localhost:8084/v1/chat/completions \
-X POST \
-d '{
"model": "tgi",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "What is deep learning?"
}
],
"stream": false,
"max_tokens": 20,
"n": 3
}' \
-H 'Content-Type: application/json'
{"object":"chat.completion","id":"","created":1736981209,"model":"Qwen/Qwen2.5-72B-Instruct","system_fingerprint":"3.0.0-sha-8f326c9","choices":[{"index":0,"message":{"role":"assistant","content":"Deep learning is a subset of machine learning, which in turn is a subset of artificial intelligence (AI"},"logprobs":null,"finish_reason":"length"}],"usage":{…0,"total_tokens":44}}
Expected behavior
The chat API is expected to return multiple responses when n > 1. The request above sets n to 3, but the response contains a single entry in choices (index 0).
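To make the mismatch easy to re-check, here is a minimal Python sketch that sends the same request as the curl command above and compares the length of choices with the requested n. The helper names (build_payload, count_choices, check_n) are hypothetical and not part of TGI. The URL is the one from this report, and the sketch assumes the server returns a non-streaming JSON body shaped like the response shown above.

```python
import json
import urllib.request


def build_payload(n: int = 3, max_tokens: int = 20) -> dict:
    """Build the same chat request body as the curl command in this report."""
    return {
        "model": "tgi",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is deep learning?"},
        ],
        "stream": False,
        "max_tokens": max_tokens,
        "n": n,
    }


def count_choices(response_body: str) -> int:
    """Return the number of entries in the response's `choices` array."""
    return len(json.loads(response_body).get("choices", []))


def check_n(url: str, n: int = 3) -> bool:
    """POST the payload and report whether the server returned n choices."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(n)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        returned = count_choices(response.read().decode("utf-8"))
    print(f"requested n={n}, got {returned} choice(s)")
    return returned == n
```

Calling check_n("http://localhost:8084/v1/chat/completions") against the container started above should return False while the behavior described in this report is present, and True once n is honored.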
System Info
Docker container:
ghcr.io/huggingface/text-generation-inference:3.0.0