Feature request
It seems that if I want to load a base model with a LoRA adapter and consume it, I have to use the generate route, which is the only one that allows specifying adapter_id:
curl 127.0.0.1:3000/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
        "inputs": "Was \"The Office\" the funniest TV series ever?",
        "parameters": {
            "max_new_tokens": 200,
            "adapter_id": "tv_knowledge_id"
        }
    }'
but I can't use v1/chat/completions. Are you planning to support this?
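For reference, the /generate request above can be sketched in Python; `tv_knowledge_id` is the placeholder adapter name from the cURL example, and the local server address is only an assumption.

```python
import json

# Build the /generate request body; "adapter_id" in "parameters" selects
# the loaded LoRA adapter instead of the base model weights.
payload = {
    "inputs": 'Was "The Office" the funniest TV series ever?',
    "parameters": {
        "max_new_tokens": 200,
        "adapter_id": "tv_knowledge_id",  # placeholder adapter name
    },
}
body = json.dumps(payload)

# With a TGI server running locally, the request would be sent as e.g.:
#   import requests
#   r = requests.post("http://127.0.0.1:3000/generate", data=body,
#                     headers={"Content-Type": "application/json"})
print(body)
```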
Motivation
Many users rely on v1/chat/completions and train LoRA adapters for it.
Your contribution
Maybe, if you're over your capacity
Hi @tsvisab, thanks for the question. That's indeed supported via the model parameter: if you provide the adapter_id as the model when you send the request, it will use the loaded LoRA adapter instead of the base model. If that didn't work for you, happy to reproduce your test and see if we can fix it 🤗 (see the example cURL request below).
curl http://localhost:8080/v1/chat/completions \
    -X POST \
    -d '{"messages":[{"role":"user","content":"What is Deep Learning?"}],"temperature":0.7,"top_p":0.95,"max_tokens":256,"model":"your-username/your-lora-adapter"}' \
    -H 'Content-Type: application/json'
And to send requests to the base model instead, just remove the model field or set it to the actual base model value, e.g. meta-llama/Llama-3.1-8B-Instruct:
curl http://localhost:8080/v1/chat/completions \
    -X POST \
    -d '{"messages":[{"role":"user","content":"What is Deep Learning?"}],"temperature":0.7,"top_p":0.95,"max_tokens":256,"model":"meta-llama/Llama-3.1-8B-Instruct"}' \
    -H 'Content-Type: application/json'
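The adapter-vs-base switch described above can be sketched as a small Python helper. The adapter and model names are just the examples from this thread, and the helper itself is only an illustrative sketch, not part of any client library.

```python
import json
from typing import Optional


def chat_payload(model: Optional[str], content: str) -> str:
    """Build a /v1/chat/completions request body as a JSON string.

    Setting `model` to a LoRA adapter id routes the request to that
    adapter; setting it to the base model name (or omitting it) routes
    the request to the base model.
    """
    body = {
        "messages": [{"role": "user", "content": content}],
        "temperature": 0.7,
        "top_p": 0.95,
        "max_tokens": 256,
    }
    if model is not None:
        body["model"] = model
    return json.dumps(body)


# Request served by the LoRA adapter (example name from this thread):
adapter_body = chat_payload("your-username/your-lora-adapter", "What is Deep Learning?")
# Request served by the base model:
base_body = chat_payload("meta-llama/Llama-3.1-8B-Instruct", "What is Deep Learning?")
```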