System Info
We're running TGI with Llama 3.1 8B Instruct and observed some odd values when asking the LLM to generate strings containing the letter combination <space>'m (e.g. the string "for 'manual" used in the reproduction code).
When running client.text_generation with a prompt that leads the LLM to generate a string containing the sequence 'm, the result gets mangled, both in the token stream and in the generated_text attribute (tested with both the sync and async versions of InferenceClient).
Interestingly, the mangling differs between the two: the token stream "eats" the m character, while generated_text eats the leading space. As a result, the string reconstructed from the token stream differs from the one provided by generated_text, and both are incorrect.
I suspect the issue may be linked to special handling for I'm, as I could not reproduce it with other sequences 'x where x is a letter other than m.
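For reference, the async variant we tested follows the same pattern; a minimal sketch, assuming the same local endpoint and prompt as the sync reproduction below:

import asyncio

from huggingface_hub import AsyncInferenceClient

endpoint = "http://localhost:8080"
client = AsyncInferenceClient(endpoint)

async def main():
    tokens = []
    # With stream=True, text_generation() returns an async iterator of per-token details
    async for answer in await client.text_generation(
        "Repeat the following once and exactly once:\nnew result for 'manual'",
        stream=True,
        details=True,
        max_new_tokens=6,
    ):
        if not answer.token.special:
            tokens.append(answer.token.text)
    print("".join(tokens))  # same mangled output as with the sync client

asyncio.run(main())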
Information
Docker
The CLI directly
Tasks
An officially supported command
My own modifications
Reproduction
Running the following:
from huggingface_hub import InferenceClient

# TGI run with ghcr.io/huggingface/text-generation-inference:3.0.1 and arguments
# "--model-id meta-llama/Meta-Llama-3.1-8B-Instruct --revision d04e592bb4f6aa9cfee91e2e20afa771667e1d4b --hostname 0.0.0.0 --port 8080 --quantize bitsandbytes-nf4"
endpoint = "http://localhost:8080"
client = InferenceClient(endpoint)

prompt = """Repeat the following once and exactly once:
new result for 'manual'"""
tokens = []

for answer in client.text_generation(
    prompt,
    stream=True,
    details=True,
    max_new_tokens=6,  # to limit output, same behavior without this parameter
):
    print(answer)
    if not answer.token.special:
        tokens.append(answer.token.text)

print("".join(tokens))
Note that print("".join(tokens)) gives the string "new result for 'anual'\n" (since the tokens at indices 4 and 5 are respectively " '" and "anual"), but generated_text in the message for token index 6 indicates instead "new result for'manual'\n".
So the two results are inconsistent and both mangled the string in different ways (missing the m in one case, missing the space in the other).
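As a side check (independent of TGI), it may help to look at how the Llama 3.1 tokenizer actually splits the target string, to see where the incremental detokenization could be dropping the m or the leading space; a rough sketch, assuming local access to the (gated) tokenizer:

from transformers import AutoTokenizer

# Assumes access to the gated meta-llama repo (or a local copy of the tokenizer).
tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

text = "new result for 'manual'"
ids = tok.encode(text, add_special_tokens=False)

# Raw token pieces, per-id decodes, and the full decode, to compare against
# the per-token texts streamed back by TGI.
print(tok.convert_ids_to_tokens(ids))
print([tok.decode([i]) for i in ids])
print(tok.decode(ids))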
Expected behavior
Both the concatenated streamed tokens and the generated_text attribute should result in the same value: "new result for 'manual'\n"
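Put differently (a small sketch reusing tokens and the final streamed answer from the reproduction above), this assertion should hold but currently does not:

# After the streaming loop finishes, `answer` is the last streamed message,
# which carries the full generated_text. Both views should agree:
assert "".join(tokens) == answer.generated_text == "new result for 'manual'\n"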