Mangled generation for string sequences containing <space>'m with Llama 3.1 #2927

tomjorquera opened this issue Jan 20, 2025 · 3 comments

@tomjorquera

System Info

We're running TGI with Llama 3.1 8B Instruct and observed mangled output when asking the LLM to generate strings containing the letter combination <space>'m (e.g. the string "for 'manual" used in the reproduction code below).

When running client.text_generation with a prompt that leads the LLM to generate a string containing the sequence 'm, the result gets mangled, both in the token stream and in the generated_text attribute (tested with both the sync and async versions of InferenceClient).

Interestingly, the mangling is different between the two: the token stream "eats" the m character, while generated_text eats the leading space. This means the result reconstructed from the token stream differs from the one provided by generated_text (and both are incorrect).

I suspect the issue may be linked to special handling of I'm, as I initially did not reproduce the issue with other sequences 'x where x is different from m (but see the addendum below).

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Running the following:

from huggingface_hub import InferenceClient

# TGI run with ghcr.io/huggingface/text-generation-inference:3.0.1
# and arguments"--model-id meta-llama/Meta-Llama-3.1-8B-Instruct --revision d04e592bb4f6aa9cfee91e2e20afa771667e1d4b --hostname 0.0.0.0 --port 8080 --quantize bitsandbytes-nf4"
endpoint = "http://localhost:8080"

client = InferenceClient(endpoint)

prompt = """Repeat the following once and exactly once:
new result for 'manual'
"""

tokens = []
for answer in client.text_generation(
    prompt,
    stream=True,
    details=True,
    max_new_tokens=6, # to limit output, same behavior without this parameter
):
    print(answer)
    if not answer.token.special:
        tokens.append(answer.token.text)

print("".join(tokens))

will print the following output:

TextGenerationStreamOutput(index=1, token=TextGenerationStreamOutputToken(id=943, logprob=-2.9980469, special=False, text='new'), details=None, generated_text=None, top_tokens=None)
TextGenerationStreamOutput(index=2, token=TextGenerationStreamOutputToken(id=1121, logprob=-0.35864258, special=False, text=' result'), details=None, generated_text=None, top_tokens=None)
TextGenerationStreamOutput(index=3, token=TextGenerationStreamOutputToken(id=369, logprob=-0.10369873, special=False, text=' for'), details=None, generated_text=None, top_tokens=None)
TextGenerationStreamOutput(index=4, token=TextGenerationStreamOutputToken(id=364, logprob=-0.15783691, special=False, text=" '"), details=None, generated_text=None, top_tokens=None)
TextGenerationStreamOutput(index=5, token=TextGenerationStreamOutputToken(id=20310, logprob=-2.21875, special=False, text='anual'), details=None, generated_text=None, top_tokens=None)
TextGenerationStreamOutput(index=6, token=TextGenerationStreamOutputToken(id=1270, logprob=-0.85546875, special=False, text="'\n"), details=TextGenerationStreamOutputStreamDetails(finish_reason='length', generated_tokens=6, input_length=15, seed=None), generated_text="new result for'manual'\n", top_tokens=None)
new result for 'anual'

Note that print("".join(tokens)) gives the string "new result for 'anual'\n" (since tokens with index 4 and 5 are respectively " '" and 'anual'), but `generated_text in token index 6 indicates instead "new result for'manual'\n"

So the two results are inconsistent, and both mangle the string in different ways (the m is missing in one case, the space in the other).

Expected behavior

Both the tokens and the generated_text attribute should result in the same value: "new result for 'manual'\n"

@tomjorquera
Author

Additional information:

Investigating further, I found I could reproduce part of the issue with the transformers library:

Running:

from transformers import AutoTokenizer

prompt = """for 'manual'"""

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
tokenizer.batch_decode(tokenizer([prompt])["input_ids"], skip_special_tokens=True)[0]

prints

"for'manual'"

(missing whitespace before the leading ')

Should I open a bug directly with the transformers project?
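
If it helps to narrow things down: if I read the transformers source correctly, decoding applies a cleanup step that collapses the space before common English contractions (" 'm", " 's", " 've", " 're", ...), and it can be turned off per call. A quick check (just a sketch, I haven't verified this is the root cause) would be to decode the same ids with clean_up_tokenization_spaces=False and see whether the space comes back:

from transformers import AutoTokenizer

prompt = "for 'manual'"

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
ids = tokenizer([prompt])["input_ids"]

# Default decode, cleanup enabled (this is what the snippet above does)
print(tokenizer.batch_decode(ids, skip_special_tokens=True)[0])

# Same ids, with the decode-time cleanup disabled; if the cleanup is the
# culprit, this should print the original string with the space preserved
print(tokenizer.batch_decode(ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0])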

@tomjorquera
Author

(Also, I couldn't reproduce the bug when testing with ollama/llama.cpp)

@tomjorquera
Author

Addendum:

Tested with a string containing <space>'s and I can also reproduce the issue, e.g. asking the LLM:

Repeat the following once and exactly once:
new result for 'sales'
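
For anyone wanting to check more endings without spinning up TGI, a quick tokenizer round-trip scan like the one below (just a sketch; I have only confirmed 'm and 's end to end) should show which strings survive encode/decode unchanged:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

# Round-trip a few <space>'x strings and flag any that come back changed
for text in ["for 'manual'", "for 'sales'", "for 'version'", "for 'review'", "for 'data'"]:
    ids = tokenizer([text])["input_ids"]
    decoded = tokenizer.batch_decode(ids, skip_special_tokens=True)[0]
    status = "ok" if decoded == text else "MANGLED"
    print(f"{text!r} -> {decoded!r} [{status}]")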
