Mangled generation for string sequences containing <space>'m with Llama 3.1 #2927

tomjorquera opened this issue Jan 20, 2025 · 3 comments

@tomjorquera

System Info

We're running TGI with Llama 3.1 8B Instruct and observed mangled output when asking the LLM to generate strings containing the letter combination <space>'m (e.g. the string "for 'manual" used in the reproduction code below).

When running client.text_generation with a prompt that leads the LLM to generate a string containing the sequence 'm, the result gets mangled, both in the token stream and in the generated_text attribute (tested with both the sync and async versions of InferenceClient).

Interestingly, the mangling is different between the two: the token stream "eats" the m character, while generated_text eats the leading space. This means the result reconstructed from the token stream differs from the one provided by generated_text (and both are incorrect).

I suspect the issue may be linked to special handling of I'm, as I initially did not reproduce the issue with other sequences 'x where x is different from m (but see the addendum below).

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Running the following:

from huggingface_hub import InferenceClient

# TGI run with ghcr.io/huggingface/text-generation-inference:3.0.1
# and arguments"--model-id meta-llama/Meta-Llama-3.1-8B-Instruct --revision d04e592bb4f6aa9cfee91e2e20afa771667e1d4b --hostname 0.0.0.0 --port 8080 --quantize bitsandbytes-nf4"
endpoint = "http://localhost:8080"

client = InferenceClient(endpoint)

prompt = """Repeat the following once and exactly once:
new result for 'manual'
"""

tokens = []
for answer in client.text_generation(
    prompt,
    stream=True,
    details=True,
    max_new_tokens=6, # to limit output, same behavior without this parameter
):
    print(answer)
    if not answer.token.special:
        tokens.append(answer.token.text)

print("".join(tokens))

will print the following output:

TextGenerationStreamOutput(index=1, token=TextGenerationStreamOutputToken(id=943, logprob=-2.9980469, special=False, text='new'), details=None, generated_text=None, top_tokens=None)
TextGenerationStreamOutput(index=2, token=TextGenerationStreamOutputToken(id=1121, logprob=-0.35864258, special=False, text=' result'), details=None, generated_text=None, top_tokens=None)
TextGenerationStreamOutput(index=3, token=TextGenerationStreamOutputToken(id=369, logprob=-0.10369873, special=False, text=' for'), details=None, generated_text=None, top_tokens=None)
TextGenerationStreamOutput(index=4, token=TextGenerationStreamOutputToken(id=364, logprob=-0.15783691, special=False, text=" '"), details=None, generated_text=None, top_tokens=None)
TextGenerationStreamOutput(index=5, token=TextGenerationStreamOutputToken(id=20310, logprob=-2.21875, special=False, text='anual'), details=None, generated_text=None, top_tokens=None)
TextGenerationStreamOutput(index=6, token=TextGenerationStreamOutputToken(id=1270, logprob=-0.85546875, special=False, text="'\n"), details=TextGenerationStreamOutputStreamDetails(finish_reason='length', generated_tokens=6, input_length=15, seed=None), generated_text="new result for'manual'\n", top_tokens=None)
new result for 'anual'

Note that print("".join(tokens)) gives the string "new result for 'anual'\n" (since tokens with index 4 and 5 are respectively " '" and 'anual'), but `generated_text in token index 6 indicates instead "new result for'manual'\n"

So the two results are inconsistent, and both mangle the string in different ways (the m is missing in one case, the space in the other).

Expected behavior

Both the tokens and the generated_text attribute should result in the same value: "new result for 'manual'\n"

@tomjorquera
Author

Additional information:

Investigating further, I found I could reproduce part of the issue with the transformers library:

Running:

from transformers import AutoTokenizer

prompt = """for 'manual'"""

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
tokenizer.batch_decode(tokenizer([prompt])["input_ids"], skip_special_tokens=True)[0]

prints

"for'manual'"

(missing whitespace before the leading ')

Should I open a bug directly with the transformers project?
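
If it helps to narrow things down: if I read the transformers source correctly, decoding applies a cleanup step that collapses the space before common English contractions (" 'm", " 's", " 've", " 're", ...), and it can be turned off per call. A quick check (just a sketch, I haven't verified this is the root cause) would be to decode the same ids with clean_up_tokenization_spaces=False and see whether the space comes back:

from transformers import AutoTokenizer

prompt = "for 'manual'"

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
ids = tokenizer([prompt])["input_ids"]

# Default decode, cleanup enabled (this is what the snippet above does)
print(tokenizer.batch_decode(ids, skip_special_tokens=True)[0])

# Same ids, with the decode-time cleanup disabled; if the cleanup is the
# culprit, this should print the original string with the space preserved
print(tokenizer.batch_decode(ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0])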

@tomjorquera
Author

(Also, I couldn't reproduce the bug when testing with ollama/llama.cpp)

@tomjorquera
Author

Addendum:

Tested with a string containing <space>'s and I can also reproduce the issue, e.g. asking the LLM:

Repeat the following once and exactly once:
new result for 'sales'
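
For anyone wanting to check more endings without spinning up TGI, a quick tokenizer round-trip scan like the one below (just a sketch; I have only confirmed 'm and 's end to end) should show which strings survive encode/decode unchanged:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

# Round-trip a few <space>'x strings and flag any that come back changed
for text in ["for 'manual'", "for 'sales'", "for 'version'", "for 'review'", "for 'data'"]:
    ids = tokenizer([text])["input_ids"]
    decoded = tokenizer.batch_decode(ids, skip_special_tokens=True)[0]
    status = "ok" if decoded == text else "MANGLED"
    print(f"{text!r} -> {decoded!r} [{status}]")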
