Fine tuning pre-trained LLM for language translation and to build ChatGPT like application #334
-
Thank you so much for delivering a great book. I just completed Chapter 7 (yet to review the bonus material). Two prominent questions in my mind (out of many) are:

1. How feasible is it to finetune the pretrained LLM for language translation?
2. What would it take to build a ChatGPT-like application on top of it?

Do you recommend any blog posts or discussion threads that address these topics? Thank you again for writing a great book!
Replies: 1 comment
-
Glad you liked the book!

Regarding your first question: it depends on the LLM and the languages involved, but I'd say this is relatively straightforward. The key here is that the languages must have been present in the pretraining dataset used to create the tokenizer and the pretrained LLM. Then, finetuning it for language translation (using the technique from Ch07) is relatively easy. The reason is that otherwise the tokenizer will break up words into too many subtokens -- it will still work, but it is not ideal. Base models that support multiple languages include, for example, Qwen 2 (~20 languages) and Llama 3.1 (~8 languages). (It's also possible to extend existing tokenizers with new tokens, but that is a separate topic I will write about some time.)

Regarding the second question: that depends on how many customers you want to serve, and it can be a bit more effort. First, I suggest doing an alignment step -- that's usually done for safety reasons. A popular technique for this these days is DPO (see the DPO bonus material here).
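To see the subtoken issue concretely, here is a minimal sketch using tiktoken's GPT-2 encoding (the BPE tokenizer used in the book's earlier chapters); the example sentences are just illustrative:

```python
import tiktoken  # BPE tokenizer library used in the book

tokenizer = tiktoken.get_encoding("gpt2")

english = "The weather is nice today."
german = "Das Wetter ist heute schön."

# A tokenizer trained mostly on English tends to split non-English
# words into many more subtokens, which hurts translation finetuning.
for text in (english, german):
    token_ids = tokenizer.encode(text)
    pieces = [tokenizer.decode([t]) for t in token_ids]
    print(f"{len(token_ids):>2} tokens: {pieces}")
```

Finetuning for translation with the Chapter 7 technique then essentially means phrasing each translation pair as an instruction-style example. A rough sketch, assuming the Alpaca-style prompt format from Chapter 7 (the `entry` dictionary and its field names are made up for illustration):

```python
def format_translation_example(entry):
    # Wrap a translation pair in an Alpaca-style instruction prompt,
    # analogous to the formatting used for instruction finetuning in Ch07
    instruction_text = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request."
        f"\n\n### Instruction:\nTranslate the following sentence "
        f"into {entry['target_lang']}."
        f"\n\n### Input:\n{entry['source']}"
    )
    response_text = f"\n\n### Response:\n{entry['target']}"
    return instruction_text + response_text


example = {
    "source": "Das Wetter ist heute schön.",
    "target_lang": "English",
    "target": "The weather is nice today.",
}
print(format_translation_example(example))
```

For the alignment step, the core of DPO is a simple loss computed from the log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") response under the policy and a frozen reference model. A minimal PyTorch sketch (function and argument names are mine, not taken from the bonus material):

```python
import torch
import torch.nn.functional as F


def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Direct Preference Optimization (Rafailov et al., 2023):
    # increase the log-prob margin between chosen and rejected responses
    # relative to the frozen reference model, scaled by beta.
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps
    logits = beta * (chosen_rewards - rejected_rewards)
    return -F.logsigmoid(logits).mean()
```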