llm.import_ckpt cannot run directly #11756

LingxiaoShawn · 2025-01-04T02:14:13Z

When I run the following code with direct=True, it appears to deadlock, apparently while spawning multiple processes. This code is equivalent to calling llm.import_ckpt directly from a main function.

from nemo.collections import llm
import nemo_run as run

ckpt_import = run.Partial(
    llm.import_ckpt,
    model=run.Config(llm.LlamaModel, config=run.Config(llm.Llama31Config8B)),
    source=f'hf://{model_path}',
    overwrite=True,
)
local_executor = run.LocalExecutor()
run.run(ckpt_import, executor=local_executor, direct=True)

However, if I remove direct=True and run the importer through local_executor, the checkpoint is converted successfully.

run.run(ckpt_import, executor=local_executor)  # succeeds
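
To narrow down where the direct run stalls, one option is to have Python dump every thread's stack if the call does not return within a timeout. The following is a minimal sketch using only the standard-library faulthandler module; the helper name run_with_hang_dump and the timeout value are hypothetical and not part of NeMo or nemo_run, and the sketch only reports where the process is stuck rather than fixing anything.

```python
import faulthandler


def run_with_hang_dump(fn, timeout_s=300.0):
    """Call fn() and return its result.

    If fn() is still running after timeout_s seconds, the tracebacks of all
    threads are written to stderr. The call itself is not interrupted.
    (Hypothetical helper for diagnosis only.)
    """
    # Arm a watchdog that dumps tracebacks once after timeout_s seconds.
    faulthandler.dump_traceback_later(timeout_s, exit=False)
    try:
        return fn()
    finally:
        # Disarm the watchdog whether fn() returned or raised.
        faulthandler.cancel_dump_traceback_later()
```

For the case above, the direct call could be wrapped as run_with_hang_dump(lambda: run.run(ckpt_import, executor=local_executor, direct=True)). If the process hangs, the dumped stacks should show which call is blocking, for example a process join or a lock acquisition.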

Can you let me know whether this behavior is expected, or whether there is a bug in the ModelConnector?

Thank you!

LingxiaoShawn added the bug (Something isn't working) label on Jan 4, 2025
