[Feature] Are there any ways to reduce GPU memory usage? When running inference on the 128k NeedleBench, I still run out of memory on 8 A100s #1131
Unanswered
1518630367 asked this question in Q&A
Replies: 2 comments
-
You can try LMDeploy
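For reference, a minimal sketch of what an LMDeploy-backed OpenCompass config could look like. The TurboMindModel wrapper and the engine_config keys (tp, session_len, cache_max_entry_count) are assumptions based on the OpenCompass and LMDeploy APIs and may differ across versions; lowering cache_max_entry_count is LMDeploy's knob for shrinking the KV-cache pool.

# Hedged sketch: an LMDeploy (TurboMind) backend config for OpenCompass.
# engine_config keys are assumed to match LMDeploy's TurbomindEngineConfig.
from opencompass.models import TurboMindModel

models = [
    dict(
        type=TurboMindModel,
        abbr="llama-3-8b-instruct-turbomind",
        path="/opt/218/models/Meta-Llama-3-8B-Instruct-NTK",
        engine_config=dict(
            tp=8,                       # shard the model across all 8 A100s
            session_len=131072,         # allow the full 128k context
            cache_max_entry_count=0.5,  # fraction of free VRAM reserved for KV cache
        ),
        gen_config=dict(max_new_tokens=128),
        max_out_len=128,
        max_seq_len=131072,
        batch_size=1,
        run_cfg=dict(num_gpus=8, num_procs=1),
    )
]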
-
You can try adding model_kwargs=dict(tensor_parallel_size=2, gpu_memory_utilization=0.7) to the model dict in the config file; after I added it, it ran normally.
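That suggestion targets the vLLM backend (tensor_parallel_size and gpu_memory_utilization are vllm.LLM engine arguments), which the question below already imports but does not use. A minimal sketch, under the assumption that OpenCompass's VLLM wrapper forwards model_kwargs to vllm.LLM; the path, chat template, and sequence lengths are copied from the question.

# Hedged sketch: the questioner's config switched to the vLLM backend.
# model_kwargs are assumed to be forwarded to vllm.LLM.
from opencompass.models import VLLM

models = [
    dict(
        type=VLLM,
        abbr="llama-3-8b-instruct-vllm",
        path="/opt/218/models/Meta-Llama-3-8B-Instruct-NTK",
        model_kwargs=dict(
            tensor_parallel_size=8,      # one shard per A100
            gpu_memory_utilization=0.7,  # cap vLLM's VRAM pool, leaving headroom
            max_model_len=122880,        # match max_seq_len below
        ),
        meta_template=_meta_template,    # same chat template as in the question
        max_out_len=128,
        max_seq_len=122880,
        batch_size=1,
        run_cfg=dict(num_gpus=8, num_procs=1),
    )
]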
Describe the feature
Below is my parameter configuration:
from opencompass.models import HuggingFaceCausalLM
from opencompass.models import VLLM  # imported but unused; the config below uses the HF backend

_meta_template = dict(
    round=[
        dict(role="HUMAN", begin="<|start_header_id|>user<|end_header_id|>\n\n", end="<|eot_id|>"),
        dict(role="BOT", begin="<|start_header_id|>assistant<|end_header_id|>\n\n", end="<|eot_id|>", generate=True),
    ],
)

models = [
    dict(
        type=HuggingFaceCausalLM,
        abbr="llama-3-8b-instruct-hf",
        path="/opt/218/models/Meta-Llama-3-8B-Instruct-NTK",
        model_kwargs=dict(device_map="auto"),
        tokenizer_kwargs=dict(
            padding_side="left",
            truncation_side="left",
            use_fast=False,
        ),
        meta_template=_meta_template,
        max_out_len=128,
        max_seq_len=122880,  # ~120k-token context window
        batch_size=1,
        run_cfg=dict(num_gpus=7, num_procs=1),
        generation_kwargs={"eos_token_id": [128001, 128009]},  # <|end_of_text|>, <|eot_id|>
        # batch_padding=True,
    )
]
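As a rough sanity check on why this setup runs out of memory: at fp16 the KV cache for Llama-3-8B (32 layers, 8 grouped-query KV heads, head dim 128) is only about 15 GiB at this length, but plain HuggingFace generation without a memory-efficient attention kernel materializes seq x seq score matrices, and those dominate. A back-of-the-envelope sketch (estimates only; real usage depends on the attention implementation):

# Rough fp16 memory figures for Llama-3-8B at seq_len = 122880.
layers, kv_heads, head_dim, fp16_bytes = 32, 8, 128, 2
seq = 122880

kv_cache = 2 * layers * kv_heads * head_dim * fp16_bytes * seq  # K and V for every token
attn_scores = seq * seq * fp16_bytes                            # one head's score matrix, naive attention

print(f"KV cache: {kv_cache / 2**30:.1f} GiB per sequence")  # ~15.0 GiB
print(f"Score matrix: {attn_scores / 2**30:.1f} GiB")        # ~28.1 GiB per head per layer

This is why both replies point at inference engines: vLLM and LMDeploy use paged KV caches and fused attention kernels that never build the full score matrix.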
Will you implement it?