[Bug] HF inference and vLLM inference give inconsistent evaluation results #1594

luhairong11 · 2024-10-09T09:44:39Z

Prerequisite

I have searched Issues and Discussions but cannot get the expected help.
The bug has not been fixed in the latest version.

Type

I'm evaluating with the officially supported tasks/models/datasets.

Environment

Contents of eval_hf_qwen2_5_0_5b_instruct.py:

from mmengine.config import read_base

with read_base():
    from opencompass.configs.datasets.gsm8k.gsm8k_gen_3309bd import gsm8k_datasets
    from opencompass.configs.models.qwen2_5.hf_qwen2_5_0_5b_instruct import models
    from opencompass.configs.summarizers.example import summarizer

datasets = sum([v for k, v in locals().items() if k.endswith('_datasets') or k == 'datasets'], [])
work_dir = './outputs/hf_qwen2_5_0_5b_instruct/'
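The `datasets = sum(...)` line in these configs scans the names in the config's namespace and concatenates every list whose name ends in `_datasets` into one flat list. A minimal standalone sketch of that behaviour, using hypothetical placeholder dicts instead of the real OpenCompass config objects (the `list(...)` snapshot is a defensive addition not present in the original line):

```python
# Hypothetical stand-ins for the names that read_base() would import from the
# OpenCompass config files; the real entries are full dataset/model config dicts.
gsm8k_datasets = [dict(abbr='gsm8k')]
models = [dict(abbr='qwen2.5-0.5b-instruct-hf')]

# Concatenate every list bound to a name ending in '_datasets' (or named
# 'datasets') into one flat list; `models` is skipped because its name does
# not match. list(...) snapshots the namespace before iterating over it.
datasets = sum(
    [v for k, v in list(locals().items()) if k.endswith('_datasets') or k == 'datasets'],
    [],
)
print(datasets)  # [{'abbr': 'gsm8k'}]
```

With several dataset imports (for example a second `*_datasets` list), the same line would simply append their entries in namespace order.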

Contents of eval_vllm_qwen2_5_0_5b_instruct.py:

from mmengine.config import read_base

with read_base():
    from opencompass.configs.datasets.gsm8k.gsm8k_gen_3309bd import gsm8k_datasets
    from opencompass.configs.models.qwen2_5.vllm_qwen2_5_0_5b_instruct import models
    from opencompass.configs.summarizers.example import summarizer

datasets = sum([v for k, v in locals().items() if k.endswith('_datasets') or k == 'datasets'], [])
work_dir = './outputs/hf_qwen2_5_0_5b_instruct/'

HF evaluation command:

CUDA_VISIBLE_DEVICES=6 python3 run.py configs/eval_hf_qwen2_5_0_5b_instruct.py --debug

vLLM evaluation command:

CUDA_VISIBLE_DEVICES=6 python3 run.py configs/eval_vllm_qwen2_5_0_5b_instruct.py --debug

Reproduces the problem - code/configuration sample

HF evaluation command:

CUDA_VISIBLE_DEVICES=6 python3 run.py configs/eval_hf_qwen2_5_0_5b_instruct.py --debug

vLLM evaluation command:

CUDA_VISIBLE_DEVICES=6 python3 run.py configs/eval_vllm_qwen2_5_0_5b_instruct.py --debug

Reproduces the problem - command or script

HF evaluation command:

CUDA_VISIBLE_DEVICES=6 python3 run.py configs/eval_hf_qwen2_5_0_5b_instruct.py --debug

vLLM evaluation command:

CUDA_VISIBLE_DEVICES=6 python3 run.py configs/eval_vllm_qwen2_5_0_5b_instruct.py --debug
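To see where the two backends diverge, one option is to compare the raw generations sample by sample. This is a sketch under assumptions: it assumes each run writes its GSM8K predictions as a JSON object keyed by sample index (e.g. `"0"`, `"1"`) with a `prediction` field per entry, which should be checked against the actual files under each run's output directory. `load_predictions` and `diff_predictions` are hypothetical helpers, not OpenCompass APIs.

```python
import json
from typing import Dict, List


def load_predictions(path: str) -> Dict[str, dict]:
    """Read one predictions file (assumed layout: {"<index>": {"prediction": ...}})."""
    with open(path, encoding='utf-8') as f:
        return json.load(f)


def diff_predictions(hf_preds: Dict[str, dict], vllm_preds: Dict[str, dict]) -> List[str]:
    """Return the sample keys whose generated text differs between the two runs."""
    # Only compare samples present in both runs; keys are assumed to be numeric strings.
    shared = sorted(set(hf_preds) & set(vllm_preds), key=int)
    return [k for k in shared
            if hf_preds[k].get('prediction') != vllm_preds[k].get('prediction')]


if __name__ == '__main__':
    # In-memory demo; with real runs, call load_predictions() on each file instead.
    hf = {'0': {'prediction': 'The answer is 18'}, '1': {'prediction': '3'}}
    vllm = {'0': {'prediction': 'The answer is 18'}, '1': {'prediction': '4'}}
    print(diff_predictions(hf, vllm))  # ['1']
```

Inspecting a few of the differing samples (prompt and generated text) usually shows whether the gap comes from prompting, generation settings, or answer extraction.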

Reproduces the problem - error message

HF results:

dataset,version,metric,mode,qwen2.5-0.5b-instruct-hf
gsm8k,3309bd,accuracy,gen,0.83

vLLM results:

dataset,version,metric,mode,qwen2.5-0.5b-instruct-vllm
gsm8k,3309bd,accuracy,gen,1.44
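For scale, these scores can be converted back to answer counts. This assumes the GSM8K `accuracy` is reported on a 0-100 scale (so 0.83 means 0.83%, not 83%) and that the full 1,319-problem GSM8K test split was scored; both are assumptions to verify against the run's detailed results.

```python
GSM8K_TEST_SIZE = 1319  # number of problems in the GSM8K test split (assumed fully scored)


def correct_count(accuracy_pct: float, total: int = GSM8K_TEST_SIZE) -> int:
    """Convert a percentage accuracy back to the nearest whole number of correct answers."""
    return round(accuracy_pct / 100 * total)


hf_correct = correct_count(0.83)    # 0.0083 * 1319 ≈ 10.95
vllm_correct = correct_count(1.44)  # 0.0144 * 1319 ≈ 18.99
print(hf_correct, vllm_correct, vllm_correct - hf_correct)  # 11 19 8
```

Under these assumptions the two backends differ by about 8 correct answers out of 1,319, and both scores are close to zero.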

Other information

No response

mm-assistant bot assigned MaiziXiao Oct 9, 2024
