For a ~1k-sample calibration set with batch size 16, AWQ calibration takes more than 13 hours, while llm-compressor's AWQ calibration finishes in under 1 hour. What could explain the difference?
One possible reason I can think of is that the Qwen2.5-VL attention backend is slower when using SDPA. Are there other contributing factors?
Hardware platform: 1× H20
Input token count (text + image): ≈ 200
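The SDPA hypothesis can be checked in isolation before blaming the rest of the calibration loop. A minimal sketch (shapes are illustrative, not taken from Qwen2.5-VL; it only assumes PyTorch is installed) that times the raw SDPA kernel:

```python
import time
import torch
import torch.nn.functional as F

def time_sdpa(batch, heads, seq, dim, device="cpu", iters=10):
    """Rough wall-clock timing of PyTorch's scaled_dot_product_attention."""
    q = torch.randn(batch, heads, seq, dim, device=device)
    k = torch.randn_like(q)
    v = torch.randn_like(q)
    # Warm-up run so one-time initialization does not skew the measurement.
    F.scaled_dot_product_attention(q, k, v)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        out = F.scaled_dot_product_attention(q, k, v)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters, out.shape

elapsed, shape = time_sdpa(1, 8, 512, 64)
print(f"avg {elapsed * 1e3:.2f} ms per call, output shape {tuple(shape)}")
```

Running the same measurement with `device="cuda"` on the H20, and comparing against an alternative attention implementation, would show whether the backend choice alone can account for a 13× slowdown, or whether the bottleneck is elsewhere in the calibration pipeline.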
AWQ YAML config:

```yaml
base:
    seed: &seed 42
model:
    type: Qwen2_5VL
    path: xxx
    tokenizer_mode: slow
    torch_dtype: auto
calib:
    name: custom_mm
    download: False
    path: xxx
    apply_chat_template: True
    n_samples: 960
    bs: 16
    seq_len: 512
    padding: True
    seed: *seed
quant:
    method: Awq
    weight:
        bit: 4
        symmetric: False
        granularity: per_group
        group_size: 64
        # Available options: ['gemm_pack']
        pack_version: gemm_pack
    special:
        trans: True
        trans_version: v2
        weight_clip: True
        do_gqa_trans: True
    quant_out: False
save:
    save_mlcllm: True
    save_fake: True
    save_path: xxx
```
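For scale, the reported figures imply roughly the following per-batch cost (a back-of-the-envelope check using `n_samples` and `bs` from the config; AWQ also iterates layer by layer, so this is only an aggregate bound, not a per-forward-pass time):

```python
# Rough sanity check of the reported figures (values taken from the config/question above).
n_samples = 960      # calib.n_samples
batch_size = 16      # calib.bs
total_hours = 13     # reported lower bound on total calibration time

n_batches = n_samples // batch_size               # 960 / 16 = 60 forward batches
minutes_per_batch = total_hours * 60 / n_batches  # average minutes per batch
print(f"{n_batches} batches, {minutes_per_batch:.1f} min/batch")
```

Thirteen minutes per 16-sample batch of ~200-token inputs is far beyond plain forward-pass cost on an H20, which suggests the time is dominated by the per-layer search/clipping loop or by data handling rather than attention alone.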