For a ~1k-sample calibration set with batch size 16, AWQ calibration takes more than 13 hours, while llm-compressor's AWQ calibration finishes in under 1 hour. What could explain the difference?
One possible reason I can think of is that the Qwen2.5-VL attention backend is slower when using SDPA. Are there other contributing factors?
Hardware platform: 1× H20
Input token count (text + image): ≈ 200
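The SDPA hypothesis can be checked in isolation before blaming the rest of the calibration loop. A minimal sketch (shapes are illustrative, not taken from Qwen2.5-VL; it only assumes PyTorch is installed) that times the raw SDPA kernel:

```python
import time
import torch
import torch.nn.functional as F

def time_sdpa(batch, heads, seq, dim, device="cpu", iters=10):
    """Rough wall-clock timing of PyTorch's scaled_dot_product_attention."""
    q = torch.randn(batch, heads, seq, dim, device=device)
    k = torch.randn_like(q)
    v = torch.randn_like(q)
    # Warm-up run so one-time initialization does not skew the measurement.
    F.scaled_dot_product_attention(q, k, v)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        out = F.scaled_dot_product_attention(q, k, v)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters, out.shape

elapsed, shape = time_sdpa(1, 8, 512, 64)
print(f"avg {elapsed * 1e3:.2f} ms per call, output shape {tuple(shape)}")
```

Running the same measurement with `device="cuda"` on the H20, and comparing against an alternative attention implementation, would show whether the backend choice alone can account for a 13× slowdown, or whether the bottleneck is elsewhere in the calibration pipeline.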
AWQ YAML config:

```yaml
base:
    seed: &seed 42
model:
    type: Qwen2_5VL
    path: xxx
    tokenizer_mode: slow
    torch_dtype: auto
calib:
    name: custom_mm
    download: False
    path: xxx
    apply_chat_template: True
    n_samples: 960
    bs: 16
    seq_len: 512
    padding: True
    seed: *seed
quant:
    method: Awq
    weight:
        bit: 4
        symmetric: False
        granularity: per_group
        group_size: 64
        # Available options: ['gemm_pack']
        pack_version: gemm_pack
    special:
        trans: True
        trans_version: v2
        weight_clip: True
        do_gqa_trans: True
    quant_out: False
save:
    save_mlcllm: True
    save_fake: True
    save_path: xxx
```
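For scale, the reported figures imply roughly the following per-batch cost (a back-of-the-envelope check using `n_samples` and `bs` from the config; AWQ also iterates layer by layer, so this is only an aggregate bound, not a per-forward-pass time):

```python
# Rough sanity check of the reported figures (values taken from the config/question above).
n_samples = 960      # calib.n_samples
batch_size = 16      # calib.bs
total_hours = 13     # reported lower bound on total calibration time

n_batches = n_samples // batch_size               # 960 / 16 = 60 forward batches
minutes_per_batch = total_hours * 60 / n_batches  # average minutes per batch
print(f"{n_batches} batches, {minutes_per_batch:.1f} min/batch")
```

Thirteen minutes per 16-sample batch of ~200-token inputs is far beyond plain forward-pass cost on an H20, which suggests the time is dominated by the per-layer search/clipping loop or by data handling rather than attention alone.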