Qwen2.5VL 3B calibration using awq takes too long #433

@kritohyh

With a calibration set of about 1k samples and a batch size of 16, AWQ calibration takes more than 13 hours, whereas AWQ calibration with llm-compressor takes less than 1 hour. What is the reason for the difference?
One possible reason, I think, is that the Qwen2.5VL attention backend uses SDPA, which is slower. Are there other factors?

Hardware platform: H20 × 1
Number of input tokens (text + image): ≈ 200

AWQ YAML config:

base:
    seed: &seed 42
model:
    type: Qwen2_5VL
    path: xxx
    tokenizer_mode: slow
    torch_dtype: auto
calib:
    name: custom_mm
    download: False
    path: xxx
    apply_chat_template: True
    n_samples: 960
    bs: 16
    seq_len: 512
    padding: True
    seed: *seed

quant:
    method: Awq
    weight:
        bit: 4
        symmetric: False
        granularity: per_group
        group_size: 64
        # Available options: ['gemm_pack']
        pack_version: gemm_pack
    special:
        trans: True
        trans_version: v2
        weight_clip: True
        do_gqa_trans: True
    quant_out: False
save:
    save_mlcllm: True
    save_fake: True
    save_path: xxx
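
To put the numbers above side by side, here is a rough back-of-the-envelope sketch in Python. It only uses values quoted in this issue (n_samples, bs, seq_len, the ≈ 200 input tokens, and the two reported durations). It assumes that the ≈ 200 tokens are per sample and that `padding: True` pads every sample up to `seq_len`; both are assumptions for illustration, not confirmed behavior of the tool.

```python
# Rough estimate only; all inputs are taken from the issue text above.
n_samples = 960            # calib.n_samples
batch_size = 16            # calib.bs
seq_len = 512              # calib.seq_len
approx_real_tokens = 200   # reported input length, ASSUMED to be per sample
slow_hours = 13            # reported lower bound for this run (>13h)
fast_hours = 1             # reported upper bound for llm-compressor (<1h)

# Number of calibration batches fed through the model.
n_batches = n_samples // batch_size

# ASSUMPTION: padding=True pads every sample up to seq_len.
padded_tokens = n_samples * seq_len
real_tokens = n_samples * approx_real_tokens
padding_fraction = 1 - real_tokens / padded_tokens
padding_overhead = seq_len / approx_real_tokens

# The reported gap is at least this large, since 13h and 1h are bounds.
min_slowdown = slow_hours / fast_hours

print(f"batches: {n_batches}")
print(f"padded tokens: {padded_tokens}, real tokens: {real_tokens}")
print(f"fraction of padded tokens that is padding: {padding_fraction:.1%}")
print(f"padding overhead factor: {padding_overhead:.2f}x")
print(f"reported slowdown: at least {min_slowdown:.0f}x")
```

Under these assumptions, padding would account for roughly a 2.56x increase in processed tokens, which is well short of the reported gap of more than 13x, so padding alone would not explain the difference.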
