Description
System Info
peft==0.9.0, 0.13.0, 0.17.0 (all three versions were tried)
LLM: vicuna-7b-v1.5 (https://huggingface.co/lmsys/vicuna-7b-v1.5)
Who can help?
No response
Reproduction
When I use PeftModel.from_pretrained() to load my trained LoRA weights, model inference does not produce the expected results. However, loading the same weights with the method shown further below does produce the expected results.
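For reference, the load that fails is essentially the standard path below (reconstructed here for clarity; the adapter directory is the same placeholder used in the working snippet that follows):

from transformers import AutoModelForCausalLM
from peft import PeftModel

# Base model from the System Info section above.
base_model = AutoModelForCausalLM.from_pretrained("lmsys/vicuna-7b-v1.5")
# Directory containing adapter_config.json and adapter_model.safetensors.
model = PeftModel.from_pretrained(base_model, "xxxxx")

The working method, by contrast, builds the PEFT model manually and loads the adapter weights directly: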
# "q_proj,v_proj"
lora_modules = "q_proj,v_proj".split(",")
lora_config = LoraConfig(
r=128,
lora_alpha=256,
target_modules=lora_modules,
lora_dropout=0.3, # 0.3
bias="none",
task_type="CAUSAL_LM",
)
# model = get_peft_model(model, lora_config, adapter_name="default")
model = get_peft_model(model, lora_config)
from collections import OrderedDict
from safetensors import safe_open
lora_weight_path = "xxxxx/adapter_model.safetensors"
lora_state_dict = OrderedDict()
with safe_open(lora_weight_path, framework="pt", device="cpu") as f:
for key in f.keys():
value = f.get_tensor(key)
lora_state_dict[key] = value
ret = model.load_state_dict(lora_state_dict, strict=False)
print("Missing keys: \n", ret.missing_keys)
print("Unexpected keys: \n", ret.unexpected_keys)
During testing I also found that even after loading the LoRA parameters correctly with the method above, merging them into the base model with merge_and_unload() and then running inference with the merged model still did not produce the expected results. I think this is a very serious bug.
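For completeness, the merge step was along these lines (a sketch; the inference code itself is omitted):

# Merge the LoRA weights into the base weights and drop the adapter layers.
merged_model = model.merge_and_unload()
# Inference is then run on merged_model (generation code omitted).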
Note that if the LLM is replaced with Qwen3-8B (https://huggingface.co/Qwen/Qwen3-8B), the bug described above does not occur.
Expected behavior
PeftModel.from_pretrained() should load the LoRA adapter so that inference (and a subsequent merge_and_unload()) produces the same results as manually loading the adapter weights with load_state_dict as shown above.