
Conversation


@HollowMan6 HollowMan6 commented Dec 7, 2025

What does this PR do?

Fix the following error:

```log
  File "megatron/core/transformer/transformer_layer.py", line 455, in forward
    hidden_states, context = self._forward_attention(*args, **kwargs)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "megatron/core/transformer/transformer_layer.py", line 529, in _forward_attention
    attention_output_with_bias = self.self_attention(
                                 ^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "megatron/core/transformer/attention.py", line 768, in forward
    qkv_output = self.get_query_key_value_tensors(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "megatron/core/transformer/attention.py", line 1151, in get_query_key_value_tensors
    mixed_qkv, _ = self.linear_qkv(hidden_states)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Megatron-Bridge/src/megatron/bridge/peft/canonical_lora.py", line 88, in forward
    return linear_output + adapter_output, bias
           ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~
RuntimeError: The size of tensor a (2560) must match the size of tensor b (1536) at non-singleton dimension 2
```
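
Reading the traceback together with the changelog below: the LoRA adapter output (tensor b, 1536 features) was sized from the layer's `in_features`, while the query slice of the fused `linear_qkv` output (tensor a, 2560 features) is `kv_channels * num_attention_heads` wide, so the elementwise add in `canonical_lora.py` fails whenever those two values differ. A minimal sketch reproducing just the shape failure (the widths are the ones from the traceback; the variable names and the leading `[seq, batch]` dims are illustrative):

```python
import torch

# Illustrative shapes: 2560 ~ width of the query slice of the fused QKV output,
# 1536 ~ width the adapter produced when sized from in_features.
linear_output = torch.zeros(8, 1, 2560)
adapter_output = torch.zeros(8, 1, 1536)

try:
    _ = linear_output + adapter_output
except RuntimeError as e:
    print(e)  # The size of tensor a (2560) must match the size of tensor b (1536) ...
```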

Changelog

  • Compute q_out_features as m.config.kv_channels * m.config.num_attention_heads instead of reusing in_features directly (see the sketch below)
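
A minimal sketch of the sizing rule the changelog describes, assuming the usual Megatron `TransformerConfig` attribute names (`kv_channels`, `num_attention_heads`, `num_query_groups`); the function name is illustrative, not the actual Megatron-Bridge code:

```python
def qkv_adapter_out_features(config):
    """Output widths for the per-Q/K/V LoRA adapters wrapping a fused linear_qkv.

    Sizing the query adapter from in_features (the hidden size) only works when
    kv_channels * num_attention_heads == hidden_size; models that set the head
    dimension explicitly break that assumption, producing the mismatch above.
    """
    # Query slice: one kv_channels-wide projection per attention head.
    q_out_features = config.kv_channels * config.num_attention_heads
    # Key/value slices: one projection per query group (covers GQA/MQA).
    kv_out_features = config.kv_channels * config.num_query_groups
    return q_out_features, kv_out_features
```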

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you have read and followed the Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)


Signed-off-by: Hollow Man <[email protected]>

copy-pr-bot bot commented Dec 7, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


yaoyu-33 commented Dec 7, 2025

/ok to test 290e577

@yaoyu-33 yaoyu-33 enabled auto-merge (squash) December 7, 2025 19:15
@yaoyu-33 yaoyu-33 merged commit 322df45 into NVIDIA-NeMo:main Dec 8, 2025
47 checks passed
matthew-frank pushed a commit to matthew-frank/Megatron-Bridge-251204 that referenced this pull request Dec 8, 2025
@HollowMan6 HollowMan6 deleted the canonical_lora_q branch December 8, 2025 21:32