[Bug] Pad behaves differently from version 19 to 21 #26708

@ChristopherBrix

Describe the issue

When I save a simple model with ONNX opset version 20 versus 21, the only thing that changes is that the contained Pad operator switches from its opset 19 definition to its opset 21 definition (Pad was not updated in opset 20).
However, the output of the model changes completely, which looks like a bug.

The model looks like this:

[Image: model graph, a Pad node followed by an AveragePool node]

To reproduce

This script reproduces the bug:

import numpy as np
import onnx
import onnxruntime as ort
from onnx import helper, TensorProto

SHAPE = [1, 1, 2, 2]
# ---- Input Tensor ----
input_tensor = helper.make_tensor_value_info(
    'input', TensorProto.FLOAT, SHAPE
)

# ---- Initializers ----
pads = [0, 0, 0, 0, 0, 0, 1, 1]

pads_initializer = helper.make_tensor(
    name='pads_const',
    data_type=TensorProto.INT64,
    dims=[8],
    vals=pads
)

constant_value_initializer = helper.make_tensor(
    name='pad_value',
    data_type=TensorProto.FLOAT,
    dims=[],
    vals=[0.0]
)

# ---- Nodes ----
pad_node = helper.make_node(
    'Pad',
    inputs=['input', 'pads_const', 'pad_value'],
    outputs=['padded'],
    mode='constant'
)

avgpool_node = helper.make_node(
    'AveragePool',
    inputs=['padded'],
    outputs=['output'],
    kernel_shape=[2, 2]
)

# ---- Output Tensor ----
output_tensor = helper.make_tensor_value_info(
    'output', TensorProto.FLOAT, SHAPE
)

# ---- Graph ----
graph = helper.make_graph(
    [pad_node, avgpool_node],
    'PadAvgPoolModel',
    [input_tensor],
    [output_tensor],
    initializer=[pads_initializer, constant_value_initializer]
)

# ---- Model ----
def save_model(op_version: int):
    model = helper.make_model(
        graph,
        producer_name='custom-onnx-builder',
        opset_imports=[helper.make_opsetid("", op_version)]
    )

    onnx.checker.check_model(model)
    onnx.save(model, f"model_op{op_version}.onnx")

save_model(20)
save_model(21)

dummy_input = np.ones(SHAPE, dtype=np.float32)
feed = {"input": dummy_input}

model20 = onnx.load("model_op20.onnx")
model21 = onnx.load("model_op21.onnx")

def run(model_proto, feed):
    sess = ort.InferenceSession(model_proto.SerializeToString())
    all_outs = [o.name for o in sess.get_outputs()]
    outs = sess.run(all_outs, feed)
    return dict(zip(all_outs, outs))

out20 = run(model20, feed)
out21 = run(model21, feed)

print("Output of model with version 20")
print(out20["output"])
print("Output of model with version 21")
print(out21["output"])

np.testing.assert_allclose(
    actual=out21["output"],
    desired=out20["output"],
    rtol=0.0,
    atol=1e-5,
)

The model saved with opset version 20 (which uses the opset 19 definition of Pad) produces a different output than the model saved with opset version 21.
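To judge which of the two outputs is the expected one, the computation can be redone by hand in NumPy. This is only a sketch: it assumes the AveragePool defaults from the ONNX specification (stride 1, no extra pooling pads), and `pad_then_avgpool` is a hypothetical helper name, not part of ONNX or ONNX Runtime.

```python
import numpy as np


def pad_then_avgpool(x: np.ndarray) -> np.ndarray:
    """Zero-pad H and W by one element at the end, then 2x2 average pool (stride 1)."""
    # pads = [0, 0, 0, 0, 0, 0, 1, 1] lists the begin pads for N, C, H, W,
    # followed by the end pads for N, C, H, W.
    padded = np.pad(
        x,
        ((0, 0), (0, 0), (0, 1), (0, 1)),
        mode="constant",
        constant_values=0.0,
    )
    n, c, h, w = padded.shape
    out = np.empty((n, c, h - 1, w - 1), dtype=x.dtype)
    # Slide a 2x2 window over the padded tensor; the padded zeros are
    # ordinary values here, so they count towards the mean.
    for i in range(h - 1):
        for j in range(w - 1):
            out[:, :, i, j] = padded[:, :, i:i + 2, j:j + 2].mean(axis=(2, 3))
    return out


expected = pad_then_avgpool(np.ones([1, 1, 2, 2], dtype=np.float32))
print(expected)
```

Under these assumptions the result for the all-ones input is `[[1.0, 0.5], [0.5, 0.25]]`, which can be compared against both model outputs.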

Note that I was unable to reproduce this after replacing the AveragePool node with a different operator. However, I suspect this is because the model is then simplified even further, potentially eliminating the critical code path that triggers this bug.

Urgency

This may be critical for people using Pad. At least understanding why this difference appears would be crucial for us.

Platform

Linux

OS Version

Ubuntu 24.04.3 LTS

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.23.2

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Other / Unknown

Execution Provider Library Version

No response
