Description
Describe the issue
When I save a simple model using ONNX opset version 20 vs. 21, the only thing that changes is that the contained Pad operation switches from opset version 19 to 21 (Pad was not updated in version 20).
However, the output of the model changes completely, which seems to be a bug.
The model is a single Pad node feeding an AveragePool node.
To reproduce
This script reproduces the bug:
import numpy as np
import onnx
import onnxruntime as ort
from onnx import helper, TensorProto
SHAPE = [1, 1, 2, 2]
# ---- Input Tensor ----
input_tensor = helper.make_tensor_value_info(
    'input', TensorProto.FLOAT, SHAPE
)
# ---- Initializers ----
pads = [0, 0, 0, 0, 0, 0, 1, 1]
pads_initializer = helper.make_tensor(
    name='pads_const',
    data_type=TensorProto.INT64,
    dims=[8],
    vals=pads
)
constant_value_initializer = helper.make_tensor(
    name='pad_value',
    data_type=TensorProto.FLOAT,
    dims=[],
    vals=[0.0]
)
# ---- Nodes ----
pad_node = helper.make_node(
    'Pad',
    inputs=['input', 'pads_const', 'pad_value'],
    outputs=['padded'],
    mode='constant'
)
avgpool_node = helper.make_node(
    'AveragePool',
    inputs=['padded'],
    outputs=['output'],
    kernel_shape=[2, 2]
)
# ---- Output Tensor ----
output_tensor = helper.make_tensor_value_info(
    'output', TensorProto.FLOAT, SHAPE
)
# ---- Graph ----
graph = helper.make_graph(
    [pad_node, avgpool_node],
    'PadAvgPoolModel',
    [input_tensor],
    [output_tensor],
    initializer=[pads_initializer, constant_value_initializer]
)
# ---- Model ----
def save_model(op_version: int):
    model = helper.make_model(
        graph,
        producer_name='custom-onnx-builder',
        opset_imports=[helper.make_opsetid("", op_version)]
    )
    onnx.checker.check_model(model)
    onnx.save(model, f"model_op{op_version}.onnx")
save_model(20)
save_model(21)
dummy_input = np.ones(SHAPE, dtype=np.float32)
feed = {"input": dummy_input}
model20 = onnx.load("model_op20.onnx")
model21 = onnx.load("model_op21.onnx")
def run(model_proto, feed):
    sess = ort.InferenceSession(model_proto.SerializeToString())
    all_outs = [o.name for o in sess.get_outputs()]
    outs = sess.run(all_outs, feed)
    return dict(zip(all_outs, outs))
out20 = run(model20, feed)
out21 = run(model21, feed)
print("Output of model with version 20")
print(out20["output"])
print("Output of model with version 21")
print(out21["output"])
np.testing.assert_allclose(
    actual=out21["output"],
    desired=out20["output"],
    rtol=0.0,
    atol=1e-5,
)
The model saved with opset version 20 (which uses the Pad operation from version 19) produces a different output from the model saved with opset version 21, so the final assertion fails.
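To judge which of the two outputs is the expected one, a NumPy-only reference can help. The following is a minimal sketch, assuming the usual ONNX semantics: Pad in `constant` mode materializes the zeros in the tensor, and AveragePool with no pads of its own, stride 1, and a 2x2 kernel then averages over them like any other values. The helper name `pad_then_avgpool_reference` is made up for illustration and is not part of onnx or onnxruntime.

```python
import numpy as np


def pad_then_avgpool_reference(x, pads, kernel=(2, 2), pad_value=0.0):
    """Reference for constant Pad followed by a stride-1 AveragePool (NCHW input)."""
    rank = x.ndim
    # ONNX pads layout: [begin_0, ..., begin_{n-1}, end_0, ..., end_{n-1}]
    pad_width = [(pads[i], pads[i + rank]) for i in range(rank)]
    padded = np.pad(x, pad_width, mode="constant", constant_values=pad_value)

    kh, kw = kernel
    n, c, h, w = padded.shape
    out = np.empty((n, c, h - kh + 1, w - kw + 1), dtype=x.dtype)
    # Slide the window over the already padded tensor; padded zeros count
    # toward the mean because they are ordinary tensor values at this point.
    for i in range(out.shape[2]):
        for j in range(out.shape[3]):
            out[:, :, i, j] = padded[:, :, i:i + kh, j:j + kw].mean(axis=(2, 3))
    return out


expected = pad_then_avgpool_reference(
    np.ones((1, 1, 2, 2), dtype=np.float32),
    pads=[0, 0, 0, 0, 0, 0, 1, 1],
)
print(expected)
# [[[[1.   0.5 ]
#    [0.5  0.25]]]]
```

Comparing `out20["output"]` and `out21["output"]` against `expected` with `np.testing.assert_allclose` would show which opset version deviates from this reference.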
Note that I was unable to reproduce this when I replaced the AveragePool node with a different operation. However, I suspect this is because the model is then simplified even further, potentially eliminating the critical code path that triggers this bug.
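One way to check the suspicion that graph simplification is involved is to run the same model with graph optimizations disabled and with them fully enabled, then compare. This is only a sketch: the helper names `run_with_optimization_level` and `compare_levels` are hypothetical, and pinning `CPUExecutionProvider` is an assumption, since the execution provider is not known in this report.

```python
def run_with_optimization_level(model_bytes, feed, level_name):
    """Run a serialized ONNX model under the named graph optimization level."""
    # Imported here so the helper can be defined even where onnxruntime is absent.
    import onnxruntime as ort

    options = ort.SessionOptions()
    options.graph_optimization_level = getattr(ort.GraphOptimizationLevel, level_name)
    sess = ort.InferenceSession(
        model_bytes,
        sess_options=options,
        providers=["CPUExecutionProvider"],  # assumption: CPU execution provider
    )
    names = [o.name for o in sess.get_outputs()]
    return dict(zip(names, sess.run(names, feed)))


# Names of members of ort.GraphOptimizationLevel to compare.
LEVELS = ("ORT_DISABLE_ALL", "ORT_ENABLE_ALL")


def compare_levels(model_path, feed):
    """Return the outputs of one model file for each optimization level."""
    with open(model_path, "rb") as f:
        model_bytes = f.read()
    return {
        level: run_with_optimization_level(model_bytes, feed, level)
        for level in LEVELS
    }
```

For example, `compare_levels("model_op20.onnx", feed)` and `compare_levels("model_op21.onnx", feed)` could be called after the script above has saved both files. If the outputs of one model differ between the two levels, that would point toward an optimizer pass rather than the Pad kernel itself.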
Urgency
This may be critical for people using Pad. At least understanding why this difference appears would be crucial for us.
Platform
Linux
OS Version
Ubuntu 24.04.3 LTS
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.23.2
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Other / Unknown
Execution Provider Library Version
No response