Conversation

@pdabre12
Contributor

@pdabre12 pdabre12 commented Feb 20, 2025

Description

Add an ExpressionOptimizer that delegates to the native sidecar process to evaluate expressions with Velox.

Motivation and Context

#26475 added support for an endpoint in the sidecar for constant folding expressions. This follows up on that by adding an expression interpreter to call that endpoint.

For more context: RFC-0006.

Impact

No impact by default: the existing in-memory evaluation remains the default.

Test Plan

Tests have been added.

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

Prestissimo (Native Execution) Changes
* Add a native expression optimizer for optimizing expressions in the sidecar.

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Feb 20, 2025
@pdabre12 pdabre12 force-pushed the expression-optimizer branch 3 times, most recently from baa80c5 to 5bf64bf Compare March 3, 2025 21:27
@pdabre12 pdabre12 force-pushed the expression-optimizer branch from 5bf64bf to 02e81b9 Compare March 3, 2025 22:56
@pdabre12 pdabre12 force-pushed the expression-optimizer branch 2 times, most recently from 5c75b74 to fc9942d Compare April 9, 2025 19:02
@pdabre12 pdabre12 force-pushed the expression-optimizer branch 5 times, most recently from ccbf3ac to e7e90f6 Compare April 21, 2025 22:59
@pdabre12 pdabre12 force-pushed the expression-optimizer branch 2 times, most recently from dd590aa to 241f562 Compare December 4, 2025 21:21
@pdabre12 pdabre12 changed the title from "[WIP] Add native row expression optimizer" to "feat(plugin-native-sidecar): Add native row expression optimizer" Dec 4, 2025
@pdabre12 pdabre12 force-pushed the expression-optimizer branch from 241f562 to 9b1e4f6 Compare December 8, 2025 19:26
@pdabre12 pdabre12 force-pushed the expression-optimizer branch from 9b1e4f6 to 47fe096 Compare December 8, 2025 19:37
@pdabre12 pdabre12 marked this pull request as ready for review December 8, 2025 19:48
@prestodb-ci prestodb-ci requested review from a team and removed request for a team December 8, 2025 19:48
@sourcery-ai
Contributor

sourcery-ai bot commented Dec 8, 2025

Reviewer's Guide

Introduces a native expression optimizer that delegates row expression optimization to the native sidecar over HTTP, wires it into the ExpressionOptimizerManager and plugin infrastructure, and refactors tests and JSON/serde plumbing to use a reusable RowExpressionSerde and the NativeSidecarExpressionInterpreter instead of manual HTTP/JSON handling.

Class diagram for native expression optimizer and serde

classDiagram
    class ExpressionOptimizerManager {
        <<service>>
        - NodeManager nodeManager
        - FunctionAndTypeManager functionAndTypeManager
        - RowExpressionSerde rowExpressionSerde
        - StandardFunctionResolution functionResolution
        - File configurationDirectory
        - Map~String,ExpressionOptimizerFactory~ expressionOptimizerFactories
        - Map~String,ExpressionOptimizer~ expressionOptimizers
        + ExpressionOptimizerManager(PluginNodeManager nodeManager, FunctionAndTypeManager functionAndTypeManager, RowExpressionSerde rowExpressionSerde)
        + loadExpressionOptimizerFactories()
        + loadExpressionOptimizerFactory(File configurationFile)
        + loadExpressionOptimizerFactory(String factoryName, String optimizerName, Map~String,String~ properties)
        + addExpressionOptimizerFactory(ExpressionOptimizerFactory factory)
    }

    class ExpressionOptimizerContext {
        - NodeManager nodeManager
        - RowExpressionSerde rowExpressionSerde
        - FunctionMetadataManager functionMetadataManager
        - StandardFunctionResolution functionResolution
        + ExpressionOptimizerContext(NodeManager nodeManager, RowExpressionSerde rowExpressionSerde, FunctionMetadataManager functionMetadataManager, StandardFunctionResolution functionResolution)
        + getNodeManager() NodeManager
        + getRowExpressionSerde() RowExpressionSerde
        + getFunctionMetadataManager() FunctionMetadataManager
        + getFunctionResolution() StandardFunctionResolution
    }

    class RowExpressionSerde {
        <<interface>>
        + serialize(RowExpression expression) String
        + deserialize(String value) RowExpression
    }

    class JsonCodecRowExpressionSerde {
        - JsonCodec~RowExpression~ codec
        + JsonCodecRowExpressionSerde(JsonCodec~RowExpression~ codec)
        + serialize(RowExpression expression) String
        + deserialize(String data) RowExpression
    }

    class NativeExpressionOptimizerFactory {
        <<factory>>
        - ClassLoader classLoader
        + NativeExpressionOptimizerFactory(ClassLoader classLoader)
        + getName() String
        + createOptimizer(Map~String,String~ config, ExpressionOptimizerContext context) ExpressionOptimizer
    }

    class NativeExpressionsModule {
        <<module>>
        - NodeManager nodeManager
        - RowExpressionSerde rowExpressionSerde
        - FunctionMetadataManager functionMetadataManager
        - StandardFunctionResolution functionResolution
        + NativeExpressionsModule(NodeManager nodeManager, RowExpressionSerde rowExpressionSerde, FunctionMetadataManager functionMetadataManager, StandardFunctionResolution functionResolution)
        + configure(Binder binder)
    }

    class NativeExpressionOptimizer {
        <<service>>
        - FunctionMetadataManager functionMetadataManager
        - StandardFunctionResolution resolution
        - NativeSidecarExpressionInterpreter rowExpressionInterpreterService
        + NativeExpressionOptimizer(NativeSidecarExpressionInterpreter rowExpressionInterpreterService, FunctionMetadataManager functionMetadataManager, StandardFunctionResolution resolution)
        + optimize(RowExpression expression, ExpressionOptimizer.Level level, ConnectorSession session, Function~VariableReferenceExpression,Object~ variableResolver) RowExpression
    }

    class NativeSidecarExpressionInterpreter {
        <<service>>
        + PRESTO_TIME_ZONE_HEADER
        + PRESTO_USER_HEADER
        + PRESTO_EXPRESSION_OPTIMIZER_LEVEL_HEADER
        - NodeManager nodeManager
        - HttpClient httpClient
        - JsonCodec~List~RowExpression~~ rowExpressionCodec
        - JsonCodec~List~RowExpressionOptimizationResult~~ rowExpressionOptimizationResultJsonCodec
        + NativeSidecarExpressionInterpreter(HttpClient httpClient, NodeManager nodeManager, JsonCodec~List~RowExpressionOptimizationResult~~ rowExpressionOptimizationResultJsonCodec, JsonCodec~List~RowExpression~~ rowExpressionCodec)
        + optimizeBatch(ConnectorSession session, Map~RowExpression,RowExpression~ expressions, ExpressionOptimizer.Level level) Map~RowExpression,RowExpression~
        + optimize(ConnectorSession session, ExpressionOptimizer.Level level, List~RowExpression~ resolvedExpressions) List~RowExpressionOptimizationResult~
    }

    class RowExpressionSerializer {
        <<json-serializer>>
        - RowExpressionSerde rowExpressionSerde
        + RowExpressionSerializer(RowExpressionSerde rowExpressionSerde)
        + serialize(RowExpression rowExpression, JsonGenerator jsonGenerator, SerializerProvider serializerProvider)
        + serializeWithType(RowExpression rowExpression, JsonGenerator jsonGenerator, SerializerProvider serializerProvider, TypeSerializer typeSerializer)
    }

    class RowExpressionDeserializer {
        <<json-deserializer>>
        - RowExpressionSerde rowExpressionSerde
        + RowExpressionDeserializer(RowExpressionSerde rowExpressionSerde)
        + deserialize(JsonParser jsonParser, DeserializationContext context) RowExpression
        + deserializeWithType(JsonParser jsonParser, DeserializationContext context, TypeDeserializer typeDeserializer) RowExpression
    }

    class NativeSidecarPlugin {
        + getExpressionOptimizerFactories() Iterable~ExpressionOptimizerFactory~
    }

    ExpressionOptimizerManager --> ExpressionOptimizerContext : uses to
    ExpressionOptimizerManager --> RowExpressionSerde : depends on
    ExpressionOptimizerContext --> RowExpressionSerde : holds

    RowExpressionSerde <|.. JsonCodecRowExpressionSerde

    NativeSidecarPlugin --> NativeExpressionOptimizerFactory : registers

    NativeExpressionOptimizerFactory --> ExpressionOptimizerContext : uses
    NativeExpressionOptimizerFactory --> NativeExpressionsModule : creates
    NativeExpressionsModule --> NativeExpressionOptimizer : binds
    NativeExpressionsModule --> NativeSidecarExpressionInterpreter : binds
    NativeExpressionsModule --> RowExpressionSerializer : binds
    NativeExpressionsModule --> RowExpressionDeserializer : binds
    NativeExpressionsModule --> RowExpressionSerde : uses instance

    NativeExpressionOptimizer --> NativeSidecarExpressionInterpreter : uses

    RowExpressionSerializer --> RowExpressionSerde : uses
    RowExpressionDeserializer --> RowExpressionSerde : uses

    ExpressionOptimizerManager ..> NativeExpressionOptimizerFactory : via SPI registration
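
To make the `RowExpressionSerde` contract in the diagram concrete, the following is a minimal sketch of the round-trip property it implies: deserializing a serialized expression should yield an equal expression. `ToyExpression`, `ToySerde`, `ColonToySerde`, and the `type:value` wire format are hypothetical stand-ins invented for illustration. The real interface operates on `RowExpression`, and the implementation in this PR delegates to a `JsonCodec`.

```java
import java.util.Objects;

// Hypothetical stand-in for RowExpression: a typed constant.
final class ToyExpression {
    final String type;
    final String value;

    ToyExpression(String type, String value) {
        this.type = Objects.requireNonNull(type, "type is null");
        this.value = Objects.requireNonNull(value, "value is null");
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof ToyExpression)) {
            return false;
        }
        ToyExpression that = (ToyExpression) other;
        return type.equals(that.type) && value.equals(that.value);
    }

    @Override
    public int hashCode() {
        return Objects.hash(type, value);
    }
}

// Mirrors the shape of RowExpressionSerde from the diagram, over the toy type.
interface ToySerde {
    String serialize(ToyExpression expression);

    ToyExpression deserialize(String value);
}

// Hypothetical wire format "type:value"; assumes type names never contain ':'.
final class ColonToySerde implements ToySerde {
    @Override
    public String serialize(ToyExpression expression) {
        return expression.type + ":" + expression.value;
    }

    @Override
    public ToyExpression deserialize(String value) {
        int separator = value.indexOf(':');
        if (separator < 0) {
            throw new IllegalArgumentException("Missing type separator: " + value);
        }
        return new ToyExpression(value.substring(0, separator), value.substring(separator + 1));
    }
}
```

Keeping the serde behind a two-method interface is what lets the plugin receive it through `ExpressionOptimizerContext` without depending on the server's JSON wiring.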

File-Level Changes

Change Details Files
Add a native expression optimizer pipeline backed by the sidecar and integrate it via a new ExpressionOptimizerFactory and RowExpressionSerde abstraction.
  • Introduce NativeExpressionOptimizer that collects optimizable RowExpressions, resolves variables, batches them, and uses NativeSidecarExpressionInterpreter to optimize them at different levels.
  • Create NativeSidecarExpressionInterpreter to call the sidecar /v1/expressions endpoint using Airlift HttpClient and JSON codecs, including batch optimization and error handling.
  • Add NativeExpressionOptimizerFactory and NativeExpressionsModule to bootstrap the optimizer inside the plugin using ExpressionOptimizerContext, NodeManager, RowExpressionSerde, and function metadata/resolution.
  • Add RowExpressionSerde SPI and JsonCodecRowExpressionSerde implementation so RowExpressions can be serialized generically and injected where needed (server, tests, sidecar).
  • Extend ExpressionOptimizerContext and ExpressionOptimizerManager to accept and propagate RowExpressionSerde and to expose factory-loading APIs used by tests and NativeSidecarPluginQueryRunnerUtils.
Files:
presto-native-sidecar-plugin/src/main/java/com/facebook/presto/sidecar/expressions/NativeExpressionOptimizer.java
presto-native-sidecar-plugin/src/main/java/com/facebook/presto/sidecar/expressions/NativeSidecarExpressionInterpreter.java
presto-native-sidecar-plugin/src/main/java/com/facebook/presto/sidecar/expressions/NativeExpressionOptimizerFactory.java
presto-native-sidecar-plugin/src/main/java/com/facebook/presto/sidecar/expressions/NativeExpressionsModule.java
presto-native-sidecar-plugin/src/main/java/com/facebook/presto/sidecar/expressions/RowExpressionSerializer.java
presto-native-sidecar-plugin/src/main/java/com/facebook/presto/sidecar/expressions/RowExpressionDeserializer.java
presto-spi/src/main/java/com/facebook/presto/spi/RowExpressionSerde.java
presto-main-base/src/main/java/com/facebook/presto/sql/expressions/JsonCodecRowExpressionSerde.java
presto-spi/src/main/java/com/facebook/presto/spi/sql/planner/ExpressionOptimizerContext.java
presto-main-base/src/main/java/com/facebook/presto/sql/expressions/ExpressionOptimizerManager.java
Wire the native expression optimizer into server, plugin, and test infrastructure, including NativeSidecarPlugin and various QueryRunner-based tests.
  • Expose FunctionAndTypeManager.handleResolver and TestingPrestoServer.getFunctionAndTypeManager to allow building metadata and HandleJsonModule with a specific HandleResolver.
  • Update HandleJsonModule to optionally accept an injected HandleResolver instance instead of always binding a singleton.
  • Bind RowExpressionSerde as JsonCodecRowExpressionSerde in ServerMainModule and PrestoSparkModule and register RowExpression JSON codecs where required.
  • Extend NativeSidecarPlugin to provide ExpressionOptimizerFactory instances (NativeExpressionOptimizerFactory) and configure the native optimizer in NativeSidecarPluginQueryRunnerUtils.
  • Introduce MetadataManager.createTestMetadataManager(FunctionAndTypeManager) helper for tests that already have a FunctionAndTypeManager.
  • Add missing or adjusted dependencies in presto-native-execution, presto-native-sidecar-plugin, and presto-tests poms needed for JSON, JAX-RS, and ignored test deps.
Files:
presto-native-sidecar-plugin/src/main/java/com/facebook/presto/sidecar/NativeSidecarPlugin.java
presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/NativeSidecarPluginQueryRunner.java
presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/NativeSidecarPluginQueryRunnerUtils.java
presto-main/src/main/java/com/facebook/presto/server/ServerMainModule.java
presto-spark-base/src/main/java/com/facebook/presto/spark/PrestoSparkModule.java
presto-main-base/src/main/java/com/facebook/presto/metadata/HandleJsonModule.java
presto-main-base/src/main/java/com/facebook/presto/metadata/FunctionAndTypeManager.java
presto-main/src/main/java/com/facebook/presto/server/testing/TestingPrestoServer.java
presto-main-base/src/main/java/com/facebook/presto/metadata/MetadataManager.java
presto-native-execution/pom.xml
presto-native-sidecar-plugin/pom.xml
presto-tests/pom.xml
Refactor native sidecar expression tests to use the new NativeSidecarExpressionInterpreter and DistributedQueryRunner instead of manually launching a sidecar process and handling HTTP/JSON.
  • Change TestNativeExpressionInterpreter to obtain a DistributedQueryRunner from NativeSidecarPluginQueryRunner, build MetadataManager and TestingRowExpressionTranslator from the coordinator’s FunctionAndTypeManager, and create a NativeSidecarExpressionInterpreter via Guice instead of spinning up an external sidecar process.
  • Replace manual HTTP client calls and low-level JSON parsing in tests with direct calls to NativeSidecarExpressionInterpreter.optimize / optimizeBatch and assertions on RowExpressionOptimizationResult.
  • Adjust test expectations around function namespace changes (e.g., fail() functions now under native.default) and use Closeables.closeAllRuntimeException to tear down resources.
  • Expose NativeSidecarPluginQueryRunner.getQueryRunner() helper used by tests.
Files:
presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/expressions/TestNativeExpressionInterpreter.java
presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/NativeSidecarPluginQueryRunner.java
Update planning, optimizer, and test utilities to construct ExpressionOptimizerManager with the new RowExpressionSerde dependency.
  • Update LocalQueryRunner, AbstractTestQueryFramework, OptimizerAssert, TestSimplifyRowExpressions, TestExpressionOptimizerManager, and native execution tests to pass a JsonCodecRowExpressionSerde when constructing ExpressionOptimizerManager.
  • Ensure JSON codecs for RowExpression are bound where these managers are created in tests so RowExpressionSerde works correctly.
Files:
presto-main-base/src/main/java/com/facebook/presto/testing/LocalQueryRunner.java
presto-tests/src/main/java/com/facebook/presto/tests/AbstractTestQueryFramework.java
presto-main-base/src/test/java/com/facebook/presto/sql/planner/assertions/OptimizerAssert.java
presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/TestSimplifyRowExpressions.java
presto-main-base/src/test/java/com/facebook/presto/sql/expressions/TestExpressionOptimizerManager.java
presto-native-execution/src/test/java/com/facebook/presto/nativeworker/TestPrestoNativeBuiltInFunctions.java

Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes - here's some feedback:

  • NativeSidecarExpressionInterpreter.optimizeBatch currently assumes every RowExpressionOptimizationResult contains a non-null optimizedExpression and ignores expressionFailureInfo; consider explicitly validating each result and propagating failures (e.g., via a PrestoException with the sidecar error message) instead of risking NPEs or silent partial failures.
  • The RowExpression serialization setup is now split between JsonCodecRowExpressionSerde and the custom RowExpressionSerializer/RowExpressionDeserializer in NativeExpressionsModule; it would be cleaner and less error-prone to have a single canonical RowExpressionSerde implementation wired through Guice and reused across server and sidecar/plugin code rather than duplicating JSON wiring.
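
The validation requested in the first bullet could look roughly like the sketch below. `ToyResult` and `ResultValidator` are hypothetical stand-ins: the real code would inspect `RowExpressionOptimizationResult` and would probably throw a `PrestoException` rather than the `IllegalStateException` used here to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for RowExpressionOptimizationResult.
final class ToyResult {
    final String optimizedExpression; // may be null when the sidecar failed
    final String failureMessage;      // empty (or null) when optimization succeeded

    ToyResult(String optimizedExpression, String failureMessage) {
        this.optimizedExpression = optimizedExpression;
        this.failureMessage = failureMessage;
    }
}

final class ResultValidator {
    private ResultValidator() {}

    // Returns the optimized expressions in order, or throws on the first failed or empty result.
    static List<String> requireAllSucceeded(List<ToyResult> results) {
        List<String> optimized = new ArrayList<>();
        for (int i = 0; i < results.size(); i++) {
            ToyResult result = results.get(i);
            if (result.failureMessage != null && !result.failureMessage.isEmpty()) {
                throw new IllegalStateException(
                        "Sidecar failed to optimize expression " + i + ": " + result.failureMessage);
            }
            if (result.optimizedExpression == null) {
                throw new IllegalStateException("Sidecar returned no expression for input " + i);
            }
            optimized.add(result.optimizedExpression);
        }
        return optimized;
    }
}
```

Checking the failure message before the null check means the caller sees the sidecar's own error text instead of a generic missing-expression message.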
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `presto-native-sidecar-plugin/src/main/java/com/facebook/presto/sidecar/expressions/NativeExpressionOptimizer.java:201-210` </location>
<code_context>
+            // If the special form is COALESCE, then we can optimize it if there are any duplicate arguments
</code_context>

<issue_to_address>
**issue (bug_risk):** COALESCE optimization condition is always true and will mark all COALESCE forms as optimizable.

In `visitSpecialForm` the COALESCE branch currently does:

```java
ImmutableSet.Builder<RowExpression> builder = ImmutableSet.builder();
...
boolean canBeOptimized = builder.build().size() <= node.getArguments().size() || node.getArguments().size() <= 1;
```
Since a set’s size is always `<=` the original list’s size, this condition is always true, so non-optimizable expressions (including those with non-deterministic children) may be incorrectly treated as optimizable.

To match the comment about duplicates, you likely want to detect whether any duplicates were actually removed, e.g.:

```java
Set<RowExpression> uniqueOptimizableArgs = builder.build();
boolean hasDuplicates = uniqueOptimizableArgs.size() < node.getArguments().size();
boolean canBeOptimized = hasDuplicates || node.getArguments().size() <= 1;
```
This limits optimization to cases with duplicates or a single argument and avoids over-eager constant folding.
</issue_to_address>
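
The difference between the two conditions in Comment 1 can be checked in isolation. The sketch below uses plain strings in place of `RowExpression` and assumes the elided builder code adds every argument to the set; `CoalesceDuplicateCheck` and its method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class CoalesceDuplicateCheck {
    private CoalesceDuplicateCheck() {}

    // Condition as quoted from the PR: a set built from the arguments is never
    // larger than the argument list, so the first operand is always true.
    static boolean originalCondition(List<String> arguments) {
        Set<String> unique = new LinkedHashSet<>(arguments);
        return unique.size() <= arguments.size() || arguments.size() <= 1;
    }

    // Suggested condition: true only when a duplicate was removed or there is at most one argument.
    static boolean suggestedCondition(List<String> arguments) {
        Set<String> unique = new LinkedHashSet<>(arguments);
        boolean hasDuplicates = unique.size() < arguments.size();
        return hasDuplicates || arguments.size() <= 1;
    }
}
```

For two distinct arguments such as `a` and `b`, `originalCondition` returns true while `suggestedCondition` returns false, which is the over-eager case the comment describes.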

### Comment 2
<location> `presto-native-sidecar-plugin/src/main/java/com/facebook/presto/sidecar/expressions/NativeExpressionOptimizer.java:191-193` </location>
<code_context>
+            // will stop evaluating arguments as soon as it finds a false argument.  Because a sub-expression could be simplified into a constant, and this
+            // constant could cause the expression to short circuit, if there is at least one argument which is optimizable, then the entire expression should
+            // be sent to the sidecar to be optimized.
+            boolean anyArgumentsOptimizable = node.getArguments().stream()
+                    .peek(child -> child.accept(this, context))
+                    .reduce(false, (a, b) -> canBeOptimized(b) || a, (a, b) -> a || b);
+
+            // If all arguments are constant foldable, then the whole expression is constant foldable
</code_context>

<issue_to_address>
**suggestion:** The reduction logic over arguments is hard to read and mixes traversal with state updates.

`visitSpecialForm` uses `peek(child -> child.accept(this, context))` plus a `reduce` that depends on `canBeOptimized(b)` being updated during `accept`, which is order-dependent and non-obvious:

```java
boolean anyArgumentsOptimizable = node.getArguments().stream()
        .peek(child -> child.accept(this, context))
        .reduce(false, (a, b) -> canBeOptimized(b) || a, (a, b) -> a || b);
```
A straightforward loop would make the side effects and condition explicit:

```java
boolean anyArgumentsOptimizable = false;
for (RowExpression arg : node.getArguments()) {
    arg.accept(this, context);
    if (canBeOptimized(arg)) {
        anyArgumentsOptimizable = true;
    }
}
```
The same approach would simplify the `visitCall` logic for `allConstantFoldable` as well.

Suggested implementation:

```java
            // Most special form expressions short circuit, meaning that they potentially don't evaluate all arguments. For example, the AND expression
            // will stop evaluating arguments as soon as it finds a false argument. Because a sub-expression could be simplified into a constant, and this
            // constant could cause the expression to short circuit, if there is at least one argument which is optimizable, then the entire expression should
            // be sent to the sidecar to be optimized.
            boolean anyArgumentsOptimizable = false;
            for (RowExpression argument : node.getArguments()) {
                // Visit the child to allow state (e.g., canBeOptimized) to be updated
                argument.accept(this, context);
                if (canBeOptimized(argument)) {
                    anyArgumentsOptimizable = true;
                }
            }

```

The same pattern appears in `visitCall` for the `allConstantFoldable` logic. To keep traversal and state updates explicit and consistent, refactor that code similarly:

1. Replace any `stream()` / `peek()` / `reduce()` combination that both traverses arguments and computes `allConstantFoldable` with a simple `for` loop:
   - Initialize `boolean allConstantFoldable = true;`
   - Loop over each `RowExpression` argument.
   - Call `argument.accept(this, context);`
   - Update `allConstantFoldable &= isConstantFoldable(argument);` (or equivalent).
2. Ensure, as with `visitSpecialForm`, that you do not short-circuit the loop if later traversal has important side effects on the visitor's state.
</issue_to_address>
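
The warning about short-circuiting in Comment 2 matters because a short-circuiting operation such as `Stream.anyMatch` stops visiting children at the first match. The sketch below illustrates this with strings in place of `RowExpression`; `VisitAllArguments` and the rule that names starting with `const` are optimizable are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class VisitAllArguments {
    // Records which arguments were visited, standing in for the visitor's side effects.
    final List<String> visited = new ArrayList<>();

    // Hypothetical rule: arguments whose name starts with "const" are optimizable.
    private boolean visitAndCheck(String argument) {
        visited.add(argument);
        return argument.startsWith("const");
    }

    // Short-circuits: children after the first optimizable one are never visited.
    boolean anyOptimizableShortCircuit(List<String> arguments) {
        return arguments.stream().anyMatch(this::visitAndCheck);
    }

    // Visits every child, then combines the results.
    boolean anyOptimizableVisitAll(List<String> arguments) {
        boolean any = false;
        for (String argument : arguments) {
            // Evaluate the visit first so it always runs, whatever the value of 'any'.
            boolean optimizable = visitAndCheck(argument);
            any = any || optimizable;
        }
        return any;
    }
}
```

Both methods return the same boolean, but only the loop guarantees that every child has been visited, which is what a visitor that records per-node state relies on.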

### Comment 3
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/expressions/JsonCodecRowExpressionSerde.java:38-46` </location>
<code_context>
+    }
+
+    @Override
+    public String serialize(RowExpression expression)
+    {
+        return new String(codec.toBytes(expression), StandardCharsets.UTF_8);
+    }
+
+    @Override
+    public RowExpression deserialize(String data)
+    {
+        return codec.fromBytes(data.getBytes(StandardCharsets.UTF_8));
+    }
+}
</code_context>

<issue_to_address>
**suggestion (performance):** Avoid unnecessary byte↔string conversions in JsonCodecRowExpressionSerde.

This implementation does an unnecessary UTF-8 round trip:

```java
public String serialize(RowExpression expression)
{
    return new String(codec.toBytes(expression), StandardCharsets.UTF_8);
}

public RowExpression deserialize(String data)
{
    return codec.fromBytes(data.getBytes(StandardCharsets.UTF_8));
}
```

`JsonCodec` already exposes string-based APIs, so you can avoid extra allocations and encoding/decoding by using them directly:

```java
public String serialize(RowExpression expression)
{
    return codec.toJson(expression);
}

public RowExpression deserialize(String data)
{
    return codec.fromJson(data);
}
```

Suggested implementation:

```java
    @Override
    public String serialize(RowExpression expression)
    {
        return codec.toJson(expression);
    }

    @Override
    public RowExpression deserialize(String data)
    {
        return codec.fromJson(data);
    }

```

If `java.nio.charset.StandardCharsets` is imported at the top of this file and is not used elsewhere, remove that import to avoid an unused import warning.
</issue_to_address>

### Comment 4
<location> `presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/expressions/TestNativeExpressionInterpreter.java:351-353` </location>
<code_context>
-            log.error(e, "Failed to decode RowExpression from sidecar response: %s.", e.getMessage());
-            throw new RuntimeException(e);
-        }
+        RowExpressionOptimizationResult response = optimize(rowExpression, ExpressionOptimizer.Level.EVALUATED);
+        assertNotNull(response.getExpressionFailureInfo().getMessage());
+        assertTrue(response.getExpressionFailureInfo().getMessage().contains(errorMessage), format("Sidecar response: %s did not contain expected error message: %s.", response, errorMessage));
     }

</code_context>

<issue_to_address>
**suggestion (testing):** The failure-path test no longer asserts that `optimizedExpression` is null when an error is returned

In the refactored test we no longer verify that failures return no optimized expression. To preserve that contract, please also assert that `response.getOptimizedExpression()` is null (or equivalent) in this failure-path test.

Suggested implementation:

```java
        RowExpressionOptimizationResult response = optimize(rowExpression, ExpressionOptimizer.Level.EVALUATED);
        assertNotNull(response.getExpressionFailureInfo().getMessage());
        assertTrue(
                response.getExpressionFailureInfo().getMessage().contains(errorMessage),
                format("Sidecar response: %s did not contain expected error message: %s.", response, errorMessage));
        assertNull(
                response.getOptimizedExpression(),
                format("Expected no optimized expression when an error occurs. Sidecar response: %s", response));

```

If `assertNull` is not already statically imported in this test file, add it alongside the other assertion imports, e.g.:
`import static org.testng.Assert.assertNull;`
(or the equivalent assertion library used in the rest of the file).
</issue_to_address>

### Comment 5
<location> `presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/expressions/TestNativeExpressionInterpreter.java:388-392` </location>
<code_context>
-            log.error(e, "Failed to decode RowExpression from sidecar response: %s.", e.getMessage());
-            throw new RuntimeException(e);
-        }
+        RowExpressionOptimizationResult response = optimize(expression, level);

-        return result;
+        assert response.getExpressionFailureInfo().getMessage() != null;
+        assertTrue(response.getExpressionFailureInfo().getMessage().isEmpty());
+        return response.getOptimizedExpression();
     }

</code_context>

<issue_to_address>
**issue (testing):** Avoid using Java `assert` for test conditions; use TestNG assertions instead

In `optimizeRowExpression`, the non-null check

```java
assert response.getExpressionFailureInfo().getMessage() != null;
```

relies on the Java `assert` keyword, which is disabled by default and won’t fail the test unless run with `-ea`. Please replace this with a TestNG assertion (e.g. `assertNotNull(response.getExpressionFailureInfo().getMessage())`) so the check is always enforced in CI.

The subsequent `assertTrue(response.getExpressionFailureInfo().getMessage().isEmpty());` is fine as-is since it already uses TestNG.
</issue_to_address>
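
A small sketch of why Comment 5 distinguishes the two kinds of check: the `assert` statement is skipped unless the JVM runs with assertions enabled, whereas an explicit check always executes. `AlwaysOnChecks` and its methods are hypothetical helpers written for this illustration. In the actual test, TestNG's `assertNotNull` plays the role of `checkNotNull`.

```java
final class AlwaysOnChecks {
    private AlwaysOnChecks() {}

    // Returns true only when the JVM was started with assertions enabled (for example -ea).
    static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment inside the assert runs only if assertions are enabled.
        assert enabled = true;
        return enabled;
    }

    // Fails regardless of JVM flags, unlike a bare 'assert value != null'.
    static <T> T checkNotNull(T value, String message) {
        if (value == null) {
            throw new AssertionError(message);
        }
        return value;
    }
}
```

Calling `assertionsEnabled()` in a given build environment shows whether bare `assert` statements in tests are actually being evaluated there.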

### Comment 6
<location> `presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/expressions/TestNativeExpressionInterpreter.java:84-88` </location>
<code_context>
-    public void tearDown()
-    {
-        sidecar.destroyForcibly();
+        DistributedQueryRunner queryRunner = NativeSidecarPluginQueryRunner.getQueryRunner();
+        FunctionAndTypeManager functionAndTypeManager = queryRunner.getCoordinator().getFunctionAndTypeManager();
+        this.metadata = createTestMetadataManager(functionAndTypeManager);
+        this.translator = new TestingRowExpressionTranslator(metadata);
+        this.rowExpressionInterpreter = getRowExpressionInterpreter(functionAndTypeManager, queryRunner.getCoordinator().getPluginNodeManager());
+        this.visitor = new TestVisitor();
     }
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Consider adding teardown to close `DistributedQueryRunner` and native sidecar resources created in the test

Previously the test explicitly started and stopped a sidecar `Process` in `@BeforeClass`/`@AfterClass`. Now the `DistributedQueryRunner` is created in the constructor via `NativeSidecarPluginQueryRunner.getQueryRunner()` but never closed. Please add an `@AfterClass` (or equivalent) to close the `DistributedQueryRunner` and its sidecar resources to avoid leaking threads/ports between tests.

Suggested implementation:

```java
        this.queryRunner = NativeSidecarPluginQueryRunner.getQueryRunner();

```

To fully implement the teardown and avoid leaking the `DistributedQueryRunner` and sidecar resources, you should also:

1. Add a field to the test class to hold the runner:
   - Near the top of `TestNativeExpressionInterpreter`, add:
     - `private DistributedQueryRunner queryRunner;`

2. Add an `@AfterClass` teardown method that closes the runner:
   - Import the close helper:
     - `import static com.facebook.airlift.testing.Closeables.closeAllRuntimeException;`
   - Add a method in the class:
     ```java
     @AfterClass(alwaysRun = true)
     public void tearDown()
     {
         closeAllRuntimeException(queryRunner);
     }
     ```

3. Ensure `@AfterClass` is from the same test framework used in this file (Presto usually uses TestNG: `org.testng.annotations.AfterClass`). If the existing setup uses `@AfterMethod`/`@AfterSuite` or JUnit, mirror that instead.

This will ensure the `DistributedQueryRunner` and its sidecar are properly closed between test runs.
</issue_to_address>


@tdcmeehan tdcmeehan self-assigned this Dec 8, 2025
@pdabre12
Contributor Author

pdabre12 commented Dec 8, 2025

@sourcery-ai review

Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey there - I've reviewed your changes and found some issues that need to be addressed.

## Overall Comments
- In `NativeExpressionOptimizer.optimize`/`NativeSidecarExpressionInterpreter.optimizeBatch`, sidecar failures (non-empty `expressionFailureInfo`) are ignored and the `optimizedExpression` is blindly used; consider checking `expressionFailureInfo` and either skipping the replacement (falling back to the original `RowExpression`) or surfacing a `PrestoException` to avoid introducing null/invalid expressions into planning.
- `NativeExpressionsModule` re-installs `JsonModule` and binds custom `RowExpression` (de)serializers and list codecs even though a `RowExpressionSerde` is already provided via the context; consider relying solely on the injected `RowExpressionSerde` and avoiding duplicate Jackson bindings to reduce configuration complexity and the risk of conflicts.
- In `TestNativeExpressionInterpreter.optimizeRowExpression`/`assertEvaluateFails`, the first check, `assert response.getExpressionFailureInfo().getMessage() != null;`, uses a raw Java `assert` and is redundant with the TestNG assertions; for consistency and to avoid a dependency on `-ea`, replace or remove it in favor of TestNG’s `Assert` methods.
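
The fallback option from the first overall comment could look roughly like the sketch below. It is not the PR's code: `FailureAwareFallback`, `Result`, and `applyResults` are hypothetical names, expressions are modeled as plain strings rather than `RowExpression`, and a null or empty failure message is assumed to mean "no failure".

```java
import java.util.ArrayList;
import java.util.List;

class FailureAwareFallback
{
    // Hypothetical stand-in for RowExpressionOptimizationResult; expressions are
    // modeled as plain strings, and a null/empty failureMessage means "no failure".
    static final class Result
    {
        final String optimizedExpression;
        final String failureMessage;

        Result(String optimizedExpression, String failureMessage)
        {
            this.optimizedExpression = optimizedExpression;
            this.failureMessage = failureMessage;
        }
    }

    // For each input, use the optimized expression only when the sidecar reported
    // no failure and actually returned an expression; otherwise keep the original.
    static List<String> applyResults(List<String> originals, List<Result> results)
    {
        if (originals.size() != results.size()) {
            throw new IllegalArgumentException("Expected exactly one result per expression");
        }
        List<String> output = new ArrayList<>();
        for (int i = 0; i < originals.size(); i++) {
            Result result = results.get(i);
            boolean failed = (result.failureMessage != null && !result.failureMessage.isEmpty())
                    || result.optimizedExpression == null;
            output.add(failed ? originals.get(i) : result.optimizedExpression);
        }
        return output;
    }
}
```

The alternative named in the comment, surfacing a `PrestoException`, would replace the fallback branch with a throw; which of the two is appropriate depends on whether planning should continue with unoptimized expressions.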

## Individual Comments

### Comment 1
<location> `presto-native-sidecar-plugin/src/main/java/com/facebook/presto/sidecar/expressions/NativeExpressionsModule.java:55-56` </location>
<code_context>
+        binder.bind(FunctionMetadataManager.class).toInstance(functionMetadataManager);
+        binder.bind(StandardFunctionResolution.class).toInstance(functionResolution);
+
+        // JSON dependencies and setup
+        binder.install(new JsonModule());
+        jsonBinder(binder).addDeserializerBinding(RowExpression.class).to(RowExpressionDeserializer.class).in(Scopes.SINGLETON);
+        jsonBinder(binder).addSerializerBinding(RowExpression.class).to(RowExpressionSerializer.class).in(Scopes.SINGLETON);
</code_context>

<issue_to_address>
**issue (bug_risk):** JsonModule is installed twice, which can cause duplicate Guice bindings

`NativeExpressionOptimizerFactory` already installs `JsonModule` in the `Bootstrap`, and `NativeExpressionsModule` installs it again here. Installing the same Guice module twice can cause duplicate bindings and startup failures. Since you only use the JsonBinder/JsonCodecBinder helpers in this module, remove `binder.install(new JsonModule());` and rely on the JsonModule from the top-level Bootstrap instead.
</issue_to_address>

### Comment 2
<location> `presto-native-sidecar-plugin/src/main/java/com/facebook/presto/sidecar/expressions/NativeExpressionOptimizer.java:80-82` </location>
<code_context>
+        Map<RowExpression, RowExpression> expressions = expressionsToOptimize.stream()
+                .collect(toMap(
+                        Function.identity(),
+                        rowExpression -> rowExpression.accept(
+                                new ReplacingVisitor(
+                                        variable -> toRowExpression(variable.getSourceLocation(), variableResolver.apply(variable), variable.getType())),
+                                null),
+                        (a, b) -> a));
</code_context>

<issue_to_address>
**issue (bug_risk):** Variable resolver null handling is inconsistent between collection and replacement, which may turn unknowns into literal NULLs

In `CollectingVisitor.visitVariableReference`, `value == null` prevents constant folding, but here `ReplacingVisitor` always calls `toRowExpression(..., variableResolver.apply(variable), ...)`. If the resolver returns null for an unresolved variable, `toRowExpression` produces a `ConstantExpression` with a null value, effectively turning an unknown variable into a literal NULL and diverging from the collection semantics. Please either enforce that the resolver never returns null (e.g., with a precondition) or keep the original `VariableReferenceExpression` when it does, to align behavior with `CollectingVisitor`.
</issue_to_address>
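
A minimal sketch of the second option (keep the original variable when the resolver returns null) is shown below. It is illustrative only: `NullSafeVariableReplacement` and `replaceVariable` are hypothetical, variables and constants are modeled as strings, and the `constant(...)` wrapper merely stands in for what `toRowExpression` would produce.

```java
import java.util.function.Function;

class NullSafeVariableReplacement
{
    // Hypothetical stand-in for the variable handling in ReplacingVisitor.
    // A null from the resolver is treated as "unresolved", not as a NULL value.
    static String replaceVariable(String variable, Function<String, Object> resolver)
    {
        Object value = resolver.apply(variable);
        if (value == null) {
            // Keep the original variable reference instead of folding it to a literal NULL
            return variable;
        }
        // Stand-in for toRowExpression(sourceLocation, value, type)
        return "constant(" + value + ")";
    }
}
```

With this shape, replacement and collection agree: a variable the resolver cannot resolve is left untouched in both passes.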

### Comment 3
<location> `presto-native-sidecar-plugin/src/main/java/com/facebook/presto/sidecar/expressions/NativeSidecarExpressionInterpreter.java:105-106` </location>
<code_context>
+                    getSidecarRequest(session, level, resolvedExpressions),
+                    createJsonResponseHandler(rowExpressionOptimizationResultJsonCodec));
+        }
+        catch (Exception e) {
+            throw new PrestoException(INVALID_ARGUMENTS, "Failed to get optimized expressions from sidecar.", e);
+        }
+        return optimizedExpressions;
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Using INVALID_ARGUMENTS for transport/sidecar failures may be misleading

Catching all HTTP exceptions and rethrowing them as INVALID_ARGUMENTS mixes user input errors with connectivity/sidecar failures. This makes infrastructure issues (e.g., sidecar down/misconfigured) look like user mistakes. Please use a more suitable error code for transport/remote failures, or at least distinguish 4xx (bad request) from connection/5xx errors when mapping to PrestoException codes.

Suggested implementation:

```java
        List<RowExpressionOptimizationResult> optimizedExpressions;
        try {
            optimizedExpressions = httpClient.execute(
                    getSidecarRequest(session, level, resolvedExpressions),
                    createJsonResponseHandler(rowExpressionOptimizationResultJsonCodec));
        }
        catch (UnexpectedResponseException e) {
            int statusCode = e.getStatusCode();
            if (statusCode >= 400 && statusCode < 500) {
                // Sidecar understood the request but rejected it: treat as invalid arguments
                throw new PrestoException(
                        INVALID_ARGUMENTS,
                        String.format("Sidecar rejected optimization request (status code %s)", statusCode),
                        e);
            }

            // Sidecar returned a server error or unexpected status: treat as remote/transport failure
            throw new PrestoException(
                    REMOTE_HOST_GONE,
                    String.format("Sidecar request failed with status code %s", statusCode),
                    e);
        }
        catch (Exception e) {
            // Connection issues, timeouts, etc. are treated as remote failures
            throw new PrestoException(REMOTE_HOST_GONE, "Failed to get optimized expressions from sidecar.", e);
        }
        return optimizedExpressions;
    }

```

To compile successfully you will also need to:
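1. Add an import for `UnexpectedResponseException` from the Airlift HTTP client. In prestodb the Airlift classes normally live under `com.facebook.airlift`, so the import is likely:
   `import com.facebook.airlift.http.client.UnexpectedResponseException;`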
2. Add a static import for the remote error code:
   `import static com.facebook.presto.spi.StandardErrorCode.REMOTE_HOST_GONE;`
These should be placed alongside the existing import for `INVALID_ARGUMENTS` and other Airlift HTTP client imports.
</issue_to_address>

### Comment 4
<location> `presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/expressions/TestNativeExpressionInterpreter.java:360-362` </location>
<code_context>
-            log.error(e, "Failed to decode RowExpression from sidecar response: %s.", e.getMessage());
-            throw new RuntimeException(e);
-        }
+        RowExpressionOptimizationResult response = optimize(rowExpression, ExpressionOptimizer.Level.EVALUATED);
+        assertNotNull(response.getExpressionFailureInfo().getMessage());
+        assertTrue(response.getExpressionFailureInfo().getMessage().contains(errorMessage), format("Sidecar response: %s did not contain expected error message: %s.", response, errorMessage));
     }

</code_context>

<issue_to_address>
**suggestion (testing):** Strengthen the failure-path assertion to verify that no optimized expression is returned when evaluation fails

This test now validates the failure message via `RowExpressionOptimizationResult`, but it doesn’t check that `optimizedExpression` is null. To preserve the original failure semantics and catch regressions where a failure is returned alongside an optimized expression, please also assert `assertNull(response.getOptimizedExpression(), ...)` here.

Suggested implementation:

```java
        RowExpressionOptimizationResult response = optimize(rowExpression, ExpressionOptimizer.Level.EVALUATED);
        assertNotNull(response.getExpressionFailureInfo().getMessage());
        assertNull(response.getOptimizedExpression(), format("Expected no optimized expression when evaluation fails, but got: %s", response.getOptimizedExpression()));
        assertTrue(
                response.getExpressionFailureInfo().getMessage().contains(errorMessage),
                format("Sidecar response: %s did not contain expected error message: %s.", response, errorMessage));
    }

```

If `assertNull` is not already statically imported in this test file, add it alongside the other TestNG assertions, for example:
- `import static org.testng.Assert.assertNull;`
</issue_to_address>

### Comment 5
<location> `presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/expressions/TestNativeExpressionInterpreter.java:397-401` </location>
<code_context>
+        RowExpressionOptimizationResult response = optimize(expression, level);
</code_context>

<issue_to_address>
**suggestion (testing):** Use stronger assertions on the optimization result and ensure the optimized expression is non-null

In `optimizeRowExpression`, consider replacing the bare `assert` on `response.getExpressionFailureInfo().getMessage()` with `assertNotNull(...)` so the check still runs when JVM assertions are disabled. Also, add an `assertNotNull(response.getOptimizedExpression(), ...)` before returning it to guarantee that a successful optimization always produces a non-null expression.

Suggested implementation:

```java
import org.testng.annotations.Test;

import static org.testng.Assert.assertNotNull;

```

```java
        expression = expression.accept(visitor, null);
        RowExpressionOptimizationResult response = optimize(expression, level);
        assertNotNull(
                response.getOptimizedExpression(),
                "Optimized expression must not be null for a successful optimization");

```

```java
        assertNotNull(
                response.getExpressionFailureInfo().getMessage(),
                "Expected non-null failure message for failed optimization");

```

The bare `assert` on `response.getExpressionFailureInfo().getMessage()` may look slightly different in your file (for example, it may include additional conditions or message text). In that case, replace the entire original `assert` line (or block) with the `assertNotNull(...)` call shown above, while preserving any additional checks you still need.
</issue_to_address>
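
The reason a bare `assert` is weaker can be shown in plain Java: the JVM skips `assert` statements unless it is started with `-ea`, whereas an explicit check always runs. The helper below is a hypothetical illustration of an always-on check in the spirit of TestNG's `assertNotNull`; `ExplicitChecks` and `assertPresent` are not part of the PR or of TestNG.

```java
class ExplicitChecks
{
    // Always-on null check: unlike a bare `assert value != null;`, this runs
    // whether or not the JVM was started with -ea.
    static <T> T assertPresent(T value, String message)
    {
        if (value == null) {
            throw new AssertionError(message);
        }
        return value;
    }
}
```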


