feat(plugin-native-sidecar): Add native row expression optimizer #24602
Co-authored-by: Pratik Joseph Dabre <[email protected]>
Reviewer's Guide
Introduces a native expression optimizer that delegates row expression optimization to the native sidecar over HTTP, wires it into the ExpressionOptimizerManager and plugin infrastructure, and refactors tests and JSON/serde plumbing to use a reusable RowExpressionSerde and the NativeSidecarExpressionInterpreter instead of manual HTTP/JSON handling.
Class diagram for native expression optimizer and serde
classDiagram
class ExpressionOptimizerManager {
<<service>>
- NodeManager nodeManager
- FunctionAndTypeManager functionAndTypeManager
- RowExpressionSerde rowExpressionSerde
- StandardFunctionResolution functionResolution
- File configurationDirectory
- Map~String,ExpressionOptimizerFactory~ expressionOptimizerFactories
- Map~String,ExpressionOptimizer~ expressionOptimizers
+ ExpressionOptimizerManager(PluginNodeManager nodeManager, FunctionAndTypeManager functionAndTypeManager, RowExpressionSerde rowExpressionSerde)
+ loadExpressionOptimizerFactories()
+ loadExpressionOptimizerFactory(File configurationFile)
+ loadExpressionOptimizerFactory(String factoryName, String optimizerName, Map~String,String~ properties)
+ addExpressionOptimizerFactory(ExpressionOptimizerFactory factory)
}
class ExpressionOptimizerContext {
- NodeManager nodeManager
- RowExpressionSerde rowExpressionSerde
- FunctionMetadataManager functionMetadataManager
- StandardFunctionResolution functionResolution
+ ExpressionOptimizerContext(NodeManager nodeManager, RowExpressionSerde rowExpressionSerde, FunctionMetadataManager functionMetadataManager, StandardFunctionResolution functionResolution)
+ getNodeManager() NodeManager
+ getRowExpressionSerde() RowExpressionSerde
+ getFunctionMetadataManager() FunctionMetadataManager
+ getFunctionResolution() StandardFunctionResolution
}
class RowExpressionSerde {
<<interface>>
+ serialize(RowExpression expression) String
+ deserialize(String value) RowExpression
}
class JsonCodecRowExpressionSerde {
- JsonCodec~RowExpression~ codec
+ JsonCodecRowExpressionSerde(JsonCodec~RowExpression~ codec)
+ serialize(RowExpression expression) String
+ deserialize(String data) RowExpression
}
class NativeExpressionOptimizerFactory {
<<factory>>
- ClassLoader classLoader
+ NativeExpressionOptimizerFactory(ClassLoader classLoader)
+ getName() String
+ createOptimizer(Map~String,String~ config, ExpressionOptimizerContext context) ExpressionOptimizer
}
class NativeExpressionsModule {
<<module>>
- NodeManager nodeManager
- RowExpressionSerde rowExpressionSerde
- FunctionMetadataManager functionMetadataManager
- StandardFunctionResolution functionResolution
+ NativeExpressionsModule(NodeManager nodeManager, RowExpressionSerde rowExpressionSerde, FunctionMetadataManager functionMetadataManager, StandardFunctionResolution functionResolution)
+ configure(Binder binder)
}
class NativeExpressionOptimizer {
<<service>>
- FunctionMetadataManager functionMetadataManager
- StandardFunctionResolution resolution
- NativeSidecarExpressionInterpreter rowExpressionInterpreterService
+ NativeExpressionOptimizer(NativeSidecarExpressionInterpreter rowExpressionInterpreterService, FunctionMetadataManager functionMetadataManager, StandardFunctionResolution resolution)
+ optimize(RowExpression expression, ExpressionOptimizer.Level level, ConnectorSession session, Function~VariableReferenceExpression,Object~ variableResolver) RowExpression
}
class NativeSidecarExpressionInterpreter {
<<service>>
+ PRESTO_TIME_ZONE_HEADER
+ PRESTO_USER_HEADER
+ PRESTO_EXPRESSION_OPTIMIZER_LEVEL_HEADER
- NodeManager nodeManager
- HttpClient httpClient
- JsonCodec~List~RowExpression~~ rowExpressionCodec
- JsonCodec~List~RowExpressionOptimizationResult~~ rowExpressionOptimizationResultJsonCodec
+ NativeSidecarExpressionInterpreter(HttpClient httpClient, NodeManager nodeManager, JsonCodec~List~RowExpressionOptimizationResult~~ rowExpressionOptimizationResultJsonCodec, JsonCodec~List~RowExpression~~ rowExpressionCodec)
+ optimizeBatch(ConnectorSession session, Map~RowExpression,RowExpression~ expressions, ExpressionOptimizer.Level level) Map~RowExpression,RowExpression~
+ optimize(ConnectorSession session, ExpressionOptimizer.Level level, List~RowExpression~ resolvedExpressions) List~RowExpressionOptimizationResult~
}
class RowExpressionSerializer {
<<json-serializer>>
- RowExpressionSerde rowExpressionSerde
+ RowExpressionSerializer(RowExpressionSerde rowExpressionSerde)
+ serialize(RowExpression rowExpression, JsonGenerator jsonGenerator, SerializerProvider serializerProvider)
+ serializeWithType(RowExpression rowExpression, JsonGenerator jsonGenerator, SerializerProvider serializerProvider, TypeSerializer typeSerializer)
}
class RowExpressionDeserializer {
<<json-deserializer>>
- RowExpressionSerde rowExpressionSerde
+ RowExpressionDeserializer(RowExpressionSerde rowExpressionSerde)
+ deserialize(JsonParser jsonParser, DeserializationContext context) RowExpression
+ deserializeWithType(JsonParser jsonParser, DeserializationContext context, TypeDeserializer typeDeserializer) RowExpression
}
class NativeSidecarPlugin {
+ getExpressionOptimizerFactories() Iterable~ExpressionOptimizerFactory~
}
ExpressionOptimizerManager --> ExpressionOptimizerContext : uses to
ExpressionOptimizerManager --> RowExpressionSerde : depends on
ExpressionOptimizerContext --> RowExpressionSerde : holds
RowExpressionSerde <|.. JsonCodecRowExpressionSerde
NativeSidecarPlugin --> NativeExpressionOptimizerFactory : registers
NativeExpressionOptimizerFactory --> ExpressionOptimizerContext : uses
NativeExpressionOptimizerFactory --> NativeExpressionsModule : creates
NativeExpressionsModule --> NativeExpressionOptimizer : binds
NativeExpressionsModule --> NativeSidecarExpressionInterpreter : binds
NativeExpressionsModule --> RowExpressionSerializer : binds
NativeExpressionsModule --> RowExpressionDeserializer : binds
NativeExpressionsModule --> RowExpressionSerde : uses instance
NativeExpressionOptimizer --> NativeSidecarExpressionInterpreter : uses
RowExpressionSerializer --> RowExpressionSerde : uses
RowExpressionDeserializer --> RowExpressionSerde : uses
ExpressionOptimizerManager ..> NativeExpressionOptimizerFactory : via SPI registration
Hey there - I've reviewed your changes - here's some feedback:
- NativeSidecarExpressionInterpreter.optimizeBatch currently assumes every RowExpressionOptimizationResult contains a non-null optimizedExpression and ignores expressionFailureInfo; consider explicitly validating each result and propagating failures (e.g., via a PrestoException with the sidecar error message) instead of risking NPEs or silent partial failures.
- The RowExpression serialization setup is now split between JsonCodecRowExpressionSerde and the custom RowExpressionSerializer/RowExpressionDeserializer in NativeExpressionsModule; it would be cleaner and less error-prone to have a single canonical RowExpressionSerde implementation wired through Guice and reused across server and sidecar/plugin code rather than duplicating JSON wiring.
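The validation suggested in the first point above could look roughly like the sketch below. It is illustrative only: `Result`, `SidecarException`, and `unwrap` are hypothetical stand-ins (for `RowExpressionOptimizationResult`, `PrestoException`, and whatever helper the PR would actually introduce), and expressions are plain strings so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;

final class SidecarResultValidation
{
    // Hypothetical stand-in for RowExpressionOptimizationResult.
    static final class Result
    {
        final String optimizedExpression; // null when the sidecar failed
        final String failureMessage;      // null or empty on success

        Result(String optimizedExpression, String failureMessage)
        {
            this.optimizedExpression = optimizedExpression;
            this.failureMessage = failureMessage;
        }
    }

    // Hypothetical stand-in for PrestoException.
    static final class SidecarException
            extends RuntimeException
    {
        SidecarException(String message)
        {
            super(message);
        }
    }

    // Checks every result before it is used, so a failed entry surfaces as an
    // explicit error instead of a NullPointerException or a silent partial result.
    static List<String> unwrap(List<Result> results)
    {
        List<String> optimized = new ArrayList<>();
        for (int i = 0; i < results.size(); i++) {
            Result result = results.get(i);
            if (result.failureMessage != null && !result.failureMessage.isEmpty()) {
                throw new SidecarException("Sidecar failed to optimize expression " + i + ": " + result.failureMessage);
            }
            if (result.optimizedExpression == null) {
                throw new SidecarException("Sidecar returned no expression for index " + i);
            }
            optimized.add(result.optimizedExpression);
        }
        return optimized;
    }
}
```

Whether a failure should abort planning, as here, or fall back to the original expression is a design choice the review leaves open.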
## Individual Comments
### Comment 1
<location> `presto-native-sidecar-plugin/src/main/java/com/facebook/presto/sidecar/expressions/NativeExpressionOptimizer.java:201-210` </location>
<code_context>
+ // If the special form is COALESCE, then we can optimize it if there are any duplicate arguments
</code_context>
<issue_to_address>
**issue (bug_risk):** COALESCE optimization condition is always true and will mark all COALESCE forms as optimizable.
In `visitSpecialForm` the COALESCE branch currently does:
```java
ImmutableSet.Builder<RowExpression> builder = ImmutableSet.builder();
...
boolean canBeOptimized = builder.build().size() <= node.getArguments().size() || node.getArguments().size() <= 1;
```
Since a set’s size is always `<=` the original list’s size, this condition is always true, so non-optimizable expressions (including those with non-deterministic children) may be incorrectly treated as optimizable.
To match the comment about duplicates, you likely want to detect whether any duplicates were actually removed, e.g.:
```java
Set<RowExpression> uniqueOptimizableArgs = builder.build();
boolean hasDuplicates = uniqueOptimizableArgs.size() < node.getArguments().size();
boolean canBeOptimized = hasDuplicates || node.getArguments().size() <= 1;
```
This limits optimization to cases with duplicates or a single argument and avoids over-eager constant folding.
</issue_to_address>
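To see why the original condition can never be false, here is a small standalone sketch. It uses strings in place of `RowExpression` arguments and assumes the set is built from the argument list; the class and method names are hypothetical and not part of the PR.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class CoalesceDuplicateCheck
{
    // Mirrors the original condition: a set built from the arguments can never be
    // larger than the argument list, so this returns true for every input.
    static boolean originalCondition(List<String> arguments)
    {
        Set<String> unique = new LinkedHashSet<>(arguments);
        return unique.size() <= arguments.size() || arguments.size() <= 1;
    }

    // Mirrors the suggested condition: true only when deduplication removed
    // something, or when there is at most one argument.
    static boolean suggestedCondition(List<String> arguments)
    {
        Set<String> unique = new LinkedHashSet<>(arguments);
        boolean hasDuplicates = unique.size() < arguments.size();
        return hasDuplicates || arguments.size() <= 1;
    }

    public static void main(String[] args)
    {
        List<String> distinct = Arrays.asList("a", "b", "c");
        List<String> repeated = Arrays.asList("a", "b", "a");
        System.out.println(originalCondition(distinct));  // true
        System.out.println(suggestedCondition(distinct)); // false
        System.out.println(suggestedCondition(repeated)); // true
    }
}
```

The same holds if the set contains only a subset of the arguments, since a subset is also never larger than the full list.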
### Comment 2
<location> `presto-native-sidecar-plugin/src/main/java/com/facebook/presto/sidecar/expressions/NativeExpressionOptimizer.java:191-193` </location>
<code_context>
+ // will stop evaluating arguments as soon as it finds a false argument. Because a sub-expression could be simplified into a constant, and this
+ // constant could cause the expression to short circuit, if there is at least one argument which is optimizable, then the entire expression should
+ // be sent to the sidecar to be optimized.
+ boolean anyArgumentsOptimizable = node.getArguments().stream()
+ .peek(child -> child.accept(this, context))
+ .reduce(false, (a, b) -> canBeOptimized(b) || a, (a, b) -> a || b);
+
+ // If all arguments are constant foldable, then the whole expression is constant foldable
</code_context>
<issue_to_address>
**suggestion:** The reduction logic over arguments is hard to read and mixes traversal with state updates.
`visitSpecialForm` uses `peek(child -> child.accept(this, context))` plus a `reduce` that depends on `canBeOptimized(b)` being updated during `accept`, which is order-dependent and non-obvious:
```java
boolean anyArgumentsOptimizable = node.getArguments().stream()
        .peek(child -> child.accept(this, context))
        .reduce(false, (a, b) -> canBeOptimized(b) || a, (a, b) -> a || b);
```
A straightforward loop would make the side effects and condition explicit:
```java
boolean anyArgumentsOptimizable = false;
for (RowExpression arg : node.getArguments()) {
    arg.accept(this, context);
    if (canBeOptimized(arg)) {
        anyArgumentsOptimizable = true;
    }
}
```
The same approach would simplify the `visitCall` logic for `allConstantFoldable` as well.
Suggested implementation:
```java
// Most special form expressions short circuit, meaning that they potentially don't evaluate all arguments. For example, the AND expression
// will stop evaluating arguments as soon as it finds a false argument. Because a sub-expression could be simplified into a constant, and this
// constant could cause the expression to short circuit, if there is at least one argument which is optimizable, then the entire expression should
// be sent to the sidecar to be optimized.
boolean anyArgumentsOptimizable = false;
for (RowExpression argument : node.getArguments()) {
    // Visit the child to allow state (e.g., canBeOptimized) to be updated
    argument.accept(this, context);
    if (canBeOptimized(argument)) {
        anyArgumentsOptimizable = true;
    }
}
```
The same pattern appears in the `allConstantFoldable` logic in `visitCall`. To keep traversal and state updates explicit and consistent, refactor that code similarly:
1. Replace any `stream()` / `peek()` / `reduce()` combination that both traverses arguments and computes `allConstantFoldable` with a simple `for` loop:
- Initialize `boolean allConstantFoldable = true;`
- Loop over each `RowExpression` argument.
- Call `argument.accept(this, context);`
- Update `allConstantFoldable &= isConstantFoldable(argument);` (or equivalent).
2. Ensure, as with `visitSpecialForm`, that you do not short-circuit the loop if later traversal has important side effects on the visitor's state.
</issue_to_address>
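The `visitCall` refactor described in the two steps above might look like the following sketch. Everything here is a simplified assumption: strings stand in for `RowExpression`, `visit` stands in for `argument.accept(this, context)`, and the digits-only rule for foldability is invented purely for the demonstration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ArgumentTraversal
{
    // Hypothetical visitor state: expressions marked as constant foldable so far.
    private final Set<String> foldable = new HashSet<>();
    // Records the visit order so the traversal can be observed from outside.
    final List<String> visited = new ArrayList<>();

    // Stand-in for argument.accept(this, context); here, digit-only strings
    // are treated as constants and marked foldable.
    void visit(String expression)
    {
        visited.add(expression);
        if (expression.matches("\\d+")) {
            foldable.add(expression);
        }
    }

    boolean isConstantFoldable(String expression)
    {
        return foldable.contains(expression);
    }

    // Visits every argument, even after a non-foldable one has been seen,
    // because later visits still update the visitor's state.
    boolean allConstantFoldable(List<String> arguments)
    {
        boolean all = true;
        for (String argument : arguments) {
            visit(argument);
            // &= on booleans always evaluates the right-hand side (no short circuit).
            all &= isConstantFoldable(argument);
        }
        return all;
    }
}
```

Using `&=` rather than `&&` keeps the state lookup unconditional, which matches the second step: the loop never exits early.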
### Comment 3
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/expressions/JsonCodecRowExpressionSerde.java:38-46` </location>
<code_context>
+ }
+
+ @Override
+ public String serialize(RowExpression expression)
+ {
+ return new String(codec.toBytes(expression), StandardCharsets.UTF_8);
+ }
+
+ @Override
+ public RowExpression deserialize(String data)
+ {
+ return codec.fromBytes(data.getBytes(StandardCharsets.UTF_8));
+ }
+}
</code_context>
<issue_to_address>
**suggestion (performance):** Avoid unnecessary byte↔string conversions in JsonCodecRowExpressionSerde.
This implementation does an unnecessary UTF-8 round trip:
```java
public String serialize(RowExpression expression)
{
return new String(codec.toBytes(expression), StandardCharsets.UTF_8);
}
public RowExpression deserialize(String data)
{
return codec.fromBytes(data.getBytes(StandardCharsets.UTF_8));
}
```
`JsonCodec` already exposes string-based APIs, so you can avoid extra allocations and encoding/decoding by using them directly:
```java
public String serialize(RowExpression expression)
{
return codec.toJson(expression);
}
public RowExpression deserialize(String data)
{
return codec.fromJson(data);
}
```
Suggested implementation:
```java
@Override
public String serialize(RowExpression expression)
{
return codec.toJson(expression);
}
@Override
public RowExpression deserialize(String data)
{
return codec.fromJson(data);
}
```
If `java.nio.charset.StandardCharsets` is imported at the top of this file and is not used elsewhere, remove that import to avoid an unused import warning.
</issue_to_address>
### Comment 4
<location> `presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/expressions/TestNativeExpressionInterpreter.java:351-353` </location>
<code_context>
- log.error(e, "Failed to decode RowExpression from sidecar response: %s.", e.getMessage());
- throw new RuntimeException(e);
- }
+ RowExpressionOptimizationResult response = optimize(rowExpression, ExpressionOptimizer.Level.EVALUATED);
+ assertNotNull(response.getExpressionFailureInfo().getMessage());
+ assertTrue(response.getExpressionFailureInfo().getMessage().contains(errorMessage), format("Sidecar response: %s did not contain expected error message: %s.", response, errorMessage));
}
</code_context>
<issue_to_address>
**suggestion (testing):** The failure-path test no longer asserts that `optimizedExpression` is null when an error is returned
In the refactored test we no longer verify that failures return no optimized expression. To preserve that contract, please also assert that `response.getOptimizedExpression()` is null (or equivalent) in this failure-path test.
Suggested implementation:
```java
RowExpressionOptimizationResult response = optimize(rowExpression, ExpressionOptimizer.Level.EVALUATED);
assertNotNull(response.getExpressionFailureInfo().getMessage());
assertTrue(
response.getExpressionFailureInfo().getMessage().contains(errorMessage),
format("Sidecar response: %s did not contain expected error message: %s.", response, errorMessage));
assertNull(
response.getOptimizedExpression(),
format("Expected no optimized expression when an error occurs. Sidecar response: %s", response));
```
If `assertNull` is not already statically imported in this test file, add it alongside the other assertion imports, e.g.:
`import static org.testng.Assert.assertNull;`
(or the equivalent assertion library used in the rest of the file).
</issue_to_address>
### Comment 5
<location> `presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/expressions/TestNativeExpressionInterpreter.java:388-392` </location>
<code_context>
- log.error(e, "Failed to decode RowExpression from sidecar response: %s.", e.getMessage());
- throw new RuntimeException(e);
- }
+ RowExpressionOptimizationResult response = optimize(expression, level);
- return result;
+ assert response.getExpressionFailureInfo().getMessage() != null;
+ assertTrue(response.getExpressionFailureInfo().getMessage().isEmpty());
+ return response.getOptimizedExpression();
}
</code_context>
<issue_to_address>
**issue (testing):** Avoid using Java `assert` for test conditions; use TestNG assertions instead
In `optimizeRowExpression`, the non-null check
```java
assert response.getExpressionFailureInfo().getMessage() != null;
```
relies on the Java `assert` keyword, which is disabled by default and won’t fail the test unless run with `-ea`. Please replace this with a TestNG assertion (e.g. `assertNotNull(response.getExpressionFailureInfo().getMessage())`) so the check is always enforced in CI.
The subsequent `assertTrue(response.getExpressionFailureInfo().getMessage().isEmpty());` is fine as-is since it already uses TestNG.
</issue_to_address>
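The difference between the `assert` keyword and an explicit assertion method can be observed with a small standalone sketch. The class and its `assertNotNull` helper are hypothetical; the helper only imitates the always-on behavior of TestNG's `assertNotNull` and is not the TestNG implementation.

```java
final class AssertKeywordDemo
{
    // Uses the Java assert keyword, which is a no-op unless the JVM runs with -ea.
    // Returns true when the check let the value through.
    static boolean passesWithAssertKeyword(String message)
    {
        try {
            assert message != null;
            return true;
        }
        catch (AssertionError e) {
            return false;
        }
    }

    // Explicit check that runs regardless of JVM flags.
    static void assertNotNull(Object value, String description)
    {
        if (value == null) {
            throw new AssertionError(description);
        }
    }

    public static void main(String[] args)
    {
        // Reports whether assertions are enabled for this class in the current JVM.
        boolean enabled = AssertKeywordDemo.class.desiredAssertionStatus();
        System.out.println("assertions enabled: " + enabled);
        // With assertions disabled, a null message slips past the assert keyword.
        System.out.println("assert keyword let null pass: " + passesWithAssertKeyword(null));
    }
}
```

Run without `-ea`, the keyword-based check accepts a null message; the explicit helper rejects it under any JVM configuration.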
### Comment 6
<location> `presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/expressions/TestNativeExpressionInterpreter.java:84-88` </location>
<code_context>
- public void tearDown()
- {
- sidecar.destroyForcibly();
+ DistributedQueryRunner queryRunner = NativeSidecarPluginQueryRunner.getQueryRunner();
+ FunctionAndTypeManager functionAndTypeManager = queryRunner.getCoordinator().getFunctionAndTypeManager();
+ this.metadata = createTestMetadataManager(functionAndTypeManager);
+ this.translator = new TestingRowExpressionTranslator(metadata);
+ this.rowExpressionInterpreter = getRowExpressionInterpreter(functionAndTypeManager, queryRunner.getCoordinator().getPluginNodeManager());
+ this.visitor = new TestVisitor();
}
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Consider adding teardown to close `DistributedQueryRunner` and native sidecar resources created in the test
Previously the test explicitly started and stopped a sidecar `Process` in `@BeforeClass`/`@AfterClass`. Now the `DistributedQueryRunner` is created in the constructor via `NativeSidecarPluginQueryRunner.getQueryRunner()` but never closed. Please add an `@AfterClass` (or equivalent) to close the `DistributedQueryRunner` and its sidecar resources to avoid leaking threads/ports between tests.
Suggested implementation:
```java
this.queryRunner = NativeSidecarPluginQueryRunner.getQueryRunner();
```
To fully implement the teardown and avoid leaking the `DistributedQueryRunner` and sidecar resources, you should also:
1. Add a field to the test class to hold the runner:
- Near the top of `TestNativeExpressionInterpreter`, add:
- `private DistributedQueryRunner queryRunner;`
2. Add an `@AfterClass` teardown method that closes the runner:
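- Import the close helper (in the prestodb codebase the Airlift classes normally live under `com.facebook.airlift` rather than `io.airlift`; use whichever package the rest of the file imports from):
- `import static com.facebook.airlift.testing.Closeables.closeAllRuntimeException;`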
- Add a method in the class:
```java
@AfterClass(alwaysRun = true)
public void tearDown()
{
    closeAllRuntimeException(queryRunner);
}
```
3. Ensure `@AfterClass` is from the same test framework used in this file (Presto usually uses TestNG: `org.testng.annotations.AfterClass`). If the existing setup uses `@AfterMethod`/`@AfterSuite` or JUnit, mirror that instead.
This will ensure the `DistributedQueryRunner` and its sidecar are properly closed between test runs.
</issue_to_address>
@sourcery-ai review
Hey there - I've reviewed your changes and found some issues that need to be addressed.
- In NativeExpressionOptimizer.optimize/NativeSidecarExpressionInterpreter.optimizeBatch, sidecar failures (non-empty expressionFailureInfo) are ignored and the optimizedExpression is blindly used; consider checking expressionFailureInfo and either skipping the replacement (fall back to the original RowExpression) or surfacing a PrestoException to avoid introducing null/invalid expressions into planning.
- NativeExpressionsModule re-installs JsonModule and binds custom RowExpression (de)serializers and list codecs even though a RowExpressionSerde is already provided via the context; consider relying solely on the injected RowExpressionSerde and avoiding duplicate Jackson bindings to reduce configuration complexity and the risk of conflicts.
- In TestNativeExpressionInterpreter.optimizeRowExpression/assertEvaluateFails, the first `assert response.getExpressionFailureInfo().getMessage() != null;` uses a raw Java assert and is redundant with the TestNG assertions; for consistency and to avoid dependency on `-ea`, replace or remove it in favor of TestNG’s Assert methods.
## Individual Comments
### Comment 1
<location> `presto-native-sidecar-plugin/src/main/java/com/facebook/presto/sidecar/expressions/NativeExpressionsModule.java:55-56` </location>
<code_context>
+ binder.bind(FunctionMetadataManager.class).toInstance(functionMetadataManager);
+ binder.bind(StandardFunctionResolution.class).toInstance(functionResolution);
+
+ // JSON dependencies and setup
+ binder.install(new JsonModule());
+ jsonBinder(binder).addDeserializerBinding(RowExpression.class).to(RowExpressionDeserializer.class).in(Scopes.SINGLETON);
+ jsonBinder(binder).addSerializerBinding(RowExpression.class).to(RowExpressionSerializer.class).in(Scopes.SINGLETON);
</code_context>
<issue_to_address>
**issue (bug_risk):** JsonModule is installed twice, which can cause duplicate Guice bindings
`NativeExpressionOptimizerFactory` already installs `JsonModule` in the `Bootstrap`, and `NativeExpressionsModule` installs it again here. Installing the same Guice module twice can cause duplicate bindings and startup failures. Since you only use the JsonBinder/JsonCodecBinder helpers in this module, remove `binder.install(new JsonModule());` and rely on the JsonModule from the top-level Bootstrap instead.
</issue_to_address>
### Comment 2
<location> `presto-native-sidecar-plugin/src/main/java/com/facebook/presto/sidecar/expressions/NativeExpressionOptimizer.java:80-82` </location>
<code_context>
+ Map<RowExpression, RowExpression> expressions = expressionsToOptimize.stream()
+ .collect(toMap(
+ Function.identity(),
+ rowExpression -> rowExpression.accept(
+ new ReplacingVisitor(
+ variable -> toRowExpression(variable.getSourceLocation(), variableResolver.apply(variable), variable.getType())),
+ null),
+ (a, b) -> a));
</code_context>
<issue_to_address>
**issue (bug_risk):** Variable resolver null handling is inconsistent between collection and replacement, which may turn unknowns into literal NULLs
In `CollectingVisitor.visitVariableReference`, `value == null` prevents constant folding, but here `ReplacingVisitor` always calls `toRowExpression(..., variableResolver.apply(variable), ...)`. If the resolver returns null for an unresolved variable, `toRowExpression` produces a `ConstantExpression` with a null value, effectively turning an unknown variable into a literal NULL and diverging from the collection semantics. Please either enforce that the resolver never returns null (e.g., with a precondition) or keep the original `VariableReferenceExpression` when it does, to align behavior with `CollectingVisitor`.
</issue_to_address>
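The second option, keeping the original variable when the resolver returns null, can be sketched as follows. `Expression`, `Variable`, `Constant`, and `replace` are hypothetical stand-ins for `RowExpression`, `VariableReferenceExpression`, `ConstantExpression`, and the replacement step in `ReplacingVisitor`; the real types carry more information (type, source location).

```java
import java.util.function.Function;

final class VariableReplacement
{
    // Hypothetical stand-in for RowExpression.
    interface Expression {}

    // Hypothetical stand-in for VariableReferenceExpression.
    static final class Variable
            implements Expression
    {
        final String name;

        Variable(String name)
        {
            this.name = name;
        }
    }

    // Hypothetical stand-in for ConstantExpression.
    static final class Constant
            implements Expression
    {
        final Object value;

        Constant(Object value)
        {
            this.value = value;
        }
    }

    // Replaces a variable with a constant only when the resolver knows its value.
    // A null result is treated as "unresolved" and the variable is kept as-is,
    // so it is never turned into a literal NULL.
    static Expression replace(Variable variable, Function<Variable, Object> resolver)
    {
        Object value = resolver.apply(variable);
        if (value == null) {
            return variable;
        }
        return new Constant(value);
    }
}
```

This treats a null from the resolver as "unknown", which is the reading the comment attributes to `CollectingVisitor`. If a resolver ever needs to express "resolved to SQL NULL", it would need a separate signal, which this sketch does not model.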
### Comment 3
<location> `presto-native-sidecar-plugin/src/main/java/com/facebook/presto/sidecar/expressions/NativeSidecarExpressionInterpreter.java:105-106` </location>
<code_context>
+ getSidecarRequest(session, level, resolvedExpressions),
+ createJsonResponseHandler(rowExpressionOptimizationResultJsonCodec));
+ }
+ catch (Exception e) {
+ throw new PrestoException(INVALID_ARGUMENTS, "Failed to get optimized expressions from sidecar.", e);
+ }
+ return optimizedExpressions;
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Using INVALID_ARGUMENTS for transport/sidecar failures may be misleading
Catching all HTTP exceptions and rethrowing them as INVALID_ARGUMENTS mixes user input errors with connectivity/sidecar failures. This makes infrastructure issues (e.g., sidecar down/misconfigured) look like user mistakes. Please use a more suitable error code for transport/remote failures, or at least distinguish 4xx (bad request) from connection/5xx errors when mapping to PrestoException codes.
Suggested implementation:
```java
    List<RowExpressionOptimizationResult> optimizedExpressions;
    try {
        optimizedExpressions = httpClient.execute(
                getSidecarRequest(session, level, resolvedExpressions),
                createJsonResponseHandler(rowExpressionOptimizationResultJsonCodec));
    }
    catch (UnexpectedResponseException e) {
        int statusCode = e.getStatusCode();
        if (statusCode >= 400 && statusCode < 500) {
            // Sidecar understood the request but rejected it: treat as invalid arguments
            throw new PrestoException(
                    INVALID_ARGUMENTS,
                    String.format("Sidecar rejected optimization request (status code %s)", statusCode),
                    e);
        }
        // Sidecar returned a server error or unexpected status: treat as remote/transport failure
        throw new PrestoException(
                REMOTE_HOST_GONE,
                String.format("Sidecar request failed with status code %s", statusCode),
                e);
    }
    catch (Exception e) {
        // Connection issues, timeouts, etc. are treated as remote failures
        throw new PrestoException(REMOTE_HOST_GONE, "Failed to get optimized expressions from sidecar.", e);
    }
    return optimizedExpressions;
}
```
To compile successfully you will also need to:
1. Add an import for `UnexpectedResponseException` from the Airlift HTTP client. In the prestodb codebase the Airlift classes normally live under `com.facebook.airlift` rather than `io.airlift`, so match the package of the existing `HttpClient` import, for example:
`import com.facebook.airlift.http.client.UnexpectedResponseException;`
2. Add a static import for the remote error code:
`import static com.facebook.presto.spi.StandardErrorCode.REMOTE_HOST_GONE;`
These should be placed alongside the existing import for `INVALID_ARGUMENTS` and the other Airlift HTTP client imports.
</issue_to_address>
### Comment 4
<location> `presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/expressions/TestNativeExpressionInterpreter.java:360-362` </location>
<code_context>
- log.error(e, "Failed to decode RowExpression from sidecar response: %s.", e.getMessage());
- throw new RuntimeException(e);
- }
+ RowExpressionOptimizationResult response = optimize(rowExpression, ExpressionOptimizer.Level.EVALUATED);
+ assertNotNull(response.getExpressionFailureInfo().getMessage());
+ assertTrue(response.getExpressionFailureInfo().getMessage().contains(errorMessage), format("Sidecar response: %s did not contain expected error message: %s.", response, errorMessage));
}
</code_context>
<issue_to_address>
**suggestion (testing):** Strengthen the failure-path assertion to verify that no optimized expression is returned when evaluation fails
This test now validates the failure message via `RowExpressionOptimizationResult`, but it doesn’t check that `optimizedExpression` is null. To preserve the original failure semantics and catch regressions where a failure is returned alongside an optimized expression, please also assert `assertNull(response.getOptimizedExpression(), ...)` here.
Suggested implementation:
```java
RowExpressionOptimizationResult response = optimize(rowExpression, ExpressionOptimizer.Level.EVALUATED);
assertNotNull(response.getExpressionFailureInfo().getMessage());
assertNull(response.getOptimizedExpression(), format("Expected no optimized expression when evaluation fails, but got: %s", response.getOptimizedExpression()));
assertTrue(
response.getExpressionFailureInfo().getMessage().contains(errorMessage),
format("Sidecar response: %s did not contain expected error message: %s.", response, errorMessage));
}
```
If `assertNull` is not already statically imported in this test file, add it alongside the other TestNG assertions, for example:
- `import static org.testng.Assert.assertNull;`
</issue_to_address>
### Comment 5
<location> `presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/expressions/TestNativeExpressionInterpreter.java:397-401` </location>
<code_context>
+ RowExpressionOptimizationResult response = optimize(expression, level);
</code_context>
<issue_to_address>
**suggestion (testing):** Use stronger assertions on the optimization result and ensure the optimized expression is non-null
In `optimizeRowExpression`, consider replacing the bare `assert` on `response.getExpressionFailureInfo().getMessage()` with `assertNotNull(...)` so the check still runs when JVM assertions are disabled. Also, add an `assertNotNull(response.getOptimizedExpression(), ...)` before returning it to guarantee that a successful optimization always produces a non-null expression.
Suggested implementation:
```java
import org.testng.annotations.Test;
import static org.testng.Assert.assertNotNull;
```
```java
expression = expression.accept(visitor, null);
RowExpressionOptimizationResult response = optimize(expression, level);
assertNotNull(
response.getOptimizedExpression(),
"Optimized expression must not be null for a successful optimization");
```
```java
assertNotNull(
response.getExpressionFailureInfo().getMessage(),
"Expected non-null failure message for failed optimization");
```
The exact `SEARCH` pattern for the bare `assert` on `response.getExpressionFailureInfo().getMessage()` may differ slightly in your file (e.g., it may include additional conditions or message text). If the search does not match, adjust that block so that the entire original `assert` line (or block) is replaced with the `assertNotNull(...)` call shown above while preserving any additional checks you still need.
</issue_to_address>
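To illustrate why the comment above prefers `assertNotNull` over a bare `assert`, here is a minimal sketch. The `assertNotNull` helper is a hypothetical stand-in for the TestNG method (so the snippet needs no test framework), and the class and other method names are illustrative only; none of this is code from the PR.

```java
final class AssertionStrength
{
    private AssertionStrength() {}

    // Hypothetical stand-in for org.testng.Assert.assertNotNull: an ordinary
    // method call, so the check runs whether or not the JVM was started with -ea.
    static void assertNotNull(Object value, String message)
    {
        if (value == null) {
            throw new AssertionError(message);
        }
    }

    // Reports whether `assert` statements are active for this class. The
    // assignment inside the assert only executes when assertions are enabled.
    static boolean jvmAssertionsEnabled()
    {
        boolean enabled = false;
        assert enabled = true;
        return enabled;
    }

    // A bare `assert` on the value: it can only fail when assertions are
    // enabled; otherwise the statement is skipped and a null goes unnoticed.
    static boolean bareAssertCatchesNull(Object value)
    {
        try {
            assert value != null : "value is null";
        }
        catch (AssertionError e) {
            return true;
        }
        return false;
    }
}
```

With assertions disabled, `bareAssertCatchesNull(null)` returns `false`, while `assertNotNull(null, ...)` always throws. That difference is the reason to use the framework assertion in the test helper.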
// JSON dependencies and setup
binder.install(new JsonModule());
issue (bug_risk): JsonModule is installed twice, which can cause duplicate Guice bindings
NativeExpressionOptimizerFactory already installs JsonModule in the Bootstrap, and NativeExpressionsModule installs it again here. Installing the same Guice module twice can cause duplicate bindings and startup failures. Since you only use the JsonBinder/JsonCodecBinder helpers in this module, remove binder.install(new JsonModule()); and rely on the JsonModule from the top-level Bootstrap instead.
rowExpression -> rowExpression.accept(
        new ReplacingVisitor(
                variable -> toRowExpression(variable.getSourceLocation(), variableResolver.apply(variable), variable.getType())),
issue (bug_risk): Variable resolver null handling is inconsistent between collection and replacement, which may turn unknowns into literal NULLs
In CollectingVisitor.visitVariableReference, value == null prevents constant folding, but here ReplacingVisitor always calls toRowExpression(..., variableResolver.apply(variable), ...). If the resolver returns null for an unresolved variable, toRowExpression produces a ConstantExpression with a null value, effectively turning an unknown variable into a literal NULL and diverging from the collection semantics. Please either enforce that the resolver never returns null (e.g., with a precondition) or keep the original VariableReferenceExpression when it does, to align behavior with CollectingVisitor.
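The second option from the comment above (keep the original variable reference when the resolver returns null) can be sketched as follows. `Expr`, `Variable`, `Constant`, and `replace` are hypothetical stand-ins for `RowExpression`, `VariableReferenceExpression`, `ConstantExpression`, and the replacing visitor's logic; they are assumptions for illustration, not the PR's types.

```java
import java.util.function.Function;

final class NullSafeReplacement
{
    private NullSafeReplacement() {}

    // Hypothetical stand-in for RowExpression.
    interface Expr {}

    // Hypothetical stand-in for VariableReferenceExpression.
    static final class Variable implements Expr
    {
        final String name;

        Variable(String name)
        {
            this.name = name;
        }
    }

    // Hypothetical stand-in for ConstantExpression.
    static final class Constant implements Expr
    {
        final Object value;

        Constant(Object value)
        {
            this.value = value;
        }
    }

    // Replace a variable with a constant only when the resolver has a value.
    // A null result is treated as "unresolved", so the variable is kept as-is
    // instead of being turned into a literal NULL.
    static Expr replace(Variable variable, Function<Variable, Object> resolver)
    {
        Object value = resolver.apply(variable);
        if (value == null) {
            return variable;
        }
        return new Constant(value);
    }
}
```

One limitation of this sketch is that it cannot express a variable that is legitimately bound to SQL NULL. If that case matters, the resolver contract would need an explicit "unresolved" signal (for example an `Optional` or a sentinel), which is a design decision for the PR author.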
catch (Exception e) {
    throw new PrestoException(INVALID_ARGUMENTS, "Failed to get optimized expressions from sidecar.", e);
suggestion (bug_risk): Using INVALID_ARGUMENTS for transport/sidecar failures may be misleading
Catching all HTTP exceptions and rethrowing them as INVALID_ARGUMENTS mixes user input errors with connectivity/sidecar failures. This makes infrastructure issues (e.g., sidecar down/misconfigured) look like user mistakes. Please use a more suitable error code for transport/remote failures, or at least distinguish 4xx (bad request) from connection/5xx errors when mapping to PrestoException codes.
Suggested implementation:
List<RowExpressionOptimizationResult> optimizedExpressions;
try {
    optimizedExpressions = httpClient.execute(
            getSidecarRequest(session, level, resolvedExpressions),
            createJsonResponseHandler(rowExpressionOptimizationResultJsonCodec));
}
catch (UnexpectedResponseException e) {
    int statusCode = e.getStatusCode();
    if (statusCode >= 400 && statusCode < 500) {
        // Sidecar understood the request but rejected it: treat as invalid arguments
        throw new PrestoException(
                INVALID_ARGUMENTS,
                String.format("Sidecar rejected optimization request (status code %s)", statusCode),
                e);
    }
    // Sidecar returned a server error or unexpected status: treat as remote/transport failure
    throw new PrestoException(
            REMOTE_HOST_GONE,
            String.format("Sidecar request failed with status code %s", statusCode),
            e);
}
catch (Exception e) {
    // Connection issues, timeouts, etc. are treated as remote failures
    throw new PrestoException(REMOTE_HOST_GONE, "Failed to get optimized expressions from sidecar.", e);
}
return optimizedExpressions;
}
To compile successfully you will also need to:
- Add an import for `UnexpectedResponseException` from the Airlift HTTP client, for example: `import io.airlift.http.client.UnexpectedResponseException;`
- Add a static import for the remote error code: `import static com.facebook.presto.spi.StandardErrorCode.REMOTE_HOST_GONE;`
These should be placed alongside the existing import for `INVALID_ARGUMENTS` and the other Airlift HTTP client imports.
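The status-code branching in the suggestion above could also be pulled into a small pure function, so the mapping can be unit tested without a live sidecar. This is only a sketch: `SidecarErrorMapping`, `Category`, and `classify` are hypothetical names, and the two categories stand in for the `INVALID_ARGUMENTS` and `REMOTE_HOST_GONE` error codes.

```java
final class SidecarErrorMapping
{
    private SidecarErrorMapping() {}

    // Hypothetical categories standing in for the Presto error codes:
    // INVALID_REQUEST ~ INVALID_ARGUMENTS, REMOTE_FAILURE ~ REMOTE_HOST_GONE.
    enum Category
    {
        INVALID_REQUEST,
        REMOTE_FAILURE
    }

    // 4xx means the sidecar understood the request and rejected it; any other
    // unexpected status (5xx, stray redirects, and so on) is a remote failure.
    static Category classify(int statusCode)
    {
        if (statusCode >= 400 && statusCode < 500) {
            return Category.INVALID_REQUEST;
        }
        return Category.REMOTE_FAILURE;
    }
}
```

The `catch (UnexpectedResponseException e)` block would then pick the error code based on `classify(e.getStatusCode())`, keeping the exception construction in one place.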
Description
Add an `ExpressionOptimizer` which delegates to the native sidecar process to evaluate expressions with Velox.
Motivation and Context
#26475 added a sidecar endpoint for constant folding of expressions. This PR follows up by adding an expression interpreter that calls that endpoint.
For more context: RFC-0006.
Impact
No impact by default, since the existing in-memory evaluation remains the default optimizer.
Test Plan
Tests have been added.
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.