Description
Issue: Unhandled exceptions on LLM‑initiated invalid tool/node calls terminate agent run
Environment
- Library: Koog 0.5.3
- Example project: trip-planning-example
- LLM model: GoogleModels.Gemini2_5Flash (switched from the default)
- OS/Runtime: JVM (Gradle run), Kotlin Multiplatform project
Summary
When the LLM occasionally references a non‑existent tool/node or returns an incompatible tool result, the execution fails with an unhandled exception in AIAgentNode, which aborts the entire agent run. The current behavior logs the error and invokes the pipeline’s onNodeExecutionFailed hook, but the exception is then rethrown, causing the process to terminate instead of allowing the agent to recover and ask the LLM to select a valid tool/node.
Details
- The failure manifests as a kotlinx.serialization.SerializationException during tool result handling, and/or when the LLM produces a call to a non‑existent tool/node.
- Excerpt from logs (from scratch.txt):
  - Error executing node (name: callToolsHacked): No tag in stack for requested element
  - Repeated kotlinx.serialization.SerializationException: No tag in stack for requested element, originating from ToolResult serialization and GenericAgentEnvironment.processToolCall
  - The run ends with a non‑zero exit (Execution exception reported by server! and Gradle task FAILED)
- Relevant code path (current behavior in AIAgentNode):

      } catch (e: Exception) {
          logger.error(e) { "Error executing node (name: $name): ${e.message}" }
          context.pipeline.onNodeExecutionFailed(this@AIAgentNode, context, input, inputType, e)
          throw e
      }

  The exception is rethrown after reporting, which prevents the agent from recovering from LLM‑provoked mistakes.
Steps to Reproduce
- In trip-planning-example, switch the model to GoogleModels.Gemini2_5Flash.
- Run the example and interact normally (e.g., ask for a plan in Sofia).
- Intermittently, the LLM will issue a malformed tool call or reference a tool/node that does not exist.
- Observe the serialization error and subsequent termination of the run.
Actual Behavior
- The agent logs the failure, rethrows the exception from AIAgentNode, and terminates the run, returning a non‑zero exit status.
Expected/Desired Behavior
- LLM‑provoked execution errors (e.g., non‑existent tool/node names or incompatible tool result shapes) are handled gracefully so the agent can continue.
- Ideally, the error would be surfaced back to the LLM in a structured manner, allowing it to choose a valid tool/node on the next step without aborting the entire run.
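To make the second point concrete, here is a minimal sketch of what "surfaced back to the LLM in a structured manner" could look like. It is plain Kotlin and does not use any Koog API: ToolOutcome, ToolDispatcher and the tool names are hypothetical and exist only for illustration. The idea is that an unknown tool name or a failing tool becomes a Failure value that can be sent to the model as the tool response, instead of an exception that ends the run.

```kotlin
// Hypothetical types for illustration only; none of these are Koog APIs.
sealed interface ToolOutcome {
    data class Success(val content: String) : ToolOutcome
    data class Failure(val message: String) : ToolOutcome
}

class ToolDispatcher(private val tools: Map<String, (String) -> String>) {
    // Never throws for LLM-provoked mistakes: the caller always gets a value
    // that can be serialized and returned to the model as the tool response.
    fun dispatch(toolName: String, args: String): ToolOutcome {
        val tool = tools[toolName]
            ?: return ToolOutcome.Failure(
                "Unknown tool '$toolName'. Available tools: ${tools.keys.sorted().joinToString()}"
            )
        return try {
            ToolOutcome.Success(tool(args))
        } catch (e: Exception) {
            // The tool exists but rejected the arguments or failed internally.
            ToolOutcome.Failure("Tool '$toolName' failed: ${e.message}")
        }
    }
}
```

Listing the valid tool names in the failure message is a deliberate choice in this sketch: it gives the model what it needs to correct itself on the next step.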
A gentle suggestion
Would you consider an option to treat such failures as recoverable at the node level (e.g., via a configurable retry/recovery policy or a pipeline strategy), where onNodeExecutionFailed could return a structured signal back to the LLM instead of rethrowing immediately? This could enable the agent to request a corrected tool/node call from the model and proceed without terminating the session.
If a configuration already exists to achieve this (e.g., by customizing the pipeline or error policy to suppress the rethrow and respond to the LLM with an error context), guidance on how to enable it in Koog 0.5.3 would be greatly appreciated.
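For illustration, a rough sketch of the kind of node-level policy I have in mind. Everything here is hypothetical (NodeResult, RecoveryPolicy, executeNode are made-up names, not Koog's AIAgentNode or pipeline API); it only shows the shape of the behavior: the catch block consults a policy, and rethrows only when the policy declines to recover.

```kotlin
// Hypothetical names throughout; this is not Koog's AIAgentNode or pipeline API.
sealed interface NodeResult<out T> {
    data class Completed<T>(val output: T) : NodeResult<T>
    data class Recoverable(val feedbackForLlm: String) : NodeResult<Nothing>
}

fun interface RecoveryPolicy {
    // Returns feedback text to send to the LLM, or null if the error should be rethrown.
    fun feedbackFor(nodeName: String, error: Exception): String?
}

// Equivalent to the current behavior: every failure propagates.
val rethrowAlways = RecoveryPolicy { _, _ -> null }

// Treats typical "bad call" errors as recoverable. kotlinx.serialization's
// SerializationException extends IllegalArgumentException, so it would match here.
val recoverFromLlmMistakes = RecoveryPolicy { nodeName, error ->
    when (error) {
        is IllegalArgumentException, is NoSuchElementException ->
            "Node '$nodeName' rejected the call: ${error.message}. Please choose a valid tool and arguments."
        else -> null
    }
}

fun <I, O> executeNode(name: String, input: I, policy: RecoveryPolicy, body: (I) -> O): NodeResult<O> =
    try {
        NodeResult.Completed(body(input))
    } catch (e: Exception) {
        // Logging and a hook such as onNodeExecutionFailed would still run here.
        val feedback = policy.feedbackFor(name, e) ?: throw e
        NodeResult.Recoverable(feedback)
    }
```

With rethrowAlways as the default, existing behavior would stay unchanged, and recovery would be opt-in per agent or per node.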
Additional Context
- Example node name in logs: callToolsHacked (from the example’s execution path)
- Exceptions originate around ToolResult serialization and GenericAgentEnvironment.processToolCall when the LLM tool call is malformed.
Thank you in advance for your time and for the excellent work on Koog! If any extra logs or a minimal reproduction are useful, I can provide them.
