Skip to content

Conversation

@nightscape
Copy link
Contributor

Convert logical cell_divider_idx to physical index by counting overflow cells with index < cell_divider_idx before accessing cells via cell_get and cell_get_raw_region in the debug validation code path.

The original code used cell_divider_idx (a logical index) directly as a physical index, which would cause 'idx out of bounds' panic when there was an overflow cell with a smaller index.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a critical bug in B-tree overflow cell handling during balance operations. The bug occurred when validation code used logical cell indices (which include overflow cells) directly as physical indices to access cells via cell_get and cell_get_raw_region, causing "index out of bounds" panics when overflow cells existed with smaller indices.

Key Changes:

  • Converts logical cell indices to physical indices by subtracting overflow cells with smaller indices before accessing cell data in validation code
  • Fixes overflow cell insertion order when multiple cells share the same index by sorting and tracking same_index_count
  • Adds comprehensive regression tests for overflow cell handling with duplicate indices

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

}
// Insert overflow cells into correct place
// Insert overflow cells into correct place.
// Multiple overflow cells may have the same index, so we need to
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They may not have the same index. In fact the only scenario where there are multiple overflow cells is when the insertion of divider cells to the parent during a balance causes the parent to overflow. Even in that case, the divider cells will be in ascending sorted order by index, which means Vec::insert() will not cause incorrect shifting.

The above looks like an invariant we should enforce with an assertion, actually.

@jussisaurio
Copy link
Collaborator

This entire PR seems predicated on the idea that there can be multiple overflow cells with the same index, but this is not a scenario that can occur.

@nightscape
Copy link
Contributor Author

@jussisaurio this is actually a minimization of a bug I had while creating a materialized view on a bigger table containing Org-Mode text. It can be reproduced through other means as well, but the test I added is the most minimized I could find.
If you want, I can add other reproductions on higher level.

@jussisaurio
Copy link
Collaborator

jussisaurio commented Dec 3, 2025

@jussisaurio this is actually a minimization of a bug I had while creating a materialized view on a bigger table containing Org-Mode text. It can be reproduced through other means as well, but the test I added is the most minimized I could find. If you want, I can add other reproductions on higher level.

The latter of the tests you added passes on main too, and the other one artificially inserts overflow cells. So indeed it would be great to see a test that reproduces a bug in a real setting.

Note that materialized views are highly highly WIP so finding a bunch of bugs there is expected. Bugs in b-tree are less expected, but obviously possible. This just means that I would consider it more likely that the issue you've found is caused by the materialized view code.

For example, it seems nearly all the code related to inserting to a btree cursor in core/incremental/compiler.rs and core/incremental/persistence.rs is not properly handling IO when it calls insert(). Not sure this is the cause of the bug you're seeing, but it would explain the possibility of multiple overflow cells, because if any yielding due to IO occurs during those inserts, the matview code doesn't re-enter the insert() function properly, and hence not correctly rebalance the tree, leaving overflow cells dangling.

Opened: #4089

@nightscape
Copy link
Contributor Author

I've updated this PR to expose one issue where data is lost in a matview and removed the rather symptomatic fix.
I'm working on a better one.

jussisaurio added a commit that referenced this pull request Dec 4, 2025
…e adjacent sequential' from Jussi Saurio

References PR #4084

Closes #4086
@nightscape nightscape force-pushed the fix-btree-overflow-logic branch 2 times, most recently from fa33f08 to 9e977d7 Compare December 4, 2025 22:17
@nightscape
Copy link
Contributor Author

nightscape commented Dec 4, 2025

@jussisaurio I hope this is the proper and minimized root cause fix now.
I kept the tests that expose the issue and the fix in two commits so you can verify that it fails before and succeeds afterwards.
It's probably best to squash-merge so the commit history is test-green.

@nightscape nightscape force-pushed the fix-btree-overflow-logic branch 4 times, most recently from 91b46f5 to 244e37b Compare December 6, 2025 10:45
When next() was called and do_seek() returned IOResult::IO, the seek
state machine would be left in SeekState::Seek. However, on the next
call to next(), it would check current_row first (which was set to None
in SeekState::Init) and return false immediately, never resuming the
pending seek.

This caused rows to be silently "lost" when iterating over a
materialized view - the first scan would return fewer rows than
expected (e.g., 192 instead of 250), while a second scan would
return the correct count.

The fix checks if there's a pending seek operation before checking
current_row, allowing the IO operation to complete properly.
@nightscape nightscape force-pushed the fix-btree-overflow-logic branch from 244e37b to c56c103 Compare December 8, 2025 10:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants