-
Notifications
You must be signed in to change notification settings - Fork 618
Fix B-tree overflow cell index bug in validation code #4084
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fix B-tree overflow cell index bug in validation code #4084
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a critical bug in B-tree overflow cell handling during balance operations. The bug occurred when validation code used logical cell indices (which include overflow cells) directly as physical indices to access cells via cell_get and cell_get_raw_region, causing "index out of bounds" panics when overflow cells existed with smaller indices.
Key Changes:
- Converts logical cell indices to physical indices by subtracting overflow cells with smaller indices before accessing cell data in validation code
- Fixes overflow cell insertion order when multiple cells share the same index by sorting and tracking
same_index_count - Adds comprehensive regression tests for overflow cell handling with duplicate indices
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
core/storage/btree.rs
Outdated
| } | ||
| // Insert overflow cells into correct place | ||
| // Insert overflow cells into correct place. | ||
| // Multiple overflow cells may have the same index, so we need to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They may not have the same index. In fact the only scenario where there are multiple overflow cells is when the insertion of divider cells to the parent during a balance causes the parent to overflow. Even in that case, the divider cells will be in ascending sorted order by index, which means Vec::insert() will not cause incorrect shifting.
The above looks like an invariant we should enforce with an assertion, actually.
|
This entire PR seems predicated on the idea that there can be multiple overflow cells with the same index, but this is not a scenario that can occur. |
|
@jussisaurio this is actually a minimization of a bug I had while creating a materialized view on a bigger table containing Org-Mode text. It can be reproduced through other means as well, but the test I added is the most minimized I could find. |
The latter of the tests you added passes on Note that materialized views are highly highly WIP so finding a bunch of bugs there is expected. Bugs in b-tree are less expected, but obviously possible. This just means that I would consider it more likely that the issue you've found is caused by the materialized view code. For example, it seems nearly all the code related to inserting to a btree cursor in Opened: #4089 |
ed7d333 to
f59a8aa
Compare
|
I've updated this PR to expose one issue where data is lost in a matview and removed the rather symptomatic fix. |
fa33f08 to
9e977d7
Compare
|
@jussisaurio I hope this is the proper and minimized root cause fix now. |
91b46f5 to
244e37b
Compare
When next() was called and do_seek() returned IOResult::IO, the seek state machine would be left in SeekState::Seek. However, on the next call to next(), it would check current_row first (which was set to None in SeekState::Init) and return false immediately, never resuming the pending seek. This caused rows to be silently "lost" when iterating over a materialized view - the first scan would return fewer rows than expected (e.g., 192 instead of 250), while a second scan would return the correct count. The fix checks if there's a pending seek operation before checking current_row, allowing the IO operation to complete properly.
244e37b to
c56c103
Compare
Convert logical cell_divider_idx to physical index by counting overflow cells with index < cell_divider_idx before accessing cells via cell_get and cell_get_raw_region in the debug validation code path.
The original code used cell_divider_idx (a logical index) directly as a physical index, which would cause 'idx out of bounds' panic when there was an overflow cell with a smaller index.