Skip to content

Conversation

@gouhongshen
Copy link
Contributor

@gouhongshen gouhongshen commented Dec 8, 2025

User description

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue ##13959

What this PR does / why we need it:

fix block filter EQ, IN, ...


PR Type

Bug fix


Description

  • Fix block filter search functions to correctly handle duplicate values

  • Add early return checks for empty input vectors or search values

  • Refactor binary search logic to find all matching indices for duplicates

  • Improve handling of variable-length and fixed-size data type searches


Diagram Walkthrough

flowchart LR
  A["Search Functions<br/>OrderedBinarySearch<br/>VarlenBinarySearch<br/>FixedSizedBinarySearch"] -->|"Add empty input checks"| B["Early Return<br/>for empty data"]
  A -->|"Refactor duplicate handling"| C["Find all matching<br/>indices per value"]
  C -->|"Small dataset path"| D["Binary search with<br/>run detection"]
  C -->|"Large dataset path"| E["Two-pointer merge<br/>with run handling"]
  D --> F["Return all indices<br/>for duplicates"]
  E --> F
Loading

File Walkthrough

Relevant files
Bug fix
search.go
Refactor search functions to handle duplicate values         

pkg/container/vector/search.go

  • Add empty input validation to OrderedBinarySearchOffsetByValFactory,
    VarlenBinarySearchOffsetByValFactory, and
    FixedSizedBinarySearchOffsetByValFactory
  • Refactor small dataset binary search path to detect and handle
    consecutive duplicate values by finding run boundaries
  • Refactor large dataset two-pointer merge path to collect all indices
    for matching duplicate values instead of just first occurrence
  • Optimize variable-length search by pre-extracting varlena data and
    area references
+120/-56
Tests
search_test.go
Add comprehensive tests for duplicate value handling         

pkg/container/vector/search_test.go

  • Add comprehensive test for variable-length binary search with small
    dataset and duplicate values
  • Add benchmark test for variable-length search with various hit rates
    and dataset sizes
  • Add test for ordered binary search with duplicate values across
    multiple data ranges
  • Add test for fixed-size binary search with duplicate values and custom
    comparator
  • Add test for variable-length search with duplicate numeric strings
+292/-0 

@qodo-code-review
Copy link

qodo-code-review bot commented Dec 8, 2025

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🟡
🎫 #13959
🟢 Optimize the reader to support applying filters on unsorted blocks.
Improve performance behavior for equality and IN filters over unsorted data.
Ensure filtering works for cases with no cluster key but with a primary key.
Ensure filtering works for cases with a cluster key that is a fake primary key.
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
No auditing: The added search functions and tests perform in-memory computations without any logging of
critical actions, though it is unclear if these operations require audit trails within
this context.

Referred Code
if len(rows) == 0 || len(vals) == 0 {
	return sels
}
subVals := vals
if len(vals) >= kMinLenForSubVector {
	minVal := rows[0]
	maxVal := rows[len(rows)-1]
	lowerBound := sort.Search(len(vals), func(i int) bool {
		return minVal <= vals[i]
	})
	upperBound := sort.Search(len(vals), func(i int) bool {
		return maxVal < vals[i]
	})
	subVals = vals[lowerBound:upperBound]
	if len(subVals) == 0 {
		return sels
	}
}

if len(subVals) <= kMaxLenForBinarySearch {
	start := 0


 ... (clipped 42 lines)

Learn more about managing compliance generic rules or creating your own custom rules

  • Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link

qodo-code-review bot commented Dec 8, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
High-level
Refactor duplicated search logic

The search functions OrderedBinarySearchOffsetByValFactory,
VarlenBinarySearchOffsetByValFactory, and
FixedSizedBinarySearchOffsetByValFactory share almost identical logic. This
duplicated code should be refactored into a single generic helper function to
reduce redundancy and improve maintainability.

Examples:

pkg/container/vector/search.go [378-427]
		if len(subVals) <= kMaxLenForBinarySearch {
			start := 0
			n1 := len(rows)
			for i := 0; i < len(subVals); i++ {
				if i > 0 && subVals[i] == subVals[i-1] {
					continue
				}
				idx := sort.Search(n1-start, func(idx int) bool {
					return rows[start+idx] >= subVals[i]
				})

 ... (clipped 40 lines)
pkg/container/vector/search.go [536-586]
		if len(subVals) <= kMaxLenForBinarySearch {
			start := 0
			n1 := len(rows)
			for i := 0; i < len(subVals); i++ {
				if i > 0 && cmp(subVals[i], subVals[i-1]) == 0 {
					continue
				}
				idx := sort.Search(n1-start, func(idx int) bool {
					return cmp(rows[start+idx], subVals[i]) >= 0
				})

 ... (clipped 41 lines)

Solution Walkthrough:

Before:

func OrderedBinarySearchOffsetByValFactory[T types.OrderedT](vals []T) func(*Vector) []int64 {
  // ...
  if len(subVals) <= kMaxLenForBinarySearch {
    // ... specific binary search logic for OrderedT with duplicate handling
  } else {
    // ... specific merge-join logic for OrderedT with duplicate handling
  }
  return sels
}

func VarlenBinarySearchOffsetByValFactory(vals [][]byte) func(*Vector) []int64 {
  // ...
  if len(subVals) <= kMaxLenForBinarySearch {
    // ... nearly identical binary search logic for Varlen
  } else {
    // ... nearly identical merge-join logic for Varlen
  }
  return sels
}
// ... and similarly for FixedSizedBinarySearchOffsetByValFactory

After:

func genericSearch[T, U any](
    rows []T, vals []U,
    less func(T, U) bool,
    equal func(T, U) bool,
) []int64 {
  // ...
  if len(vals) <= kMaxLenForBinarySearch {
    // ... generic binary search logic with duplicate handling
  } else {
    // ... generic merge-join logic with duplicate handling
  }
  return sels
}

func OrderedBinarySearchOffsetByValFactory[T types.OrderedT](vals []T) func(*Vector) []int64 {
    // ... setup and call genericSearch
}

func VarlenBinarySearchOffsetByValFactory(vals [][]byte) func(*Vector) []int64 {
    // ... setup and call genericSearch
}
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies significant logic duplication across three functions, which poses a long-term maintenance risk, making it an important and valid design improvement.

Medium
  • Update

@mergify mergify bot added the kind/bug Something isn't working label Dec 8, 2025
@mergify mergify bot added the queued label Dec 9, 2025
@mergify
Copy link
Contributor

mergify bot commented Dec 9, 2025

Merge Queue Status

✅ The pull request has been merged at a9fb29c

This pull request spent 56 minutes 36 seconds in the queue, including 56 minutes 27 seconds running CI.
The checks were run in-place.

Required conditions to merge
  • #approved-reviews-by >= 1 [🛡 GitHub branch protection]
  • #changes-requested-reviews-by = 0 [🛡 GitHub branch protection]
  • #review-threads-unresolved = 0 [🛡 GitHub branch protection]
  • branch-protection-review-decision = APPROVED [🛡 GitHub branch protection]
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-neutral = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-skipped = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / SCA Test on Ubuntu/x86
    • check-neutral = Matrixone CI / SCA Test on Ubuntu/x86
    • check-skipped = Matrixone CI / SCA Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / UT Test on Ubuntu/x86
    • check-neutral = Matrixone CI / UT Test on Ubuntu/x86
    • check-skipped = Matrixone CI / UT Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-neutral = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-skipped = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Utils CI / Coverage
    • check-neutral = Matrixone Utils CI / Coverage
    • check-skipped = Matrixone Utils CI / Coverage

@mergify mergify bot merged commit 7a67ffd into matrixorigin:main Dec 9, 2025
19 checks passed
@mergify mergify bot removed the queued label Dec 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/bug Something isn't working Review effort 3/5 size/M Denotes a PR that changes [100,499] lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants