Query System Enhancements - Expert Analysis & Proposal #1188

@TimeToBuildBob

Query System Enhancements - Expert Analysis

Background

After comprehensive study of ActivityWatch's architecture, watcher implementation, and query system (detailed in my documentation work), I've identified several high-value enhancement opportunities for the query system.

Study Context:

  • Analyzed core architecture: data flow, storage layer, extension points
  • Deep-dive into watcher implementation patterns: polling, heartbeats, event collection
  • Comprehensive query system review: 25+ functions, execution model, performance characteristics

Proposed Enhancements

1. Enhanced Query Error Messages 🔥 High Impact

Current State: Basic exception classes without contextual information

  • QueryException, QueryFunctionException, QueryParseException, QueryInterpretException
  • Error messages lack context about query execution state
  • No indication of query position where error occurred
  • Missing suggestions for common mistakes

Proposed Improvements:

```python
# Instead of:
QueryFunctionException("Variable 'events' passed to function call is of invalid type")

# Provide:
QueryFunctionException(
    "Variable 'events' in function 'filter_keyvals' is of invalid type.\n"
    "Expected: List[Event]\n"
    "Got: str\n"
    "Query context: line 3, position 42\n"
    "Suggestion: Did you forget to call query_bucket() first?"
)
```
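One way to realize this is an exception class that carries structured context and renders it in its message. The sketch below is illustrative only, not the actual aw_query implementation; the keyword parameters (`function`, `expected`, `got`, `line`, `position`, `suggestion`) are assumed names:

```python
# Sketch of a context-carrying query exception (hypothetical fields,
# not the current aw_query API).
class QueryFunctionException(Exception):
    def __init__(self, message, function=None, expected=None, got=None,
                 line=None, position=None, suggestion=None):
        self.function = function
        self.expected = expected
        self.got = got
        self.line = line
        self.position = position
        self.suggestion = suggestion
        parts = [message]
        if expected is not None:
            parts.append(f"Expected: {expected}")
        if got is not None:
            parts.append(f"Got: {got}")
        if line is not None:
            parts.append(f"Query context: line {line}, position {position}")
        if suggestion is not None:
            parts.append(f"Suggestion: {suggestion}")
        super().__init__("\n".join(parts))


err = QueryFunctionException(
    "Variable 'events' in function 'filter_keyvals' is of invalid type.",
    function="filter_keyvals", expected="List[Event]", got="str",
    line=3, position=42,
    suggestion="Did you forget to call query_bucket() first?",
)
```

Keeping the fields structured (rather than only formatting a string) would also let aw-server return them as JSON so frontends can highlight the offending query position.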

Benefits:

  • Faster debugging for users
  • Lower support burden
  • Better learning experience
  • Reduced trial-and-error

Implementation Effort: Medium (2-3 days)

2. Query Validation Tool 🎯 Developer Experience

Motivation: Pre-validate queries before execution to catch errors early

Capabilities:

  • Syntax validation
  • Bucket existence checking
  • Function signature validation
  • Type checking for parameters
  • Suggested corrections for typos

Example Usage:

```python
from aw_query.validator import validate_query

result = validate_query(
    query="events = query_bucket('nonexistent'); filter_keyval(events, 'app', ['Firefox'])",
    datastore=datastore
)

if result.errors:
    for error in result.errors:
        print(f"Line {error.line}: {error.message}")
        if error.suggestion:
            print(f"  Suggestion: {error.suggestion}")

Benefits:

  • Catch errors before execution
  • Better IDE integration potential
  • Improved testing experience
  • Educational tool for learning queries

Implementation Effort: Large (1-2 weeks)

3. Query Examples Documentation 📚 Essential

Current Gap: No comprehensive query examples in the repository

Proposed Content:

  • Common patterns (5+ examples with explanations)
  • Function reference with usage examples
  • Performance optimization guide
  • Troubleshooting common issues
  • Interactive tutorial/cookbook

Format: Sphinx/RST documentation integrated with docs.activitywatch.net
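As a taste of what a cookbook entry might look like, here is one common pattern ("time per app/title"), written as the query string a client would submit. The functions used (`query_bucket`, `find_bucket`, `filter_keyvals`, `merge_events_by_keys`, `sort_by_duration`) are existing aw_query functions; the exact bucket-id prefix depends on the watcher and hostname:

```python
# A typical "time per app/title" query, as a client would submit it.
query = """
events = query_bucket(find_bucket("aw-watcher-window_"));
events = filter_keyvals(events, "app", ["Firefox"]);
merged = merge_events_by_keys(events, ["app", "title"]);
RETURN = sort_by_duration(merged);
"""

statements = [line for line in query.strip().splitlines() if line]
```

Each cookbook entry would pair a query like this with a short explanation of what each statement does and when to prefer it over an alternative.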

Benefits:

  • Lower barrier to entry
  • Reduce "how do I..." support questions
  • Enable advanced usage
  • Showcase query system capabilities

Implementation Effort: Medium (1 week)
Note: I have comprehensive documentation already drafted that could be adapted

4. Query Performance Profiler 🚀 Advanced

Motivation: Help users identify slow query operations

Capabilities:

  • Per-operation timing
  • Data volume metrics
  • Optimization suggestions
  • Comparison between approaches

Example:

```python
from aw_query.profiler import profile_query

with profile_query() as prof:
    result = query("complex query here", datastore)

prof.report()
# Output:
# Operation              Time    % Total  Events Processed
# query_bucket          120ms        40%  10,000
# filter_keyvals         80ms        27%   8,500
# merge_events_by_keys  100ms        33%   8,500
# Total:                300ms       100%
#
# Suggestions:
# - Consider filtering before merge to reduce events processed
# - bucket 'aw-watcher-window' has 10K events, consider date filtering
```
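A minimal version of such a profiler could be a context manager around `time.perf_counter`. This is a sketch only; aw_query has no profiler module today, and the `op()` hook is an invented name for wherever the interpreter would instrument each function call:

```python
import time
from contextlib import contextmanager

class QueryProfiler:
    def __init__(self):
        self.timings = []  # (operation_name, seconds, events_processed)

    @contextmanager
    def op(self, name, n_events=0):
        # Time one query operation and record it.
        start = time.perf_counter()
        yield
        self.timings.append((name, time.perf_counter() - start, n_events))

    def report(self):
        total = sum(t for _, t, _ in self.timings) or 1e-9
        lines = [f"{name:<22}{t * 1000:>8.0f}ms {t / total:>7.0%} {n:>8,}"
                 for name, t, n in self.timings]
        lines.append(f"{'Total:':<22}{total * 1000:>8.0f}ms")
        return "\n".join(lines)

prof = QueryProfiler()
with prof.op("query_bucket", n_events=10_000):
    time.sleep(0.01)  # stand-in for real work
with prof.op("filter_keyvals", n_events=8_500):
    time.sleep(0.005)
report = prof.report()
```

The interpreter already dispatches each function call in one place, so wrapping that dispatch site with `prof.op(...)` would cover all 25+ functions without per-function changes.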

Benefits:

  • Enable query optimization
  • Identify performance bottlenecks
  • Educational about query cost
  • Support for large datasets

Implementation Effort: Large (2 weeks)

5. Flooding Algorithm Improvements 🔧 Technical

Context: Issue #1177 identified accuracy problems with non-default polling times

Current Behavior:

  • Flooding works correctly with default 1s polling
  • Loses 10-20% of time with 5s polling
  • Heuristic prefers larger events when filling gaps

Proposed Investigation:

  • Analyze flooding algorithm assumptions
  • Test even-split vs size-biased approaches
  • Document trade-offs and limitations
  • Consider configurable strategies
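To make the even-split vs size-biased distinction concrete, here is a toy model of the two gap-filling strategies on a pair of adjacent events. This is deliberately simplified (events as `(start, duration)` tuples in seconds; real flooding in aw-core operates on full Event objects), and the function names are mine:

```python
# Toy model: each event is (start, duration) in seconds, with a gap
# between event A's end and event B's start.

def flood_even_split(a, b):
    """Split the gap 50/50 between the two neighbouring events."""
    gap = b[0] - (a[0] + a[1])
    half = gap / 2
    return (a[0], a[1] + half), (b[0] - half, b[1] + half)

def flood_size_biased(a, b):
    """Give the whole gap to the larger event (roughly the current heuristic)."""
    gap = b[0] - (a[0] + a[1])
    if a[1] >= b[1]:
        return (a[0], a[1] + gap), b
    return a, (b[0] - gap, b[1] + gap)

a, b = (0.0, 4.0), (9.0, 1.0)  # 5s gap, as with 5s polling
even = flood_even_split(a, b)
biased = flood_size_biased(a, b)
```

With 1s polling the gaps are small and the choice barely matters; with 5s polling the size-biased strategy can systematically attribute whole gaps to the larger event, which is one plausible source of the 10-20% discrepancy worth testing.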

Benefits:

  • Accurate tracking with custom polling times
  • Better understanding of trade-offs
  • Documented behavior and limitations
  • Potential for user-configurable strategies

Implementation Effort: Large (2-3 weeks, requires research)

Priority Recommendation

Based on impact/effort ratio:

High Priority (Quick wins):

  1. Enhanced error messages (Medium effort, High impact)
  2. Query examples documentation (Medium effort, High impact)

Medium Priority (Valuable, more effort):

  3. Query validation tool (Large effort, High value for power users)

Lower Priority (Advanced features):

  4. Query performance profiler (Large effort, niche use case)
  5. Flooding algorithm improvements (Research project, affects specific edge case)

Implementation Approach

I'm willing to contribute to any of these enhancements. For the documentation (item #3), I have comprehensive draft content that could be adapted to the project's format.

Suggested phased approach:

  1. Start with enhanced error messages (quick win)
  2. Add query examples documentation (high user value)
  3. Evaluate demand for validation tool and profiler
  4. Research flooding algorithm as separate investigation

Technical Background

These proposals are based on:

  • Complete architecture understanding (core components, data flow, storage patterns)
  • Watcher implementation expertise (polling, heartbeats, error handling, configuration)
  • Deep query system knowledge (25+ functions, execution model, performance characteristics)
  • Analysis of existing test coverage and error handling
  • Review of common user issues and questions

Questions for Maintainers

  1. Which enhancements align best with current project priorities?
  2. Is there interest in the query documentation? I have comprehensive draft content.
  3. For error messages: Any preference on error message format/style?
  4. For validation tool: Would this fit better in aw-client or aw-core?
  5. For flooding improvements: Is this worth investigating given that it's an edge case?

Let me know if you'd like me to tackle any of these! Happy to discuss technical approach or break down into smaller issues.
