Query System Enhancements - Expert Analysis
Background
After a comprehensive study of ActivityWatch's architecture, watcher implementation, and query system (detailed in my documentation work), I've identified several high-value enhancement opportunities for the query system.
Study Context:
- Analyzed core architecture: data flow, storage layer, extension points
- Deep-dive into watcher implementation patterns: polling, heartbeats, event collection
- Comprehensive query system review: 25+ functions, execution model, performance characteristics
Proposed Enhancements
1. Enhanced Query Error Messages 🔥 High Impact
Current State: Basic exception classes without contextual information
- Exception classes: QueryException, QueryFunctionException, QueryParseException, QueryInterpretException
- Error messages lack context about query execution state
- No indication of the position in the query where the error occurred
- Missing suggestions for common mistakes
Proposed Improvements:
# Instead of:
QueryFunctionException("Variable 'events' passed to function call is of invalid type")
# Provide:
QueryFunctionException(
    "Variable 'events' in function 'filter_keyvals' is of invalid type.\n"
    "Expected: List[Event]\n"
    "Got: str\n"
    "Query context: line 3, position 42\n"
    "Suggestion: Did you forget to call query_bucket() first?"
)
Benefits:
- Faster debugging for users
- Lower support burden
- Better learning experience
- Reduced trial-and-error
Implementation Effort: Medium (2-3 days)
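To make the proposed message format concrete, here is a minimal sketch of an exception that carries structured context and renders it into the multi-line message shown above. The names QueryErrorContext, ContextualQueryFunctionException and check_event_list are hypothetical; in aw-core such a class would more likely subclass the existing QueryFunctionException. The line/position fields assume the parser can supply source offsets, so they are only printed when known.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryErrorContext:
    """Hypothetical container for the context an enriched error could carry."""
    function: str
    variable: str
    expected: str
    got: str
    line: Optional[int] = None
    position: Optional[int] = None
    suggestion: Optional[str] = None


class ContextualQueryFunctionException(Exception):
    """Hypothetical exception that renders its context as a multi-line message."""

    def __init__(self, ctx: QueryErrorContext):
        self.ctx = ctx
        super().__init__(self._format(ctx))

    @staticmethod
    def _format(ctx: QueryErrorContext) -> str:
        lines = [
            f"Variable '{ctx.variable}' in function '{ctx.function}' is of invalid type.",
            f"Expected: {ctx.expected}",
            f"Got: {ctx.got}",
        ]
        # Location and suggestion lines are optional: include them only when known
        if ctx.line is not None and ctx.position is not None:
            lines.append(f"Query context: line {ctx.line}, position {ctx.position}")
        if ctx.suggestion:
            lines.append(f"Suggestion: {ctx.suggestion}")
        return "\n".join(lines)


def check_event_list(fn_name: str, var_name: str, value: object) -> None:
    """Raise a contextual error unless value is a list (stand-in for List[Event])."""
    if not isinstance(value, list):
        hint = None
        if isinstance(value, str):
            # A bare string usually means a bucket ID was passed instead of its events
            hint = "Did you forget to call query_bucket() first?"
        raise ContextualQueryFunctionException(
            QueryErrorContext(
                function=fn_name,
                variable=var_name,
                expected="List[Event]",
                got=type(value).__name__,
                suggestion=hint,
            )
        )
```

Calling check_event_list("filter_keyvals", "events", "aw-watcher-window_myhost") raises an error whose message contains the Expected/Got/Suggestion lines; the Query context line is omitted because no position was supplied.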
2. Query Validation Tool 🎯 Developer Experience
Motivation: Pre-validate queries before execution to catch errors early
Capabilities:
- Syntax validation
- Bucket existence checking
- Function signature validation
- Type checking for parameters
- Suggested corrections for typos
Example Usage:
from aw_query.validator import validate_query

# The missing bucket and the misspelled 'filter_keyval' are deliberate errors
result = validate_query(
    query="events = query_bucket('nonexistent'); filter_keyval(events, 'app', ['Firefox'])",
    datastore=datastore
)
if result.errors:
    for error in result.errors:
        print(f"Line {error.line}: {error.message}")
        if error.suggestion:
            print(f"  Suggestion: {error.suggestion}")
Benefits:
- Catch errors before execution
- Better IDE integration potential
- Improved testing experience
- Educational tool for learning queries
Implementation Effort: Large (1-2 weeks)
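As a rough illustration of the bucket-existence and typo-correction capabilities, the sketch below performs purely static, regex-based checks and uses the standard library's difflib.get_close_matches for "did you mean" suggestions. It is not a real parser: KNOWN_FUNCTIONS is a hand-picked, hypothetical subset of query function names (a real validator would read the interpreter's function registry), and it takes a plain collection of bucket IDs instead of a datastore so it stays self-contained. Type and signature checking are left out.

```python
import difflib
import re
from dataclasses import dataclass, field
from typing import Iterable, List, Optional

# Hypothetical subset of known query functions, hard-coded for the sketch
KNOWN_FUNCTIONS = {
    "query_bucket", "find_bucket", "filter_keyvals", "merge_events_by_keys",
    "sort_by_duration", "flood", "filter_period_intersect",
}

CALL_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\s*\(")
BUCKET_RE = re.compile(r"query_bucket\(\s*['\"]([^'\"]+)['\"]\s*\)")


@dataclass
class ValidationError:
    line: int
    message: str
    suggestion: Optional[str] = None


@dataclass
class ValidationResult:
    errors: List[ValidationError] = field(default_factory=list)


def _suggest(word: str, candidates: Iterable[str]) -> Optional[str]:
    """Return a 'Did you mean' hint for the closest candidate, if any is close."""
    close = difflib.get_close_matches(word, list(candidates), n=1)
    return f"Did you mean '{close[0]}'?" if close else None


def validate_query(query: str, bucket_ids: Iterable[str]) -> ValidationResult:
    """Static checks only: unknown function names and missing buckets."""
    buckets = set(bucket_ids)
    result = ValidationResult()
    for lineno, text in enumerate(query.splitlines(), start=1):
        for name in CALL_RE.findall(text):
            if name not in KNOWN_FUNCTIONS:
                result.errors.append(ValidationError(
                    lineno, f"Unknown function '{name}'", _suggest(name, KNOWN_FUNCTIONS)))
        for bucket in BUCKET_RE.findall(text):
            if bucket not in buckets:
                result.errors.append(ValidationError(
                    lineno, f"Bucket '{bucket}' does not exist", _suggest(bucket, buckets)))
    return result
```

Run on the example query above with only "aw-watcher-window_myhost" available, this reports both errors on line 1 and suggests 'filter_keyvals' for the misspelled function; the unknown bucket gets no suggestion because no existing ID is similar enough.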
3. Query Examples Documentation 📚 Essential
Current Gap: No comprehensive query examples in the repository
Proposed Content:
- Common patterns (5+ examples with explanations)
- Function reference with usage examples
- Performance optimization guide
- Troubleshooting common issues
- Interactive tutorial/cookbook
Format: Sphinx/RST documentation integrated with docs.activitywatch.net
Benefits:
- Lower barrier to entry
- Reduce "how do I..." support questions
- Enable advanced usage
- Showcase query system capabilities
Implementation Effort: Medium (1 week)
Note: I have comprehensive documentation already drafted that could be adapted
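A typical "common pattern" entry might cover active time per application. The sketch below assembles such a query as a string; it assumes the default bucket IDs created by aw-watcher-window and aw-watcher-afk ("aw-watcher-window_<hostname>", "aw-watcher-afk_<hostname>") and uses query functions that exist in aw_query (query_bucket, flood, filter_keyvals, filter_period_intersect, merge_events_by_keys, sort_by_duration). The helper name app_usage_query is hypothetical.

```python
def app_usage_query(hostname: str) -> str:
    """Build an 'active time per app' query for one host (hypothetical helper)."""
    statements = [
        # Load window and AFK events; flood fills small gaps between events
        f'window = flood(query_bucket("aw-watcher-window_{hostname}"));',
        f'afk = flood(query_bucket("aw-watcher-afk_{hostname}"));',
        # Keep only the periods where the user was at the computer
        'not_afk = filter_keyvals(afk, "status", ["not-afk"]);',
        'window = filter_period_intersect(window, not_afk);',
        # Collapse all events of the same app into one event with summed duration
        'window = merge_events_by_keys(window, ["app"]);',
        'RETURN = sort_by_duration(window);',
    ]
    return "\n".join(statements)
```

The resulting string could then be sent to the server, for example through aw-client's ActivityWatchClient.query() together with a list of time periods; a documentation entry would explain each line the way the comments do here.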
4. Query Performance Profiler 🚀 Advanced
Motivation: Help users identify slow query operations
Capabilities:
- Per-operation timing
- Data volume metrics
- Optimization suggestions
- Comparison between approaches
Example:
from aw_query.profiler import profile_query
with profile_query() as prof:
    result = query("complex query here", datastore)
prof.report()
# Output:
# Operation              Time    % Total   Events Processed
# query_bucket           120ms   40%       10,000
# filter_keyvals          80ms   27%        8,500
# merge_events_by_keys   100ms   33%        8,500
# Total:                 300ms   100%
#
# Suggestions:
# - Consider filtering before merge to reduce events processed
# - bucket 'aw-watcher-window' has 10K events, consider date filtering
Benefits:
- Enable query optimization
- Identify performance bottlenecks
- Educational about query cost
- Support for large datasets
Implementation Effort: Large (2 weeks)
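One way to collect the per-operation timings is to wrap each query function before the interpreter calls it. The sketch below is a minimal, hypothetical QueryProfiler that does this with time.perf_counter; it assumes the interpreter's function registry could be swapped for wrapped versions (a profile_query() context manager could install and remove these wrappers). Unlike the example table, it records the number of events each operation returns rather than the number it processed.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class OpStat:
    name: str
    seconds: float
    events: int  # number of events returned by the operation


class QueryProfiler:
    """Hypothetical profiler: wraps query functions and records time and output size."""

    def __init__(self) -> None:
        self.stats: List[OpStat] = []

    def wrap(self, name: str, fn: Callable[..., Any]) -> Callable[..., Any]:
        def timed(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            elapsed = time.perf_counter() - start
            # Count events only when the function returns a list of them
            n = len(result) if isinstance(result, list) else 0
            self.stats.append(OpStat(name, elapsed, n))
            return result
        return timed

    def report(self) -> str:
        # Guard against division by zero when every operation was near-instant
        total = sum(s.seconds for s in self.stats) or 1e-12
        rows = [f"{'Operation':<22}{'Time':>10}{'% Total':>9}{'Events out':>12}"]
        for s in self.stats:
            rows.append(
                f"{s.name:<22}{s.seconds * 1000:>8.1f}ms"
                f"{100 * s.seconds / total:>8.0f}%{s.events:>12}"
            )
        return "\n".join(rows)
```

For example, wrapping stand-in query_bucket and filter_keyvals functions with prof.wrap(...) and calling them records one row per call in call order, and prof.report() prints the header plus those rows.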
5. Flooding Algorithm Improvements 🔧 Technical
Context: Issue #1177 identified accuracy problems with non-default polling times
Current Behavior:
- Flooding works correctly with default 1s polling
- Loses 10-20% of tracked time with 5s polling
- Heuristic prefers larger events when filling gaps
Proposed Investigation:
- Analyze flooding algorithm assumptions
- Test even-split vs size-biased approaches
- Document trade-offs and limitations
- Consider configurable strategies
Benefits:
- Accurate tracking with custom polling times
- Better understanding of trade-offs
- Documented behavior and limitations
- Potential for user-configurable strategies
Implementation Effort: Large (2-3 weeks, requires research)
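To frame the even-split vs size-biased comparison, here is a toy model of gap filling on simple (start, end, app) events. It is not aw-core's flood implementation; the "larger" strategy only models the heuristic described above (the longer neighbour absorbs the gap), and the names Ev, fill_gaps and time_per_app are hypothetical. Gaps longer than pulsetime are left unfilled.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass
class Ev:
    start: float  # seconds
    end: float
    app: str

    @property
    def duration(self) -> float:
        return self.end - self.start


def fill_gaps(events: List[Ev], pulsetime: float, strategy: str) -> List[Ev]:
    """Fill gaps of at most pulsetime seconds between consecutive events.

    "even":   split each gap in half between the two neighbours.
    "larger": give the whole gap to the longer neighbour.
    """
    # Work on copies so the caller's events are not modified
    evs = [replace(e) for e in sorted(events, key=lambda e: e.start)]
    for a, b in zip(evs, evs[1:]):
        gap = b.start - a.end
        if 0 < gap <= pulsetime:
            if strategy == "even":
                a.end += gap / 2
                b.start -= gap / 2
            elif strategy == "larger":
                if a.duration >= b.duration:
                    a.end = b.start
                else:
                    b.start = a.end
            else:
                raise ValueError(f"unknown strategy: {strategy}")
    return evs


def time_per_app(events: List[Ev]) -> Dict[str, float]:
    totals: Dict[str, float] = {}
    for e in events:
        totals[e.app] = totals.get(e.app, 0.0) + e.duration
    return totals
```

With a 10 s editor event, a 4 s gap, and a 2 s browser event, both strategies fill the same total time; they only differ in attribution (even: editor 12 s / browser 4 s; larger: editor 14 s / browser 2 s). Measuring such attribution differences against ground truth is the kind of comparison the investigation could document.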
Priority Recommendation
Based on impact/effort ratio:
High Priority (Quick wins):
- Enhanced error messages (Medium effort, High impact)
- Query examples documentation (Medium effort, High impact)
Medium Priority (Valuable, more effort):
- Query validation tool (Large effort, High value for power users)
Lower Priority (Advanced features):
- Query performance profiler (Large effort, niche use case)
- Flooding algorithm improvements (Research project, affects a specific edge case)
Implementation Approach
I'm willing to contribute to any of these enhancements. For the documentation (item #3), I have comprehensive draft content that could be adapted to the project's format.
Suggested phased approach:
- Start with enhanced error messages (quick win)
- Add query examples documentation (high user value)
- Evaluate demand for validation tool and profiler
- Research flooding algorithm as separate investigation
Technical Background
These proposals are based on:
- Complete architecture understanding (core components, data flow, storage patterns)
- Watcher implementation expertise (polling, heartbeats, error handling, configuration)
- Deep query system knowledge (25+ functions, execution model, performance characteristics)
- Analysis of existing test coverage and error handling
- Review of common user issues and questions
Questions for Maintainers
- Which enhancements align best with current project priorities?
- Is there interest in the query documentation? I have comprehensive draft content.
- For error messages: Any preference on error message format/style?
- For validation tool: Would this fit better in aw-client or aw-core?
- For flooding improvements: Is this worth investigating, given that it's an edge case?
Let me know if you'd like me to tackle any of these! Happy to discuss technical approach or break down into smaller issues.