MemoryLens is a non-real-time, photo-based memory assistant for blind and low-vision users. It uses vision-language models to turn photos into searchable memories and lets users ask questions about them by voice or text.
- Capture Memory: Take a photo, describe it, and save it as a rich memory.
- Identify: Upload a photo to identify people or objects based on past memories.
- Ask: Ask natural-language questions about your saved memories (e.g., "Where did I meet Suresh?"); a client-side sketch follows this list.
- Accessibility: Designed with voice feedback and simple interfaces.
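As a quick illustration of the Ask flow, here is a minimal client-side sketch. The `/ask` route, request payload, and `answer` response field are assumptions for illustration, not the project's documented API:

```python
# Minimal sketch of the "Ask" flow from a client's point of view.
# The /ask route, payload shape, and "answer" field are assumptions.
import os

import requests

BACKEND_URL = os.environ.get("BACKEND_URL", "http://localhost:8000")

def ask(question: str) -> str:
    """Send a natural-language question about saved memories to the backend."""
    resp = requests.post(f"{BACKEND_URL}/ask", json={"question": question}, timeout=30)
    resp.raise_for_status()
    return resp.json()["answer"]

if __name__ == "__main__":
    print(ask("Where did I meet Suresh?"))
```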
- Backend: Modal (Serverless Python)
- Frontend: Gradio
- AI Services:
  - Vision: Local/Modal (CLIP ViT-L/14 + captioning)
  - Reasoning: Local/Modal OpenAI-compatible model
  - TTS: ElevenLabs
  - Embeddings: Local CLIP (image + text)
- Database: SQLite (vector store; embedding storage and search are sketched after this list)
- Storage: Local/MCP (Contacts, Notes)
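To make the Embeddings and Database entries concrete, here is a minimal sketch of capturing a memory: embedding a photo with CLIP ViT-L/14, storing the vector in SQLite, and running a brute-force cosine search for the Identify flow. The schema, database file name, and helper functions are illustrative, not the project's actual code:

```python
# Sketch: embed a photo with CLIP ViT-L/14, store it in SQLite, and search.
# Table name, DB file, and helpers are illustrative, not the project's code.
import sqlite3

import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_ID = "openai/clip-vit-large-patch14"
model = CLIPModel.from_pretrained(MODEL_ID)
processor = CLIPProcessor.from_pretrained(MODEL_ID)

conn = sqlite3.connect("memories.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS memories "
    "(id INTEGER PRIMARY KEY, description TEXT, embedding BLOB)"
)

def embed_image(path: str) -> np.ndarray:
    """Return a unit-normalized float32 CLIP embedding for the image."""
    inputs = processor(images=Image.open(path).convert("RGB"), return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)  # cosine == dot product now
    return feats[0].numpy().astype(np.float32)

def save_memory(description: str, path: str) -> None:
    """Capture flow: persist the description plus the raw embedding bytes."""
    conn.execute(
        "INSERT INTO memories (description, embedding) VALUES (?, ?)",
        (description, embed_image(path).tobytes()),
    )
    conn.commit()

def find_similar(path: str, top_k: int = 3) -> list[tuple[float, str]]:
    """Identify flow: brute-force cosine search over all stored memories."""
    query = embed_image(path)
    rows = conn.execute("SELECT description, embedding FROM memories").fetchall()
    scored = [
        (float(np.dot(query, np.frombuffer(blob, dtype=np.float32))), desc)
        for desc, blob in rows
    ]
    return sorted(scored, reverse=True)[:top_k]
```

At personal-photo scale, brute-force search over a few thousand unit-normalized vectors is fast enough that no dedicated vector index is needed.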
- Clone the repository.
- Create Environment:
  ```bash
  chmod +x setup_env.sh
  ./setup_env.sh
  conda activate memorylens
  ```
- Configure API Keys:
  - Copy `.env.example` to `.env`.
  - Fill in your API keys for ElevenLabs and your local/Modal LLM endpoints.
  - Set these keys as Modal Secrets if deploying to Modal cloud (a sketch follows this list).
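For reference, a minimal sketch of how the two configurations might coexist, assuming a Modal Secret named `memorylens-secrets` and the env var names from `.env.example`:

```python
# Sketch: keys load from .env for local runs and from a Modal Secret in the
# cloud. The secret name "memorylens-secrets" and env var names are assumed.
import os

import modal

app = modal.App("memorylens")

@app.function(secrets=[modal.Secret.from_name("memorylens-secrets")])
def tts(text: str) -> str:
    # Inside Modal, the secret's key/value pairs arrive as environment variables.
    api_key = os.environ["ELEVENLABS_API_KEY"]
    return f"would synthesize {len(text)} chars with key ...{api_key[-4:]}"

if __name__ == "__main__":
    # Locally, python-dotenv fills the same variables from .env instead.
    from dotenv import load_dotenv
    load_dotenv()
```

The secret itself can be created once via the Modal CLI, e.g. `modal secret create memorylens-secrets ELEVENLABS_API_KEY=...` (secret name assumed).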
To deploy the backend to Modal:
```bash
modal deploy backend/main.py
```

Copy the returned URL and set it as `BACKEND_URL` in your `.env` file.
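For orientation, a deployable Modal app has roughly this shape; the app name, route, and stub body below are placeholders rather than the actual `backend/main.py`:

```python
# Rough shape of a Modal app that `modal deploy backend/main.py` could deploy.
# The app name, route, and stub logic are placeholders, not the real backend.
import modal

image = modal.Image.debian_slim().pip_install("fastapi[standard]")
app = modal.App("memorylens-backend", image=image)

@app.function()
@modal.fastapi_endpoint(method="POST")  # older Modal versions use @modal.web_endpoint
def ask(item: dict) -> dict:
    question = item.get("question", "")
    # Real code would embed the question, search the vector store, and call the LLM.
    return {"answer": f"stub answer for: {question}"}
```

`modal deploy` prints the endpoint URL on success, which is the value to paste into `BACKEND_URL`.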
To run the local web interface:

```bash
python frontend/app.py
```
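A minimal Gradio front end in the spirit of `frontend/app.py` might look like this, reusing the assumed `/ask` contract from the earlier sketches:

```python
# Sketch: a minimal Gradio front end that forwards questions to the backend.
# BACKEND_URL and the /ask contract are assumptions (see backend sketch above).
import os

import gradio as gr
import requests

BACKEND_URL = os.environ.get("BACKEND_URL", "http://localhost:8000")

def ask(question: str) -> str:
    resp = requests.post(f"{BACKEND_URL}/ask", json={"question": question}, timeout=60)
    resp.raise_for_status()
    return resp.json().get("answer", "")

demo = gr.Interface(
    fn=ask,
    inputs=gr.Textbox(label="Question"),
    outputs=gr.Textbox(label="Answer"),
    title="MemoryLens",
)

if __name__ == "__main__":
    demo.launch()
```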
Run unit tests:

```bash
pytest tests/
```
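As a flavor of what a unit test here might check (the test below is illustrative, not one of the repo's actual tests; 768 is the output dimension of CLIP ViT-L/14):

```python
# Illustrative test in the style of tests/: embeddings are stored as raw
# float32 bytes in SQLite, so the bytes<->array roundtrip must be lossless.
import numpy as np

def test_embedding_blob_roundtrip():
    vec = np.random.rand(768).astype(np.float32)  # ViT-L/14 embedding size
    assert np.array_equal(np.frombuffer(vec.tobytes(), dtype=np.float32), vec)
```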