MemoryLens

MemoryLens is a non-real-time, photo-based memory assistant for blind and low-vision users. It uses vision-language models to build searchable memories from photos and lets users ask questions about them by voice or text.

Features

  • Capture Memory: Take a photo, describe it, and save it as a rich memory.
  • Identify: Upload a photo to identify people or objects based on past memories.
  • Ask: Ask natural language questions about your saved memories (e.g., "Where did I meet Suresh?").
  • Accessibility: Designed with voice feedback and simple interfaces.

Tech Stack

  • Backend: Modal (Serverless Python)
  • Frontend: Gradio
  • AI Services:
    • Vision: Local/Modal (CLIP ViT-L/14 + captioning)
    • Reasoning: Local/Modal OpenAI-compatible model
    • TTS: ElevenLabs
    • Embeddings: Local CLIP (image + text)
  • Database: SQLite (vector store; see the retrieval sketch after this list)
  • Storage: Local/MCP (Contacts, Notes)
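
The embeddings and database items work together for retrieval: CLIP embeds images and query text into the same vector space, and SQLite stores the vectors for a brute-force cosine search. A minimal sketch of that idea, assuming a memories table with float32 BLOB embeddings and the sentence-transformers CLIP wrapper (both assumptions, not the repository's actual schema or loading code):

import sqlite3

import numpy as np
from sentence_transformers import SentenceTransformer

# Same model family as the CLIP ViT-L/14 listed above.
model = SentenceTransformer("clip-ViT-L-14")

def search_memories(db_path: str, question: str, top_k: int = 3):
    # Embed the question with CLIP's text tower, then rank stored image vectors.
    query = model.encode(question)
    query = query / np.linalg.norm(query)

    rows = sqlite3.connect(db_path).execute(
        "SELECT id, description, embedding FROM memories"
    ).fetchall()

    scored = []
    for mem_id, description, blob in rows:
        vec = np.frombuffer(blob, dtype=np.float32)
        score = float(np.dot(query, vec / np.linalg.norm(vec)))  # cosine similarity
        scored.append((score, mem_id, description))
    return sorted(scored, reverse=True)[:top_k]

Because everything lives in a single SQLite file, search is a linear scan, which is reasonable at personal-photo scale.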

Setup

  1. Clone the repository.
  2. Create Environment:
    chmod +x setup_env.sh
    ./setup_env.sh
    conda activate memorylens
  3. Configure API Keys:
    • Copy .env.example to .env.
    • Fill in your API keys for ElevenLabs and your local/Modal LLM endpoints.
    • Set these keys as Modal Secrets if deploying to Modal cloud (a sketch follows this list).
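
On Modal, keys are attached to functions as a Secret rather than read from a local .env. A hedged sketch of what that looks like; the secret name memorylens-secrets is an assumption for illustration, not necessarily what backend/main.py uses:

import os

import modal

app = modal.App("memorylens")

# Create the secret once with, e.g.:
#   modal secret create memorylens-secrets ELEVENLABS_API_KEY=...
@app.function(secrets=[modal.Secret.from_name("memorylens-secrets")])
def speak(text: str) -> None:
    # Inside the container, the secret's keys appear as environment variables.
    api_key = os.environ["ELEVENLABS_API_KEY"]
    print(f"Would call ElevenLabs with key ending ...{api_key[-4:]}: {text}")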

Running

Backend (Modal)

To deploy the backend to Modal:

modal deploy backend/main.py

Copy the returned URL and set it as BACKEND_URL in your .env file.
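
The URL corresponds to whatever web endpoint backend/main.py exposes. For orientation, a minimal sketch of that pattern (illustrative only, and note the decorator was called modal.web_endpoint in older Modal versions):

import modal

app = modal.App("memorylens-backend")

@app.function()
@modal.fastapi_endpoint(method="POST")
def ask(item: dict) -> dict:
    # Placeholder handler; the real one would run the memory search and LLM.
    return {"answer": f"You asked: {item.get('question', '')}"}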

Frontend (Gradio)

To run the local web interface:

python frontend/app.py
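
For context, a stripped-down sketch of how such a frontend could tie to the deployed backend (the JSON shape and helper names here are assumptions, not the actual frontend/app.py):

import os

import gradio as gr
import requests
from dotenv import load_dotenv

load_dotenv()  # reads BACKEND_URL from .env
BACKEND_URL = os.environ["BACKEND_URL"]

def ask(question: str) -> str:
    # Forward the question to the deployed Modal backend.
    resp = requests.post(BACKEND_URL, json={"question": question}, timeout=60)
    resp.raise_for_status()
    return resp.json().get("answer", "")

demo = gr.Interface(fn=ask, inputs="text", outputs="text", title="MemoryLens")

if __name__ == "__main__":
    demo.launch()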

Testing

Run unit tests:

pytest tests/
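
Tests can stay offline by targeting pure helpers such as the similarity ranking. A hypothetical example in that spirit (the cosine helper is a stand-in, not the repository's real code):

import numpy as np
import pytest

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Toy stand-in for the ranking used by the memory search.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def test_identical_vectors_rank_highest():
    v = np.array([0.2, 0.5, 0.8])
    assert cosine(v, v) == pytest.approx(1.0)

def test_orthogonal_vectors_score_zero():
    assert cosine(np.array([1.0, 0.0]), np.array([0.0, 1.0])) == pytest.approx(0.0)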

About

A RAG-style search over images with automatic tagging, designed to be extensible to other kinds of information.
