A local Retrieval-Augmented Generation (RAG) agent that answers questions using your own PDF documents as its knowledge base. I tried it on Luxembourg Pillar 2 and ATAD legislation.
It provides source citations, generates Mermaid.js flowcharts, and keeps your documents, embeddings, and API key on your own machine.
- Citations: Every answer includes the exact source document and page number.
- Flowchart Generation: Automatically generates Mermaid.js diagrams (useful for company structures or process flows).
- Local & Secure: Your PDFs, embeddings, and API key are stored locally. The only external service involved is the Google Gemini API, which is used for embeddings and answers.
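To make the flowchart feature more concrete, here is a minimal sketch of how a company structure could be turned into Mermaid.js flowchart text. The function name `to_mermaid`, the edge format, and the entity names are assumptions for illustration, not the project's actual code. Node names containing double quotes are not escaped in this sketch.

```python
from typing import Dict, Iterable, List, Tuple


def to_mermaid(edges: Iterable[Tuple[str, str, str]], direction: str = "TD") -> str:
    """Render (parent, label, child) edges as Mermaid flowchart text.

    Hypothetical helper: each distinct name gets a short node id (N0, N1, ...)
    so that names with spaces are still valid Mermaid.
    """
    ids: Dict[str, str] = {}
    lines: List[str] = [f"flowchart {direction}"]

    def node_id(name: str) -> str:
        # Declare the node the first time we see it, then reuse its id.
        if name not in ids:
            ids[name] = f"N{len(ids)}"
            lines.append(f'    {ids[name]}["{name}"]')
        return ids[name]

    for parent, label, child in edges:
        src, dst = node_id(parent), node_id(child)
        lines.append(f"    {src} -->|{label}| {dst}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical entities, purely for demonstration.
    print(to_mermaid([("HoldCo", "100%", "OpCo")]))
```

The returned string can be pasted into any Mermaid renderer to display the diagram.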
This agent uses a RAG (Retrieval-Augmented Generation) pipeline:
- Load: On startup, the app reads and processes all PDF documents located in the /data folder.
- Embed & Index: The text from the PDFs is chunked and converted into numerical embeddings using the Google Gemini API. These embeddings are stored in a local FAISS vector database.
- Retrieve: When a user asks a question, the system searches the FAISS database for the most relevant text chunks.
- Generate: The question and retrieved text are sent to the Gemini LLM, which generates an answer strictly based on the provided sources.
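The pipeline above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the project's code: a toy bag-of-words vector stands in for Gemini embeddings, a linear scan stands in for the FAISS index, and the names `Chunk`, `chunk_text`, `embed`, `retrieve`, and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    text: str
    source: str  # PDF file name, kept for citations
    page: int    # page number, kept for citations


def chunk_text(text: str, size: int = 500, overlap: int = 100) -> List[str]:
    """Split text into fixed-size character windows that overlap."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


def embed(text: str) -> Counter:
    """Toy embedding (word counts). The real app uses the Gemini API here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: List[Chunk], k: int = 3) -> List[Chunk]:
    """Return the k chunks most similar to the question (linear scan, not FAISS)."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c.text)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, hits: List[Chunk]) -> str:
    """Assemble a prompt that carries source and page for each retrieved chunk."""
    context = "\n".join(f"[{c.source}, p. {c.page}] {c.text}" for c in hits)
    return (
        "Answer using only the sources below and cite them.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
```

Keeping the file name and page number on every chunk is what makes the citation feature possible: whatever the model answers, the retrieved chunks already know where they came from.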
Before running the app, make sure you have these installed:
- Python 3.10+
- Git
Follow these steps to set up the project locally.
git clone https://github.com/vikramlingam/Lux-Pillar-2-RAG-Agent
cd Lux-Pillar-2-RAG-Agent
It's best to use a virtual environment to isolate dependencies.
# Create the environment
python3 -m venv venv
# Activate the environment (Mac/Linux)
source venv/bin/activate
# (or) Activate the environment (Windows)
venv\Scripts\activate
Install all required Python libraries:
pip install -r requirements.txt
You'll need a Google Gemini API key for embeddings and responses.
Create a folder named .streamlit in the project root:
mkdir .streamlit
Then create a new file named secrets.toml inside it:
touch .streamlit/secrets.toml
Open secrets.toml and add your API key:
GEMINI_API_KEY = "YOUR_API_KEY_HERE"
The agent learns from whatever PDFs you provide.
- Create a folder named data:
  mkdir data
- Add all your PDF documents (such as OECD commentary or tax firm reports) into the /data folder.
The app automatically detects and reads every .pdf file placed here.
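As a rough illustration of that detection step, the sketch below lists the PDF files in a folder using only the standard library. The function name `find_pdfs` is hypothetical, and scanning only the top level of the folder (not subfolders) is an assumption; the app itself may behave differently.

```python
from pathlib import Path
from typing import List


def find_pdfs(data_dir: str = "data") -> List[Path]:
    """Return every .pdf file directly inside data_dir, sorted by path.

    Assumption: only the top level is scanned and the extension check is
    case-insensitive, so report.PDF is picked up as well.
    """
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    for pdf in find_pdfs():
        print(pdf.name)
```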
Once setup is complete, start the app:
streamlit run app.py
This will open the app automatically at:
👉 http://localhost:8501
⚡ Note: On first launch, it may take 30–60 seconds to process and embed all documents.
Once done, the “brain” is cached and responses become nearly instant.
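One common way to decide whether a cached index is still valid is to fingerprint the document folder and rebuild only when the fingerprint changes. The sketch below shows that idea; it is an assumption about how such caching could work, not a description of how this app implements it, and `corpus_fingerprint` is a hypothetical name.

```python
import hashlib
from pathlib import Path


def corpus_fingerprint(data_dir: str = "data") -> str:
    """Hash the name, size, and modification time of every PDF in data_dir.

    If the fingerprint matches the one saved alongside a cached index, the
    index can be reused; otherwise the documents need to be re-embedded.
    """
    digest = hashlib.sha256()
    for pdf in sorted(Path(data_dir).glob("*.pdf")):
        stat = pdf.stat()
        digest.update(f"{pdf.name}:{stat.st_size}:{stat.st_mtime_ns}".encode())
    return digest.hexdigest()
```

Adding, removing, or editing a PDF changes the fingerprint, which would trigger the slower first-launch path again.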
Features:
- Local RAG pipeline with Google Gemini
- PDF-based knowledge base
- FAISS for fast semantic retrieval
- Streamlit frontend for easy interaction
- 100% local data control
Use cases:
- Legal and tax research assistants
- Internal policy Q&A systems
- Private company document search
- Knowledge management tools
Tech stack:
- Python 3.10+
- Streamlit
- FAISS
- Google Gemini API
- PyPDF
- LangChain
This project is open-source.
Feel free to use, modify, and extend it for your own local RAG workflows.