diff --git a/services/rag/langchain/PLANNING.md b/services/rag/langchain/PLANNING.md
index af619e3..138981f 100644
--- a/services/rag/langchain/PLANNING.md
+++ b/services/rag/langchain/PLANNING.md
@@ -11,7 +11,7 @@ Chosen data folder: relatve ./../../../data - from the current folder
 # Phase 1 (cli entrypoint)
 
 - [x] Create virtual env in the `venv` folder in the current directory.
-- [ ] Create cli.py file, with the usage of `click` python library. Make default command "ping" which will write output "pong"
+- [x] Create cli.py file, with the usage of `click` python library. Make default command "ping" which will write output "pong"
 
 # Phase 2 (installation of base framework for RAG solution and preparation for data loading)
 
diff --git a/services/rag/langchain/QWEN.md b/services/rag/langchain/QWEN.md
new file mode 100644
index 0000000..53ec532
--- /dev/null
+++ b/services/rag/langchain/QWEN.md
@@ -0,0 +1,125 @@
+# RAG Solution with Langchain
+
+## Project Overview
+
+This is a Retrieval-Augmented Generation (RAG) solution built using the Langchain framework. The project is designed to load documents from a data directory, store them in a vector database (Qdrant), and enable semantic search and chat capabilities using local LLMs via Ollama.
+
+The project follows a phased development approach, with CLI entry points for functionality such as document loading, retrieval, and chat.
+
+### Key Technologies
+- **Framework**: Langchain
+- **Vector Storage**: Qdrant
+- **Embeddings**: Ollama (with a fallback option for OpenAI via OpenRouter)
+- **Chat Models**: Ollama
+- **Data Directory**: `./../../../data` (relative to `services/rag/langchain/`, resolving to `rag-solution/data`)
+- **Virtual Environment**: Python venv in the `venv/` directory
+
+## Project Structure
+
+```
+rag-solution/services/rag/langchain/
+├── .env.dist          # Environment variable template
+├── .gitignore         # Git ignore rules
+├── app.py             # Main application file (currently empty)
+├── cli.py             # CLI entrypoint built with the click library
+├── logs/              # Log output (dev.log); contents are git-ignored
+├── PLANNING.md        # Development roadmap and phases
+├── QWEN.md            # Current file - project context
+├── requirements.txt   # Python dependencies
+└── venv/              # Virtual environment
+```
+
+## Dependencies
+
+The project relies on several key libraries:
+- `langchain` and its ecosystem packages (`langchain-community`, `langchain-core`, `langchain-ollama`)
+- `langgraph` for workflow management
+- `qdrant-client` for vector storage (to be installed)
+- `ollama` for local LLM interaction
+- `click` for the CLI
+- `loguru` for logging (required by `cli.py`)
+- `python-dotenv` for environment management
+
+## Development Phases
+
+The project is organized into 6 development phases, as outlined in `PLANNING.md`:
+
+### Phase 1: CLI Entrypoint
+- [x] Virtual environment setup
+- [x] Create CLI with `click` library
+- [x] Implement "ping" command that outputs "pong"
+
+### Phase 2: Framework Installation & Data Analysis
+- [ ] Install Langchain as base RAG framework
+- [ ] Analyze data folder extensions and create `EXTENSIONS.md`
+- [ ] Install required loader libraries
+- [ ] Configure environment variables
+
+### Phase 3: Vector Storage Setup
+- [ ] Install Qdrant client library
+- [ ] Create `vector_storage.py` for initialization
+- [ ] Configure Ollama embeddings using `OLLAMA_EMBEDDING_MODEL`
+- [ ] Prepare OpenAI fallback (commented out)
+
+### Phase 4: Document Loading Module
+- [ ] Create `enrichment.py` for loading documents into vector storage
+- [ ] Implement text splitting strategies
+- [ ] Add document tracking to prevent re-processing
+- [ ] Integrate with CLI
+
+### Phase 5: Retrieval Feature
+- [ ] Create `retrieval.py` for querying vector storage
+- [ ] Implement metadata retrieval (filename, page, section, etc.)
+
+### Phase 6: Chat Agent
+- [ ] Create `agent.py` with Ollama-powered chat agent
+- [ ] Integrate with retrieval functionality
+- [ ] Add CLI command for chat interaction
+
+## Environment Configuration
+
+The project uses environment variables for configuration:
+
+```env
+OLLAMA_EMBEDDING_MODEL=MODEL # Name of the Ollama model for embeddings
+OLLAMA_CHAT_MODEL=MODEL # Name of the Ollama model for chat
+```
+
+## Building and Running
+
+The project is in an early development stage, so these steps will grow as the phases are completed:
+
+1. **Set Up Virtual Environment**:
+   ```bash
+   python -m venv venv
+   source venv/bin/activate  # On Windows: venv\Scripts\activate
+   pip install -r requirements.txt
+   ```
+
+2. **Install Missing Dependencies** (as development progresses):
+   ```bash
+   pip install loguru qdrant-client  # Examples of needed libraries
+   ```
+
+3. **Configure Environment**:
+   ```bash
+   cp .env.dist .env
+   # Edit .env with appropriate values
+   ```
+
+4. **Run CLI Commands**:
+   ```bash
+   python cli.py ping
+   ```
+
+## Development Conventions
+
+- Use `loguru` for logging with rotation to `logs/dev.log` and stdout
+- Follow Langchain best practices for RAG implementations
+- Prioritize open-source solutions that don't require external API keys
+- Implement proper error handling and document processing tracking
+- Use modular code organization with separate files for different components
+
+## Current Status
+
+The project is in an early development phase. The virtual environment is set up, dependencies are defined, and the Phase 1 CLI entrypoint (`cli.py` with the `ping` command) is implemented; the core functionality (document loading, vector storage, retrieval, and chat) is yet to be implemented according to the planned phases.
\ No newline at end of file
diff --git a/services/rag/langchain/app.py b/services/rag/langchain/app.py
index 0a9c5dc..e69de29 100644
--- a/services/rag/langchain/app.py
+++ b/services/rag/langchain/app.py
@@ -1 +0,0 @@
-from langchain.agents import create_agent
diff --git a/services/rag/langchain/cli.py b/services/rag/langchain/cli.py
new file mode 100644
index 0000000..a222c8f
--- /dev/null
+++ b/services/rag/langchain/cli.py
@@ -0,0 +1,32 @@
+import click
+from loguru import logger
+import sys
+from pathlib import Path
+
+
+# Configure logging to output to both file and stdout as specified in requirements
+def setup_logging():
+    # Create logs directory if it doesn't exist
+    logs_dir = Path("logs")
+    logs_dir.mkdir(exist_ok=True)
+    # Log to stdout (replacing loguru's default stderr sink) and to a rotating file
+    logger.remove()
+    logger.add(sys.stdout)
+    logger.add("logs/dev.log", rotation="10 MB", retention="10 days")
+
+
+@click.group()
+def cli():
+    """Main CLI group"""
+    setup_logging()
+
+
+@cli.command(name="ping", help="Ping command that outputs pong")
+def ping():
+    """Ping command that outputs pong"""
+    logger.info("Ping command executed")
+    click.echo("pong")
+
+
+if __name__ == "__main__":
+    cli()
\ No newline at end of file
diff --git a/services/rag/langchain/logs/.gitignore b/services/rag/langchain/logs/.gitignore
new file mode 100644
index 0000000..d6b7ef3
--- /dev/null
+++ b/services/rag/langchain/logs/.gitignore
@@ -0,0 +1,2 @@
+*
+!.gitignore
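
PLANNING.md asks for `ping` to be the *default* command, while a plain `@click.group()` like the one in `cli.py` prints the usage help when run without a subcommand. The following is a minimal sketch, assuming click 8.x, of one way to make the group fall back to `ping` using `invoke_without_command=True`, `ctx.invoked_subcommand` and `ctx.invoke`. The loguru setup (`setup_logging()`) is deliberately left out so the block stays self-contained; it is an illustration, not the committed implementation.

```python
import click
from click.testing import CliRunner


# invoke_without_command=True makes click run the group callback even when
# no subcommand is given (instead of printing the usage help).
@click.group(invoke_without_command=True)
@click.pass_context
def cli(ctx: click.Context) -> None:
    """Main CLI group; falls back to `ping` when no subcommand is given."""
    # In cli.py, setup_logging() would still be called here first.
    if ctx.invoked_subcommand is None:
        # ctx.invoke runs the `ping` command's callback within this context.
        ctx.invoke(ping)


@cli.command(name="ping", help="Ping command that outputs pong")
def ping() -> None:
    """Ping command that outputs pong"""
    click.echo("pong")


# In-process check: CliRunner captures output without spawning a shell.
runner = CliRunner()
print(runner.invoke(cli, []).output, end="")  # pong
print(runner.invoke(cli, ["ping"]).output, end="")  # pong
```

With the same change applied to `cli.py`, both `python cli.py` and `python cli.py ping` would print `pong`, and the existing `if __name__ == "__main__": cli()` guard would stay as it is. Keeping `ping` a regular subcommand means it still appears in `--help`.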