2026-02-04 00:59:01 +03:00
# RAG Solution with LlamaIndex and Qdrant
## Project Overview
This is a Retrieval Augmented Generation (RAG) solution built using LlamaIndex as the primary framework and Qdrant as the vector storage. The project is designed to load documents from a shared data directory, store them in a vector database, and enable semantic search and chat capabilities using local Ollama models.
The system has been enhanced to properly handle Russian language documents with Cyrillic characters, ensuring proper encoding during document loading, storage, and retrieval.
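The encoding handling described above can be sketched as a small helper. This is an illustrative sketch, not the project's actual loader code; the UTF-8/cp1251 fallback order is an assumption about the source data.

```python
from pathlib import Path

def read_document(path: str) -> str:
    """Read a document as UTF-8, the encoding expected for Cyrillic text.

    Falls back to cp1251 (a common legacy Russian encoding) if the file is
    not valid UTF-8. Both encoding choices are assumptions about the data,
    not taken from the project code.
    """
    raw = Path(path).read_bytes()
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("cp1251")
```

Passing the decoded text (rather than raw bytes) to the downstream splitter keeps Cyrillic characters intact through storage and retrieval.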
### Key Technologies
- **RAG Framework**: LlamaIndex
- **Vector Storage**: Qdrant
- **Embedding Models**: Ollama (configurable via environment variables)
- **Chat Models**: Ollama (configurable via environment variables)
- **Data Directory**: `./../../../data` (relative to project root)
- **Logging**: loguru with file rotation and stdout logging
### Architecture Components
- CLI entry point (`cli.py`)
- Configuration module (`config.py`) - manages model strategies and environment variables
- Document enrichment module (`enrichment.py`)
- Vector storage configuration (`vector_storage.py`)
- Retrieval module (`retrieval.py`)
- Chat agent (`agent.py`)
## Building and Running
### Prerequisites
1. Python virtual environment (already created in the `venv` folder)
2. Ollama running locally on default port 11434
3. Qdrant running locally (REST API on port 6333, gRPC on port 6334)
4. Data files in the `./../../../data` directory
### Setup Process
1. Activate the virtual environment:
```bash
source venv/bin/activate
```
2. Install required packages based on the document extensions found in the data directory (see EXTENSIONS.md for details)
3. Configure environment variables in `.env` file (copy from `.env.dist`)
4. Run the CLI to initialize the system:
```bash
python cli.py ping # Should return "pong"
```
### Available Commands
- `ping`: Basic connectivity test
- `enrich`: Load and process documents from the data directory into vector storage
- `chat`: Start an interactive chat session with the RAG system
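Since the project uses the `click` library (see Phase 1), the three commands above can be sketched as a command group. Only `ping` is fully implemented here; the `enrich` and `chat` bodies are placeholders, since the real commands delegate to `enrichment.py` and `agent.py`.

```python
import click

@click.group()
def cli():
    """Sketch of the cli.py entry point."""

@cli.command()
def ping():
    """Basic connectivity test."""
    click.echo("pong")

@cli.command()
def enrich():
    """Load documents from the data directory into vector storage."""
    click.echo("enrichment started")  # placeholder for the real pipeline

@cli.command()
def chat():
    """Start an interactive chat session with the RAG system."""
    click.echo("chat session started")  # placeholder for the real agent

if __name__ == "__main__":
    cli()
```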
## Development Conventions
### Logging
- Use `loguru` for all logging
- Log to both file (`logs/dev.log`) with rotation and stdout
- Use appropriate log levels (DEBUG, INFO, WARNING, ERROR)
### Environment Variables
- `CHAT_STRATEGY`: Strategy for chat models ("ollama" or "openai")
- `EMBEDDING_STRATEGY`: Strategy for embedding models ("ollama" or "openai")
- `OLLAMA_EMBEDDING_MODEL`: Name of the Ollama model to use for embeddings
- `OLLAMA_CHAT_MODEL`: Name of the Ollama model to use for chat functionality
- `OPENAI_CHAT_URL`: URL for OpenAI-compatible chat API (when using OpenAI strategy)
- `OPENAI_CHAT_KEY`: API key for OpenAI-compatible chat API (when using OpenAI strategy)
- `OPENAI_EMBEDDING_MODEL`: Name of the OpenAI embedding model (when using OpenAI strategy)
- `OPENAI_EMBEDDING_BASE_URL`: Base URL for OpenAI-compatible embedding API (when using OpenAI strategy)
- `OPENAI_EMBEDDING_API_KEY`: API key for OpenAI-compatible embedding API (when using OpenAI strategy)
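A hypothetical `.env` combining these variables might look like this. All values are placeholders, not the project's actual defaults:

```bash
# Strategy selection: "ollama" or "openai"
CHAT_STRATEGY=ollama
EMBEDDING_STRATEGY=ollama

# Ollama models (placeholder names)
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
OLLAMA_CHAT_MODEL=llama3

# OpenAI-compatible endpoints (only read when the strategy is "openai")
OPENAI_CHAT_URL=https://example.com/api/v1
OPENAI_CHAT_KEY=sk-placeholder
OPENAI_EMBEDDING_MODEL=text-embedding-placeholder
OPENAI_EMBEDDING_BASE_URL=https://example.com/api/v1
OPENAI_EMBEDDING_API_KEY=sk-placeholder
```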
### Document Processing
- Support multiple file formats based on EXTENSIONS.md
- Use text splitters appropriate for each document type
- Store metadata (filename, page, section, paragraph) with embeddings
- Track processed documents to avoid re-processing (using SQLite if needed)
- Proper encoding handling for Russian/Cyrillic text during loading and retrieval
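The SQLite-based tracking mentioned above can be sketched as a content-hash table: a file is reprocessed only if its hash has changed. Table and function names here are illustrative, not taken from the project code.

```python
import hashlib
import sqlite3
from pathlib import Path

def file_hash(path: str) -> str:
    """Content hash used to detect changed documents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def init_tracker(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_docs "
        "(path TEXT PRIMARY KEY, sha256 TEXT)"
    )

def needs_processing(conn: sqlite3.Connection, path: str) -> bool:
    row = conn.execute(
        "SELECT sha256 FROM processed_docs WHERE path = ?", (path,)
    ).fetchone()
    return row is None or row[0] != file_hash(path)

def mark_processed(conn: sqlite3.Connection, path: str) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO processed_docs (path, sha256) VALUES (?, ?)",
        (path, file_hash(path)),
    )
    conn.commit()
```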
### Vector Storage
- Collection name: "documents_llamaindex"
- Initialized automatically if it does not exist
- Support for Ollama embeddings by default
- Optional OpenAI embedding support via OpenRouter (commented out)
## Project Phases
### Phase 1: CLI Entry Point
- [x] Virtual environment setup
- [x] CLI creation with `click` library
- [x] Basic "ping" command implementation
### Phase 2: Framework Installation
- [x] LlamaIndex installation
- [x] Data folder analysis and EXTENSIONS.md creation
- [x] Required loader libraries installation
### Phase 3: Vector Storage Setup
- [x] Qdrant library installation
- [x] Vector storage initialization module
- [x] Collection creation strategy for "documents_llamaindex"
- [x] Ollama embedding model configuration
- [x] Optional OpenAI embedding via OpenRouter (commented)
### Phase 4: Document Enrichment
- [x] Document loading module with appropriate loaders
- [x] Text splitting strategies implementation
- [x] Document tracking mechanism
- [x] CLI command for enrichment
- [x] Russian language/Cyrillic text encoding support during document loading
### Phase 5: Retrieval Feature
- [x] Retrieval module configuration
- [x] Query processing with metadata retrieval
- [x] Russian language/Cyrillic text encoding support
### Phase 6: Model Strategy
- [x] Add `CHAT_STRATEGY` and `EMBEDDING_STRATEGY` environment variables
- [x] Add OpenAI configuration options to .env files
- [x] Create reusable model configuration function
- [x] Update all modules to use the new configuration system
- [x] Ensure proper .env loading across all modules
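The reusable configuration function might be shaped like the sketch below. It returns plain dicts rather than constructing LlamaIndex model objects, to keep the example framework-free; the real `config.py` presumably returns configured model instances.

```python
import os

def get_model_config() -> dict:
    """Resolve chat and embedding settings from environment variables.

    Sketch only: key names match the documented variables, but the
    returned structure is illustrative.
    """
    chat_strategy = os.environ.get("CHAT_STRATEGY", "ollama")
    embed_strategy = os.environ.get("EMBEDDING_STRATEGY", "ollama")

    if chat_strategy == "ollama":
        chat = {"provider": "ollama",
                "model": os.environ.get("OLLAMA_CHAT_MODEL")}
    elif chat_strategy == "openai":
        chat = {"provider": "openai",
                "base_url": os.environ.get("OPENAI_CHAT_URL"),
                "api_key": os.environ.get("OPENAI_CHAT_KEY")}
    else:
        raise ValueError(f"Unknown CHAT_STRATEGY: {chat_strategy!r}")

    if embed_strategy == "ollama":
        embedding = {"provider": "ollama",
                     "model": os.environ.get("OLLAMA_EMBEDDING_MODEL")}
    elif embed_strategy == "openai":
        embedding = {"provider": "openai",
                     "model": os.environ.get("OPENAI_EMBEDDING_MODEL"),
                     "base_url": os.environ.get("OPENAI_EMBEDDING_BASE_URL"),
                     "api_key": os.environ.get("OPENAI_EMBEDDING_API_KEY")}
    else:
        raise ValueError(f"Unknown EMBEDDING_STRATEGY: {embed_strategy!r}")

    return {"chat": chat, "embedding": embedding}
```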
### Phase 7: Enhanced Logging and Progress Tracking
- [x] Added progress bar using tqdm to show processing progress
- [x] Added logging to show total files and processed count during document enrichment
- [x] Enhanced user feedback during document processing with percentage and counts
### Phase 8: Chat Agent
- [ ] Agent module with Ollama integration
- [ ] Integration with retrieval module
- [ ] CLI command for chat functionality
## File Structure
```
llamaindex/
├── venv/ # Python virtual environment
├── cli.py # CLI entry point
├── config.py # Configuration module for model strategies
├── vector_storage.py # Vector storage configuration
├── enrichment.py # Document loading and processing
├── retrieval.py # Search and retrieval functionality
├── agent.py # Chat agent implementation (to be created)
├── EXTENSIONS.md # Supported file extensions and loaders
├── .env.dist # Environment variable template
├── .env # Local environment variables
├── logs/ # Log files directory
│ └── dev.log # Main log file with rotation
└── PLANNING.md # Project planning document
```
## Data Directory
The system expects documents to be placed in `./../../../data` relative to the project root. It analyzes this directory to determine supported file types and the appropriate loaders.
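The extension analysis that feeds EXTENSIONS.md can be sketched as a simple count of file suffixes. The real project may filter or normalize differently.

```python
from collections import Counter
from pathlib import Path

def analyze_extensions(data_dir: str) -> Counter:
    """Count file extensions (lowercased) under the data directory."""
    return Counter(
        p.suffix.lower() or "(no extension)"
        for p in Path(data_dir).rglob("*")
        if p.is_file()
    )
```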
## Testing
- Unit tests for individual modules
- Integration tests for end-to-end functionality
- CLI command tests
## Troubleshooting
- Ensure Ollama is running on port 11434
- Verify Qdrant is accessible on ports 6333 (REST) and 6334 (gRPC)
- Check that the data directory contains supported file types
- Review logs in `logs/dev.log` for detailed error information
- For Russian/Cyrillic text issues, ensure proper encoding handling is configured in both enrichment and retrieval modules
## Important Notes
- Avoid running long-running or heavy scripts during development, as they can consume significant system resources and take hours to complete
- The enrich command processes all files in the data directory and may require substantial memory and processing time