Add retrieval and update Russian language support
@@ -4,6 +4,8 @@
 
 This is a Retrieval Augmented Generation (RAG) solution built using LlamaIndex as the primary framework and Qdrant as the vector storage. The project is designed to load documents from a shared data directory, store them in a vector database, and enable semantic search and chat capabilities using local Ollama models.
 
+The system has been enhanced to properly handle Russian language documents with Cyrillic characters, ensuring proper encoding during document loading, storage, and retrieval.
+
 ### Key Technologies
 - **RAG Framework**: LlamaIndex
 - **Vector Storage**: Qdrant
@@ -64,6 +66,7 @@ This is a Retrieval Augmented Generation (RAG) solution built using LlamaIndex a
 - Use text splitters appropriate for each document type
 - Store metadata (filename, page, section, paragraph) with embeddings
 - Track processed documents to avoid re-processing (using SQLite if needed)
+- Proper encoding handling for Russian/Cyrillic text during loading and retrieval
 
 ### Vector Storage
 - Collection name: "documents_llamaindex"
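Review note: the encoding bullet added in this hunk does not say how the loader deals with Cyrillic text. As a rough illustration only, and not the project's actual code, decoding with a fallback could look like the sketch below. The helper names `decode_text` and `read_document` and the list of candidate encodings are assumptions made for this example.

```python
from pathlib import Path

# Assumed candidates: UTF-8 (with or without BOM) first, then Windows-1251,
# a common legacy encoding for Russian text. The README does not name any.
CANDIDATE_ENCODINGS = ("utf-8-sig", "cp1251")


def decode_text(raw: bytes, encodings=CANDIDATE_ENCODINGS) -> tuple[str, str]:
    """Return (text, encoding_used) for the first encoding that decodes cleanly."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Last resort: keep the document but mark undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"


def read_document(path: Path) -> str:
    """Read a text file as bytes and decode it with the fallback chain above."""
    text, _ = decode_text(path.read_bytes())
    return text
```

Trying UTF-8 first matters because `cp1251` accepts almost any byte sequence, so placing it earlier would silently turn valid UTF-8 into mojibake. This is a heuristic, not reliable encoding detection.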
@@ -95,10 +98,12 @@ This is a Retrieval Augmented Generation (RAG) solution built using LlamaIndex a
 - [x] Text splitting strategies implementation
 - [x] Document tracking mechanism
 - [x] CLI command for enrichment
+- [x] Russian language/Cyrillic text encoding support during document loading
 
 ### Phase 5: Retrieval Feature
-- [ ] Retrieval module configuration
-- [ ] Query processing with metadata retrieval
+- [x] Retrieval module configuration
+- [x] Query processing with metadata retrieval
+- [x] Russian language/Cyrillic text encoding support
 
 ### Phase 6: Chat Agent
 - [ ] Agent module with Ollama integration
@@ -134,4 +139,5 @@ The system expects documents to be placed in `./../../../data` relative to the p
 - Ensure Ollama is running on port 11434
 - Verify Qdrant is accessible on ports 6333 (REST) and 6334 (gRPC)
 - Check that the data directory contains supported file types
-- Review logs in `logs/dev.log` for detailed error information
+- Review logs in `logs/dev.log` for detailed error information
+- For Russian/Cyrillic text issues, ensure proper encoding handling is configured in both enrichment and retrieval modules
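Review note: the first two troubleshooting items can be checked with a small script. The following is a minimal sketch using only the standard library. The ports are the ones listed above, while the `localhost` default and the names `port_open` and `check_services` are assumptions for illustration.

```python
import socket

# Ports taken from the troubleshooting list; labels are for display only.
SERVICES = {"Ollama": 11434, "Qdrant REST": 6333, "Qdrant gRPC": 6334}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_services(host: str = "localhost", services=SERVICES) -> dict[str, bool]:
    """Map each service label to whether its port accepts TCP connections."""
    return {name: port_open(host, port) for name, port in services.items()}
```

Calling `check_services()` returns a dictionary keyed by service label. An open port only shows that something is listening there; it does not confirm that the service itself is healthy.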