langchain loading documents into vector storage

2026-02-03 20:52:08 +03:00
parent 762ed89843
commit 8d7e39a603
5 changed files with 299 additions and 42 deletions
--- a/services/rag/langchain/QWEN.md
+++ b/services/rag/langchain/QWEN.md
@@ -24,6 +24,7 @@ rag-solution/services/rag/langchain/
 ├── app.py             # Main application file (currently empty)
 ├── cli.py             # CLI entrypoint with click library
 ├── EXTENSIONS.md      # Supported file extensions and LangChain loaders
+├── enrichment.py      # Document enrichment module for loading documents to vector storage
 ├── PLANNING.md        # Development roadmap and phases
 ├── QWEN.md            # Current file - project context
 ├── requirements.txt   # Python dependencies
@@ -64,10 +65,10 @@ The project is organized into 6 development phases as outlined in `PLANNING.md`:
 - [x] Prepare OpenAI fallback (commented)

 ### Phase 4: Document Loading Module
- [ ] Create `enrichment.py` for loading documents to vector storage
- [ ] Implement text splitting strategies
- [ ] Add document tracking to prevent re-processing
- [ ] Integrate with CLI
+- [x] Create `enrichment.py` for loading documents to vector storage
+- [x] Implement text splitting strategies
+- [x] Add document tracking to prevent re-processing
+- [x] Integrate with CLI

 ### Phase 5: Retrieval Feature
 - [ ] Create `retrieval.py` for querying vector storage