llamaindex update + unpacking archives in data
@@ -16,6 +16,7 @@ The system has been enhanced to properly handle Russian language documents with
 
 ### Architecture Components
 - CLI entry point (`cli.py`)
+- Configuration module (`config.py`) - manages model strategies and environment variables
 - Document enrichment module (`enrichment.py`)
 - Vector storage configuration (`vector_storage.py`)
 - Retrieval module (`retrieval.py`)
@@ -57,9 +58,15 @@ The system has been enhanced to properly handle Russian language documents with
 - Use appropriate log levels (DEBUG, INFO, WARNING, ERROR)
 
 ### Environment Variables
+- `CHAT_STRATEGY`: Strategy for chat models ("ollama" or "openai")
+- `EMBEDDING_STRATEGY`: Strategy for embedding models ("ollama" or "openai")
 - `OLLAMA_EMBEDDING_MODEL`: Name of the Ollama model to use for embeddings
 - `OLLAMA_CHAT_MODEL`: Name of the Ollama model to use for chat functionality
-- API keys for external services (OpenRouter option available but commented out)
+- `OPENAI_CHAT_URL`: URL for OpenAI-compatible chat API (when using OpenAI strategy)
+- `OPENAI_CHAT_KEY`: API key for OpenAI-compatible chat API (when using OpenAI strategy)
+- `OPENAI_EMBEDDING_MODEL`: Name of the OpenAI embedding model (when using OpenAI strategy)
+- `OPENAI_EMBEDDING_BASE_URL`: Base URL for OpenAI-compatible embedding API (when using OpenAI strategy)
+- `OPENAI_EMBEDDING_API_KEY`: API key for OpenAI-compatible embedding API (when using OpenAI strategy)
 
 ### Document Processing
 - Support multiple file formats based on EXTENSIONS.md
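The strategy variables listed in this hunk could be resolved roughly as follows. This is a minimal sketch, not the contents of `config.py`: the `ModelSettings` dataclass, the two `load_*` helpers, and the fallback to `"ollama"` when a strategy variable is unset are all assumptions made for illustration. Only the environment variable names come from the diff.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_STRATEGIES = ("ollama", "openai")


@dataclass(frozen=True)
class ModelSettings:
    """Resolved settings for one model role (hypothetical helper type)."""
    strategy: str
    model: Optional[str] = None
    base_url: Optional[str] = None
    api_key: Optional[str] = None


def _strategy(env: Mapping[str, str], name: str) -> str:
    # Falling back to "ollama" is an assumption, not documented behaviour.
    value = env.get(name, "ollama").strip().lower()
    if value not in VALID_STRATEGIES:
        raise ValueError(f"{name} must be one of {VALID_STRATEGIES}, got {value!r}")
    return value


def load_chat_settings(env: Mapping[str, str] = os.environ) -> ModelSettings:
    """Pick chat settings according to CHAT_STRATEGY."""
    strategy = _strategy(env, "CHAT_STRATEGY")
    if strategy == "ollama":
        return ModelSettings(strategy, model=env.get("OLLAMA_CHAT_MODEL"))
    return ModelSettings(
        strategy,
        base_url=env.get("OPENAI_CHAT_URL"),
        api_key=env.get("OPENAI_CHAT_KEY"),
    )


def load_embedding_settings(env: Mapping[str, str] = os.environ) -> ModelSettings:
    """Pick embedding settings according to EMBEDDING_STRATEGY."""
    strategy = _strategy(env, "EMBEDDING_STRATEGY")
    if strategy == "ollama":
        return ModelSettings(strategy, model=env.get("OLLAMA_EMBEDDING_MODEL"))
    return ModelSettings(
        strategy,
        model=env.get("OPENAI_EMBEDDING_MODEL"),
        base_url=env.get("OPENAI_EMBEDDING_BASE_URL"),
        api_key=env.get("OPENAI_EMBEDDING_API_KEY"),
    )
```

Passing the environment as a mapping keeps the helpers easy to test with a plain dict. Chat and embedding strategies are resolved independently, so the two roles can use different providers.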
@@ -105,7 +112,19 @@ The system has been enhanced to properly handle Russian language documents with
 - [x] Query processing with metadata retrieval
 - [x] Russian language/Cyrillic text encoding support
 
-### Phase 6: Chat Agent
+### Phase 6: Model Strategy
+- [x] Add `CHAT_STRATEGY` and `EMBEDDING_STRATEGY` environment variables
+- [x] Add OpenAI configuration options to .env files
+- [x] Create reusable model configuration function
+- [x] Update all modules to use the new configuration system
+- [x] Ensure proper .env loading across all modules
+
+### Phase 7: Enhanced Logging and Progress Tracking
+- [x] Added progress bar using tqdm to show processing progress
+- [x] Added logging to show total files and processed count during document enrichment
+- [x] Enhanced user feedback during document processing with percentage and counts
+
+### Phase 8: Chat Agent
 - [ ] Agent module with Ollama integration
 - [ ] Integration with retrieval module
 - [ ] CLI command for chat functionality
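The Phase 7 items in this hunk (a tqdm progress bar plus logged totals, counts, and percentages) might look something like the sketch below. The function name `process_with_progress`, the `handle` callback, and the logger name are hypothetical, and the fallback for a missing `tqdm` install is there only so the snippet runs on its own. The real `enrichment.py` may be organised differently.

```python
import logging
from typing import Callable, List

try:
    from tqdm import tqdm
except ImportError:
    # Fall back to a plain loop when tqdm is not installed.
    def tqdm(iterable, **_kwargs):
        return iterable

logger = logging.getLogger("enrichment")  # logger name is an assumption


def process_with_progress(files: List[str], handle: Callable[[str], None]) -> int:
    """Run `handle` on every file, showing a progress bar and logging counts."""
    total = len(files)
    logger.info("Found %d files to process", total)
    processed = 0
    for path in tqdm(files, total=total, desc="Enriching", unit="file"):
        handle(path)
        processed += 1
        logger.info(
            "Processed %d/%d files (%.0f%%)",
            processed, total, 100 * processed / total,
        )
    return processed
```

Calling `process_with_progress(paths, enrich_one)` with a per-file function of your own returns the number of files handled. The percentage is computed only inside the loop, so an empty file list does not divide by zero.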
@@ -115,9 +134,10 @@ The system has been enhanced to properly handle Russian language documents with
 llamaindex/
 ├── venv/ # Python virtual environment
 ├── cli.py # CLI entry point
+├── config.py # Configuration module for model strategies
 ├── vector_storage.py # Vector storage configuration
-├── enrichment.py # Document loading and processing (to be created)
-├── retrieval.py # Search and retrieval functionality (to be created)
+├── enrichment.py # Document loading and processing
+├── retrieval.py # Search and retrieval functionality
 ├── agent.py # Chat agent implementation (to be created)
 ├── EXTENSIONS.md # Supported file extensions and loaders
 ├── .env.dist # Environment variable template
@@ -140,4 +160,8 @@ The system expects documents to be placed in `./../../../data` relative to the p
 - Verify Qdrant is accessible on ports 6333 (REST) and 6334 (gRPC)
 - Check that the data directory contains supported file types
 - Review logs in `logs/dev.log` for detailed error information
-- For Russian/Cyrillic text issues, ensure proper encoding handling is configured in both enrichment and retrieval modules
+- For Russian/Cyrillic text issues, ensure proper encoding handling is configured in both enrichment and retrieval modules
+
+## Important Notes
+- Do not test long-running or heavy system scripts during development as they can consume significant system resources and take hours to complete
+- The enrich command processes all files in the data directory and may require substantial memory and processing time
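The troubleshooting item about Qdrant ports can be checked without running the heavy `enrich` command. The sketch below only tests whether a TCP connection can be opened. It does not speak the REST or gRPC protocol, and the `localhost` default and both helper names are assumptions. The port numbers are the ones given in the hunk above.

```python
import socket
from typing import Dict


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_qdrant(host: str = "localhost") -> Dict[str, bool]:
    """Probe the two Qdrant ports named in the troubleshooting list."""
    return {
        name: port_open(host, port)
        for name, port in (("REST", 6333), ("gRPC", 6334))
    }


if __name__ == "__main__":
    for name, ok in check_qdrant().items():
        print(f"{name}: {'reachable' if ok else 'not reachable'}")
```

A `False` result means nothing is accepting connections on that port from this machine. A `True` result shows only that something is listening, not that it is Qdrant.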