llamaindex update + unpacking archives in data

This commit is contained in:
2026-02-09 19:00:23 +03:00
parent 0adbc29692
commit f9c47c772f
11 changed files with 478 additions and 100 deletions


@@ -36,14 +36,19 @@ Chosen data folder: relative ./../../../data - from the current folder
- [x] Create file `retrieval.py` with the configuration for the chosen RAG framework that retrieves data from the vector storage based on a query. Use a retrieval library/plugin that supports the chosen vector storage within the chosen RAG framework. The retrieval configuration should search for the text passed to the function as an argument and return the found information together with its stored metadata (paragraph, section, page, etc.). Important: if the chosen RAG framework does not need a separate retrieval step for the chosen vector storage, this step may be skipped and marked done.
# Phase 6 (models strategy, loading env and update on using openai models)
- [ ] Add `CHAT_STRATEGY` and `EMBEDDING_STRATEGY` fields to .env; possible values are "openai" or "ollama".
- [ ] Add `OPENAI_CHAT_URL`, `OPENAI_CHAT_KEY`, `OPENAI_EMBEDDING_MODEL`, `OPENAI_EMBEDDING_BASE_URL`, `OPENAI_EMBEDDING_API_KEY` values to both .env.dist and .env with dummy values.
- [ ] Load the .env file in every place in the code that relies on its variables.
- [ ] Create a reusable function that returns the model configuration: it checks CHAT_STRATEGY, loads the matching environment variables, and returns a config ready for use.
- [ ] Use this function everywhere in the codebase where chat or embedding model configuration is needed.
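The strategy-switching helper described above could look roughly like this. This is a hypothetical sketch: `get_model_config` and the shape of the returned dict are assumptions, but the environment variable names follow the checklist.

```python
import os

def get_model_config() -> dict:
    """Return chat-model settings based on CHAT_STRATEGY from the environment.

    Hypothetical helper: the returned dict shape is an assumption; only the
    variable names (CHAT_STRATEGY, OPENAI_CHAT_URL, ...) come from the plan.
    """
    strategy = os.environ.get("CHAT_STRATEGY", "ollama")
    if strategy == "openai":
        return {
            "strategy": "openai",
            "base_url": os.environ.get("OPENAI_CHAT_URL", ""),
            "api_key": os.environ.get("OPENAI_CHAT_KEY", ""),
        }
    # Fall back to the local Ollama model.
    return {
        "strategy": "ollama",
        "model": os.environ.get("OLLAMA_CHAT_MODEL", ""),
    }

# Demo: pretend .env was already loaded into the environment.
os.environ["CHAT_STRATEGY"] = "openai"
os.environ["OPENAI_CHAT_URL"] = "https://example.local/v1"
print(get_model_config()["strategy"])  # -> openai
```

In a real setup, something like python-dotenv's `load_dotenv()` would populate `os.environ` from the .env file before this function is called.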
# Phase 7 (chat feature, as agent, for usage in the cli)
- [x] Add `CHAT_STRATEGY` and `EMBEDDING_STRATEGY` fields to .env; possible values are "openai" or "ollama".
- [x] Add `OPENAI_CHAT_URL`, `OPENAI_CHAT_KEY`, `OPENAI_EMBEDDING_MODEL`, `OPENAI_EMBEDDING_BASE_URL`, `OPENAI_EMBEDDING_API_KEY` values to both .env.dist and .env with dummy values.
- [x] Load the .env file in every place in the code that relies on its variables.
- [x] Create a reusable function that returns the model configuration: it checks CHAT_STRATEGY, loads the matching environment variables, and returns a config ready for use.
- [x] Use this function everywhere in the codebase where chat or embedding model configuration is needed.
- [ ] Create file `agent.py`, containing an agent powered by the chat model. It should use the Ollama integration, with the model specified in .env via OLLAMA_CHAT_MODEL.
- [ ] Integrate this agent with the existing retrieval solution in retrieval.py.
# Phase 7 (explicit logging and progressbar)
- [x] Log how many files are being processed during enrichment: show the total count and how many have been processed, updated each time a new document is handled. If possible, also show a progress bar with the percentage and those counts above the logs.
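A minimal sketch of the progress logging described above, assuming plain `logging` rather than a dedicated progress-bar library; `log_progress` and its bar format are hypothetical, not taken from the repo.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("enrichment")

def log_progress(processed: int, total: int, bar_width: int = 20) -> str:
    """Log a text progress bar like [#####---------------] 1/4 (25%)."""
    fraction = processed / total if total else 1.0
    filled = int(bar_width * fraction)
    bar = "#" * filled + "-" * (bar_width - filled)
    line = f"[{bar}] {processed}/{total} ({fraction:.0%})"
    log.info(line)
    return line

# Example: call once per document inside the enrichment loop.
for i in range(1, 4):
    log_progress(i, 3)
```

A library such as tqdm would give the same information with less code; this sketch only shows the counts-plus-percentage format the task asks for.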
# Phase 8 (chat feature, as agent, for usage in the cli)
- [ ] Create file `agent.py`, containing an agent powered by the chat model. It should use the OpenAI integration; the env variables are already configured.
- [ ] Integrate this agent with the existing retrieval solution in retrieval.py, if the chosen RAG framework supports it.
- [ ] Integrate this agent with the cli, as a command that starts a chat with the agent. If the framework has a built-in solution for console communication with the agent, launch it from the cli command.
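If no built-in console chat exists, the cli command could fall back to a loop like the one below. Everything here is hypothetical: `chat_loop` and its injected callables are assumptions, with the agent passed in as a plain function so the loop stays framework-agnostic and testable.

```python
def chat_loop(agent_reply, read_input, write_output) -> None:
    """Minimal console chat loop, independent of the RAG framework.

    agent_reply:  callable mapping a user message to the agent's answer
    read_input:   callable returning the next user message (e.g. input)
    write_output: callable printing a line to the user (e.g. print)
    """
    write_output("Type 'exit' to quit.")
    while True:
        message = read_input()
        if message.strip().lower() == "exit":
            break
        write_output(agent_reply(message))

# Demo with canned input instead of stdin, so the loop is easy to test.
inputs = iter(["hello", "exit"])
outputs = []
chat_loop(lambda m: f"echo: {m}", lambda: next(inputs), outputs.append)
```

Wired into the real cli, `read_input` would be `input` and `agent_reply` would call the agent from agent.py.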