rag-solution/services/rag/langchain/PLANNING.md

# Requirements

Libraries should be installed into the local virtual environment, which is defined in the `venv` folder.
If some libraries are not installed, check online which are best and install them.
Where possible, log each step using the `loguru` library. Write logs to `logs/dev.log` with log rotation, and also log to stdout.

Chosen RAG framework: Langchain
Chosen Vector Storage: Qdrant
Chosen data folder: `./../../../data`, relative to the current folder

# Phase 1 (cli entrypoint)

- [x] Create virtual env in the `venv` folder in the current directory.
- [x] Create a `cli.py` file using the `click` Python library. Make the default command "ping", which writes the output "pong".

# Phase 2 (installation of base framework for RAG solution and preparation for data loading)

- [x] Install langchain as base framework for RAG solution
- [x] Analyze the upper `data` folder (./../../../data) to learn all the file extensions present there. Then create a file `EXTENSIONS.md` in the current directory listing each extension and the loader or loaders of the chosen framework needed to load data with that extension (this can be learned online; search for the info). Prioritize libraries that work without external services that require API keys or paid subscriptions. Important: skip streaming media file extensions (audio, video). We are not going to load them now.
- [x] Install all needed libraries for loaders, mentioned in the `EXTENSIONS.md`. If some libraries require API keys for external services, add them to the `.env` file (create it if it does not exist)

# Phase 3 (preparation for storing data in the vector storage + embeddings)

- [x] Install the library needed to use a Qdrant connection as vector storage. Ensure the ports required by the chosen framework are used: REST API: 6333, gRPC API: 6334. The database is available and running on localhost.
- [x] Create a file called `vector_storage.py`, which will contain the vector storage initialization, available for import by other modules once initialized. If the chosen RAG framework needs it, add the embedding model initialization in the same file. Use Ollama, with the model name defined in the .env file: OLLAMA_EMBEDDING_MODEL. Ollama is available on the default local port: 11434
- [x] Just in case, add the possibility to connect via OpenAI embeddings using an OpenRouter API key. Comment this section out, so it can be used in the future.

# Phase 4 (creating module for loading documents from the folder)

- [x] Create a file `enrichment.py` with a function that loads data from the data folder into the chosen vector storage, using the data loaders configured for each extension. Remember to specify default embedding metadata properties, such as filename, paragraph, page, section, wherever this is possible (documents can have pages, sections, paragraphs, etc). Use the text splitters of the chosen RAG framework that suit the documents being loaded. The chunking/text-splitting strategies the framework offers can be learned online.
- [x] Use the built-in strategy for marking which documents are loaded (if there is such a mechanism) and which are not, to avoid re-reading and re-enriching the vector storage with existing data. If there is no built-in mechanism of this type, install an sqlite library and use a local sqlite database file to store this information.
- [x] Add activation of this function in the cli entrypoint, as a command.
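
A minimal sketch of the sqlite fallback described above, assuming documents are identified by path and re-loaded only when their content hash changes. The table name `loaded_documents` and the function names are hypothetical and not taken from the actual `enrichment.py`.

```python
import hashlib
import sqlite3


def open_registry(db_path: str = "loaded_documents.sqlite3") -> sqlite3.Connection:
    """Open (or create) the registry of documents already loaded."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS loaded_documents ("
        "path TEXT PRIMARY KEY, content_hash TEXT NOT NULL)"
    )
    return conn


def content_hash(data: bytes) -> str:
    """Hash file content so that edited files are detected as changed."""
    return hashlib.sha256(data).hexdigest()


def needs_loading(conn: sqlite3.Connection, path: str, digest: str) -> bool:
    """True if the document is unknown or its content has changed."""
    row = conn.execute(
        "SELECT content_hash FROM loaded_documents WHERE path = ?", (path,)
    ).fetchone()
    return row is None or row[0] != digest


def mark_loaded(conn: sqlite3.Connection, path: str, digest: str) -> None:
    """Record that the document was loaded with the given content hash."""
    conn.execute(
        "INSERT OR REPLACE INTO loaded_documents (path, content_hash) VALUES (?, ?)",
        (path, digest),
    )
    conn.commit()
```

The enrichment function would call `needs_loading` before running a loader and `mark_loaded` only after the chunks are stored, so a failed run is retried on the next invocation.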

# Phase 5 (preparation for the retrieval feature)

- [x] Create a file `retrieval.py` with the configuration for the chosen RAG framework that will retrieve data from the vector storage based on the query. Use a retrieval library/plugin that supports the chosen vector storage within the chosen RAG framework. The retrieval function should search for the query text passed as its argument and return the found information with the stored metadata, like paragraph, section, page etc. Important: if the chosen RAG framework does not need search and retrieval from the vector storage to be separated, this step may be skipped and marked done.

# Phase 6 (chat feature, as agent, for usage in the cli)

- [x] Create a file `agent.py`, which will contain an agent powered by the chat model. It should use the Ollama integration, with the model specified in .env in the property: OLLAMA_CHAT_MODEL
- [x] Integrate this agent with the existing retrieval solution in retrieval.py
- [x] Integrate this agent with the cli, as command to start chatting with the agent. If there is a built-in solution for console communication with the agent, initiate this on cli command.

# Phase 7 (openai integration for chat model)

- [x] Create an OpenAI integration using the .env variables `OPENAI_CHAT_URL` and `OPENAI_CHAT_KEY`. The first one is the OpenAI-compatible URL, the second one is the Authorization Bearer token.
- [x] Make this integration optional, using the .env variable `CHAT_MODEL_STRATEGY`. There can be 2 options: "ollama", "openai". Ollama is already done and working, so we should write code that checks which option is chosen in .env, with ollama being the default.
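
The strategy check can be sketched as follows, assuming the variables are read from the process environment after `.env` is loaded. The function names `resolve_chat_strategy` and `openai_settings` are hypothetical; only the variable names come from this plan.

```python
import os
from typing import Dict, Mapping, Optional

SUPPORTED_STRATEGIES = ("ollama", "openai")


def resolve_chat_strategy(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the chosen chat model strategy, defaulting to "ollama"."""
    env = os.environ if env is None else env
    raw = env.get("CHAT_MODEL_STRATEGY", "").strip().lower()
    if not raw:
        return "ollama"  # default when the variable is unset or empty
    if raw not in SUPPORTED_STRATEGIES:
        raise ValueError(
            f"CHAT_MODEL_STRATEGY must be one of {SUPPORTED_STRATEGIES}, got {raw!r}"
        )
    return raw


def openai_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect the OpenAI-compatible URL and bearer token, failing early if absent."""
    env = os.environ if env is None else env
    missing = [k for k in ("OPENAI_CHAT_URL", "OPENAI_CHAT_KEY") if not env.get(k)]
    if missing:
        raise ValueError(f"Missing .env variables: {', '.join(missing)}")
    return {"base_url": env["OPENAI_CHAT_URL"], "api_key": env["OPENAI_CHAT_KEY"]}
```

Accepting an explicit mapping keeps the check testable without touching the real environment.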

# Phase 8 (http endpoint to retrieve data from the vector storage by query)

- [x] Create a file `server.py` using a web framework, for example FastAPI
- [x] Add a POST endpoint "/api/test-query" which will use the agent and return its response to the query, sent in JSON format in the field "query"

# Phase 9 (simple html web page with chat interface)

- [x] Create an html webpage called demo.html, with a simple UI for a chat interface. It can be taken with predefined data from codepen or something
- [x] Adjust the demo.html code so that it actually works with the API endpoint, as a chat with the agent. The API endpoint should be asked for beforehand in a prompt message.
- [x] After the API endpoint address is accepted, it should be used to send requests and process responses, imitating a chat with the agent through the provided API endpoint.
- [x] Show the API endpoint in the header of the chat.
- [x] If there is an error connecting to the API, imitate the bot sending a message about the connection error, with a suggestion to reload the page and provide a new API endpoint

# Phase 10 (extracting additional metadata from chunks, and filtering where possible with it)

- [x] Create a separate function in the helpers module (create it if it does not exist) for retrieving years from text. It should return the found years.
- [x] While enriching the vector storage, when loading and splitting documents, extract years from each chunk and add them as numbers to the metadata field "years" (an array of numbers, or the most suitable Qdrant type for searching by year if needed). The helper function for retrieving years from text can be used.
- [x] Update VectorStoreRetriever._get_relevant_documents: when the user mentions a year in the query (in Russian), search only vectors whose metadata contains that year in the "years" array. The helper function for retrieving years can be applied to the query to build this filter.
- [x] Create a heuristic, regex-based function in the helpers module for extracting the name of an event in Russian. It should rely on regex and on words that typically appear before or after the event name.
- [x] While enriching the vector storage, try to extract the event name from each chunk and save it in the metadata field "events", which will contain a list of strings with possible events. Helper function usage is advised.
- [x] In VectorStoreRetriever._get_relevant_documents, add a similarity search for the event name if an event name is present in the query. The helper function should be used here to try to extract the event name.
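
A minimal sketch of the year-extraction helper, assuming only four-digit years from 1800 to 2099 are of interest. The range and the name `extract_years` are assumptions, not the actual helper. Digit lookarounds are used instead of `\b` because Russian text often attaches a suffix directly to the year (for example "2021г."), where a word boundary would not match.

```python
import re
from typing import List

# Four-digit years 1800-2099 that are not part of a longer digit run.
YEAR_PATTERN = re.compile(r"(?<!\d)(?:1[89]\d{2}|20\d{2})(?!\d)")


def extract_years(text: str) -> List[int]:
    """Return the distinct years found in text, in order of first appearance."""
    years: List[int] = []
    for match in YEAR_PATTERN.finditer(text):
        year = int(match.group())
        if year not in years:
            years.append(year)
    return years
```

The same function can serve both sides: its result goes into the "years" metadata field of each chunk during enrichment, and is applied to the user query when building the retrieval filter.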

# Phase 11 (adaptive collection, to attach different filesystems in the future)

- [x] Create an adaptive collection class and an adaptive file class in the helpers, as abstract classes that encompass iterating over and working with files locally
- [x] Write a local filesystem implementation of the adaptive collection
- [ ] Write tests for the local filesystem implementation, using a test/samples folder filled with files and directories to test iteration and recursion
- [ ] Create a Yandex Disk implementation of the Adaptive Collection. The constructor should require a TOKEN for Yandex Disk.
- [ ] Write tests for the Yandex Disk implementation, using the folder "Общая/Информация". .env has the YADISK_TOKEN variable for connecting. While testing, log the files found during iteration. If the test fails at this step, leave it for manual fixing, and this step can be marked as done.
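
One possible shape for the abstract classes and the local implementation is sketched below. All class and method names (`AdaptiveFile`, `AdaptiveCollection`, `LocalFile`, `LocalCollection`, `iterate`) are hypothetical and may differ from the existing helpers.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Iterator, Union


class AdaptiveFile(ABC):
    """A file that can be read regardless of where it is stored."""

    @property
    @abstractmethod
    def name(self) -> str:
        """File name without the directory part."""

    @abstractmethod
    def read_bytes(self) -> bytes:
        """Return the full file content."""


class AdaptiveCollection(ABC):
    """A source of files: local folder, remote disk, and so on."""

    @abstractmethod
    def iterate(self, recursive: bool = True) -> Iterator[AdaptiveFile]:
        """Yield files, descending into subfolders when recursive is True."""


class LocalFile(AdaptiveFile):
    def __init__(self, path: Path) -> None:
        self._path = path

    @property
    def name(self) -> str:
        return self._path.name

    def read_bytes(self) -> bytes:
        return self._path.read_bytes()


class LocalCollection(AdaptiveCollection):
    def __init__(self, root: Union[str, Path]) -> None:
        self._root = Path(root)

    def iterate(self, recursive: bool = True) -> Iterator[AdaptiveFile]:
        paths = self._root.rglob("*") if recursive else self._root.glob("*")
        for path in sorted(paths):  # sorted for a deterministic order in tests
            if path.is_file():
                yield LocalFile(path)
```

A Yandex Disk implementation would subclass the same two abstract classes and take the token in its constructor, so the enrichment code can iterate over either source unchanged.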