LangChain: loading documents into vector storage
@@ -26,9 +26,9 @@ Chosen data folder: relative ./../../../data - from the current folder
# Phase 4 (creating module for loading documents from the folder)
- [ ] Create file `enrichment.py` with a function that loads data from the data folder into the chosen vector storage, using the data loaders configured per file extension. Remember to set default embedding metadata properties, such as filename, paragraph, page, and section, wherever possible (documents can have pages, sections, paragraphs, etc.). Use the text splitters of the chosen RAG framework that suit the documents being loaded. The chunking/text-splitting strategies the framework offers can be looked up online.
- [ ] Use the framework's built-in strategy (if such a mechanism exists) for marking which documents have been loaded and which have not, to avoid re-reading documents and re-enriching the vector storage with existing data. If there is no built-in mechanism of this type, install an SQLite library and use a local SQLite database file to store this information.
- [ ] Add a command to the CLI entrypoint that runs this function.
- [x] Create file `enrichment.py` with a function that loads data from the data folder into the chosen vector storage, using the data loaders configured per file extension. Remember to set default embedding metadata properties, such as filename, paragraph, page, and section, wherever possible (documents can have pages, sections, paragraphs, etc.). Use the text splitters of the chosen RAG framework that suit the documents being loaded. The chunking/text-splitting strategies the framework offers can be looked up online.
- [x] Use the framework's built-in strategy (if such a mechanism exists) for marking which documents have been loaded and which have not, to avoid re-reading documents and re-enriching the vector storage with existing data. If there is no built-in mechanism of this type, install an SQLite library and use a local SQLite database file to store this information.
- [x] Add a command to the CLI entrypoint that runs this function.
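
The three tasks above could be sketched roughly as follows. This is a minimal, standard-library-only illustration of the SQLite fallback path, not the code from this commit. Every name in it (`enrich`, `split_paragraphs`, the `loaded_documents` table, the `rag` program name, the `--db` option) is hypothetical. A plain Python list stands in for the vector storage, and a blank-line paragraph splitter stands in for the RAG framework's own loaders and text splitters. It also handles `.txt` files only.

```python
import argparse
import hashlib
import sqlite3
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of the file contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def open_tracker(db_path: str) -> sqlite3.Connection:
    """Open (or create) the SQLite file that records processed documents."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS loaded_documents ("
        "path TEXT PRIMARY KEY, digest TEXT NOT NULL)"
    )
    return conn


def is_loaded(conn: sqlite3.Connection, path: Path, digest: str) -> bool:
    """True if this path was already processed with identical contents."""
    row = conn.execute(
        "SELECT digest FROM loaded_documents WHERE path = ?", (str(path),)
    ).fetchone()
    return row is not None and row[0] == digest


def mark_loaded(conn: sqlite3.Connection, path: Path, digest: str) -> None:
    """Record (or update) the digest stored for this path."""
    conn.execute(
        "INSERT OR REPLACE INTO loaded_documents (path, digest) VALUES (?, ?)",
        (str(path), digest),
    )
    conn.commit()


def split_paragraphs(path: Path) -> list:
    """Split a text file on blank lines and attach metadata to each chunk."""
    text = path.read_text(encoding="utf-8")
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [
        {"text": p, "metadata": {"filename": path.name, "paragraph": i}}
        for i, p in enumerate(paragraphs, start=1)
    ]


def enrich(data_dir: str, db_path: str, store: list) -> int:
    """Load new or changed .txt files into `store`; return the number processed."""
    conn = open_tracker(db_path)
    processed = 0
    try:
        for path in sorted(Path(data_dir).rglob("*.txt")):
            digest = file_digest(path)
            if is_loaded(conn, path, digest):
                continue  # unchanged since the last run, skip it
            # Note: chunks of an older version of a changed file are not removed here.
            store.extend(split_paragraphs(path))
            mark_loaded(conn, path, digest)
            processed += 1
    finally:
        conn.close()
    return processed


def main(argv=None) -> int:
    """Hypothetical CLI entrypoint exposing `enrich` as a subcommand."""
    parser = argparse.ArgumentParser(prog="rag")
    sub = parser.add_subparsers(dest="command", required=True)
    cmd = sub.add_parser("enrich", help="load documents into the vector storage")
    cmd.add_argument("--data-dir", default="./../../../data")
    cmd.add_argument("--db", default="loaded_documents.sqlite")
    args = parser.parse_args(argv)
    store = []  # placeholder for a real vector storage client
    count = enrich(args.data_dir, args.db, store)
    print(f"processed {count} document(s), {len(store)} chunk(s)")
    return 0
```

Calling `main(["enrich", "--data-dir", "./../../../data"])` twice in a row processes each file on the first run and skips unchanged files on the second. Keying the tracker on a content digest rather than a modification time means an edited file is picked up again, while a file that was merely touched is not.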
# Phase 5 (preparation for the retrieval feature)