enrichment with years, events
@@ -56,3 +56,12 @@ Chosen data folder: relative ./../../../data - from the current folder
- [x] After the API endpoint address is accepted, use it to send requests and process responses, imitating a chat with the agent behind the provided API endpoint.
- [x] Show the API endpoint in the header of the chat.
- [x] If there is an error connecting to the API, imitate the bot sending a message about the connection error, with a suggestion to reload the page and provide a new API endpoint.
# Phase 10 (extracting additional metadata from chunks, and filtering where possible with it)
- [x] Create a separate function in the helpers module (create the module if it does not exist) for retrieving years from text. It should return the years it finds.
- [x] While enriching the vector storage, when loading and splitting documents, extract years from each chunk and add them as numbers to the metadata field "years" (an array of numbers, or whichever Qdrant type is best suited for searching by year if needed). The helper function for retrieving years from text can be used.
- [x] Update VectorStoreRetriever._get_relevant_documents: when the user mentions a year in the query (in Russian), search only vectors whose metadata "years" array contains the mentioned year(s). The helper function for retrieving years can be applied to the query to filter documents by year.
- [x] Create a heuristic, regex-based function in the helpers module for extracting the name of an event from Russian text. It should use regular expressions together with words that typically appear before or after an event name.
- [x] While enriching the vector storage, try to extract event names from each chunk and save them in the metadata field "events", which holds a list of strings with the possible events. Using the helper function is advised.
- [x] In VectorStoreRetriever._get_relevant_documents, add a similarity search for the event name if an event name is present in the query. The helper function should be used here to try to extract the event name.
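
The year-related items above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `extract_years`, `with_years` and `years_filter` are hypothetical, a "year" is assumed to be a standalone four-digit number between 1000 and 2099, and the filter assumes chunk metadata is stored under the `metadata` payload key. The returned dictionary follows the shape of a Qdrant "match any" condition and would still need to be passed to whichever Qdrant client call the retriever uses.

```python
import re
from typing import Any, Optional

# Assumption: a "year" is a standalone four-digit number from 1000 to 2099.
# The lookarounds keep the pattern from matching inside longer digit runs.
YEAR_RE = re.compile(r"(?<!\d)(1\d{3}|20\d{2})(?!\d)")


def extract_years(text: str) -> list[int]:
    """Return the distinct years found in text, in order of first appearance."""
    years: list[int] = []
    for match in YEAR_RE.finditer(text):
        year = int(match.group(1))
        if year not in years:
            years.append(year)
    return years


def with_years(chunk_text: str, metadata: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the chunk metadata with a numeric "years" list added."""
    enriched = dict(metadata)
    enriched["years"] = extract_years(chunk_text)
    return enriched


def years_filter(query: str) -> Optional[dict[str, Any]]:
    """Build a Qdrant-style filter from the years in a query, or None."""
    years = extract_years(query)
    if not years:
        return None
    # Assumption: metadata lives under the "metadata" payload key, and a chunk
    # qualifies if its "years" array contains any of the mentioned years.
    return {"must": [{"key": "metadata.years", "match": {"any": years}}]}
```

Storing the years as integers rather than strings keeps the option of range conditions open later, and returning `None` when the query mentions no year lets the retriever fall back to an unfiltered search.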
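
For the event-related items in Phase 10, one possible heuristic is to look for a trigger word (such as "фестиваль" or "конференция" in any grammatical case) followed by a quoted name. The sketch below is only an assumption about how such a helper might look: the stem list, the function names `extract_events` and `search_queries`, and the choice to treat only quoted names as events are all hypothetical and would need tuning against the real data.

```python
import re

# Hypothetical trigger stems; case endings are absorbed by \w* in the pattern.
EVENT_STEMS = (
    "фестивал", "конференц", "выставк", "форум",
    "чемпионат", "олимпиад", "концерт",
)

# A trigger word followed by a name in «...» or "..." quotes.
EVENT_RE = re.compile(
    r'\b(?:' + "|".join(EVENT_STEMS) + r')\w*\s+[«"]([^»"]+)[»"]',
    re.IGNORECASE,
)


def extract_events(text: str) -> list[str]:
    """Return distinct quoted event names that follow a trigger word."""
    events: list[str] = []
    for match in EVENT_RE.finditer(text):
        name = match.group(1).strip()
        if name and name not in events:
            events.append(name)
    return events


def search_queries(query: str) -> list[str]:
    """Return the original query plus one extra query per detected event name."""
    return [query] + [name for name in extract_events(query) if name != query]
```

The same `extract_events` helper could fill the "events" metadata field during enrichment, while `search_queries` shows one way the retriever might run an additional similarity search per event name and merge the results.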