Prep for Phase 12 of loading files for enrichment through the adaptive collections

2026-02-10 21:42:59 +03:00
parent 06a3155b6b
commit e9dd28ad55
3 changed files with 38 additions and 10 deletions

@@ -73,3 +73,14 @@ Chosen data folder: relative ./../../../data - from the current folder
- [x] Write tests for the local filesystem implementation, using the test/samples folder filled with files and directories to test iteration and recursion.
- [x] Create a Yandex Disk implementation of the Adaptive Collection. The constructor should require a TOKEN for Yandex Disk.
- [x] Write tests for the Yandex Disk implementation, using the folder "Общая/Информация". .env.test has a YADISK_TOKEN variable for connecting. While testing, log the files found during iteration. If the test fails at this step, leave it for manual fixing; the step can still be marked as done.
# Phase 12 (using local file system or Yandex Disk)
During enrichment, documents should be loaded through the adaptive collection from the helpers. Enrichment should not access the local filesystem directly; the adaptive collection acts as the wrapper.
- [ ] The adaptive file in the helpers now carries its filename, so the tests should be adjusted accordingly.
- [ ] Add conditional usage of the adaptive collection in the enrichment stage. .env now has the variable ENRICHMENT_SOURCE with 2 possible values: yadisk, local
- [ ] With the local source, take the path for the local filesystem adaptive collection from the env variable ENRICHMENT_LOCAL_PATH
- [ ] With the yadisk source, use the env variable YADISK_TOKEN as the auth token for Yandex Disk, and ENRICHMENT_YADISK_PATH as the path on Yandex Disk
- [ ] Some file types still need to be skipped, so while iterating over files, check each file's extension and skip those that match.
- [ ] Adaptive files carry their filename, so use it when extracting metadata.
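
The Phase 12 items above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: only the env variable names (ENRICHMENT_SOURCE, ENRICHMENT_LOCAL_PATH, ENRICHMENT_YADISK_PATH, YADISK_TOKEN) come from the plan, while `SourceConfig`, `resolve_source`, `should_skip`, `iter_enrichable` and the contents of `SKIPPED_EXTENSIONS` are hypothetical names and values chosen for the example. The real adaptive collection classes would be constructed from the resolved config.

```python
import os
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Iterable, Iterator, Mapping, Optional

# Hypothetical skip list: the plan does not say which file types are skipped.
SKIPPED_EXTENSIONS = frozenset({".tmp", ".lock"})


@dataclass(frozen=True)
class SourceConfig:
    """Settings needed to build one of the two adaptive collections."""
    kind: str                    # "local" or "yadisk"
    path: str                    # folder to iterate over
    token: Optional[str] = None  # only set for the yadisk source


def resolve_source(env: Mapping[str, str] = os.environ) -> SourceConfig:
    """Pick the enrichment source from ENRICHMENT_SOURCE and its companion variables."""
    source = env.get("ENRICHMENT_SOURCE", "").strip().lower()
    if source == "local":
        return SourceConfig("local", env["ENRICHMENT_LOCAL_PATH"])
    if source == "yadisk":
        return SourceConfig("yadisk", env["ENRICHMENT_YADISK_PATH"], env["YADISK_TOKEN"])
    raise ValueError(f"ENRICHMENT_SOURCE must be 'local' or 'yadisk', got {source!r}")


def should_skip(filename: str, skipped: frozenset = SKIPPED_EXTENSIONS) -> bool:
    """Return True when the file extension (case-insensitive) is in the skip list."""
    return PurePosixPath(filename).suffix.lower() in skipped


def iter_enrichable(filenames: Iterable[str],
                    skipped: frozenset = SKIPPED_EXTENSIONS) -> Iterator[str]:
    """Yield only the filenames whose extension is not skipped."""
    for name in filenames:
        if not should_skip(name, skipped):
            yield name
```

In the enrichment stage, the filenames passed to `iter_enrichable` would come from the adaptive files themselves, so the same check works for both the local and the Yandex Disk source.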