Phase 12 done... loading via adaptive collection, yadisk or local

2026-02-10 22:19:27 +03:00
parent e9dd28ad55
commit 1e6ab247b9
5 changed files with 154 additions and 113 deletions

@@ -78,9 +78,9 @@ Chosen data folder: relative ./../../../data - from the current folder
During enrichment, documents should be loaded through the adaptive collection from the helpers. Do not access the local filesystem directly; use the adaptive collection as a wrapper instead.
- [ ] Adaptive files in the helper now carry a filename, so the tests should be adjusted accordingly
- [ ] Add conditional use of the adaptive collection in the enrichment stage. .env now has a variable ENRICHMENT_SOURCE with two possible values: yadisk, local
- [ ] With the local source, use the env variable ENRICHMENT_LOCAL_PATH as the path for the local-filesystem adaptive collection
- [ ] With the yadisk source, use the env variable YADISK_TOKEN as the auth token for Yandex Disk and ENRICHMENT_YADISK_PATH as the path on Yandex Disk
- [ ] Some file types still need to be skipped, so check each file's extension while iterating and skip those that match
- [ ] Adaptive files carry a filename, so use it when extracting metadata
- [x] Adaptive files in the helper now carry a filename, so the tests should be adjusted accordingly
- [x] Add conditional use of the adaptive collection in the enrichment stage. .env now has a variable ENRICHMENT_SOURCE with two possible values: yadisk, local
- [x] With the local source, use the env variable ENRICHMENT_LOCAL_PATH as the path for the local-filesystem adaptive collection
- [x] With the yadisk source, use the env variable YADISK_TOKEN as the auth token for Yandex Disk and ENRICHMENT_YADISK_PATH as the path on Yandex Disk
- [x] Some file types still need to be skipped, so check each file's extension while iterating and skip those that match
- [x] Adaptive files carry a filename, so use it when extracting metadata
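
The source selection described in the checklist above could look roughly like the sketch below. Only the environment variable names (ENRICHMENT_SOURCE, ENRICHMENT_LOCAL_PATH, ENRICHMENT_YADISK_PATH, YADISK_TOKEN) come from the checklist. The `EnrichmentSource` dataclass, the helper functions, and the contents of `SKIPPED_EXTENSIONS` are hypothetical names made up for illustration, and the sketch stops short of constructing the project's actual adaptive collection, whose API is not shown here.

```python
import os
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Mapping, Optional

# Hypothetical skip list; the real project defines its own set of extensions.
SKIPPED_EXTENSIONS = frozenset({".zip", ".exe", ".tmp"})


@dataclass(frozen=True)
class EnrichmentSource:
    """Resolved settings for one enrichment source (illustrative type)."""
    kind: str                    # "local" or "yadisk"
    path: str                    # root path on the chosen backend
    token: Optional[str] = None  # only set for the yadisk source


def _require(env: Mapping[str, str], name: str) -> str:
    """Return a non-empty variable from env or raise a clear error."""
    value = env.get(name)
    if not value:
        raise ValueError(f"{name} must be set")
    return value


def resolve_enrichment_source(env: Mapping[str, str] = os.environ) -> EnrichmentSource:
    """Pick the source from ENRICHMENT_SOURCE and read its matching variables."""
    kind = env.get("ENRICHMENT_SOURCE", "").strip().lower()
    if kind == "local":
        return EnrichmentSource("local", _require(env, "ENRICHMENT_LOCAL_PATH"))
    if kind == "yadisk":
        return EnrichmentSource(
            "yadisk",
            _require(env, "ENRICHMENT_YADISK_PATH"),
            _require(env, "YADISK_TOKEN"),
        )
    raise ValueError("ENRICHMENT_SOURCE must be 'yadisk' or 'local'")


def should_skip(filename: str, skipped: frozenset = SKIPPED_EXTENSIONS) -> bool:
    """True when the file's extension (case-insensitive) is on the skip list."""
    return PurePosixPath(filename).suffix.lower() in skipped
```

Passing the environment as a mapping keeps the resolver easy to test without touching the real process environment; in the enrichment stage the resolved settings would then be handed to whichever adaptive collection matches `kind`, and `should_skip` would be applied to each adaptive file's filename while iterating.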