Prefect client prep for langchain

2026-02-16 15:12:44 +03:00
parent 93d538ecc6
commit 77c578c9e6
6 changed files with 148 additions and 94 deletions

@@ -6,7 +6,7 @@ Use if possible logging, using library `loguru`, for steps. Use logrotation in f
Chosen RAG framework: Langchain
Chosen Vector Storage: Qdrant
Chosen data folder: relatve ./../../../data - from the current folder
Chosen data folder: relative ./../../../data - from the current folder
# Phase 1 (cli entrypoint)
@@ -101,3 +101,13 @@ During this Phase we create asynchronous process of enrichment, utilizing async/
- [x] Function process_adaptive_files_queue should be started in a number of threads (defined in .env ENRICHMENT_ADAPTIVE_FILE_PROCESS_THREADS)
- [x] Function upload_processed_documents_from_queue should be started in a number of threads (defined in .env ENRICHMENT_ADAPTIVE_DOCUMENT_UPLOADS_THREADS)
- [x] Program should control threads. After the adaptive collection ends, function insert_adaptive_files_queue should wait until all threads finish. What does finish mean? When insert_adaptive_files_queue realizes that there are no adaptive files left in the collection, it sets a variable shared between threads to mark the collection as finished. When the other functions running in threads see that this variable has become true, they deplete the queue, do not enter the next loop iteration to wait for new items, and simply finish. Each thread finishes, and then the main program exits as usual after everything has been processed.
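
The shutdown rule in the last item can be sketched roughly as below. This is a minimal illustration, not the project's code: `run_pipeline`, `producer`, `worker`, and `worker_count` are hypothetical names, a `threading.Event` stands in for the shared variable, and appending to a list stands in for the real file processing and upload steps.

```python
import queue
import threading


def run_pipeline(items, worker_count=3):
    """Feed items through a queue and stop workers once the source is exhausted."""
    work_queue = queue.Queue()
    collection_finished = threading.Event()  # the shared "collection finished" variable
    processed = []
    processed_lock = threading.Lock()

    def producer():
        # Stand-in for insert_adaptive_files_queue: enqueue everything, then set the flag.
        for item in items:
            work_queue.put(item)
        collection_finished.set()

    def worker():
        while True:
            # Read the flag BEFORE polling: if it was already set and the queue
            # is still empty afterwards, no further items can ever arrive.
            finished = collection_finished.is_set()
            try:
                item = work_queue.get(timeout=0.05)
            except queue.Empty:
                if finished:
                    return  # queue depleted and collection finished: exit the thread
                continue  # collection still running: wait for new items
            with processed_lock:
                processed.append(item)  # placeholder for the real work

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # main program waits until every thread has finished
    return processed
```

Reading the flag before each poll avoids a race in which a worker times out on an empty queue, the producer then adds its last item and sets the flag, and the worker exits without draining that item.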
# Phase 14 (integration of Prefect client, for creating flow and tasks on remote Prefect server)
- [ ] Install Prefect client library.
- [ ] Add .env variable PREFECT_API_URL, that will be used for connecting client to the prefect server
- [ ] Create prefect client file in `prefect/01_yadisk_analyze.py`. In this file we will work with prefect flows and tasks for this phase.
- [ ] Create prefect flow called "analyze_yadisk_file_urls"
- [ ] Create prefect task "iterate_yadisk_folder_and_store_file_paths" that will connect to Yandex Disk with the yadisk library, analyze everything inside folder `Общая` recursively, and store file paths in ./../../../yadisk_files.json as an array of strings.
- [ ] In our prefect file add a function for the flow to serve, as per prefect documentation on serving flows
- [ ] Tests will be done manually, by executing this script and checking the prefect dashboard. No automated tests are needed for this phase.
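
The core of the "iterate_yadisk_folder_and_store_file_paths" task could look roughly like the sketch below. It is an assumption-laden illustration that leaves out both Prefect and yadisk: `collect_file_paths`, `store_file_paths`, and the injected `listdir` callable are hypothetical names, and `listdir(path)` is assumed to yield `(entry_path, is_dir)` pairs, which a thin adapter over the yadisk client would have to provide. In the real file this logic would be wrapped in a Prefect task and called from the "analyze_yadisk_file_urls" flow.

```python
import json
from pathlib import Path


def collect_file_paths(listdir, root):
    """Walk a remote folder tree iteratively and return all file paths, sorted.

    `listdir` is a hypothetical callable: listdir(path) -> iterable of
    (entry_path, is_dir) pairs.
    """
    file_paths = []
    pending = [root]  # folders still to be visited
    while pending:
        current = pending.pop()
        for entry_path, is_dir in listdir(current):
            if is_dir:
                pending.append(entry_path)  # descend into this folder later
            else:
                file_paths.append(entry_path)
    return sorted(file_paths)


def store_file_paths(file_paths, output_file):
    """Write the paths to a JSON file as an array of strings."""
    # ensure_ascii=False keeps Cyrillic names such as `Общая` readable in the file.
    Path(output_file).write_text(
        json.dumps(file_paths, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
```

Injecting `listdir` keeps the traversal easy to check by hand with a fake folder tree before pointing it at the real disk.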