more sophisticated chat-like retrieval for LlamaIndex

This commit is contained in:
2026-02-26 19:02:05 +03:00
parent 468d5fb572
commit 6b3fa1cfaa
3 changed files with 390 additions and 2 deletions


@@ -69,3 +69,27 @@ Chosen data folder: relative ./../../../data - from the current folder
- [x] Create file `server.py`, with web framework fastapi, for example
- [x] Add POST endpoint "/api/test-query" which will use agent, and retrieve response for query, sent in JSON format, field "query"
# Phase 12 (upgrade from simple retrieval to agent-like chat in LlamaIndex)
- [x] Revisit Phase 5 assumption ("simple retrieval only") and explicitly allow agent/chat orchestration in LlamaIndex for QA over documents.
- [x] Create a new module for chat orchestration (for example `agent.py` or `chat_engine.py`) that separates:
1) retrieval of source nodes
2) answer synthesis with explicit prompt
3) response formatting with sources/metadata
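The three-stage split above can be sketched as plain functions, so each stage is testable on its own. This is a minimal skeleton with hypothetical names (`retrieve`, `synthesize`, `format_response`, `Snippet`); it duck-types the retriever and LLM as callables rather than committing to a specific LlamaIndex API.

```python
from dataclasses import dataclass

@dataclass
class Snippet:
    """One retrieved chunk with its metadata and relevance score."""
    text: str
    metadata: dict
    score: float

def retrieve(query: str, retriever) -> list[Snippet]:
    """Stage 1: fetch source nodes; `retriever` is any callable query -> list[dict]."""
    return [Snippet(n["text"], n.get("metadata", {}), n.get("score", 0.0))
            for n in retriever(query)]

def synthesize(query: str, snippets: list[Snippet], llm) -> str:
    """Stage 2: build an explicit grounded prompt and call the LLM callable."""
    context = "\n\n".join(s.text for s in snippets)
    prompt = (f"Answer only from the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {query}")
    return llm(prompt)

def format_response(answer: str, snippets: list[Snippet]) -> dict:
    """Stage 3: attach sources/metadata so the API layer can return them."""
    return {
        "answer": answer,
        "sources": [{"metadata": s.metadata, "score": s.score,
                     "preview": s.text[:200]} for s in snippets],
    }
```

Keeping the stages separate means the prompt and the source formatting can change independently of the retrieval backend.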
- [x] Implement a LlamaIndex-based chat feature (agent-like behavior) using framework-native primitives (chat engine / agent workflow / tool-calling approach supported by the installed version), so the model can iteratively query retrieval tools when needed.
- [x] Add a retrieval tool wrapper for document search that returns structured snippets (`filename`, `file_path`, `page_label/page`, `chunk_number`, content preview, score) instead of raw text only.
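The structured-snippet conversion can be a small pure function; the function name and the metadata-key fallbacks below are assumptions (different loaders write `file_name` vs `filename`, `page_label` vs `page`). The resulting callable could then be wrapped as a LlamaIndex tool (e.g. via `FunctionTool.from_defaults`) for the agent path.

```python
def node_to_snippet(node_text: str, metadata: dict, score: float,
                    preview_chars: int = 200) -> dict:
    """Convert one retrieved node into the structured snippet shape
    (filename, file_path, page, chunk_number, content preview, score)."""
    md = metadata or {}
    return {
        "filename": md.get("filename") or md.get("file_name"),
        "file_path": md.get("file_path"),
        "page": md.get("page_label") or md.get("page"),
        "chunk_number": md.get("chunk_number"),
        # collapse whitespace so the preview is a single readable line
        "preview": " ".join(node_text.split())[:preview_chars],
        "score": round(score, 4) if score is not None else None,
    }
```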
- [x] Add a grounded answer prompt/template for the LlamaIndex chat path with rules:
- answer only from retrieved context
- if information is missing, say so directly
- prefer exact dates/years and quote filenames/pages where possible
- avoid generic claims not supported by sources
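The four rules above can be encoded as a single prompt string. The wording below is one hypothetical phrasing; the `{context_str}`/`{query_str}` placeholders follow the variable names LlamaIndex's QA prompt templates conventionally use.

```python
GROUNDED_ANSWER_PROMPT = """\
You are answering strictly from the retrieved document context.

Rules:
1. Use ONLY the context below; do not add outside knowledge.
2. If the context does not contain the answer, say so directly. Do not guess.
3. Prefer exact dates and years; cite filenames and pages, e.g. (report.pdf, p. 4).
4. Avoid generic claims that no source snippet supports.

Context:
{context_str}

Question: {query_str}
Answer:"""
```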
- [x] Add response mode that returns both:
- final answer text
- list of retrieved sources (content snippet + metadata + score)
- [x] Add post-processing for retrieved nodes before synthesis:
- deduplicate near-identical chunks
- drop empty / near-empty chunks
- optionally filter low-information chunks (headers/footers)
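The node post-processing can be done on plain dicts before synthesis. A minimal sketch, assuming chunks carry `text` and `score` keys: near-identical chunks are detected by a normalized text prefix (whitespace- and case-insensitive), higher-scored duplicates win, and near-empty chunks (likely headers/footers) are dropped by a length threshold.

```python
import re

def postprocess_chunks(chunks: list[dict], min_chars: int = 40) -> list[dict]:
    """Deduplicate near-identical chunks and drop empty/near-empty ones."""
    seen: set[str] = set()
    kept: list[dict] = []
    # process highest-scored first so the best copy of a duplicate survives
    for c in sorted(chunks, key=lambda c: c.get("score", 0.0), reverse=True):
        text = re.sub(r"\s+", " ", c.get("text", "")).strip()
        if len(text) < min_chars:
            continue  # empty / near-empty (header, footer, page number)
        key = text.lower()[:120]
        if key in seen:
            continue  # near-duplicate of an already-kept chunk
        seen.add(key)
        kept.append(c)
    return kept
```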
- [x] Add optional metadata-aware retrieval improvements (years/events/keywords) for parity with the LangChain approach (in the folder next to the current one), if feasible with the chosen LlamaIndex primitives.
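For the years part of the metadata-aware path, a small helper can pull candidate years out of the user query before retrieval, to be matched against document metadata. A sketch (function name is hypothetical; the 1900-2099 range is an assumption):

```python
import re

def extract_year_filters(query: str) -> list[int]:
    """Pull distinct 4-digit years (1900-2099) out of a query for
    metadata filtering, sorted ascending."""
    return sorted({int(y) for y in re.findall(r"\b(19\d{2}|20\d{2})\b", query)})
```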
- [x] Update `server.py` endpoint to use the new agent-like chat path (keep simple retrieval endpoint available as fallback or debug mode).