File extensions and libraries for llamaindex

2026-02-04 01:02:21 +03:00
parent fa26d77520
commit c37aec1d99
3 changed files with 52 additions and 6 deletions


@@ -0,0 +1,46 @@
# Supported File Extensions and LlamaIndex Loaders
This document lists the file extensions found in the `./../../../data` directory and the corresponding LlamaIndex loaders that can be used to process them.
## Document Formats
| Extension | File Type | LlamaIndex Loader | Installation Package |
|-----------|-----------|-------------------|---------------------|
| `.pdf` | Portable Document Format | `PDFReader` or `SimpleDirectoryReader` | `llama-index-readers-file` |
| `.docx` | Microsoft Word Document | `DocxReader` or `SimpleDirectoryReader` | `llama-index-readers-file` |
| `.xlsx` | Microsoft Excel Spreadsheet | `PandasExcelReader` or `SimpleDirectoryReader` | `llama-index-readers-file` |
| `.pptx` | Microsoft PowerPoint Presentation | `PptxReader` or `SimpleDirectoryReader` | `llama-index-readers-file` |
| `.odt` | OpenDocument Text | `UnstructuredReader` (passed to `SimpleDirectoryReader` via `file_extractor`) | `llama-index-readers-file` + `unstructured` |
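The following is a minimal sketch of loading only the document formats above in one pass. It assumes `llama-index-core`, `llama-index-readers-file` and `unstructured` are installed. `SimpleDirectoryReader` has no built-in reader for `.odt`, so the sketch routes that extension to `UnstructuredReader` through `file_extractor`, and the other extensions keep their default readers. The names `DOCUMENT_EXTS` and `load_documents` are illustrative and are not part of LlamaIndex.

```python
from typing import List

# Extensions from the "Document Formats" table (illustrative constant).
DOCUMENT_EXTS: List[str] = [".pdf", ".docx", ".xlsx", ".pptx", ".odt"]


def load_documents(data_dir: str = "./../../../data"):
    """Hypothetical helper: load the document formats listed above."""
    # Imported inside the function so this module can be imported without llama-index.
    from llama_index.core import SimpleDirectoryReader
    from llama_index.readers.file import UnstructuredReader  # requires `unstructured`

    reader = SimpleDirectoryReader(
        input_dir=data_dir,
        recursive=True,  # descend into subdirectories
        required_exts=DOCUMENT_EXTS,  # only files with these extensions are loaded
        # Override .odt only; every other extension falls back to its default reader.
        file_extractor={".odt": UnstructuredReader()},
    )
    return reader.load_data()
```

For example, `docs = load_documents()` returns a list of `Document` objects whose metadata includes the source file name and path.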
## Image Formats
| Extension | File Type | LlamaIndex Loader | Installation Package |
|-----------|-----------|-------------------|---------------------|
| `.png` | Portable Network Graphics | `ImageReader` | `llama-index-readers-file` |
| `.jpg` | JPEG Image | `ImageReader` | `llama-index-readers-file` |
## Archive Formats
| Extension | File Type | LlamaIndex Loader | Installation Package |
|-----------|-----------|-------------------|---------------------|
| `.zip` | ZIP Archive | No dedicated reader: extract first (e.g. with `patool`), then load the extracted files with `SimpleDirectoryReader` | `llama-index-readers-file` + `patool` |
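One way to handle archives is to unpack them before loading, because `SimpleDirectoryReader`'s built-in readers do not open `.zip` files. The sketch below uses the standard-library `zipfile` module in place of `patool`, which the notes mention and which also handles other archive formats. The helper `extract_zips` and its one-folder-per-archive layout are assumptions made for illustration.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_zips(data_dir: str, out_dir: str) -> List[Path]:
    """Hypothetical helper: extract every .zip under data_dir into out_dir/<archive stem>."""
    extracted: List[Path] = []
    # sorted() materializes the file list before anything is written to disk.
    for archive in sorted(Path(data_dir).rglob("*.zip")):
        target = Path(out_dir) / archive.stem
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            # extractall strips absolute paths and ".." components from member names.
            zf.extractall(target)
        extracted.append(target)
    return extracted
```

The extracted folders can then be loaded like any other directory, e.g. `SimpleDirectoryReader(input_dir=out_dir, recursive=True)`. Two archives with the same name in different subfolders would share one target folder in this sketch.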
## System/Special Files (Ignored)
- `.DS_Store` - macOS system file
- `.gitignore` - Git configuration file
## Audio/Video Formats (Skipped as per requirements)
- `.m4a` - Audio file
- `.mp3` - Audio file
- `.mp4` - Video file
- `.ogg` - Audio/Video file
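The sketch below sorts filenames into the categories used in this document, which is useful for checking what a loader will see before indexing. The extension sets are copied from the tables and lists above. `categorize` and `categorize_dir` are hypothetical helpers, not LlamaIndex functions.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

# Extension sets taken from the tables and lists in this document.
LOADABLE = {".pdf", ".docx", ".xlsx", ".pptx", ".odt", ".png", ".jpg", ".zip"}
SKIPPED_MEDIA = {".m4a", ".mp3", ".mp4", ".ogg"}
IGNORED_NAMES = {".DS_Store", ".gitignore"}


def categorize(filename: str) -> str:
    """Return 'ignored', 'skipped', 'loadable' or 'unknown' for one file."""
    name = Path(filename).name
    # Checked by full name, because Path(".DS_Store").suffix is the empty string.
    if name in IGNORED_NAMES:
        return "ignored"
    suffix = Path(name).suffix.lower()
    if suffix in SKIPPED_MEDIA:
        return "skipped"
    if suffix in LOADABLE:
        return "loadable"
    return "unknown"


def categorize_dir(data_dir: str) -> Dict[str, List[str]]:
    """Group every file under data_dir (recursively) by category."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in sorted(Path(data_dir).rglob("*")):
        if path.is_file():
            groups[categorize(path.name)].append(str(path.relative_to(data_dir)))
    return dict(groups)
```

When loading with `SimpleDirectoryReader`, dot-files such as `.DS_Store` and `.gitignore` are already skipped because `exclude_hidden` defaults to `True`. `.mp3` and `.mp4`, however, are mapped to `VideoAudioReader` by default, so passing `required_exts` (or `exclude` patterns) is what keeps the skipped media out of the index.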
## Notes
1. Most of these file types can be loaded with `SimpleDirectoryReader`, which picks a reader for each file based on its extension; files with an unrecognized extension are read as plain text.
2. For more advanced document parsing, a format-specific reader (e.g. `PDFReader`) can be used directly, which gives access to its own configuration options.
3. All required dependencies have been installed: `llama-index-readers-file`, plus `patool` for archive extraction.
4. No external API keys are required for the supported file types, since all parsing runs locally.
5. The system prioritizes local processing over cloud services, per the project requirements.
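As an illustration of using a format-specific reader on its own, without `SimpleDirectoryReader`, here is a minimal sketch. It assumes `llama-index-readers-file` and its `pypdf` dependency are installed; `load_pdf` is a hypothetical helper.

```python
from pathlib import Path


def load_pdf(path: str):
    """Hypothetical helper: parse one PDF with PDFReader directly."""
    # Lazy import keeps this module importable without llama-index installed.
    from llama_index.readers.file import PDFReader

    reader = PDFReader()
    # By default PDFReader returns one Document per page.
    return reader.load_data(file=Path(path))
```

For example, `pages = load_pdf("report.pdf")` (a placeholder filename) yields a list with one `Document` per page of that PDF.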