File extensions and libraries for llamaindex
services/rag/llamaindex/EXTENSIONS.md
# Supported File Extensions and LlamaIndex Loaders

This document lists the file extensions found in the `./../../../data` directory and the corresponding LlamaIndex loaders that can be used to process them.
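An inventory like this one can be regenerated with a short standard-library script. The sketch below is illustrative only: `count_extensions` is a hypothetical helper, not part of LlamaIndex, and it counts a dotfile with no further extension (such as `.DS_Store`) under its full name so that system files stay visible in the listing.

```python
from collections import Counter
from pathlib import Path


def count_extensions(data_dir: str) -> Counter:
    """Recursively count file extensions under data_dir (hypothetical helper)."""
    counts: Counter = Counter()
    for path in Path(data_dir).rglob("*"):
        if not path.is_file():
            continue  # skip directories
        # Path(".DS_Store").suffix is "", so fall back to the full file name
        # to keep dotfiles visible in the inventory.
        key = path.suffix.lower() or path.name
        counts[key] += 1
    return counts
```

Calling `count_extensions("./../../../data")` returns a `Counter` keyed by lower-cased extension, which can be compared against the tables below.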
## Document Formats

| Extension | File Type | LlamaIndex Loader | Installation Package |
|-----------|-----------|-------------------|---------------------|
| `.pdf` | Portable Document Format | `PDFReader` or `SimpleDirectoryReader` | `llama-index-readers-file` |
| `.docx` | Microsoft Word Document | `DocxReader` or `SimpleDirectoryReader` | `llama-index-readers-file` |
| `.xlsx` | Microsoft Excel Spreadsheet | `PandasExcelReader` or `SimpleDirectoryReader` | `llama-index-readers-file` |
| `.pptx` | Microsoft PowerPoint Presentation | `PptxReader` or `SimpleDirectoryReader` | `llama-index-readers-file` |
| `.odt` | OpenDocument Text | `UnstructuredReader`, passed to `SimpleDirectoryReader` via `file_extractor` | `llama-index-readers-file` and `unstructured` |
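As a sketch of how the table maps onto code, assuming a recent `llama-index` with `llama-index-readers-file` installed (plus the `unstructured` package for `.odt`), the document formats can be loaded in one pass. `required_exts` restricts loading to the listed extensions, and `file_extractor` supplies readers for extensions the defaults may not cover. `reader_kwargs` and `load_documents` are hypothetical names.

```python
from typing import Any, Dict, List

# Extensions taken from the "Document Formats" table.
DOCUMENT_EXTS: List[str] = [".pdf", ".docx", ".xlsx", ".pptx", ".odt"]


def reader_kwargs(data_dir: str) -> Dict[str, Any]:
    """Plain keyword arguments for SimpleDirectoryReader (hypothetical helper)."""
    return {
        "input_dir": data_dir,
        "recursive": True,               # walk subdirectories too
        "required_exts": DOCUMENT_EXTS,  # only load the formats listed above
    }


def load_documents(data_dir: str):
    """Load the document formats; assumes llama-index-readers-file and unstructured."""
    # Imported here so the helper above works even without llama-index installed.
    from llama_index.core import SimpleDirectoryReader
    from llama_index.readers.file import PandasExcelReader, UnstructuredReader

    # Explicit per-extension readers: .odt has no default reader, and .xlsx
    # is mapped explicitly in case the installed version has no default for it.
    extractors = {
        ".odt": UnstructuredReader(),
        ".xlsx": PandasExcelReader(),
    }
    reader = SimpleDirectoryReader(file_extractor=extractors, **reader_kwargs(data_dir))
    return reader.load_data()
```

`load_documents("./../../../data")` returns a list of `Document` objects; the other extensions fall through to the reader `SimpleDirectoryReader` selects by default.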
## Image Formats

| Extension | File Type | LlamaIndex Loader | Installation Package |
|-----------|-----------|-------------------|---------------------|
| `.png` | Portable Network Graphics | `ImageReader` | `llama-index-readers-file` |
| `.jpg` | JPEG Image | `ImageReader` | `llama-index-readers-file` |
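A minimal sketch of loading the images, assuming `ImageReader` from `llama-index-readers-file` with its default `parse_text=False`, so each image is wrapped in a document without text extraction; enabling `parse_text=True` pulls in additional model dependencies. `image_files` and `load_images` are hypothetical helpers.

```python
from pathlib import Path
from typing import List

IMAGE_EXTS = {".png", ".jpg"}  # from the "Image Formats" table


def image_files(data_dir: str) -> List[Path]:
    """Sorted list of .png/.jpg files under data_dir (extension match is case-insensitive)."""
    return sorted(
        p for p in Path(data_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )


def load_images(data_dir: str):
    """Wrap each image in a document using ImageReader (assumed defaults, no OCR)."""
    from llama_index.readers.file import ImageReader

    # keep_image=True keeps the image data on the returned document;
    # text extraction is only attempted when parse_text=True is passed.
    reader = ImageReader(keep_image=True)
    docs = []
    for path in image_files(data_dir):
        docs.extend(reader.load_data(file=path))
    return docs
```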
## Archive Formats

| Extension | File Type | LlamaIndex Loader | Installation Package |
|-----------|-----------|-------------------|---------------------|
| `.zip` | ZIP Archive | Extract first (e.g. with `patool`), then load the contents with `SimpleDirectoryReader` | `patool` and `llama-index-readers-file` |
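Assuming archives are unpacked into a staging folder before loading, as the `patool` dependency suggests, the extraction step can be sketched as follows. This swaps in Python's standard-library `zipfile` in place of `patool`, so it only covers ZIP files; `extract_zips` and the one-folder-per-archive layout are hypothetical choices.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_zips(data_dir: str, staging_dir: str) -> List[Path]:
    """Extract each .zip under data_dir into staging_dir/<archive stem>/."""
    targets: List[Path] = []
    for archive in sorted(Path(data_dir).rglob("*.zip")):
        target = Path(staging_dir) / archive.stem
        target.mkdir(parents=True, exist_ok=True)
        # extractall() strips absolute paths and ".." components from member names.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        targets.append(target)
    return targets
```

Each returned folder can then be passed as `input_dir` to `SimpleDirectoryReader`. Archives with the same name in different subfolders would share a target folder, which a production version would need to handle.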
## System/Special Files (Ignored)

- `.DS_Store` - macOS system file
- `.gitignore` - Git configuration file
## Audio/Video Formats (Skipped as per requirements)

- `.m4a` - Audio file
- `.mp3` - Audio file
- `.mp4` - Video file
- `.ogg` - Audio/Video file
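The load, ignore, and skip rules in the sections above can be captured in one small policy function. This is a hypothetical helper that mirrors this document's tables. With `SimpleDirectoryReader` itself, a similar effect usually comes from its arguments: the default `exclude_hidden=True` drops dotfiles such as `.DS_Store` and `.gitignore`, and a `required_exts` allowlist keeps out audio/video files, some of which (e.g. `.mp3`, `.mp4`) may otherwise be picked up by default readers.

```python
from pathlib import PurePath

# Policy transcribed from the tables in this document (extensions lower-cased).
LOADED_EXTS = {".pdf", ".docx", ".xlsx", ".pptx", ".odt", ".png", ".jpg", ".zip"}
IGNORED_NAMES = {".DS_Store", ".gitignore"}
SKIPPED_EXTS = {".m4a", ".mp3", ".mp4", ".ogg"}


def classify(path: str) -> str:
    """Return 'ignored', 'skipped', 'load', or 'unknown' for a file path."""
    p = PurePath(path)
    if p.name in IGNORED_NAMES:
        return "ignored"  # system/special files
    ext = p.suffix.lower()
    if ext in SKIPPED_EXTS:
        return "skipped"  # audio/video, excluded by requirement
    if ext in LOADED_EXTS:
        return "load"
    return "unknown"
```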
## Notes

1. Many file types can be loaded with `SimpleDirectoryReader`, which selects a reader for each file based on its extension.
2. For more advanced parsing, a format-specific reader may offer better performance or more options; it can be used directly, or passed to `SimpleDirectoryReader` through `file_extractor` to replace the default reader for that extension.
3. All required dependencies have been installed: `llama-index-readers-file`, plus `patool` for archive support.
4. No external API keys are required for the supported file types, since they are all processed locally.
5. The system prioritizes local processing over cloud services, as per requirements.