# Supported File Extensions and LlamaIndex Loaders

This document lists the file extensions found in the `./../../../data` directory and the LlamaIndex loaders that can process each of them.

## Document Formats

| Extension | File Type | LlamaIndex Loader | Installation Package |
|-----------|-----------|-------------------|---------------------|
| `.pdf` | Portable Document Format | `PDFReader` or `SimpleDirectoryReader` | `llama-index-readers-file` |
| `.docx` | Microsoft Word Document | `DocxReader` or `SimpleDirectoryReader` | `llama-index-readers-file` |
| `.xlsx` | Microsoft Excel Spreadsheet | `PandasExcelReader` or `SimpleDirectoryReader` | `llama-index-readers-file` |
| `.pptx` | Microsoft PowerPoint Presentation | `PptxReader` or `SimpleDirectoryReader` | `llama-index-readers-file` |
| `.odt` | OpenDocument Text | `SimpleDirectoryReader` with `UnstructuredReader` | `llama-index-readers-file` |

## Image Formats

| Extension | File Type | LlamaIndex Loader | Installation Package |
|-----------|-----------|-------------------|---------------------|
| `.png` | Portable Network Graphics | `ImageReader` | `llama-index-readers-file` |
| `.jpg` | JPEG Image | `ImageReader` | `llama-index-readers-file` |

## Archive Formats

| Extension | File Type | LlamaIndex Loader | Installation Package |
|-----------|-----------|-------------------|---------------------|
| `.zip` | ZIP Archive | `SimpleDirectoryReader` with archive support | `llama-index-readers-file` |

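
As a rough illustration of what "archive support" could look like, the sketch below unpacks a `.zip` file into a directory so its contents can then be handed to a loader. It is only a sketch: it swaps in Python's standard-library `zipfile` module rather than `patool`, and the function name `extract_zip` is hypothetical, not part of LlamaIndex or of this project.

```python
import zipfile
from pathlib import Path


def extract_zip(archive_path, dest_dir):
    """Extract a .zip archive into dest_dir and return the extracted file paths.

    Hypothetical helper: uses the stdlib zipfile module, not patool.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        # extractall() creates any sub-directories stored in the archive.
        zf.extractall(dest)
    # Return files only, sorted for a stable order.
    return sorted(p for p in dest.rglob("*") if p.is_file())
```

The returned paths could then be passed to whichever loader matches each file's extension.
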
## System/Special Files (Ignored)

- `.DS_Store` - macOS system file
- `.gitignore` - Git configuration file

## Audio/Video Formats (Skipped as per requirements)

- `.m4a` - Audio file
- `.mp3` - Audio file
- `.mp4` - Video file
- `.ogg` - Audio/Video file

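
The categories listed so far (supported, ignored, and skipped) can be sketched as a small lookup. This is a minimal illustration using only the standard library; the names `LOADERS`, `IGNORED_NAMES`, `SKIPPED_EXTS`, and `classify` are hypothetical, and the mapping simply restates the entries in this document.

```python
from pathlib import Path

# Extension-to-loader mapping restated from this document.
LOADERS = {
    ".pdf": "PDFReader",
    ".docx": "DocxReader",
    ".xlsx": "PandasExcelReader",
    ".pptx": "PptxReader",
    ".odt": "UnstructuredReader",
    ".png": "ImageReader",
    ".jpg": "ImageReader",
    ".zip": "SimpleDirectoryReader",
}
# System/special files are matched by full name, since they have no suffix.
IGNORED_NAMES = {".DS_Store", ".gitignore"}
# Audio/video formats are skipped.
SKIPPED_EXTS = {".m4a", ".mp3", ".mp4", ".ogg"}


def classify(path):
    """Return 'ignored', 'skipped', 'unknown', or the loader name for path."""
    p = Path(path)
    if p.name in IGNORED_NAMES:
        return "ignored"
    ext = p.suffix.lower()  # compare extensions case-insensitively
    if ext in SKIPPED_EXTS:
        return "skipped"
    return LOADERS.get(ext, "unknown")
```

For example, `classify("report.PDF")` returns `"PDFReader"`, while `classify("talk.mp3")` returns `"skipped"`.
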
## Notes

1. `SimpleDirectoryReader` can load many of these file types directly; it selects a reader based on each file's extension.
2. For advanced document parsing, a format-specific reader may offer better performance or more features.
3. All required dependencies are installed via `llama-index-readers-file`, plus `patool` for archive support.
4. No external API keys are required for the supported file types, because all processing is done locally.
5. As required, the system prioritizes local processing over cloud services.
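
To make the first note concrete, here is a minimal sketch of loading a directory with `SimpleDirectoryReader`, restricted to the document and image extensions listed in this file. The helper names `reader_kwargs` and `load_documents` are hypothetical, `.zip` is omitted on the assumption that archives are unpacked in a separate step, and the sketch assumes `llama-index-core` and `llama-index-readers-file` are installed.

```python
from pathlib import Path

# Document and image extensions from this file; audio/video are left out.
SUPPORTED_EXTS = [".pdf", ".docx", ".xlsx", ".pptx", ".odt", ".png", ".jpg"]


def reader_kwargs(data_dir):
    """Build keyword arguments for SimpleDirectoryReader (hypothetical helper)."""
    return {
        "input_dir": str(Path(data_dir)),
        "required_exts": SUPPORTED_EXTS,  # only load these extensions
        "recursive": True,                # descend into sub-directories
        "exclude_hidden": True,           # skip dotfiles such as .DS_Store
    }


def load_documents(data_dir):
    """Load every supported file under data_dir into LlamaIndex documents."""
    # Imported lazily so reader_kwargs() works without LlamaIndex installed.
    from llama_index.core import SimpleDirectoryReader

    return SimpleDirectoryReader(**reader_kwargs(data_dir)).load_data()
```

If a format-specific reader is preferred for some extension, `SimpleDirectoryReader` also accepts a `file_extractor` dictionary that maps extensions to reader instances.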