Supported File Extensions and LlamaIndex Loaders

This document lists the file extensions found in the ./../../../data directory and the corresponding LlamaIndex loaders that can be used to process them.

Document Formats

| Extension | File Type | LlamaIndex Loader | Installation Package |
|---|---|---|---|
| .pdf | Portable Document Format | PDFReader or SimpleDirectoryReader | llama-index-readers-file |
| .docx | Microsoft Word Document | DocxReader or SimpleDirectoryReader | llama-index-readers-file |
| .xlsx | Microsoft Excel Spreadsheet | PandasExcelReader or SimpleDirectoryReader | llama-index-readers-file |
| .pptx | Microsoft PowerPoint Presentation | PptxReader or SimpleDirectoryReader | llama-index-readers-file |
| .odt | OpenDocument Text | SimpleDirectoryReader with UnstructuredReader | llama-index-readers-file, unstructured |
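
A minimal sketch of loading all of the document formats above in one pass. It assumes llama-index-core and llama-index-readers-file are installed, plus the unstructured package that UnstructuredReader imports. The names DOCUMENT_EXTS and load_documents are illustrative, and the LlamaIndex imports sit inside the function so the module can be imported without them. .odt is routed to UnstructuredReader through file_extractor; .xlsx is also mapped explicitly in case the installed version has no default reader for it.

```python
# Extensions from the "Document Formats" table.
DOCUMENT_EXTS = [".pdf", ".docx", ".xlsx", ".pptx", ".odt"]


def load_documents(data_dir: str = "./../../../data"):
    """Hypothetical helper: load every document-format file under data_dir."""
    # Imported here so this module can be imported without LlamaIndex installed.
    from llama_index.core import SimpleDirectoryReader
    from llama_index.readers.file import PandasExcelReader, UnstructuredReader

    reader = SimpleDirectoryReader(
        input_dir=data_dir,
        recursive=True,
        required_exts=DOCUMENT_EXTS,
        # .pdf, .docx and .pptx fall back to SimpleDirectoryReader's built-in
        # readers. .odt has no built-in reader, so it goes to UnstructuredReader
        # (needs the `unstructured` package); .xlsx is mapped explicitly in case
        # the installed version has no default for it.
        file_extractor={
            ".odt": UnstructuredReader(),
            ".xlsx": PandasExcelReader(),
        },
    )
    return reader.load_data()
```

Calling load_documents() returns a list of LlamaIndex Document objects. Entries in file_extractor are merged with the built-in mapping rather than replacing it, so only the extensions that need special handling have to be listed.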

Image Formats

| Extension | File Type | LlamaIndex Loader | Installation Package |
|---|---|---|---|
| .png | Portable Network Graphics | ImageReader | llama-index-readers-file |
| .jpg | JPEG Image | ImageReader | llama-index-readers-file |
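
At the time of writing, the ImageReader that SimpleDirectoryReader creates by default for .png and .jpg has parse_text=False, so images come back as documents with no extracted text, which gives a text-based RAG index nothing to retrieve. A minimal sketch, assuming that default still holds in your installed version, passes a configured ImageReader through file_extractor. With parse_text=True the reader runs a local Donut image-to-text model, which needs torch, transformers and sentencepiece and downloads its weights on first use. IMAGE_EXTS and load_images are illustrative names.

```python
# Extensions from the "Image Formats" table.
IMAGE_EXTS = [".png", ".jpg"]


def load_images(data_dir: str = "./../../../data"):
    """Hypothetical helper: load .png/.jpg files with text extraction enabled."""
    # Imported here so this module can be imported without LlamaIndex installed.
    from llama_index.core import SimpleDirectoryReader
    from llama_index.readers.file import ImageReader

    # parse_text=True runs a local image-to-text model (Donut by default);
    # with the default parse_text=False the documents carry no extracted text.
    image_reader = ImageReader(parse_text=True)
    reader = SimpleDirectoryReader(
        input_dir=data_dir,
        recursive=True,
        required_exts=IMAGE_EXTS,
        # One shared reader instance handles both image extensions.
        file_extractor={ext: image_reader for ext in IMAGE_EXTS},
    )
    return reader.load_data()
```

Sharing one ImageReader instance means the model is loaded once rather than once per extension.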

Archive Formats

| Extension | File Type | LlamaIndex Loader | Installation Package |
|---|---|---|---|
| .zip | ZIP Archive | SimpleDirectoryReader on the extracted contents (archive unpacked with patool) | llama-index-readers-file, patool |
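
To our knowledge SimpleDirectoryReader does not unpack archives itself, so .zip files have to be extracted before loading. The sketch below uses Python's standard zipfile module instead of patool, which the notes list for archive support; for other archive formats, patoolib.extract_archive(archive, outdir=...) would be the equivalent call. extract_archives and its one-folder-per-archive layout are illustrative choices, not the project's actual code.

```python
import zipfile
from pathlib import Path


def extract_archives(data_dir: str, out_dir: str) -> list[Path]:
    """Hypothetical helper: unpack every .zip under data_dir below out_dir.

    Returns the extraction folders so they can be loaded like any other
    directory.
    """
    root = Path(data_dir)
    extracted = []
    # sorted() materialises the listing before anything is written to disk.
    for archive in sorted(root.rglob("*.zip")):
        # Mirror the archive's relative path (minus ".zip") to avoid clashes
        # between archives with the same name in different subfolders.
        target = Path(out_dir) / archive.relative_to(root).with_suffix("")
        target.mkdir(parents=True, exist_ok=True)
        # zipfile strips absolute paths and ".." components from member names
        # (see the ZipFile.extract docs); still, only extract trusted archives.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(target)
    return extracted
```

Each returned folder can then be passed to SimpleDirectoryReader(input_dir=str(folder), recursive=True), so the files inside the archive are handled by the same readers as the rest of the data directory.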

System/Special Files (Ignored)

  • .DS_Store - macOS system file
  • .gitignore - Git configuration file

Audio/Video Formats (Skipped as per requirements)

  • .m4a - Audio file
  • .mp3 - Audio file
  • .mp4 - Video file
  • .ogg - Audio/Video file
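
The ignored and skipped lists above can be enforced when the reader is built. A sketch, assuming SimpleDirectoryReader's exclude_hidden (default True) and required_exts parameters. Hidden-file exclusion already drops dotfiles such as .DS_Store and .gitignore, and listing only the supported extensions keeps audio and video out; recent versions otherwise map .mp3 and .mp4 to a Whisper-based VideoAudioReader. classify and build_reader are hypothetical helpers. .zip is left out of required_exts because archives are extracted first, and SimpleDirectoryReader would otherwise fall back to reading an unknown binary file as plain text.

```python
from pathlib import Path

# Extension lists taken from the tables and lists in this document.
SUPPORTED_EXTS = [".pdf", ".docx", ".xlsx", ".pptx", ".odt", ".png", ".jpg", ".zip"]
IGNORED_NAMES = {".DS_Store", ".gitignore"}
SKIPPED_EXTS = {".m4a", ".mp3", ".mp4", ".ogg"}


def classify(path: str) -> str:
    """Return 'ignored', 'skipped', 'supported' or 'unknown' for a file path."""
    p = Path(path)
    # Dotfiles have an empty suffix, so they are matched by full name.
    if p.name in IGNORED_NAMES:
        return "ignored"
    ext = p.suffix.lower()
    if ext in SKIPPED_EXTS:
        return "skipped"
    if ext in SUPPORTED_EXTS:
        return "supported"
    return "unknown"


def build_reader(data_dir: str = "./../../../data"):
    """Hypothetical helper: a reader that only sees supported, non-archive files."""
    # Imported here so classify() works without LlamaIndex installed.
    from llama_index.core import SimpleDirectoryReader

    return SimpleDirectoryReader(
        input_dir=data_dir,
        recursive=True,
        # Already the default: skips dotfiles such as .DS_Store and .gitignore.
        exclude_hidden=True,
        # Audio/video never match; .zip is handled by extracting it first.
        required_exts=[e for e in SUPPORTED_EXTS if e != ".zip"],
    )
```

classify can also be used to log which files in the data directory were left out, and why.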

Notes

  1. Many of these file types can be loaded with SimpleDirectoryReader, which picks a reader for each file based on its extension.
  2. For more advanced parsing, a specific reader can be used directly, or passed to SimpleDirectoryReader through its file_extractor argument, to access options the default readers do not expose.
  3. All required dependencies have been installed with llama-index-readers-file and patool for archive support.
  4. No external API keys are required for the supported file types, because all parsing runs locally.
  5. The system prioritizes local processing over cloud services as per requirements.