Loader
Before you can start indexing your documents, you need to load them into memory.
All "basic" data loaders are listed below, mapped to their respective file types in `SimpleDirectoryReader`. More loaders are shown in the sidebar on the left.
Additionally, the following loaders exist without separate documentation:
- `AssemblyAIReader` transcribes audio using AssemblyAI:
  - `AudioTranscriptReader`: loads the entire transcript as a single document.
  - `AudioTranscriptParagraphsReader`: creates a document per paragraph.
  - `AudioTranscriptSentencesReader`: creates a document per sentence.
  - `AudioSubtitlesReader`: creates a document containing the subtitles of a transcript.
- `NotionReader` loads Notion pages.
- `SimpleMongoReader` loads data from a MongoDB database.
Check the LlamaIndexTS GitHub repository for the most up-to-date overview of integrations.
SimpleDirectoryReader
LlamaIndex.TS supports easy loading of files from folders using the `SimpleDirectoryReader` class. It reads all files from a directory and its subdirectories.
```typescript
import { SimpleDirectoryReader } from "llamaindex/readers/SimpleDirectoryReader";
// or
// import { SimpleDirectoryReader } from "llamaindex";

const reader = new SimpleDirectoryReader();
// Read every file under ../data (including subdirectories) into Document objects.
const documents = await reader.loadData("../data");

documents.forEach((doc) => {
  console.log(`document (${doc.id_}):`, doc.getText());
});
```
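To make "reads all files from a directory and its subdirectories" concrete, here is a minimal sketch of a recursive directory walk using Node's `node:fs` API. The helper `listFilesRecursively` is hypothetical and not part of LlamaIndex.TS; the library's actual traversal (for example, how it treats symlinks or hidden files) may differ.

```typescript
import { mkdirSync, mkdtempSync, readdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join, relative, sep } from "node:path";

// Hypothetical helper (not part of LlamaIndex.TS): returns every regular file
// under `dir`, descending into subdirectories depth-first.
function listFilesRecursively(dir: string): string[] {
  const files: string[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const fullPath = join(dir, entry.name);
    if (entry.isDirectory()) {
      files.push(...listFilesRecursively(fullPath));
    } else if (entry.isFile()) {
      files.push(fullPath);
    }
  }
  return files;
}

// Usage: build a tiny tree in a temporary directory and list it.
const root = mkdtempSync(join(tmpdir(), "reader-demo-"));
mkdirSync(join(root, "nested"));
writeFileSync(join(root, "a.txt"), "hello");
writeFileSync(join(root, "nested", "b.md"), "# world");

// Paths relative to `root`, normalized to "/" and sorted (readdir order is not guaranteed).
const found = listFilesRecursively(root)
  .map((p) => relative(root, p).split(sep).join("/"))
  .sort();
console.log(found); // [ 'a.txt', 'nested/b.md' ]
```

Each path found this way would then be handed to a reader chosen by its extension.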
Currently, the following readers are mapped to specific file types:
- `TextFileReader`: `.txt`
- `PDFReader`: `.pdf`
- `PapaCSVReader`: `.csv`
- `MarkdownReader`: `.md`
- `DocxReader`: `.docx`
- `HTMLReader`: `.htm`, `.html`
- `ImageReader`: `.jpg`, `.jpeg`, `.png`, `.gif`
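The mapping above amounts to a lookup from file extension to reader. The following sketch shows that lookup with reader names as plain strings; the table `EXT_TO_READER_NAME` and function `readerNameFor` are hypothetical (the library maps extensions to reader instances). Extension keys are written without the leading dot, as in the `zip` key of the `fileExtToReader` example later on this page; treating extensions case-insensitively is an assumption.

```typescript
import { extname } from "node:path";

// Hypothetical table mirroring the list above: extension (no dot) -> reader name.
// A Map avoids accidental hits on inherited object keys like "constructor".
const EXT_TO_READER_NAME = new Map<string, string>([
  ["txt", "TextFileReader"],
  ["pdf", "PDFReader"],
  ["csv", "PapaCSVReader"],
  ["md", "MarkdownReader"],
  ["docx", "DocxReader"],
  ["htm", "HTMLReader"],
  ["html", "HTMLReader"],
  ["jpg", "ImageReader"],
  ["jpeg", "ImageReader"],
  ["png", "ImageReader"],
  ["gif", "ImageReader"],
]);

// Returns the reader name for a file, or undefined if its extension is unmapped.
// Assumption: matching ignores the leading dot and letter case.
function readerNameFor(filePath: string): string | undefined {
  const ext = extname(filePath).slice(1).toLowerCase();
  return EXT_TO_READER_NAME.get(ext);
}

console.log(readerNameFor("docs/report.PDF")); // PDFReader
console.log(readerNameFor("site/index.htm")); // HTMLReader
console.log(readerNameFor("archive.zip")); // undefined
```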
You can customize which reader is used in three ways:
- `overrideReader` overrides the reader for all file types, including unsupported ones.
- `fileExtToReader` maps a reader to a specific file type. It can override the reader for existing file types or add support for new ones.
- `defaultReader` sets a fallback reader for files with unsupported extensions. By default, it is `TextFileReader`.
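One way to read these three options is as a precedence chain. The sketch below is hypothetical: `Reader`, `ReaderOptions`, and `pickReader` are illustrative names, and the order (override first, then the extension map, then the fallback) is an assumption inferred from the descriptions above rather than a copy of the library code.

```typescript
// Hypothetical minimal reader shape; real readers are class instances.
interface Reader {
  name: string;
}

// Option names match the three SimpleDirectoryReader options described above.
interface ReaderOptions {
  overrideReader?: Reader;
  fileExtToReader?: Record<string, Reader>;
  defaultReader?: Reader;
}

// Assumed precedence: overrideReader wins outright, then an exact extension
// match in fileExtToReader, then defaultReader. Returns undefined if none apply.
function pickReader(ext: string, opts: ReaderOptions): Reader | undefined {
  if (opts.overrideReader) {
    return opts.overrideReader;
  }
  const map = opts.fileExtToReader;
  // hasOwnProperty avoids matching inherited keys such as "constructor".
  if (map && Object.prototype.hasOwnProperty.call(map, ext)) {
    return map[ext];
  }
  return opts.defaultReader;
}

const text: Reader = { name: "TextFileReader" };
const zip: Reader = { name: "ZipReader" };
const opts: ReaderOptions = { fileExtToReader: { zip }, defaultReader: text };

console.log(pickReader("zip", opts)?.name); // ZipReader
console.log(pickReader("log", opts)?.name); // TextFileReader
console.log(pickReader("zip", { ...opts, overrideReader: { name: "CustomReader" } })?.name); // CustomReader
```

Under this reading, `defaultReader` only matters for extensions missing from `fileExtToReader`, and neither matters once `overrideReader` is set.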
`SimpleDirectoryReader` supports up to 9 concurrent requests. Use the `numWorkers` option to set the number of concurrent requests. By default, it runs sequentially, i.e. `numWorkers` is 1.
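As an illustration of what a `numWorkers` limit means, the following sketch runs an async task over a list of items with at most `numWorkers` tasks in flight, keeping results in input order. The helper `mapWithWorkers` and the fake `fakeRead` task are hypothetical; rejecting values outside 1 to 9 mirrors the limit stated above, but how the library actually handles out-of-range values is an assumption here.

```typescript
// Hypothetical helper: run `task` over `items` with at most `numWorkers`
// tasks in flight, preserving input order in the result.
// Assumption: values outside 1..9 are rejected, mirroring the documented limit.
async function mapWithWorkers<T, R>(
  items: T[],
  numWorkers: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(numWorkers) || numWorkers < 1 || numWorkers > 9) {
    throw new RangeError("numWorkers must be an integer between 1 and 9");
  }
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each worker repeatedly claims the next unprocessed index until none remain.
  // `next++` runs synchronously, so two workers never claim the same index.
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i]);
    }
  };
  await Promise.all(Array.from({ length: Math.min(numWorkers, items.length) }, worker));
  return results;
}

// Usage: track the peak number of simultaneous fake "file reads".
let inFlight = 0;
let peak = 0;
const fakeRead = async (name: string): Promise<string> => {
  inFlight++;
  peak = Math.max(peak, inFlight);
  await new Promise((resolve) => setTimeout(resolve, 10));
  inFlight--;
  return name.toUpperCase();
};

mapWithWorkers(["a.txt", "b.md", "c.pdf", "d.csv"], 2, fakeRead).then((out) => {
  console.log(out, peak); // [ 'A.TXT', 'B.MD', 'C.PDF', 'D.CSV' ] 2
});
```

With `numWorkers` set to 1 the same helper degenerates to strictly sequential processing, which corresponds to the default described above.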
Example
```typescript
import type { Document, Metadata } from "llamaindex";
import { FileReader } from "llamaindex";
import {
  FILE_EXT_TO_READER,
  SimpleDirectoryReader,
} from "llamaindex/readers/SimpleDirectoryReader";
import { TextFileReader } from "llamaindex/readers/TextFileReader";

// A custom reader for .zip files; it receives the raw file bytes.
class ZipReader extends FileReader {
  loadDataAsContent(fileContent: Uint8Array): Promise<Document<Metadata>[]> {
    // Placeholder: a real implementation would extract the archive
    // and build Documents from its contents.
    throw new Error("Implement me");
  }
}

const reader = new SimpleDirectoryReader();
const documents = await reader.loadData({
  directoryPath: "../data",
  // Fallback for any extension not listed in fileExtToReader.
  defaultReader: new TextFileReader(),
  fileExtToReader: {
    // Keep the built-in mappings and add support for .zip files.
    ...FILE_EXT_TO_READER,
    zip: new ZipReader(),
  },
});

documents.forEach((doc) => {
  console.log(`document (${doc.id_}):`, doc.getText());
});
```
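In the example above, `loadDataAsContent` receives the raw bytes of a file and resolves to documents. As a dependency-free sketch of that contract, the hypothetical `LineReader` below decodes the bytes as UTF-8 (an assumption about the input) and emits one document per non-empty line. It uses a local `SimpleDoc` interface instead of the library's `Document` class, so it does not extend `FileReader`; in real code you would subclass `FileReader` as the example does.

```typescript
import { TextDecoder, TextEncoder } from "node:util";

// Hypothetical stand-in for a LlamaIndex Document, used only in this sketch.
interface SimpleDoc {
  text: string;
  metadata: Record<string, string | number>;
}

// Hypothetical reader with the same method shape as loadDataAsContent:
// raw file bytes in, a promise of documents out.
class LineReader {
  async loadDataAsContent(fileContent: Uint8Array): Promise<SimpleDoc[]> {
    const text = new TextDecoder("utf-8").decode(fileContent);
    return text
      .split(/\r?\n/)
      .map((line, index) => ({ line: line.trim(), index }))
      // Skip blank lines but keep the original 1-based line number as metadata.
      .filter(({ line }) => line.length > 0)
      .map(({ line, index }) => ({ text: line, metadata: { lineNumber: index + 1 } }));
  }
}

const bytes = new TextEncoder().encode("first line\n\nsecond line\n");
new LineReader().loadDataAsContent(bytes).then((docs) => {
  console.log(docs.map((d) => d.text)); // [ 'first line', 'second line' ]
});
```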