The file types listed in the table below can be parsed. Support depends on the method you use: the Playground, the API, or our Python and TypeScript libraries.
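| Category | File Type | Playground | API/Library |
|---|---|---|---|
| PDF | PDF | ✓ (up to 100 pages) | ✓ (see Rate Limits) |
| Images | JPEG | ✓ | ✓ |
| Images | JPG | ✓ | ✓ |
| Images | PNG | ✓ | ✓ |
| Images | APNG | × | ✓ |
| Images | BMP | × | ✓ |
| Images | DCX | × | ✓ |
| Images | DDS | × | ✓ |
| Images | DIB | × | ✓ |
| Images | GD | × | ✓ |
| Images | GIF | × | ✓ |
| Images | ICNS | × | ✓ |
| Images | JP2 (JP2000) | × | ✓ |
| Images | PCX | × | ✓ |
| Images | PPM | × | ✓ |
| Images | PSD | × | ✓ |
| Images | TGA | × | ✓ |
| Images | TIF | × | ✓ |
| Images | TIFF | × | ✓ |
| Images | WEBP | × | ✓ |
| Text Documents | DOC (Word) | ✓ | ✓ |
| Text Documents | DOCX (Word) | ✓ | ✓ |
| Text Documents | ODT (OpenDocument Text) | ✓ | ✓ |
| Presentations | PPT (PowerPoint) | ✓ | ✓ |
| Presentations | PPTX (PowerPoint) | ✓ | ✓ |
| Spreadsheets | CSV (comma-separated values) | ✓ (up to 10 MB) | ✓ (up to 50 MB) |
| Spreadsheets | XLSX (Microsoft Excel) | ✓ (up to 10 MB) | ✓ (up to 50 MB) |

For text documents and presentations, see the notes under File Conversion for Text Documents and Presentations. For spreadsheets, see the notes under Spreadsheet Considerations.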
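The spreadsheet size limits above can be checked locally before uploading. The following is a minimal sketch, not part of any official library: the function name `spreadsheet_size_ok` and the method labels `"playground"` and `"api"` are hypothetical, and it assumes "MB" means decimal megabytes (the stricter reading).

```python
from pathlib import Path

# Assumption: "MB" is read as decimal megabytes (the stricter interpretation).
MB = 1_000_000

# Limits taken from the table: CSV and XLSX share the same caps.
SPREADSHEET_LIMITS = {"playground": 10 * MB, "api": 50 * MB}
SPREADSHEET_EXTENSIONS = {".csv", ".xlsx"}


def spreadsheet_size_ok(filename: str, size_bytes: int, method: str) -> bool:
    """Return True if a CSV/XLSX file of this size fits the limit for `method`.

    `method` is a hypothetical label: "playground" or "api".
    """
    ext = Path(filename).suffix.lower()
    if ext not in SPREADSHEET_EXTENSIONS:
        raise ValueError(f"not a spreadsheet type covered by the table: {ext!r}")
    if method not in SPREADSHEET_LIMITS:
        raise ValueError(f"unknown method: {method!r}")
    return size_bytes <= SPREADSHEET_LIMITS[method]
```

For a file on disk, `Path(path).stat().st_size` gives the value to pass as `size_bytes`.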

Password-Protected PDFs

Parsing password-protected PDFs is not supported. If you attempt to parse a password-protected PDF, the API responds with a 422 error (Unprocessable Content):
HTTP-422: Failed to split PDF into pages. Ensure it is a valid PDF file. document closed or encrypted
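To avoid a wasted request, a client could screen PDFs before uploading and recognize this error when it does occur. The sketch below uses only the Python standard library; both function names are hypothetical. The pre-check is a heuristic, not a full PDF parser: encrypted PDFs reference an `/Encrypt` dictionary in their trailer, so the code simply looks for that token in the raw bytes and may occasionally misfire on unusual files.

```python
def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic check: encrypted PDFs carry an /Encrypt entry in the trailer.

    This scans the raw bytes for the token and does not parse the PDF,
    so treat the result as a hint rather than a guarantee.
    """
    return b"/Encrypt" in pdf_bytes


def is_password_protected_error(status_code: int, body: str) -> bool:
    """Return True if a response matches the 422 error shown above.

    Assumption: the error text keeps the word "encrypted", as in the
    sample message; the exact wording may change.
    """
    return status_code == 422 and "encrypted" in body.lower()
```

A caller might read the file with `Path(path).read_bytes()`, skip the upload when `looks_encrypted` returns `True`, and ask the user for an unprotected copy instead.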

File Conversion for Text Documents and Presentations

Text documents and presentations are converted to PDFs before they are parsed. This conversion may change the document layout and increase or decrease the number of pages. For example, unsupported fonts may be replaced with larger alternatives, causing text to wrap differently or overflow onto additional pages. Although the conversion may affect the layout, the content is still parsed correctly.

Spreadsheet Considerations

XLSX files with up to 65,536 rows and 65,536 columns per file are supported. When you load a spreadsheet in the Playground, a render limit applies and only a truncated version of the spreadsheet is displayed. This does not affect the parsing results.
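The row and column limits can be expressed as a simple guard. This is an illustrative sketch only: `within_xlsx_limits` is a hypothetical helper, and it assumes you have already obtained the row and column counts with a spreadsheet library of your choice.

```python
# Limits as stated above for XLSX files.
XLSX_MAX_ROWS = 65_536
XLSX_MAX_COLUMNS = 65_536


def within_xlsx_limits(rows: int, columns: int) -> bool:
    """Return True if the given dimensions fit the documented XLSX limits."""
    return 0 <= rows <= XLSX_MAX_ROWS and 0 <= columns <= XLSX_MAX_COLUMNS
```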