Request
Request Body
file
file (required) - PDF file to extract text from using OCR
Maximum 10MB, PDF format only
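The size and format limits above can be checked on the client before spending a request. The following is a minimal sketch, not part of the API: `checkPdfBeforeUpload` and `MAX_PDF_BYTES` are hypothetical names, and reading "10MB" as 10 * 1024 * 1024 bytes is an assumption, since the exact byte count is not stated here.

```javascript
// Assumption: "10MB" means 10 * 1024 * 1024 bytes; the API may count differently.
const MAX_PDF_BYTES = 10 * 1024 * 1024;

// Hypothetical helper: returns a list of problems found before upload.
// `bytes` is a Uint8Array (or Node Buffer) holding the file contents.
function checkPdfBeforeUpload(bytes) {
  const problems = [];
  if (bytes.length === 0) {
    problems.push('file is empty');
  }
  if (bytes.length > MAX_PDF_BYTES) {
    problems.push('file exceeds 10MB');
  }
  // PDF files begin with the ASCII signature "%PDF-".
  const signature = [0x25, 0x50, 0x44, 0x46, 0x2d];
  const hasSignature = signature.every((byte, i) => bytes[i] === byte);
  if (!hasSignature) {
    problems.push('file does not start with the %PDF- signature');
  }
  return problems;
}
```

An empty array means the file passed these local checks; the server may still reject it, for example if the PDF is corrupted.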
OCR Technology
Powered by Google Document AI for industry-leading accuracy across 50+ languages with advanced text detection and structure analysis.
curl --request POST \
  --url https://pdfmage.app/api/v1/document-ocr \
  --header 'Accept: application/json' \
  --header 'Authorization: Bearer pk_live_abc123...' \
  --form 'file=@/path/to/document.pdf'
// pdfFile is a File or Blob containing the PDF (for example, from a file input)
const formData = new FormData();
formData.append('file', pdfFile);

const response = await fetch('https://pdfmage.app/api/v1/document-ocr', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer pk_live_abc123...'
  },
  body: formData
});

if (response.ok) {
  const ocrResult = await response.json();
  console.log('Extracted text:', ocrResult.extractedText.fullText);
  console.log('Pages processed:', ocrResult.extractedText.pages.length);
} else {
  const error = await response.json();
  console.error('Error:', error);
}
Response
Response Body
fileName
string - Original filename of processed document
processedAt
string (ISO 8601) - Processing completion timestamp
documentId
number - Unique identifier for this OCR operation
extractedText
object - Complete text extraction results with metadata
  fullText
  string - Complete extracted text from all pages
  pages
  array - Per-page text extraction results
  metadata
  object - Processing metadata
structuredElements
object - Document structure analysis results
  textBlocks
  number - Number of text blocks detected
  paragraphs
  number - Number of paragraphs detected
  lines
  number - Number of text lines detected
  tokens
  number - Number of individual tokens detected
{
  "fileName": "contract.pdf",
  "processedAt": "2024-01-15T10:30:45.123Z",
  "documentId": 12345,
  "extractedText": {
    "fullText": "Employment Agreement\n\nThis Employment Agreement is entered into between John Doe and Acme Corporation...",
    "pages": [
      {
        "pageNumber": 1,
        "text": "Employment Agreement\n\nThis Employment Agreement is entered into between John Doe...",
        "confidence": 0.98
      },
      {
        "pageNumber": 2,
        "text": "Compensation and Benefits\n\nThe Employee shall receive an annual salary of $75,000...",
        "confidence": 0.96
      }
    ],
    "metadata": {
      "pageCount": 2,
      "language": "en-US",
      "processingTime": 1247
    }
  },
  "structuredElements": {
    "textBlocks": 15,
    "paragraphs": 8,
    "lines": 42,
    "tokens": 387
  }
}
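Before reading nested fields, a client may want to confirm that a parsed body has the documented shape. This is a rough sketch based only on the fields listed above; `looksLikeOcrResult` is a hypothetical name, and the check is deliberately shallow (it does not inspect individual page entries).

```javascript
// Hypothetical helper: checks that a parsed body has the fields documented above.
function looksLikeOcrResult(body) {
  if (body === null || typeof body !== 'object') return false;
  const extracted = body.extractedText;
  const structure = body.structuredElements;
  if (!extracted || typeof extracted !== 'object') return false;
  if (!structure || typeof structure !== 'object') return false;
  // All four structure counters are documented as numbers.
  const countsOk = ['textBlocks', 'paragraphs', 'lines', 'tokens']
    .every((key) => typeof structure[key] === 'number');
  return typeof body.fileName === 'string'
    && typeof body.processedAt === 'string'
    && typeof body.documentId === 'number'
    && typeof extracted.fullText === 'string'
    && Array.isArray(extracted.pages)
    && typeof extracted.metadata === 'object'
    && extracted.metadata !== null
    && countsOk;
}
```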
HTTP/1.1 200 OK
Content-Type: application/json
X-Credits-Used: 0.02
X-Credits-Remaining: 4.98
X-Credits-Currency: USD
X-Processing-Time: 1247
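The credit headers shown above can be read from the response alongside the JSON body. The sketch below assumes only the header names from the example; `readCreditHeaders` is a hypothetical helper, and the unit of `X-Processing-Time` is not stated in this section, so the value is passed through as a plain number.

```javascript
// Hypothetical helper: `headers` is anything with a get(name) method,
// such as the `headers` property of a fetch Response.
function readCreditHeaders(headers) {
  const toNumber = (name) => {
    const raw = headers.get(name);
    if (raw === null || raw === undefined) return null;
    const value = Number(raw);
    return Number.isNaN(value) ? null : value;
  };
  return {
    creditsUsed: toNumber('X-Credits-Used'),
    creditsRemaining: toNumber('X-Credits-Remaining'),
    currency: headers.get('X-Credits-Currency') ?? null,
    processingTime: toNumber('X-Processing-Time'),
  };
}
```

With the fetch example, this would be called as `readCreditHeaders(response.headers)`. In a browser, custom headers are readable only if the server exposes them to cross-origin scripts, so missing values come back as `null` rather than throwing.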
Error Responses
400 Bad Request
Invalid file format, missing file, or corrupted PDF
401 Unauthorized
Invalid or missing API key
402 Payment Required
Insufficient credit balance
413 Payload Too Large
File exceeds maximum size limit (10MB)
422 Unprocessable Entity
PDF contains no extractable text (blank pages, images only)
{
  "error": "Bad Request",
  "message": "Invalid file format",
  "details": {
    "code": "INVALID_FILE_FORMAT",
    "allowedFormats": ["pdf"],
    "receivedFormat": "docx"
  },
  "timestamp": "2024-01-15T10:30:00Z",
  "requestId": "req_abc123"
}
{
  "error": "Unprocessable Entity",
  "message": "No extractable text found",
  "details": {
    "code": "NO_TEXT_CONTENT",
    "pageCount": 3,
    "reason": "Document contains only images"
  },
  "timestamp": "2024-01-15T10:30:00Z",
  "requestId": "req_def456"
}
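One way to turn these error bodies into actionable messages is a small lookup keyed by HTTP status. This is a sketch under assumptions: the numeric codes are the standard HTTP codes for the reason phrases listed above, the hint texts are illustrative wording, and `describeOcrError` and `OCR_ERROR_HINTS` are hypothetical names.

```javascript
// Standard HTTP status codes for the error names listed above; hints are illustrative.
const OCR_ERROR_HINTS = {
  400: 'Check that the upload is a valid, uncorrupted PDF sent in the "file" field.',
  401: 'Check the API key in the Authorization header.',
  402: 'Add credits before retrying.',
  413: 'Reduce the file below 10MB.',
  422: 'The PDF has no extractable text; resending the same file will not help.',
};

// Hypothetical helper: `body` is the parsed JSON error body, if any.
function describeOcrError(status, body) {
  const hint = OCR_ERROR_HINTS[status] ?? 'Unexpected error.';
  const code = body?.details?.code ?? 'UNKNOWN';
  const requestId = body?.requestId ?? 'n/a';
  return `${status} ${code}: ${hint} (requestId: ${requestId})`;
}
```

Including `requestId` in the message keeps the identifier from the error body at hand when a failure needs to be reported.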
OCR Capabilities
Multi-Language Support
Supports 50+ languages, including English, Spanish, French, German, and Chinese.
High Accuracy
Industry-leading OCR accuracy with confidence scores for quality assurance.
Structured Output
Get structured text with paragraph, line, and token-level analysis.
Powered by Google Document AI
Our OCR service uses Google's enterprise-grade Document AI technology, the same technology used by major corporations for critical document processing.
Common Use Cases
Document Digitization
Convert scanned PDFs and image-based documents into searchable, editable text for digital archiving and processing.
Data Extraction
Extract structured data from invoices, contracts, forms, and reports for automated processing workflows.
Content Analysis
Analyze document content for compliance, classification, or information extraction in legal and regulatory contexts.
Multi-Language Processing
Process international documents with automatic language detection and accurate text extraction across multiple languages.
Quality Assurance
Use confidence scores and structured output to validate OCR results and ensure data quality in production systems.
Preprocessing
Prepare documents for form filling or data mapping by extracting clean, structured text content.
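For the quality assurance use case above, the per-page `confidence` values in the response can be used to flag pages for manual review. The sketch below is illustrative only: `findLowConfidencePages` is a hypothetical helper, and the default threshold of 0.9 is an arbitrary choice, not a documented recommendation.

```javascript
// Assumption: 0.9 is an arbitrary illustrative threshold.
// Returns the page numbers whose confidence falls below the threshold.
function findLowConfidencePages(ocrResult, threshold = 0.9) {
  return ocrResult.extractedText.pages
    .filter((page) => page.confidence < threshold)
    .map((page) => page.pageNumber);
}
```

A suitable threshold depends on how costly a misread is for the workflow, so it is worth tuning against a sample of real documents.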