POST ($0.01/page)

https://pdfmage.app/api/v1/document-ocr

Document OCR

Extract text from PDF documents using OCR. The response contains the full text plus per-page blocks, paragraphs, lines, and tokens, each with a normalized bounding box and a confidence score.

Request

multipart/form-data

Request Body

file: file, required

PDF file to extract text from using OCR

Maximum 10MB, PDF format only
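The limits above can be checked on the client before spending a request. The following is a minimal sketch, not part of the API: `precheck_pdf` and `MAX_BYTES` are hypothetical names, and it assumes "10MB" means 10 MiB (the documentation does not say which) and that a valid PDF begins with the `%PDF-` signature.

```python
from __future__ import annotations

# Assumption: "10MB" is read as 10 MiB; the API docs do not define the unit precisely.
MAX_BYTES = 10 * 1024 * 1024


def precheck_pdf(data: bytes) -> list[str]:
    """Return a list of problems; an empty list means the upload looks acceptable."""
    problems = []
    if len(data) > MAX_BYTES:
        problems.append("file exceeds 10MB limit")
    # PDF files normally start with the ASCII signature "%PDF-".
    if not data.startswith(b"%PDF-"):
        problems.append("file does not start with the PDF signature")
    return problems
```

A local check like this only avoids obvious 400 and 413 responses; the server remains the authority on what it accepts.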

OCR Technology

Industry-leading accuracy across 50+ languages with advanced text detection and structure analysis.

Request Example

curl --request POST \
  --url https://pdfmage.app/api/v1/document-ocr \
  --header 'Accept: application/json' \
  --header 'Authorization: Bearer pk_live_abc123...' \
  --form 'file=@/path/to/document.pdf'
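For callers who prefer Python to curl, the same request might be sketched with the standard library alone. The helper names (`build_multipart`, `build_request`, `run_ocr`) are illustrative assumptions, not part of any official client; only the URL, the `file` field name, and the two headers come from the curl example above.

```python
from __future__ import annotations

import json
import os
import urllib.request
import uuid

API_URL = "https://pdfmage.app/api/v1/document-ocr"


def build_multipart(filename: str, data: bytes, boundary: str | None = None) -> tuple[bytes, str]:
    """Encode one PDF as a multipart/form-data body with a single 'file' part."""
    boundary = boundary or uuid.uuid4().hex
    # Note: filenames containing double quotes would need escaping; not handled here.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def build_request(api_key: str, filename: str, data: bytes) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body, content_type = build_multipart(filename, data)
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {api_key}",
            "Content-Type": content_type,
        },
    )


def run_ocr(api_key: str, path: str) -> dict:
    """Upload the file at `path` and return the decoded JSON response."""
    with open(path, "rb") as fh:
        request = build_request(api_key, os.path.basename(path), fh.read())
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the encoding logic testable without network access; the 120-second timeout is an arbitrary choice, not a documented limit.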

Response

application/json

Response Body

fileName: string - Original filename of processed document

processedAt: string (ISO 8601) - Processing completion timestamp

documentId: number - Unique identifier for this OCR operation

extractedText: object

Complete text extraction with spatial data

fullText: string - Complete extracted text from all pages

pages: array - Comprehensive per-page data

• pageNumber: number - Page number (1-based)

• dimensions: object - Page dimensions (width, height, unit)

• text: string - Extracted text for this page

• confidence: number - Average confidence score (0-1)

• blocks: array - Text blocks with bounding boxes

• paragraphs: array - Paragraphs with bounding boxes

• lines: array - Text lines with bounding boxes

• tokens: array - Individual tokens with bounding boxes

metadata: object

Processing metadata and statistics

pageCount: number - Total pages processed

processingTimeMs: number - Processing time in milliseconds

language: string - Document language (default: 'en')

elementCounts: object - Count of detected elements

• blocks: number - Text blocks detected

• paragraphs: number - Paragraphs detected

• lines: number - Text lines detected

• tokens: number - Individual tokens detected

Element Structure (format)

Each text element (block, paragraph, line, token) contains:

text: string - The extracted text content

boundingBox: object - Normalized coordinates (0-1 range)

• vertices: array - Four corner points [{x, y}]

• normalized: boolean - Always true (coordinates are normalized)

confidence: number - OCR confidence score (0-1)
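Because the coordinates are normalized, drawing or cropping requires scaling them by the page dimensions. The sketch below shows one way to do that. It assumes x scales with the page width and y with the page height, with the origin at the top-left corner (which the sample values suggest but the documentation does not state). `to_page_units` and `to_rect` are hypothetical helper names.

```python
from __future__ import annotations


def to_page_units(bounding_box: dict, dimensions: dict) -> list[tuple[float, float]]:
    """Scale normalized vertices into the unit given by the page's `dimensions`."""
    vertices = bounding_box["vertices"]
    if not bounding_box.get("normalized", False):
        # Already absolute; return unchanged.
        return [(v["x"], v["y"]) for v in vertices]
    width, height = dimensions["width"], dimensions["height"]
    return [(v["x"] * width, v["y"] * height) for v in vertices]


def to_rect(points: list[tuple[float, float]]) -> tuple[float, float, float, float]:
    """Collapse four corner points into an axis-aligned (left, top, width, height) box."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)
```

Taking the min and max of the vertices rather than trusting their order keeps `to_rect` correct even if the corner ordering differs from the sample.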

Success Response Example

{
  "fileName": "contract.pdf",
  "processedAt": "2024-01-15T10:30:45.123Z",
  "documentId": 12345,
  "extractedText": {
    "fullText": "Employment Agreement\n\nThis Employment Agreement is entered into between John Doe and Acme Corporation...",
    "pages": [
      {
        "pageNumber": 1,
        "dimensions": {
          "width": 612,
          "height": 792,
          "unit": "px"
        },
        "text": "Employment Agreement\n\nThis Employment Agreement is entered into between John Doe...",
        "confidence": 0.98,
        "blocks": [
          {
            "text": "Employment Agreement",
            "boundingBox": {
              "vertices": [
                {"x": 0.35, "y": 0.08},
                {"x": 0.65, "y": 0.08},
                {"x": 0.65, "y": 0.12},
                {"x": 0.35, "y": 0.12}
              ],
              "normalized": true
            },
            "confidence": 0.99
          },
          {
            "text": "This Employment Agreement is entered into between John Doe and Acme Corporation...",
            "boundingBox": {
              "vertices": [
                {"x": 0.1, "y": 0.15},
                {"x": 0.9, "y": 0.15},
                {"x": 0.9, "y": 0.25},
                {"x": 0.1, "y": 0.25}
              ],
              "normalized": true
            },
            "confidence": 0.97
          }
        ],
        "paragraphs": [...],
        "lines": [...],
        "tokens": [...]
      }
    ]
  },
  "metadata": {
    "pageCount": 2,
    "processingTimeMs": 1247,
    "language": "en",
    "elementCounts": {
      "blocks": 15,
      "paragraphs": 28,
      "lines": 142,
      "tokens": 487
    }
  }
}
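A common use of this structure is flagging text the OCR engine was unsure about. The following is a small sketch that walks `extractedText.pages[].blocks[]` and yields blocks below a confidence threshold. The function name and the default threshold of 0.9 are illustrative assumptions, not recommendations from the API.

```python
from __future__ import annotations

from typing import Iterator


def low_confidence_blocks(response: dict, threshold: float = 0.9) -> Iterator[tuple[int, str, float]]:
    """Yield (pageNumber, text, confidence) for each block scoring below `threshold`."""
    for page in response["extractedText"]["pages"]:
        for block in page.get("blocks", []):
            if block["confidence"] < threshold:
                yield page["pageNumber"], block["text"], block["confidence"]
```

The same loop works for `paragraphs`, `lines`, or `tokens` by changing the key, since all four element types share the structure described earlier.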

Response Headers

HTTP/1.1 200 OK
Content-Type: application/json
X-Credits-Used: 0.02
X-Credits-Remaining: 4.98
X-Credits-Currency: USD
X-Processing-Time: 1247
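The credit headers can be read to track spend per request. The sketch below parses them with `Decimal` to avoid floating-point rounding on currency values. It assumes `X-Processing-Time` is in milliseconds (it matches `processingTimeMs` in the example body, but the unit is not documented) and uses the $0.01/page price from the top of this page. The function names are hypothetical.

```python
from __future__ import annotations

from decimal import Decimal

PRICE_PER_PAGE = Decimal("0.01")  # from the endpoint's listed price


def parse_credit_headers(headers) -> dict:
    """Extract credit usage from a mapping of response headers (case-insensitive)."""
    lowered = {key.lower(): value for key, value in headers.items()}
    return {
        "used": Decimal(lowered["x-credits-used"]),
        "remaining": Decimal(lowered["x-credits-remaining"]),
        "currency": lowered.get("x-credits-currency"),
        # Assumption: the value is in milliseconds.
        "processing_ms": int(lowered["x-processing-time"]),
    }


def pages_affordable(remaining: Decimal, price: Decimal = PRICE_PER_PAGE) -> int:
    """Number of whole pages the remaining balance covers at the given price."""
    return int(remaining // price)
```

With the sample headers, the 0.02 credits used correspond to the two pages reported in `metadata.pageCount` at $0.01 each.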

Error Responses

400 Bad Request - Invalid file format, missing file, or corrupted PDF

401 Unauthorized - Invalid or missing API key

402 Payment Required - Insufficient credit balance

413 Payload Too Large - File exceeds maximum size limit (10MB)

422 Unprocessable Entity - PDF contains no extractable text (blank pages, images only)

Error Response Example

{
  "error": "Bad Request",
  "message": "Invalid file format", 
  "details": {
    "code": "INVALID_FILE_FORMAT",
    "allowedFormats": ["pdf"],
    "receivedFormat": "docx"
  },
  "timestamp": "2024-01-15T10:30:00Z",
  "requestId": "req_abc123"
}

Processing Error Example

{
  "error": "Unprocessable Entity",
  "message": "No extractable text found",
  "details": {
    "code": "NO_TEXT_CONTENT",
    "pageCount": 3,
    "reason": "Document contains only images"
  },
  "timestamp": "2024-01-15T10:30:00Z", 
  "requestId": "req_def456"
}
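Both error examples share the same envelope (`error`, `message`, `details.code`, `requestId`), so a client can turn them into one exception type. The following is a minimal sketch under that assumption; `OcrApiError` and `parse_error` are hypothetical names, and the fallback for non-JSON bodies is a defensive guess rather than documented behavior.

```python
from __future__ import annotations

import json


class OcrApiError(Exception):
    """Error built from the HTTP status and the JSON error envelope."""

    def __init__(self, status: int, payload: dict):
        details = payload.get("details") or {}
        self.status = status
        self.code = details.get("code")  # e.g. "INVALID_FILE_FORMAT"
        self.details = details
        self.request_id = payload.get("requestId")
        super().__init__(f"{status} {payload.get('error', 'Error')}: {payload.get('message', '')}")


def parse_error(status: int, body: bytes) -> OcrApiError:
    """Decode an error response body, tolerating bodies that are not JSON objects."""
    try:
        payload = json.loads(body)
    except ValueError:
        payload = {"message": body.decode("utf-8", "replace")}
    if not isinstance(payload, dict):
        payload = {"message": str(payload)}
    return OcrApiError(status, payload)
```

Branching on `code` rather than on `message` is likely more robust, since the codes look like stable identifiers while the messages are human-readable text. Logging `request_id` makes it easier to reference a failed call in a support request.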