Document Scanner App API – Extract Documents from Photos

Build a document scanner app with the MaraDocs API. Automatically detect, crop, and extract documents from smartphone photos – no OpenCV or ML models required. (With Code Examples)

Martin Kurtz
Tags: API, Document Processing, Scanner, PDF, Developer

Do you deal with documents sent to your company as photos attached to emails? At our founders' law firm, this was a persistent problem that tied up staff time. Clients photograph invoices, accident reports, and contracts on the kitchen table and hit send. You end up with skewed JPGs that need cropping, perspective correction, and conversion to PDF before filing.

A document scanner app API that detects the document boundary, corrects perspective, and outputs a crisp PDF would save hours. It sounds simple – but building it yourself rarely is.

Why Building a Document Scanner App Yourself Takes Weeks

If you try to build a document extraction solution in-house, you'll quickly find yourself reaching for OpenCV for edge detection and perspective transforms, Tesseract or a cloud OCR service for text recognition, PyMuPDF or ReportLab for PDF generation, and Pillow for image handling. You might add a document-detection model (e.g. layout transformers or SAM-based segmenters) for robust detection. Each piece works in isolation; wiring them into a reliable pipeline with proper error handling, virus scanning, and format validation takes weeks. Self-hosted machine learning models typically add GPU infrastructure, model hosting, and maintenance on top. What looks like a weekend hack rarely stays one.
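To give a sense of the DIY work, here is a minimal sketch in plain Python of one small step that usually precedes a perspective warp: putting the four detected corners into a consistent order and deciding how large the rectified page should be. The function names `order_corners` and `target_size` are made up for this illustration; they are not part of OpenCV or of the MaraDocs API, and the sum/difference heuristic is just one common approach.

```python
import math

def order_corners(points):
    """Return corners as (top-left, top-right, bottom-right, bottom-left).

    Assumes image coordinates (x grows right, y grows down) and uses the
    sum/difference heuristic: top-left has the smallest x + y, bottom-right
    the largest x + y, top-right the largest x - y, bottom-left the
    smallest x - y.
    """
    tl = min(points, key=lambda p: p[0] + p[1])
    br = max(points, key=lambda p: p[0] + p[1])
    tr = max(points, key=lambda p: p[0] - p[1])
    bl = min(points, key=lambda p: p[0] - p[1])
    return tl, tr, br, bl

def target_size(points):
    """Pick an output size for the rectified page, in whole pixels.

    Takes the longer of each pair of opposite edges so that no part of
    the page is downsampled by the warp.
    """
    tl, tr, br, bl = order_corners(points)
    width = max(math.dist(tl, tr), math.dist(bl, br))
    height = max(math.dist(tl, bl), math.dist(tr, br))
    return round(width), round(height)
```

The heuristic breaks down for pages rotated by roughly 45 degrees, which is one of many edge cases a production pipeline would still have to handle, before the actual warp, OCR, and PDF steps even start.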

How the MaraDocs Document Scanner API Solves This in Minutes

The MaraDocs Document Processing API reduces that workflow to a few API calls, without the integration headache. Upload a photo, validate it (virus scan and format checks included), detect documents in the image, extract and perspective-correct each one, and convert to PDF – all through a single REST API. No ML infrastructure, no Python dependency hell.

Document Scanner Workflow: Upload, Validate, Extract, Convert

Every MaraDocs workflow follows the same pattern: upload a file, validate it (virus scan + format validation), then chain operations. Validation is mandatory and happens before any processing. If a file is infected or corrupted, you get a clear error – no processing of untrusted data.

After validation, you chain operations by passing handles. For a document scanner flow: validate the image, call findDocuments to get quadrilateral coordinates, call extractQuadrilateral for each detected document, then convert to PDF with toPdf or ocrToPdf for searchable output. The entire pipeline stays server-side; you never re-upload the same file.
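The "for each detected document" part of that chain can be sketched as a loop. This is a minimal illustration, not official client code: it reuses the endpoint paths and field names from the Python example later in this post, and it assumes a hypothetical helper `run_job(path, payload)` that submits a job to the given endpoint, waits for it to finish, and returns the result as a dict.

```python
def scan_all_documents(run_job, img_handle):
    """Turn every document found in one validated image into a PDF handle.

    `run_job(path, payload)` is a hypothetical callable supplied by the
    caller; it is assumed to submit the job, wait for completion, and
    return the job's result dict.
    """
    found = run_job("img/find/documents", {"img_handle": img_handle})
    pdf_handles = []
    for doc in found.get("documents", []):
        # Each extraction starts from the same validated image handle,
        # so the photo is never uploaded a second time.
        extracted = run_job(
            "img/extract/quadrilateral",
            {"img_handle": img_handle, "quadrilateral": doc["quadrilateral"]},
        )
        pdf = run_job("img/ocr/to/pdf", {"img_handle": extracted["img_handle"]})
        pdf_handles.append(pdf["pdf_handle"])
    return pdf_handles
```

Passing `run_job` in as a parameter keeps the loop independent of the HTTP layer, so it can be exercised with a stub before any real request is made. A photo with no detectable document simply yields an empty list.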

Get your API key in under a minute

Register for a free account and get your API key in under a minute. Developer credits are included so you can start testing right away.

Try MaraDocs API now →

Why MaraDocs is Different: Workspaces, Webview, and German Data Privacy

Most document APIs force you to upload, download, then re-upload for each processing step. That means extra round-trips, more code to track file identities, and higher latency. MaraDocs uses workspaces instead: your server creates a workspace using its secret key and receives a workspace_secret. The client (browser or backend) uses this token for all operations. Files stay server-side, and each step returns a handle that you pass to the next: validate → find → extract → toPdf → download. The result is fewer network calls and simpler code.

MaraDocs Workspaces are a superior concept.

Sometimes automation hits an edge case: a tricky angle, multiple documents in one photo, or a format that needs manual adjustment. With MaraDocs, you can open app.maradocs.io and use the workspace secret to view, rearrange, and edit files directly. Your users get full manual control when the pipeline needs a human touch – a rare advantage over APIs that only offer programmatic access.

All processing runs on servers in Germany, under the control of Maramia GmbH. Data is encrypted at rest (SSE-C) and in transit (TLS). Workspaces expire after 7 days. No data leaves the EU. A data processing agreement (DPA) is available if you need one. For GDPR- and BDSG-sensitive workloads, this matters.

TypeScript Code for Extracting Documents from Photos

The MaraDocs TypeScript SDK handles polling for async jobs. The example below shows the full flow: upload, validate, process, and download.

API reference: workspace, data/upload, img/validate, img/find/documents, img/extract/quadrilateral, img/ocr/to/pdf, data/download/pdf

import { MaraDocsServer, MaraDocsClient } from "@maramia/maradocs-sdk-ts";
import { okImg } from "@maramia/maradocs-sdk-ts/models/img";

// Server: create workspace
const server = new MaraDocsServer({ secretKey: process.env.MARADOCS_SECRET_KEY! });
const { workspace_secret } = await server.workspace.create({});

// Client: upload, validate, find documents, extract, OCR, download
const client = new MaraDocsClient({ workspaceSecret: workspace_secret });
const uploaded = await client.data.upload(imageFile);
const validated = await client.img.validate({
  unvalidated_file_handle: uploaded.unvalidated_file_handle,
});
const imgHandle = okImg(validated);

const docs = await client.img.findDocuments({ img_handle: imgHandle });
if (docs.documents.length > 0) {
  const extracted = await client.img.extractQuadrilateral({
    img_handle: imgHandle,
    quadrilateral: docs.documents[0].quadrilateral,
  });
  const pdf = await client.img.ocrToPdf({
    img_handle: extracted.img_handle,
  });
  const blob = await client.data.downloadPdf({ pdf_handle: pdf.pdf_handle });
}

Or use the high-level flow.ocrImg for a full pipeline (upload, validate, find, extract, orient, OCR, optimize) in one call:

const pdfHandle = await client.flow.ocrImg(imageFile);
const blob = await client.data.downloadPdf({ pdf_handle: pdfHandle });

Python Code for Document Extraction

API reference: data/upload, img/validate, img/find/documents, img/extract/quadrilateral, img/ocr/to/pdf, data/download/pdf

import requests
import time

API_URL = "https://api.maradocs.io/v1"
WORKSPACE_SECRET = "..."  # from your server
headers = {"Authorization": f"Bearer {WORKSPACE_SECRET}"}

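def poll(job_url, job_id, timeout=60):
    # Poll the job until it completes; give up after `timeout` seconds
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        r = requests.get(f"{job_url}/{job_id}", headers=headers).json()
        if r["status"] == "complete":
            return r["response"]["response"]
        time.sleep(1)
    raise TimeoutError(f"Job {job_id} did not complete within {timeout}s")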

# 1. Upload
with open("photo.jpg", "rb") as f:
    upload = requests.post(f"{API_URL}/data/upload", headers=headers,
        files={"file": ("photo.jpg", f, "image/jpeg")}).json()
handle = upload["unvalidated_file_handle"]

# 2. Validate
val = requests.post(f"{API_URL}/img/validate", headers=headers,
    json={"unvalidated_file_handle": handle}).json()
img_handle = poll(f"{API_URL}/img/validate", val["job_id"])["img_handle"]

# 3. Find documents, 4. Extract, 5. OCR to PDF (simplified; poll each job)
find_res = requests.post(f"{API_URL}/img/find/documents", headers=headers,
    json={"img_handle": img_handle}).json()
find_data = poll(f"{API_URL}/img/find/documents", find_res["job_id"])
if find_data.get("documents"):
    quad = find_data["documents"][0]["quadrilateral"]
    ext_res = requests.post(f"{API_URL}/img/extract/quadrilateral", headers=headers,
        json={"img_handle": img_handle, "quadrilateral": quad}).json()
    extracted = poll(f"{API_URL}/img/extract/quadrilateral", ext_res["job_id"])
    ocr_res = requests.post(f"{API_URL}/img/ocr/to/pdf", headers=headers,
        json={"img_handle": extracted["img_handle"]}).json()
    ocr_data = poll(f"{API_URL}/img/ocr/to/pdf", ocr_res["job_id"])
    pdf_handle = ocr_data["pdf_handle"]

    # 6. Download
    pdf_resp = requests.get(f"{API_URL}/data/download/pdf", headers=headers,
        params={"pdf_handle": pdf_handle})
    pdf_resp.raise_for_status()  # don't write an error body to disk as a PDF
    with open("output.pdf", "wb") as out:
        out.write(pdf_resp.content)
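The submit-then-poll pattern repeats for every step above, so it can be folded into one helper. The following is a minimal sketch under the same assumptions as the example (a `job_id` in the submit response, a `status` of "complete", and the result nested under `response.response`); the helper name `run_job` and its parameters are made up for illustration. It takes any object with `requests.Session`-style `post()` and `get()` methods, which is assumed to already carry the Authorization header.

```python
import time

def run_job(session, api_url, path, payload, timeout=60.0, interval=1.0):
    """Submit a job, then poll until it completes or the timeout expires.

    `session` is expected to behave like requests.Session (post/get
    returning objects with raise_for_status() and json()).
    """
    url = f"{api_url}/{path}"
    submitted = session.post(url, json=payload)
    submitted.raise_for_status()
    job_id = submitted.json()["job_id"]

    deadline = time.monotonic() + timeout
    while True:
        status = session.get(f"{url}/{job_id}")
        status.raise_for_status()
        body = status.json()
        if body["status"] == "complete":
            return body["response"]["response"]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{path} job {job_id} did not complete in {timeout}s")
        time.sleep(interval)
```

With a session whose headers include the bearer token, the validation step would then read `run_job(session, API_URL, "img/validate", {"unvalidated_file_handle": handle})["img_handle"]`. How failed jobs are reported is not covered here, so this sketch only guards against HTTP errors and jobs that never finish.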

Summary: Document Scanner API Without the Integration Headache

A document scanner app API that extracts documents from photos, corrects perspective, and outputs PDFs is within reach. With MaraDocs, you avoid weeks of integrating OpenCV, Tesseract, and ML models. You get validation, workspaces, webview, and German data residency out of the box.

For more use cases, see PDF Handling – Combine, Split, Auto-Rotate; Auto-Rotation and Orientation Detection; Text Recognition (OCR); and Place Image or PDF on a Blank DIN A4 Page.


Try it: MaraDocs API | TypeScript SDK

