Email Attachment Processing API – Extract and Process Attachments
Process .eml and .msg files with the MaraDocs API. Extract email attachments, validate, and process as images or PDFs. Recursive email-in-email support.
Incoming emails often carry documents as attachments – images, PDFs, sometimes nested .eml or .msg files. At our firm, a single client email could contain a dozen photos of an accident report, a PDF expert opinion, and a forwarded email with its own attachments. An email attachment processing API that extracts, validates, and routes each attachment into the right pipeline would save hours of manual work.
Each attachment type needs its own path: images go to document extraction or OCR, PDFs go to merging or compression, and nested emails have to be unpacked recursively.
Why Building an Email Attachment Processing Solution Yourself Takes Weeks
If you build this yourself, parsing .eml and .msg files requires libraries such as mail-parser or extract-msg. You then have to decode MIME, handle nested messages, extract binaries, detect formats (magika, file signatures), scan for viruses, and branch to image or PDF logic. Each step introduces failure modes: malformed headers, unexpected encodings, and nested emails with their own attachments. Building a robust email attachment processing API in-house takes weeks.
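To give a sense of the do-it-yourself route, here is a minimal sketch that uses only Python's standard library email package to walk an .eml file and descend into attached emails. It covers MIME decoding and recursion only; .msg parsing, format detection, and virus scanning are not included, and the helper names collect_attachments and build_sample_eml are made up for this illustration.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def collect_attachments(msg, depth=0):
    """Return (depth, filename, content_type, payload) for every attachment.

    Attached emails (message/rfc822) are not returned themselves; the walk
    descends into them and reports their attachments with depth + 1.
    """
    found = []
    for part in msg.iter_attachments():
        if part.get_content_type() == "message/rfc822":
            nested = part.get_content()  # the embedded EmailMessage
            found.extend(collect_attachments(nested, depth + 1))
        else:
            payload = part.get_payload(decode=True)  # bytes after transfer decoding
            found.append((depth, part.get_filename(), part.get_content_type(), payload))
    return found


def build_sample_eml():
    """Build a small .eml (as bytes) with one image and one forwarded email."""
    inner = EmailMessage()
    inner["Subject"] = "Expert opinion"
    inner.set_content("See attached PDF.")
    inner.add_attachment(b"%PDF-1.4 sample", maintype="application",
                         subtype="pdf", filename="opinion.pdf")

    outer = EmailMessage()
    outer["Subject"] = "Fwd: accident report"
    outer.set_content("Photos and the forwarded opinion are attached.")
    outer.add_attachment(b"\x89PNG sample", maintype="image",
                         subtype="png", filename="photo.png")
    outer.add_attachment(inner)  # stored as a message/rfc822 part
    return outer.as_bytes()


parsed = message_from_bytes(build_sample_eml(), policy=policy.default)
for depth, name, ctype, payload in collect_attachments(parsed):
    print(depth, name, ctype, len(payload))
```

Even this small sketch trusts the declared content types and well-formed headers, which is exactly where real-world emails tend to break.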
How the MaraDocs Email Attachment Processing API Solves This in Minutes
The MaraDocs API validates .eml and .msg files in one call. You receive structured attachment handles – each with type information. From there, branch to image or PDF operations: validate by type, then apply document extraction, OCR, composition, or compression. The API supports recursive processing of emails attached within emails.
Email Attachment Processing Workflow: Upload, Validate, Extract, Route
Upload the email file, then call email.validate. The response includes attachment handles – each with content type and metadata. For each attachment, check its type (image, PDF, or nested email). Validate the attachment with img.validate or pdf.validate, then chain it into your pipeline: flow.ocrImg for images, pdf.compose to merge PDFs, pdf.optimize for compression. Nested emails can be passed to email.validate again to extract their attachments recursively. The entire pipeline runs server-side; you pass handles between steps without downloading and re-uploading files.
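The routing step can be sketched independently of any HTTP calls. The example below is an illustration under assumptions, not the MaraDocs SDK: route_attachment and walk_email are hypothetical helpers, validate_email stands in for whatever function calls email.validate and returns the attachment list, and the assumption that nested emails are reported as message/rfc822 or application/vnd.ms-outlook should be checked against the API reference. Only the field names content_type and unvalidated_file_handle are taken from the code samples in this article.

```python
from typing import Callable, Dict, List, Optional

# Assumed content types for nested .eml and .msg attachments.
EMAIL_TYPES = ("message/rfc822", "application/vnd.ms-outlook")


def route_attachment(content_type: Optional[str]) -> str:
    """Map a content type to a pipeline name: image, pdf, email, or skip."""
    # Drop parameters such as "; name=x.pdf" and normalise case.
    base = (content_type or "").split(";")[0].strip().lower()
    if base.startswith("image/"):
        return "image"
    if base == "application/pdf":
        return "pdf"
    if base in EMAIL_TYPES:
        return "email"
    return "skip"


def walk_email(
    handle: str,
    validate_email: Callable[[str], List[dict]],
    max_depth: int = 5,
    _depth: int = 0,
) -> Dict[str, List[str]]:
    """Collect attachment handles per pipeline, descending into nested emails."""
    buckets: Dict[str, List[str]] = {"image": [], "pdf": [], "skip": []}
    for att in validate_email(handle):
        kind = route_attachment(att.get("content_type"))
        child = att["unvalidated_file_handle"]
        if kind != "email":
            buckets[kind].append(child)
        elif _depth + 1 > max_depth:
            buckets["skip"].append(child)  # too deeply nested, leave for review
        else:
            nested = walk_email(child, validate_email, max_depth, _depth + 1)
            for key, handles in nested.items():
                buckets[key].extend(handles)
    return buckets


# Usage with an in-memory stand-in for the email.validate call.
fake_api = {
    "root": [
        {"content_type": "image/jpeg", "unvalidated_file_handle": "a"},
        {"content_type": "message/rfc822", "unvalidated_file_handle": "n1"},
        {"content_type": "text/calendar", "unvalidated_file_handle": "c"},
    ],
    "n1": [{"content_type": "application/pdf; name=x.pdf", "unvalidated_file_handle": "b"}],
}
print(walk_email("root", fake_api.__getitem__))
```

The depth limit is a design choice for this sketch: it keeps a maliciously or accidentally deep chain of forwarded emails from recursing without bound.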
Get your API key in under a minute
Register for a free account and get your API key in under a minute. Of course we'll provide you with some developer credits.
Try MaraDocs API now →
Why MaraDocs is Different: Workspaces, Webview, and German Data Privacy
Most document APIs force you to download each attachment, re-upload it elsewhere, and track identities across steps. With MaraDocs, all files – the email and its attachments – live in the same workspace. Validate the email, iterate attachment handles, and pass them to img/pdf/flow endpoints. Handles flow; data stays server-side. No download-re-upload cycles for every attachment.
When a nested email has unexpected structure or an attachment needs manual review, open app.maradocs.io with your workspace secret to inspect and rearrange files. Your users get full manual control when automation hits an edge case.
Processing runs in Germany (Maramia GmbH), with encryption at rest and in transit. Workspaces expire after 7 days. No data leaves the EU. For GDPR-sensitive email processing, this matters.
TypeScript Code for Extracting Email Attachments
API reference: data/upload, email/validate, img/validate, pdf/validate, flow.ocrImg, pdf/compose, data/download/pdf
import { MaraDocsClient } from "@maramia/maradocs-sdk-ts";
import { okEmail } from "@maramia/maradocs-sdk-ts/models/email";
import { okImg } from "@maramia/maradocs-sdk-ts/models/img";
import { okPdf } from "@maramia/maradocs-sdk-ts/models/pdf";

// workspace_secret and emailFile are assumed to be defined by the caller.
const client = new MaraDocsClient({ workspaceSecret: workspace_secret });

// Upload and validate email
const uploaded = await client.data.upload(emailFile);
const validated = await client.email.validate({
  unvalidated_file_handle: uploaded.unvalidated_file_handle,
});
const email = okEmail(validated);

// Route each attachment by content type and collect PDF handles
const pdfHandles: string[] = [];
for (const att of email.attachments) {
  if (att.content_type?.startsWith("image/")) {
    // Images: validate, then OCR into a PDF
    const imgVal = await client.img.validate({
      unvalidated_file_handle: att.unvalidated_file_handle,
    });
    const imgHandle = okImg(imgVal);
    const pdfHandle = await client.flow.ocrImgHandle(imgHandle);
    pdfHandles.push(pdfHandle);
  } else if (att.content_type === "application/pdf") {
    // PDFs: validate and keep the handle
    const pdfVal = await client.pdf.validate({
      unvalidated_file_handle: att.unvalidated_file_handle,
    });
    pdfHandles.push(okPdf(pdfVal));
  }
}

// Merge all and download
const composed = await client.pdf.compose({
  pdfs: pdfHandles.map((pdf_handle) => ({ pdf_handle })),
});
const blob = await client.data.downloadPdf({ pdf_handle: composed.pdf_handle });
Python Code for Email Attachment Extraction
API reference: data/upload, email/validate, img/validate, pdf/validate, pdf/compose, data/download/pdf
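import time

import requests

API_URL = "https://api.maradocs.io/v1"
# WORKSPACE_SECRET is assumed to be defined elsewhere, e.g. loaded from the environment.
headers = {"Authorization": f"Bearer {WORKSPACE_SECRET}"}


def poll(url, job_id, max_attempts=120):
    """Poll a job once per second until it completes; give up after max_attempts."""
    for _ in range(max_attempts):
        r = requests.get(f"{url}/{job_id}", headers=headers).json()
        if r["status"] == "complete":
            return r["response"]["response"]
        time.sleep(1)
    raise TimeoutError(f"Job {job_id} did not complete after {max_attempts} polls")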

# 1. Upload and validate email
# email_bytes is assumed to hold the raw bytes of the .eml file.
upload = requests.post(f"{API_URL}/data/upload", headers=headers,
                       files={"file": ("email.eml", email_bytes, "message/rfc822")}).json()
email_res = requests.post(f"{API_URL}/email/validate", headers=headers,
                          json={"unvalidated_file_handle": upload["unvalidated_file_handle"]}).json()
email_data = poll(f"{API_URL}/email/validate", email_res["job_id"])

# 2. Validate each attachment by type
pdf_handles = []
for att in email_data.get("attachments", []):
    content_type = att.get("content_type") or ""  # tolerate a missing or null type
    if content_type.startswith("image/"):
        img_res = requests.post(f"{API_URL}/img/validate", headers=headers,
                                json={"unvalidated_file_handle": att["unvalidated_file_handle"]}).json()
        img_handle = poll(f"{API_URL}/img/validate", img_res["job_id"])["img_handle"]
        # Then img OCR workflow... (abbreviated)
    elif content_type == "application/pdf":
        pdf_res = requests.post(f"{API_URL}/pdf/validate", headers=headers,
                                json={"unvalidated_file_handle": att["unvalidated_file_handle"]}).json()
        pdf_handles.append(poll(f"{API_URL}/pdf/validate", pdf_res["job_id"])["pdf_handle"])

# 3. Merge and download
compose_res = requests.post(f"{API_URL}/pdf/compose", headers=headers,
                            json={"pdfs": [{"pdf_handle": h} for h in pdf_handles]}).json()
composed = poll(f"{API_URL}/pdf/compose", compose_res["job_id"])
pdf_resp = requests.get(f"{API_URL}/data/download/pdf", headers=headers,
                        params={"pdf_handle": composed["pdf_handle"]})
with open("attachments.pdf", "wb") as out:
    out.write(pdf_resp.content)
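The poll helper in the Python example checks the job once per second. For emails with many attachments, a variant with capped exponential backoff and an overall timeout may be gentler on both sides. The sketch below is an assumption-laden illustration: poll_with_backoff and its parameters are hypothetical, fetch_status stands in for the requests.get(...).json() call, and only the "complete" status value is taken from the example above (how the API reports failed jobs is not covered here).

```python
import time
from typing import Callable


def poll_with_backoff(
    fetch_status: Callable[[], dict],
    *,
    initial_delay: float = 0.5,
    max_delay: float = 8.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_status until it reports status "complete".

    The wait doubles after every attempt, capped at max_delay. A TimeoutError
    is raised once the next wait would pass the overall timeout.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        result = fetch_status()
        if result.get("status") == "complete":
            return result
        if clock() + delay > deadline:
            raise TimeoutError(f"job not complete within {timeout} seconds")
        sleep(delay)
        delay = min(delay * 2, max_delay)
```

In the example above, fetch_status could be a lambda wrapping requests.get(f"{url}/{job_id}", headers=headers).json(). The sleep and clock parameters exist so the helper can be exercised in tests without real waiting.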
Summary and Next Steps
The MaraDocs email attachment processing API extracts, validates, and routes attachments for you, including virus scanning, format detection, and recursive email parsing. Combine it with Document Scanner, PDF Handling, and Text Recognition for full automation.
Try it: MaraDocs API | TypeScript SDK