vellum

Features

Six capabilities. One unified output.

Every Vellum extraction returns the same JSON shape regardless of input — form, table, scan, multilingual mix. Your downstream code doesn’t care which mode ran.
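To make "the same JSON shape" concrete, here is a minimal sketch of what a mode-independent result could look like. The class and field names (`ExtractionResult`, `ExtractedField`, `bbox`, and so on) are assumptions for illustration, not Vellum's documented schema.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical result shape; names are illustrative assumptions only.
@dataclass
class ExtractedField:
    name: str
    value: Any
    confidence: float                          # 0.0 to 1.0
    page: int
    bbox: tuple[float, float, float, float]    # x0, y0, x1, y1

@dataclass
class ExtractionResult:
    document_id: str
    mode: str                                  # e.g. "form", "table", "signature"
    fields: list[ExtractedField] = field(default_factory=list)

def to_record(result: ExtractionResult) -> dict[str, Any]:
    """Flatten any result into a name -> value dict, ignoring which mode ran."""
    return {f.name: f.value for f in result.fields}

form = ExtractionResult(
    "doc-1", "form",
    [ExtractedField("tin", "12-3456789", 0.98, 1, (72, 540, 210, 556))],
)
table = ExtractionResult(
    "doc-2", "table",
    [ExtractedField("total", 1250.0, 0.91, 3, (400, 90, 470, 104))],
)
records = [to_record(form), to_record(table)]
```

Because `to_record` never branches on `mode`, the same downstream code handles both documents.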

Form-aware extraction

W-9, 1040, ACORD claims forms, vendor onboarding packets. Fields snap to your schema even when layouts drift.

  • Pre-built schemas for 200+ common forms
  • Custom schemas via JSON-Schema or Zod
  • Active learning from corrections

Tables across pages

Multi-page tables, nested rows, header propagation. Preserves the cell-to-row relationships your downstream system needs.

  • Page-spanning tables stitched into a single table
  • Nested headers preserved (region → quarter → month)
  • CSV / parquet / nested JSON output
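Stitching and header propagation can be sketched roughly as follows. This is an illustration of the idea under simple assumptions (each page yields a fragment with an optional header and a list of rows), not Vellum's actual algorithm; `stitch_pages` is a hypothetical name.

```python
def stitch_pages(pages):
    """Merge per-page table fragments into one list of row dicts.

    Continuation pages without a header inherit the first header seen;
    pages that repeat the same header simply contribute their rows.
    """
    header = None
    rows = []
    for page in pages:
        page_header = page.get("header")
        if page_header:
            if header is None:
                header = page_header
            elif page_header != header:
                raise ValueError("header changed mid-table")
        rows.extend(page["rows"])
    if header is None:
        raise ValueError("no header found on any page")
    # Propagate the header onto every row, including continuation pages.
    return [dict(zip(header, row)) for row in rows]

pages = [
    {"header": ["region", "quarter", "revenue"],
     "rows": [["EMEA", "Q1", 100], ["EMEA", "Q2", 120]]},
    {"header": None, "rows": [["APAC", "Q1", 90]]},   # continuation page
]
table = stitch_pages(pages)
```

Each row comes out keyed by column name, so the row on the second page carries the same header as the rows on the first.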

Signatures, stamps, dates

Signature presence, signer name, stamp text, notary marks, dated approvals — all extracted with bounding boxes.

  • Signature image + signer name extraction
  • Notary stamp text recognition
  • Dates that overlap adjacent fields are flagged

Multilingual OCR

Mixed scripts in one document: an English contract with a Chinese appendix and an Arabic notary stamp. Output normalized to your locale.

  • 32 languages supported in production
  • Original-language source preserved
  • Currency + date format normalization
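As a rough illustration of normalization that keeps the original text alongside the normalized value, consider the sketch below. The helper names and the `source_text` / `value` keys are assumptions, and real locale handling covers many more formats than these two.

```python
from datetime import datetime
from decimal import Decimal

def normalize_date(text: str, source_format: str) -> str:
    """Parse a date in a known source format and return ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(text, source_format).date().isoformat()

def normalize_amount(text: str, decimal_sep: str = ",", group_sep: str = ".") -> Decimal:
    """Convert a locale-formatted amount such as '1.234,56' to a Decimal."""
    cleaned = text.replace(group_sep, "").replace(decimal_sep, ".")
    return Decimal(cleaned)

# Keep the source text next to the normalized value (hypothetical keys).
date_field = {
    "source_text": "31.12.2024",
    "value": normalize_date("31.12.2024", "%d.%m.%Y"),
}
amount = normalize_amount("1.234,56")
```

Using `Decimal` rather than `float` avoids rounding surprises on currency values.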

Bulk pipelines

Drop a 10K-document folder. Vellum fans it out across pods, fires per-doc webhooks, and rolls up failures into a triage queue.

  • S3 / Azure Blob / GCS triggers
  • Webhook + queue per pipeline
  • Failure triage UI with one-click reprocess

PII / PHI safety

Detect → redact → log per pipeline. Tenant-isolated extraction means your documents never train shared models.

  • 14 PII categories detected by default
  • Custom regex / Presidio policies supported
  • Field-level audit log (who saw what, when)
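A custom regex policy with a detect, redact, log flow might look like the sketch below. The two patterns, the `POLICIES` name, and the audit entry fields are illustrative assumptions; production policies need far broader coverage, and this logs redaction events rather than Vellum's actual access log.

```python
import re
from datetime import datetime, timezone

# Illustrative patterns only, not a complete PII policy.
POLICIES = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def redact(text: str, actor: str):
    """Replace matches with a category tag; return (redacted_text, audit_entries)."""
    audit = []
    for category, pattern in POLICIES.items():
        def mask(match, category=category):
            # One audit entry per redacted match: what, who, when.
            audit.append({
                "category": category,
                "actor": actor,
                "at": datetime.now(timezone.utc).isoformat(),
            })
            return f"[{category.upper()}]"
        text = pattern.sub(mask, text)
    return text, audit

redacted, audit_log = redact(
    "SSN 123-45-6789, contact jane@example.com", actor="pipeline-7"
)
```

The audit entries record the category and actor but never the matched value, so the log itself does not leak what was redacted.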

How it works

Drop a doc. Get JSON. Trust the trace.

Four steps. The first three are sub-second; the fourth runs async and emits a webhook when it’s done.

  1. Step 01

    Detect mode

    Vellum auto-detects whether to run form / table / signature / multilingual mode. Override per pipeline.

  2. Step 02

    Extract + verify

    Field-level confidence. Below your threshold? Auto-routed to a human-in-the-loop queue.

  3. Step 03

    Sign the trace

    Every field gets a bounding box reference. Output is signed with a content hash for the audit log.

  4. Step 04

    Deliver

    JSON to your webhook, row to your warehouse, record to your ERP. Pick one or fan out to many.
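Steps 02 and 03 can be sketched in a few lines. This is a minimal illustration, assuming fields arrive as dicts with a `confidence` key and that the content hash is SHA-256 over canonical JSON; the function names and the choice of hash are assumptions, not confirmed details of Vellum.

```python
import hashlib
import json

def route_fields(fields, threshold):
    """Split fields into auto-accepted and human-review lists by confidence."""
    accepted = [f for f in fields if f["confidence"] >= threshold]
    review = [f for f in fields if f["confidence"] < threshold]
    return accepted, review

def content_hash(payload) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, compact separators)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

fields = [
    {"name": "invoice_total", "value": "1250.00", "confidence": 0.97,
     "bbox": [400, 90, 470, 104]},
    {"name": "due_date", "value": "2024-12-31", "confidence": 0.62,
     "bbox": [120, 90, 200, 104]},
]
accepted, review = route_fields(fields, threshold=0.85)
digest = content_hash({"document_id": "doc-1", "fields": accepted})
```

Sorting keys before hashing means two payloads with the same content produce the same digest regardless of key order. A digest on its own only proves integrity; tying it to a sender would need a keyed signature on top.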

Integrations

Lands in the system you already use.

Vellum is the layer between your documents and your stack. Every extraction can write to multiple destinations in parallel.

NetSuite

Bi-directional invoice + PO sync

Workday

AP automation, expense routing

Zapier

5,000+ apps via Vellum action

Snowflake

Direct table write, change feed

Salesforce

Contract intake → opportunity

REST + webhooks

OpenAPI spec, signed events
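On the receiving side, checking a signed event usually looks something like the sketch below. It assumes an HMAC-SHA256 signature over the raw request body with a shared secret; the actual scheme, header name, and encoding are defined by the OpenAPI spec and are not shown here.

```python
import hashlib
import hmac

def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body (assumed scheme, for illustration)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify(secret: bytes, body: bytes, signature: str) -> bool:
    """Constant-time comparison against the expected signature."""
    return hmac.compare_digest(sign(secret, body), signature)

secret = b"example-shared-secret"          # placeholder value
body = b'{"document_id":"doc-1","status":"done"}'
signature = sign(secret, body)

ok = verify(secret, body, signature)
tampered = verify(secret, body + b" ", signature)
```

Verify against the raw bytes as received, before any JSON parsing, since re-serializing can change whitespace and break the signature.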

Try it on the doc that’s ruining your week.

Drop a PDF, watch it become structured data, copy the JSON. Three minutes from open to wow.

100 free pages / month, forever