Your Gateway to Digital Text Production

eScriptorium isn’t just an automatic text recognition (ATR) tool; it is a comprehensive pipeline designed to sit at the center of your digital ecosystem. We have lowered the barriers to entry, making it easy to bring your archives, manuscripts, and printed collections into a machine-learning workflow.


The IIIF Advantage: Interoperability First

eScriptorium is built with the International Image Interoperability Framework (IIIF) at its core. You don’t need to download gigabytes of images from a library just to re-upload them here.

  • Instant Ingestion: Simply paste a manifest URL from Gallica, the Bodleian, the Library of Congress, or any IIIF-compliant archive.
  • Smart Metadata: We automatically fetch and map the document’s metadata, ensuring your text production remains context-aware.
  • Storage Efficient: Stream images directly from the source repository without clogging your local storage.
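
For readers curious about what a manifest URL actually points to, here is a minimal sketch of how page images and metadata can be read from an IIIF Presentation 2.x manifest. It is not eScriptorium's own import code. The sample manifest, its example.org URLs, and the helper names manifest_metadata and page_images are hypothetical, and the sketch assumes metadata labels and values are plain strings (real manifests may use language maps instead).

```python
import json

# Hypothetical, trimmed-down IIIF Presentation 2.x manifest (illustration only).
SAMPLE_MANIFEST = json.loads("""
{
  "@id": "https://example.org/iiif/book1/manifest",
  "label": "Example manuscript",
  "metadata": [
    {"label": "Date", "value": "15th century"},
    {"label": "Language", "value": "Latin"}
  ],
  "sequences": [
    {
      "canvases": [
        {
          "label": "f. 1r",
          "images": [
            {"resource": {"@id": "https://example.org/iiif/book1/f1r/full/full/0/default.jpg"}}
          ]
        },
        {
          "label": "f. 1v",
          "images": [
            {"resource": {"@id": "https://example.org/iiif/book1/f1v/full/full/0/default.jpg"}}
          ]
        }
      ]
    }
  ]
}
""")


def manifest_metadata(manifest):
    """Flatten the manifest's label/value metadata pairs into a dict.

    Assumes labels and values are plain strings.
    """
    return {entry["label"]: entry["value"] for entry in manifest.get("metadata", [])}


def page_images(manifest):
    """Return (canvas label, image URL) pairs in reading order."""
    pages = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                pages.append((canvas.get("label", ""), image["resource"]["@id"]))
    return pages


if __name__ == "__main__":
    print(manifest_metadata(SAMPLE_MANIFEST))
    for label, url in page_images(SAMPLE_MANIFEST):
        print(label, url)
```

Because the manifest only references the images by URL, a client can display or process each page on demand from the source repository rather than storing a local copy.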

Flexible Image Handling

From loose JPEGs to bound PDFs.

Whether you are working with high-resolution archival TIFFs or a quick scan in PDF format, eScriptorium handles the pre-processing for you.

  • Drag-and-Drop: Batch upload thousands of loose images (JPG, PNG, TIFF) in a single action.
  • PDF Extraction: Upload a PDF book, and our engine automatically extracts and processes the individual pages for transcription. No external tools are required.

Seamless Migration & Ground Truth

Switching tools? We’ve got you covered.

Don’t let your previous work go to waste. eScriptorium is designed to serve as the migration destination for projects moving from legacy ATR tools or proprietary platforms.

  • Full Structure Support (METS): Import complex METS packages to ingest entire document structures, preserving the links between images, metadata, and existing transcriptions.
  • Granular XML Imports: Upload individual ALTO or PAGE XML files to instantly populate specific pages with layout segmentation and text.
  • Model-Ready Data: Turn your existing transcriptions into ground truth to immediately train custom models without starting from zero.
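
To illustrate what a single ALTO file carries, the sketch below reads line identifiers and text from a small ALTO document using only the Python standard library. It is an illustration of the file format, not a description of how eScriptorium imports it. The sample XML, the ALTO v4 namespace choice, and the helper name read_lines are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Assumption: the sample uses the ALTO v4 namespace; other versions use a different URI.
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v4#"}

# Hypothetical one-page ALTO document (illustration only).
SAMPLE_ALTO = """
<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page ID="page1" WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <TextBlock ID="block1">
          <TextLine ID="line1" HPOS="100" VPOS="200" WIDTH="1500" HEIGHT="80">
            <String CONTENT="In"/>
            <SP/>
            <String CONTENT="principio"/>
            <SP/>
            <String CONTENT="erat"/>
          </TextLine>
          <TextLine ID="line2" HPOS="100" VPOS="300" WIDTH="1500" HEIGHT="80">
            <String CONTENT="verbum"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def read_lines(alto_xml):
    """Return one dict per TextLine with its ID and the joined word contents."""
    root = ET.fromstring(alto_xml.strip())
    lines = []
    for line in root.iterfind(".//alto:TextLine", ALTO_NS):
        # Each String element holds one word in its CONTENT attribute.
        words = [s.get("CONTENT", "") for s in line.iterfind("alto:String", ALTO_NS)]
        lines.append({"id": line.get("ID"), "text": " ".join(words)})
    return lines


if __name__ == "__main__":
    for line in read_lines(SAMPLE_ALTO):
        print(line["id"], line["text"])
```

Line-level pairs of region and text like these are the kind of material that can serve as ground truth when training a recognition model.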