eScriptorium is built on a philosophy of openness. We believe that a digital platform should be a bridge, not a barrier. Unlike proprietary tools that lock your hard work into obscure formats or cloud ecosystems, eScriptorium guarantees that you can extract your data and your custom AI models fully, freely, and in standard formats.
Standardized Data Formats
Ready for archives, libraries, and critical editions.
We support the most widely used standards in digital humanities and library science, ensuring your project is compatible with the broader ecosystem of long-term preservation and publishing.
- ALTO XML: An open XML standard, maintained by the Library of Congress, for automatic text recognition output, widely used by libraries for searchable collections.
- PageXML: A rich format that preserves the complex geometry of your document, including detailed layout segmentation and reading order.
- METS: Export complete digital objects. METS containers wrap your images, metadata, and transcriptions into a single, cohesive package ideal for digital repository ingestion.
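Because these formats are plain XML, an export can be read with general-purpose tools. The following is a minimal sketch, not part of eScriptorium itself, that pulls the text lines out of an ALTO document using only the Python standard library. The sample document and its text are made up for illustration, and the helper names (`local_name`, `alto_lines`) are hypothetical. Matching on local tag names is a design choice so the sketch does not depend on a particular ALTO schema version.

```python
import xml.etree.ElementTree as ET

# Made-up ALTO fragment for illustration; a real export carries more
# attributes (coordinates, IDs, image references) than shown here.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page>
      <PrintSpace>
        <TextBlock ID="block_1">
          <TextLine ID="line_1">
            <String CONTENT="In"/>
            <SP/>
            <String CONTENT="principio"/>
          </TextLine>
          <TextLine ID="line_2">
            <String CONTENT="erat verbum"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text):
    """Return one string per TextLine, joining its String elements with spaces."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        words = [
            child.get("CONTENT", "")
            for child in element
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(words))
    return lines


for line in alto_lines(SAMPLE_ALTO):
    print(line)
```

The same approach works on a file on disk by reading its contents first; the function only assumes that each line's text sits in the `CONTENT` attribute of `String` elements, as the ALTO schema defines.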
Model Sovereignty
Your training effort is a research output. Keep it.
The time you spend training a model is valuable. In eScriptorium, the models you create are not locked inside the platform.
- Downloadable Intelligence: You can download your trained models at any time.
- Kraken Compatibility: eScriptorium uses the open-source kraken engine for segmentation and recognition, so the models you download are ordinary kraken models. You can run them with kraken on your local command line, share them with colleagues, or publish them on repositories like Zenodo for others to use.
- Preservation: Archive your models alongside your data to ensure your results are reproducible in the future.
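One way to make an archived model and its data verifiable later is to store a checksum manifest next to them. The sketch below is an illustration under stated assumptions, not an eScriptorium feature: the folder layout, the function names (`sha256_of`, `build_manifest`, `manifest_json`), and the manifest structure are all hypothetical. It records the size and SHA-256 digest of every file in an archive folder.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large model files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to folder) to its size and SHA-256."""
    folder = Path(folder)
    entries = {}
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            entries[path.relative_to(folder).as_posix()] = {
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            }
    return entries


def manifest_json(folder):
    """Serialize the manifest as stable, human-readable JSON."""
    return json.dumps(build_manifest(folder), indent=2, sort_keys=True)
```

Calling `manifest_json` on a folder that holds a downloaded model and the exported transcriptions yields a JSON document that can be saved outside that folder and deposited with the archive. Re-running it later and comparing digests shows whether any file has changed.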
Granular Control
Take only what you need.
Whether you need a quick plain-text dump for a corpus analysis or a fully detailed XML structure for a digital edition, eScriptorium gives you control over the output.
- Full Structural Export: Download the complete geometry of the page (regions, lines, baselines) alongside the text.
- Text Only: Rapidly extract raw text for NLP analysis or simple proofreading.
- Transcription Levels: Choose which version of your transcription to export, depending on your current needs.
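The difference between a structural export and a text-only export can be seen by reading the same PageXML file two ways. The following is a rough sketch using the Python standard library. The sample page is made up, the helper names are hypothetical, and it assumes integer coordinates and that document order matches reading order, which a real export may express differently.

```python
import xml.etree.ElementTree as ET

# Made-up PageXML fragment for illustration only.
SAMPLE_PAGE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="page1.png" imageWidth="1000" imageHeight="1500">
    <TextRegion id="region_1">
      <Coords points="50,50 950,50 950,400 50,400"/>
      <TextLine id="line_1">
        <Baseline points="60,120 940,118"/>
        <TextEquiv><Unicode>In principio</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="line_2">
        <Baseline points="60,200 940,201"/>
        <TextEquiv><Unicode>erat verbum</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def child(element, name):
    """Return the first direct child with the given local name, or None."""
    for candidate in element:
        if local_name(candidate.tag) == name:
            return candidate
    return None


def parse_points(points):
    """Turn 'x1,y1 x2,y2' into [(x1, y1), (x2, y2)]; assumes integer values."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def page_lines(xml_text):
    """Structural view: region id, line id, baseline, and text for each line."""
    root = ET.fromstring(xml_text)
    records = []
    for region in root.iter():
        if local_name(region.tag) != "TextRegion":
            continue
        for line in region:
            if local_name(line.tag) != "TextLine":
                continue
            baseline = child(line, "Baseline")
            text_equiv = child(line, "TextEquiv")
            unicode_el = child(text_equiv, "Unicode") if text_equiv is not None else None
            records.append({
                "region": region.get("id"),
                "line": line.get("id"),
                "baseline": parse_points(baseline.get("points", "")) if baseline is not None else [],
                "text": (unicode_el.text or "") if unicode_el is not None else "",
            })
    return records


def text_only(records):
    """Text-only view: one line of text per line, geometry discarded."""
    return "\n".join(record["text"] for record in records)


records = page_lines(SAMPLE_PAGE)
print(records[0]["baseline"])
print(text_only(records))
```

Keeping the structural records as the primary representation and deriving plain text from them means the geometry is available when an edition needs it, while corpus analysis can work from the text alone.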