Transkribus is transforming scholarship in the archives

A spinout formed by a team including Professor Melissa Terras is using AI-powered handwritten text recognition (HTR) to give researchers, institutions and the public unprecedented access to written records of global cultural heritage.

Libraries and archives around the world house a treasure trove of handwritten texts ranging from literary works and political essays to census results, medical files, and meteorological reports. This information, while of great historical and cultural importance, is at risk of languishing on the shelves and being unusable if these documents are not transcribed – a process that has typically been laborious, time-consuming and expensive. But bringing our written heritage into today’s digital world is exactly what Transkribus does, and with ease.

Origins

What is now Transkribus originated in the EU-funded Recognition and Enrichment of Archival Documents (READ) research project, run by a consortium of leading research groups from all over Europe and coordinated by Dr Günter Mühlberger of the University of Innsbruck, Austria. The convergence of several computational developments has resulted in vast improvements in the recognition of handwritten historical documents, including: statistical advances in the 1980s; advanced pattern recognition combined with artificial intelligence in the 1990s; the development of deep neural network approaches in the 2000s and 2010s; the availability of increased computer processing power; and, very importantly, the availability of large datasets, i.e. scanned images. Transkribus has seized upon, and contributed to, the opportunities afforded by these developments to create a unique platform and user community. It is maintained and developed by READ-COOP SCE, which is chaired by Dr Mühlberger, his Innsbruck colleague Dr Andy Stauder (who acts as CEO) and Professor Melissa Terras of the University of Edinburgh.

Community

Transkribus allows users to create “ground truth” data that is suitable for machine learning. From submitted images and transcripts, the HTR engines learn to decipher handwritten or printed text from digital images and can then automatically generate transcripts of similar material. The interests of the different user groups – archivists, humanities scholars, computer scientists and members of the public – overlap, with each making a vital contribution to the platform’s infrastructure. Memory institutions, humanities scholars and the public provide digitised images and transcripts as ground truth for HTR training, whilst computer scientists deliver the research and implementation work needed to sustain and develop the technology. Every contribution improves Transkribus, making it a more accurate and powerful tool.

Access was a key consideration for Professor Terras, Dr Mühlberger and Dr Stauder, who were keen to provide a platform that would remain affordable and open to all once innovation funding came to an end. They established Transkribus as a cooperative (READ-COOP SCE), which means that the company can not only keep the infrastructure operational, develop further tools and services, and provide a high standard of service to its users for a reasonable fee, but also give back to the community through discounts and free assistance and support for students and early career researchers. Most importantly, Transkribus connects its community and facilitates data sharing, which forms the basis of the datasets that are crucial for AI training.

Where past and future meet

Today, Transkribus is a comprehensive platform for the digitisation, AI-powered text recognition, transcription and searching of historical documents – from any place, at any time, and in any language. Smart search technology can find words in a collection, and can even recognise and retrieve results for words where there are historical or personal variations in spelling. Transkribus now employs over 20 people and has over 100,000 users, and more than 130 members worldwide have joined the cooperative, including the British Library, the National Library of Scotland and many international libraries and archives, who together have transformed access to our historical past. As of January 2023, 43 million images of handwritten texts had been uploaded to the system for transcription. In 2020 Transkribus was named winner of the Horizon Impact Award, which honours EU-funded projects that have had a societal impact across Europe and beyond.

Professor Terras said: “Handwritten Text Recognition is a mature AI technology in the library and archive space: it works and it's here for us to engage with. The University of Edinburgh is at the forefront in terms of being able to offer HTR to its staff and students because of the relationship that we have through my involvement with Transkribus. Our ongoing research here is helping us understand the benefits that AI can bring to the cultural heritage sector. I’m grateful to be part of this team effort, and hugely proud of what the cooperative has achieved together.”
