Preserving the Past to Shape an Inclusive Digital Future in Serbia

April 24, 2026
Photo credits: UNDP Serbia

Across Serbia, libraries and public institutions safeguard centuries of cultural and historical knowledge. Newspapers, manuscripts, books and official publications document political change, social life and national identity. Yet many of these materials exist only on paper or as scanned images – preserved physically, but largely inaccessible in the digital world.

For researchers, this means hours of manual searching. For institutions, this limits the ability to scale digitization. For society, this creates a greater risk: as Artificial Intelligence (AI) becomes embedded in everyday public and private services, languages that are not machine-readable risk being left behind.  

While Serbian is spoken by more than 11 million people globally, historical texts written in Serbian Cyrillic are difficult for conventional Optical Character Recognition (OCR) tools to process. Developed and trained primarily for English and other high-resource languages, these systems struggle with the older typographies, mixed alphabets, uneven scans and complex layouts typical of archival material.

It is not just a technological challenge: it is about whether a language and the people who speak it can fully participate in today’s digital economy and shape an inclusive digital future.  

From Digitization to Actionable Intelligence

In 2021, UNDP Serbia began collaborating with the local AI ecosystem to respond to this challenge. This included contributing to the design of two national AI strategies, supporting the establishment of a National AI Institute with over 50 researchers, and advancing efforts to develop a national AI supercomputer platform.

With support from the Government of France and the Government of Japan, UNDP Serbia also developed Lorya – an open-source, web-based platform that provides archivists and cultural heritage institutions with access to a unified interface for AI tools that can be used at various stages of document digitization. Lorya supports low-resource and orthographically complex languages, enabling both open-source and proprietary AI models to be plugged in at any stage.   

Recent upgrades supported through UNDP’s Digital X 3.0 have strengthened Lorya’s capabilities. Improved data cleaning and data validation pipelines have increased accuracy, and an enhanced user interface has made the platform more accessible to non-technical users, such as librarians and archivists.  

Lorya is not itself an OCR engine. It is a platform that orchestrates four stages of digitization: improving image quality, analysing layout, recognizing text, and correcting errors after extraction. At each stage, users can choose from open-source, community-built, or proprietary AI models, and retain agency over what kind of systems process their data. The platform was built to reflect the realities of cultural heritage: it tackles older typefaces, mixed scripts, complex page layouts, and degraded scans. Guided by the goal of achieving the highest possible accuracy, Lorya combines the latest AI technologies with deep, contextual expertise from members of the community. As an open-source solution, Lorya enables institutions to move beyond static digitization towards creating reusable datasets that can support research and future AI applications.   
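
The orchestration idea described above can be illustrated with a minimal Python sketch. It is not Lorya's actual code or API: the stage names, the `Pipeline` class and the toy "models" are all hypothetical, and a scanned page is stood in for by a plain string. The sketch only shows how four fixed stages can each accept a user-selected, interchangeable model.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical stage names mirroring the four stages described in the text.
STAGES = ["enhance_image", "analyse_layout", "recognize_text", "correct_errors"]

# In this sketch a "model" is any callable that maps a stage input to its output.
Model = Callable[[Any], Any]


@dataclass
class Pipeline:
    """Keeps a registry of models per stage and runs the stages in order."""

    registry: Dict[str, Dict[str, Model]] = field(
        default_factory=lambda: {stage: {} for stage in STAGES}
    )

    def register(self, stage: str, name: str, model: Model) -> None:
        if stage not in self.registry:
            raise ValueError(f"unknown stage: {stage}")
        self.registry[stage][name] = model

    def run(self, page: Any, choices: Dict[str, str]) -> Any:
        # The user picks one registered model per stage; the output of each
        # stage becomes the input of the next.
        result = page
        for stage in STAGES:
            model = self.registry[stage][choices[stage]]
            result = model(result)
        return result


# Toy stand-ins for real models (assumptions for illustration only).
def toy_enhance(scan: str) -> str:
    return scan.strip()  # pretend "clean-up" of the scan


def toy_layout(scan: str) -> List[str]:
    return [block for block in scan.split("|") if block]  # "|" separates blocks


def toy_ocr(blocks: List[str]) -> str:
    return " ".join(blocks)  # pretend recognition: blocks become running text


def toy_correct(text: str) -> str:
    return text.replace("1", "l")  # fix one typical digit/letter confusion


pipeline = Pipeline()
pipeline.register("enhance_image", "toy", toy_enhance)
pipeline.register("analyse_layout", "toy", toy_layout)
pipeline.register("recognize_text", "toy", toy_ocr)
pipeline.register("correct_errors", "toy", toy_correct)

text = pipeline.run("  He11o|wor1d  ", {stage: "toy" for stage in STAGES})
print(text)
```

Because every stage looks its model up by name, swapping an open-source recognizer for a proprietary one in this sketch would only require registering another callable under the same stage and changing the corresponding entry in `choices`.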

From National Innovation to Global Uptake 

Lorya was designed with global adaptability in mind to expand its application beyond the Serbian context. The platform is being scaled through Digital X 3.0, which includes developing adapters for community-built models, integrating frontier large language models into the post-OCR correction stage, and improving cross-language and cross-context performance.

Built for diverse scripts and writing traditions, the platform can serve many languages, such as Arabic (more than 400 million speakers), Tamil (around 80 million speakers), Nepali (more than 19 million native speakers), Greek (approximately 13 million speakers), and Georgian (about 4 million speakers). Together, these represent hundreds of millions of people whose languages are under-represented in AI systems today.

Following successful pilots and user testing in Serbia, Lorya is being prepared for deployment in Iraq and Nepal in 2026, which will test its adaptability across distinct linguistic and institutional environments. These efforts are expected to support future scaling in Arabic-speaking countries across the Middle East and Africa, as well as adaptations for the local language scripts of South Asia.

Why This Matters for Human Security  

Human security is about empowering people to participate fully in society and shape their own futures. The story from Serbia demonstrates how context-specific digital innovation can reinforce human security in practice: expanding access to knowledge while ensuring that technological progress remains grounded in linguistic diversity and local ownership.  

The development of Lorya sits within Serbia’s broader digital transformation efforts, where UNDP has been supporting AI applications across healthcare, transport and energy. In this context, Lorya opens new possibilities for strengthening the data foundations needed for wider AI deployment, ensuring that these systems can better reflect linguistic diversity and be developed and scaled more reliably and responsibly.  

It also points to a broader opportunity: investing in open-source, adaptable digital solutions that enable countries not only to preserve cultural heritage, but to actively shape how their languages and knowledge systems are represented in the digital age. 

Realizing this potential will depend on continued collaboration, including integrating language digitization into national AI strategies, supporting scalable open solutions, and fostering partnerships between technology communities and linguistic groups. Lorya offers a practical starting point: as an open-source solution available through the Digital X catalogue, it provides governments, institutions and partners with a tested model that can be explored, adapted and deployed across different linguistic contexts.

The authors would like to thank Vid Štimac and Slobodan Marković (UNDP Serbia), and Dwayne Carruthers and Xiuzhen Li (UNDP Digital, Innovation and AI Hub) for their contributions to this piece.