Lorya: Including underrepresented languages in the AI revolution
Today’s AI language technology performs well in English and a few other major languages, largely because most other languages lack enough clean, machine-readable text to train AI systems effectively. As a result, large volumes of cultural and historical material remain inaccessible, locked in images and scans. Traditional optical character recognition (OCR) tools rarely perform well on complex scripts, older typographies, or mixed orthographies, leaving many communities unable to digitize their heritage at scale. Recognizing the need for a solution that can be used globally, UNDP is building on a solution initially designed for the Serbian language. The initiative is funded with 95,000 USD by the Government of France (Ministry for Europe and Foreign Affairs).
What is Lorya?
It is a digital tool that turns printed written cultural heritage into clean, machine-readable text that can be used to train AI language tools in local languages.
It helps speakers of Serbian and other under-represented languages [1] use existing cultural and historical resources to take part in the global AI revolution.
It makes information from sources of cultural and historical significance, including old books, newspapers, and manuscripts, more accessible to researchers, historians, students, and the public for study, research, and the creation of new products and services.
Our vision
To help include more languages in the AI technology revolution by developing a digital platform that local teams worldwide can easily and affordably adapt to ensure their own written cultural heritage is incorporated and accessible to all.
[1] Under-represented languages, in this context, are languages with a limited digital presence: they lack sufficient online content, annotated datasets, or linguistic resources, which makes it difficult for AI systems to learn and perform well in them.