MONDONOMO · NAME-UNDERSTANDING INFRASTRUCTURE

Names are 99% of vocabulary.
We built the stack that gets them right.

Nelma is the research surface for Mondonomo's name infrastructure — the data foundation, the models, and the production deployments. One knowledge graph. Two model families. Six business cases. Open demos, open papers.

126,000,000,000 name attestations · 269 countries · 2,410 language codes · 4 peer-reviewed papers
Member of NVIDIA Inception · Microsoft for Startups · Google for Startups · AWS Activate

For investors

99% of language vocabulary is names, and most of it sits outside general LLMs. A defensible moat: 126B name attestations in the world's largest proper-name graph, aimed at the data-quality problem Gartner prices at an average of $12.9M per organization per year.


For partners

Pilot the API at scale — KYC vendors, CDPs, identity platforms, LLM tooling. Co-built integrations, dedicated rate limits, joint go-to-market on cultural-data wins.


For founders

One REST API replaces a stack of regex, lookup tables, and brittle LLM prompts. Generous free tier, sub-50 ms latency, and copy-paste code on every demo page.


For NLP researchers

Four peer-reviewed papers, Hugging Face weights, reproducible eval scripts. Active collaboration with the University of Zagreb and Chulalongkorn University.


TEST OUR MODELS

Type a full name. Watch the whole stack run.

One call to /api/v1/parse returns language, parse structure, gender posterior, and per-token country and language distributions — all from MondoGraph in ~25 ms. Powered by PNEUMA-DD, our production data-driven model.

Input is processed in memory for inference only: it is not stored, not logged with identifying content, and not used for training.
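A minimal client sketch for the parse call described above, using only the Python standard library. Only the /api/v1/parse path comes from this page; the base URL, the `name` query parameter, the bearer-token header, and the `tokens`/`country` response fields are hypothetical placeholders, so check the PNEUMA-DD model page for the actual request and response contract.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL: only the /api/v1/parse path appears on this page.
BASE_URL = "https://api.example.com"


def build_parse_request(full_name: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for /api/v1/parse.

    The `name` query parameter and the bearer-token header are assumptions.
    """
    query = urllib.parse.urlencode({"name": full_name})
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/parse?{query}",
        headers={"Authorization": f"Bearer {api_key}", "Accept": "application/json"},
    )


def parse_name(full_name: str, api_key: str) -> dict:
    """Send the request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(build_parse_request(full_name, api_key), timeout=5) as resp:
        return json.load(resp)


def top_country_per_token(response: dict) -> dict:
    """Most probable country per token, from an assumed response shape:
    {"tokens": [{"text": "...", "country": {"HR": 0.7, ...}}, ...]}
    """
    return {
        tok["text"]: max(tok["country"], key=tok["country"].get)
        for tok in response.get("tokens", [])
    }


if __name__ == "__main__":
    print(build_parse_request("Davor Lauc", "YOUR_API_KEY").full_url)
    # Offline demo on a made-up response; no request is sent here.
    fake = {"tokens": [
        {"text": "Davor", "country": {"HR": 0.7, "RS": 0.2, "BA": 0.1}},
        {"text": "Lauc", "country": {"HR": 0.9, "SI": 0.1}},
    ]}
    print(top_country_per_token(fake))
```

Keeping request construction and response handling separate from the network call means the downstream logic can be tested offline on a fixture before the live endpoint is wired in.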

THE MOAT

The piece every general-purpose LLM gets wrong.

Proper names are almost entirely outside the reach of foundation models — and the cost of getting them wrong shows up in every fraud loss, every unsubscribed marketing email, and every voice-agent mispronunciation. Mondonomo's stack closes that gap.

99%

of vocabulary is names.

Personal, place, organization, and brand names make up almost all of any language's vocabulary, counted as distinct word forms, and most of it sits outside the reach of general LLMs. Names are the long tail that matters.

170M+

names, 10B+ data points.

Over a decade of academic and industrial research. Every name is tied to scripts, languages, prevalence per country and region, frequency, gender distribution, etymology, and known bearers.

150+

writing systems.

One name can appear as 尤金, Юджин, ユージン, يوجين, or Eugen. Our models cross the scripts so your systems don't have to.
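Before anything can cross scripts, a system has to know which script it is looking at. The heuristic below guesses the dominant writing system of a string by majority vote over the first word of each letter's Unicode character name. It is an illustration only, not how Mondonomo's models work, and the `dominant_script` helper is hypothetical; production code should use the Unicode Script property (Scripts.txt) rather than name prefixes.

```python
import unicodedata
from collections import Counter


def dominant_script(text: str) -> str:
    """Guess a string's dominant script by majority vote over the first word
    of each letter's Unicode name ("CYRILLIC", "ARABIC", "CJK", ...).

    A rough heuristic: the Unicode Script property is the proper source.
    """
    votes = Counter(
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()
    )
    return votes.most_common(1)[0][0] if votes else "UNKNOWN"


if __name__ == "__main__":
    for form in ["尤金", "Юджин", "ユージン", "يوجين", "Eugen"]:
        print(form, dominant_script(form))
```

On the five forms above it returns CJK, CYRILLIC, KATAKANA, ARABIC, and LATIN; the prolonged-sound mark in ユージン is outvoted by the three katakana letters.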

WHERE IT LANDS

Three of the six production deployments.

Every demo on this page maps to a paying use case. The full list — KYC and sanctions, customer onboarding, CRM deduplication, localized marketing, civil and healthcare record linkage, LLM enrichment — lives on the business cases page. Three highlights below.


THE STACK

One graph. Two model families. One applied surface.

Every tab below is its own page. Each model page links the legacy research that powers it today and the production deployments that pay for it.

Data foundation
01 · KNOWLEDGE GRAPH

MondoGraph

The world's biggest, most complete, most accurate proper-name knowledge graph. 126B attestations across 269 countries and 2,410 language codes. Etymology, prevalence, romanizations, soundalikes — modelled as first-class edges.

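To make "first-class edges" concrete: when relations are typed edges, a question such as "which script variants does this name have?" becomes a single filtered lookup rather than a join across tables. The toy structure below is a hypothetical sketch. The `NameGraph` class, the node strings, and the relation labels (`script_variant`, `soundalike`, `etymology`) are invented for illustration and say nothing about MondoGraph's actual schema or storage.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Edge:
    source: str
    relation: str  # hypothetical labels, e.g. "script_variant", "soundalike", "etymology"
    target: str


class NameGraph:
    """Toy name graph in which every relation is a typed, first-class edge."""

    def __init__(self) -> None:
        self._out: Dict[str, List[Edge]] = defaultdict(list)

    def add(self, source: str, relation: str, target: str) -> None:
        self._out[source].append(Edge(source, relation, target))

    def neighbors(self, node: str, relation: str) -> List[str]:
        """Targets reachable from `node` over edges of a single relation type."""
        return [e.target for e in self._out.get(node, []) if e.relation == relation]


if __name__ == "__main__":
    g = NameGraph()
    g.add("Eugene", "script_variant", "Юджин")
    g.add("Eugene", "script_variant", "ユージン")
    g.add("Eugene", "soundalike", "Eugen")
    g.add("Eugene", "etymology", "Eugenios")
    print(g.neighbors("Eugene", "script_variant"))
```

A real graph at this scale would need a dedicated store and weighted edges for prevalence; the point here is only that each relation type is queryable on its own.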
In development
02 · UNDERSTANDING MODEL

PNEUMA

Proper Names Understanding Model — a hybrid CNN-Transformer for sequence classification (entity type · language · country · gender) and token-level parsing. The hybrid model is in development; the CNN precursor Onomas-CNN X is published.

Production
03 · DATA-DRIVEN VARIANT

PNEUMA-DD

Same task surface as PNEUMA, driven directly by MondoGraph token-frequency statistics. Sub-50 ms latency, no GPU, every prediction traceable to its source rows. Shipping today.

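This page says little about PNEUMA-DD's internals beyond "token-frequency statistics", so the following is only a sketch of how such statistics can yield explainable per-token and full-name country distributions. It assumes a table of attestation counts per token and country, applies add-one smoothing, and combines tokens by multiplying their distributions and renormalising. That independence assumption is a stand-in, not PNEUMA-DD's actual method; the function names and the toy counts are invented.

```python
import math
from typing import Dict, List, Tuple

Counts = Dict[str, Dict[str, int]]  # token -> country code -> attestation count


def token_distribution(token: str, counts: Counts, countries: List[str],
                       alpha: float = 1.0) -> Dict[str, float]:
    """Country distribution for one token: counts plus add-alpha smoothing, normalised."""
    row = counts.get(token, {})
    total = sum(row.get(c, 0) for c in countries) + alpha * len(countries)
    return {c: (row.get(c, 0) + alpha) / total for c in countries}


def name_distribution(tokens: List[str], counts: Counts, countries: List[str]
                      ) -> Tuple[Dict[str, float], Dict[str, Dict[str, float]]]:
    """Combine per-token distributions by multiplying them and renormalising.

    Also returns the per-token distributions, so each score can be traced
    back to the counts behind it.
    """
    per_token = {t: token_distribution(t, counts, countries) for t in tokens}
    log_scores = {c: sum(math.log(per_token[t][c]) for t in tokens) for c in countries}
    top = max(log_scores.values())  # subtract the max before exp() for stability
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(weights.values())
    return {c: w / z for c, w in weights.items()}, per_token


if __name__ == "__main__":
    # Invented toy counts, for illustration only (not MondoGraph data).
    toy = {"davor": {"HR": 90, "RS": 8, "SI": 2}, "lauc": {"HR": 5, "SI": 5}}
    dist, why = name_distribution(["davor", "lauc"], toy, ["HR", "RS", "SI"])
    print({c: round(p, 3) for c, p in dist.items()})
    print(why["lauc"])
```

With these toy counts, HR receives 546/573 ≈ 0.953 of the mass, and the returned per-token distributions show exactly which token pushed the result where.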
Preview · components live
04 · NAME-MAPPING ENSEMBLE

MondoPhon

Universal proper-name to proper-name mapping — Transformer + WFST ensemble for romanization, deromanization, phonetic transcription, G2P and P2G. PolyIPA and AyutthayaAlpha ship in production today.

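Deterministic romanization rules are the kind of rewrite system a WFST compiles. The sketch below applies such rules with greedy longest-match rewriting in plain Python, using a deliberately tiny, lowercase Russian-style table. It is neither MondoPhon nor a complete romanization standard, and the `transliterate` helper and both tables are hypothetical.

```python
from typing import Dict


def transliterate(text: str, table: Dict[str, str]) -> str:
    """Greedy longest-match rewriting: at each position apply the longest
    matching table key; characters with no rule are copied unchanged."""
    max_len = max(map(len, table))
    out, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in table:
                out.append(table[chunk])
                i += n
                break
        else:  # no rule matched at position i
            out.append(text[i])
            i += 1
    return "".join(out)


# Tiny illustrative tables (a few lowercase Russian letters, not a full standard).
CYR_TO_LAT = {"ю": "yu", "д": "d", "ж": "zh", "и": "i", "н": "n", "з": "z", "х": "kh"}
LAT_TO_CYR = {lat: cyr for cyr, lat in CYR_TO_LAT.items()}

if __name__ == "__main__":
    roman = transliterate("юджин", CYR_TO_LAT)
    print(roman, transliterate(roman, LAT_TO_CYR))
```

Here "юджин" becomes "yudzhin", and the inverted table maps it back because longest match reads "zh" as one unit. Real name data is rarely this clean: a form like "Eugen" admits several plausible Cyrillic spellings, which is where weighted transducers and learned models that rank candidates come in.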
Research surface
05 · NAME SCIENCE

Onomastics

Quantitative onomastics at civilizational scale. The fuzzy-set similarity paper, the Named by God biblical project (2,365 names, 7 published volumes), and Formalised Etymology (11.3M form nodes, in progress).

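The fuzzy-set approach treats each country (or language) as a fuzzy set of names, where a name's membership degree reflects how common it is there. The sketch below uses the standard fuzzy Jaccard (Ruzicka) similarity: the sum of minimum memberships over the sum of maximum memberships, with memberships obtained by dividing each count by the largest count. Both choices are assumptions for illustration; the measure in the Names paper may be defined differently, and the counts in the example are invented.

```python
from typing import Dict

FuzzySet = Dict[str, float]  # name -> membership degree in [0, 1]


def memberships(counts: Dict[str, int]) -> FuzzySet:
    """Turn raw name counts into membership degrees relative to the most common name."""
    top = max(counts.values())
    return {name: n / top for name, n in counts.items()}


def fuzzy_jaccard(a: FuzzySet, b: FuzzySet) -> float:
    """Fuzzy Jaccard (Ruzicka) similarity: sum of min over sum of max memberships."""
    names = a.keys() | b.keys()
    inter = sum(min(a.get(x, 0.0), b.get(x, 0.0)) for x in names)
    union = sum(max(a.get(x, 0.0), b.get(x, 0.0)) for x in names)
    return inter / union if union else 0.0


if __name__ == "__main__":
    # Invented toy counts for two hypothetical countries.
    a = memberships({"ivan": 100, "marko": 50, "ana": 25})
    b = memberships({"ivan": 40, "ana": 80, "john": 20})
    print(round(fuzzy_jaccard(a, b), 3))
```

The toy sets share "ivan" and "ana" with different weights, giving 0.75 / 2.75 ≈ 0.27; identical sets score 1.0 and disjoint sets 0.0.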
Applied
06 · BUSINESS CASES

Business cases

Six production deployments — KYC and sanctions, Smart Forms onboarding, CRM deduplication, localized marketing, civil / healthcare record linkage, LLM enrichment via RAG. Each one shows the model composition and the metric that moves.

Pitch
07 · WORK WITH US

Talk to research.

For investors, technical founders, NLP research groups, and customer-data teams. We co-author, we integrate, and we ship. Generous free tier on the API; commercial licensing for the full graph; collaboration on new languages.


THE THESIS

If a name is not proper, language is not in accordance with the truth of things. If language be not in accordance with the truth of things, affairs cannot be carried on to success.

孔夫子 (Confucius), The Analects

PEER-REVIEWED & PRE-PRINT

Publications & datasets

JAN 2026

Efficient Multilingual Name Type Classification Using Convolutional Networks

Davor Lauc · arXiv:2601.11090 · the Onomas-CNN X precursor to PNEUMA

92.1% accuracy · 104 languages · 2,813 names/sec · 46× faster than fine-tuned XLM-RoBERTa
DEC 2024

PolyIPA — Multilingual Phoneme-to-Grapheme Conversion Model

Davor Lauc · arXiv:2412.09102 · the P2G model shipping today inside MondoPhon

Mean CER 0.055 · char-BLEU 0.914 · top-3 beam reduces effective CER by 52.7% (to 0.026)
DEC 2024

AyutthayaAlpha — A Thai-Latin Script Transliteration Transformer

Davor Lauc, Attapol Rutherford (Chulalongkorn), Weerin Wongwarawipatr · arXiv:2412.03877

82.32% first-token · 95.24% first-three-token · CER 0.0047 · 2.7M Thai-Latin pairs
MAR 2024

Navigating Linguistic Similarities Among Countries Using Fuzzy Sets of Proper Names

Davor Lauc · Names, vol. 72 · a fuzzy-set similarity measure over proper-name sets, applied to language and country comparison

Phonetic commonality of forenames correlates with broader language similarity
2024

NomoGraph DB / MondoGraph — A Knowledge Graph of Personal Names

Davor Lauc · the substrate behind every Nelma model · 170M+ names, 10B+ data points, 126B token attestations

269 countries · 2,410 language codes · 53M unique given-name forms · 49M unique surnames
Dataset
2023

Handbook of Top Thai Names

Attapol Rutherford (Chulalongkorn) & Mondonomo AI · the corpus behind AyutthayaAlpha

16,000+ Romanized forms · RTGS transcriptions · AI-verified variants
Book
2021 · EACL BSNLP

BERTić - The Transformer Language Model for Bosnian, Croatian, Montenegrin and Serbian

Nikola Ljubešić, Davor Lauc · transformer pre-trained on 8B tokens of crawled web text · evaluated on POS, NER, geo-location, commonsense causal reasoning

Foundation for the Slavic-language NER work that informed PNEUMA
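The metrics quoted for PolyIPA and AyutthayaAlpha can be computed as below. CER is edit distance divided by reference length. The readings of "effective CER" with a top-3 beam (best CER among the top three candidates) and of "first-three-token" accuracy (reference found among the first three candidates) are assumptions, as are the helper names and the tiny example data.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions), one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute, free if equal
        prev = cur
    return prev[-1]


def cer(hyp: str, ref: str) -> float:
    """Character error rate: edit distance over reference length."""
    return levenshtein(hyp, ref) / max(len(ref), 1)


def mean_cer(hyps: Sequence[str], refs: Sequence[str]) -> float:
    """Per-item CER averaged over the test set (an assumed averaging scheme)."""
    return sum(cer(h, r) for h, r in zip(hyps, refs)) / len(refs)


def oracle_topk_cer(beams: Sequence[List[str]], refs: Sequence[str], k: int) -> float:
    """Assumed reading of 'effective CER': the best CER among the top-k candidates."""
    return sum(min(cer(h, r) for h in beam[:k]) for beam, r in zip(beams, refs)) / len(refs)


def topk_accuracy(beams: Sequence[List[str]], refs: Sequence[str], k: int) -> float:
    """Share of items whose reference appears among the first k candidates."""
    return sum(r in beam[:k] for beam, r in zip(beams, refs)) / len(refs)


if __name__ == "__main__":
    refs = ["phonetic", "name"]
    beams = [["fonetic", "phonetic", "phonetik"], ["nam", "name"]]  # toy beams
    print(mean_cer([b[0] for b in beams], refs))
    print(oracle_topk_cer(beams, refs, k=3), topk_accuracy(beams, refs, k=3))
```

As a consistency check on the PolyIPA figures, 0.055 × (1 − 0.527) ≈ 0.0260, which matches the reported effective CER of 0.026.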