MONDONOMO · NAME-UNDERSTANDING INFRASTRUCTURE
Nelma is the research surface for Mondonomo's name infrastructure — the data foundation, the models, and the production deployments. One knowledge graph. Two model families. Six business cases. Open demos, open papers.
99% of language vocabulary is names — and most of it sits outside general LLMs. A defensible moat: 126B name attestations, the world's largest proper-name graph, applied against the $12.9M / org / yr cost of bad name data (Gartner).
Investor inquiry →
Pilot the API at scale: KYC vendors, CDPs, identity platforms, LLM tooling. Co-built integrations, dedicated rate limits, joint go-to-market on cultural-data wins.
Request a pilot →
One REST API replaces a stack of regex, lookup tables, and brittle LLM prompts. Generous free tier, sub-50ms latency, copy-paste from every demo page.
Try the API →
Four peer-reviewed papers, Hugging Face weights, reproducible eval scripts. Active collaboration with University of Zagreb and Chulalongkorn.
Read the papers →
TEST OUR MODELS
One call to /api/v1/parse returns language, parse structure, gender posterior, and per-token country and language distributions — all from MondoGraph in ~25 ms. Powered by PNEUMA-DD, our production data-driven model.
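A minimal sketch of what such a call might look like from Python, using only the standard library. Only the path /api/v1/parse comes from this page. The host, the `name` query parameter, and the shape of the response are assumptions made for illustration, so the demo pages remain the reference for the real request format.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://api.example.com"  # hypothetical host, not the real endpoint
PARSE_PATH = "/api/v1/parse"          # path taken from the page


def build_parse_url(name: str, base_url: str = BASE_URL) -> str:
    """Build the request URL; the 'name' parameter is an assumed name."""
    return f"{base_url}{PARSE_PATH}?{urlencode({'name': name})}"


def parse_name(name: str, timeout: float = 5.0) -> dict:
    """Send one GET request and decode the JSON body (needs network access)."""
    request = Request(build_parse_url(name), headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


def top_label(distribution: dict) -> str:
    """Return the most probable label of a {label: probability} mapping."""
    return max(distribution, key=distribution.get)
```

If the response carried a per-token country distribution as a plain mapping (again an assumption), `top_label` would reduce it to a single best guess while the full distribution stays available for downstream scoring.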
THE MOAT
Proper names are almost entirely outside the reach of foundation models — and the cost of getting them wrong shows up in every fraud loss, every unsubscribed marketing email, and every voice-agent mispronunciation. Mondonomo's stack closes that gap.
99%
Personal, place, organization, and brand names make up almost all of any language's running vocabulary — and most of it sits outside the reach of general LLMs. Names are the long tail that matters.
170M+
Over a decade of academic and industrial research. Every name is tied to scripts, languages, prevalence per country and region, frequency, gender distribution, etymology, and known bearers.
150+
One name can appear as 尤金, Юджин, ユージン, يوجين, or Eugen. Our models cross the scripts so your systems don't have to.
WHERE IT LANDS
Every demo on this page maps to a paying use case. The full list — KYC and sanctions, customer onboarding, CRM deduplication, localized marketing, civil and healthcare record linkage, LLM enrichment — lives on the business cases page. Three highlights below.
Cross-script name matching for sanctions and AML — يوسف بن أحمد = Yousef Ben Ahmed = Yusef Bin Achmed, with every step auditable.
One free-text name field, four model calls, one structured, culturally correct record. Built for markets where First/Middle/Last doesn't fit: Spain, Brazil, Arabic-speaking countries, East Asia.
One /enrich tool-call returns type, gender, country, transliterations, pronunciation and known bearers. Built for voice agents and multilingual assistants.
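To make the tool-call idea concrete, here is a hedged sketch of how an assistant runtime might expose enrichment to a model. The six result fields come from the description above. The tool name `enrich_name`, the JSON-Schema layout, and the `enrich_backend` callable are hypothetical stand-ins, not Mondonomo's published interface.

```python
import json

# Hypothetical tool definition, written in the JSON-Schema style that most
# LLM tool-calling interfaces accept.
ENRICH_TOOL = {
    "name": "enrich_name",
    "description": "Look up a proper name and return structured facts about it.",
    "parameters": {
        "type": "object",
        "properties": {
            "name": {"type": "string", "description": "The name as written by the user."}
        },
        "required": ["name"],
    },
}

# Fields listed on the page; the snake_case keys are an assumption.
RESULT_FIELDS = (
    "type", "gender", "country",
    "transliterations", "pronunciation", "known_bearers",
)


def handle_tool_call(tool_name: str, arguments_json: str, enrich_backend) -> str:
    """Route one model-issued tool call to the enrichment backend.

    enrich_backend is any callable taking a name and returning a dict,
    for example a thin wrapper around an HTTP call.
    """
    if tool_name != ENRICH_TOOL["name"]:
        raise ValueError(f"unknown tool: {tool_name}")
    arguments = json.loads(arguments_json)
    record = enrich_backend(arguments["name"])
    # Return only the documented fields; missing ones become null.
    trimmed = {field: record.get(field) for field in RESULT_FIELDS}
    return json.dumps(trimmed, ensure_ascii=False)
```

Keeping the backend as an injected callable means the same handler can be exercised with a stub in tests and pointed at the live service in production.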
THE STACK
Every tab below is its own page. Each model page links the legacy research that powers it today and the production deployments that pay for it.
The world's biggest, most complete, most accurate proper-name knowledge graph. 126B attestations across 269 countries and 2,410 language codes. Etymology, prevalence, romanizations, soundalikes — modelled as first-class edges.
PNEUMA, the Proper Names Understanding Model, is a hybrid CNN-Transformer for sequence classification (entity type · language · country · gender) and token-level parsing. The hybrid model is in development; the CNN precursor Onomas-CNN X is published.
Same task surface as PNEUMA, driven directly by MondoGraph token-frequency statistics. Sub-50ms latency, no GPU, every prediction explainable to the source rows. Shipping today.
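The idea of a prediction that traces back to its source rows can be illustrated with a toy example. This is not the PNEUMA-DD implementation: the table layout, the averaging rule, and every count below are invented for illustration only.

```python
from collections import defaultdict

# Toy frequency rows: (token, country code, count). All values are made up.
ROWS = [
    ("yusuf", "TR", 60),
    ("yusuf", "EG", 30),
    ("yusuf", "ID", 10),
    ("ahmed", "EG", 70),
    ("ahmed", "TR", 30),
]


def predict_country(tokens, rows=ROWS):
    """Average per-token country distributions and keep the rows used.

    Returns (distribution, evidence): a {country: probability} mapping and
    the list of frequency rows that contributed to it.
    """
    scores = defaultdict(float)
    evidence = []
    for token in tokens:
        matching = [row for row in rows if row[0] == token]
        total = sum(row[2] for row in matching)
        for row in matching:
            # Each token contributes its own normalized distribution.
            scores[row[1]] += row[2] / total
            evidence.append(row)
    norm = sum(scores.values())
    distribution = {country: s / norm for country, s in scores.items()} if norm else {}
    return distribution, evidence


distribution, evidence = predict_country(["yusuf", "ahmed"])
```

Because the result is plain arithmetic over table rows, it needs no GPU and every probability can be justified by listing `evidence`, which is the property the description above highlights.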
Universal proper-name to proper-name mapping — Transformer + WFST ensemble for romanization, deromanization, phonetic transcription, G2P and P2G. PolyIPA and AyutthayaAlpha ship in production today.
Quantitative onomastics at civilizational scale. The fuzzy-set similarity paper, the Named by God biblical project (2,365 names, 7 published volumes), and Formalised Etymology (11.3M form nodes, in progress).
Six production deployments — KYC and sanctions, Smart Forms onboarding, CRM deduplication, localized marketing, civil / healthcare record linkage, LLM enrichment via RAG. Each one shows the model composition and the metric that moves.
For investors, technical founders, NLP research groups, and customer-data teams. We co-author, we integrate, and we ship. Generous free tier on the API; commercial licensing for the full graph; collaboration on new languages.
THE THESIS
If a name is not proper, language is not in accordance with the truth of things. If language be not in accordance with the truth of things, affairs cannot be carried on to success.
孔夫子 (Confucius), The Analects
PEER-REVIEWED & PRE-PRINT