One model that understands every proper name.

A hybrid CNN-Transformer for sequence classification and token-level parsing — entity type, language, country, gender, and name structure, all in one forward pass. Designed to run on a single CPU core at production scale.

Read the Onomas-CNN X paper ↗

Use PNEUMA-DD today →

TASK SURFACE

Five tasks, one inference.

PNEUMA produces calibrated outputs for every task that operates on a name string. Each task is independently trained on MondoGraph supervision; the shared encoder amortizes computation.

SEQ CLASSIFY · ENTITY TYPE

Entity type

person / organization / place / brand / noise — calibrated 5-way.

Try with PNEUMA-DD →

SEQ CLASSIFY · LANGUAGE

Source language

Predict the most likely source language for a name string across 104 trained languages.

Try with PNEUMA-DD →

SEQ CLASSIFY · COUNTRY

Country distribution

Predict bearer-country probabilities across 269 countries with population-weighted priors.
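One plausible reading of "population-weighted priors" is Bayes' rule: a per-country score for the name is multiplied by that country's share of the population, then renormalized. The sketch below assumes that reading. The function name `country_posterior` and all numbers are illustrative placeholders, not PNEUMA internals or MondoGraph data.

```python
def country_posterior(likelihood, population):
    """Combine per-country name likelihoods with population-proportional priors."""
    total_pop = sum(population.values())
    # Unnormalized posterior: P(name | country) * P(country).
    scores = {
        c: likelihood.get(c, 0.0) * population[c] / total_pop
        for c in population
    }
    norm = sum(scores.values())
    if norm == 0.0:
        # Name unseen everywhere: fall back to the prior alone.
        return {c: population[c] / total_pop for c in population}
    return {c: s / norm for c, s in scores.items()}


# Hypothetical inputs: the name is relatively most frequent in AT,
# but DE has eight times the population.
likelihood = {"DE": 0.004, "AT": 0.006, "CH": 0.002}
population = {"DE": 80, "AT": 10, "CH": 10}
posterior = country_posterior(likelihood, population)
top_country = max(posterior, key=posterior.get)
```

With these made-up inputs the prior outweighs the higher per-capita frequency in AT, so DE ends up as the most probable bearer country.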

Try with PNEUMA-DD →

SEQ CLASSIFY · GENDER

Gender inference

Locale-conditioned. Returns a calibrated posterior and abstains when the evidence is ambiguous.
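Refusing on ambiguous evidence can be pictured as a confidence threshold on the posterior. This is a minimal sketch under that assumption. The helper `decide_gender`, the 0.9 threshold, and the example posteriors are hypothetical and are not taken from PNEUMA.

```python
def decide_gender(posterior, threshold=0.9):
    """Return the top label if it clears the threshold, otherwise None (abstain)."""
    label, prob = max(posterior.items(), key=lambda kv: kv[1])
    return label if prob >= threshold else None


# Hypothetical posteriors for two names in some locale.
confident = {"masculine": 0.97, "feminine": 0.03}
ambiguous = {"masculine": 0.55, "feminine": 0.45}

confident_decision = decide_gender(confident)   # clears the threshold
ambiguous_decision = decide_gender(ambiguous)   # abstains
```

A threshold like this only behaves as intended when the probabilities are calibrated, which is why abstention and calibration are usually discussed together.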

Try with PNEUMA-DD →

TOKEN CLASSIFY · PARSE

Name parsing

Slot tagging for given / surname / particle / title across 28 naming conventions.

Try with PNEUMA-DD →

USAGE · RAG

Usage in LLMs

Inject name knowledge into LLM context — pronunciation, transliteration, addressing conventions.

See business case →
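As a rough illustration of injecting name knowledge into LLM context, the sketch below renders a PNEUMA-style result as a short block of facts that can be prepended to a prompt. It uses only the fields shown in the sample response on this page. The function `name_context` and the prompt wording are assumptions made for the example.

```python
def name_context(result):
    """Render a PNEUMA-style result as a short context block for an LLM prompt."""
    parts = ", ".join(f"{p['slot']}={p['value']}" for p in result["parts"])
    lines = [
        "Name facts (machine-derived, may be wrong):",
        f"- entity type: {result['entity_type']}",
        f"- likely language: {result['language']}",
        f"- likely country: {result['country']}",
        f"- gender: {result['gender']}",
        f"- structure: {parts}",
    ]
    return "\n".join(lines)


# Field names and values copied from the sample response on this page.
sample = {
    "entity_type": "person",
    "language": "de",
    "country": "DE",
    "gender": "masculine",
    "parts": [
        {"slot": "given", "value": "Eugen"},
        {"slot": "surname", "value": "Schochenmaier"},
    ],
}

prompt = name_context(sample) + "\n\nWrite a formal greeting for this customer."
```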

ARCHITECTURE

CNN front-end, Transformer body, multi-head output.

The CNN front-end gives the speed of Onomas-CNN X; the Transformer body gives the accuracy of fine-tuned XLM-R. A shared encoder serves five task heads. The model is designed to fit in 200 MB and run inference on a single CPU core at 2,500+ names per second.
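To put the size and speed targets in perspective, here is a back-of-envelope calculation using the stem and body dimensions listed in the diagram on this page (64 channels, kernels 3/5/7, 4 layers, d=256). Several inputs are assumptions, not stated facts: float32 weights, 200 MB read as 200 × 10^6 bytes, a feed-forward width of 4 × d, 64 input channels into each stem branch, and biases and normalization layers ignored.

```python
# Back-of-envelope sizing under the assumptions stated above.
BYTES_PER_PARAM = 4                              # float32 (assumption)
budget_params = 200 * 10**6 // BYTES_PER_PARAM   # parameters that fit in 200 MB

d_model, n_layers = 256, 4
attn_params = 4 * d_model * d_model          # Q, K, V and output projections
ffn_params = 2 * d_model * (4 * d_model)     # two linear maps, assumed 4x width
body_params = n_layers * (attn_params + ffn_params)

channels, kernels = 64, (3, 5, 7)
# Depthwise-separable: one k-tap filter per channel, then a 1x1 pointwise mix.
stem_separable = sum(k * channels + channels * channels for k in kernels)
# Dense convolutions of the same shape, for comparison.
stem_dense = sum(k * channels * channels for k in kernels)

ms_per_name = 1000 / 2500   # time budget per name at 2,500 names/sec

print(f"budget:         {budget_params:,} params")
print(f"body:           {body_params:,} params")
print(f"stem separable: {stem_separable:,} vs dense {stem_dense:,}")
print(f"latency budget: {ms_per_name} ms per name")
```

Under these assumptions the Transformer body comes to about 3.1 million parameters, roughly 6% of a 50-million-parameter budget, and the separable stem needs 13,248 weights against 61,440 for dense convolutions. At 2,500 names per second, each name has about 0.4 ms of a single core.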

INPUT

UTF-8 name string, script-tagged

CNN STEM

Parallel depthwise-separable convolution branches (1×3, 1×5, 1×7), 64ch

TRANSFORMER BODY

4 layers, 8 heads, d=256 — attention over CNN-extracted tokens

FIVE HEADS

Entity, language, country, gender, parsing — calibrated outputs

CALIBRATION

Temperature scaling on held-out data, per locale (target: ECE < 0.02)
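The calibration step can be sketched in a few lines. Temperature scaling divides the logits by a scalar T chosen to minimize negative log-likelihood on held-out data, and expected calibration error (ECE) measures the gap between confidence and accuracy across confidence bins. The code below is a simplified illustration, not PNEUMA's implementation: it uses a grid search instead of gradient-based fitting, and the tiny held-out set is invented. In a per-locale setup, one such T would be fitted for each locale.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def fit_temperature(logit_rows, labels):
    """Grid-search the temperature that minimizes NLL on held-out data."""
    grid = [0.5 + 0.1 * i for i in range(46)]    # 0.5 .. 5.0

    def nll(t):
        return -sum(math.log(softmax(row, t)[y])
                    for row, y in zip(logit_rows, labels))

    return min(grid, key=nll)


def expected_calibration_error(logit_rows, labels, temperature=1.0, n_bins=10):
    """Weighted mean |confidence - accuracy| over equal-width confidence bins."""
    bins = [[] for _ in range(n_bins)]
    for row, y in zip(logit_rows, labels):
        probs = softmax(row, temperature)
        conf = max(probs)
        hit = probs.index(conf) == y
        bins[min(int(conf * n_bins), n_bins - 1)].append((conf, hit))
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(1 for _, h in b if h) / len(b)
            ece += len(b) / len(labels) * abs(avg_conf - accuracy)
    return ece


# Invented held-out set: very confident logits, but only 3 of 4 are right.
held_out_logits = [[4.0, 0.0]] * 4
held_out_labels = [0, 0, 0, 1]

T = fit_temperature(held_out_logits, held_out_labels)
ece_before = expected_calibration_error(held_out_logits, held_out_labels)
ece_after = expected_calibration_error(held_out_logits, held_out_labels, T)
```

The fitted temperature is greater than 1 here, which softens the overconfident probabilities and lowers ECE. For brevity the example measures ECE on the same data it fits on; a real evaluation would use a separate split.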

Legacy & lineage

JAN 2026 · arXiv:2601.11090

Efficient Multilingual Name Type Classification Using Convolutional Networks

Davor Lauc · the CNN-only precursor to PNEUMA

92.1% accuracy · 104 languages · 2,813 names/sec/CPU · 46× faster than XLM-R

Demonstrated that specialized CNN architectures remain competitive with fine-tuned large LMs on focused NLP tasks when sufficient supervision is available. PNEUMA extends this with a Transformer body for tasks where context matters (parsing, gender), and a multi-task head for joint training.

2021 · EACL BSNLP

A Pre-trained Transformer for Croatian, Bosnian, Serbian and Montenegrin

Ljubešić & Lauc · 8B-token Slavic transformer · informs the Transformer body

USE IT TODAY

PNEUMA-DD ships every PNEUMA task in production.

The data-driven variant serves every classification and parsing task directly from MondoGraph token statistics. It uses the same API and the same input schema, and it is available today. When PNEUMA reaches general availability (planned for Q3 2026), it drops in behind the same endpoint with no client changes.

PNEUMA-DD →

# One call, all five tasks.
curl -X POST https://api.mondonomo.ai/v1/pneuma \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "Eugen Schochenmaier"}'

# →
{
  "entity_type": "person",
  "language":    "de",
  "country":     "DE",
  "gender":      "masculine",
  "parts": [
    {"slot":"given",   "value":"Eugen"},
    {"slot":"surname", "value":"Schochenmaier"}
  ]
}
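The same call can be made from Python with only the standard library. This is a minimal sketch based on the curl example above: the endpoint, bearer-token header, and request body come from that example, while the JSON `Content-Type` header, the timeout, and the function names `build_request` and `analyze` are assumptions made for illustration.

```python
import json
import urllib.request

API_URL = "https://api.mondonomo.ai/v1/pneuma"   # endpoint from the curl example


def build_request(name, token):
    """Build the POST request without sending it."""
    body = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",   # assumed; not shown in the source
        },
        method="POST",
    )


def analyze(name, token, timeout=10.0):
    """Send the request and return the decoded JSON response as a dict."""
    with urllib.request.urlopen(build_request(name, token), timeout=timeout) as resp:
        return json.load(resp)


# Usage (requires a valid token and network access):
#   result = analyze("Eugen Schochenmaier", token)
#   print(result["entity_type"], result["parts"])
```

Separating `build_request` from `analyze` keeps the request construction testable without network access.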