PNEUMA · PROPER NAMES UNDERSTANDING MODEL
A hybrid CNN-Transformer for sequence classification and token-level parsing — entity type, language, country, gender, and name structure, all in one forward pass. Designed to run on a single CPU core at production scale.
TASK SURFACE
PNEUMA produces calibrated outputs for every task that operates on a name string. Each task is independently trained on MondoGraph supervision; the shared encoder amortizes computation.
SEQ CLASSIFY · ENTITY TYPE
person / organization / place / brand / noise, calibrated 5-way.
SEQ CLASSIFY · LANGUAGE
Predict the most likely source language for a name string across 104 trained languages.
SEQ CLASSIFY · COUNTRY
Predict bearer-country probabilities across 269 countries with population-weighted priors.
SEQ CLASSIFY · GENDER
Locale-conditioned. Returns a calibrated posterior; refuses on ambiguous evidence.
TOKEN CLASSIFY · PARSE
Slot tagging for given / surname / particle / title across 28 naming conventions.
USAGE · RAG
Inject name knowledge into LLM context: pronunciation, transliteration, addressing conventions.
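The refusal behaviour described for the gender task can be sketched as a threshold on a calibrated posterior. The 0.85 threshold, the label names, and the function name are illustrative assumptions. The source does not state how PNEUMA decides when to refuse.

```python
from typing import Dict, Optional


def decide(posterior: Dict[str, float], threshold: float = 0.85) -> Optional[str]:
    """Return the top label, or None (abstain) when no label is confident enough."""
    total = sum(posterior.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("posterior must sum to 1")
    label, prob = max(posterior.items(), key=lambda kv: kv[1])
    return label if prob >= threshold else None


print(decide({"female": 0.97, "male": 0.03}))  # confident: returns "female"
print(decide({"female": 0.55, "male": 0.45}))  # ambiguous: returns None
```

A threshold like this is only meaningful if the probabilities are calibrated, which is why calibration and refusal are mentioned together.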
ARCHITECTURE
The CNN front-end gives the speed of Onomas-CNN X; the Transformer body gives the accuracy of fine-tuned XLM-R. A shared encoder serves five task heads. The model is designed to fit in 200MB and run inference on a single CPU core at 2,500+ names per second.
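As a rough back-of-the-envelope check on those targets (not a published specification), the size budget bounds the parameter count at a given numeric precision, and the throughput target implies an average per-name latency budget. The precisions listed are assumptions, since the source does not say how the weights are stored. The calculation also ignores any non-weight overhead in the 200 MB.

```python
SIZE_BUDGET_BYTES = 200 * 10**6   # 200 MB, taking 1 MB = 10^6 bytes
NAMES_PER_SECOND = 2_500          # single-core throughput target from the text

# Bytes per parameter for some common storage precisions (assumed, not stated).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

# Upper bound on parameter count if the whole budget were spent on weights.
max_params = {p: SIZE_BUDGET_BYTES // b for p, b in BYTES_PER_PARAM.items()}

# Average time available per name on one core, in milliseconds.
latency_ms = 1000 / NAMES_PER_SECOND

for precision, params in max_params.items():
    print(f"{precision}: up to {params / 1e6:.0f}M parameters")
print(f"latency budget: {latency_ms:.1f} ms per name")
```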
Davor Lauc · the CNN-only precursor to PNEUMA
The precursor demonstrated that specialized CNN architectures remain competitive with fine-tuned large LMs on focused NLP tasks when sufficient supervision is available. PNEUMA extends this with a Transformer body for tasks where context matters (parsing, gender) and a multi-task head for joint training.
Ljubešić & Lauc · 8B-token Slavic transformer · informs the Transformer body
USE IT TODAY
The data-driven variant (PNEUMA-DD) uses MondoGraph token statistics directly to serve every classification and parsing task. It exposes the same API and input schema and is available today. When PNEUMA reaches general availability (GA) in Q3 2026, it drops in behind the same endpoint with no client changes.
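The drop-in claim amounts to keeping the client-facing interface fixed while the backend changes. A minimal sketch of that pattern follows, using an entirely hypothetical interface: the class names, method name, and return values are placeholders and do not describe the real endpoint.

```python
from typing import Dict, Protocol


class NameBackend(Protocol):
    """Hypothetical shared interface: same input schema for every backend."""

    def classify(self, name: str, task: str) -> Dict[str, float]: ...


class DataDrivenBackend:
    """Placeholder for the statistics-based variant."""

    def classify(self, name: str, task: str) -> Dict[str, float]:
        return {"placeholder_label": 1.0}


class NeuralBackend:
    """Placeholder for the neural model served behind the same interface."""

    def classify(self, name: str, task: str) -> Dict[str, float]:
        return {"placeholder_label": 1.0}


def client_call(backend: NameBackend, name: str) -> Dict[str, float]:
    """Client code depends only on the interface, not on the backend."""
    return backend.classify(name, task="language")


# Swapping the backend requires no change to client_call.
for backend in (DataDrivenBackend(), NeuralBackend()):
    print(type(backend).__name__, client_call(backend, "Ana de la Cruz"))
```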