PNEUMA-DD · DATA-DRIVEN VARIANT
Same task surface as PNEUMA — entity-type, language, country, gender, parsing — but powered directly by MondoGraph token-frequency statistics. Transparent, debuggable, and shipping in production today. Sub-50ms typical latency. No training, no GPU — just the corpus.
TRY IT — FULL PNEUMA-DD OUTPUT
One call to /api/v1/parse returns language, parse structure, gender posterior, and
per-token country/language distributions — all from MondoGraph in ~25 ms.
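A minimal sketch of what such a call could look like from Python. Only the path /api/v1/parse and the explain flag come from this page; the host, the POST method, and the name and locale field names are assumptions made for illustration.

```python
import json
from urllib import request

# Hypothetical host: the page only names the path /api/v1/parse.
BASE_URL = "https://api.example.com"


def build_parse_request(name, locale=None, explain=False):
    """Build (but do not send) a request for /api/v1/parse.

    The JSON field names "name" and "locale" and the use of POST are
    assumptions; "explain" is the flag mentioned in the demos.
    """
    payload = {"name": name, "explain": explain}
    if locale is not None:
        payload["locale"] = locale
    return request.Request(
        url=f"{BASE_URL}/api/v1/parse",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_parse_request("Andrea Rossi", locale="it-IT", explain=True)
# To send it against a real host:
#   with request.urlopen(req, timeout=5) as resp:
#       result = json.load(resp)
```

Building the request separately from sending it keeps the payload easy to inspect and log before anything goes over the wire.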
WHY DATA-DRIVEN
Most name-understanding tasks have explicit ground truth in MondoGraph: Andrea is masculine in Italian because 92% of Italian Andreas are men. PNEUMA-DD returns that posterior directly, with the bearer count as evidence. No black box, no hallucination, and every prediction is auditable down to the source rows.
LIVE DEMOS
Same endpoints we use in production. Type a name, see a real result.
Decide whether a string is a person, organization, place, brand, or noise. The gate that filters CRM and KYC pipelines before any downstream work.
TASK 02 · NAME PARSING
Split a full name into given / surname / particle / title slots across 28 naming conventions, including Spanish double-surname, Arabic kunya, and Icelandic patronymic.
TASK 03 · GENDER
Locale-aware. Returns a calibrated distribution and refuses on ambiguous evidence. 187 locales × name forms.
METHOD NOTE
For an input name n and an optional locale prior L, PNEUMA-DD computes P(class | n, L) by direct count over the 556M-row token_stats table, with Dirichlet smoothing for low-evidence cases and a temperature parameter for calibration. Convention detection in the parser uses a CRF over locale-conditioned slot priors. For out-of-corpus tokens, the model falls back to PNEUMA (in development) or a character-n-gram nearest-neighbor lookup.
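The counting step described above can be sketched as follows. This is one plausible formulation, not the production code: it assumes a symmetric Dirichlet prior (a pseudo-count alpha added to every class) and applies temperature by raising each probability to 1/T and renormalizing. The function name, parameter defaults, and example counts are all invented for illustration.

```python
def smoothed_posterior(counts, alpha=1.0, temperature=1.0):
    """Estimate P(class | name, locale) from raw bearer counts.

    counts: dict mapping class label -> number of bearers observed.
    alpha: hypothetical symmetric Dirichlet pseudo-count per class.
    temperature: T > 1 flattens the distribution, T < 1 sharpens it.
    """
    if not counts:
        raise ValueError("counts must contain at least one class")
    if alpha <= 0 or temperature <= 0:
        raise ValueError("alpha and temperature must be positive")

    k = len(counts)
    total = sum(counts.values())

    # Dirichlet (additive) smoothing: with little evidence the estimate
    # stays close to uniform; with many bearers the raw share dominates.
    smoothed = {c: (n + alpha) / (total + alpha * k) for c, n in counts.items()}

    # Temperature scaling: p ** (1 / T), renormalized. This is equivalent
    # to a softmax over log-probabilities divided by T.
    scaled = {c: p ** (1.0 / temperature) for c, p in smoothed.items()}
    z = sum(scaled.values())
    return {c: s / z for c, s in scaled.items()}


# Made-up counts, not MondoGraph figures.
posterior = smoothed_posterior({"feminine": 9, "masculine": 1})
no_evidence = smoothed_posterior({"feminine": 0, "masculine": 0})
flattened = smoothed_posterior({"feminine": 9, "masculine": 1}, temperature=2.0)
```

With zero observed bearers the sketch returns a uniform distribution, which is the kind of low-evidence case where a refusal threshold would apply.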
EXPLAIN-WHY
Each API response includes a bearers_in_corpus field
and a list of the supporting row counts. Audit a gender call: Andrea, it-IT → feminine 0.97 · 428,392
bearers. KYC and regulated industries can show the regulator the exact evidence used. Try toggling
explain: true in the demos to see the receipts.
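A small sketch of how a client might turn an explained response into an audit line in the format shown above. Only the bearers_in_corpus field name comes from this page; the surrounding response shape (a posterior mapping and a supporting_rows list) is an assumption, and the sample values are placeholders rather than corpus figures.

```python
def format_audit_line(name, locale, gender):
    """Render one gender call as 'name, locale → label p · N bearers'.

    gender: dict with a "posterior" mapping (assumed shape) and the
    "bearers_in_corpus" count described on this page.
    """
    posterior = gender["posterior"]
    # Pick the most probable class as the reported label.
    label = max(posterior, key=posterior.get)
    return (
        f"{name}, {locale} → {label} {posterior[label]:.2f}"
        f" · {gender['bearers_in_corpus']:,} bearers"
    )


# Hypothetical response fragment with placeholder numbers.
sample_gender = {
    "posterior": {"feminine": 0.99, "masculine": 0.01},
    "bearers_in_corpus": 1000,
    "supporting_rows": [{"locale": "it-IT", "count": 1000}],  # assumed field
}

line = format_audit_line("Giulia", "it-IT", sample_gender)
print(line)
```

Keeping the formatter separate from the HTTP call makes it straightforward to store the raw response alongside the human-readable line for later review.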