MODEL 01 · MULTILINGUAL G2P + P2G

PolyIPA

Powered by MondoPhone · P2G shipping today

A ByT5-based seq2seq model trained to convert names between their written form (graphemes) and their pronunciation (IPA) across more than 100 languages and writing systems. PolyIPA is the production P2G (phoneme-to-grapheme) component of the MondoPhone ensemble (Transformer + WFST), which is itself still in development. The companion models IPA2vec and similarIPA power the downstream soundalike search.

Paper (arXiv 2412.09102) · Hugging Face

Try it — grapheme → IPA

Enter a name in any script. The model returns its IPA pronunciation in the source language or, when the language is ambiguous, the most likely reading.


Try it — IPA → graphemes (top-3 with beam search)

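Given an IPA transcription and a target language, the model returns three candidate spellings ranked by likelihood. Decoding uses beam search with 90 beams. According to the paper, considering the top 3 candidates instead of only the first reduces the effective character error rate (CER) by 52.7%, from a mean of 0.055 to 0.026.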
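To make the top-3 figure concrete, here is a minimal sketch of how a best-of-k CER can be computed. It assumes that "effective CER" means the lowest CER among the k returned candidates, which this page does not state explicitly. The helper names `edit_distance`, `cer`, and `best_of_k_cer` and the sample spellings are illustrative, not part of PolyIPA.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def best_of_k_cer(reference: str, candidates: list[str], k: int = 3) -> float:
    """Lowest CER among the first k candidates (assumed meaning of 'top-3')."""
    return min(cer(reference, c) for c in candidates[:k])


# Illustrative candidates, not real model output.
candidates = ["Eugine", "Eugene", "Yujin"]
top1 = cer("Eugene", candidates[0])         # one substitution over six characters
top3 = best_of_k_cer("Eugene", candidates)  # the second candidate matches exactly
print(top1, top3)

# Arithmetic check of the reported figures: 0.055 reduced by 52.7%.
print(round(0.055 * (1 - 0.527), 3))
```

The last line confirms that the two reported numbers are consistent with each other: a 52.7% reduction of a mean CER of 0.055 rounds to 0.026.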


Model card

Architecture: ByT5-small, fine-tuned
Parameters: 300M
Training data: WikiPron (1.7M, 165 languages) + Nomograph augmentations (1M names × 20 languages)
Evaluation: mean CER 0.055 · char-BLEU 0.914
Top-3 (beam 90): CER 0.026 (−52.7%)
Inference: ~40 ms p50, 80 ms p95 on A10G
License: Apache 2.0, weights on Hugging Face
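
ByT5 operates on raw UTF-8 bytes rather than a learned subword vocabulary, so a name in any script can be encoded without out-of-vocabulary tokens. The sketch below reproduces the ID scheme of the public ByT5 tokenizer (byte value plus 3, with IDs 0 to 2 reserved for padding, end-of-sequence, and unknown). Whether PolyIPA adds a language tag or other prompt text to its input is not stated on this page, so none is shown; `byt5_ids` and `byt5_decode` are illustrative helper names.

```python
PAD_ID, EOS_ID, UNK_ID = 0, 1, 2   # special tokens reserved by ByT5
BYTE_OFFSET = 3                    # byte value b is mapped to ID b + 3


def byt5_ids(text: str) -> list[int]:
    """Encode text as ByT5 does: UTF-8 bytes shifted by the offset, then EOS."""
    return [b + BYTE_OFFSET for b in text.encode("utf-8")] + [EOS_ID]


def byt5_decode(ids: list[int]) -> str:
    """Invert byt5_ids, ignoring any ID outside the 256-value byte range."""
    raw = bytes(
        i - BYTE_OFFSET for i in ids if BYTE_OFFSET <= i < BYTE_OFFSET + 256
    )
    return raw.decode("utf-8")


# Latin, Cyrillic, and IPA input all go through the same byte vocabulary.
for name in ["Eugen", "Евгений", "ˈɔʏɡn̩"]:
    ids = byt5_ids(name)
    print(name, "->", len(name), "characters,", len(ids) - 1, "bytes")
```

Non-Latin scripts and IPA symbols take two or more bytes per character, so input sequences are longer than the character count suggests.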

API

    # Grapheme → IPA
    curl -X POST https://api.mondonomo.ai/v1/polyipa/g2p \
      -H "Authorization: Bearer $TOKEN" \
      -d '{"name": "Eugen", "lang": "de"}'
    # → {"ipa": "/ˈɔʏɡn̩/", "candidates": [...]}

    # IPA → grapheme (top-3)
    curl -X POST https://api.mondonomo.ai/v1/polyipa/p2g \
      -H "Authorization: Bearer $TOKEN" \
      -d '{"ipa": "/ˈjuːdʒɪn/", "lang": "en", "k": 3}'
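
The same two calls can be prepared from Python. The sketch below only builds the requests with the standard library and leaves sending to the caller. The endpoint paths, field names, and bearer-token header come from the curl example above; the `Content-Type: application/json` header and the helper name `build_request` are assumptions that this page does not confirm.

```python
import json
import urllib.request

BASE_URL = "https://api.mondonomo.ai/v1/polyipa"


def build_request(task: str, payload: dict, token: str) -> urllib.request.Request:
    """Build a POST request for the 'g2p' or 'p2g' endpoint without sending it."""
    if task not in ("g2p", "p2g"):
        raise ValueError(f"unknown task: {task!r}")
    return urllib.request.Request(
        url=f"{BASE_URL}/{task}",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumed; not in the curl example
        },
        method="POST",
    )


g2p = build_request("g2p", {"name": "Eugen", "lang": "de"}, token="YOUR_TOKEN")
p2g = build_request(
    "p2g", {"ipa": "/ˈjuːdʒɪn/", "lang": "en", "k": 3}, token="YOUR_TOKEN"
)
print(g2p.get_method(), g2p.full_url)

# To actually send a request (needs a valid token and network access):
# with urllib.request.urlopen(g2p) as resp:
#     print(json.load(resp))
```

Keeping request construction separate from sending makes the payloads easy to inspect or log before any network call is made.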

METHOD NOTE

Training pairs were augmented by running the top 1M Nomograph names through the model in 20 randomly sampled languages each, with sampling weighted by language size. Each resulting pair was then scored with panphon's feature-based distance, and only outputs within an articulatory threshold were kept. The result is a dictionary mapping each language-grapheme combination to its plausible IPA realizations, a generalization signal for proper names that WikiPron alone lacked.
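
The sampling and filtering steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline. The page does not say whether languages were drawn with or without replacement, what each model output is compared against, or what the threshold is. The sketch therefore samples without replacement, compares outputs to a reference transcription with an arbitrary cutoff, and uses a length-normalized Levenshtein distance as a dependency-free stand-in for panphon's feature-based distance. All function names, language sizes, and transcriptions are hypothetical.

```python
import random
from typing import Callable


def sample_languages(sizes: dict[str, int], k: int, rng: random.Random) -> list[str]:
    """Weighted sampling without replacement (Efraimidis-Spirakis keys).

    Each language gets the key u ** (1 / weight) with u uniform in [0, 1);
    the k largest keys are selected. Weights must be positive.
    """
    keys = {lang: rng.random() ** (1.0 / size) for lang, size in sizes.items()}
    return sorted(keys, key=keys.get, reverse=True)[:k]


def normalized_levenshtein(a: str, b: str) -> float:
    """Stand-in distance: edit distance divided by the longer length (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1] / max(len(a), len(b))


def keep_plausible(
    reference_ipa: str,
    outputs: dict[str, str],
    threshold: float,
    distance: Callable[[str, str], float] = normalized_levenshtein,
) -> dict[str, str]:
    """Keep per-language outputs whose distance to the reference is within threshold."""
    return {
        lang: ipa
        for lang, ipa in outputs.items()
        if distance(reference_ipa, ipa) <= threshold
    }


# Hypothetical language sizes; a seeded generator keeps the draw reproducible.
sizes = {"en": 100, "de": 50, "ja": 40, "hr": 10}
print(sample_languages(sizes, k=2, rng=random.Random(0)))

# Illustrative transcriptions, not real model output.
outputs = {"en": "juːdʒiːn", "de": "ɔʏɡn̩", "xx": "kwɒtʃəmbl"}
kept = keep_plausible("juːdʒɪn", outputs, threshold=0.5)
print(sorted(kept))
```

With panphon installed, one of the feature-based methods of its `Distance` class (for example `feature_edit_distance`) could be passed as `distance`, with the threshold chosen on that method's scale. A feature-based distance treats near-identical sounds as close, which a character-level stand-in cannot do.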

RELATED MODELS

PolyIPA powers downstream tasks, including the soundalike search built on the companion models IPA2vec and similarIPA.