MODEL 01 · MULTILINGUAL G2P + P2G

PolyIPA

      
Powered by MondoPhone · P2G shipping today

A ByT5-based seq2seq model trained to convert names between their written form (graphemes) and their pronunciation (IPA) in more than 100 languages and writing systems. PolyIPA is the production P2G component of the MondoPhone ensemble (Transformer + WFST), which is still in development. The companion models IPA2vec and similarIPA power the downstream soundalike search.

Paper (arXiv 2412.09102) ↗ Hugging Face ↗

Try it — grapheme → IPA

Enter a name in any script. The model returns its IPA pronunciation in the source language or, when the language is ambiguous, the most likely reading.

Enter a name above and press Convert.

Try it — IPA → graphemes (top-3 with beam search)

Given an IPA transcription and a target language, the model returns three candidate spellings ranked by likelihood. Decoding uses beam search with 90 beams; according to the paper, evaluating against the top-3 candidates reduces the effective character error rate (CER) by 52.7%, to 0.026.
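The relationship between the single-best and top-3 figures can be illustrated with a short sketch. It assumes that "effective CER" means the lowest character error rate among the first k candidates (an oracle over the beam output); the paper's exact definition may differ. The function names and the candidate spellings are illustrative only.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edits divided by reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def top_k_cer(ref: str, candidates: Sequence[str], k: int = 3) -> float:
    """Assumed definition: best CER among the first k ranked candidates."""
    return min(cer(ref, c) for c in candidates[:k])


reference = "Eugene"
candidates = ["Eugen", "Eugene", "Yujin"]  # hypothetical beam output
print(cer(reference, candidates[0]))          # top-1 only
print(top_k_cer(reference, candidates, k=3))  # best of the top-3
print(round(0.055 * (1 - 0.527), 3))          # reported mean CER after a 52.7% reduction
```

In this toy case the first candidate is one edit away from the reference (CER 1/6), while the top-3 list contains the exact spelling (CER 0). The last line checks the arithmetic: a 52.7% reduction from the reported mean CER of 0.055 gives roughly 0.026.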

Enter an IPA string (in /slashes/), pick a target language, then press Generate.

Model card

Architecture: ByT5-small, fine-tuned
Parameters: 300M
Training data: WikiPron (1.7M, 165 langs) + Nomograph augmentations (1M names × 20 langs)
Evaluation: Mean CER 0.055 · char-BLEU 0.914
Top-3 (beam 90): CER 0.026 (−52.7%)
Inference: ~40 ms p50, 80 ms p95 on A10G
License: Apache 2.0, weights on Hugging Face
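
Because the architecture is ByT5, the model reads and writes UTF-8 bytes rather than a learned subword vocabulary, which is what lets a single checkpoint handle Latin, Cyrillic, IPA symbols, and other scripts. The sketch below mimics that mapping in plain Python, assuming the standard ByT5 convention of three reserved ids (pad, end-of-sequence, unknown) followed by the 256 byte values. It is an illustration rather than the model's actual preprocessing code, and the helper names are hypothetical.

```python
PAD_ID, EOS_ID, UNK_ID = 0, 1, 2  # reserved ids in the assumed ByT5 layout
BYTE_OFFSET = 3                   # byte value b is assumed to map to id b + 3


def encode(text: str) -> list[int]:
    """Map text to byte-level ids and append the end-of-sequence id."""
    return [b + BYTE_OFFSET for b in text.encode("utf-8")] + [EOS_ID]


def decode(ids: list[int]) -> str:
    """Drop reserved and out-of-range ids, then rebuild the UTF-8 string."""
    raw = bytes(i - BYTE_OFFSET for i in ids
                if BYTE_OFFSET <= i < BYTE_OFFSET + 256)
    return raw.decode("utf-8", errors="ignore")


for sample in ["Eugen", "Анна", "ˈɔʏɡn̩"]:
    ids = encode(sample)
    # Non-ASCII characters take several bytes, so sequences grow with script.
    print(sample, len(sample), "chars ->", len(ids), "ids")
```

One consequence of this design is that input length depends on the script: each Cyrillic or IPA character costs two or more ids, so non-Latin names produce longer sequences than their character count suggests.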

API

# Grapheme → IPA
curl -X POST https://api.mondonomo.ai/v1/polyipa/g2p \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "Eugen", "lang": "de"}'

# → {"ipa": "/ˈɔʏɡn̩/", "candidates": [...]}

# IPA → grapheme (top-3)
curl -X POST https://api.mondonomo.ai/v1/polyipa/p2g \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"ipa": "/ˈjuːdʒɪn/", "lang": "en", "k": 3}'
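
The same two calls can be made from Python with the standard library alone. The sketch below reuses the endpoints and JSON fields from the curl examples; the helper names, the explicit Content-Type header, and the placeholder token are assumptions, and the p2g response shape is not documented here, so the code returns the parsed JSON without interpreting it.

```python
import json
import urllib.request

BASE_URL = "https://api.mondonomo.ai/v1/polyipa"


def build_request(endpoint: str, payload: dict, token: str) -> urllib.request.Request:
    """Prepare a POST request for the g2p or p2g endpoint without sending it."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumed; the body is JSON
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send a prepared request and parse the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# "YOUR_TOKEN" is a placeholder; substitute a real API token before sending.
g2p_request = build_request("g2p", {"name": "Eugen", "lang": "de"}, token="YOUR_TOKEN")
p2g_request = build_request(
    "p2g", {"ipa": "/ˈjuːdʒɪn/", "lang": "en", "k": 3}, token="YOUR_TOKEN"
)
# result = send(g2p_request)  # uncomment once a valid token is in place
```

Separating request construction from sending keeps the payload easy to inspect and log before any network traffic happens.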

METHOD NOTE

The training set was augmented by running the top 1M Nomograph names through the model, each in 20 randomly sampled languages, with sampling weighted by language size. Each resulting pair was then scored with the panphon feature-based distance, and only outputs within an articulatory threshold were kept. The result is a dictionary mapping each language-grapheme combination to its plausible IPA realizations, which gives the model a generalization signal for proper names that WikiPron alone lacked.
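
The two steps described above, size-weighted language sampling and distance-based filtering, can be sketched as follows. This is a minimal illustration under several assumptions: the language sizes are invented, the distance is measured between a generated transcription and some reference transcription, and a plain string-similarity score from `difflib` stands in for the panphon feature-based distance, which would be passed in as the `distance` argument in a real pipeline. The threshold value is arbitrary.

```python
import difflib
import random
from typing import Callable


def sample_languages(sizes: dict[str, int], k: int, rng: random.Random) -> list[str]:
    """Draw up to k distinct languages, each draw weighted by language size."""
    pool = dict(sizes)
    chosen = []
    for _ in range(min(k, len(pool))):
        langs = list(pool)
        pick = rng.choices(langs, weights=[pool[lang] for lang in langs], k=1)[0]
        chosen.append(pick)
        del pool[pick]  # sample without replacement
    return chosen


def stand_in_distance(a: str, b: str) -> float:
    """Placeholder distance in [0, 1]; NOT the panphon feature distance."""
    return 1.0 - difflib.SequenceMatcher(None, a, b).ratio()


def keep_close(reference_ipa: str, generated: list[str], threshold: float,
               distance: Callable[[str, str], float] = stand_in_distance) -> list[str]:
    """Keep only generated transcriptions within the distance threshold."""
    return [ipa for ipa in generated if distance(reference_ipa, ipa) <= threshold]


# Hypothetical corpus sizes per language code.
sizes = {"en": 100_000, "de": 60_000, "hr": 8_000, "fi": 5_000}
print(sample_languages(sizes, k=3, rng=random.Random(0)))
print(keep_close("juːdʒiːn", ["juːdʒiːn", "juːdʒɪn", "xyz"], threshold=0.3))
```

With the stand-in distance, the exact match and the near match survive the filter while the unrelated string is dropped. A feature-based distance would make that decision on articulatory similarity rather than on shared characters.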

RELATED MODELS

PolyIPA's output feeds the following downstream models.

MODEL 06

Soundalike Search

IPA2vec phonetic embeddings ride on PolyIPA's output to find soundalikes across languages.

echoes→

MODEL 02

Transliterator

Cross-script conversion uses PolyIPA as an intermediate phonetic representation when no direct corpus exists.

scripts→

MODEL 04

Name Parser

For ambiguous tokens (is "de la" a particle or part of a given name?), the phonetic signal helps disambiguate.

structured-extraction→