MODEL 01 · MULTILINGUAL G2P + P2G
A ByT5-based seq2seq model trained to convert names between their written form (grapheme) and pronunciation (IPA), across more than 100 languages and writing systems. PolyIPA is the production P2G component inside the MondoPhone ensemble (Transformer + WFST) currently in development. The companion models IPA2vec and similarIPA power the soundalike search downstream.
Enter a name in any script. The model returns its IPA pronunciation in the source language or, when the language is ambiguous, the most likely reading.
Given an IPA transcription and a target language, the model returns three candidate spellings ranked by likelihood. Decoding uses beam search with 90 beams. According to the paper, considering the top 3 candidates instead of only the first reduces effective CER (character error rate) by 52.7%, to 0.026.
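The top-3 figure can be made concrete with a small sketch. It assumes that "effective CER" means the error of the best candidate among the top k, micro-averaged over all reference characters; the paper may define it differently. The function names and the toy candidate lists below are illustrative only and are not model outputs.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance with unit costs, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or match at cost 0
            ))
        prev = curr
    return prev[-1]


def top_k_cer(examples: List[Tuple[List[str], str]], k: int) -> float:
    """Micro-averaged CER, counting only the best of the first k candidates.

    `examples` holds (ranked_candidates, reference_spelling) pairs.
    Assumption: this best-of-k reading is what "effective CER" refers to.
    """
    edits = 0
    chars = 0
    for candidates, reference in examples:
        edits += min(edit_distance(c, reference) for c in candidates[:k])
        chars += len(reference)
    return edits / chars


# Hypothetical ranked candidates, invented for illustration.
examples = [
    (["Shmidt", "Schmidt", "Schmitt"], "Schmidt"),
    (["Müller", "Mueller", "Muller"], "Müller"),
    (["Maier", "Mayer", "Meier"], "Meyer"),
]

top1 = top_k_cer(examples, k=1)
top3 = top_k_cer(examples, k=3)
relative_reduction = 1 - top3 / top1
```

If the reported 52.7% is a relative reduction of this kind, the figures would imply a top-1 CER of roughly 0.026 / (1 - 0.527) ≈ 0.055; the paper should be consulted for the exact definition.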
METHOD NOTE
Training pairs were augmented by passing the top 1M Nomograph names through the model for 20 randomly sampled languages each, with sampling weighted by language size. Each resulting pair was then scored with the panphon feature-based distance, and only outputs within an articulatory threshold were kept. The result is a dictionary mapping each language-grapheme combination to its plausible IPA realizations, a generalization signal for proper names that WikiPron data alone lacked.
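The augmentation loop described above might look roughly like the following sketch. Several details are assumptions, since the note does not specify them: languages are sampled without replacement, each output is compared against a reference transcription of the same name, and a plain character edit distance stands in for the panphon score so the example runs without dependencies. The names `sample_languages`, `build_dictionary`, `transcribe`, and `reference` are hypothetical.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple


def sample_languages(sizes: Dict[str, int], k: int, rng: random.Random) -> List[str]:
    """Weighted sampling without replacement (Efraimidis-Spirakis keys).

    Each language gets the key u ** (1 / weight); the k largest keys win,
    so larger languages are chosen more often.
    """
    keyed = [(rng.random() ** (1.0 / w), lang) for lang, w in sizes.items() if w > 0]
    keyed.sort(reverse=True)
    return [lang for _, lang in keyed[:k]]


def char_distance(a: str, b: str) -> float:
    """Stand-in for a phonetic distance: plain Levenshtein over characters.

    With panphon installed, panphon.distance.Distance().feature_edit_distance
    could be passed instead; which panphon variant was actually used is not
    stated in the source.
    """
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return float(prev[-1])


def build_dictionary(
    names: Iterable[str],
    sizes: Dict[str, int],
    transcribe: Callable[[str, str], str],  # hypothetical: (name, language) -> IPA
    reference: Callable[[str], str],        # hypothetical: name -> reference IPA
    distance: Callable[[str, str], float],
    threshold: float,
    k: int = 20,
    seed: int = 0,
) -> Dict[Tuple[str, str], Set[str]]:
    """Map (language, grapheme) to the IPA outputs that pass the distance filter."""
    rng = random.Random(seed)
    table: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for name in names:
        ref_ipa = reference(name)
        for lang in sample_languages(sizes, k, rng):
            ipa = transcribe(name, lang)
            if distance(ipa, ref_ipa) <= threshold:
                table[(lang, name)].add(ipa)
    return dict(table)


# Toy stand-ins for the model and the reference data, invented for illustration.
toy_outputs = {
    ("Schmidt", "de"): "ʃmɪt",
    ("Schmidt", "fr"): "ʃmit",
    ("Schmidt", "es"): "esmid",
}
toy_sizes = {"de": 100, "fr": 80, "es": 60}

table = build_dictionary(
    names=["Schmidt"],
    sizes=toy_sizes,
    transcribe=lambda name, lang: toy_outputs[(name, lang)],
    reference=lambda name: "ʃmɪt",
    distance=char_distance,
    threshold=1.0,
    k=3,
)
```

In this toy run the German and French outputs fall within the threshold and are kept, while the Spanish one is discarded. Passing the distance function as a parameter keeps the filtering logic independent of whichever phonetic metric is plugged in.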
RELATED MODELS
IPA2vec phonetic embeddings build on PolyIPA's output to find soundalikes across languages.
Cross-script conversion uses PolyIPA as an intermediate phonetic representation when no direct corpus exists.
For ambiguous tokens (is "de la" a particle or part of a given name?), phonetic signal disambiguates.