MODEL 06 · IPA2VEC · PHONETIC EMBEDDINGS

Soundalike Search

Powered by MondoPhone (IPA2vec embeddings on phonetic output)

Find names that sound like yours across world languages. The engine behind echoes.mondonomo.ai uses IPA2vec, articulatory-feature embeddings computed on PolyIPA's phonetic output, to cluster names across scripts by phonetic similarity. The same primitive powers KYC entity resolution: Yousef, Yusuf and يوسف are matched as renderings of the same name, so records carrying them can be flagged as possibly the same person.


Try it — name → soundalikes across languages

Enter a name. The model returns the closest phonetic matches in 6+ languages, ranked by IPA2vec cosine similarity.


Model card

Architecture: IPA2vec (fastText on character n-grams of IPA strings, articulatory-feature regularized)
Helper model: similarIPA (handles phonetic notation variations, broad vs narrow transcription)
Embedding dim: 128
Distance: Cosine, calibrated to panphon articulatory-feature distance
Corpus: 1M Nomograph names × 20 sampled languages → ~20M (name, IPA) pairs
Latency: ~12 ms for top-50 (ANN-indexed)
Use cases: KYC matching · echoes search · fuzzy dedup · name discovery
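
To make the Architecture and Distance entries concrete, here is a minimal Python sketch of the two ingredients they name: fastText-style character n-grams over an IPA string, and cosine similarity between the resulting vectors. It is an illustration under stated assumptions, not the IPA2vec implementation: it compares raw n-gram counts instead of learned 128-dim embeddings, applies no articulatory-feature regularization or panphon calibration, and the function names (ipa_ngrams, cosine) and the 2-3 character n-gram range are hypothetical.

```python
import math
from collections import Counter

# Delimiters and suprasegmental marks ignored in this toy example (an assumption).
IGNORED = "/ˈˌ."


def ipa_ngrams(ipa, n_min=2, n_max=3):
    """Count character n-grams of an IPA string, with fastText-style < > boundaries."""
    cleaned = "".join(ch for ch in ipa if ch not in IGNORED)
    padded = f"<{cleaned}>"
    return Counter(
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    )


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# IPA strings taken from the API example on this page.
query = ipa_ngrams("/ˈjuː.suf/")                    # Yusuf
sim_fr = cosine(query, ipa_ngrams("/jusɛf/"))       # Youssef (fr)
sim_en = cosine(query, ipa_ngrams("/ˈdʒoʊzəf/"))    # Joseph (en)
print(sim_fr > sim_en)  # True
```

Even this crude count-based version places Youssef closer to Yusuf than Joseph, which matches the direction of the scores in the API example. The absolute values differ from the model's, since learned embeddings capture similarity between distinct but related sounds that raw n-gram overlap cannot.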

API

Request:

curl -X POST https://api.mondonomo.ai/v1/soundalikes \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "name": "Yusuf",
    "languages": ["en","fr","de","ru","ja","th","zh","hi"],
    "k": 10
  }'

Response:

{
  "query": "Yusuf",
  "query_ipa": "/ˈjuː.suf/",
  "matches": [
    {"name": "Joseph", "lang": "en", "ipa": "/ˈdʒoʊzəf/", "sim": 0.73},
    {"name": "Youssef", "lang": "fr", "ipa": "/jusɛf/", "sim": 0.91},
    ...
  ]
}

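
The same call can be sketched in Python with only the standard library. The endpoint, bearer-token header, request fields (name, languages, k) and response fields (matches, sim) come from the example above; the Content-Type header, the 10-second timeout, and the helper names build_request, top_matches and find_soundalikes are assumptions made for illustration, not a documented client.

```python
import json
import urllib.request

API_URL = "https://api.mondonomo.ai/v1/soundalikes"


def build_request(name, languages, k, token):
    """Build the POST request shown in the curl example."""
    payload = json.dumps({"name": name, "languages": languages, "k": k})
    return urllib.request.Request(
        API_URL,
        data=payload.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            # Assumption: the curl example does not set this header explicitly.
            "Content-Type": "application/json",
        },
    )


def top_matches(response_body, min_sim=0.0):
    """Parse a response body and return matches sorted by similarity, best first."""
    data = json.loads(response_body)
    matches = [m for m in data["matches"] if m["sim"] >= min_sim]
    return sorted(matches, key=lambda m: m["sim"], reverse=True)


def find_soundalikes(name, languages, k, token):
    """Send the request and return sorted matches (performs a network call)."""
    request = build_request(name, languages, k, token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return top_matches(response.read().decode("utf-8"))
```

Sorting on the client is a convenience for callers who want a strict best-first list or a similarity cutoff (for example, min_sim=0.8 for conservative KYC matching); the threshold value is illustrative and would need tuning against real data.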
How it relates to PolyIPA. Soundalike Search sits one layer above PolyIPA in the stack: Name → PolyIPA → IPA → IPA2vec → 128-dim vector → ANN search → ranked candidates. Because the embedding is trained on phonetic output with articulatory-feature regularization (not on raw orthographic characters), it generalizes across writing systems that share no surface form: Joseph, يوسف and Йосиф all cluster together.
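
The stack described above can be sketched as a small Python function that composes the stages. This is a sketch under assumptions: to_ipa and embed are hypothetical callables standing in for PolyIPA and IPA2vec (neither interface is specified here), the index is a plain dictionary of precomputed vectors, and an exhaustive cosine scan stands in for the ANN index.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def soundalikes(
    name: str,
    to_ipa: Callable[[str], str],      # stand-in for PolyIPA: name -> IPA
    embed: Callable[[str], Vector],    # stand-in for IPA2vec: IPA -> vector
    index: Dict[str, Vector],          # candidate name -> precomputed vector
    k: int = 10,
) -> List[Tuple[str, float]]:
    """Name -> IPA -> vector -> search -> ranked (candidate, similarity) pairs."""
    query_vec = embed(to_ipa(name))
    scored = ((cosine(query_vec, vec), cand) for cand, vec in index.items())
    # Exhaustive scan; a production system would query an ANN index instead.
    best = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    return [(cand, score) for score, cand in best]
```

Passing the two models in as callables keeps the search step independent of script: candidates written in Latin, Arabic or Cyrillic are all compared in the same vector space once they have been transcribed and embedded.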
