Index Phonemica (IPHON) is a database of phoneme inventories and allophonic rules extracted from source documents, focusing primarily on linguistic areas underrepresented in existing databases such as PHOIBLE, WALS, and PBase. As a consequence of this focus, the Index does not reuse data from existing databases.

The current version, v0.5.0, contains 681 entries, representing data from 455 languages. These entries contain a total of 1336 distinct segments, each of which is mapped to a set of features. (This mapping currently uses a modified version of PHOIBLE's featuralization code and feature set, but this will change before v1.0.)

Using the Index

The Index can be browsed by language, doculect, or segment view, or searched with Pshrimp.

The Index makes no attempt at representative sampling of languages, and thus should not be used to establish overall statistical patterns.

Languages and doculects

The distinction between 'language' and 'doculect' is largely borrowed from PHOIBLE; however, the Index uses Glottocodes rather than ISO codes to uniquely identify languages. In the Index, a language is defined simply as a language-level Glottocode, although dialect-level Glottocodes are assigned in the dialect_name field when available. Language metadata, such as language family, latitude, and longitude, are imported from Glottolog.

A doculect in the Index corresponds to a single source. As a result, there may be many different doculect entries for one language. Doculects are uniquely identified by IPHON IDs, which consist of the Glottocode of the associated language and a chronologically incremented index, separated by a hyphen. (Chronological incrementing allows for unambiguous reference to specific entries in source documents that contain multiple inventories for the same language.)
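The ID scheme described above can be sketched in Python. This is an illustrative sketch, not part of the Index's own tooling: the names parse_iphon_id, group_by_language, and DoculectId are hypothetical, the example IDs are made up, and the pattern assumes that a Glottocode is four lowercase alphanumeric characters followed by four digits.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple

# Assumption: a Glottocode is four lowercase alphanumerics plus four digits,
# and an IPHON ID is "<glottocode>-<index>" with a non-negative integer index.
IPHON_ID = re.compile(r"^([a-z0-9]{4}[0-9]{4})-([0-9]+)$")


class DoculectId(NamedTuple):
    glottocode: str  # language-level Glottocode of the associated language
    index: int       # chronologically incremented index within that language


def parse_iphon_id(text: str) -> DoculectId:
    """Split an IPHON ID into its Glottocode and its numeric index."""
    match = IPHON_ID.match(text)
    if match is None:
        raise ValueError(f"not a well-formed IPHON ID: {text!r}")
    return DoculectId(match.group(1), int(match.group(2)))


def group_by_language(ids: Iterable[str]) -> Dict[str, List[int]]:
    """Map each Glottocode to the sorted indices of its doculects."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for raw in ids:
        parsed = parse_iphon_id(raw)
        grouped[parsed.glottocode].append(parsed.index)
    return {code: sorted(indices) for code, indices in grouped.items()}


if __name__ == "__main__":
    # Hypothetical IDs: two doculects for one language, one for another.
    example_ids = ["abcd1234-2", "abcd1234-1", "wxyz5678-1"]
    print(group_by_language(example_ids))
```

Grouping by the Glottocode part reflects the point that one language may have many doculect entries, while the index part keeps each entry distinct.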

Segmental transcription

The Index uses a slightly modified version of IPA. The differences are as follows:

Using the data

The data repository is here. To report an error in the data, file an issue.

Release: v0.5.0 (2020-05-30)

Languages: 455

Doculects: 681

Segments: 1336

Allophonic rules: 5825