File formats & software

A key goal of CorMS is to make data as accessible as possible across different platforms and users. Accordingly, all files associated with this site use open, non-proprietary formats. Audio files were recorded, and are presented here, as .wav files; transcript files are given in ELAN and Praat formats, both of which are readable as plain text files. ReadMe and metadata files are also saved in formats that may be opened as text files (.txt or .csv).

Time-aligned, orthographic transcripts delimiting breath units are provided. No other annotations are provided here (e.g., word- or phone-level segmentations, phonetic transcriptions, or morphosyntactic parses). Any additional segmenting or coding along these lines necessarily involves additional theoretical assumptions or methodological decisions which may or may not be shared by other researchers, and approaches to data processing are constantly evolving. We at CorMS suspect that many linguists will want to “roll their own” analysis using whatever methods of data processing and analysis are considered best practice at the time, or that allow them maximal comparability with other studies they may be carrying out.


Our interview recordings were transcribed using ELAN, free audio and video annotation software provided by The Language Archive at the Max Planck Institute for Psycholinguistics. Our corpus transcripts are stored as .eaf (ELAN annotation format) files containing annotations which are time-linked to the accompanying .wav audio files. The .eaf files may be viewed and manipulated in various ways in ELAN itself, or exported to a number of different formats, but they are essentially marked-up (XML) text files, which means that they can be opened in any text editor and manipulated by any number of programs or scripts.
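As an illustration of the point above, here is a minimal sketch of reading time-aligned annotations out of .eaf markup with Python's standard library alone. The element and attribute names (TIME_SLOT, TIER, ALIGNABLE_ANNOTATION, ANNOTATION_VALUE) are those of the EAF format; the tier name "speaker1", the sample text, and the function name read_eaf_annotations are invented for the example and are not taken from the CorMS files. Time values in .eaf files are in milliseconds.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up EAF fragment (hypothetical tier name and text),
# trimmed to the elements needed for time-aligned annotations.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="3200"/>
  </TIME_ORDER>
  <TIER TIER_ID="speaker1">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
          TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>hello there</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2"
          TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>how are you</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_eaf_annotations(root):
    """Return (tier, start_ms, end_ms, text) tuples from a parsed EAF tree."""
    # Map each time slot ID to its time in milliseconds.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    rows = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((tier.get("TIER_ID"), start, end, text))
    return rows


# For a file on disk, ET.parse("some_file.eaf").getroot() would take
# the place of ET.fromstring (the file name here is a placeholder).
for row in read_eaf_annotations(ET.fromstring(SAMPLE_EAF)):
    print(row)
```

The same loop could write its rows to a .csv file or feed them to another tool; nothing here depends on ELAN being installed.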


Praat is free software commonly used for speech analysis. Our ELAN transcripts have also been exported as Praat .TextGrid files, for easy opening and manipulation alongside the audio .wav files in Praat. Praat TextGrids are also just fancy text files which can be opened in and manipulated by any program that deals with text.
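To make the "fancy text file" point concrete, the following is a rough sketch of pulling intervals out of a TextGrid with a regular expression. It assumes Praat's long ("full") text format and works on a string already read into memory; the sample content, the tier name, and the function name read_textgrid_intervals are hypothetical. Times in TextGrids are in seconds, and Praat writes a double quote inside a label as two double quotes. Real files may be saved in UTF-8 or UTF-16, so the encoding should be checked when opening them.

```python
import re

# A small, made-up TextGrid in Praat's long text format.
SAMPLE_TEXTGRID = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 3.2
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "speaker1"
        xmin = 0
        xmax = 3.2
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 1.5
            text = "hello there"
        intervals [2]:
            xmin = 1.5
            xmax = 3.2
            text = ""
'''

# One interval: its header line, then xmin, xmax and a quoted label.
# Inside the label, "" stands for a literal double quote.
INTERVAL_RE = re.compile(
    r'intervals \[\d+\]:\s*'
    r'xmin = ([0-9.]+)\s*'
    r'xmax = ([0-9.]+)\s*'
    r'text = "((?:[^"]|"")*)"'
)


def read_textgrid_intervals(text):
    """Return (start_s, end_s, label) tuples for every interval found."""
    return [
        (float(xmin), float(xmax), label.replace('""', '"'))
        for xmin, xmax, label in INTERVAL_RE.findall(text)
    ]


for interval in read_textgrid_intervals(SAMPLE_TEXTGRID):
    print(interval)
```

This sketch ignores tier boundaries and point tiers, which is enough for a quick look at a single-tier transcript; a dedicated TextGrid library may be a better choice for anything more involved.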