Releases

childes-db will be updated every year with the most recent parse of the CHILDES database to reflect new corpora and revisions to existing annotations (more details can be found in the documentation). These releases may also include new features (e.g., 2020.1 introduces phonological transcriptions for datasets in PhonBank). Visualizations and the childesr package will always use the most recent database version by default. The R API childesr and the Python API childespy can be directed to use any recent database version by using the version parameter.

2018.1

Initial release.

2019.1

Re-parsed to reflect 2019 changes in CHILDES. Note that this release excludes key datasets such as Providence, which were moved to PhonBank.

2020.1

Re-parsed to reflect 2020 changes in CHILDES, as well as the 2020 version of PhonBank.

2021.1

Re-parsed to reflect 2021 changes in CHILDES and PhonBank, using a new version of the corpus processing code and a better set of tests.

Using childes-db Locally

For intensive use cases, e.g. repeatedly transferring more than 5 GB of data, users may wish to download one or more yearly releases of the database for installation on a local MySQL server (either on their own machine or on a machine on their local network). The release databases can be downloaded using the mysqldump command:

mysqldump -v -u $USER_FROM_JSON -p$PASSWORD_FROM_JSON -h $HOST_FROM_JSON --single-transaction --no-tablespaces -C --quick --databases $DATABASES | mysql -u $LOCAL_USER -p$LOCAL_PASSWORD

Depending on your mysqlclient version, you might have to add the --column-statistics=0 option.

The first part of this command (mysqldump) outputs the content of the database as a text stream of SQL statements. The second part (mysql) reads that stream into the end user's local MySQL server. Each yearly release is around 40 GB in size. We leave it as an exercise to the reader to replace the variables above (such as $HOST_FROM_JSON) with the correct values from childes-db.json, the JSON file that the R and Python APIs use to coordinate and authorize MySQL access. The correspondence between variables is as follows:

$HOST_FROM_JSON        "host" field in JSON
$USER_FROM_JSON        "user" field in JSON
$PASSWORD_FROM_JSON    "password" in JSON
$DATABASES             {2020.1, 2019.1, 2018.1}
$LOCAL_USER            Local MySQL user (possibly root)
$LOCAL_PASSWORD        Local MySQL password for user
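
The substitution described above can be sketched in Python. This is a minimal illustration, not part of childesr or childespy: the helper name build_transfer_commands is hypothetical, the host, user, and password values are placeholders, and it assumes only that childes-db.json exposes top-level "host", "user", and "password" fields as listed in the table. It builds the two argument lists of the pipeline without running them.

```python
import json
import shlex


def build_transfer_commands(remote, databases, local_user, local_password,
                            column_statistics_off=False):
    """Return (dump_argv, load_argv) mirroring the mysqldump | mysql pipeline.

    `remote` is a dict with "host", "user" and "password" keys, assumed to
    come from childes-db.json. This helper is hypothetical.
    """
    dump = [
        "mysqldump", "-v",
        "-u", remote["user"],
        "-p" + remote["password"],   # no space between -p and the password
        "-h", remote["host"],
        "--single-transaction", "--no-tablespaces", "-C", "--quick",
    ]
    if column_statistics_off:
        # Needed with some client versions, per the note above.
        dump.append("--column-statistics=0")
    # Database names go last, after all options.
    dump += ["--databases", *databases]
    load = ["mysql", "-u", local_user, "-p" + local_password]
    return dump, load


# Placeholder values; in practice, load the real file with
# json.load(open("childes-db.json")) instead of this literal.
remote = json.loads(
    '{"host": "db.example.org", "user": "reader", "password": "secret"}'
)
dump, load = build_transfer_commands(remote, ["2020.1"], "root", "localpw")

# Print a shell-quoted version of the full pipeline for inspection.
print(shlex.join(dump) + " | " + shlex.join(load))
```

Printing the command before running it makes it easy to confirm that every variable was replaced. Note that passwords passed on the command line may be visible to other users on the same machine.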


Once you have a local MySQL installation, refer to the documentation for childesr or childespy regarding how to use a local database server. For most use cases, using the API with the default remote server (hosted on Amazon EC2) should be sufficient.