Comparing models for discrimination of infant cries for the early detection of developmental disorders
For babies, crying is the earliest way of expressing and communicating needs such as hunger, pain, discomfort or tiredness. Beyond that, the cry is an acoustic signal that carries information about the medical status of an infant. Much research has explored the acoustic properties of infant cries and the potential to identify differences in those properties between healthy and non-healthy cries, using computational models and algorithms as well as human listeners. However, previous research has not sufficiently examined whether human listeners can differentiate not only between healthy and non-healthy cries but also between different types of pathologies, nor has it compared the classification skills of computational models with those of human listeners.
The authors of the paper analyzed and compared the ability of human listeners and automatic classification models to rate the health state of infants from their crying. During the experiment, naïve listeners (students and parents) and expert listeners (nurses/midwives and therapists) were trained to discriminate by ear between the cries of healthy infants and the cries of infants with various pathologies: hearing impairment (HI), cleft-lip-and-palate (CLP), asphyxia (AS), laryngomalacia (LA) and brain damage (BD). After training, the listeners rated cries of infants with different health states, and their rating skills were compared with the classification skills of computational models.
Infant cry classification can generally be performed in two ways: computational classification of cries or auditory discrimination by human listeners. This article compares the two. In the experiment, a total of 120 participants were divided into four groups: naïve listeners (group 1), parents (group 2), nurses/midwives (group 3) and therapists (group 4).
Based on the following inclusion and exclusion criteria, these groups were chosen to capture listeners with varying experience in hearing infant cries:
- Naïve listeners: no experience in hearing infant crying
- Parents: frequent long-term contact with a limited, familiar group of healthy infants
- Nurses, midwives: frequent short-term contact with many healthy infants and rare contact with non-healthy infants
- Therapists: frequent long-term contact with many non-healthy infants
All participants were female, German, and had no hearing impairments.
All listeners were trained in hearing cries of healthy infants and cries of infants suffering from cleft-lip-and-palate, hearing impairment, laryngomalacia, asphyxia and brain damage. After training, a listening experiment was performed in which the listeners allocated 18 infant cries to the cry groups. All infant cry samples used in this study were taken from a dataset of infant cries that the authors created during their research on infant cry classification. The authors collected cry samples of 69 infants between 1 and 7 months of age. In total, 6 different infant groups were recorded: 31 infants were healthy, without any developmental disorders, 10 infants had a unilateral cleft-lip-and-palate (CLP), 19 infants were hearing impaired (HI, threshold of -60 dB hearing loss), 4 infants were suffering from laryngomalacia, 3 were asphyxiated infants and 2 infants had brain damage.
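The group sizes listed above are strongly unbalanced, which matters when reading any accuracy figure. The following minimal sketch tallies the reported counts and derives two trivial reference accuracies. It assumes one sample per infant, which the text does not state, and the variable names are illustrative only.

```python
from collections import Counter

# Infants per group, as reported in the text above.
group_sizes = Counter({
    "healthy": 31,
    "cleft-lip-and-palate": 10,
    "hearing impairment": 19,
    "laryngomalacia": 4,
    "asphyxia": 3,
    "brain damage": 2,
})

total = sum(group_sizes.values())

# Share of each group in the dataset.
shares = {name: count / total for name, count in group_sizes.items()}

# Accuracy of a trivial rule that always answers with the largest group
# (assumption: one sample per infant).
majority_group, majority_count = group_sizes.most_common(1)[0]
majority_baseline = majority_count / total

# Accuracy of always answering "non-healthy" in the two-class task.
non_healthy_baseline = (total - group_sizes["healthy"]) / total

for name, share in shares.items():
    print(f"{name:22s} {group_sizes[name]:3d}  {share:.1%}")
print(f"always '{majority_group}': {majority_baseline:.1%}")
print(f"always 'non-healthy': {non_healthy_baseline:.1%}")
```

Under that assumption, a rater or model is only informative if it does better than these trivial rules, so they are a useful yardstick for both human and computational results.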
The cries of the infants were recorded with a sampling rate of 48 kHz and 24-bit digital resolution on a Zoom H2n recorder. The Zoom H2n recorder features a built-in microphone. The microphone was held about 30 cm away from the infants' mouths. The infants lay in a supine position during the recording. Recordings were made in similar environments. One full episode of crying was recorded for each infant. Recordings started with the first cry of the infant (using the H2n's pre-recording function).
Recordings were stopped after a 15-second pause with no crying. Each recording lasted about 10 to 30 seconds.
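The stop rule described above can be sketched as a small function. This is only an illustration of the rule. The text does not say whether recordings were stopped manually or automatically, and the function name, its input format (sorted start and end times of cry activity, in seconds) and the example values are hypothetical.

```python
def find_stop_time(cry_intervals, pause_limit=15.0):
    """Return the time (in seconds) at which a recording would be stopped.

    cry_intervals: list of (start, end) times of cry activity, sorted by
    start time. This input format is an assumption made for illustration.
    pause_limit: length of the cry-free pause that ends the recording.
    Returns None when no crying was observed at all.
    """
    if not cry_intervals:
        return None

    last_end = cry_intervals[0][1]
    for start, end in cry_intervals[1:]:
        # A gap of at least pause_limit ends the episode before this cry.
        if start - last_end >= pause_limit:
            break
        last_end = max(last_end, end)

    # The recording stops once the pause has lasted pause_limit seconds.
    return last_end + pause_limit


# Hypothetical cry activity: two bursts, then a late third burst.
example = [(0.0, 4.0), (6.0, 12.5), (30.0, 33.0)]
stop_time = find_stop_time(example)
print(stop_time)
```

In this example the third burst begins 17.5 seconds after the second one ends, so it falls outside the recorded episode.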
Several supervised-learning classification models were used in the experiment, all built on the acoustic properties of the cries. The accuracy of the models was compared with the accuracy of the human listeners.
The study showed that, when the infant cry is used as a screening instrument, human hearing can give only first hints of an existing pathology. The listeners were not able to identify the various pathologies with high accuracy by hearing the infants' cries. However, they performed better when deciding whether a cry was healthy or not healthy.
The highest precision in rating infant cries was achieved by the computational supervised-learning models. These were able to separate healthy from non-healthy cries and to distinguish the various pathologies with higher accuracy. Supervised-learning classification models performed significantly better than the human listeners when categorising infant cries.
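The two rating tasks discussed here, healthy versus non-healthy and identification of the specific pathology, can be scored from the same set of answers. The sketch below shows how one set of ratings yields a different accuracy for each task. The labels are invented toy data and are not results from the study. The helper names are hypothetical.

```python
def accuracy(true_labels, rated_labels):
    """Fraction of ratings that match the true label."""
    if len(true_labels) != len(rated_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    hits = sum(t == r for t, r in zip(true_labels, rated_labels))
    return hits / len(true_labels)


def to_binary(label):
    """Collapse every pathology into a single 'non-healthy' class."""
    return "healthy" if label == "healthy" else "non-healthy"


# Invented toy data for illustration only (not taken from the study).
true_labels = ["healthy", "HI", "CLP", "healthy", "AS", "LA"]
rated_labels = ["healthy", "CLP", "CLP", "HI", "LA", "LA"]

# Task 1: name the exact group (healthy or a specific pathology).
multi_class_accuracy = accuracy(true_labels, rated_labels)

# Task 2: decide only between healthy and non-healthy.
binary_accuracy = accuracy(
    [to_binary(label) for label in true_labels],
    [to_binary(label) for label in rated_labels],
)

print(f"multi-class: {multi_class_accuracy:.2f}, binary: {binary_accuracy:.2f}")
```

A rating that confuses two pathologies counts as an error in the first task but as correct in the second, which is why binary accuracy can never be lower than multi-class accuracy for the same answers.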