Bias and privacy in AI’s cough-based COVID-19 recognition

We read with interest the Comment by Coppock and colleagues, in which the authors express their thoughtful opinion about several simultaneous works by independent research groups worldwide (eg, Massachusetts Institute of Technology, National Research Council of Canada, University of Cambridge, and Swiss Federal Institute of Technology Lausanne). One of these works was our own: a pioneering, multicentre, international study with a clinically validated dataset of forced coughs alongside quantitative RT-PCR from participants who physically attended a test centre. Participant control was performed on site by health personnel at the partner health centres that contributed to this study.

The Comment focuses on human audio biometrics in general, albeit eight out of ten referenced works made use of coughs as their audio source. The use of cough sounds to detect respiratory system abnormalities had been investigated for at least 3 years before the COVID-19 pandemic. Cough analysis is a particular form of audio biometrics, and other forms of audio biometrics cannot be discussed interchangeably.

Although person authentication via speech has been achieved to some degree, the evidence that an individual can be readily recognised in a large database solely by the sound of their cough remains inconclusive. The same applies to inferring emotional traits.
The prospect of re-identifying patients from cough sounds raises several concerns. Will participants whose data are used in developing these pre-screening systems be able to benefit from them in the future in an unbiased manner? Additionally, how can personal biometric data be made public while ensuring that they always remain non-identifiable? Given the current health context, official calls have been made not to trivialise the privacy and protection of patient data.



Subsequent research initiatives from public bodies must now help to enable inclusive research clusters and secure collaborative infrastructures in the domain of audio biometrics.

Our training, development (validation), and holdout (test) sets do not contain data from the same participant (ie, they are participant-independent), which avoids spurious discriminative patterns that could compromise classification scores. However, a general assumption is that a competent biometrics classifier should maximise the distinction between divergent patterns within participants, while minimising the distinction between different participants within the same class. The inclusion of divergent observations (negative and positive) from the same participant, in the training set only, could help to satisfy this assumption; however, this effect requires further study.
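
As an illustration only, a participant-independent partition of the kind described above could be sketched as follows. This is a minimal sketch, not the pipeline used in our study: the function name `participant_independent_split`, the `(participant_id, recording)` record layout, and the 70/15/15 proportions are all assumptions made for the example.

```python
import random
from collections import defaultdict


def participant_independent_split(records, fractions=(0.7, 0.15, 0.15), seed=0):
    """Partition recordings so that no participant appears in more than one set.

    records: list of (participant_id, recording) pairs (hypothetical layout).
    fractions: assumed share of participants for train, dev, and test.
    """
    # Group every recording under the participant who produced it.
    by_participant = defaultdict(list)
    for participant_id, recording in records:
        by_participant[participant_id].append(recording)

    # Shuffle participants, not recordings, so the split unit is the person.
    participants = sorted(by_participant)
    random.Random(seed).shuffle(participants)

    n_train = int(fractions[0] * len(participants))
    n_dev = int(fractions[1] * len(participants))
    groups = {
        "train": participants[:n_train],
        "dev": participants[n_train:n_train + n_dev],
        "test": participants[n_train + n_dev:],  # remainder goes to holdout
    }

    # Expand each group of participants back into its recordings.
    return {
        name: [(pid, rec) for pid in ids for rec in by_participant[pid]]
        for name, ids in groups.items()
    }


# Hypothetical example: 20 participants with 3 cough recordings each.
records = [(f"p{i:02d}", f"cough_{i:02d}_{k}.wav") for i in range(20) for k in range(3)]
splits = participant_independent_split(records)
```

Splitting at the level of participants, and only afterwards expanding to recordings, is what keeps every recording from a given person inside a single set.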

To conclude, there is hope that rapid, point-of-need pre-screening for COVID-19 via forced cough sounds captured from smartphones or portable devices could be feasible in the short term. Online interventional studies are now necessary to explore the potential of this novel technology and to evaluate its real-life performance, health impact, end-user utility, and acceptability.