Ajaya Neupane (University of California Riverside), Nitesh Saxena (University of Alabama at Birmingham), Leanne Hirshfield (Syracuse University), Sarah Elaine Bratt (Syracuse University)

A new generation of scams has emerged that uses voice impersonation to obtain sensitive information, eavesdrop over voice calls and extort money from unsuspecting human users. Research demonstrates that users are fallible to voice impersonation attacks that exploit the current advancement in speech synthesis. In this paper, we set out to elicit a deeper understanding of such human-centered “voice hacking” based on a neuro-scientific methodology (thereby corroborating and expanding the traditional behavioral-only approach in significant ways). Specifically, we investigate the *neural underpinnings* of voice security through *functional near-infrared spectroscopy* (fNIRS), a cutting-edge neuroimaging technique, that captures neural signals in both temporal and spatial domains. We design and conduct an fNIRS study to pursue a thorough investigation of users’ mental processing related to *speaker legitimacy detection* – whether a voice sample is rendered by a target speaker, a different other human speaker or a synthesizer mimicking the speaker. We analyze the neural activity associated within this task as well as the brain areas that may control such activity.

Our key insight is that there may be no statistically significant differences in the way the human brain processes the *legitimate speakers vs. synthesized speakers*, whereas clear differences are visible when encountering *legitimate vs. different other human speakers*. This finding may help to explain users’ susceptibility to synthesized attacks, as seen from the behavioral self-reported analysis. That is, the impersonated synthesized voices may seem *indistinguishable* from the real voices in terms of both behavioral and neural perspectives. In sharp contrast, prior studies showed *subconscious* neural differences in other real vs. fake artifacts (e.g., paintings and websites), despite users failing to note these differences behaviorally. Overall, our work dissects the fundamental neural patterns underlying voice-based insecurity and reveals users’ susceptibility to voice synthesis attacks at a biological level. We believe that this could be a significant insight for the security community suggesting that the human detection of voice synthesis attacks may not improve over time, especially given that voice synthesis techniques will likely continue to improve, calling for the design of careful machine-assisted techniques to help humans counter these attacks.

View More Papers

Privacy Attacks to the 4G and 5G Cellular Paging...

Syed Rafiul Hussain (Purdue University), Mitziu Echeverria (University of Iowa), Omar Chowdhury (University of Iowa), Ninghui Li (Purdue University), Elisa Bertino (Purdue University)

Read More

Measuring the Facebook Advertising Ecosystem

Athanasios Andreou (EURECOM), Márcio Silva (UFMG), Fabrício Benevenuto (UFMG), Oana Goga (Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG), Patrick Loiseau (Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LIG & MPI-SWS), Alan Mislove (Northeastern University)

Read More

REDQUEEN: Fuzzing with Input-to-State Correspondence

Cornelius Aschermann (Ruhr-Universität Bochum), Sergej Schumilo (Ruhr-Universität Bochum), Tim Blazytko (Ruhr-Universität Bochum), Robert Gawlik (Ruhr-Universität Bochum), Thorsten Holz (Ruhr-Universität Bochum)

Read More

MBeacon: Privacy-Preserving Beacons for DNA Methylation Data

Inken Hagestedt (CISPA Helmholtz Center for Information Security), Yang Zhang (CISPA Helmholtz Center for Information Security), Mathias Humbert (Swiss Data Science Center, ETH Zurich/EPFL), Pascal Berrang (CISPA Helmholtz Center for Information Security), Haixu Tang (Indiana University Bloomington), XiaoFeng Wang (Indiana University Bloomington), Michael Backes (CISPA Helmholtz Center for Information Security)

Read More