Thursday, 28 Mar 2024

Voice Is the Next Big Platform, Unless You Have an Accent | WIRED

That’s why, very often, voice recognition technology reacts to accents differently than humans do, says Anne Wootton, co-founder and CEO of the Oakland-based audio search platform Pop Up Archive. “Oftentimes the software does a better job with like, Indian accents than deep Southern, like Shenandoah Valley accents,” she says. “I think that’s a reflection of what the training data includes or does not include.”

Rachael Tatman, a PhD candidate at the University of Washington’s Department of Linguistics who focuses on sociolinguistics, noted that the underrepresented groups in these data sets tend to be groups that are marginalized in general. A typical database of American voices, for example, would lack poor, uneducated, rural, non-white, non-native English voices. “The more of those categories you fall into, the worse speech recognition is for you,” she says.

Still, Jeffrey Kofman, the CEO and co-founder of Trint, another automated speech-to-text service based in the UK, is confident that accent recognition is something speech science will eventually be able to solve. We video chatted on the Trint platform itself, where Australian English is now available alongside British and North American English as transcription accents. Trint also offers speech-to-text in a dozen European languages, and plans to add South Asian English sometime this year, he said.

Collecting data is expensive and cumbersome, which is why certain key demographics take priority. For Kofman, that’s South Asian accents, “because there are so many people from India, Pakistan, and those countries here in England, in the US and Canada, who speak very clearly but with a distinct accent,” he says. Next, he suspects, he’ll prioritize South African accents.

Obviously, it’s not just technology that discriminates against people with accents. It’s also other people. Mass media and globalization are having a huge effect on how people sound. Speech experts have documented the decline of certain regional American accents since as early as 1960, for example, in favor of a more homogeneous accent suited to populations from mixed geographic areas. This effect is exacerbated when humans deal with digital assistants or operators; they tend to use a voice devoid of colloquialisms and natural cadence.

Or, in other words, a voice devoid of an identity and accent.

As voice recognition technology improves, the habit of using a robotic accent to communicate with a device may fade: if people feel less need to talk to their devices as if they are machines, they can start talking to them as naturally as they would to a friend. And while some accent reduction coaches find their clients use voice assistants to practice neutralizing their thick foreign or regional accents, Lisa Wentz, a public speaking coach in San Francisco who works in accent reduction, says that she doesn’t recommend it.

That’s because, she tells me, most of her clients want other people to understand them. They don’t want to have to repeat themselves or feel like their accents prevent others from hearing them. Using devices that aren’t ready for different voices, then, only reinforces that feeling.

My mother and I set up her Alexa app together. She wasn’t very excited about it. I could already imagine her distrust and fear of a car purported to drive by the command of her voice. My mother would never ride in it; the risk of crashing would be too real. Still, she tried out a couple of questions on the Echo.

“Alexa, play ‘Que sera sera,’” my mother said.

“I can’t find the song ‘Kiss your ass era.’”

My mom laughed, less out of frustration and more out of amusement. She tried again, this time speaking more slowly, as if she were talking to a child. “Alexa, play ‘Que sera sera.’” She sang out the syllables of sera in a slight melody, so that the device could clearly hear “se-rah.”

Alexa understood, and found what my mom was looking for. “Here’s a sample of ‘Que sera sera,’ by Doris Day,” she said, pronouncing the sera a bit harsher — “se-raw.”

The 1956 hit started to play, and my mother smiled at the pleasure of recognition.
