AI-Detection Products May Harm Non-Native English Speakers

As AI continues its advance into more aspects of HR and talent acquisition, more questions are being raised about the technology’s unintended consequences.

When it comes to screening, for example, AI can automate the process of sifting through resumes, assessing candidates’ qualifications, and arranging – even directing – interviews. As a result, proponents say, recruiters and other TA professionals can devote more time to complex, strategic tasks.

But now research from Stanford University reveals that at least some software solutions used to uncover text generated by AI can discriminate against non-native English speakers.

According to the Guardian, tests of seven software products used to detect AI-generated text found that they can discriminate against people whose first language is not English. The researchers said that the “99%” accuracy rate touted by some detection products is “misleading at best.”

Inflated Expectations

A team led by Stanford Assistant Professor James Zou ran 91 essays written by non-native English speakers for the TOEFL – the Test of English as a Foreign Language – through the seven detection products and evaluated the results. More than half of the essays were flagged as AI-generated. By contrast, more than 90% of essays written by native-speaking American eighth graders were correctly identified as human-generated.

The detectors do this by measuring “text perplexity,” a metric of how well a language model predicts the next word in a sequence. Texts with higher perplexity are more likely to be judged human-written, while lower perplexity points to a machine author. As the Guardian puts it, these programs measure “how ‘surprised’ or ‘confused’ a generative language model is when trying to predict the next word in a sentence. If the model can predict the next word easily, the text perplexity is ranked low, but if the next word proves hard to predict, the text perplexity is rated high.”
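For the technically curious, the core computation is straightforward. Below is a minimal sketch of scoring a passage’s perplexity with an open-source language model via the Hugging Face transformers library; the GPT-2 model and the perplexity helper are illustrative choices, not the proprietary detectors the Stanford team tested, which layer additional logic on top of this signal.

```python
# Illustrative sketch: scoring text perplexity with GPT-2.
# The model choice and helper name are assumptions for demonstration;
# commercial AI detectors use their own (proprietary) models and thresholds.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Exponential of the model's average next-token prediction loss."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing the input ids as labels makes the model return the
        # average cross-entropy of predicting each token from its context.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    # Low perplexity means the model found the text easy to predict,
    # which is exactly what these detectors treat as a sign of AI authorship.
    return torch.exp(loss).item()

# Plain, formulaic phrasing tends to score lower than idiosyncratic prose.
print(perplexity("Thank you for your consideration of my application."))
print(perplexity("Moonlit ledgers hummed beneath the accountant's sly violin."))
```

By this logic, the simpler vocabulary and more conventional sentence patterns typical of writing in a second language score closer to machine output – the bias the Stanford team documented.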

Mistaken Identity

Large language models like ChatGPT produce low-perplexity text, the Guardian said. That means writing that uses many common words in familiar patterns is more likely to be categorized as AI-generated. That scenario is more likely to occur when the writing of non-native English speakers is reviewed, the researchers said in the journal Patterns. When the essays were rewritten – by ChatGPT – using more sophisticated language than the TOEFL requires, they were determined to be written by humans.

“Paradoxically,” the researchers noted, “GPT detectors might compel non-native writers to use GPT more to evade detection.”

The implications of all this are “serious,” the researchers wrote. For one thing, search engines like Google downgrade content they believe was created by AI. And in a related article, Jahna Otterbacher, an associate professor at the Open University of Cyprus, pointed out that “ChatGPT is constantly collecting data from the public and learning to please its users; eventually, it will learn to outsmart any detector.”

Image: HAL 9000, via Wikimedia
