Amazon Debacle Spotlights Machine Learning’s Achilles Heel


AI and machine learning enthusiasts don’t like to talk about it much, but get a drink into them and sooner or later they’ll admit metrics have biases. Amazon’s just learned that lesson the hard way.

About two weeks ago, Reuters reported that Amazon, which has pushed the use of AI into nearly every corner of its operations, had taken its machine-learning recruiting tool offline after discovering that, as Reuters put it, the system didn’t like women.

Sources inside Amazon told Reuters the system had been under development since 2014. The goal was obvious: review resumes by machine so recruiters could spend more time developing relationships and actually hiring people. Given that Amazon has all but perfected the use of AI in recommendation engines, customer service, order processing and logistics, applying advanced technology to hiring looked like a no-brainer.

But this time Amazon either aimed too high or just flat-out blew it. “Everyone wanted this holy grail,” one of Reuters’ sources said. “They literally wanted it to be an engine where I’m going to give you 100 resumes, it will spit out the top five, and we’ll hire those.”

How Machine Learning Becomes Biased

Somewhere along the way, however, a basic fact got lost: Many, if not most, “AI” recruiting engines set their baselines using historical data. In Amazon’s case, Reuters said, the system compared applicants against patterns found in resumes submitted over a 10-year period. Since so many more men than women populate the tech workforce, and therefore that pool of resumes, the system inevitably taught itself that male candidates were stronger than their female counterparts.

According to Reuters, resumes that included the word “women’s” were downgraded, as were the graduates of two all-women’s colleges. While the company revised the code involved to address such issues, it couldn’t be sure its machine learning wouldn’t teach itself new ways to screen candidates in a discriminatory fashion. Early last year, the company abandoned the effort, Reuters said.
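To see how a word like “women’s” can end up penalized, consider a minimal sketch in Python. This is not Amazon’s system, whose internals haven’t been published. It’s a toy scorer that weights each word by how often it appeared in past “hired” versus “rejected” resumes. The function names and the six-resume history are made up for illustration, and the history is deliberately skewed the way a male-dominated applicant pool might be.

```python
import math
from collections import Counter


def token_weights(resumes, smoothing=1.0):
    """Smoothed log-odds of each word appearing in hired vs. rejected resumes."""
    hired, rejected = Counter(), Counter()
    n_hired = n_rejected = 0
    for text, was_hired in resumes:
        tokens = set(text.lower().split())  # count each word once per resume
        if was_hired:
            hired.update(tokens)
            n_hired += 1
        else:
            rejected.update(tokens)
            n_rejected += 1

    weights = {}
    for tok in set(hired) | set(rejected):
        p_hired = (hired[tok] + smoothing) / (n_hired + 2 * smoothing)
        p_rejected = (rejected[tok] + smoothing) / (n_rejected + 2 * smoothing)
        weights[tok] = math.log(p_hired / p_rejected)
    return weights


def score(text, weights):
    """Sum the learned weights of the words in a resume; unseen words count as 0."""
    return sum(weights.get(tok, 0.0) for tok in set(text.lower().split()))


# Hypothetical, hand-made history: (resume text, was the person hired?).
# The skew is intentional; it stands in for biased past decisions.
HISTORY = [
    ("python java chess club captain", True),
    ("java sql robotics team", True),
    ("python sql debate team", True),
    ("python java women's chess club captain", False),
    ("java sql women's robotics team", False),
    ("sql excel", False),
]

weights = token_weights(HISTORY)
with_word = score("python java women's chess club captain", weights)
without_word = score("python java chess club captain", weights)
print(f"weight for \"women's\": {weights[chr(119) + 'omen' + chr(39) + 's']:.2f}")
print(f"score with the word: {with_word:.2f}, without: {without_word:.2f}")
```

Nothing in the code mentions gender. The negative weight comes entirely from the history it was given, which is why deleting one offending word offers no guarantee that another proxy won’t take its place.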

Building a predictive model based on existing employees will inherently bias a model in that direction, observed Nick Possley, head of data products and engineering for AllyO, the Sunnyvale, Calif., provider of an “AI-based” recruiting platform. “In general, whatever you measure and use as a source of data will tend to optimize based on that data, so this is something that needs to be monitored in any system whether it be machine-based or human-based.”

Amazon wouldn’t comment except to say it remains committed to workplace diversity and equality. But one workforce-diversity consultant, who asked not to be identified and isn’t involved in HR tech (in other words, she’s a mere user), wondered how any intelligent system could downgrade applicants based on words like “women’s.” Whatever the technical explanation, the episode’s left a bad taste in some people’s mouths.

Machine Learning and the Human Factor

We don’t need to dwell on the fact that Amazon created a tool that was intended to minimize bias and ended up falling victim to it. The company certainly isn’t the first to find itself in the middle of that particular swamp. But since Amazon is Amazon, whenever it crashes, it crashes hard. In this case, the lesson isn’t particularly startling: The results of machine learning are only as good as the data the system has to work with and the way it’s taught.

“If the measurement of the model or selection criteria is changed to focus on the base skill sets required for the job, then, at the very least you may get better candidates that can get the job done without undesired biases such as gender or race,” said Possley.

Monitoring the data used to build a model and make decisions, then measuring the results of those decisions, is key to ensuring the system works as advertised. For AI, Possley said, it’s easy to monitor the source data and results. By contrast, it’s almost impossible to know the source data behind decisions made by people. “People have many unknown factors that bias their decisions, so measuring results becomes important on an individual decision-making level,” Possley said.
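Measuring results can be as simple as comparing how often each group of applicants gets selected. The sketch below shows one common check, not a method attributed to Possley or AllyO: it computes per-group selection rates and the ratio of the lowest rate to the highest. The group labels and decision log are invented for illustration. A ratio below 0.8 is a widely cited rule of thumb (the “four-fifths rule”) for flagging results that deserve a closer look, and the same check works whether the decisions came from a model or a person.

```python
from collections import defaultdict


def selection_rates(decisions):
    """Return the share of applicants selected in each group.

    `decisions` is an iterable of (group_label, was_selected) pairs.
    """
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, was_selected in decisions:
        totals[group] += 1
        if was_selected:
            selected[group] += 1
    return {group: selected[group] / totals[group] for group in totals}


def impact_ratio(rates):
    """Lowest selection rate divided by the highest; None if no one was selected."""
    highest = max(rates.values())
    if highest == 0:
        return None
    return min(rates.values()) / highest


# Hypothetical decision log: 10 applicants per group, labels are placeholders.
DECISIONS = (
    [("group_a", True)] * 5 + [("group_a", False)] * 5
    + [("group_b", True)] * 2 + [("group_b", False)] * 8
)

rates = selection_rates(DECISIONS)
ratio = impact_ratio(rates)
print(rates)
print(f"impact ratio: {ratio:.2f}")
if ratio is not None and ratio < 0.8:
    print("Below the four-fifths rule of thumb; worth a closer look.")
```

A low ratio doesn’t prove discrimination on its own, and a passing one doesn’t rule it out. It’s a tripwire that tells someone to go look at the data.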

“Business people have been sold on the notion that AI’s advanced algorithms magically analyze information in a black box and then spit out reliable insights. How? They just do,” observed John Harney, co-founder and chief technology officer of DataScava, a New York-based company that’s developed what it calls an unstructured data miner. “But really, machines only work when humans review their work and teach them how to provide better results. At its best, AI is a rookie on your team that presents problems with accuracy, ambiguity and accountability.”

We don’t expect such pesky details to put any kind of brake on the market for advanced recruiting tools. “The ability to find the right match in the least amount of time possible and enhance the overall recruiting experience for recruiters and hiring managers, as well as for candidates, are some of the key deciding factors for customers to select the right recruiting platform,” Madhur Mayank Sharma, SAP’s head of machine learning for HR, told TechTarget earlier this year.

True as that may be, many recruiters remain wary. “While it’s understandable that Amazon wants to create this kind of ‘direct matching’ tool, the project’s failure confirms that direct candidate matching cannot (currently) be effective due to the lack of objective unbiased data in resumes and job descriptions,” said Ken Lazarus, CEO of Scout Exchange, a platform that connects employers with search firms. “This is why human recruiters will continue to play a vital role in recruiting, and the key to better recruiting results is to find the human recruiter who has the best track record matching candidates to each job.”

Image: iStock