Chapter 3
The bodies underneath the rubble
By Deborah Raji
“The individuals caught in the web of AI’s false promises are just the latest casualties of technological collapse. It will never be enough to feign ignorance, deflect blame, and keep going. For the sake of those who suffer, responsibility is required.”
On 29 August 1907, the Quebec Bridge collapsed. Bound for the record books as the longest cantilever bridge ever built, the structure instead fell mid-construction, killing 75 of the 86 workers on site that day. In the formal inquiry that followed, the cause of the accident was traced to the poor oversight of the two lead engineers, Theodore Cooper and Peter Szlapka. The landmark declaration on engineering responsibility called out both by name. Their avoidable missteps, careless miscalculations, and chronic disregard for safety were described in detail and ultimately identified as key to the bridge’s structural failure.
Decades later, in 1965, US consumer advocate Ralph Nader published Unsafe at Any Speed, a book which would transform the automotive industry. In it, Nader revealed the alarming instability of the Chevrolet Corvair—a vehicle which, for him, exemplified the auto industry’s indifference towards safety. He noted an absence of foresight in how cars were designed and actually used, and connected this to a string of avoidable collisions and accidents. He flagged this as a new era, one marked by its own kind of collapse. In lieu of careful planning and precaution, he observed only “an explosion of what is actually done”.
Such “explosions” persist today in the engineering of algorithmic systems. The rubble is everywhere. The deployment of the MiDAS algorithm led to over 20,000 Michigan residents being erroneously accused of unemployment fraud.1 A model widely used in US hospitals to allocate healthcare systematically discriminates against Black patients.2 Several facial recognition products fail on Black female faces.3 Uber’s self-driving car software has led to dozens of accidents and even a human death.4 In each case, the causes uncovered by investigation are inexcusable—uninspected “corrupt and missing” data, the admittedly incorrect framing of a task, a remarkable lack of oversight in testing, or an unimplemented feature.
The harm inflicted by these products is a direct consequence of shoddy craftsmanship, unacknowledged technical limits, and poor design—bridges that don’t hold, cars that can’t steer. Contrary to popular belief, these systems are far more likely to cause harm when they do not work than on the rare occasions when they work “too well”, or in truly unaccountable ways.
Even in light of these obvious harms, vendors remain stubborn—and defensive. Despite a 93% error rate, the MiDAS algorithm was used for at least three years. It took several highly publicised lawsuits for its use to finally be questioned. Optum, the company behind the healthcare prioritisation algorithm, called the audit results helpful but “misleading” and continues to deploy the system on millions of patients.5 In spite of a peer-reviewed study revealing an error rate of over 30% on darker-skinned female faces, Amazon still pitches its biased facial recognition technology for nationwide partnerships with law enforcement. Uber’s automated vehicles, which are still being tested on public streets, continue to run on its flawed software.
Just as the car manufacturer called out by Nader shifted blame onto car dealerships for failing to recommend the tyre pressures that would “correct” the Corvair’s faulty steering, algorithm developers also seek scapegoats for their own embarrassing failures. Optum states that the quality of its healthcare algorithm’s assessments actually depends on “the doctor’s expertise”. Amazon discusses setting higher accuracy thresholds for police clients, and MiDAS’s creators point to the difficulty of migrating from an independent legacy system. Even the fatal Uber self-driving car crash was ultimately blamed on a distracted test driver and the victim’s own jaywalking. Those creating these algorithms will point to anything—the human operator, institutional infrastructure, society at large, or even the actions of the affected population itself—before admitting to inherent flaws in the product’s design and implementation. Eyes averted, they claim the issues reported are really someone else’s responsibility.
However, just as it was General Motors’ responsibility to ensure that the Corvair’s required tyre pressures were within the range of recommended tolerance, it is also Amazon’s responsibility to inform police clients to operate at a level other than the default accuracy threshold. It is Uber’s responsibility to ensure drivers remain adequately attentive at the wheel, the MiDAS developers’ task to make their algorithm portable, and Optum’s role to analyse the causal assumptions in their data. To forget a test, neglect design or dismiss the data is not just unfortunate; it’s irresponsible.
Any institutional stakeholder involved in the development of a product—be they an engineer, a business executive, or a member of the product or marketing teams—ultimately has an impact on outcomes. The technical community tends to simultaneously anthropomorphise AI systems, ascribing to them a false ability to “think” and “learn” independently of the provided input, while also downplaying, with language of data “bricks”, “moats” and “streams”, the existence of real humans in the data and human participation in the decisions that shape these systems. This reluctance to admit to the human influence on AI functionality makes the field stubbornly blind to its own contribution to these systems. Technologists are not like doctors, looking each patient in the eye. They stand at a distance, the relationship between their judgement and system outcomes blurred by digitised abstraction, their sense of responsibility dampened by scale, the rush of agile innovation, countless undocumented judgements, and implicit feature engineering. The result is an imagined absolution of responsibility, a false narrative in which they’ve created an artificial system outside of anyone’s control, while the human population affected by their decisions and mistakes is inappropriately erased.
As a former member of an Applied Machine Learning team, I’ve witnessed the model development process up close. It’s a chaotic enterprise. Whether building a moderation model that disproportionately filtered out the content of people of colour, or training a hair classifier that lacked inclusive categories, we regularly disappointed our clients and ourselves, frequently falling short of expectations in practice. Even after a live pilot failed or unacceptable bias was discovered, at the client’s request we would deploy the model anyway. The fact is, in this field, inattentive engineers often get away with it. No data review is imposed, nor are there any reporting requirements. There is no sanctioned communication protocol with operators, no safety culture to speak of, no best practices to adhere to, no restrictive regulations or enforced compliance, and no guide for recall—voluntary or imposed—to remove from our daily lives the models that cannot be steered, the algorithms without brakes.
Of the 75 construction workers who died in the Quebec Bridge catastrophe, up to 35 were Native Americans, Mohawks from the Kahnawake community who faced a dearth of employment options at the time and received poverty wages. This is what happens when systems collapse: they fall on the most vulnerable. In his book, Nader described Ms. Rose Pierini “learning to adjust to the loss of her left arm” after a crash caused by the unaddressed steering challenges of the Chevrolet Corvair. He profiled Robert Comstock, a “veteran garage mechanic”, whose leg was amputated after he was run over by a Buick with no brakes. Today, we discuss Robert Williams, a Black man wrongfully arrested due to an incorrect facial recognition “match”; Carmelita Colvin, a Black woman falsely accused of unemployment fraud; Tammy Dobbs, an older woman in poor health who lost her healthcare due to a program glitch; and Davone Jackson, locked out of the low-income housing he and his family needed to escape homelessness due to a false report from an automated tenant screening tool.
The fact is, there are bodies underneath the rubble. The individuals caught in the web of AI’s false promises are just the latest casualties of technological collapse. It will never be enough to feign ignorance, deflect blame, and keep going. For the sake of those who suffer, responsibility is required. It is a concurrent goal, one with its own urgency, to be addressed alongside any ambition for a more hopeful future.
Deborah Raji is a Mozilla Fellow interested in algorithmic auditing. She also works closely with the Algorithmic Justice League initiative to highlight bias in deployed products.
Notes
1. Charette, R. N. (2018, 24 January) Michigan’s MiDAS Unemployment System: Algorithm Alchemy Created Lead, Not Gold, IEEE Spectrum. https://spectrum.ieee.org/riskfactor/computing/software/michigans-midas-unemployment-system-algorithm-alchemy-that-created-lead-not-gold
2. Obermeyer, Z., Powers, B., Vogeli, C. & Mullainathan, S. (2019) Dissecting racial bias in an algorithm used to manage the health of populations. Science 366.6464: 447-453. DOI: 10.1126/science.aax2342
3. Raji, I. D. & Buolamwini, J. (2019) Actionable auditing: Investigating the impact of publicly naming biased performance results of commercial AI products. Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society. DOI: 10.1145/3306618.3314244
4. Lee, K. (2019, 6 November) Uber’s Self-Driving Cars Made It Through 37 Crashes Before Killing Someone. Jalopnik. https://jalopnik.com/ubers-self-driving-cars-made-it-through-37-crashes-befo-1839660767
5. Ledford, H. (2019) Millions of black people affected by racial bias in health-care algorithms. Nature 574.7780: 608-610. DOI: 10.1038/d41586-019-03228-6