AI, “go” fever, and normalization of deviance
The tech industry is looking more and more like NASA in its darkest hours
Artificial intelligence has a blistering case of “go” fever, a term coined after the Apollo 1 disaster that claimed the lives of three astronauts to describe the NASA culture that contributed to it. Sometimes called “launch” fever (as in, ‘go for launch’), it’s an organizational behavior phenomenon marked by a collective determination to finish a project, release a product, launch a spacecraft, etc. at the expense of risk awareness or consideration of potential problems, shortcuts, or mistakes. If you’ve ever worked in any organization … public sector, private sector, or nonprofit … that was subject to time pressures or rewards for moving quickly, you’ve likely observed this variant of groupthink firsthand. Tech firms sometimes call it “product velocity.” Memorably, Facebook used to call it “move fast and break things.” More recently, the Gemini debacle at Google has all the signs of shortcuts in testing and evaluation taken to meet a schedule or keep pace with competitors. It’s hard to look at the state of commercial AI these days and not see go fever helping drive the mind-boggling advances coming to market on a regular basis, as they’re increasingly followed by problems, failures, company apologies, and statements promising to do better next time. It does not help when myopic tech punditry opines on engineering fixes to these problems, masking the bigger issue: commercial AI has a crisis of culture.
This essay proposes a change of culture in AI that starts in engineering and draws upon mature practices in risk mitigation and safety that belong in the core engineering teams shipping AI systems. This change will take place, however, at the expense of standalone AI ethics or responsible AI teams in these firms. Many have argued that the discipline of AI ethics isn’t to blame for what happened with Google and Gemini and thus shouldn’t be a casualty, rebutting the theory that the product failure was a natural consequence of in-house activism, but that is beside the point. This org model doesn’t work in engineering-led firms (which most tech companies are, and many companies in vertical industries are becoming). For those inclined to circle the wagons around responsible AI as a critical org entity, I would ask, “How’s that working?” To date, go fever in tech has cut through responsible AI like a buzzsaw. A cultural reckoning with the lack of risk awareness in engineering is the only way to change it.
Let’s start with what history can teach us about engineering culture gone awry …
Apollo 1
Around midday on January 27th, 1967, three astronauts boarded the Apollo 1 command module for a launch rehearsal called a “plugs-out” test, meant to verify that the spacecraft would operate properly on internal power once disconnected from launch pad cables. The test was not considered risky because no fuel was loaded, but hours after the crew was sealed inside, an electrical short ignited the pressurized, pure-oxygen environment, instantly filling the capsule with flames. The astronauts were burned alive; it was over in about ten seconds. NASA asked President Johnson to allow the agency to investigate itself, promising to leave no stone unturned. The final report documented the technical causes of the disaster, but it also documented prior concerns about designs, safety procedures, and testing. After the accident, NASA’s Gene Kranz (who would later become well known as the flight director of Apollo 13) said,
“We were too ‘gung-ho’ about the schedule and we blocked out all of the problems we saw each day in our work. Every element of the program was in trouble and so were we.”
Apollo astronaut Wally Schirra was a vocal critic of the overall design before the accident, and would later say,
“I was annoyed at the way what became Apollo 1 came out of the plant … it was not finished. And that, of course, caused this whole atmosphere of … ‘go’ fever. ‘Go’ fever, meaning that we’ve got to keep going, keep going …. And there were things going on that I didn’t like at all. I was no longer annoyed; I was really pretty goddamn mad. There were glitches, electronic things that just didn’t come out right.”
Given NASA’s success with the Mercury and Gemini programs, the space race with the Soviet Union, and President Kennedy’s stated goal of reaching the Moon and returning safely by the end of the decade, go fever had become an omnipresent factor in NASA’s engineering culture amidst massive time pressures on the agency.
Challenger and normalization of deviance
After the space shuttle Challenger exploded on January 28th, 1986, President Reagan appointed former Secretary of State William Rogers to investigate the disaster, leading to the formation of the Rogers Commission. Reagan told Rogers, “Whatever you do, don’t embarrass NASA.” Rogers adhered to this directive until engineers from solid rocket booster (SRB) manufacturer Morton Thiokol became frustrated by the obvious whitewash and began speaking publicly about the systemic failures in decision-making and unacceptable tolerance for known risk of SRB O-ring failures that led to the tragedy. The Rogers Commission Report was a damning indictment of how NASA operated, with named actors bearing specific accountabilities for their decisions. But it stopped short of indicting NASA’s culture, insisting a few individuals were to blame. Columbia University sociologist Diane Vaughan took exception to what she felt was an insufficient accounting of root cause and, already familiar with go fever at NASA, researched the disaster herself. The result was a new theory of organizational behavior called normalization of deviance, and a book chronicling how culture shaped the decision to launch. As Vaughan describes it,
“Social normalization of deviance means that people within the organization become so much accustomed to a deviation that they don’t consider it as deviant, despite the fact that they far exceed their own rules for the elementary safety.”
She described a typical decision-making process in these organizations whereby a clearly unsafe practice comes to be considered normal if it does not immediately cause a catastrophe. What follows is an “incubation period” preceding the final disaster, during which early warning signs are “either misinterpreted, ignored or missed completely”. Vaughan’s theory did not go over well at NASA and was widely dismissed. NASA was no doubt emboldened by what followed: 87 shuttle launches over 17 years without a single accident or loss of life. That all ended on February 1st, 2003.
Columbia
The cause of space shuttle Columbia’s fiery disintegration upon re-entry was quickly identified from video images of its launch two weeks prior, which showed a large chunk of foam insulation breaking free from the external fuel tank 82 seconds into flight and striking the leading edge of the orbiter’s left wing. The damage allowed superheated gases to enter the wing during re-entry and ultimately destroy its internal structure. The orbiter disintegrated, killing all seven aboard. The formation of the Columbia Accident Investigation Board (CAIB) soon followed, and it heard testimony from various subject matter experts. When Diane Vaughan came to testify, she did not exactly receive a warm welcome. Some members of the CAIB were defensive of NASA and still unconvinced by her normalization of deviance theory, with one member rudely asking her, “How are book sales going, Dr. Vaughan?” Her testimony was nonetheless compelling, drawing comparisons to Challenger, specifically in light of the revelation that foam debris had broken loose on 65 of the 79 missions for which imagery was available, but had been deemed a maintenance issue, not a safety-of-flight issue. In other words, the acceptance of outsized risks had become normalized.
In the end, Vaughan was asked to join the CAIB and authored Chapter 8 of its final report, History As Cause: Columbia and Challenger. Unfortunately, it took a lost shuttle and seven more fatalities to validate her thesis about Challenger, which has since been applied to a wide variety of scenarios: flood control failures in New Orleans after Hurricane Katrina, the BP Deepwater Horizon blowout, the 737-MAX crashes, and even the public’s deviation from generally accepted health measures during the COVID-19 pandemic.
A culture of “go”
Why all this talk about the space program in an essay about AI? Because the organizational behavior parallels are unmistakable: go fever and normalization of deviance both map *easily* onto the AI product failures we’ve observed in the last year or two, and onto what’s been reported publicly about the cultural attributes of the firms bringing these products to market. Some of these failures are extremely consequential, up to and including physical injury and loss of life. For example, a telltale sign of normalization of deviance in an organization is burden-shifting with respect to safety. In AI, the burden of proof in many firms seems to have shifted from the product team having to prove a model is safe to the responsible AI team having to prove it’s not safe. This is classic normalization of deviance, and not a winning formula for fostering a risk-aware corporate culture. I’m not the only one to have taken notice of go fever in AI, with the New York Times having recently chronicled the rise of the “effective accelerationism” movement, described as:
“… [a] no-holds-barred pursuit of technological progress … that artificial intelligence and other emerging technologies should be allowed to move as fast as possible, with no guardrails or gatekeepers standing in the way of innovation.”
The piece goes on to describe “shared scorn for the people they call ‘decels’ and ‘doomers’ — the people who worry about the safety of A.I., or the regulators who want to slow it down.” To be fair, this particular view of progress in AI is extreme and not necessarily representative of commercial AI writ large, but it does provide a lens into the rationales behind deprioritizing risk awareness and safety considerations in order to move faster.
Even amidst a backdrop of go fever, tech firms are still maintaining responsible AI teams, despite the high-profile layoffs and reorganizations we’ve seen. But this is not an operational approach that will prove effective in the long run.
The police
All commercial organizations of a certain size and maturity have the moral equivalents of law enforcement. Human resources, legal affairs, and the compliance group, to name a few, all bear the responsibility of protecting the company from employee conduct and are generally viewed by employees as ‘the police’. This is not an insult or a dig. It’s how mature companies work and generally speaking, we like the concept of policing to the extent we like rules and law and order. But in a company, let’s be real: no one wants an email or a phone call from the police. Regardless, many firms operationalize responsible AI as a standalone team to review sensitive uses, thus creating another little police department in the company. Did I mention that product groups would rather not have to deal with the police?
If a company demands that product groups talk to the police as a matter of corporate policy, then OK, fine … let’s see what we can learn from the privacy domain by considering the rise of check-box privacy culture in the wake of GDPR in 2018. As UC Irvine law professor Ari Waldman describes in his 2021 book Industry Unbound, companies were found to be more interested in compliance with the regulation than privacy itself, and effectively devolved into a “check-box” culture that looked, walked, talked, and acted like a full-throated commitment to privacy, but in reality was just a full-throated commitment to compliance. All the privacy people were generally put in one place, working for a Chief Privacy Officer, and perceived in the organization the same way any compliance function would be viewed. It was an operational change, not a culture change. If you’re skeptical of this thesis, look no further than the continued deterioration of online privacy since GDPR was enacted. The regulation was and still is easily gamed, with monetary fines having become an acceptable cost of doing business while our personal data is bought & sold all day long.
Responsible AI is headed down the same path. Companies will need to make a choice: (A) ignore the obvious lessons of GDPR and continue prioritizing time-to-market while putting out blog posts about “Our commitment to responsible AI...”, or (B) prioritize image, brand, reputation, and (dare I say) the public interest by investing in proper testing and evaluation of these systems at the expense of the product schedule.
So far, the results are overwhelmingly in favor of A. And as Steven Kerr wrote in his seminal 1975 paper, “On the folly of rewarding A while hoping for B” …
“[N]umerous examples exist of reward systems that are fouled up in that behaviors which are rewarded are those which the rewarder is trying to discourage, while the behavior he desires is not being rewarded at all.”
The corporate culture of go fever is at the very heart of the tech industry’s reward system, with normalization of deviance as its key cultural enabler. Responsible AI in its current, standalone operational form is no match for go fever.
From standalone to core
What should firms do? They can start by not rewarding A while hoping for B. If you want B, then reward B. If you don’t want B, then stop claiming your company is committed to B, because that would be deceptive marketing. But I digress.
The disciplines of AI risk management need to move from a standalone function outside the product team to a core part of engineering.
In a company with a risk-aware corporate culture, there will eventually be no need for an organizationally ring-fenced responsible AI team. There is no need for a little police department. Risk awareness would ideally be part and parcel of the corporate culture, from the executive suite to the mail room and everywhere in between, but engineering is where the rubber really meets the road. There is recent precedent here: security. One imagines you’d be hard-pressed to hold down an engineering job in a technology company without thinking about security all day long. That’s the world we live in, and secure-by-design is now a must-have product attribute for companies to have success in the market.
Management of AI risk is fast becoming a company imperative for a number of reasons separate and distinct from “doing the right thing for society”, for which there is frankly no financial incentive. What does carry strong financial incentives, however, is the impact of diminished image, brand, and reputation in the market, along with the hidden financial obligation of technical debt for models that will, at a minimum, have to be repaired and perhaps retrained altogether at great expense.
A change in culture entails a reimagining of core roles in product development and engineering. Let’s consider product management as an example, the core function of which is to go out to the field and talk to prospective customers in the target market to understand their points of pain, value levers, willingness to pay at certain price levels, and required features, among other things. These findings (both quant and qual) are then incorporated into a market requirements document (MRD), which is then given to engineering to develop the product specification. What exactly are the new “market requirements” in the AI era? As we’ve seen, they go way beyond feature set and price-performance characteristics. The market is demanding a new set of minimum bars for this generation of models, such as not producing basic harms like privacy intrusions, dis/misinformation generation, and denigration of groups of people. Shouldn’t an MRD for an AI product include an assessment of these risks, as opposed to relying on a separate, bolt-on risk assessment produced by a responsible AI workstream or a standalone safety team? Shouldn’t product managers in AI have a baseline understanding of AI risks and societal impacts as part of their efforts to understand why a customer in the target market would or wouldn’t buy the product under development? Shouldn’t current traction for approaches like datasheets for datasets or ‘model cards’ be folded into core product management? We can really go out on a limb … how about hiring a product manager or two with a social sciences or safety engineering background?
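To make this concrete, here is a minimal, hypothetical sketch of what a machine-readable risk section of an MRD could look like if product management owned it alongside features and pricing. The field names and categories are my own illustrative assumptions, not a standard or the author’s proposal; a real team would align them with its own taxonomy or with published approaches like model cards and datasheets for datasets.

```python
# A sketch (not a standard) of a risk register living inside the MRD itself,
# owned by product management and handed to engineering with everything else.
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class RiskItem:
    harm: str            # e.g. "privacy intrusion", "denigration of a group"
    affected_users: str  # who in the target market bears the harm
    severity: Severity
    mitigation: str      # what the product spec must include to address it
    evaluated_by: str    # which eval or red-team exercise will verify it


@dataclass
class MarketRequirementsDoc:
    product: str
    target_market: str
    required_features: list[str] = field(default_factory=list)
    risk_register: list[RiskItem] = field(default_factory=list)  # first-class, not bolted on

    def unmitigated_high_risks(self) -> list[RiskItem]:
        """High-severity harms with no planned mitigation should block the spec handoff."""
        return [r for r in self.risk_register
                if r.severity is Severity.HIGH and not r.mitigation]
```

The point isn’t the schema; it’s that risk items live in the same artifact PMs already hand to engineering, so a missing mitigation is visible at spec handoff rather than discovered later by a separate team.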
Let’s take system testing as another example. Functional and unit testing in traditional software engineering does not map easily to AI and machine learning models, hence the current shift to red-teaming and evaluation for certain types of models. Setting aside the merits or shortfalls of these approaches, the way in which they’re operationalized matters: are they core disciplines in engineering, or are they additive workstreams run by a standalone team, which makes it easier for engineering leadership to opt for inaction? Why is red-teaming so often post-release? The software industry has been sharing pre-release products with invite-only community insiders for decades. This may sound like splitting hairs, but lessons from NASA apply here, as Diane Vaughan described its culture as having “provided a way of seeing that was simultaneously a way of not seeing”. It’s hard not to think of AI and the risk of ignored test and evaluation results in that succinct descriptor.
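As one illustration of what “core discipline” could mean in practice, here is a sketch of a safety evaluation wired into the same pre-release test suite that already gates a build. This is an assumption about how a team might operationalize it, not a description of any firm’s actual process: `generate`, `load_redteam_prompts`, and `is_refusal` are hypothetical stand-ins for a team’s own model interface and eval helpers, and the 0.99 threshold is an arbitrary illustrative number.

```python
# Sketch: a safety eval that fails the build the same way a failing unit test would.
import pytest

from my_model import generate                            # hypothetical model interface
from my_evals import load_redteam_prompts, is_refusal    # hypothetical eval helpers

REFUSAL_THRESHOLD = 0.99  # fraction of known-harmful prompts the model must refuse


@pytest.mark.release_gate
def test_model_refuses_known_harmful_prompts():
    # Curated red-team prompts are versioned and reviewed like any other test fixture.
    prompts = load_redteam_prompts("harms_v1")
    refusals = sum(is_refusal(generate(p)) for p in prompts)
    refusal_rate = refusals / len(prompts)
    # A regression here blocks the release pending triage, not a blog post.
    assert refusal_rate >= REFUSAL_THRESHOLD, (
        f"Refusal rate {refusal_rate:.3f} fell below the release bar of {REFUSAL_THRESHOLD}"
    )
```

When the gate lives in engineering’s own test suite, ignoring a failed evaluation requires an explicit, visible decision rather than a quiet deferral to a standalone team.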
The partitioning between core engineering and the ancillary risk management and safety efforts around it needs to dissolve if a risk-aware culture is ever to take shape. Specifics about roles, responsibilities, and the incorporation of proven approaches into established functions are a topic for a later, in-depth post. But generally speaking, AI risk management needs to reside in engineering, and PM and test need to own it.
Concluding thoughts
Weakness in organizational safety culture seems to be at the root of nearly every engineering failure ever studied, regardless of vertical industry or domain. Waring (2013) describes how culture provides employees with a reference frame through which they “interpret their existence within that organization and which enables them to consider what is good and bad, right and wrong, acceptable and unacceptable, imperative and taboo.” When deviance is normalized within an organization, it warps that reference frame. The body of research into engineering disasters and the role of safety culture is massive and filled with applicable lessons like these.
Engineering practice for AI is increasingly well-studied. In The Fallacy of AI Functionality (2022), Raji et al. describe how the responsible AI fixation on “ethics” and “value alignment” distracts from the reality of AI systems that are “constructed haphazardly, deployed indiscriminately, and promoted deceptively.” Under normal circumstances, rushing products to market and then marketing them deceptively would be considered “deviant” business conduct, but these deviances have been normalized in commercial AI. The arms race is on in full force, time-to-market is paramount, and company cultures reinforce the behaviors. Reversing this trend (or at a minimum, slowing it down) may unfortunately take a string of failures.
It took NASA multiple disasters to realize it needed a cultural reckoning to get to the root of the agency’s problems. Boeing is going through this now: even after two 737-MAX crashes in 2018 and 2019 that killed 346 people, the MAX’s quality problems have persisted, coming to light again with a near-disastrous close call and a damning report from the FAA criticizing the company’s safety culture. It always comes down to culture, every time.
A final note on the tech industry’s proclivity for trying to PR its way out of these embarrassing product failures: don’t do that anymore … fix your culture instead. Renowned physicist Richard Feynman was part of the Rogers Commission that investigated the Challenger disaster and alluded to the use of outbound communications to paper over cultural infirmities and troubling tolerance of risk, saying, “When playing Russian roulette, the fact that the first shot got off safely is little comfort for the next. For a successful technology, reality must take precedence over public relations, for nature cannot be fooled.”
/end