The 2010s were arguably the most exciting and consequential decade in the history of artificial intelligence. Though there have certainly been conceptual improvements in the algorithms used in AI, the primary driver of all this progress has simply been deploying more expansive deep neural networks on ever faster computer hardware where they can hoover up greater and greater quantities of training data. This “scaling” strategy has been explicit since the 2012 ImageNet competition that set off the deep learning revolution. In November of that year, a front-page New York Times article was instrumental in bringing awareness of deep learning technology to the broader public sphere. The article, written by reporter John Markoff, ends with a quote from Geoff Hinton: “The point about this approach is that it scales beautifully. Basically you just need to keep making it bigger and faster, and it will get better. There’s no looking back now.”
There is increasing evidence, however, that this primary engine of progress is beginning to sputter out. According to one analysis by the research organization OpenAI, the computational resources required for cutting-edge AI projects is “increasing exponentially” and doubling about every 3.4 months.
In a December 2019 Wired magazine interview, Jerome Pesenti, Facebook’s Vice President of AI, suggested that even for a company with pockets as deep as Facebook’s, this would be financially unsustainable:
When you scale deep learning, it tends to behave better and to be able to solve a broader task in a better way. So, there’s an advantage to scaling. But clearly the rate of progress is not sustainable. If you look at top experiments, each year the cost [is] going up 10-fold. Right now, an experiment might be in seven figures, but it’s not going to go to nine or ten figures, it’s not possible, nobody can afford that.
Pesenti goes on to offer a stark warning about the potential for scaling to continue to be the primary driver of progress: “At some point we’re going to hit the wall. In many ways we already have.” Beyond the financial limits of scaling to ever larger neural networks, there are also important environmental considerations. A 2019 analysis by researchers at the University of Massachusetts, Amherst, found that training a very large deep learning system could potentially emit as much carbon dioxide as five cars over their full operational lifetimes.
Even if the financial and environmental impact challenges can be overcome—perhaps through the development of vastly more efficient hardware or software—scaling as a strategy simply may not be sufficient to produce sustained progress. Ever-increasing investments in computation have produced systems with extraordinary proficiency in narrow domains, but it is becoming increasingly clear that deep neural networks are subject to reliability limitations that may make the technology unsuitable for many mission critical applications unless important conceptual breakthroughs are made. One of the most notable demonstrations of the technology’s weaknesses came when a group of researchers at Vicarious, small company focused on building dexterous robots, performed an analysis of the neural network used in Deep-Mind’s DQN, the system that had learned to dominate Atari video games. One test was performed on Breakout, a game in which the player has to manipulate a paddle to intercept a fast-moving ball. When the paddle was shifted just a few pixels higher on the screen—a change that might not even be noticed by a human player—the system’s previously superhuman performance immediately took a nose dive. DeepMind’s software had no ability to adapt to even this small alteration. The only way to get back to top-level performance would have been to start from scratch and completely retrain the system with data based on the new screen configuration.
What this tells us is that while DeepMind’s powerful neural networks do instantiate a representation of the Breakout screen, this representation remains firmly anchored to raw pixels even at the higher levels of abstraction deep in the network. There is clearly no emergent understanding of the paddle as an actual object that can be moved. In other words, there is nothing close to a human-like comprehension of the material objects that the pixels on the screen represent or the physics that govern their movement. It’s just pixels all the way down. While some AI researchers may continue to believe that a more comprehensive understanding might eventually emerge if only there were more layers of artificial neurons, running on faster hardware and consuming still more data, I think this is very unlikely. More fundamental innovations will be required before we begin to see machines with a more human-like conception of the world.
This general type of problem, in which an AI system is inflexible and unable to adapt to even small unexpected changes in its input data, is referred to, among researchers, as “brittleness.” A brittle AI application may not be a huge problem if it results in a warehouse robot occasionally packing the wrong item into a box. In other applications, however, the same technical shortfall can be catastrophic. This explains, for example, why progress toward fully autonomous self-driving cars has not lived up to some of the more exuberant early predictions.
As these limitations came into focus toward the end of the decade, there was a gnawing fear that the field had once again gotten over its skis and that the hype cycle had driven expectations to unrealistic levels. In the tech media and on social media, one of the most terrifying phrases in the field of artificial intelligence—”AI winter“—was making a reappearance. In a January 2020 interview with the BBC, Yoshua Bengio said that “AI’s abilities were somewhat overhyped . . . by certain companies with an interest in doing so.”
My own view is that if another AI winter indeed looms, it’s likely to be a mild one. Though the concerns about slowing progress are well founded, it remains true that over the past few years AI has been deeply integrated into the infrastructure and business models of the largest technology companies. These companies have seen significant returns on their massive investments in computing resources and AI talent, and they now view artificial intelligence as absolutely critical to their ability to compete in the marketplace. Likewise, nearly every technology startup is now, to some degree, investing in AI, and companies large and small in other industries are beginning to deploy the technology. This successful integration into the commercial sphere is vastly more significant than anything that existed in prior AI winters, and as a result the field benefits from an army of advocates throughout the corporate world and has a general momentum that will act to moderate any downturn.
There’s also a sense in which the fall of scalability as the primary driver of progress may have a bright side. When there is a widespread belief that simply throwing more computing resources at a problem will produce important advances, there is significantly less incentive to invest in the much more difficult work of true innovation. This was arguably the case, for example, with Moore’s Law. When there was near absolute confidence that computer speeds would double roughly every two years, the semiconductor industry tended to focus on cranking out ever faster versions of the same microprocessor designs from companies like Intel and Motorola. In recent years, the acceleration in raw computer speeds has become less reliable, and our traditional definition of Moore’s Law is approaching its end game as the dimensions of the circuits imprinted on chips shrink to nearly atomic size. This has forced engineers to engage in more “out of the box” thinking, resulting in innovations such as software designed for massively parallel computing and entirely new chip architectures—many of which are optimized for the complex calculations required by deep neural networks. I think we can expect the same sort of idea explosion to happen in deep learning, and artificial intelligence more broadly, as the crutch of simply scaling to larger neural networks becomes a less viable path to progress.
Excerpted from “Rule of the Robots: How Artificial Intelligence will Transform Everything.” Copyright 2021 Basic Books. Available from Basic Books, an imprint of Hachette Book Group, Inc.