This article is part of our special report on AI, “The Great AI Reckoning.”
“I should probably not be standing this close,” I think to myself, as the robot slowly approaches a large tree branch on the floor in front of me. It’s not the size of the branch that makes me nervous—it’s that the robot is operating autonomously, and that while I know what it’s supposed to do, I’m not entirely sure what it will do. If everything works the way the roboticists at the U.S. Army Research Laboratory (ARL) in Adelphi, Md., expect, the robot will identify the branch, grasp it, and drag it out of the way. These folks know what they’re doing, but I’ve spent enough time around robots that I take a small step backwards anyway.
The robot, named RoMan, for Robotic Manipulator, is about the size of a large lawn mower, with a tracked base that helps it handle most kinds of terrain. At the front, it has a squat torso equipped with cameras and depth sensors, as well as a pair of arms that were harvested from a prototype disaster-response robot originally developed at NASA’s Jet Propulsion Laboratory for a DARPA robotics competition. RoMan’s job today is roadway clearing, a multistep task that ARL wants the robot to complete as autonomously as possible. Instead of instructing the robot to grasp specific objects in specific ways and move them to specific places, the operators tell RoMan to “go clear a path.” It’s then up to the robot to make all the decisions necessary to achieve that objective.
The ability to make decisions autonomously is not just what makes robots useful, it’s what makes robots robots. We value robots for their ability to sense what’s going on around them, make decisions based on that information, and then take useful actions without our input. In the past, robotic decision making followed highly structured rules: if you sense this, then do that. In structured environments like factories, this works well enough. But in chaotic, unfamiliar, or poorly defined settings, reliance on rules makes robots notoriously bad at dealing with anything that could not be precisely predicted and planned for in advance.
RoMan, along with many other robots including home vacuums, drones, and autonomous cars, handles the challenges of semistructured environments through artificial neural networks—a computing approach that loosely mimics the structure of neurons in biological brains. About a decade ago, artificial neural networks began to be applied to a wide variety of semistructured data that had previously been very difficult for computers running rules-based programming (generally referred to as symbolic reasoning) to interpret. Rather than recognizing specific data structures, an artificial neural network is able to recognize data patterns, identifying novel data that are similar (but not identical) to data that the network has encountered before. Indeed, part of the appeal of artificial neural networks is that they are trained by example, by letting the network ingest annotated data and learn its own system of pattern recognition. For neural networks with multiple layers of abstraction, this technique is called deep learning.
Even though humans are typically involved in the training process, and even though artificial neural networks were inspired by the neural networks in human brains, the kind of pattern recognition a deep learning system does is fundamentally different from the way humans see the world. It’s often nearly impossible to understand the relationship between the data input into the system and the interpretation of the data that the system outputs. And that difference—the “black box” opacity of deep learning—poses a potential problem for robots like RoMan and for the Army Research Lab.
This opacity means that robots that rely on deep learning have to be used carefully. A deep-learning system is good at recognizing patterns, but lacks the world understanding that a human typically uses to make decisions, which is why such systems do best when their applications are well defined and narrow in scope. “When you have well-structured inputs and outputs, and you can encapsulate your problem in that kind of relationship, I think deep learning does very well,” says Tom Howard, who directs the University of Rochester’s Robotics and Artificial Intelligence Laboratory and has developed natural-language interaction algorithms for RoMan and other ground robots. “The question when programming an intelligent robot is, at what practical size do those deep-learning building blocks exist?” Howard explains that when you apply deep learning to higher-level problems, the number of possible inputs becomes very large, and solving problems at that scale can be challenging. And the potential consequences of unexpected or unexplainable behavior are much more significant when that behavior is manifested through a 170-kilogram two-armed military robot.
After a couple of minutes, RoMan hasn’t moved—it’s still sitting there, pondering the tree branch, arms poised like a praying mantis. For the last 10 years, the Army Research Lab’s Robotics Collaborative Technology Alliance (RCTA) has been working with roboticists from Carnegie Mellon University, Florida State University, General Dynamics Land Systems, JPL, MIT, QinetiQ North America, University of Central Florida, the University of Pennsylvania, and other top research institutions to develop robot autonomy for use in future ground-combat vehicles. RoMan is one part of that process.
The “go clear a path” task that RoMan is slowly thinking through is difficult for a robot because the task is so abstract. RoMan needs to identify objects that might be blocking the path, reason about the physical properties of those objects, figure out how to grasp them and what kind of manipulation technique might be best to apply (like pushing, pulling, or lifting), and then make it happen. That’s a lot of steps and a lot of unknowns for a robot with a limited understanding of the world.
This limited understanding is where the ARL robots begin to differ from other robots that rely on deep learning, says Ethan Stump, chief scientist of the AI for Maneuver and Mobility program at ARL. “The Army can be called upon to operate basically anywhere in the world. We do not have a mechanism for collecting data in all the different domains in which we might be operating. We may be deployed to some unknown forest on the other side of the world, but we’ll be expected to perform just as well as we would in our own backyard,” he says. Most deep-learning systems function reliably only within the domains and environments in which they’ve been trained. Even if the domain is something like “every drivable road in San Francisco,” the robot will do fine, because that’s a data set that has already been collected. But, Stump says, that’s not an option for the military. If an Army deep-learning system doesn’t perform well, they can’t simply solve the problem by collecting more data.
ARL’s robots also need to have a broad awareness of what they’re doing. “In a standard operations order for a mission, you have goals, constraints, a paragraph on the commander’s intent—basically a narrative of the purpose of the mission—which provides contextual info that humans can interpret and gives them the structure for when they need to make decisions and when they need to improvise,” Stump explains. In other words, RoMan may need to clear a path quickly, or it may need to clear a path quietly, depending on the mission’s broader objectives. That’s a big ask for even the most advanced robot. “I can’t think of a deep-learning approach that can deal with this kind of information,” Stump says.
While I watch, RoMan is reset for a second try at branch removal. ARL’s approach to autonomy is modular, where deep learning is combined with other techniques, and the robot is helping ARL figure out which tasks are appropriate for which techniques. At the moment, RoMan is testing two different ways of identifying objects from 3D sensor data: UPenn’s approach is deep-learning-based, while Carnegie Mellon is using a method called perception through search, which relies on a more traditional database of 3D models. Perception through search works only if you know exactly which objects you’re looking for in advance, but training is much faster since you need only a single model per object. It can also be more accurate when perception of the object is difficult—if the object is partially hidden or upside-down, for example. ARL is testing these strategies to determine which is the most versatile and effective, letting them run simultaneously and compete against each other.
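To make the idea of matching sensor data against a database of known models concrete, here is a toy sketch in Python. It is not Carnegie Mellon’s method: the object names, point sets, and scoring function are all invented for illustration, and the sketch assumes the observed points are already aligned with the stored models, whereas a real system would presumably also have to search over possible object poses.

```python
import math

# Hypothetical database: one small 3D point model per known object.
DATABASE = {
    "branch": [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)],
    "barrel": [(0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 0, 1)],
}


def fit_error(observed, model):
    """Mean distance from each observed point to its nearest model point.

    Only the observed points are scored, so a partial view of an object
    can still fit its model well.
    """
    total = sum(min(math.dist(p, q) for q in model) for p in observed)
    return total / len(observed)


def identify(observed, database):
    """Return the name of the stored model that best explains the observation."""
    return min(database, key=lambda name: fit_error(observed, database[name]))


# A noisy, partial view: only two points from the middle of the branch.
partial_view = [(1.1, 0.0, 0.0), (2.0, 0.1, 0.0)]
print(identify(partial_view, DATABASE))  # prints "branch"
```

The sketch needs only one stored model per object and no training data, but it can never report an object that is missing from the database, which is the trade-off described above.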
Perception is one of the things that deep learning tends to excel at. “The computer vision community has made crazy progress using deep learning for this stuff,” says Maggie Wigness, a computer scientist at ARL. “We’ve had good success with some of these models that were trained in one environment generalizing to a new environment, and we intend to keep using deep learning for these sorts of tasks, because it’s the state of the art.”
ARL’s modular approach might combine several techniques in ways that leverage their particular strengths. For example, a perception system that uses deep-learning-based vision to classify terrain could work alongside an autonomous driving system based on an approach called inverse reinforcement learning, where the model can rapidly be created or refined by observations from human soldiers. Traditional reinforcement learning optimizes a solution based on established reward functions, and is often applied when you’re not necessarily sure what optimal behavior looks like. This is less of a concern for the Army, which can generally assume that well-trained humans will be nearby to show a robot the right way to do things. “When we deploy these robots, things can change very quickly,” Wigness says. “So we wanted a technique where we could have a soldier intervene, and with just a few examples from a user in the field, we can update the system if we need a new behavior.” A deep-learning technique would require “a lot more data and time,” she says.
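The core idea of learning from a handful of demonstrations can be sketched in a few lines of Python. This is a deliberately simplified illustration, not ARL’s system: the terrain features, candidate paths, starting weights, and update rule (a perceptron-style step toward the demonstrated path’s features) are all assumptions chosen to keep the example small.

```python
# Each candidate path is described by hypothetical terrain features:
# [fraction on road, fraction on grass, fraction in mud].
PATHS = {
    "road": [1.0, 0.0, 0.0],
    "grass": [0.0, 1.0, 0.0],
    "mud": [0.0, 0.0, 1.0],
}


def score(weights, features):
    """Linear reward: how desirable a path looks under the current weights."""
    return sum(w * f for w, f in zip(weights, features))


def best_path(weights, paths):
    """The path the robot would choose on its own."""
    return max(paths, key=lambda name: score(weights, paths[name]))


def update_from_demo(weights, paths, demo, lr=0.5, max_iters=100):
    """Nudge the reward weights until the demonstrated path is preferred."""
    w = list(weights)
    for _ in range(max_iters):
        chosen = best_path(w, paths)
        if chosen == demo:
            break
        # Move toward the demonstrated features, away from the robot's choice.
        w = [wi + lr * (fd - fc)
             for wi, fd, fc in zip(w, paths[demo], paths[chosen])]
    return w


weights = [1.0, 0.0, -1.0]         # assumed starting point: prefer roads, avoid mud
print(best_path(weights, PATHS))   # prints "road"

# A soldier demonstrates driving on the grass, say to stay off an exposed road.
weights = update_from_demo(weights, PATHS, demo="grass")
print(best_path(weights, PATHS))   # prints "grass"
```

A single demonstration is enough to change the robot’s preference here because the reward has only three parameters. That small, structured model is what makes adapting from a few examples plausible, in contrast to retraining a deep network.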
It’s not just data-sparse problems and fast adaptation that deep learning struggles with. There are also questions of robustness, explainability, and safety. “These questions aren’t unique to the military,” says Stump, “but it’s especially important when we’re talking about systems that may incorporate lethality.” To be clear, ARL is not currently working on lethal autonomous weapons systems, but the lab is helping to lay the groundwork for autonomous systems in the U.S. military more broadly, which means considering ways in which such systems may be used in the future.
Safety is an obvious priority, and yet there isn’t a clear way of making a deep-learning system verifiably safe, according to Stump. “Doing deep learning with safety constraints is a major research effort. It’s hard to add those constraints into the system, because you don’t know where the constraints already in the system came from. So when the mission changes, or the context changes, it’s hard to deal with that. It’s not even a data question; it’s an architecture question.” ARL’s modular architecture, whether it’s a perception module that uses deep learning or an autonomous driving module that uses inverse reinforcement learning or something else, can form parts of a broader autonomous system that incorporates the kinds of safety and adaptability that the military requires. Other modules in the system can operate at a higher level, using different techniques that are more verifiable or explainable and that can step in to protect the overall system from adverse unpredictable behaviors. “If other information comes in and changes what we need to do, there’s a hierarchy there,” Stump says. “It all happens in a rational way.”
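One way to picture that hierarchy is a learned module whose output always passes through a simple, inspectable rule layer before reaching the motors. The Python sketch below is only an illustration under assumed names and numbers: `learned_driver` stands in for any learned module, and the stopping distance and speed limits are invented.

```python
def learned_driver(terrain_class: str) -> float:
    """Stand-in for a learned module: proposes a speed in meters per second."""
    proposals = {"road": 3.0, "grass": 1.5}
    return proposals.get(terrain_class, 0.5)


def safety_monitor(proposed_speed: float,
                   obstacle_distance_m: float,
                   quiet_mode: bool) -> float:
    """Rule-based layer that can override whatever the learned module proposes."""
    if obstacle_distance_m < 1.0:
        return 0.0                        # hard stop near obstacles
    limit = 1.0 if quiet_mode else 3.0    # mission context changes the limit
    return min(proposed_speed, limit)


proposal = learned_driver("road")
print(safety_monitor(proposal, obstacle_distance_m=5.0, quiet_mode=True))   # prints 1.0
print(safety_monitor(proposal, obstacle_distance_m=0.5, quiet_mode=False))  # prints 0.0
```

Because the override rules live outside the learned module, a change in mission context means editing a few explicit lines, with no retraining required.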
Nicholas Roy, who leads the Robust Robotics Group at MIT and describes himself as “somewhat of a rabble-rouser” due to his skepticism of some of the claims made about the power of deep learning, agrees with the ARL roboticists that deep-learning approaches often can’t handle the kinds of challenges that the Army has to be prepared for. “The Army is always entering new environments, and the adversary is always going to be trying to change the environment so that the training process the robots went through simply won’t match what they’re seeing,” Roy says. “So the requirements of a deep network are to a large extent misaligned with the requirements of an Army mission, and that’s a problem.”
Roy, who has worked on abstract reasoning for ground robots as part of the RCTA, emphasizes that deep learning is a useful technology when applied to problems with clear functional relationships, but when you start looking at abstract concepts, it’s not clear whether deep learning is a viable approach. “I’m very interested in finding how neural networks and deep learning could be assembled in a way that supports higher-level reasoning,” Roy says. “I think it comes down to the notion of combining multiple low-level neural networks to express higher level concepts, and I do not believe that we understand how to do that yet.” Roy gives the example of using two separate neural networks, one to detect objects that are cars and the other to detect objects that are red. It’s harder to combine those two networks into one larger network that detects red cars than it would be if you were using a symbolic reasoning system based on structured rules with logical relationships. “Lots of people are working on this, but I haven’t seen a real success that drives abstract reasoning of this kind.”
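The symbolic half of Roy’s example is easy to write down, which is the point of his comparison. In the Python sketch below, the two detectors are represented only by hard-coded confidence scores (hypothetical values, not output from real networks), and “red car” is a one-line logical rule over them. Merging the two underlying networks into a single network that detects red cars has no equally simple recipe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    object_id: int
    car_score: float  # confidence from a hypothetical "is it a car?" network
    red_score: float  # confidence from a hypothetical "is it red?" network


def red_cars(detections, threshold=0.5):
    """Symbolic rule: an object is a red car if it is a car AND it is red."""
    return [d.object_id for d in detections
            if d.car_score >= threshold and d.red_score >= threshold]


scene = [
    Detection(object_id=1, car_score=0.9, red_score=0.8),  # red car
    Detection(object_id=2, car_score=0.9, red_score=0.1),  # car, not red
    Detection(object_id=3, car_score=0.2, red_score=0.9),  # red, not a car
]
print(red_cars(scene))  # prints [1]
```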
For the foreseeable future, ARL is making sure that its autonomous systems are safe and robust by keeping humans around for both higher-level reasoning and occasional low-level advice. Humans might not be directly in the loop at all times, but the idea is that humans and robots are more effective when working together as a team. When the most recent phase of the Robotics Collaborative Technology Alliance program began in 2009, Stump says, “we’d already had many years of being in Iraq and Afghanistan, where robots were often used as tools. We’ve been trying to figure out what we can do to transition robots from tools to acting more as teammates within the squad.”
RoMan gets a little bit of help when a human supervisor points out a region of the branch where grasping might be most effective. The robot doesn’t have any fundamental knowledge about what a tree branch actually is, and this lack of world knowledge (what we think of as common sense) is a fundamental problem with autonomous systems of all kinds. Having a human leverage our vast experience into a small amount of guidance can make RoMan’s job much easier. And indeed, this time RoMan manages to successfully grasp the branch and noisily haul it across the room.
Turning a robot into a good teammate can be difficult, because it can be tricky to find the right amount of autonomy. Too little and it would take most or all of the focus of one human to manage one robot, which may be appropriate in special situations like explosive-ordnance disposal but is otherwise not efficient. Too much autonomy and you’d start to have issues with trust, safety, and explainability.
“I think the level that we’re looking for here is for robots to operate on the level of working dogs,” explains Stump. “They understand exactly what we need them to do in limited circumstances, they have a small amount of flexibility and creativity if they are faced with novel circumstances, but we don’t expect them to do creative problem-solving. And if they need help, they fall back on us.”
RoMan is not likely to find itself out in the field on a mission anytime soon, even as part of a team with humans. It’s very much a research platform. But the software being developed for RoMan and other robots at ARL, called Adaptive Planner Parameter Learning (APPL), will likely be used first in autonomous driving, and later in more complex robotic systems that could include mobile manipulators like RoMan. APPL combines different machine-learning techniques (including inverse reinforcement learning and deep learning) arranged hierarchically underneath classical autonomous navigation systems. That allows high-level goals and constraints to be applied on top of lower-level programming. Humans can use teleoperated demonstrations, corrective interventions, and evaluative feedback to help robots adjust to new environments, while the robots can use unsupervised reinforcement learning to adjust their behavior parameters on the fly. The result is an autonomy system that can enjoy many of the benefits of machine learning, while also providing the kind of safety and explainability that the Army needs. With APPL, a learning-based system like RoMan can operate in predictable ways even under uncertainty, falling back on human tuning or human demonstration if it ends up in an environment that’s too different from what it trained on.
It’s tempting to look at the rapid progress of commercial and industrial autonomous systems (autonomous cars being just one example) and wonder why the Army seems to be somewhat behind the state of the art. But as Stump finds himself having to explain to Army generals, when it comes to autonomous systems, “there are lots of hard problems, but industry’s hard problems are different from the Army’s hard problems.” The Army doesn’t have the luxury of operating its robots in structured environments with lots of data, which is why ARL has put so much effort into APPL, and into maintaining a place for humans. Going forward, humans are likely to remain a key part of the autonomous framework that ARL is developing. “That’s what we’re trying to build with our robotics systems,” Stump says. “That’s our bumper sticker: ‘From tools to teammates.’ ”
This article appears in the October 2021 print issue as “Deep Learning Goes to Boot Camp.”