With Boston Dynamics’ recent(ish) emphasis on making robots that can do things that are commercially useful, it’s always good to be gently reminded that the company is still at the cutting edge of dynamic humanoid robotics. Or in this case, forcefully reminded. In its latest video, Boston Dynamics demonstrates some spectacular new capabilities with Atlas, focusing on perception and manipulation, and the Atlas team lead answers some of our questions about how they pulled it off.
One of the highlights here is Atlas’ ability to move and interact dynamically with objects, especially objects with significant mass. The 180 while holding the beam is impressive, since Atlas has to account for all that added momentum. Same with the spinning bag toss: as soon as the robot releases the bag in mid-air, its momentum changes abruptly, and it has to compensate for that on landing. And shoving that box over has to be done by leaning into it, but carefully, so that Atlas doesn’t topple off the platform after it.
While the physical capabilities that Atlas demonstrates here are impressive (to put it mildly), this demonstration also highlights just how much work remains before robots can be useful like this in an autonomous, or even a semi-autonomous, way. For example, environmental modification is something that humans do all the time, but we rely heavily on our knowledge of the world to do it effectively. Atlas almost certainly can’t yet see a non-traversable gap, reason about what kind of modification would render the gap traversable, locate the necessary resources (without being told where they are first), and then make the appropriate modification on its own the way a human would; the video shows advances in manipulation rather than decision making. This isn’t a criticism of what Boston Dynamics is showing in this video. It’s just to emphasize that there is still a lot of work to be done on the world-understanding and reasoning side before robots will be able to leverage these impressive physical skills on their own in a productive way.
There’s a lot more going on in this video, and Boston Dynamics has helpfully put together a bit of a behind-the-scenes explainer:
And for a bit more on this, we sent a couple of questions over to Boston Dynamics, and Atlas Team Lead Scott Kuindersma was kind enough to answer them for us.
How much does Atlas know in advance about the objects that it will be manipulating, and how important is this knowledge for real-world manipulation?
Scott Kuindersma: In this video, the robot has a high-level map that includes where we want it to go, what we want it to pick up, and what stunts it should do along the way. This map is not an exact geometric match for the real environment; it is an approximate description containing obstacle templates and annotated actions that is adapted online by the robot’s perception system. The robot has object-relative grasp targets that were computed offline, and the model-predictive controller (MPC) has access to approximate mass properties.
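To make the kinds of priors Kuindersma describes concrete, here is a minimal sketch of what such a high-level map might look like as a data structure. This is purely illustrative; all of the class and field names are our assumptions, not Boston Dynamics’ actual representation.

```python
from dataclasses import dataclass, field

# Illustrative sketch only: an approximate map with obstacle templates and
# annotated actions, plus per-object grasp targets (computed offline) and
# approximate mass properties for the controller. Perception would refine
# the rough poses online. All names and numbers are hypothetical.

@dataclass
class ObstacleTemplate:
    kind: str            # e.g. "gap", "platform"
    approx_pose: tuple   # rough (x, y, yaw); adapted online by perception
    approx_size: tuple   # rough (length, width, height)

@dataclass
class ObjectPrior:
    name: str
    mass_kg: float                # approximate mass for the controller
    grasp_targets: list = field(default_factory=list)  # object-relative poses

@dataclass
class AnnotatedAction:
    action: str          # e.g. "pick_up", "place", "spin_jump"
    target: str          # object or waypoint name

@dataclass
class HighLevelMap:
    waypoints: list
    obstacles: list
    objects: dict        # name -> ObjectPrior
    script: list         # ordered AnnotatedActions: go here, pick this up, etc.

# A tiny example map in the spirit of the video's course (values invented).
plank = ObjectPrior("plank", mass_kg=12.0,
                    grasp_targets=[(0.0, 0.3, 0.0), (0.0, -0.3, 0.0)])
course = HighLevelMap(
    waypoints=["start", "gap", "platform_top"],
    obstacles=[ObstacleTemplate("gap", (2.0, 0.0, 0.0), (1.0, 2.0, 0.0))],
    objects={"plank": plank},
    script=[AnnotatedAction("pick_up", "plank"),
            AnnotatedAction("place", "gap"),
            AnnotatedAction("spin_jump", "platform_top")],
)
```

The point of a structure like this is that it tells the robot roughly what to expect and what to do, while leaving exact geometry and grasp execution to the online perception and control stack.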
We think that real-world robots will similarly leverage priors about their tasks and environments, but what form these priors take and how much information they provide could vary a lot based on the application. The requirements for a video like this lead naturally to one set of choices—and maybe some of those requirements will align with some early commercial applications—but we’re also building capabilities that allow Atlas to operate at other points on this spectrum.
How often is what you want to do with Atlas constrained by its hardware capabilities? At this point, how much of a difference does improving hardware make, relative to improving software?
Kuindersma: Not frequently. When we occasionally spend time on something like the inverted 540, we are intentionally pushing boundaries and coming at it from a place of playful exploration. Aside from being really fun for us and (hopefully) inspiring to others, these activities nearly always bear enduring fruit and leave us with more capable software for approaching other problems.
The tight integration between our hardware and software groups—and our ability to design, iterate, and learn from each other—is one of the things that makes our team special. This occasionally leads to behavior-enabling hardware upgrades and, less often, major redesigns. But from a software perspective, we continuously feel like we’re just scratching the surface on what we can do with Atlas.
Can you elaborate on the troubleshooting process you used to make sure that Atlas could successfully execute that final trick?
Kuindersma: The controller works by using a model of the robot to predict and optimize its future states. The improvement made in this case was an extension to this model to include the geometric shape of the robot’s limbs and constraints to prevent them from intersecting. In other words, rather than specifically tuning this one behavior to avoid self-collisions, we added more model detail to the controller to allow it to better avoid infeasible configurations. This way, the benefits carry forward to all of Atlas’ behaviors.
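The core idea here, adding geometric constraints to the optimizer’s model rather than hand-tuning one motion, can be illustrated with a toy example. The sketch below is not Boston Dynamics’ controller; it just shows how a minimum-separation constraint between two “limbs” (modeled as spheres) keeps an optimizer out of self-colliding configurations while it tracks its targets.

```python
import numpy as np
from scipy.optimize import minimize

# Toy illustration of constraint-based self-collision avoidance: two limb
# endpoints in 2-D are pulled toward targets that would overlap, and an
# inequality constraint keeps their collision spheres (radius r) apart.
# The constraint lives in the model, so every behavior benefits from it.

r = 0.2                                    # collision-sphere radius per limb
targets = np.array([0.1, 0.0, -0.1, 0.0])  # goals that would cause a collision

def cost(x):
    # Track the targets as closely as the constraints allow.
    return np.sum((x - targets) ** 2)

def separation(x):
    # >= 0 exactly when the two limb spheres do not intersect.
    a, b = x[:2], x[2:]
    return np.linalg.norm(a - b) - 2 * r

x0 = np.array([1.0, 0.0, -1.0, 0.0])       # collision-free initial guess
res = minimize(cost, x0, constraints=[{"type": "ineq", "fun": separation}])

a, b = res.x[:2], res.x[2:]
print(np.linalg.norm(a - b))               # ~0.4: limbs stop 2*r apart
```

A real whole-body MPC does this over predicted future states with full limb geometry, but the principle is the same: infeasible configurations are excluded by construction, so the fix carries over to every behavior.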
Is the little hop at the end of the 540 part of the planned sequence, or is Atlas able to autonomously use motions like that to recover from dynamic behaviors that don’t end up exactly as expected? How important will this kind of capability be for real-world robots?
Kuindersma: The robot has the ability to autonomously take steps, lean, and/or wave its limbs around to recover balance, which we leverage on pretty much a daily basis in our experimental work. The hop after the inverted 540 was part of the behavior sequence in the sense that Atlas was told it should jump after landing, but where it jumped to and how it landed came from the controller (and generally varied between individual robots and runs).
Our experience with deploying Spot all over the world has reinforced how important it is for mobile robots to be able to adjust and recover if they get bumped, slip, fall, or encounter unexpected obstacles. We expect the same will be true for future robots doing work in the real world.
What else can you share with us about what went into making the video?
Kuindersma: A few fun facts:
The core new technologies around MPC and manipulation were developed throughout this year, but the time between our whiteboard sketch for the video and completing filming was 6 weeks.
The tool bag throw and the spin jump with the 2-by-12 plank are online generalizations of the same 180 jump behavior that was created two years ago as part of our mobility work. The only differences in the controller inputs are the object model and the desired object motion.
Although the robot has a good understanding of throwing mechanics, real-world performance was sensitive to the precise timing of the release and to whether the bag cloth happened to catch on a finger during release. These details weren’t well represented by our simulation tools, so we relied primarily on hardware experiments to refine the behavior until it worked every time.
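Kuindersma’s point that the throw and the plank spin are “online generalizations of the same 180 jump behavior” suggests a behavior that is parameterized rather than re-authored per object. Here is a hypothetical sketch of that idea; the class and field names, masses, and motion labels are all our invention for illustration.

```python
from dataclasses import dataclass

# Hypothetical sketch of behavior parameterization: one fixed spin-jump
# behavior, reused by swapping only the two controller inputs Kuindersma
# names, the object model and the desired object motion. All names and
# values below are illustrative, not Boston Dynamics' API.

@dataclass
class ObjectModel:
    name: str
    mass_kg: float   # approximate mass property for the controller

@dataclass
class SpinJump:
    turn_deg: float = 180.0

    def controller_inputs(self, obj, desired_obj_motion):
        # The behavior itself never changes; only these two inputs vary.
        return {"turn_deg": self.turn_deg,
                "object": obj,
                "object_motion": desired_obj_motion}

jump = SpinJump()
# Same behavior, two very different outcomes:
bag_throw = jump.controller_inputs(ObjectModel("tool_bag", 4.0),
                                   "release_mid_air")
plank_spin = jump.controller_inputs(ObjectModel("plank_2x12", 12.0),
                                    "hold_through_landing")
```

Structuring behaviors this way is what makes the “online generalization” cheap: a new object needs a model and a desired motion, not a new behavior.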