Benjie’s Humanoid Olympic Games – by Benjie Holson


I was kinda disappointed by the World Humanoid Robot Games. As fun as real-life rock-'em-sock-'em robots are, what people really care about is robots doing their chores. This is why robot laundry-folding videos are so popular. Those laundry videos are super impressive: we didn’t know how to do that even a few years ago. And it is certainly something that people want! But as this article so nicely articulates, basic laundry folding is in a sweet spot given the techniques we have now. It might feel like if our AI techniques can fold laundry maybe they can do anything, but that isn’t true, and we’re going to have to invent new techniques to be really general-purpose and useful.

With that in mind I am issuing a challenge to roboticists: here are my Humanoid Olympics. Each event will require us to push the state of the art and unlock new capabilities. I will update this post as folks achieve these milestones, and will mail actual real-life medals to the winners.

In order to talk about why each of these challenges pushes the state of the art, let’s talk about what’s working now. What I’m seeing working is learning-from-demonstration. Folks get some robots and some puppeteering interfaces (standard seems to be two copies of the robot where you grab & move one of them and the other matches, or an Oculus headset + controllers or hand tracking) and record some 10-30 second activity over and over again (100s of times). We can then train a neural network to mimic those examples. This has unlocked tasks that have steps that are somewhat chaotic (like pulling a corner of a towel to see if it lies flat) or have a high state space (like how a wooden block is on one of 6 sides but a towel can be bunched up in myriad different ways). But thinking about it, it should be clear what some of the limitations are. Each of these has exceptions, but they form a general trend.

  1. No force feedback at the wrists. The robot can only ever perform as well as the human teleoperator, and we don’t yet have good, standard ways of getting force information back to that teleoperator.

  2. Limited finger control. It’s hard for the teleoperator (and AI foundation model) to see and control all the robot fingers with more finesse than just open/close.

  3. No sense of touch. Human hands are packed absolutely full of sensors. Getting anywhere near that kind of sensing out of robot hands and usable by a human puppeteer is not currently possible.

  4. Medium precision. Guessing based on videos, I think we’ve got about 1-3 cm of precision on these tasks.

Folding towels and t-shirts doesn’t depend on high wrist forces. You can get away with just hand open/close by using pinch grasps to pull and lift and open hands to spread. You can see how your grasp landed, so you don’t need finger sensing. And 1-3 cm precision is just fine.
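To make that learning-from-demonstration recipe concrete, here’s a rough behavior-cloning sketch: record observation/action pairs from teleoperation, then train a network to mimic them. The dimensions, network size, and hyperparameters below are made-up placeholders for illustration, not anyone’s actual pipeline.

```python
# Minimal behavior-cloning sketch (illustrative assumptions throughout).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Suppose each demo step pairs an observation (e.g. flattened camera features +
# joint positions) with the teleoperator's commanded action (target joint deltas).
obs_dim, act_dim = 512, 14                    # assumed dimensions for a two-arm robot
observations = torch.randn(10_000, obs_dim)   # stand-in for 100s of recorded demos
actions = torch.randn(10_000, act_dim)

policy = nn.Sequential(
    nn.Linear(obs_dim, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, act_dim),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
loader = DataLoader(TensorDataset(observations, actions), batch_size=64, shuffle=True)

for epoch in range(10):
    for obs, act in loader:
        pred = policy(obs)
        loss = nn.functional.mse_loss(pred, act)   # mimic the demonstrator's actions
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# At run time the robot just replays the learned mapping at control rate:
# action = policy(current_observation)
```

The point of the sketch is that the policy can only ever reproduce what the teleoperator could demonstrate, which is exactly why the limitations above (no force feedback, coarse fingers, no touch, medium precision) carry straight through to the trained robot.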

So what comes next? On to the events!

Doors are tricky because of the asymmetric forces: you need to grasp and twist quite hard, but if you pull hard outside of the arc of the door you tend to slip your grasp. Also, they require whole-body manipulation, which is more than I’ve seen from anyone yet.

I think this is very close to state of the art (or maybe has happened and I didn’t see it). I expect this one to be claimed by December.

We have a winner: IHMC Robotics wins Bronze with a time of 18 seconds.

Adding self-closing makes this significantly more challenging, though the lever handle is arguably easier (I just don’t see many round-knob self-closing doors).

The bossfight of doors. You need to either use a second limb to block the door from re-closing or go fast enough to use dynamics.

We’re just getting started on laundry.

This is probably doable using the techniques we have now, but it is a longer-horizon task and might require some tricky two-handed actions to pull the shirt through to right-side out.

I think both the hand-insertion and the action of pinching the inside of the sock are interesting new challenges.

The size-medium shirt starts unbuttoned with one sleeve inside out. It must end up on the hanger correctly, with the sleeve fixed and at least one button buttoned. I think this one is 3-10 years out, both because buttons are really hard and because getting a strong, dexterous hand small enough to fit in a sleeve is going to be hard.

Humans are creatures of technology and, as useful as our hands are, we mostly use them to hold and manipulate tools. This challenge is about building the strength and dexterity to use tools.

The Windex (ammonia-based window-cleaning fluid) bottle is super forgiving in terms of how you grasp it, but you do need to independently articulate a finger (and that finger has to be pretty strong to get the fluid to spray out).

The challenge here is to pick up a knife and then adjust the grasp to be strong and stable enough to scoop and spread the peanut butter. Humans use a ‘strong tool grasp’ for all kinds of activities but it is very challenging for robot grippers.

A keyring with at least 2 keys and a keychain is dropped into the robot’s waiting palm/gripper. Without putting the keys down, get the correct key aligned and inserted and turned in a lock. This requires very challenging in-hand manipulation, high precision and interesting forceful interaction.

We humans do all kinds of in-hand manipulation, using the structure of our hands to reposition the things we are holding.

Requires dexterity and some precision but not very much force.

When I use a dog bag, I have to do a slide-between-the-fingertips action to separate the opening of the bag, which is a tricky forceful interaction as well as a motion that I’m not even sure most robot hands are capable of. Also tricky is tearing off a single bag, rather than pulling a big long spool out of the holder, if you choose to use one.

Done without external tools. This is super tricky: high-force yet high-precision fingertip actions.

If you sit down and write out what you might want a robot to do for you, a lot of the tasks end up being kind of wet. Robots usually don’t like being wet, but we’ll have to change that if we want them to clean for us.

Mildly damp, but with the exciting risk of getting the whole hand in the water if you aren’t careful. Probably requires at least splash-resistant hands (or a whole bunch of spares).

This one naturally follows after the sandwich one. Water everywhere. Seems like an important skill to have after a few hours collecting training data on the dog-poop task.

Water, soap, grease, and an unpleasant task no one wants to do.

Complain, comment, and discuss on Hacker News.

And you should subscribe because I will post as folks achieve these challenges!

To be eligible to win, it must be a 1x-speed video with no cuts, featuring a general-purpose mobile(?) manipulator robot running autonomously. (Wheels and centaur robots: totally fair game; industrial-automation orange peelers don’t count.)

You are allowed 10x the time I took (e.g. a 4-second task can take 40 seconds). I reserve the right to be arbitrary in deciding whether something isn’t in the spirit of the challenge. First robot wins the prize.

To claim your medallion email bmholson+olympics@gmail.com with an address for me to ship it to. If you give me a photo of your robot wearing a medal I will be tickled pink.

I will also accept future challengers that are at least 25% faster than the current winner. Good luck and may the odds be ever in your favor.

I’m conflicted about whether I should allow “arms bolted to a table” entries or require that all entrants be mobile manipulators (obviously the door entrants have to be). I’ll let y’all decide: the challenge will be locked in based on this poll.

Thanks to Jeff Bingham for advice, fact checking and cool robot videos. Thanks to my patient wife for spending an hour filming me doing silly things in a silly costume.


