Donut algorithm allows robots to learn from our mistakes
Instead of being dismissed as useless mistakes, failed demonstrations can provide valuable insight into better learning, claim scientists from EPFL’s Learning Algorithms and Systems Laboratory (LASA). Their unusual point of view has led to the development of novel algorithms that enable machines to learn more rapidly and outperform humans by starting from failed or inaccurate demonstrations.
“We inversed the principle, generally accepted in robotics, of acquisition by imitation, and considered cases in which humans are inaccurate in certain tasks”, said professor Aude Billard, head of LASA.
The robot’s instructor demonstrates failed attempts at the desired action, and the robot iterates over possible solutions until it finds one that works. The research uses ideas from Reinforcement Learning (RL) to deal with noisy demonstrations and to improve the robot’s performance beyond that of the human. However, unlike traditional RL, where the human is assumed to have successfully completed the desired task, this method assumes that the human has failed and uses the failed demonstrations as a negative constraint on exploration.
Postdoctoral researcher Dan Grollman based his work on what he calls the “Donut as I do” theory. He developed an algorithm that tells the robot not to reproduce a demonstrator’s inaccurate gesture, but rather to search for alternative solutions. Hence the play on words with “do not”: the donut’s hole is the incorrect gesture, which must be excluded, and the surrounding dough represents the field of potential solutions to explore.
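The donut picture can be made concrete with a small sketch. The Python below is an illustration only, not code from the paper: it assumes a one-dimensional action, and it builds a donut-shaped score as the difference of two Gaussians centred on the failed demonstration. The names `donut_score` and `best_candidate`, and the parameters `wide_std`, `narrow_std` and `gamma`, are hypothetical choices made for this example.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def donut_score(x, demo, wide_std=1.0, narrow_std=0.5, gamma=2.0):
    """Illustrative donut-shaped score around a failed demonstration.

    A wide Gaussian (the dough) minus a narrow one (the hole), both
    centred on the failed demonstration `demo`. With these default
    values (an assumption of this sketch) the score is zero exactly at
    the demonstration and positive everywhere else.
    """
    dough = gamma * gaussian_pdf(x, demo, wide_std)
    hole = (gamma - 1.0) * gaussian_pdf(x, demo, narrow_std)
    return dough - hole


def best_candidate(candidates, demo):
    """Pick the candidate action with the highest donut score."""
    return max(candidates, key=lambda x: donut_score(x, demo))
```

With a failed demonstration at 0.0, a candidate that simply repeats the demonstration scores zero, candidates far away score close to zero, and `best_candidate` prefers something nearby but different, which is the ring of dough around the hole.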
“This approach allows the robot to go further, to learn more quickly and above all, outperform the human”, said Grollman, who was recently awarded a “Best Paper Award” for an article on the subject presented at the International Conference on Robotics and Automation (ICRA), in Shanghai.
“We were inspired by the way in which humans learn”, said Billard. “Children often progress by making mistakes or by observing others’ mistakes and assimilating the fact that they must not reproduce them.”
The researchers are looking for ways to improve Donut’s performance. One approach is to use sampling in an attempt to find the global maximum instead of a local one. However, each additional sample would require its own gradient ascent, which is computationally costly and could lead to potentially unsafe velocities and torques.
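To see why sampling multiplies the cost, consider a minimal sketch of multi-start gradient ascent. This is not the researchers’ implementation: the two-peak `objective`, the numerical `gradient`, and the step size and iteration count are all assumptions chosen for illustration.

```python
import math


def objective(x):
    # Hypothetical objective with a small peak near x = -1
    # and a taller peak near x = 2.
    return 0.5 * math.exp(-(x + 1.0) ** 2) + 1.0 * math.exp(-(x - 2.0) ** 2)


def gradient(x, eps=1e-5):
    # Central finite-difference estimate of the derivative.
    return (objective(x + eps) - objective(x - eps)) / (2.0 * eps)


def gradient_ascent(x, step=0.1, iters=500):
    # Climb uphill from a single starting point; this finds
    # whichever peak is nearest, not necessarily the tallest.
    for _ in range(iters):
        x += step * gradient(x)
    return x


def multi_start_ascent(starts):
    # One full ascent per sampled start, so the cost grows
    # linearly with the number of samples.
    peaks = [gradient_ascent(s) for s in starts]
    return max(peaks, key=objective)
```

A single ascent started at -1.5 settles on the smaller peak near -1, while `multi_start_ascent([-1.5, 0.5, 3.0])` reaches the taller peak near 2 at three times the cost. On a real robot each of those ascents would also have to respect limits on velocity and torque, which this sketch ignores.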
For more information, see the paper “Donut as I do: Learning from failed demonstrations” (PDF).