Programming through demonstration – robot flipping pancakes
If you haven’t learned to flip pancakes yet, here is a robot that will put you to shame. Acquiring new motor skills involves various forms of learning, and the efficiency of the process lies in the interplay between imitation and self-improvement strategies. A team of researchers from the Italian Institute of Technology is developing algorithms that enable robots to acquire new skills after being shown how to perform them.
Dr. Sylvain Calinon, Dr. Petar Kormushev and Professor Darwin G. Caldwell, at the Italian Institute of Technology in Genova, Italy, are tackling programming by demonstration through kinesthetic teaching, in which a human physically guides the robot arm through the motion. The motion is encoded as a mixture of basis force fields, using an extension of Dynamic Movement Primitives (DMP) that captures the synergies across the different variables through stiffness matrices. An inverse dynamics controller with variable stiffness is then used to reproduce the motion.
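The paper's full formulation couples the degrees of freedom through stiffness matrices; as a rough illustration of the underlying DMP machinery, here is a minimal one-dimensional sketch. The constants, the basis-width heuristic, and the function names are assumptions made for illustration, not taken from the paper or its MATLAB code:

```python
import numpy as np

alpha, beta, alpha_x = 25.0, 6.25, 3.0   # spring, damper, and phase constants (assumed values)

def learn_dmp(y_demo, dt, n_basis=10):
    """Fit the forcing-term weights of a 1-D discrete DMP to one demonstration."""
    T = len(y_demo)
    yd = np.gradient(y_demo, dt)                  # demonstrated velocity
    ydd = np.gradient(yd, dt)                     # demonstrated acceleration
    y0, g = y_demo[0], y_demo[-1]
    x = np.exp(-alpha_x * dt * np.arange(T))      # canonical phase, decays 1 -> 0
    # forcing term the demonstration implies, given the spring-damper model
    f_target = ydd - alpha * (beta * (g - y_demo) - yd)
    centers = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))
    widths = n_basis ** 1.5 / centers             # common width heuristic (assumption)
    psi = np.exp(-widths * (x[:, None] - centers) ** 2)   # Gaussian basis activations
    # one weighted least-squares fit per basis function
    w = np.array([(psi[:, i] * x) @ f_target / ((psi[:, i] * x) @ x + 1e-10)
                  for i in range(n_basis)])
    return w, y0, g, centers, widths

def rollout(w, y0, g, centers, widths, dt, T):
    """Integrate the learned DMP to reproduce the motion."""
    y, yd, x = y0, 0.0, 1.0
    traj = np.empty(T)
    for t in range(T):
        psi = np.exp(-widths * (x - centers) ** 2)
        f = (psi @ w) * x / (psi.sum() + 1e-10)   # forcing term vanishes as x -> 0
        ydd = alpha * (beta * (g - y) - yd) + f   # spring-damper plus learned forcing
        yd += ydd * dt
        y += yd * dt
        x += -alpha_x * x * dt                    # Euler step of the canonical system
        traj[t] = y
    return traj
```

Because the forcing term is gated by the decaying phase variable, the reproduction always converges to the demonstrated goal even when the basis-function fit is imperfect.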
The skill is first demonstrated via kinesthetic teaching and then refined with the Policy learning by Weighting Exploration with the Returns (PoWER) algorithm. In contrast to policy-gradient approaches, PoWER treats the reward as a pseudo-probability, which allows the Reinforcement Learning (RL) problem to be solved with probabilistic estimation methods such as Expectation-Maximization (EM). The following video shows a Barrett WAM 7-DOF manipulator learning to flip pancakes through RL.
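The core of the PoWER update is easy to show on a toy problem: the reward is kept in (0, 1] so it can act as a pseudo-probability, and the EM-style maximization step moves the policy parameters to a return-weighted average of the explored perturbations. The quadratic toy reward, the hidden optimum, and all constants below are illustrative assumptions; the real reward in the experiment scored how well the pancake was flipped and caught:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_star = np.array([0.5, -1.2, 2.0])   # hidden optimum of the toy task (assumption)

def ret(theta):
    """Toy return in (0, 1], usable as a pseudo-probability."""
    return np.exp(-np.sum((theta - theta_star) ** 2))

theta = np.zeros(3)    # initial policy parameters, e.g. from imitation
sigma = 0.5            # exploration magnitude
for _ in range(100):
    eps = rng.normal(0.0, sigma, size=(20, theta.size))   # exploration noise per rollout
    R = np.array([ret(theta + e) for e in eps])           # return of each rollout
    elite = np.argsort(R)[-10:]                           # importance sampling: keep best rollouts
    # EM-style maximization step: return-weighted average of the noise
    theta = theta + (R[elite] @ eps[elite]) / (R[elite].sum() + 1e-12)
    sigma *= 0.97                                         # slowly anneal exploration
```

Because every update is a weighted average of parameters that were actually tried, the step needs no learning rate and stays inside the explored region, which is what makes this family of methods practical on physical robots.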
After 50 trials, the robot learns that the first part of the task requires a stiff behavior to toss the pancake into the air, while the second part requires the hand to be compliant in order to catch the pancake without having it bounce out of the pan.
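Such a stiff-then-compliant behavior can be pictured as a time-varying stiffness gain in an impedance-style tracking law. The phase split at the release time, the gain values, and the sigmoid blend below are illustrative assumptions, not the controller from the paper:

```python
import numpy as np

def stiffness_profile(t, t_release=0.4, k_stiff=500.0, k_soft=50.0, width=0.05):
    """Blend smoothly from stiff (toss phase) to compliant (catch phase) at t_release."""
    blend = 1.0 / (1.0 + np.exp((t - t_release) / width))   # sigmoid: 1 before release, 0 after
    return k_soft + (k_stiff - k_soft) * blend

def impedance_torque(k, y_ref, y, yd_ref, yd, damping_ratio=1.0):
    """Tracking command with stiffness k and a matched (critically damped) damping gain."""
    d = 2.0 * damping_ratio * np.sqrt(k)
    return k * (y_ref - y) + d * (yd_ref - yd)
```

With a profile like this, tracking errors during the catch produce only small restoring torques, so an impact with the pancake deflects the pan instead of bouncing the pancake away.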
In the experiments presented here, imitation learning is used as an initialization phase, after which RL explores for better solutions. The two processes could, however, be interleaved: depending on their availability, the user could occasionally take part in evaluating the new policies explored by the robot, for example by manually giving reward or punishment signals to the RL module, or by providing new demonstrations when the robot’s improvement is too slow. The researchers plan to consider such interaction in future work.
For more information (and the MATLAB source code of the algorithm), visit the publication page of the paper “Robot Motor Skill Coordination with EM-based Reinforcement Learning”.