
ASIMO can recognize three speakers simultaneously

By Damir Beciri
8 September 2009

In one of our previous articles we started to cover the features of ASIMO, one of the most advanced humanoid robots of today. Besides footstep planning, ASIMO can now understand three humans speaking simultaneously (to some degree). Hiroshi Okuno at Kyoto University and Kazuhiro Nakadai at the Honda Research Institute in Saitama, both in Japan, designed the software, which they call HARK. Okuno and Nakadai claim it can follow three speakers simultaneously with 70 to 80 percent accuracy when installed on Honda’s ASIMO robot.

HARK uses an array of eight microphones to work out where each voice is coming from and to isolate it from other sound sources. Before passing an extracted voice on to speech-recognition software for decoding, HARK first estimates how reliably that voice has been separated. This quality-control step is important because leakage from the other voices would otherwise confuse the speech recognizer: any parts of the signal that contain a lot of background noise across a range of frequencies are automatically ignored when the patched-up recording of each voice is passed on to the speech-recognition system.
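The article does not describe HARK’s internals, but the quality-control step can be illustrated with a minimal sketch of reliability-based masking, assuming magnitude spectrograms coming out of a separation stage; the function name, the SNR threshold and the per-frame reliability score below are illustrative assumptions, not HARK’s actual interface:

import numpy as np

def mask_unreliable_bins(voice_spec, residual_spec, snr_threshold_db=6.0):
    """Keep only time-frequency bins where the separated voice clearly
    dominates the residual background; all other bins are zeroed out.

    voice_spec, residual_spec: magnitude spectrograms (freq_bins x frames)
    snr_threshold_db: hypothetical reliability threshold (not from HARK)
    """
    eps = 1e-12
    snr_db = 20.0 * np.log10((voice_spec + eps) / (residual_spec + eps))
    reliable = snr_db >= snr_threshold_db          # per-bin reliability mask
    masked_spec = np.where(reliable, voice_spec, 0.0)
    frame_reliability = reliable.mean(axis=0)      # fraction of reliable bins per frame
    return masked_spec, frame_reliability

# Frames whose reliability score falls below some cutoff would simply not be
# handed to the speech recognizer, mirroring the quality-control step above.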

The HARK system actually goes beyond normal human listening capabilities, because it can attend to several sound sources at once rather than focusing on a single one. While focusing on a single voice among many is known as the “cocktail party effect”, Okuno calls the ability to focus on multiple voices at once the “Prince Shotoku effect”. According to Japanese legend, Prince Shotoku listened to 10 people’s petitions at the same time.

In their demonstration, the modified ASIMO judges a spoken variation of the rock-paper-scissors game in which three people call out their choices at once. It might seem simple, but the number of voices and the complexity of the sentences the software can deal with should grow in the future. The array of eight microphones is placed around ASIMO’s face and body, helping the software accurately detect and isolate simultaneous voices. “The number of sound sources and their directions are not given to the system in advance,” says Nakadai.
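The article likewise does not say how HARK estimates those directions, but a common building block for microphone-array localization is generalized cross-correlation with phase transform (GCC-PHAT), sketched below for a single microphone pair; the function name and parameters are assumptions for illustration, not HARK’s actual method:

import numpy as np

def gcc_phat_delay(mic_a, mic_b, sample_rate, max_delay_s=None):
    """Estimate the time difference of arrival between two microphone signals
    using GCC-PHAT; combining such delays across pairs of an eight-microphone
    array is one standard way to infer where a talker is."""
    n = len(mic_a) + len(mic_b)
    cross_spec = np.fft.rfft(mic_a, n=n) * np.conj(np.fft.rfft(mic_b, n=n))
    cross_spec /= np.abs(cross_spec) + 1e-12       # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross_spec, n=n)
    max_shift = n // 2 if max_delay_s is None else min(int(sample_rate * max_delay_s), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / sample_rate  # delay in seconds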

Rock-paper-scissors uses a small vocabulary, which makes the task easier. When Okuno and Nakadai tried using their software to follow several complicated sentences at once, as three people shouted out restaurant orders, it could identify only 30 to 40 percent of what was said, so there is still some way to go before the system can match the performance of human listeners in ‘cocktail party’ situations.
