Recent accomplishments

A new capability, autonomous skill development, has been achieved on our SAIL robot in an imitation learning mode, in which a human teacher guides the robot's actions online. For example, for the first time in the field, a human teacher can teach a "robot baby" how to navigate along a corridor and how to make turns properly using its vision, simply by taking it for a walk and pushing it along the corridor, without any in-lab pre-training.

This new learning capability is made possible by a new kind of program, called a developmental program, which automatically derives its internal representation, including the features it uses at any time, from the sensory signals that the robot receives in real time. A fundamental difference between our new SAIL developmental program and traditional programs is that it enables a robot to learn tasks that its programmer does not know at the time of programming.
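The report does not describe how the developmental program derives its features. As one hedged illustration of deriving a feature from a sensory stream online, without storing the stream or forming a covariance matrix, the sketch below uses a covariance-free incremental PCA update for the first principal component. This technique is an assumption made for illustration, not a description of the SAIL program. It also assumes the inputs are already zero-mean, and the function name `ccipca_first_component` is hypothetical.

```python
import math
import random


def ccipca_first_component(samples):
    """Estimate the dominant principal component of a stream of zero-mean
    vectors, one sample at a time, without forming a covariance matrix.

    Hypothetical illustration only. The returned vector's direction
    approximates the top eigenvector of the data covariance, and its
    length approximates the corresponding (largest) eigenvalue."""
    v = None
    for n, u in enumerate(samples, start=1):
        norm_v = math.sqrt(sum(x * x for x in v)) if v is not None else 0.0
        if norm_v == 0.0:
            v = [float(x) for x in u]  # (re)initialise with the current sample
            continue
        proj = sum(a * b for a, b in zip(u, v)) / norm_v  # u . v / ||v||
        # Weights (n-1)/n and 1/n make v a running average of u u^T v / ||v||.
        v = [(n - 1) / n * vi + (1 / n) * ui * proj for vi, ui in zip(v, u)]
    return v


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic 2-D "sensor" stream: large variance along x, small along y.
    stream = ((rng.gauss(0.0, 3.0), rng.gauss(0.0, 0.5)) for _ in range(5000))
    v = ccipca_first_component(stream)
    norm = math.hypot(*v)
    print("direction:", [x / norm for x in v], "eigenvalue estimate:", norm)
```

Each update costs time linear in the input dimension, which is what makes this kind of estimator usable on a real-time sensory stream. A fuller version would also subtract an incrementally estimated mean and extract further components from the residuals.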
Because the robot is not limited to tasks known at programming time, this capability has greatly increased its ability to learn new tasks in unknown, complex environments. It has also drastically reduced the human programming effort required for sophisticated intelligent robots.

The same capability, autonomous skill development, has also been tested successfully in a second learning mode, the reinforcement learning mode, through computer simulations using real corridor-navigation video sequences. In this mode, the human teacher needs only to encourage or discourage the robot while it explores and practices on its own. The Q-learning algorithm has been modified and integrated with our Hierarchical Discriminant Regression (HDR) method to handle the challenges of high-dimensional input, a large number of states, and real-time response.
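To make the reinforcement learning mode concrete, here is a minimal Q-learning sketch in which the teacher's encouragement or discouragement is the reward (for example +1 or -1). HDR itself is not reproduced. As a hypothetical stand-in, each high-dimensional sensory vector is mapped to its nearest stored prototype, and a new prototype with a fresh row of Q-values is created when none lies within `radius`. The class, method, and parameter names are illustrative assumptions.

```python
import math
from dataclasses import dataclass, field


def _dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class PrototypeQLearner:
    """Q-learning over a growing set of state prototypes.

    Hypothetical stand-in for HDR: nearest-prototype lookup replaces the
    HDR tree, purely for illustration."""
    n_actions: int
    radius: float = 1.0   # max distance for reusing an existing prototype
    alpha: float = 0.5    # learning rate
    gamma: float = 0.9    # discount factor
    prototypes: list = field(default_factory=list)
    q: list = field(default_factory=list)  # q[state_id][action]

    def state_id(self, x):
        """Map a sensory vector to a discrete state, creating one if needed."""
        if self.prototypes:
            best = min(range(len(self.prototypes)),
                       key=lambda i: _dist(x, self.prototypes[i]))
            if _dist(x, self.prototypes[best]) <= self.radius:
                return best
        self.prototypes.append(list(x))
        self.q.append([0.0] * self.n_actions)
        return len(self.prototypes) - 1

    def best_action(self, x):
        """Greedy action for the state nearest to x."""
        row = self.q[self.state_id(x)]
        return max(range(self.n_actions), key=lambda a: row[a])

    def update(self, x, action, reward, x_next):
        """One Q-learning step; reward is the teacher's +1 / -1 signal."""
        s = self.state_id(x)
        s_next = self.state_id(x_next)
        target = reward + self.gamma * max(self.q[s_next])
        self.q[s][action] += self.alpha * (target - self.q[s][action])
```

The linear scan in `state_id` costs time proportional to the number of stored prototypes, which is the kind of cost a tree-structured regressor such as HDR is meant to reduce when the number of states grows and responses must stay real time.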
Furthermore, the imitation and reinforcement learning modes can be freely interleaved, in any order and for any duration. The human teacher decides when to switch, according to the robot's performance at the time.
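One way to picture this interleaving is a single learner whose value table is shared by both modes: at each step the teacher may impose an action (imitation), give a reward (reinforcement), or stay silent. This is a hypothetical sketch, not the SAIL implementation. In particular, treating an imitated action as if it earned a fixed `imitation_reward` is an assumption, and the update is a one-step value update without the bootstrapped term of full Q-learning, to keep the example short.

```python
from collections import defaultdict
from typing import Optional


class InterleavedLearner:
    """Hypothetical sketch: one value table shared by both teaching modes."""

    def __init__(self, actions, alpha=0.5, imitation_reward=1.0):
        self.actions = list(actions)
        self.alpha = alpha                        # learning rate
        self.imitation_reward = imitation_reward  # assumed value, not from the report
        self.q = defaultdict(float)               # (state, action) -> value

    def own_choice(self, state):
        """Greedy action; ties go to the earliest action in the list."""
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def step(self, state, teacher_action: Optional[str] = None,
             teacher_reward: Optional[float] = None):
        """Run one step in whichever mode the teacher chooses right now."""
        if teacher_action is not None:
            # Imitation mode: execute the teacher's action and credit it.
            action, reward = teacher_action, self.imitation_reward
        else:
            # Autonomous action; reinforcement mode if a reward is given.
            action, reward = self.own_choice(state), teacher_reward
        if reward is not None:
            key = (state, action)
            self.q[key] += self.alpha * (reward - self.q[key])
        return action
```

Because both modes write to the same table, the teacher can switch between them at any step, in any order, without resetting what the robot has already learned.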
With this capability, a human teacher can let the robot explore autonomously in the real physical world while giving instructions from time to time, either through hand-in-hand teaching (imitation learning mode) or through encouragement and discouragement (reinforcement learning mode).

At a longer time scale, the deliberative layer builds an environmental model and makes plans. One of the most difficult challenges here is to enable the robot to learn to act reliably in confusing or perceptually aliased situations (e.g., in an office environment, two corridors or intersections can look very much alike). We have investigated a new approach to acting in perceptually aliased environments by building multi-scale hierarchical spatial models. In these models, higher levels of the hierarchy represent more "abstract" concepts, such as corridors or intersections, whereas lower levels represent regions within a corridor. We have implemented a novel solution using the framework of Hierarchical Hidden Markov Models (HHMMs). In an experimental study of indoor robot navigation, we have shown faster learning by reusing submodels, a better fit of the model to the training data, better localization of the robot, and the ability to infer the topological structure of the environment. We have also implemented a planning system using HHMM models, with which the robot can find its way to a destination location.

We have investigated another novel approach to acting in perceptually aliased environments, based on remembering previous observations and actions. We have extended Q-learning with a hierarchical short-term memory method that rapidly brings to bear past experience appropriate to the grain size of the decisions being considered. At higher levels in the hierarchy, the agent abstracts over lower-level details and looks back over a variable number of high-level decisions in time. We formalized this idea in a framework called Hierarchical Short-Term Memory (HSM), and we have shown that this framework outperforms several related reinforcement learning techniques on a challenging simulated corridor navigation task.

The research in the servo control layer focuses on the following subtopics: path tracking for non-holonomic mobile robots, control of mobile manipulators, interaction between humans and mobile manipulators, and formation control of multiple autonomous mobile robots. The goal of this research is to develop tracking controllers in a perceptive frame that can deal with uncertain obstacles while ensuring the stability of the controller. Tracking controllers for simple paths, such as straight lines and circles, as well as for more complex paths, have been developed and implemented on a Nomadic XR4000 mobile robot. Obstacle avoidance using the robot's sensors has also been tested. Equipped with these controllers, the robot can execute commands from the deliberative and interactive layers in the presence of obstacles.

Another line of work at the servo control layer concerns the interaction between a human and a mobile manipulator.
If the control interaction takes place over the Internet, the controller must also overcome the uncertain time delays inherent in Internet communication. Real-time control of the mobile robot over the Internet has been tested.
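The report does not state which tracking control law was used on the XR4000. As a hedged illustration of path tracking for a non-holonomic (unicycle-type) robot, as described for the servo control layer, the sketch below simulates the classical tracking law of Kanayama et al. (1990) on a circular reference path. The gains, speed, path radius, and initial offset are arbitrary assumptions, and neither obstacle avoidance nor the perceptive-frame formulation is modeled. The function names are hypothetical.

```python
import math


def kanayama_control(pose, ref_pose, v_r, w_r, kx=1.0, ky=4.0, kth=4.0):
    """Tracking law of Kanayama et al. (1990) for a unicycle-type robot.

    pose and ref_pose are (x, y, theta); v_r, w_r are the reference linear
    and angular velocities. Returns the commanded (v, w)."""
    x, y, th = pose
    xr, yr, thr = ref_pose
    dx, dy = xr - x, yr - y
    # Tracking error expressed in the robot's own frame.
    ex = math.cos(th) * dx + math.sin(th) * dy
    ey = -math.sin(th) * dx + math.cos(th) * dy
    eth = math.atan2(math.sin(thr - th), math.cos(thr - th))  # wrap angle
    v = v_r * math.cos(eth) + kx * ex
    w = w_r + v_r * (ky * ey + kth * math.sin(eth))
    return v, w


def track_circle(radius=2.0, v_r=0.5, start=(0.0, -0.5, 0.3),
                 dt=0.01, t_end=30.0):
    """Simulate tracking a circle; returns the position error at each step."""
    w_r = v_r / radius
    x, y, th = start
    errors = []
    for k in range(int(round(t_end / dt)) + 1):
        t = k * dt
        # Reference moves at constant speed on a circle through the origin.
        ref = (radius * math.sin(w_r * t),
               radius * (1.0 - math.cos(w_r * t)),
               w_r * t)
        errors.append(math.hypot(ref[0] - x, ref[1] - y))
        v, w = kanayama_control((x, y, th), ref, v_r, w_r)
        # Unicycle kinematics (forward Euler): no sideways motion possible.
        x += v * math.cos(th) * dt
        y += v * math.sin(th) * dt
        th += w * dt
    return errors


if __name__ == "__main__":
    errs = track_circle()
    print(f"initial error {errs[0]:.3f} m, final error {errs[-1]:.4f} m")
```

Because the robot cannot move sideways, the lateral error `ey` is corrected indirectly through the angular-velocity command, which steers the heading back toward the path.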