This document provides an overview of cognitive robotics technology and tools, including the iCub humanoid robot, YARP (Yet Another Robot Platform), GPUs (Graphics Processing Units), and Aquila. It describes the iCub robot and simulator, how YARP supports building robot control systems across machines, and how GPUs can help with computationally intensive tasks in cognitive robotics such as vision processing. It also discusses the inspiration and goals for the Aquila cognitive robotics toolkit.
2. Overview
iCub – Cognitive Universal Body
YARP – Yet Another Robot Platform
GPU – Graphics Processing Unit
Aquila – Acquisition of language and actions
3. iCub humanoid robot
The dimensions are similar to those of a
3.5-year-old child
53 degrees of freedom
Developed in the European Framework 6
project RobotCub (www.robotcub.org)
There are now 20 iCubs in different labs
in Europe and 1 in the US
Design work continues; version 2.0 is forthcoming
Various ongoing project outcomes are
distributed via an open-source software
repository and via hardware upgrades
A free iCub simulator is available
6. iCub humanoid robot
Simulator
Open-source
Developed as part of a joint effort
with the European project iTalk
Widely adopted within the cognitive
robotics community
V. Tikhanoff, P. Fitzpatrick, F. Nori, L. Natale, G. Metta, and A. Cangelosi, “The iCub humanoid robot simulator,”
in International Conference on Intelligent Robots and Systems (IROS), Nice, France, 2008
7. YARP
Yet Another Robot Platform
Supports building a robot control system as a collection of programs
communicating via TCP, UDP, multicast, local, or MPI transports
Can be broken down into:
libYARP_OS - interfacing with the operating system(s) to support easy streaming of
data across many threads and many machines (a minimal streaming sketch follows this list)
libYARP_sig - performing common signal processing tasks (visual, auditory) in an open
manner easily interfaced with other commonly used libraries, for example OpenCV
libYARP_dev - interfacing with common devices used in robotics: framegrabbers,
digital cameras, motor control boards, etc.
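As an illustration of how libYARP_OS is typically used, the sketch below streams simple messages from one process over a named port. It is a minimal sketch, assuming a yarpserver is already running; the port name /demo/out is an arbitrary choice for this illustration, not part of any standard iCub setup.

#include <yarp/os/Network.h>
#include <yarp/os/BufferedPort.h>
#include <yarp/os/Bottle.h>
#include <yarp/os/Time.h>

int main() {
    yarp::os::Network yarp;                        // initialise the YARP network (needs a running yarpserver)
    yarp::os::BufferedPort<yarp::os::Bottle> out;  // port that streams Bottle messages
    out.open("/demo/out");                         // hypothetical port name for this sketch

    for (int i = 0; i < 100; ++i) {
        yarp::os::Bottle& b = out.prepare();       // buffer to fill for the next message
        b.clear();
        b.addString("count");
        b.addInt32(i);                             // addInt() on older YARP releases
        out.write();                               // non-blocking send to all connected readers
        yarp::os::Time::delay(0.1);                // stream at roughly 10 Hz
    }
    out.close();
    return 0;
}

Another process, possibly on another machine, can receive the stream by opening a reading port (for example with yarp read /demo/in) and then running yarp connect /demo/out /demo/in.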
10. YARP
Terminal commands
yarp
yarp check
yarp clean
yarp cmake
yarp conf
yarp detect
yarp disconnect
yarp exists
yarp forward
yarp help
yarp name
yarp name check
yarp name list
yarp name unregister
yarp namespace
yarp ping
yarp read
yarp regression
yarp resource
yarp rpc
yarp rpcserver
yarp run
yarp server
yarp terminate
yarp topic
yarp version
yarp wait
yarp where
yarp write
11. YARP and iCub simulator
Controlling motors
RPC ports for motor control:
yarp rpc /icubSim/left_leg/rpc:i (6 joints)
yarp rpc /icubSim/right_leg/rpc:i (6 joints)
yarp rpc /icubSim/torso/rpc:i (3 joints)
yarp rpc /icubSim/left_arm/rpc:i (the arm includes the hand, for a total of 16 controlled degrees of freedom)
yarp rpc /icubSim/right_arm/rpc:i (structure is identical to the left arm)
Example session (a programmatic equivalent follows below):
Terminal 1: yarpserver (starts the YARP name server)
Terminal 2: iCub_SIM (starts the iCub simulator)
Terminal 3: yarp rpc /icubSim/left_arm/rpc:i
Terminal 3: set pos 0 -90
Terminal 3: set vel 0 50
Terminal 3: set pos 0 90
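The same motions can also be commanded programmatically. The sketch below is a minimal example using YARP's remote_controlboard client device and the IPositionControl interface; it assumes yarpserver and iCub_SIM are already running, and the local port prefix /demo/left_arm is an arbitrary name chosen for this illustration.

#include <yarp/os/Network.h>
#include <yarp/os/Property.h>
#include <yarp/dev/PolyDriver.h>
#include <yarp/dev/IPositionControl.h>

int main() {
    yarp::os::Network yarp;                       // connect to the YARP network

    yarp::os::Property options;                   // configure a control-board client
    options.put("device", "remote_controlboard");
    options.put("remote", "/icubSim/left_arm");   // the simulated left arm
    options.put("local", "/demo/left_arm");       // arbitrary client-side port prefix

    yarp::dev::PolyDriver driver(options);
    if (!driver.isValid()) return 1;

    yarp::dev::IPositionControl* pos = nullptr;
    driver.view(pos);                             // obtain the position-control interface
    if (!pos) return 1;

    pos->setRefSpeed(0, 50.0);                    // like "set vel 0 50" in the rpc terminal
    pos->positionMove(0, -90.0);                  // like "set pos 0 -90"
    // wait for the motion to complete, then move back:
    // pos->positionMove(0, 90.0);

    driver.close();
    return 0;
}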
12. YARP and iCub simulator
Displaying camera outputs and controlling joints
Terminal 1: yarpserver
Terminal 2: iCub_SIM
Terminal 3: yarpview /left
Terminal 3: yarpview /right
Terminal 3: yarp connect /icubSim/cam/left /left
Terminal 3: yarp connect /icubSim/cam/right /right
Move the iCub’s head and see the vision changing:
Terminal 3: yarp rpc /icubSim/head/rpc:i
Terminal 3: set pos 0 -30 (head will move down)
Terminal 3: set pos 0 30 (head will move up)
Easier way is to use the existing graphical user interface:
Terminal 3: robotMotorGui
To display camera outputs from the real iCub, replace the /icubSim prefix with /icub
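Camera images can also be read programmatically rather than viewed with yarpview. The sketch below is a minimal example that connects to the simulator's left eye and inspects each incoming frame; the local port name /demo/cam/left is an arbitrary choice for this illustration.

#include <cstdio>
#include <yarp/os/Network.h>
#include <yarp/os/BufferedPort.h>
#include <yarp/sig/Image.h>

int main() {
    yarp::os::Network yarp;                        // connect to the YARP network

    yarp::os::BufferedPort<yarp::sig::ImageOf<yarp::sig::PixelRgb>> in;
    in.open("/demo/cam/left");                     // local port that will receive frames

    // Same effect as: yarp connect /icubSim/cam/left /demo/cam/left
    yarp::os::Network::connect("/icubSim/cam/left", "/demo/cam/left");

    for (int i = 0; i < 100; ++i) {
        yarp::sig::ImageOf<yarp::sig::PixelRgb>* img = in.read();   // blocking read of the next frame
        if (!img) break;
        // Example: inspect the centre pixel of the frame.
        yarp::sig::PixelRgb p = img->pixel(img->width() / 2, img->height() / 2);
        printf("frame %d: centre pixel r=%d g=%d b=%d\n", i, p.r, p.g, p.b);
    }
    in.close();
    return 0;
}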
13. Computing visual, auditory, and tactile perception while performing
elaborate motor control in real time requires substantial processing power
14. YARP can run across any number of machines with different operating systems
16. Biologically inspired models used in cognitive robotics are inherently parallel and
can greatly benefit from massively parallel devices such as GPUs
18. CPU vs. GPU
Different goals produce different designs
GPU assumes the workload is highly parallel
CPU must be good at everything, parallel or not
CPU: minimize latency experienced by 1 thread
big on-chip caches
sophisticated control logic
GPU: maximize throughput of all threads
# threads in flight limited by resources => lots of resources (registers,
bandwidth, etc.)
multithreading can hide latency => skip the big caches
share control logic across many threads
20. GPU Evolution
High throughput computation: “Kepler” GeForce GTX 690 – 2 x 2811 GFLOP/s, 7B transistors
High bandwidth memory: GeForce GTX 690 – 2 x 192 GB/s
High availability to all: 200+ million CUDA-capable GPUs in the world
Transistor counts over time (1995–2012): RIVA 128 – 3M, GeForce 256 – 23M, GeForce 3 – 60M, GeForce FX – 125M, GeForce 8800 – 681M, “Fermi” – 3B, “Kepler” – 7B
21. Programming GPUs with CUDA
History
Nvidia created CUDA (2007) to facilitate the development of parallel
programs on GPUs
The CUDA language is ANSI C extended with very few keywords
for labeling data-parallel functions (kernels) and their associated
data
Nvidia technology benefits from massive economies of scale in
the gaming market; CUDA-enabled cards are therefore very inexpensive for
the performance they provide
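To illustrate the "very few keywords" point, the sketch below is a minimal CUDA C example (not taken from the slides): a data-parallel kernel marked with __global__ that adds two vectors, launched with enough threads to cover every element.

#include <cstdio>
#include <cuda_runtime.h>

// __global__ marks a kernel: a function that runs on the GPU, one instance per thread.
__global__ void add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique index for this thread
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Allocate and fill host arrays.
    float *ha = new float[n], *hb = new float[n], *hc = new float[n];
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Allocate device memory and copy the inputs over.
    float *da, *db, *dc;
    cudaMalloc((void**)&da, bytes);
    cudaMalloc((void**)&db, bytes);
    cudaMalloc((void**)&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    add<<<blocks, threads>>>(da, db, dc, n);

    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hc[0]);   // expect 3.0

    cudaFree(da); cudaFree(db); cudaFree(dc);
    delete[] ha; delete[] hb; delete[] hc;
    return 0;
}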
26. Inspiration
Overcoming computational constraints by using GPU processors
Motion compliance < 1 ms
Vision (30fps) < 33 ms
Vision (60fps) < 16 ms
We typically take 33 ms as the cut-off time: one complete cycle of
everything critical MUST be completed within that time.
Of course some processes are not critical and their information
can be used as and when it becomes available, subject to
various constraints.
You may ask: “Why are these design decisions different from a CPU?” In fact, the GPU’s goals differ significantly from the CPU’s. The GPU evolved to solve problems on a highly parallel workload; the CPU evolved to be good at any problem, whether it is parallel or not. For example, the trip out to memory is long and painful. The question for the chip architect is how to deal with latency. One way is to avoid it: the CPU’s computational logic sits in a sea of cache. The idea is that few memory transactions are unique, and the data is usually found quickly after a short trip to the cache. Another way is amortization: GPUs forgo cache in favor of parallelization. Here, the idea is that most memory transactions are unique and can be processed efficiently in parallel. The cost of a trip to memory is amortized across several independent threads, which results in high throughput.
In fact, manycore NVIDIA GPUs make parallel processing a commodity technology. GPUs are mass-market commodity products sold at tremendous economies of scale. We sell around 1 million GPUs per week! That’s about 100 per minute. GPUs are massively parallel devices: our high-end GPUs have 3072 heavily multithreaded thread processors and 7.08 billion transistors. Upshot: massively parallel computing has become a commodity technology!