Visual object representation has attracted substantial interest over the last decades. Besides being one of the fundamental challenges of computer vision, it also poses a central challenge to many robotic applications. In these applications, we not only want to recognize objects but also to interact with them. In this context, we are specifically interested in applications involving manipulation and grasping in indoor scenarios. In such scenarios, reasoning about affordances of objects, such as graspability, pour-ability, or cut-ability, is paramount. Thus, making a link between the visual representation of objects and their affordances plays an essential role in these applications.
This thesis deals with the problem of representing objects by their affordances. The hallmark of our object representation is the notion of parts. We argue that affordances are mostly associated with the parts of objects. For example, the head of a hammer affords pounding, and the blade of a knife affords cutting. What distinguishes our work from current state-of-the-art part-based object representations is that our parts are derived from the affordances themselves. We present a number of methods and techniques for part-based object representation, part-based affordance detection, and actualizing affordances in robotic tasks. We aim to provide methods that generalize robustly to novel objects and are applicable in real robotic scenarios.
In our work, we use RGB-D data obtained from a Kinect sensor. We represent the RGB-D data in terms of parts that carry functional meaning. We then propose a part-based affordance detection approach. Since parts are shared among objects, affordances can be detected even in novel objects. Finally, we actualize affordances in robotic tasks, which generally involve multiple affordances. For example, scooping beans from a box with a ladle requires grasping the ladle's handle and scooping with the ladle's mouth. We therefore learn relations between object parts and their affordances for performing tasks.
The proposed contributions, integrated into a coherent framework, have been evaluated on a number of robotic tasks and on a publicly available RGB-D affordance dataset. We obtained high object segmentation performance compared to other state-of-the-art part segmentation methods on RGB-D data, even in the presence of clutter. Most importantly, our affordance detection performance was superior to that of baseline methods. We also evaluated our framework in different grasping and manipulation tasks. The evaluation demonstrated the applicability and generalization of our approach in real-world scenarios and novel scenes.