Scene Understanding for Mobile Robots Exploiting Deep Learning Techinques
Rangel, José Carlos
MetadataShow full item record
Every day robots are becoming more common in the society. Consequently, they must have certain basic skills in order to interact with humans and the environment. One of these skills is the capacity to understand the places where they are able to move. Computer vision is one of the ways commonly used for achieving this purpose. Current technologies in this field offer outstanding solutions applied to improve data quality every day, therefore producing more accurate results in the analysis of an environment. With this in mind, the main goal of this research is to develop and validate an efficient object-based scene understanding method that will be able to help solve problems related to scene identification for mobile robotics. We seek to analyze state-of-the-art methods for finding the most suitable one for our goals, as well as to select the kind of data most convenient for dealing with this issue. Another primary goal of the research is to determine the most suitable data input for analyzing scenes in order to find an accurate representation for the scenes by meaning of semantic labels or point cloud features descriptors. As a secondary goal we will show the benefits of using semantic descriptors generated with pre-trained models for mapping and scene classification problems, as well as the use of deep learning models in conjunction with 3D features description procedures to build a 3D object classification model that is directly related with the representation goal of this work. The research described in this thesis was motivated by the need for a robust system capable of understanding the locations where a robot usually interacts. In the same way, the advent of better computational resources has allowed to implement some already defined techniques that demand high computational capacity and that offer a possible solution for dealing with scene understanding issues. One of these techniques are Convolutional Neural Networks (CNNs). These networks have the capacity of classifying an image based on their visual appearance.