PhD Defence | Deep learning with 3D and label geometry
Shuai completed his research under the supervision of Cees Snoek and Efstratios Gavves (both from the UvA).
Despite the recent progress in using deep learning to solve computer vision problems, a fine-grained understanding of an image remains challenging. Often, such understanding of an image is two-fold: visual understanding and semantic understanding. The former strives to understand intrinsic properties of the object in the image, e.g. the 2D visual appearance, the 3D shape, the 3D position/pose, etc., whereas the latter aims at associating the diverse objects with certain semantics, e.g. a category name of an object, an action or an attribute. All of these form the basis of an in-depth understanding of images that we wish a machine to have.
Today’s default deep convolutional network architectures have already shown a remarkable ability to capture the visual appearance of images in the 2D domain and then map that visual content to one specific semantic class (e.g. image classification, object detection). However, fine-grained image understanding, such as inferring intrinsic 3D information and more structured semantics, remains less explored. Shuai frames his approach to these problems by asking “How to better utilize geometry for better image understanding?”
In the first part of his thesis, Shuai researches visual image understanding with 3D geometry. He shows that it is possible to automatically explain a variety of visual content in an image with texture-free 3D shapes. Furthermore, Shuai develops a deep learning framework to reliably recover a set of 3D geometric attributes, such as the pose of an object and the surface normals of its shape, from a 2D image.
In the second part, Shuai explores label geometry for semantic image understanding. He finds that a set of image classification problems have geometrically similar probability spaces. Building on this observation, he introduces label geometry, which unifies one-vs.-rest classification, multi-label classification, and out-of-distribution classification in one framework. Moreover, he shows that hierarchical label geometries can be learned to better model image classification tasks in which a class hierarchy is used to balance the accuracy and specificity of an image classifier.