Gen6D: Generalizable Model-Free 6-DoF Object Pose Estimation from RGB Images

ECCV 2022

Yuan Liu¹, Yilin Wen¹, Sida Peng², Cheng Lin³, Xiaoxiao Long¹, Taku Komura¹, Wenping Wang⁴

¹The University of Hong Kong ²Zhejiang University ³Tencent ⁴Texas A&M University

Abstract

Gen6D predicts the pose of an unseen object in RGB images from reference images of that object.

In this paper, we present Gen6D, a generalizable model-free 6-DoF object pose estimator. Existing generalizable pose estimators either need high-quality object models or require additional depth maps or object masks at test time, which significantly limits their application scope. In contrast, our pose estimator requires only some posed images of the unseen object and can accurately predict the object's pose in arbitrary environments. Gen6D consists of an object detector, a viewpoint selector, and a pose refiner, none of which requires a 3D object model and all of which generalize to unseen objects. Experiments show that Gen6D achieves state-of-the-art results on two model-free datasets: the MOPED dataset and GenMOP, a new dataset that we collected. In addition, on the LINEMOD dataset, Gen6D achieves results competitive with instance-specific pose estimators.

Comparison

Both DeepIM and Gen6D are trained on the same dataset and must generalize to these unseen objects. Gen6D generalizes better than DeepIM because it uses a feature-volume-based refiner.
PVNet is trained directly on the object using its reference images (about 200), which are too few to train PVNet for accurate pose estimation.

Application

A simple AR application: with the known poses, we can render an adorable Dodoco in place of the cute Lulu Piggy. Gen6D requires neither the object model nor the object mask. After capturing reference images of an unseen object with a cellphone and recovering their camera poses with COLMAP, Gen6D can predict the object pose in arbitrary query images. Gen6D can therefore be easily applied to everyday objects in AR/VR applications.
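
To illustrate how a predicted 6-DoF pose can drive an AR overlay like the one described above, the following minimal sketch projects 3D points from object coordinates into the image with a standard pinhole camera model. It is not taken from the Gen6D code: the function names `project_points` and `box_corners`, the object-to-camera pose convention, and all numeric values are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]


def project_points(
    points_obj: Sequence[Vec3],
    R: Mat3,
    t: Vec3,
    K: Mat3,
) -> List[Tuple[float, float]]:
    """Project 3D points in object coordinates to pixel coordinates.

    Assumes the pose maps object to camera coordinates, x_cam = R @ x_obj + t,
    and a pinhole intrinsic matrix K with no lens distortion.
    """
    pixels = []
    for p in points_obj:
        # Rigid transform into the camera frame.
        x_cam = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
        if x_cam[2] <= 0:
            raise ValueError("point is behind the camera")
        # Apply the intrinsics, then divide by depth.
        uvw = [sum(K[i][j] * x_cam[j] for j in range(3)) for i in range(3)]
        pixels.append((uvw[0] / uvw[2], uvw[1] / uvw[2]))
    return pixels


def box_corners(half_extent: Vec3) -> List[Tuple[float, float, float]]:
    """Eight corners of an axis-aligned box centred at the object origin."""
    hx, hy, hz = half_extent
    return [(sx * hx, sy * hy, sz * hz)
            for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]


# Hypothetical example values: identity rotation, object 1 m in front of a
# camera with focal length 500 px and principal point (320, 240).
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 1.0]
K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]

center_px = project_points([(0.0, 0.0, 0.0)], R, t, K)[0]
corners_px = project_points(box_corners((0.1, 0.1, 0.1)), R, t, K)
```

Drawing the projected box corners, or the vertices of a virtual model, on each query frame gives an overlay that follows the object as its estimated pose changes. A real renderer would also need to handle occlusion and lens distortion, which this sketch ignores.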