Amodal layout estimation is the task of estimating a semantic occupancy map in bird’s-eye view, given a monocular image or video. The term amodal implies that we estimate occupancy and semantic labels even for parts of the world that are occluded in image space. In this work, we introduce AutoLay, a new dataset and benchmark for this task. AutoLay provides annotations in 3D, in bird’s-eye view, and in image space. We provide high-quality labels for sidewalks, vehicles, crosswalks, and lanes. We evaluate several approaches on sequences from the KITTI and Argoverse datasets.
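To make the task concrete, a bird’s-eye-view semantic occupancy map can be represented as a 2D grid of class labels, and predictions are commonly scored with per-class intersection-over-union. The sketch below illustrates this representation and metric on toy grids; the class ids and grid layout are illustrative assumptions, not the dataset’s actual label map or evaluation protocol.

```python
import numpy as np

# Hypothetical class ids for illustration only (not AutoLay's label map).
CLASSES = {0: "empty", 1: "lane", 2: "sidewalk", 3: "crosswalk", 4: "vehicle"}

def per_class_iou(pred, gt, num_classes):
    """Intersection-over-union per semantic class on a BEV grid.

    Cells occluded in image space are still labeled in the grid, so an
    amodal method is scored on them like any other cell.
    """
    ious = {}
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        # Classes absent from both maps get NaN rather than a misleading 0.
        ious[c] = inter / union if union > 0 else float("nan")
    return ious

# Toy 4x4 BEV grids (rows = forward distance, columns = lateral offset).
gt = np.array([[1, 1, 2, 2],
               [1, 1, 2, 2],
               [3, 3, 2, 2],
               [1, 4, 2, 2]])
pred = np.array([[1, 1, 2, 2],
                 [1, 1, 2, 2],
                 [3, 1, 2, 2],   # one crosswalk cell mislabeled as lane
                 [1, 4, 2, 2]])

ious = per_class_iou(pred, gt, num_classes=5)
```

Here the sidewalk and vehicle classes are predicted perfectly (IoU 1.0), while the missed crosswalk cell drops that class to IoU 0.5; averaging the defined per-class scores gives the usual mean IoU.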