publications
For a list of my featured publications, please see the home page. This page lists all of my major publications. For an exhaustive list that also includes workshop papers, please see my CV.
* indicates that authors contributed equally
2022
- Rethinking Optimization with Differentiable Simulation from a Global Perspective. Rika Antonova, Jingyun Yang, Krishna Murthy Jatavallabhula, and Jeannette Bohg. arXiv 2022
Differentiable simulation is a promising toolkit for fast gradient-based policy optimization and system identification. In this work, we study the challenges that differentiable simulation presents when it is not feasible to expect that a single descent reaches a global optimum. We analyze the optimization landscapes of diverse scenarios and find that in dynamic environments with highly deformable objects and fluids, differentiable simulators produce rugged landscapes with useful gradients. We propose a method that combines Bayesian optimization with semi-local leaps to obtain a global search method that can use gradients effectively and maintain robust performance in regions with noisy gradients. We show extensive experiments in simulation, and also validate the method in a real robot setup.
We show that differentiable simulations present difficult optimization landscapes and address this with a method that combines global and local optimization
Rika and Jingyun did nearly all of the work. I was consulted primarily for advice on differentiable simulation and helped write a few parts of the manuscript. (A toy code sketch of the global-plus-local idea follows this entry.)
@inproceedings{bodiffsim, title = {Rethinking Optimization with Differentiable Simulation from a Global Perspective}, author = {Antonova, Rika and Yang, Jingyun and Jatavallabhula, {Krishna Murthy} and Bohg, Jeannette}, year = {2022}, booktitle = {arXiv}, }
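The recipe above pairs global proposals with short local gradient refinements. Below is a minimal toy sketch of that idea in PyTorch; plain uniform sampling stands in for the Bayesian-optimization proposals used in the actual method, and the rugged 1-D objective is invented purely for illustration.

```python
# Toy sketch (not the paper's code): combine global candidate sampling with
# short local gradient refinements on a rugged 1-D landscape.
import torch

def loss(theta):
    # Rugged toy objective: a broad bowl with high-frequency ripples.
    return (theta - 2.0) ** 2 + 0.5 * torch.sin(8.0 * theta)

def local_refine(theta0, steps=25, lr=0.05):
    theta = theta0.clone().requires_grad_(True)
    opt = torch.optim.SGD([theta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss(theta).backward()
        opt.step()
    return theta.detach()

best_theta, best_val = None, float("inf")
for _ in range(20):                               # global proposals
    cand = torch.empty(1).uniform_(-5.0, 5.0)     # stand-in for a BO suggestion
    refined = local_refine(cand)                  # local gradient "leap"
    val = loss(refined).item()
    if val < best_val:
        best_theta, best_val = refined, val

print(f"best theta {best_theta.item():.3f}, loss {best_val:.3f}")
```

On landscapes like this, a single descent from a random start can stall in a ripple, whereas restarting short local refinements from globally sampled candidates tends to reach the broad minimum.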
- f-Cal: Calibrated aleatoric uncertainty estimation from neural networks for robot perception. Dhaivat Bhatt*, Kaustubh Mani*, Dishank Bansal, Hanju Lee, Krishna Murthy Jatavallabhula, and Liam Paull. ICRA 2022
f-Cal is a calibration method for probabilistic regression networks. Typical Bayesian neural networks are overconfident in their predictions. For these predictions to be used in downstream tasks, reliable and calibrated uncertainty estimates are critical. f-Cal proposes a simple loss function to remedy this; it can be employed to train any probabilistic neural regressor to produce calibrated estimates of aleatoric uncertainty.
We present a simple approach that uses a variational loss to enforce calibration in probabilistic regression networks
Dhaivat and Kaustubh contributed equally to the experiments. Dishank implemented very early prototypes. Liam came up with this idea. I played a largely hands-off mentorship role. (A toy code sketch of the calibration idea follows this entry.)
@inproceedings{fcal, title = {f-Cal: Calibrated aleatoric uncertainty estimation from neural networks for robot perception}, author = {Bhatt, Dhaivat and Mani, Kaustubh and Bansal, Dishank and Lee, Hanju and Jatavallabhula, {Krishna Murthy} and Paull, Liam}, year = {2022}, booktitle = {ICRA}, }
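As a rough illustration of the calibration idea, the sketch below adds a distribution-matching penalty that pushes the normalized residuals over a mini-batch towards a standard Gaussian. It uses a moment-matched KL term purely for simplicity; the paper's actual loss enforces the distribution-matching constraint differently, so treat this as a hedged approximation rather than the f-Cal loss itself.

```python
# Hedged sketch of the general idea: Gaussian NLL plus a penalty that makes
# normalized residuals over a batch look like draws from N(0, 1).
import torch

def calibration_loss(mu, sigma, y, weight=0.1):
    """mu, sigma, y: tensors of shape (batch,), with sigma > 0."""
    nll = 0.5 * ((y - mu) / sigma) ** 2 + torch.log(sigma)   # Gaussian NLL (up to a constant)
    z = (y - mu) / sigma                                     # normalized residuals
    m, v = z.mean(), z.var(unbiased=False)
    # KL( N(m, v) || N(0, 1) ) in closed form; zero when residuals are calibrated.
    kl = 0.5 * (v + m ** 2 - torch.log(v.clamp_min(1e-8)) - 1.0)
    return nll.mean() + weight * kl                          # weight is an arbitrary choice here

# usage with dummy predictions
mu = torch.randn(64, requires_grad=True)
log_sigma = torch.zeros(64, requires_grad=True)
y = torch.randn(64)
calibration_loss(mu, torch.exp(log_sigma), y).backward()
```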
- DRACO: Weakly supervised dense reconstruction and canonicalization of objects. Rahul Sajnani*, Aadil Mehdi Sanchawala*, Krishna Murthy Jatavallabhula, Srinath Sridhar, and Madhava Krishna K. ICRA 2022
We present DRACO, a method for Dense Reconstruction And Canonicalization of Object shape from one or more RGB images. Canonical shape reconstruction, i.e., estimating 3D object shape in a coordinate space canonicalized for scale, rotation, and translation parameters, is an emerging paradigm that holds promise for a multitude of robotic applications. Prior approaches either rely on painstakingly gathered dense 3D supervision, or produce only sparse canonical representations, limiting real-world applicability. DRACO performs dense canonicalization using only weak supervision in the form of camera poses and semantic keypoints at train time. During inference, DRACO predicts dense object-centric depth maps in a canonical coordinate space, solely using one or more RGB images of an object. Extensive experiments on canonical shape reconstruction and pose estimation show that DRACO is competitive or superior to fully-supervised methods.
We present a weakly supervised approach that reconstructs objects in a canonical coordinate space
I played a mentorship role on this project. Srinath and Madhav contributed more than me.
@inproceedings{draco, title = {DRACO: Weakly supervised dense reconstruction and canonicalization of objects}, author = {Sajnani, Rahul and Sanchawala, {Aadil Mehdi} and Jatavallabhula, {Krishna Murthy} and Sridhar, Srinath and K, {Madhava Krishna}}, year = {2022}, booktitle = {ICRA}, }
2021
- Taskography: Evaluating robot task planning over large 3D scene graphs. Chris Agia*, Krishna Murthy Jatavallabhula*, Mohamed Khodeir, Ondrej Miksik, Vibhav Vineet, Mustafa Mukadam, Liam Paull, and Florian Shkurti. CoRL 2021
3D scene graphs (3DSGs) are an emerging description; unifying symbolic, topological, and metric scene representations. However, typical 3DSGs contain hundreds of objects and symbols even for small environments; rendering task planning on the full graph impractical. We construct Taskography, the first large-scale robotic task planning benchmark over 3DSGs. While most benchmarking efforts in this area focus on vision-based planning, we systematically study symbolic planning, to decouple planning performance from visual representation learning. We observe that, among existing methods, neither classical nor learning-based planners are capable of real-time planning over full 3DSGs. Enabling real-time planning demands progress on both (a) sparsifying 3DSGs for tractable planning and (b) designing planners that better exploit 3DSG hierarchies. Towards the former goal, we propose Scrub, a task-conditioned 3DSG sparsification method; enabling classical planners to match (and surpass) state-of-the-art learning-based planners. Towards the latter goal, we propose Seek, a procedure enabling learning-based planners to exploit 3DSG structure, reducing the number of replanning queries required by current best approaches by an order of magnitude. We will open-source all code and baselines to spur further research along the intersections of robot task planning, learning and 3DSGs.
We present a large-scale benchmark and performant approaches for long-horizon task planning over large 3D scene graphs
The idea was conceived, led, and implemented by Chris and me. Chris focused more on the benchmark, while I focused on the SCRUB and SEEK algorithms. Mohamed helped implement several optimal planners. Chris implemented the Taskography-API. (A toy sketch of task-conditioned sparsification follows this entry.)
@inproceedings{taskography, title = {Taskography: Evaluating robot task planning over large 3D scene graphs}, author = {Agia, Chris and Jatavallabhula, {Krishna Murthy} and Khodeir, Mohamed and Miksik, Ondrej and Vineet, Vibhav and Mukadam, Mustafa and Paull, Liam and Shkurti, Florian}, year = {2021}, booktitle = {CoRL}, }
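A cartoon of what task-conditioned sparsification buys you, in the spirit of Scrub (this is not the released Taskography code): given the objects named by a goal, prune the scene graph down to the handful of nodes the planner actually needs. A real sparsifier must also retain receptacles, doors, and other nodes required for plan feasibility; the toy graph and function below are invented for illustration.

```python
# Cartoon of task-conditioned scene-graph sparsification: keep only the objects
# named by the task, plus the rooms that contain them, and drop everything else
# before handing the graph to a symbolic planner.
scene_graph = {
    "kitchen":  {"type": "room", "objects": ["mug", "kettle", "sponge"]},
    "bedroom":  {"type": "room", "objects": ["book", "lamp"]},
    "bathroom": {"type": "room", "objects": ["towel", "soap"]},
}

def sparsify(graph, goal_objects):
    """Return a pruned copy of the graph containing only task-relevant nodes."""
    pruned = {}
    for room, node in graph.items():
        relevant = [o for o in node["objects"] if o in goal_objects]
        if relevant:                      # keep rooms that contain goal objects
            pruned[room] = {"type": "room", "objects": relevant}
    return pruned

print(sparsify(scene_graph, goal_objects={"mug", "book"}))
# {'kitchen': {'type': 'room', 'objects': ['mug']}, 'bedroom': {'type': 'room', 'objects': ['book']}}
```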
- gradSim: Differentiable simulation for system identification and visuomotor control. Krishna Murthy Jatavallabhula*, Miles Macklin*, Florian Golemo, Vikram Voleti, Linda Petrini, Martin Weiss, Breandan Considine, Jerome Parent-Levesque, Kevin Xie, Kenny Erleben, Liam Paull, Florian Shkurti, Derek Nowrouzezahrai, and Sanja Fidler. ICLR 2021
In this paper, we tackle the problem of estimating object physical properties such as mass, friction, and elasticity directly from video sequences. Such a system identification problem is fundamentally ill-posed due to the loss of information during image formation. Current best solutions to the problem require precise 3D labels which are labor intensive to gather, and infeasible to create for many systems such as deformable solids or cloth. In this work we present gradSim, a framework that overcomes the dependence on 3D supervision by combining differentiable multiphysics simulation and differentiable rendering to jointly model the evolution of scene dynamics and image formation. This unique combination enables backpropagation from pixels in a video sequence through to the underlying physical attributes that generated them. Furthermore, our unified computation graph across dynamics and rendering engines enables the learning of challenging visuomotor control tasks, without relying on state-based (3D) supervision, while obtaining performance competitive with, or better than, techniques that require precise 3D labels.
Differentiable models of time-varying dynamics and image formation pipelines result in highly accurate physical parameter estimation from video and visuomotor control.
This idea was jointly conceived in a meeting which included me, Derek, Breandan, Martin, Bhairav Mehta, and Maxime Chevalier-Boisvert. Martin prototyped an initial differentiable billiards engine. I implemented the first rigid-body engine, integrated it with a differentiable renderer, and set up system-identification experiments. Miles and I then joined forces, with him focusing on the physics engine and me focusing on the physics + rendering combination and overall systems integration. I ran all of the experiments for this paper. Florian (Golemo) and Vikram created the datasets, designed experiments, and also helped with code and the manuscript. All authors participated in writing the manuscript and the author response phase. Florian (Shkurti), Derek, and Sanja nearly equally co-advised this effort. (A toy sketch of backprop-through-simulation follows this entry.)
@inproceedings{gradsim, title = {gradSim: Differentiable simulation for system identification and visuomotor control}, author = {Jatavallabhula, {Krishna Murthy} and Macklin, Miles and Golemo, Florian and Voleti, Vikram and Petrini, Linda and Weiss, Martin and Considine, Breandan and Parent-Levesque, Jerome and Xie, Kevin and Erleben, Kenny and Paull, Liam and Shkurti, Florian and Nowrouzezahrai, Derek and Fidler, Sanja}, year = {2021}, booktitle = {ICLR}, }
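The backprop-through-simulation idea can be sketched on a much smaller system: a 1-D point mass under gravity and linear drag, rolled out with a differentiable Euler integrator, where the unknown drag coefficient is recovered from an observed trajectory. This omits the rendering half of gradSim entirely and is only a toy stand-in for the paper's multiphysics engine.

```python
# Minimal sketch of "backprop through simulation": the full system differentiates
# through multiphysics and rendering; here we recover one parameter from 1-D positions.
import torch

def simulate(drag, steps=100, dt=0.01, v0=10.0):
    """Differentiable forward-Euler rollout of a 1-D point mass with linear drag."""
    x, v = torch.tensor(0.0), torch.tensor(v0)
    xs = []
    for _ in range(steps):
        a = -9.81 - drag * v
        v = v + a * dt
        x = x + v * dt
        xs.append(x)
    return torch.stack(xs)

observed = simulate(torch.tensor(0.7)).detach()   # stands in for a video-derived trajectory

drag = torch.tensor(0.1, requires_grad=True)      # unknown physical parameter
opt = torch.optim.Adam([drag], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    loss = torch.mean((simulate(drag) - observed) ** 2)
    loss.backward()                               # gradients flow through the rollout
    opt.step()
print(f"estimated drag: {drag.item():.3f} (true 0.7)")
```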
2020
- gradSLAM: Dense SLAM meets automatic differentiation. Krishna Murthy Jatavallabhula, Ganesh Iyer, and Liam Paull. ICRA 2020
The question of “representation” is central in the context of dense simultaneous localization and mapping (SLAM). Newer learning-based approaches have the potential to leverage data or task performance to directly inform the choice of representation. However, learning representations for SLAM has been an open question, because traditional SLAM systems are not end-to-end differentiable. In this work, we present gradSLAM, a differentiable computational graph take on SLAM. Leveraging the automatic differentiation capabilities of computational graphs, gradSLAM enables the design of SLAM systems that allow for gradient-based learning across each of their components, or the system as a whole. This is achieved by creating differentiable alternatives for each non-differentiable component in a typical dense SLAM system. Specifically, we demonstrate how to design differentiable trust-region optimizers, surface measurement and fusion schemes, as well as differentiate over rays, without sacrificing performance. This amalgamation of dense SLAM with computational graphs enables us to backprop all the way from 3D maps to 2D pixels, opening up new possibilities in gradient-based learning for SLAM.
We present end-to-end differentiable dense SLAM systems that open up new possibilities for integrating deep learning and SLAM
I came up with the idea and led this work. Ganesh Iyer was instrumental in implementing the surfel and point-based fusion pipelines. (A toy sketch of differentiable alignment follows this entry.)
@inproceedings{gradslam, title = {gradSLAM: Dense SLAM meets automatic differentiation}, author = {Jatavallabhula, {Krishna Murthy} and Iyer, Ganesh and Paull, Liam}, year = {2020}, booktitle = {ICRA}, }
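The flavour of "differentiable alignment" can be shown on a tiny 2-D registration problem: parameterize a pose, define a differentiable alignment error, and let autograd drive it. This is a toy with known correspondences, not the gradslam library API (which operates on dense RGB-D maps).

```python
# Toy sketch of the core idea: make an alignment step differentiable so that pose
# parameters (and, in the full system, map parameters) receive gradients.
import math
import torch

src = torch.randn(100, 2)                          # source point cloud
R_true = torch.tensor([[math.cos(0.4), -math.sin(0.4)],
                       [math.sin(0.4),  math.cos(0.4)]])
tgt = src @ R_true.T + torch.tensor([0.5, -0.2])   # target = rigidly transformed source

theta = torch.zeros(1, requires_grad=True)         # pose parameters to recover
t = torch.zeros(2, requires_grad=True)
opt = torch.optim.Adam([theta, t], lr=0.05)
for _ in range(300):
    opt.zero_grad()
    R = torch.stack([torch.cat([torch.cos(theta), -torch.sin(theta)]),
                     torch.cat([torch.sin(theta),  torch.cos(theta)])])
    loss = torch.mean((src @ R.T + t - tgt) ** 2)  # alignment residual (known correspondences)
    loss.backward()
    opt.step()
print(f"recovered angle {theta.item():.3f} rad, translation {t.detach().tolist()}")
```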
- AutoLay: Benchmarking Monocular Layout Estimation. Kaustubh Mani, Sai Shankar, Krishna Murthy Jatavallabhula, and Madhava Krishna K. IROS 2020
Amodal layout estimation is the task of estimating a semantic occupancy map in bird’s-eye view, given a monocular image or video. The term amodal implies that we estimate occupancy and semantic labels even for parts of the world that are occluded in image space. In this work, we introduce AutoLay, a new dataset and benchmark for this task. AutoLay provides annotations in 3D, in bird’s-eye view, and in image space. We provide high quality labels for sidewalks, vehicles, crosswalks, and lanes. We evaluate several approaches on sequences from the KITTI and Argoverse datasets.
We present a dataset and introduce a new benchmark for *amodal* layout estimation from monocular imagery
Kaustubh led this work. I came up with the idea, served as mentor, and helped write some of the manuscript.
@inproceedings{autolay, title = {AutoLay: Benchmarking Monocular Layout Estimation}, author = {Mani, Kaustubh and Shankar, Sai and Jatavallabhula, {Krishna Murthy} and K, {Madhava Krishna}}, year = {2020}, booktitle = {IROS}, }
- MonoLayout: Amodal scene layout from a single image. Kaustubh Mani, Swapnil Daga, Shubhika Garg, Sai Shankar, Krishna Murthy Jatavallabhula, and Madhava Krishna K. WACV 2020
In this paper, we address the novel, highly challenging problem of estimating the layout of a complex urban driving scenario. Given a single color image captured from a driving platform, we aim to predict the bird’s-eye view layout of the road and other traffic participants. The estimated layout should reason beyond what is visible in the image, and compensate for the loss of 3D information due to projection. We dub this problem "amodal scene layout estimation", which involves hallucinating scene layout for even parts of the world that are occluded in the image. To this end, we present MonoLayout, a deep neural network for real-time amodal scene layout estimation from a single image. MonoLayout maps a color image of a scene into a multi-channel occupancy grid in bird’s-eye view, where each channel represents occupancy probabilities of various scene components. We represent scene layout as a multi-channel semantic occupancy grid, and leverage adversarial feature learning to hallucinate plausible completions for occluded image parts. We extend several state-of-the-art approaches for road-layout estimation and vehicle occupancy estimation in bird’s-eye view to the amodal setup and thoroughly evaluate against them. By leveraging temporal sensor fusion to generate training labels, we significantly outperform current art over a number of datasets.
We present a neural network that "hallucinates" the layout of a road scene from a single image, including scene parts that are outside the bounds of the image
Kaustubh led this work. I came up with the idea, served as mentor, and helped write the bulk of the manuscript.
@inproceedings{monolayout, title = {MonoLayout: Amodal scene layout from a single image}, author = {Mani, Kaustubh and Daga, Swapnil and Garg, Shubhika and Shankar, Sai and Jatavallabhula, {Krishna Murthy} and K, {Madhava Krishna}}, year = {2020}, booktitle = {WACV}, }
- Multi-object monocular SLAM for dynamic environments. Gokul Nair, Swapnil Daga, Rahul Sajnani, Anirudha Ramesh, Junaid Ahmed Ansari, Krishna Murthy Jatavallabhula, and Madhava Krishna K. Intelligent Vehicles Symposium (IV) 2020
In this paper, we tackle the problem of multibody SLAM from a monocular camera. The term multibody implies that we track the motion of the camera, as well as that of other dynamic participants in the scene. The quintessential challenge in dynamic scenes is unobservability; it is not possible to unambiguously triangulate a moving object from a moving monocular camera. Existing approaches solve restricted variants of the problem, but the solutions suffer from relative scale ambiguity (i.e., a family of infinitely many solutions exists for each pair of motions in the scene). We solve this rather intractable problem by leveraging single-view metrology, advances in deep learning, and category-level shape estimation. We propose a multi pose-graph optimization formulation to resolve the relative and absolute scale factor ambiguities involved. This optimization helps us reduce the average error in trajectories of multiple bodies over real-world datasets, such as KITTI. To the best of our knowledge, our method is the first practical monocular multi-body SLAM system to perform dynamic multi-object and ego localization in a unified framework in metric scale.
We present a monocular object SLAM system that tracks not just the camera, but also other moving objects in the scene
I mentored Gokul and Swapnil on this project. I also wrote a part of the manuscript.
@inproceedings{nair2020iv, title = {Multi-object monocular SLAM for dynamic environments}, author = {Nair, Gokul and Daga, Swapnil and Sajnani, Rahul and Ramesh, Anirudha and Ansari, {Junaid Ahmed} and Jatavallabhula, {Krishna Murthy} and K, {Madhava Krishna}}, year = {2020}, booktitle = {Intelligent Vehicles Symposium (IV)}, }
2019
- MapLite: Autonomous intersection navigation without detailed prior maps. Teddy Ort, Krishna Murthy Jatavallabhula, Rohan Banerjee, Sai Krishna Gottipati, Dhaivat Bhatt, Igor Gilitschenski, Liam Paull, and Daniela Rus. IEEE RAL 2019
In this work, we present MapLite, a one-click autonomous navigation system capable of piloting a vehicle to an arbitrary desired destination point given only a sparse publicly available topometric map (from OpenStreetMap). The onboard sensors are used to segment the road region and register the topometric map in order to fuse the high-level navigation goals with a variational path planner in the vehicle frame. This enables the system to plan trajectories that correctly navigate road intersections without the use of an external localization system such as GPS or a detailed prior map. Since the topometric maps already exist for the vast majority of roads, this solution greatly increases the geographical scope for autonomous mobility solutions. We implement MapLite on a full-scale autonomous vehicle and exhaustively test it on over 15 km of road including over 100 autonomous intersection traversals. We further extend these results through simulated testing to validate the system on complex road junction topologies such as traffic circles.
MapLite is a one-click autonomous navigation system for a vehicle that only uses OpenStreetMap data and local sensing
Teddy did nearly all of this work. I helped design the topometric registration algorithm and write the manuscript.
@inproceedings{maplite, title = {MapLite: Autonomous intersection navigation without detailed prior maps}, author = {Ort, Teddy and Jatavallabhula, {Krishna Murthy} and Banerjee, Rohan and Gottipati, {Sai Krishna} and Bhatt, Dhaivat and Gilitschenski, Igor and Paull, Liam and Rus, Daniela}, year = {2019}, booktitle = {IEEE RAL}, }
- Kaolin: A PyTorch Library for Accelerating 3D Deep Learning Research. Krishna Murthy Jatavallabhula*, Edward Smith*, Jean-Francois Lafleche, Clement Fuji Tsang, Artem Rozantsev, Wenzheng Chen, Tommy Xiang, Rev Lebaredian, and Sanja Fidler. Whitepaper 2019
Kaolin is a PyTorch library aiming to accelerate 3D deep learning research. Kaolin provides efficient implementations of differentiable 3D modules for use in deep learning systems. With functionality to load and preprocess several popular 3D datasets, and native functions to manipulate meshes, pointclouds, signed distance functions, and voxel grids, Kaolin mitigates the need to write wasteful boilerplate code. Kaolin packages together several differentiable graphics modules including rendering, lighting, shading, and view warping. Kaolin also supports an array of loss functions and evaluation metrics for seamless evaluation and provides visualization functionality to render the 3D results. Importantly, we curate a comprehensive model zoo comprising many state-of-the-art 3D deep learning architectures, to serve as a starting point for future research endeavours.
Kaolin is a PyTorch library aimed at accelerating 3D deep learning research.
Edward and I led this work during our 2019 internships at NVIDIA. It has since been maintained and developed by several others, notably Clement, Masha Shugrina, and Towaki Takikawa. (An example of the kind of differentiable 3D operation Kaolin provides follows this entry.)
@inproceedings{kaolin, title = {Kaolin: A PyTorch Library for Accelerating 3D Deep Learning Research}, author = {Jatavallabhula, {Krishna Murthy} and Smith, Edward and Lafleche, Jean-Francois and {Fuji Tsang}, Clement and Rozantsev, Artem and Chen, Wenzheng and Xiang, Tommy and Lebaredian, Rev and Fidler, Sanja}, year = {2019}, booktitle = {Whitepaper}, }
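As an example of the kind of differentiable 3D building block Kaolin packages, here is a symmetric Chamfer distance written in plain PyTorch. This is deliberately not Kaolin's API (the function name below is made up); it only illustrates why differentiable 3D losses and metrics are useful when training 3D deep learning models.

```python
# Symmetric Chamfer distance between two point clouds, in plain PyTorch.
import torch

def chamfer_distance(a, b):
    """a: (N, 3), b: (M, 3). Mean of nearest-neighbour squared distances, both ways."""
    d = torch.cdist(a, b) ** 2                     # (N, M) pairwise squared distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

pred = torch.rand(1024, 3, requires_grad=True)     # e.g., a predicted point cloud
target = torch.rand(2048, 3)
chamfer_distance(pred, target).backward()          # gradients flow back to the prediction
```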
- Deep Active Localization. Sai Krishna Gottipati*, Keehong Seo*, Krishna Murthy Jatavallabhula, Dhaivat Bhatt, Vincent Mai, and Liam Paull. IEEE RAL 2019
Active localization is the problem of generating robot actions that allow it to maximally disambiguate its pose within a reference map. Traditional approaches to this use an information-theoretic criterion for action selection and hand-crafted perceptual models. In this work we propose an end-to-end differentiable method for learning to take informative actions that is trainable entirely in simulation and then transferable to real robot hardware with zero refinement. The system is composed of two modules: a convolutional neural network for perception, and a planning module learned with deep reinforcement learning. We introduce a multi-scale approach to the learned perceptual model since the accuracy needed to perform action selection with reinforcement learning is much less than the accuracy needed for robot control. We demonstrate that the resulting system outperforms using the traditional approach for either perception or planning. We also demonstrate our approach's robustness to different map configurations and other nuisance parameters through the use of domain randomization in training. The code is also compatible with the OpenAI gym framework, as well as the Gazebo simulator.
We demonstrate the applicability of a learned perception model and an exploration policy applied to active localization on real robots
Sai and Keehong did most of this work. I was only loosely involved in a mentorship role
@inproceedings{dal, title = {Deep Active Localization}, author = {Gottipati, {Sai Krishna} and Seo, Keehong and Jatavallabhula, {Krishna Murthy} and Bhatt, Dhaivat and Mai, Vincent and Paull, Liam}, year = {2019}, booktitle = {IEEE RAL}, }
- INFER: INtermediate representations for FuturE pRediction. Shashank Srikanth, Junaid Ahmed Ansari, Karnik Ram, Sarthak Sharma, Krishna Murthy Jatavallabhula, and Madhava Krishna K. IROS 2019
Deep learning methods have ushered in a new era for computer vision and robotics. With very accurate methods for object detection and semantic segmentation, we are now at a juncture where we can envisage the application of these techniques to perform higher-order understanding. One such application, which we consider in this work, is predicting future states of traffic participants in urban driving scenarios. Specifically, we argue that, by constructing intermediate representations of the world using off-the-shelf computer vision models for semantic segmentation and object detection, we can train models that account for the multi-modality of future states, and at the same time transfer well across different train and test distributions (datasets). Our approach, dubbed INFER (INtermediate representations for distant FuturE pRediction), involves training an autoregressive model that takes in an intermediate representation of past states of the world, and predicts a multimodal distribution over plausible future states. The model consists of an Encoder-Decoder with ConvLSTM present along the skip connections, and in between the Encoder-Decoder. The network takes an intermediate representation of the scene and predicts the future locations of the Vehicle of Interest (VoI). We outperform the current best future prediction model on KITTI while predicting deep into the future (3 sec, 4 sec) by a significant margin. Contrary to most approaches dealing with future prediction that do not generalize well to datasets that they have not been trained on, we test our method on different datasets like Oxford RobotCar and Cityscapes, and show that the network performs well across these datasets which differ in scene layout, weather conditions, and also generalizes well across cross-sensor modalities. We carry out a thorough ablation study on our intermediate representation that captures the role played by different semantics. We conclude the results section by showcasing an important use case of future prediction: multi-object tracking, and exhibit results on select sequences from KITTI and Cityscapes.
INFER demonstrates the applicability of intermediate representations for zero-shot transferrable trajectory forecasting of vehicles in urban driving scenarios
Shashank led this work. I came up with the idea, mentored the work, and wrote the bulk of the manuscript.
@inproceedings{infer, title = {INFER: INtermediate representations for FuturE pRediction}, author = {Srikanth, Shashank and Ansari, {Junaid Ahmed} and Ram, Karnik and Sharma, Sarthak and Jatavallabhula, {Krishna Murthy} and K, {Madhava Krishna}}, year = {2019}, booktitle = {IROS}, }
2018
- The Earth ain’t Flat: Monocular Reconstruction of Vehicles on Steep and Graded Roads from a Moving Camera. Junaid Ahmed Ansari*, Sarthak Sharma*, Anshuman Majumdar, Krishna Murthy Jatavallabhula, and Madhava Krishna K. IROS 2018
Accurate localization of other traffic participants is a vital task in autonomous driving systems. State-of-the-art systems employ a combination of sensing modalities such as RGB cameras and LiDARs for localizing traffic participants, but most such demonstrations have been confined to plain roads. We demonstrate, to the best of our knowledge, the first results for monocular object localization and shape estimation on surfaces that do not share the same plane with the moving monocular camera. We approximate road surfaces by local planar patches and use semantic cues from vehicles in the scene to initialize a local bundle-adjustment like procedure that simultaneously estimates the pose and shape of the vehicles, and the orientation of the local ground plane on which the vehicle stands as well. We evaluate the proposed approach on the KITTI and SYNTHIA-SF benchmarks, for a variety of road plane configurations. The proposed approach significantly improves the state-of-the-art for monocular object localization on arbitrarily-shaped roads.
We demonstrate monocular object localization and wireframe (shape) estimation on extremely steep and graded roads
Junaid and Sarthak led this work. I mentored them closely on this project and helped write the manuscript.
@inproceedings{ansari2018steep, title = {The Earth ain’t Flat: Monocular Reconstruction of Vehicles on Steep and Graded Roads from a Moving Camera}, author = {Ansari, {Junaid Ahmed} and Sharma, Sarthak and Majumdar, Anshuman and Jatavallabhula, {Krishna Murthy} and K, {Madhava Krishna}}, year = {2018}, booktitle = {IROS}, }
- CalibNet: Self-Supervised Extrinsic Calibration using 3D Spatial Transformer Networks. Ganesh Iyer, Karnik Ram, Krishna Murthy Jatavallabhula, and Madhava Krishna K. IROS 2018
3D LiDARs and 2D cameras are increasingly being used alongside each other in sensor rigs for perception tasks. Before these sensors can be used to gather meaningful data, however, their extrinsics (and intrinsics) need to be accurately calibrated, as the performance of the sensor rig is extremely sensitive to these calibration parameters. A vast majority of existing calibration techniques require significant amounts of data and/or calibration targets and human effort, severely impacting their applicability in large-scale production systems. We address this gap with CalibNet - a self-supervised deep network capable of automatically estimating the 6-DoF rigid body transformation between a 3D LiDAR and a 2D camera in real-time. CalibNet alleviates the need for calibration targets, thereby resulting in significant savings in calibration efforts. During training, the network only takes as input a LiDAR point cloud, the corresponding monocular image, and the camera calibration matrix K. At train time, we do not impose direct supervision (i.e., we do not directly regress to the calibration parameters, for example). Instead, we train the network to predict calibration parameters that maximize the geometric and photometric consistency of the input images and point clouds. CalibNet learns to iteratively solve the underlying geometric problem and accurately predicts extrinsic calibration parameters for a wide range of mis-calibrations, without requiring retraining or domain adaptation.
CalibNet is a geometrically supervised deep neural network for the extrinsic calibration of lidar-camera rigs
Ganesh led this work. I had the initial idea, and Ganesh had clever tweaks that got it to work in practice.
@inproceedings{calibnet, title = {CalibNet: Self-Supervised Extrinsic Calibration using 3D Spatial Transformer Networks}, author = {Iyer, Ganesh and Ram, Karnik and Jatavallabhula, {Krishna Murthy} and K, {Madhava Krishna}}, year = {2018}, booktitle = {IROS}, }
- Geometric Consistency for Self-Supervised End-to-End Visual Odometry. Ganesh Iyer, Krishna Murthy Jatavallabhula, Gunshi Gupta, Madhava Krishna K, and Liam Paull. CVPR Workshops 2018
With the success of deep learning based approaches in tackling challenging problems in computer vision, a wide range of deep architectures have recently been proposed for the task of visual odometry (VO) estimation. Most of these proposed solutions rely on supervision, which requires the acquisition of precise ground-truth camera pose information, collected using expensive motion capture systems or high-precision IMU/GPS sensor rigs. In this work, we propose an unsupervised paradigm for deep visual odometry learning. We show that using a noisy teacher, which could be a standard VO pipeline, and by designing a loss term that enforces geometric consistency of the trajectory, we can train accurate deep models for VO that do not require ground-truth labels. We leverage geometry as a self-supervisory signal and propose "Composite Transformation Constraints (CTCs)", that automatically generate supervisory signals for training and enforce geometric consistency in the VO estimate. We also present a method of characterizing the uncertainty in VO estimates thus obtained. To evaluate our VO pipeline, we present exhaustive ablation studies that demonstrate the efficacy of end-to-end, self-supervised methodologies to train deep models for monocular VO. We show that leveraging concepts from geometry and incorporating them into the training of a recurrent neural network results in performance competitive to supervised deep VO methods.
We use the compositional property of transformations to self-supervise learning of visual odometry from images
Ganesh contributed to this work more than I did. I came up with the idea, but he got it to work. (A sketch of the composite transformation constraint follows this entry.)
@inproceedings{ctcnet, title = {Geometric Consistency for Self-Supervised End-to-End Visual Odometry}, author = {Iyer, Ganesh and Jatavallabhula, {Krishna Murthy} and Gupta, Gunshi and K, {Madhava Krishna} and Paull, Liam}, year = {2018}, booktitle = {CVPR Workshops}, }
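The composite transformation constraint itself is easy to state in code: relative pose predictions over frame pairs (1,2), (2,3), and (1,3) should compose consistently. The sketch below writes the constraint as a simple matrix-difference penalty on 4x4 homogeneous transforms; the actual training loss and pose parameterization in the paper differ, so read this as an illustration only.

```python
# Composite transformation constraint on batches of 4x4 homogeneous transforms.
import torch

def ctc_loss(T_12, T_23, T_13):
    """T_ij: (B, 4, 4) transforms mapping frame-j coordinates into frame i."""
    composed = T_12 @ T_23                 # composing (1<-2) and (2<-3) gives (1<-3)
    return torch.mean((composed - T_13) ** 2)

# Dummy usage: a batch of identity transforms is perfectly consistent.
I = torch.eye(4).expand(8, 4, 4)
print(ctc_loss(I, I, I))                   # tensor(0.)
```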
- Beyond Pixels: Leveraging Geometry and Shape Cues for Multi-Object Tracking. Sarthak Sharma*, Junaid Ahmed Ansari*, Krishna Murthy Jatavallabhula, and Madhava Krishna K. ICRA 2018
This paper introduces geometry and object shape and pose costs for multi-object tracking in urban driving scenarios. Using images from a monocular camera alone, we devise pairwise costs for object tracks, based on several 3D cues such as object pose, shape, and motion. The proposed costs are agnostic to the data association method and can be incorporated into any optimization framework to output the pairwise data associations. These costs are easy to implement, can be computed in real-time, and complement each other to account for possible errors in a tracking-by-detection framework. We perform an extensive analysis of the designed costs and empirically demonstrate consistent improvement over the state-of-the-art under varying conditions that employ a range of object detectors, exhibit a variety in camera and object motions, and, more importantly, are not reliant on the choice of the association framework. We also show that, by using the simplest of association frameworks (two-frame Hungarian assignment), we surpass the state-of-the-art in multi-object tracking on road scenes.
We present a monocular multi-object tracker that uses simple 3D cues and obtained (in 2018) state-of-the-art results.
I came up with this idea and mentored Sarthak and Junaid on this work. I also wrote most of the manuscript. (A sketch of the two-frame association step follows this entry.)
@inproceedings{beyondpixels, title = {Beyond Pixels: Leveraging Geometry and Shape Cues for Multi-Object Tracking}, author = {Sharma, Sarthak and Ansari, {Junaid Ahmed} and Jatavallabhula, {Krishna Murthy} and K, {Madhava Krishna}}, year = {2018}, booktitle = {ICRA}, }
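The association step this paper plugs its costs into can be sketched in a few lines: build a pairwise cost matrix between existing tracks and new detections, then solve a two-frame Hungarian assignment. Here a single Euclidean feature distance stands in for the paper's combined 3D pose, shape, and motion cues.

```python
# Two-frame tracking-by-detection association with a Hungarian assignment.
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(tracks, detections):
    """tracks: (T, D), detections: (N, D) feature arrays. Returns (track, detection) index pairs."""
    cost = np.linalg.norm(tracks[:, None, :] - detections[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)       # minimal-cost matching
    return [(int(r), int(c)) for r, c in zip(rows, cols)]

tracks = np.array([[0.0, 0.0], [5.0, 5.0]])
detections = np.array([[4.8, 5.1], [0.2, -0.1], [9.0, 9.0]])
print(associate(tracks, detections))               # [(0, 1), (1, 0)]
```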
- Constructing Category-Specific Models for Monocular Object SLAM. Parv Parkhiya, Rishabh Khawad, Krishna Murthy Jatavallabhula, Madhava Krishna K, and Brojeshwar Bhowmick. ICRA 2018
We present a new paradigm for real-time object-oriented SLAM with a monocular camera. Contrary to previous approaches, that rely on object-level models, we construct category-level models from CAD collections which are now widely available. To alleviate the need for huge amounts of labeled data, we develop a rendering pipeline that enables synthesis of large datasets from a limited amount of manually labeled data. Using data thus synthesized, we learn category-level models for object deformations in 3D, as well as discriminative object features in 2D. These category models are instance-independent and aid in the design of object landmark observations that can be incorporated into a generic monocular SLAM framework. Where typical object-SLAM approaches usually solve only for object and camera poses, we also estimate object shape on-the-fly, allowing for a wide range of objects from the category to be present in the scene. Moreover, since our 2D object features are learned discriminatively, the proposed object-SLAM system succeeds in several scenarios where sparse feature-based monocular SLAM fails due to insufficient features or parallax. Also, the proposed category-models help in object instance retrieval, useful for Augmented Reality (AR) applications. We evaluate the proposed framework on multiple challenging real-world scenes and show — to the best of our knowledge — first results of an instance-independent monocular object-SLAM system and the benefits it enjoys over feature-based SLAM methods.
We present a monocular object SLAM system that uses category-level object representations as object observations
Parv implemented the bulk of the object SLAM backend. Rishabh implemented the frontend. I proposed this project, mentored Parv and Rishabh, and wrote the bulk of the manuscript.
@inproceedings{parkhiya2018icra, title = {Constructing Category-Specific Models for Monocular Object SLAM}, author = {Parkhiya, Parv and Khawad, Rishabh and Jatavallabhula, {Krishna Murthy} and K, {Madhava Krishna} and Bhowmick, Brojeshwar}, year = {2018}, booktitle = {ICRA}, }
2017
- Shape Priors for Real-Time Monocular Object Localization in Dynamic Environments. Krishna Murthy Jatavallabhula, Sarthak Sharma, and Madhava Krishna K. IROS 2017
Reconstruction of dynamic objects in a scene is a highly challenging problem in the context of SLAM. In this paper, we present a real-time monocular object localization system that estimates the shape and pose of dynamic objects in real-time, using video frames captured from a moving monocular camera. Although the problem seems to be ill-posed, we demonstrate that, by incorporating prior knowledge of the object category, we can obtain more detailed instance-level reconstructions. As opposed to earlier object model specifications, the proposed shape-prior model leads to the formulation of a Bundle Adjustment-like optimization problem for simultaneous shape and pose estimation. Leveraging recent successes of Convolutional Neural Networks (CNNs) for object keypoint localization, we present a CNN architecture that performs precise keypoint localization. We then demonstrate how these keypoints can be used to recover 3D object properties, while accounting for any 2D localization errors and self-occlusion. We show significant performance improvements compared to state-of-the-art monocular competitors for 2D keypoint detection, as well as 3D localization and reconstruction of dynamic objects.
I did most of this work – part of my Master's thesis
@inproceedings{jatavallabhula2017iros, title = {Shape Priors for Real-Time Monocular Object Localization in Dynamic Environments}, author = {Jatavallabhula, {Krishna Murthy} and Sharma, Sarthak and K, {Madhava Krishna}}, year = {2017}, booktitle = {IROS}, }
- Reconstructing Vehicles From a Single Image: Shape Priors for Road Scene Understanding. Krishna Murthy Jatavallabhula, Sai Krishna Gottipati, Falak Chhaya, and Madhava Krishna K. ICRA 2017
We present an approach for reconstructing vehicles from a single (RGB) image, in the context of autonomous driving. Though the problem appears to be ill-posed, we demonstrate that prior knowledge about how 3D shapes of vehicles project to an image can be used to reason about the reverse process, i.e., how shapes (back-)project from 2D to 3D. We encode this knowledge in shape priors, which are learnt over a small keypoint-annotated dataset. We then formulate a shape-aware adjustment problem that uses the learnt shape priors to recover the 3D pose and shape of a query object from an image. For shape representation and inference, we leverage recent successes of Convolutional Neural Networks (CNNs) for the task of object and keypoint localization, and train a novel cascaded fully-convolutional architecture to localize vehicle keypoints in images. The shape-aware adjustment then robustly recovers shape (3D locations of the detected keypoints) while simultaneously filling in occluded keypoints. To tackle estimation errors incurred due to erroneously detected keypoints, we use an Iteratively Re-weighted Least Squares (IRLS) scheme for robust optimization, and as a by-product characterize noise models for each predicted keypoint. We evaluate our approach on autonomous driving benchmarks, and present superior results to existing monocular, as well as stereo approaches.
I did most of this work – part of my Master's thesis
@inproceedings{jatavallabhula2017icra, title = {Reconstructing Vehicles From a Single Image: Shape Priors for Road Scene Understanding}, author = {Jatavallabhula, {Krishna Murthy} and Gottipati, {Sai Krishna} and Chhaya, Falak and K, {Madhava Krishna}}, year = {2017}, booktitle = {ICRA}, }
2016
- FAST: Frontier Allocation Synchronized by Token-passing. Avinash Gautam, Bhargav Jha, Gourav Kumar, Krishna Murthy Jatavallabhula, Arjun Ram S P, and Sudeept Mohan. Springer Journal of Intelligent Robotic Systems 2016
We present an efficient, complete, and fault-tolerant multi-robot exploration algorithm
I designed and implemented the core algorithm. Authors are listed lexicographically by last name (mine was assumed to be "Murthy"), except for the senior author, who is listed last.
@inproceedings{fast, title = {FAST: Frontier Allocation Synchronized by Token-passing}, author = {Gautam, Avinash and Jha, Bhargav and Kumar, Gourav and Jatavallabhula, {Krishna Murthy} and {S P}, {Arjun Ram} and Mohan, Sudeept}, year = {2016}, booktitle = {Springer Journal of Intelligent Robotic Systems}, }
2015
- Cluster, Allocate, Cover: An Efficient Approach for Multi-robot Coverage. Avinash Gautam, Krishna Murthy Jatavallabhula, Gourav Kumar, Arjun Ram S P, Bhargav Jha, and Sudeept Mohan. IEEE SMC 2015
We design a performant online multi-robot coverage path planning technique
I designed and implemented core aspects of the algorithm
@inproceedings{cac, title = {Cluster, Allocate, Cover: An Efficient Approach for Multi-robot Coverage}, author = {Gautam, Avinash and Jatavallabhula, {Krishna Murthy} and Kumar, Gourav and {S P}, {Arjun Ram} and Jha, Bhargav and Mohan, Sudeept}, year = {2015}, booktitle = {IEEE SMC}, }
- Maxxyt: An Autonomous Wearable Device for Real-Time Tracking of a Wide Range of Exercises. Danish Pruthi, Ayush Jain, Krishna Murthy Jatavallabhula, and Puneet Teja. UKSIM 2015
We design a wearable device capable of tracking exercise activity entirely on a low-resource microcontroller.
I prototyped a few algorithms for tracking and recording repetitions and helped write the manuscript
@inproceedings{maxxyt, title = {Maxxyt: An Autonomous Wearable Device for Real-Time Tracking of a Wide Range of Exercises}, author = {Pruthi, Danish and Jain, Ayush and Jatavallabhula, {Krishna Murthy} and Teja, Puneet}, year = {2015}, booktitle = {UKSIM}, }