Research & Projects


Below are selected research projects and publications that I have led, grouped by theme. For additional projects I have collaborated on or supported within the lab, please visit the RoboPI Lab website.

Underwater Cave Exploration with AUVs



CaveSeg Project

ICRA | 2024 | Paper | Dataset | Demo

In this project, we present CaveSeg, the first visual learning pipeline for semantic segmentation and scene parsing to support AUV navigation inside underwater caves. To address the scarcity of annotated data, we prepare a comprehensive dataset for semantic segmentation of underwater cave scenes. It contains pixel-level annotations for important navigation markers (e.g., cavelines and directional arrows), obstacles (e.g., the ground plane and overhead layers), scuba divers, and open areas for servoing. Moreover, we formulate a novel transformer-based model that is computationally light and offers near real-time execution while achieving state-of-the-art performance. We also explore the design choices and implications of semantic segmentation for visual servoing by AUVs inside underwater caves.
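As a rough illustration of how a per-pixel caveline mask can drive visual servoing, the sketch below computes a normalized horizontal offset of the detected caveline, which a controller could map to a yaw command. The class ID and the centroid-based heuristic are illustrative assumptions, not CaveSeg's actual label map or servoing logic.

```python
import numpy as np

# Hypothetical class ID; CaveSeg's actual label map may differ.
CAVELINE_ID = 1

def caveline_yaw_error(mask: np.ndarray) -> float:
    """Return the normalized horizontal offset of the caveline in [-1, 1].

    mask: (H, W) array of per-pixel class IDs from a segmentation model.
    Positive means the caveline lies to the right of the image center.
    """
    ys, xs = np.nonzero(mask == CAVELINE_ID)
    if xs.size == 0:
        return 0.0  # no caveline visible; hold current heading
    center = (mask.shape[1] - 1) / 2.0
    return float((xs.mean() - center) / center)

# Toy example: a vertical caveline in the right half of an 8x8 mask.
mask = np.zeros((8, 8), dtype=int)
mask[:, 6] = CAVELINE_ID
err = caveline_yaw_error(mask)  # positive: steer right toward the line
```

A real pipeline would smooth this error over time and fuse it with the arrow-marker classes before commanding the vehicle.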


CavePI Project

RSS | 2025 | Paper | Code | Demo

We developed CavePI, a novel autonomous underwater vehicle (AUV) for navigating underwater caves using semantic guidance from cavelines and other navigational markers. Its compact design and 4-DOF (surge, heave, roll, and yaw) motion model enable safe traversal through narrow passages with minimal silt disturbance. Designed for single-person deployment, CavePI features a forward-facing camera for visual servoing and a downward-facing camera for caveline detection and tracking, effectively minimizing blind spots around the robot. A Ping sonar provides sparse range data for robust obstacle avoidance and safe navigation within the caves. The computational framework is powered by two single-board computers: a Jetson Nano for perception and a Raspberry Pi 5 for planning and control. We also present a digital twin of CavePI, built with ROS and simulated in an underwater environment via Gazebo, to support pre-mission planning and testing, providing a cost-effective platform for validating mission concepts.
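A 4-DOF (surge, heave, roll, yaw) motion model is typically realized by mixing body-frame commands into per-thruster outputs. The sketch below uses a generic two-horizontal, two-vertical thruster layout with saturation; the mixing matrix is an illustrative assumption, not CavePI's actual thruster geometry.

```python
import numpy as np

# Illustrative thruster layout (not CavePI's actual geometry):
# two horizontal thrusters share surge and differ in yaw;
# two vertical thrusters share heave and differ in roll.
# Columns: [surge, heave, roll, yaw] commands in [-1, 1].
MIX = np.array([
    [1.0, 0.0, 0.0,  1.0],   # left horizontal
    [1.0, 0.0, 0.0, -1.0],   # right horizontal
    [0.0, 1.0, 1.0,  0.0],   # left vertical
    [0.0, 1.0, -1.0, 0.0],   # right vertical
])

def mix_4dof(surge, heave, roll, yaw):
    """Map 4-DOF body commands to per-thruster outputs, saturated to [-1, 1]."""
    u = MIX @ np.array([surge, heave, roll, yaw])
    return np.clip(u, -1.0, 1.0)

u = mix_4dof(0.5, 0.0, 0.0, 0.2)  # gentle forward motion with a slight yaw
```

In a split architecture like CavePI's, the perception computer would produce the high-level command and the planning/control computer would run a mixer of this form at the actuator rate.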

Shared Autonomy & Embodied Teleoperation



Human-Machine Interfaces for Subsea Telerobotics: Review

IEEE T-HMS (In Review) | 2025 | Pre-print

This project investigates the evolution of human-machine interfaces (HMIs) in subsea telerobotics, with a focus on enabling effective and adaptive human-robot collaboration beyond traditional teleoperation. Existing subsea systems largely rely on human-to-machine control with limited, delayed, and low-dimensional sensory feedback, constraining operator situational awareness and decision-making. We examine how subsea HMIs have progressed from narrow field-of-view, first-person “soda-straw” interfaces to modern systems incorporating immersive visualization, gesture-based interaction, haptics, and natural language communication. Through a systematic analysis of prior work, we study HMI design from the perspectives of operator experience, robotic autonomy, and bidirectional communication quality. Particular attention is given to persistent limitations in immersive feedback fidelity, intuitive control, and cross-platform standardization, as well as the role of simulators and digital twins for training and prototyping. The project further explores emerging shared autonomy paradigms that support seamless human-robot cooperation and outlines open challenges and future directions for intelligent, user-centric subsea HMI development.


EgoExo++ (Extension of Ego-to-Exo) Project

IJRR (In Review) | 2025 | Pre-print | Demo

We propose EgoExo++, which extends 2D exocentric view synthesis (Ego-to-Exo) by estimating a dense 2.5D ground surface on the fly. It simultaneously renders the ROV model onto this reconstructed surface, enhancing semantic perception and depth comprehension. The computations involved are closed-form and rely solely on egocentric views and monocular SLAM estimates, which makes the method portable across existing teleoperation engines and robust to varying waterbody characteristics. We validate the geometric accuracy of our approach through extensive experiments on 2-DOF indoor navigation and 6-DOF underwater cave exploration in challenging low-light conditions. Quantitative metrics confirm the reliability of the rendered exo views, while a user study involving 15 operators demonstrates improved situational awareness, navigation safety, and task efficiency during teleoperation. Furthermore, we highlight the role of EgoExo++ augmented visuals in supporting shared autonomy, operator training, and embodied teleoperation. This new interactive approach to ROV teleoperation presents promising opportunities for future research in subsea telerobotics.
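One closed-form ingredient of a 2.5D ground surface estimate is fitting a height field to the sparse 3D landmarks a monocular SLAM system already maintains. The sketch below fits a single plane z = ax + by + c by least squares; it is a simplified stand-in under that assumption, not the paper's full surface model.

```python
import numpy as np

def fit_ground_plane(points: np.ndarray) -> np.ndarray:
    """Least-squares fit of z = a*x + b*y + c to sparse SLAM map points.

    points: (N, 3) array of 3D landmarks assumed to lie near the ground.
    Returns the coefficients (a, b, c). A full 2.5D surface would fit a
    richer height field, but the closed-form structure is the same.
    """
    A = np.c_[points[:, 0], points[:, 1], np.ones(len(points))]
    coeffs, *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)
    return coeffs

# Synthetic check: landmarks sampled from the plane z = 0.1x - 0.2y + 1.5
rng = np.random.default_rng(0)
xy = rng.uniform(-5, 5, size=(50, 2))
z = 0.1 * xy[:, 0] - 0.2 * xy[:, 1] + 1.5
a, b, c = fit_ground_plane(np.c_[xy, z])
```

The recovered surface is what the ROV model would be rendered onto in the augmented exocentric view.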


Ego-to-Exo Project

ISRR | 2024 | Paper | Demo

Underwater remotely operated vehicles (ROVs) are unmanned submersible vehicles designed for exploring and operating in the depths of the ocean. Despite using high-end cameras, typical teleoperation engines based on first-person (egocentric) views limit a surface operator's ability to maneuver and navigate the ROV in complex deep-water missions. In this paper, we present an interactive teleoperation interface that (i) offers on-demand third-person (exocentric) visuals synthesized from past egocentric views, and (ii) provides enhanced peripheral information with the ROV's pose augmented in real time. We achieve this by integrating a 3D geometry-based Ego-to-Exo view synthesis algorithm into a monocular SLAM system for accurate trajectory estimation. The proposed closed-form solution uses only past egocentric views from the ROV and a SLAM backbone for pose estimation, which makes it portable to existing ROV platforms. Unlike data-driven solutions, it is invariant to the application and to waterbody-specific scene characteristics.
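The geometric core of such a system is placing a virtual exocentric camera relative to the ROV's current SLAM pose, so that a past egocentric frame can be re-rendered from behind and above the vehicle. The sketch below composes a fixed offset with the SLAM pose; the offsets and the y-down/z-forward camera convention are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def exo_camera_pose(T_wc: np.ndarray, back: float = 2.0, up: float = 1.0):
    """Place a virtual exocentric camera behind and above the ROV.

    T_wc: 4x4 world-from-camera pose of the current egocentric view
    (e.g., from a monocular SLAM backbone). Offsets are applied in the
    camera's own frame, assuming a y-down, z-forward convention.
    """
    offset = np.eye(4)
    offset[:3, 3] = [0.0, -up, -back]  # up and backward in the camera frame
    return T_wc @ offset

T = np.eye(4)  # ROV at the world origin, looking along +z
T_exo = exo_camera_pose(T)  # virtual camera 2 m behind, 1 m above
```

Warping a past egocentric image into this virtual view, with the ROV model overlaid at its estimated pose, yields the on-demand exocentric visual.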

Underwater Data Center Surveillance



LC-MAP: Acoustic Threat Localization Project

IEEE JoE (In Review) | 2025 | Pre-print | Demo

This project develops a comprehensive surveillance framework for localizing and tracking close-range adversarial acoustic sources targeting offshore infrastructures, particularly underwater data centers (UDCs). We propose a heterogeneous receiver configuration comprising a fixed hydrophone mounted on the facility and a mobile hydrophone deployed on a dedicated surveillance robot. Blanketing large infrastructures with dense arrays of static hydrophones is impractical, and off-the-shelf approaches based on time difference of arrival (TDOA) and frequency difference of arrival (FDOA) filtering fail to generalize to this dynamic configuration. To address this, we formulate a Locus-Conditioned Maximum A-Posteriori (LC-MAP) scheme that generates acoustically informed and geometrically consistent priors, ensuring a physically plausible initial state for joint TDOA-FDOA filtering. We integrate this into an unscented Kalman filter (UKF) pipeline, which provides reliable convergence under nonlinearity and measurement noise. Extensive Monte Carlo analyses, Gazebo-based physics simulations, and field trials demonstrate that the proposed framework can reliably estimate the 3D position and velocity of an adversarial acoustic source in real time. It achieves sub-meter localization accuracy and over 90% success rates, with convergence times nearly halved compared to baseline methods.
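For intuition, the sketch below evaluates a TDOA/FDOA measurement pair for one fixed and one mobile hydrophone: TDOA from the difference in propagation ranges, FDOA from the difference in relative range rates. This is a simplified measurement model of the kind a UKF would consume, under an assumed constant sound speed; it is not the paper's full LC-MAP formulation.

```python
import numpy as np

C = 1500.0  # assumed nominal speed of sound in seawater, m/s

def tdoa_fdoa(src_p, src_v, fixed_p, mobile_p, mobile_v):
    """TDOA/FDOA pair for a fixed and a mobile hydrophone.

    src_p, src_v: source position and velocity (3-vectors).
    fixed_p: static hydrophone position; mobile_p, mobile_v: the
    surveillance robot's hydrophone position and velocity.
    """
    r1 = np.linalg.norm(src_p - fixed_p)
    r2 = np.linalg.norm(src_p - mobile_p)
    tdoa = (r1 - r2) / C
    # relative range rates (the fixed receiver does not move)
    rr1 = np.dot(src_v, src_p - fixed_p) / r1
    rr2 = np.dot(src_v - mobile_v, src_p - mobile_p) / r2
    fdoa = (rr1 - rr2) / C  # normalized Doppler difference
    return tdoa, fdoa

src_p = np.array([10.0, 0.0, -5.0])
src_v = np.array([1.0, 0.0, 0.0])
fixed_p = np.zeros(3)
mobile_p = np.array([20.0, 0.0, -5.0])
mobile_v = np.array([0.0, 1.0, 0.0])
tdoa, fdoa = tdoa_fdoa(src_p, src_v, fixed_p, mobile_p, mobile_v)
```

In the full pipeline, the UKF propagates the source state and compares predicted pairs like these against measured ones, with LC-MAP supplying the physically plausible initial state.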