
Labeling challenges for underwater robotics

November 3rd, 2023 - 3 min

Ocius Technology Ltd., an Australian firm committed to autonomous maritime surveillance, recently presented the newest addition to its line of AUVs (Autonomous Underwater Vehicles): the iDrogue.

Sarina, a robotics software engineer and the perception lead at Ocius at that time, shares her insights into the challenges and successes of developing an innovative underwater robotics system from scratch.

A novel approach to underwater robotics perception

In the project, the iDrogue team uses a 3D labeling platform for validation. Sarina is primarily responsible for developing the algorithms that detect and track a target in 3D.

“Specifically, the computer vision task is to automatically find a particular target object – of known dimensions and appearance – and obtain its position in all 6 degrees of freedom (x, y, z, roll, pitch, yaw).”
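To make the six degrees of freedom concrete, here is a minimal sketch of how such a pose could be represented. It is an illustration only, not Ocius's code: the `Pose6D` class, the metre/radian units, and the yaw-pitch-roll (Z-Y-X) rotation order are all assumptions, since the article does not state which conventions the team used.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose6D:
    """Hypothetical 6-DoF pose: position (assumed metres), angles (assumed radians)."""
    x: float
    y: float
    z: float
    roll: float   # rotation about the x axis
    pitch: float  # rotation about the y axis
    yaw: float    # rotation about the z axis

    def rotation_matrix(self):
        """Return the 3x3 matrix R = Rz(yaw) * Ry(pitch) * Rx(roll) (assumed order)."""
        cr, sr = math.cos(self.roll), math.sin(self.roll)
        cp, sp = math.cos(self.pitch), math.sin(self.pitch)
        cy, sy = math.cos(self.yaw), math.sin(self.yaw)
        return [
            [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
            [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
            [-sp, cp * sr, cp * cr],
        ]

    def transform(self, point):
        """Map a point from the target's own frame into the sensor frame."""
        r = self.rotation_matrix()
        px, py, pz = point
        return (
            r[0][0] * px + r[0][1] * py + r[0][2] * pz + self.x,
            r[1][0] * px + r[1][1] * py + r[1][2] * pz + self.y,
            r[2][0] * px + r[2][1] * py + r[2][2] * pz + self.z,
        )
```

With a pose like this and the known dimensions of the target, the corners of its bounding box in the sensor frame follow by calling `transform` on each corner of the box.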

The setup for this task:

  • a stereo camera, which outputs both a 2D RGB image and a 3D point cloud
  • a 2D sonar that outputs another point cloud.

The algorithms fuse information from all 3 sources to produce a final detection result. To evaluate and performance-test these algorithms, the team needed a way to compare the output of the automated detections against a ground truth.

This is where Ocius turned to a 3D labeling platform.

Challenges, solutions, and lessons learned in labeling underwater data

The working environment of the iDrogue unfortunately resulted in relatively low-resolution point clouds with a considerable amount of noise.

Labeling the ground truth in 3D was a hard task even for a human. These difficulties are the reason we were working with information fused from multiple sensors for our computer vision task.

Correspondingly, it would also take a human cross-referencing multiple sensor outputs to arrive at a ground truth label.

Thus, our vital requirement in choosing a platform was having the ability to simultaneously view a fusion of 2D image and 3D point cloud data.

Employing a labeling platform for vital validation

Once we found a suitable platform, our process was to generate datasets of all our combined sensor data, with the correct camera calibration information so that the images could be overlaid on the point cloud.

The fact that the platform could host the files themselves for upload into datasets made getting started quick and easy.

Once the data was uploaded (using the Python SDK), multiple members of the team joined the shared project to label the ground truth of our target in all 6 degrees of freedom.

Often, the noise and poor definition of the point clouds meant that labeling was primarily done off the camera image, which was only made possible by the perspective overlay of the 2D and 3D data.
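The overlay of 2D and 3D data relies on the camera calibration. As a rough sketch of the underlying idea, the function below projects 3D points onto the image plane with a simple pinhole model. It assumes the points are already expressed in the camera frame (z pointing forward) and ignores lens distortion; the function name and the intrinsic parameters `fx`, `fy`, `cx`, `cy` are illustrative and not taken from the team's setup or from any labeling platform.

```python
def project_points(points, fx, fy, cx, cy, width, height):
    """Project camera-frame 3D points to pixels with a pinhole model.

    Returns a list of (u, v, depth) for points that lie in front of the
    camera and fall inside the image bounds. Lens distortion is ignored.
    """
    pixels = []
    for x, y, z in points:
        if z <= 0:
            continue  # behind the camera (or on its plane): not visible
        u = fx * x / z + cx
        v = fy * y / z + cy
        if 0 <= u < width and 0 <= v < height:
            pixels.append((u, v, z))
    return pixels
```

Drawing the returned pixels on top of the RGB image, for instance colored by depth, gives a reviewer the image and the point cloud in a single view.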

Additionally, because each of our datasets was a single recording from a data collection run, we used the point cloud sequence format instead of just the raw point cloud format. Being able to take advantage of the platform's sequence features sped up the process considerably.

Since we were always labeling the same object of known dimensions, being able to pre-configure the bounding box size so that we never had to spend time resizing was very convenient. So was the way the bounding box would automatically propagate into the next frame, so that it only had to be adjusted or verified instead of recreated.
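The idea of a fixed-size box that carries over from frame to frame can be sketched in a few lines. This is not how any particular labeling tool implements it; the `Cuboid` class, the placeholder dimensions, and the `corrections` mapping (standing in for a human annotator's adjustments) are all hypothetical.

```python
from dataclasses import dataclass, replace

# Placeholder target size (length, width, height); the real values are not given.
TARGET_DIMENSIONS = (1.0, 0.5, 0.5)


@dataclass(frozen=True)
class Cuboid:
    frame: int
    position: tuple      # (x, y, z) of the box centre
    yaw: float
    dimensions: tuple = TARGET_DIMENSIONS
    verified: bool = False


def propagate(previous, frame):
    """Copy the previous frame's box as the starting guess for a new frame."""
    return replace(previous, frame=frame, verified=False)


def label_sequence(first, num_frames, corrections):
    """Label num_frames consecutive frames, starting from a hand-placed box.

    corrections maps a frame number to the (position, yaw) chosen by the
    annotator; frames without an entry keep the propagated box unchanged.
    """
    labels = [replace(first, verified=True)]
    for frame in range(first.frame + 1, first.frame + num_frames):
        box = propagate(labels[-1], frame)
        if frame in corrections:
            position, yaw = corrections[frame]
            box = replace(box, position=position, yaw=yaw)
        labels.append(replace(box, verified=True))
    return labels
```

Because the dimensions are never touched, the annotator only ever edits position and orientation, which is where the time saving comes from.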

After labeling, the results were exported in the platform's JSON format, and I wrote a script that parsed both the ground truth labels and the output of our system's automated detections to compare them and compute accuracy statistics.
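A comparison script of this kind might look roughly like the sketch below. The JSON layout (`frames`, `index`, `pose`) is invented for illustration and does not reflect the platform's actual export schema or Ocius's detection output, and the chosen statistics (detection rate, position error, yaw error) are just plausible examples.

```python
import json
import math
import statistics


def angle_diff(a, b):
    """Smallest signed difference a - b, wrapped to [-pi, pi)."""
    return (a - b + math.pi) % (2 * math.pi) - math.pi


def load_poses(json_text):
    """Parse a hypothetical layout: {"frames": [{"index": 0, "pose": {...}}]}."""
    data = json.loads(json_text)
    return {frame["index"]: frame["pose"] for frame in data["frames"]}


def compare(ground_truth, detections):
    """Compute simple accuracy statistics over frames present in both inputs."""
    pos_errors, yaw_errors = [], []
    for index, gt in ground_truth.items():
        det = detections.get(index)
        if det is None:
            continue  # the system produced no detection for this labeled frame
        pos_errors.append(math.dist(
            (gt["x"], gt["y"], gt["z"]), (det["x"], det["y"], det["z"])))
        yaw_errors.append(abs(angle_diff(det["yaw"], gt["yaw"])))
    matched = len(pos_errors)
    return {
        "frames_labeled": len(ground_truth),
        "frames_detected": matched,
        "detection_rate": matched / len(ground_truth) if ground_truth else 0.0,
        "mean_position_error": statistics.fmean(pos_errors) if matched else None,
        "max_position_error": max(pos_errors) if matched else None,
        "mean_yaw_error": statistics.fmean(yaw_errors) if matched else None,
    }
```

Wrapping the angular difference matters here: a label of 3.1 rad and a detection of -3.1 rad are only about 0.08 rad apart, not 6.2 rad.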

One of our biggest challenges was the difficulty in collecting data

As an early-stage project, we were defining our own workflow without any pre-existing frameworks to help, either in terms of process or in terms of physical equipment.

In the absence of precision testing rigs that could move our target through a concretely defined ground truth, we found ways to work with what we had.

Even though we lacked a way to determine the ground truth without some level of uncertainty, which made it difficult to compute an exact value for accuracy, we could ensure that this uncertainty would at least be consistent.

Thus, in comparing datasets, their relative performance would still be a meaningful measure, even if their absolute performance was not.

Having a way to display all available data to a human reviewer, with as many helpful aids as possible, was one of our tools in doing this.

About Sarina Hao

Sarina leveraged her background in mechatronic engineering and her experience in industrial automation, LIDAR, and sonar technology, navigating a flourishing Sydney robotics startup scene to take on a project that aimed to bring a preliminary concept of underwater robotic perception to life. This marked her full-fledged dive into a career in robotics, and it also spotlighted the importance of experimentation and adaptation within the rapidly evolving robotics sector.

Follow Sarina on LinkedIn.