By Bert De Brabandere on August 5th, 2020
A large dataset of labeled images is the first thing you need in any serious computer vision project. Building such datasets is a time-consuming endeavour, involving lots of manual labeling work. This is especially true for tasks like image segmentation where the labels need to be very precise.
One way to drastically speed up image labeling is by leveraging your machine learning models from the start. Instead of labeling the entire dataset manually, you can use your model to help you by iterating between image labeling and model training.
This tutorial will show you how to achieve such a fast labeling workflow for image segmentation with Segments.ai.
Segments.ai is a labeling platform with powerful automation tools for image segmentation. It also features a flexible API and Python SDK, which enable you to quickly set up custom workflows by uploading images and labels directly from your code.
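We will walk you through a simple but efficient setup: upload your images, label a small subset, train a segmentation model on that subset, use the model to generate label predictions for the remaining images, and correct those predictions.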
You can find all the code for this tutorial on GitHub, or follow along on Google Colab.
First, we need some images to label.
If you have a folder of images on your computer, you can simply upload them to Segments.ai through the web interface: first create a new dataset, then upload the samples.
But let’s assume your data is in the cloud, and all you have is a list of image URLs. In this case, you can upload them to Segments.ai using our API or Python SDK. You need an API key for this, which can be created on your account page.
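One way to keep the API key out of your source code is to read it from an environment variable. The sketch below assumes a variable named `SEGMENTS_API_KEY`; that name and the `load_api_key` helper are our own choices for illustration, not something the SDK requires.

```python
import os


def load_api_key(env=None, var_name="SEGMENTS_API_KEY"):
    """Return the API key stored in an environment variable.

    SEGMENTS_API_KEY is a hypothetical variable name chosen for this example.
    """
    env = os.environ if env is None else env
    api_key = env.get(var_name, "").strip()
    if not api_key:
        raise RuntimeError(
            "Set the {} environment variable to your Segments.ai API key.".format(var_name)
        )
    return api_key
```

You could then pass the result to the client instead of a hard-coded string, for example `SegmentsClient(load_api_key())`.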
In this tutorial, our goal is to label a dataset of 100 tomato images. First, we upload the images using the Python SDK:
from segments import SegmentsClient  # Install this package with 'pip install segments-ai'
from utils import get_image_urls

# Set up the client
client = SegmentsClient('YOUR_API_KEY')
dataset_name = 'bert/tomatoes'  # Name of a dataset you've created on Segments.ai

# Get a list of image URLs
image_urls = get_image_urls('tomatoes')

# Upload the images to Segments.ai
for i, url in enumerate(image_urls):
    sample_name = 'image_{:05}.jpg'.format(i)
    attributes = {
        "image": {"url": url}
    }
    result = client.add_sample(dataset_name, sample_name, attributes)
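If you would like to inspect the sample names and URLs before sending any requests, you can build the payloads separately from the upload itself. The following is a minimal sketch; `build_samples` is a hypothetical helper (not part of the SDK) that produces the same names and attributes as the upload loop, and the URLs are placeholders.

```python
def build_samples(image_urls, prefix="image", extension="jpg"):
    """Return (sample_name, attributes) pairs in the shape used by the upload loop."""
    samples = []
    for i, url in enumerate(image_urls):
        # Zero-padded index, e.g. image_00000.jpg, image_00001.jpg, ...
        sample_name = "{}_{:05}.{}".format(prefix, i, extension)
        attributes = {"image": {"url": url}}
        samples.append((sample_name, attributes))
    return samples


# Placeholder URLs for illustration only
for name, attributes in build_samples(
    ["https://example.com/a.jpg", "https://example.com/b.jpg"]
):
    print(name, attributes["image"]["url"])
```

Each pair can then be passed to `client.add_sample(dataset_name, sample_name, attributes)` exactly as in the loop above.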
Once the images are uploaded, click the “Start labeling” button on the samples tab of your dataset and get to work! Rather than immediately labeling the entire dataset, let’s start out by labeling around 20 images.
Segments.ai’s superpixel tool, powered by deep learning, makes the labeling a breeze.
After you’ve labeled a few images, go to the releases tab of your dataset and create a new release, for example with the name “v0.1”. A release is a snapshot of your dataset at a particular point in time.
Through the Python SDK, we can now initialize a SegmentsDataset from this release and visualize the labeled images. The SegmentsDataset is compatible with popular frameworks like PyTorch, TensorFlow and Keras.
from segments import SegmentsDataset
from utils import visualize, train_model

# Initialize a dataset from the release file
release = client.get_release(dataset_name, 'v0.1')
dataset = SegmentsDataset(release, task='segmentation', filter_by='labeled')

# Visualize a few samples in the dataset
for sample in dataset:
    print(sample['name'])
    visualize(sample['image'], sample['segmentation_bitmap'])
Next, let’s train a computer vision model on the labeled images. Here we use Facebook’s Detectron2 framework to train the model, but you can just as easily plug in your own custom models and training code.
# Train an instance segmentation model on the dataset
from utils import train_model
model = train_model(dataset)
When the model is trained, we can run it on the unlabeled images to generate label predictions, and upload these predictions to Segments.ai:
from segments.utils import bitmap2file

# Initialize a new dataset, this time containing only unlabeled images
dataset = SegmentsDataset(release, task='segmentation', filter_by='unlabeled')

for sample in dataset:
    # Generate label predictions
    image = sample['image']
    segmentation_bitmap, annotations = model(image)

    # Visualize the predictions
    visualize(image, segmentation_bitmap)
    print(annotations)

    # Upload the predictions to Segments.ai
    file = bitmap2file(segmentation_bitmap)
    asset = client.upload_asset(file, 'label.png')
    attributes = {
        'format_version': '0.1',
        'annotations': annotations,
        'segmentation_bitmap': { 'url': asset.url },
    }
    client.add_label(sample['uuid'], 'segmentation', attributes, label_status='PRELABELED')
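To give an idea of what the model wrapper might return, here is a rough sketch of how per-instance masks could be combined into a single bitmap and a list of annotations. It assumes the label stores one instance id per pixel (0 for background) and one annotation entry per instance with an `id` and a `category_id`; check the Segments.ai documentation for the exact format. `masks_to_label` is a hypothetical helper, and plain nested lists are used only to keep the example free of dependencies.

```python
def masks_to_label(masks, category_ids, height, width):
    """Combine per-instance binary masks into an instance-id bitmap plus annotations.

    masks: list of 2D lists (height x width), truthy inside each instance.
    category_ids: one category id per mask.
    Assumed layout: pixel value 0 is background, instance ids start at 1.
    """
    if len(masks) != len(category_ids):
        raise ValueError("masks and category_ids must have the same length")

    bitmap = [[0] * width for _ in range(height)]
    annotations = []
    for index, (mask, category_id) in enumerate(zip(masks, category_ids)):
        instance_id = index + 1
        for y in range(height):
            for x in range(width):
                # Where masks overlap, the earlier instance keeps the pixel
                if mask[y][x] and bitmap[y][x] == 0:
                    bitmap[y][x] = instance_id
        annotations.append({"id": instance_id, "category_id": category_id})
    return bitmap, annotations
```

In practice you would most likely work with NumPy arrays rather than nested lists, and order the masks by model confidence so that the most certain instance wins where predictions overlap.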
Now go back to Segments.ai and click the “Start labeling” button again to continue labeling. This time, your job is quite a bit easier: instead of having to label each image from scratch, you can simply correct the few mistakes your model made!
The superpixel technology makes it very easy to correct the mistakes, and is a real time-saver here.
As you keep iterating between model training and labeling in this manner, your model will quickly get better and better. You’ll reach a point where you’re mostly just verifying the model’s predictions and only correcting the occasional mistake on hard edge cases.
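The overall loop can be summarized in a few lines of code. This is only a schematic sketch: `run_labeling_loop` and its `annotate`, `train` and `predict` callables are hypothetical stand-ins for the manual labeling step, your training code and your model's inference, not functions from the SDK.

```python
def run_labeling_loop(sample_ids, batch_size, annotate, train, predict):
    """Alternate between labeling a batch and retraining until everything is labeled.

    annotate(sample_id, prelabel) -> verified label (prelabel is None in the first round)
    train(labeled) -> model trained on all verified labels so far
    predict(model, sample_id) -> predicted label for an unlabeled sample
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    labeled = {}    # sample id -> verified label
    prelabels = {}  # sample id -> model prediction awaiting review
    remaining = list(sample_ids)
    rounds = 0
    while remaining:
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        for sample_id in batch:
            # The annotator starts from the model's prediction when one exists
            labeled[sample_id] = annotate(sample_id, prelabels.pop(sample_id, None))
        model = train(labeled)
        # Refresh the predictions for everything that is still unlabeled
        prelabels = {sample_id: predict(model, sample_id) for sample_id in remaining}
        rounds += 1
    return labeled, rounds
```

With 100 images and a batch size of 20, this would amount to five rounds, and every round after the first starts from model predictions instead of blank images.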
Was this useful for you? Let us know! Make sure to check out the Segments.ai documentation and don’t hesitate to contact us if you have any questions.