Image to Point Cloud with Point-E

By Arnout Hillen on January 18th, 2023

Just before Christmas, OpenAI published its Point-E model. During the holidays, we took it for a spin and experimented with how easy it would be to integrate the model into a Segments.ai workflow. Read along, or try it out yourself.

Colab notebook: Open In Colab

Six image-point cloud pairs. The point clouds are generated by Point-E. Image taken from the Point-E paper.

🤖 Point-E

Point-E is a deep learning model created by OpenAI that transforms a text caption into a colored point cloud. More specifically, Point-E consists of three steps, each handled by a dedicated ML model:

  1. Generate an image conditioned on a text caption
  2. Create a point cloud (1024 points) conditioned on the image
  3. Upsample the point cloud (to 4096 points) conditioned on the image and low-resolution point cloud

In this experiment, we skip the first step and create a point cloud directly from an existing image, which typically results in higher-quality point clouds.

Let’s zoom in on step 2. Point-E uses a so-called diffusion model to generate point clouds. Intuitively, this model was trained to gradually remove noise from a point cloud. By starting from an input that is pure noise and repeatedly feeding the model’s output back in as its input, we eventually end up with a clean point cloud. The model therefore takes three inputs:

  • A noisy point cloud
  • A time step to keep track of how far we are in the diffusion process
  • A vector representation of an image, on which the denoising process is conditioned. In this case, the vector representation is a CLIP embedding. CLIP is an OpenAI model trained on text-image pairs and used here to create a meaningful vector representation of an image.
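
The loop described above can be sketched in a few lines. This is a toy illustration only, not Point-E's sampler: `toy_denoiser` is a hypothetical stand-in for the trained network, and the image embedding is reduced to a single number (the radius of a sphere the points should end up on).

```python
import math
import random


def toy_denoiser(points, t, radius):
    """Hypothetical stand-in for the trained network.

    Moves every point a fraction of the way toward a sphere of the given
    radius. `radius` plays the role of the image embedding (the condition),
    and `t` is the time step: the closer t is to 0, the larger the step.
    """
    step = 1.0 / (t + 1)  # t = 0 gives step = 1, so points land on the sphere
    denoised = []
    for x, y, z in points:
        norm = math.sqrt(x * x + y * y + z * z) or 1e-8  # avoid division by zero
        scale = radius / norm
        target = (x * scale, y * scale, z * scale)
        denoised.append(
            tuple(p + step * (q - p) for p, q in zip((x, y, z), target))
        )
    return denoised


def sample(num_points=1024, num_steps=64, radius=0.5, seed=0):
    """Start from pure Gaussian noise and repeatedly feed the output back in."""
    rng = random.Random(seed)
    points = [
        tuple(rng.gauss(0.0, 1.0) for _ in range(3)) for _ in range(num_points)
    ]
    for t in reversed(range(num_steps)):  # t = num_steps - 1, ..., 0
        points = toy_denoiser(points, t, radius)
    return points


cloud = sample()
```

The real model is a neural network that was trained to do the denoising; the sketch only shows the control flow: noise in, the same three inputs at every step, and the output of one step becoming the input of the next.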

To train the model, the authors created several million image-point cloud pairs by rendering 3D models in Blender (i.e., producing a 2D image of each 3D model).

Input and output of the Point-E diffusion model (image-to-point cloud step). A CLIP embedding, a timestep, and a noisy point cloud are the inputs. A denoised point cloud is the output. Image taken from the Point-E paper.

Create a point cloud

Let’s start. First, we need to install the Point-E repo and import the models (i.e., an image-to-point cloud and an upsampler model).

!pip install git+https://github.com/openai/point-e -q
from PIL import Image
import torch
from tqdm.auto import tqdm

from point_e.diffusion.configs import DIFFUSION_CONFIGS, diffusion_from_config
from point_e.diffusion.sampler import PointCloudSampler
from point_e.models.download import load_checkpoint
from point_e.models.configs import MODEL_CONFIGS, model_from_config
from point_e.util.plotting import plot_point_cloud

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

print("Creating base model")
base_name = "base300M"  # Use base1B for better results
base_model = model_from_config(MODEL_CONFIGS[base_name], device)
base_model.eval()
base_diffusion = diffusion_from_config(DIFFUSION_CONFIGS[base_name])

print("Creating upsample model")
upsampler_model = model_from_config(MODEL_CONFIGS["upsample"], device)
upsampler_model.eval()
upsampler_diffusion = diffusion_from_config(DIFFUSION_CONFIGS["upsample"])

print("Downloading base checkpoint")
base_model.load_state_dict(load_checkpoint(base_name, device))

print("Downloading upsampler checkpoint")
upsampler_model.load_state_dict(load_checkpoint("upsample", device))

# Combine the image-to-point cloud and upsampler model
sampler = PointCloudSampler(
    device=device,
    models=[base_model, upsampler_model],
    diffusions=[base_diffusion, upsampler_diffusion],
    num_points=[1024, 4096 - 1024],
    aux_channels=["R", "G", "B"],
    guidance_scale=[3.0, 3.0],
)
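
A note on `num_points`: as far as we understand the sampler, each entry is the number of points that stage generates, so the second entry is the number of points the upsampler adds rather than the final total. The arithmetic, with illustrative variable names that are not part of Point-E:

```python
# Illustrative names only; these variables are not part of the Point-E API.
base_points = 1024    # points produced by the image-to-point cloud model
target_points = 4096  # size of the final, upsampled point cloud

# The upsampler only has to generate the points that are still missing.
extra_points = target_points - base_points
num_points = [base_points, extra_points]

print(num_points)  # [1024, 3072]
```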

Choose an image and paste its path here.

# Load an image to condition on
img_path = "<IMG_PATH>" # Fill in your image path
img = Image.open(img_path)

Christmas tree. Image taken from freepik.com.

Create a point cloud.

# Produce a sample from the model (this takes around 3 minutes on base300M)
samples = None
for x in tqdm(
    sampler.sample_batch_progressive(batch_size=1, model_kwargs=dict(images=[img]))
):
    samples = x
pc = sampler.output_to_point_clouds(samples)[0]
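
The loop above keeps overwriting `samples` because `sample_batch_progressive` yields intermediate results while it runs, and only the last one is the finished sample. Here is the pattern in isolation, with a hypothetical `fake_progressive_sampler` generator standing in for the real sampler:

```python
def fake_progressive_sampler(num_steps):
    """Hypothetical stand-in: yields one intermediate result per step."""
    for step in range(1, num_steps + 1):
        yield f"state after step {step}"


last = None
for state in fake_progressive_sampler(num_steps=5):
    last = state  # overwrite until the generator is exhausted

print(last)  # state after step 5
```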

Let’s look at the point cloud before we upload it to Segments.ai.

!pip install plotly -q
import plotly.express as px

def rgb_to_hex(r: float, g: float, b: float) -> str:
    """Map a [0, 1] RGB to a [0, 255] RGB hex string"""
    return ("#{:02x}{:02x}{:02x}").format(int(r * 255), int(g * 255), int(b * 255))

x, y, z = pc.coords[:, 0], pc.coords[:, 1], pc.coords[:, 2]
colors = [
    rgb_to_hex(r, g, b)
    for r, g, b in zip(pc.channels["R"], pc.channels["G"], pc.channels["B"])
]  # Create a category per color
color_map = {c: c for c in colors}  # Display each color category in its own color
fig = px.scatter_3d(x=x, y=y, z=z, color=colors, color_discrete_map=color_map)
fig.update_traces(showlegend=False)
fig.show()

Figure: the Christmas tree 3D point cloud, plotted with Plotly.
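
To make the color conversion concrete, here is the same mapping as `rgb_to_hex` with a clamp added. The clamp and the name `rgb_to_hex_clamped` are our own additions: an extra guard in case a channel value ever falls slightly outside [0, 1], not something the original code requires.

```python
def rgb_to_hex_clamped(r: float, g: float, b: float) -> str:
    """Map a [0, 1] RGB triple to a hex string, clamping out-of-range values."""

    def to_byte(value: float) -> int:
        # Clamp to [0, 1] first, then scale to [0, 255] and truncate.
        return int(min(max(value, 0.0), 1.0) * 255)

    return "#{:02x}{:02x}{:02x}".format(to_byte(r), to_byte(g), to_byte(b))


print(rgb_to_hex_clamped(1.0, 0.0, 0.5))   # #ff007f
print(rgb_to_hex_clamped(1.2, -0.1, 0.5))  # #ff007f
```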

Upload a point cloud to Segments.ai

To be able to store, manage, and manipulate our point cloud, let’s upload it to Segments.ai. We will upload it to a point cloud segmentation dataset so that you can label individual points.

Create an account

If you don’t yet have an account on Segments.ai, you can create a free account for data labeling here (don’t worry, we don’t ask for your credit card 😉).

username = "<USERNAME>" # Fill in your Segments username

Create a dataset

Install the Segments.ai Python SDK.

!pip install segments-ai -q

Copy your API key from the settings page and create a dataset.

from segments import SegmentsClient

api_key = "<API_KEY>"  # Fill in your API key
client = SegmentsClient(api_key)

dataset_name = "image-to-pointcloud-with-openai-point-e"
description = "A dataset to upload point clouds made with OpenAI's Point-E model."
task_type = "pointcloud-segmentation"
dataset = client.add_dataset(dataset_name, description, task_type)
print("Dataset:", dataset)

Upload a point cloud

To upload our point cloud, we use a helper function, upload_pcd_to_segments (check out the Colab notebook if you’re interested in how it is implemented 🙂). It takes the point cloud (positions and colors), a sample name, and a dataset name, and uploads the point cloud to the dataset.

positions = pc.coords
rgb = [
    [r, g, b] for r, g, b in zip(pc.channels["R"], pc.channels["G"], pc.channels["B"])
]
dataset_id = f"{username}/{dataset_name}"
name = "sample_point_cloud"  # Fill in a unique sample name if you want to upload more point clouds
upload_pcd_to_segments(client, dataset_id, positions, name, rgb=rgb)
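
We don't reproduce `upload_pcd_to_segments` here. One ingredient such a helper plausibly needs is serializing the points to a file before uploading. As a sketch under that assumption, the hypothetical `to_ascii_pcd` below writes positions and [0, 1] colors as an ASCII PCD (Point Cloud Data) string, packing each color into a single integer. It is not the notebook's implementation, PCD readers differ in whether they expect the packed color as an unsigned integer or as a float, and the formats Segments.ai accepts should be checked in its documentation.

```python
def to_ascii_pcd(positions, rgb):
    """Serialize points to an ASCII PCD string (hypothetical helper).

    positions: iterable of (x, y, z) floats
    rgb: iterable of (r, g, b) floats in [0, 1], one per point
    """
    positions = [tuple(float(v) for v in p) for p in positions]
    rgb = [tuple(float(v) for v in c) for c in rgb]
    if len(positions) != len(rgb):
        raise ValueError("positions and rgb must have the same length")

    n = len(positions)
    header = [
        "VERSION 0.7",
        "FIELDS x y z rgb",
        "SIZE 4 4 4 4",
        "TYPE F F F U",  # assumption: color stored as an unsigned 32-bit int
        "COUNT 1 1 1 1",
        f"WIDTH {n}",
        "HEIGHT 1",
        "VIEWPOINT 0 0 0 1 0 0 0",
        f"POINTS {n}",
        "DATA ascii",
    ]
    rows = []
    for (x, y, z), (r, g, b) in zip(positions, rgb):
        # Scale each channel to 0..255 and pack as 0xRRGGBB.
        r8, g8, b8 = (int(min(max(c, 0.0), 1.0) * 255) for c in (r, g, b))
        packed = (r8 << 16) | (g8 << 8) | b8
        rows.append(f"{x:.6f} {y:.6f} {z:.6f} {packed}")
    return "\n".join(header + rows) + "\n"


pcd_text = to_ascii_pcd([(0.0, 0.5, -0.5)], [(1.0, 0.0, 0.0)])
```

With the point cloud from earlier, `to_ascii_pcd(positions, rgb)` would return the file contents as a string, which could then be written to disk or uploaded.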

That’s it. Now you can use your point cloud on Segments.ai. We’re curious to see what point clouds you will make! If you have any questions, feel free to reach out at arnout@segments.ai or support@segments.ai.

A 3D Christmas tree point cloud data annotation on Segments.ai.
