By Arnaud Hillen on January 18th, 2023
Just before Christmas, OpenAI published its Point-E model. During the holidays, we took it for a spin and experimented with how easy it would be to integrate the model into a Segments.ai workflow. Read along, or give it a run for its money yourself.
Point-E is a deep learning model created by OpenAI that transforms a text caption into a colored point cloud. More specifically, Point-E consists of three steps, each handled by a dedicated ML model:
1. A text-to-image diffusion model generates a synthetic rendered view from the text caption.
2. An image-to-point-cloud diffusion model produces a coarse point cloud (1,024 points) conditioned on that image.
3. An upsampler diffusion model refines the coarse point cloud into a denser one (4,096 points).
In this experiment we skip the first step and instead create a point cloud based on an image. This process typically results in higher-quality point clouds.
Let’s zoom in on step 2. Point-E uses a so-called diffusion model to generate point clouds. Intuitively, this model was trained to gradually remove noise from a point cloud. By initially giving it an input that is pure noise and repeatedly feeding its outputs back in as inputs, we eventually end up with a clean point cloud. So the model takes in three inputs:
- the image to condition on,
- the noisy point cloud from the previous step,
- the current diffusion timestep.
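To make that loop concrete, here is a minimal, purely illustrative sketch of a reverse-diffusion sampling loop. The denoise_model and its signature are hypothetical stand-ins for exposition, not Point-E’s actual API:
import torch
# Illustrative sketch of reverse diffusion: denoise_model is a hypothetical
# stand-in for an image-conditioned point cloud diffusion model.
def sample_point_cloud(denoise_model, image, num_points=1024, steps=64):
    # Start from pure Gaussian noise: one (x, y, z) coordinate per point
    pc = torch.randn(num_points, 3)
    for t in reversed(range(steps)):
        # At each step, the model removes a little noise, conditioned on
        # the image and the current timestep
        pc = denoise_model(pc, timestep=t, condition=image)
    return pc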
To train the model, the authors created several million image–point cloud pairs by rendering 3D models in Blender (i.e., producing 2D images from the 3D models).
Let’s start. First, we need to install the Point-E repo and import the models (i.e., an image-to-point-cloud model and an upsampler model).
!pip install git+https://github.com/openai/point-e -q
from PIL import Image
import torch
from tqdm.auto import tqdm
from point_e.diffusion.configs import DIFFUSION_CONFIGS, diffusion_from_config
from point_e.diffusion.sampler import PointCloudSampler
from point_e.models.download import load_checkpoint
from point_e.models.configs import MODEL_CONFIGS, model_from_config
from point_e.util.plotting import plot_point_cloud
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Creating base model")
base_name = "base300M" # Use base1B for better results
base_model = model_from_config(MODEL_CONFIGS[base_name], device)
base_model.eval()
base_diffusion = diffusion_from_config(DIFFUSION_CONFIGS[base_name])
print("Creating upsample model")
upsampler_model = model_from_config(MODEL_CONFIGS["upsample"], device)
upsampler_model.eval()
upsampler_diffusion = diffusion_from_config(DIFFUSION_CONFIGS["upsample"])
print("Downloading base checkpoint")
base_model.load_state_dict(load_checkpoint(base_name, device))
print("Downloading upsampler checkpoint")
upsampler_model.load_state_dict(load_checkpoint("upsample", device))
# Combine the image-to-point cloud and upsampler model
sampler = PointCloudSampler(
    device=device,
    models=[base_model, upsampler_model],
    diffusions=[base_diffusion, upsampler_diffusion],
    num_points=[1024, 4096 - 1024],
    aux_channels=["R", "G", "B"],
    guidance_scale=[3.0, 3.0],
)
Choose an image and paste its path here.
# Load an image to condition on
img_path = "<IMG_PATH>" # Fill in your image path
img = Image.open(img_path)
Create a point cloud.
# Produce a sample from the model (this takes around 3 minutes on base300M)
samples = None
for x in tqdm(
    sampler.sample_batch_progressive(batch_size=1, model_kwargs=dict(images=[img]))
):
    samples = x
pc = sampler.output_to_point_clouds(samples)[0]
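Before plotting, it can help to quickly inspect the structure of the returned point cloud. The shapes below follow from the sampler configuration above (1,024 base points upsampled to 4,096):
# Quick sanity check on the point cloud structure
print(pc.coords.shape)     # (4096, 3): one (x, y, z) position per point
print(pc.channels.keys())  # dict_keys(['R', 'G', 'B']): per-point colors in [0, 1]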
Let’s look at the point cloud before we upload it to Segments.ai.
!pip install plotly -q
import plotly.express as px
def rgb_to_hex(r: float, g: float, b: float) -> str:
    """Map a [0, 1] RGB triplet to a [0, 255] RGB hex string"""
    return "#{:02x}{:02x}{:02x}".format(int(r * 255), int(g * 255), int(b * 255))
x, y, z = pc.coords[:, 0], pc.coords[:, 1], pc.coords[:, 2]
colors = [
    rgb_to_hex(r, g, b)
    for r, g, b in zip(pc.channels["R"], pc.channels["G"], pc.channels["B"])
]  # Create a category per color
color_map = {color: color for color in colors}  # Map each color to its own category
fig = px.scatter_3d(x=x, y=y, z=z, color=colors, color_discrete_map=color_map)
fig.update_traces(showlegend=False)
fig.show()
To store, manage, and manipulate our point cloud, let’s upload it to Segments.ai. We will upload it to a point cloud segmentation dataset so that you can label individual points.
If you don’t yet have an account on Segments.ai, you can create a free account for data labeling here (don’t worry, we don’t ask for your credit card 😉).
username = "<USERNAME>" # Fill in your Segments username
Install the Segments.ai Python SDK.
!pip install segments-ai -q
Copy your API key from the settings page and create a dataset.
from segments import SegmentsClient
api_key = "<API_KEY>" # Fill in your API key
client = SegmentsClient(api_key)
dataset_name = "image-to-pointcloud-with-openai-point-e"
description = "A dataset to upload point clouds made with OpenAI's Point-E model."
task_type = "pointcloud-segmentation"
dataset = client.add_dataset(dataset_name, description, task_type)
print("Dataset:", dataset)
To upload our point cloud, we use a function upload_pcd_to_segments (you can check out the Colab notebook if you’re interested in implementing this function 🙂). It takes the point cloud (positions and colors), a sample name, and a dataset name, and uploads the point cloud to the dataset.
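For reference, here is a rough sketch of what such a function could look like: it serializes the points and packed colors to an in-memory ASCII .pcd file and uploads it via the SDK’s upload_asset and add_sample methods. The actual implementation in the notebook may differ:
import io
def upload_pcd_to_segments(client, dataset_id, positions, name, rgb):
    """Sketch: serialize the point cloud to an ASCII .pcd file and
    upload it as a sample to a Segments.ai dataset."""
    header = [
        "VERSION .7",
        "FIELDS x y z rgb",
        "SIZE 4 4 4 4",
        "TYPE F F F U",
        "COUNT 1 1 1 1",
        f"WIDTH {len(positions)}",
        "HEIGHT 1",
        "VIEWPOINT 0 0 0 1 0 0 0",
        f"POINTS {len(positions)}",
        "DATA ascii",
    ]
    points = []
    for (x, y, z), (r, g, b) in zip(positions, rgb):
        # Pack the [0, 1] RGB floats into a single 24-bit integer,
        # as the PCD "rgb" field expects
        packed = (int(r * 255) << 16) | (int(g * 255) << 8) | int(b * 255)
        points.append(f"{x} {y} {z} {packed}")
    file = io.BytesIO("\n".join(header + points).encode("utf-8"))
    asset = client.upload_asset(file, filename=f"{name}.pcd")
    attributes = {"pcd": {"url": asset.url, "type": "pcd"}}
    client.add_sample(dataset_id, name, attributes)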
positions = pc.coords
rgb = [
    [r, g, b] for r, g, b in zip(pc.channels["R"], pc.channels["G"], pc.channels["B"])
]
dataset_id = f"{username}/{dataset_name}"
name = "sample_point_cloud" # Fill in a unique sample name if you want to upload more point clouds
upload_pcd_to_segments(client, dataset_id, positions, name, rgb=rgb)
That’s it! Now you can use your point cloud on Segments.ai. We’re curious to see what point clouds you’ll make. If you have any questions, feel free to reach out at arnaud@segments.ai or support@segments.ai.