
Image to point cloud with Point-E

January 18th, 2023 - 2 min read

Just before Christmas, OpenAI published its Point-E model. During the holidays, we took it for a spin and experimented with how easy it would be to integrate the model into a workflow. Read with me, or give it a run for its money yourself.

A Colab notebook accompanies this article, so you can run everything yourself.

🤖 Point-E

Point-E is a deep learning model created by OpenAI that transforms a text caption into a colored point cloud. More specifically, Point-E consists of three steps, each handled by a dedicated ML model:

  1. Generate an image conditioned on a text caption
  2. Create a point cloud (1024 points) conditioned on the image
  3. Upsample the point cloud (to 4096 points) conditioned on the image and low-resolution point cloud

In this experiment we skip the first step and instead create a point cloud based on an image. This process typically results in higher-quality point clouds.

Let’s zoom in on step 2. Point-E uses a so-called diffusion model to generate point clouds. Intuitively, this model was trained to gradually remove noise from a point cloud. By initially giving it an input that is pure noise and repeatedly feeding its outputs back in as inputs, we eventually end up with a clean point cloud. The model takes three inputs:

  • A noisy point cloud
  • A time step to keep track of how far we are in the diffusion process
  • A vector representation of an image, on which the denoising process is conditioned. In this case, the vector representation is a CLIP embedding. CLIP is an OpenAI model trained on text-image pairs and used here to create a meaningful vector representation of an image.
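To make the loop structure concrete, here is a toy sketch in plain NumPy (not Point-E’s actual sampler). The “noise prediction” below is an oracle that already knows the clean cloud, purely to illustrate how the output is fed back in step after step; in the real model, a transformer predicts the noise from the noisy cloud, the timestep, and the CLIP image embedding, and a bit of fresh noise is re-injected at each step.

```python
import numpy as np

rng = np.random.default_rng(42)
clean = rng.uniform(-1.0, 1.0, size=(1024, 3))  # the point cloud we want (oracle)

num_steps = 64
x = rng.normal(size=clean.shape)  # start from pure Gaussian noise

for t in range(num_steps, 0, -1):
    # A real diffusion model predicts the noise from (x, t, image embedding);
    # this toy version cheats and uses the offset from the clean cloud.
    predicted_noise = x - clean
    # Remove a fraction of the predicted noise and feed the result back in.
    x = x - predicted_noise / t

print(np.abs(x - clean).max())  # effectively zero: the noise is gone
```

The shrinking step size (1/t) mirrors how diffusion samplers remove less noise early on and converge at the final step.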

The authors trained the model on several million image-point cloud pairs, created by rendering 3D models in Blender (i.e., producing 2D images from each 3D model).

Input and output of the Point-E diffusion model (image-to-point cloud step). A CLIP embedding, timestep and point cloud with noise are input. A denoised point cloud is the output. Image taken from Point-E paper.

Create a point cloud


Choose an image and paste its path here.

Christmas tree – Image taken from freepik

Create a point cloud.


Let’s look at the point cloud before we upload it.
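The Point-E repository ships a `plot_point_cloud` helper in `point_e.util.plotting`; a generic matplotlib sketch of the same idea is shown below. It uses random stand-in arrays with the same shape as the Point-E output, so it runs without point-e installed.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # headless-safe; drop this line in a notebook
import matplotlib.pyplot as plt

# Stand-in data with the same layout as the Point-E output:
# 4096 xyz positions and matching RGB colors in [0, 1].
rng = np.random.default_rng(0)
positions = rng.normal(size=(4096, 3))
colors = rng.uniform(size=(4096, 3))

fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(projection='3d')
ax.scatter(positions[:, 0], positions[:, 1], positions[:, 2], c=colors, s=1)
ax.set_axis_off()
fig.savefig('point_cloud_preview.png', dpi=150)
```

For the real thing, substitute `pc.coords` and the stacked `pc.channels['R'/'G'/'B']` arrays from the sampler output.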

Point cloud of the Christmas tree, generated by Point-E

Upload the point cloud

To store, manage, and manipulate our point cloud, let’s upload it to a point cloud segmentation dataset, so that individual points can be labeled.

Create an account

If you don’t yet have an account, you can create a free account for data labeling (don’t worry, we don’t ask for your credit card 😉).


Create a dataset

Install the Python SDK.


Copy your API key from the settings page and create a dataset.


Upload a point cloud

To upload our point cloud, we use a function upload_pcd_to_segments (you can check out the colab notebook if you’re interested in implementing this function 🙂). It takes in the point cloud (positions and colors), sample name, and dataset name and uploads the point cloud to the dataset.
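To give a feel for its interface, here is a hypothetical sketch of `upload_pcd_to_segments`. Only the function name and its inputs come from this article; the body below is an illustration of the expected data layout, not the real implementation from the Colab notebook (which writes the points to a file and uploads it via the Python SDK).

```python
import numpy as np

def upload_pcd_to_segments(positions, colors, sample_name, dataset_name):
    """Hypothetical sketch of the upload helper from the Colab notebook.

    positions: (N, 3) float array of xyz coordinates
    colors:    (N, 3) float array of RGB values in [0, 1]

    The real helper uploads the cloud via the platform's Python SDK and adds
    it as a sample to the dataset; here we only build a per-point payload to
    show the expected data layout.
    """
    assert positions.shape == colors.shape and positions.shape[1] == 3
    points = [
        {"x": float(x), "y": float(y), "z": float(z),
         "r": float(r), "g": float(g), "b": float(b)}
        for (x, y, z), (r, g, b) in zip(positions, colors)
    ]
    return {"dataset": dataset_name, "sample": sample_name, "points": points}

payload = upload_pcd_to_segments(
    np.zeros((4096, 3)), np.ones((4096, 3)) * 0.5,
    sample_name="christmas-tree", dataset_name="point-e-experiments",
)
print(len(payload["points"]))  # 4096
```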

The Point-E Christmas tree and gifts, uploaded and ready for labeling

That’s it! We’re curious what point clouds you will make. If you have any questions, feel free to reach out.

Further reading

  • OpenAI’s Point-E
  • Diffusion models
  • OpenAI’s CLIP
  • Google’s Transformer
