Deploying PyTorch on AWS Lambda

April 14th, 2020

Deploying PyTorch models cost-efficiently in the cloud is not straightforward. GPU-accelerated servers can deliver results in real time, but they are expensive. CPU-only servers are cheaper, but they struggle with the computation-intensive nature of deep learning. Serverless functions like AWS Lambda are a good alternative: they make up for their slower performance by scaling out massively (up to 1,000 invocations in parallel), and you are only charged for the time they are used. However, PyTorch’s large codebase makes it hard to stay within Lambda’s size limits.

In this article, we show how to import the PyTorch library into your Lambda functions, using an image classifier as an example. In a follow-up article, we will discuss a faster, more lightweight alternative, converting PyTorch models to ONNX, and weigh the pros and cons.

Image classification with PyTorch on AWS Lambda

While PyTorch is a popular deep learning research library, it is not optimized for a production setting. At around 370 MB, its codebase exceeds AWS Lambda’s 250 MB size limit for the deployment package. In this section we cover the steps to work around this limit, at the expense of the function’s initialization time (the dreaded cold-start delay for which serverless functions are infamous). With the right “pre-warm” strategy, however, it might just work for you. Let’s begin!

Exporting the PyTorch model

The first step is to trace your PyTorch model into a TorchScript representation, which bundles the model definition and the weights into a compact, graph-like format. This example uses a pretrained ResNet-34 model, but the same steps apply to other models as well.

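A minimal sketch of such an export, assuming the `pretrained=True` argument of the TorchVision 0.5.0 API and a 224x224 RGB example input. The function name `export_resnet34` and the output file name `resnet34.pt` are chosen for illustration only.

```python
def export_resnet34(output_path="resnet34.pt"):
    """Trace a pretrained ResNet-34 and save it as a TorchScript file."""
    # Imported lazily so this module can be loaded without PyTorch installed.
    import torch
    import torchvision

    # pretrained=True matches the TorchVision 0.5.0 API used in this article.
    model = torchvision.models.resnet34(pretrained=True)
    model.eval()  # switch batch-norm layers to inference mode before tracing

    # Tracing records the operations executed on an example input.
    example_input = torch.rand(1, 3, 224, 224)
    traced_model = torch.jit.trace(model, example_input)
    traced_model.save(output_path)
    return output_path


# Usage (downloads the pretrained weights on the first run):
# export_resnet34("resnet34.pt")
```

Tracing only captures the code path taken by the example input, so a model with data-dependent control flow would need `torch.jit.script` instead.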

The exported model is then uploaded to an S3 bucket, so that it can be accessed from within the Lambda function.

Preparing the deployment package

To upload our codebase, Lambda requires a deployment package: a zip archive containing the Python code together with the necessary dependencies. We start by creating a new virtual environment and activating it.

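This step is normally done in a shell with `python3 -m venv venv` followed by `source venv/bin/activate`. As a rough Python equivalent, the sketch below creates the environment with the standard-library `venv` module; the function name `create_environment` and the directory name `venv` are assumptions.

```python
import venv
from pathlib import Path


def create_environment(env_dir="venv", with_pip=True):
    """Create a virtual environment, similar to `python3 -m venv venv`."""
    venv.EnvBuilder(with_pip=with_pip).create(env_dir)
    return Path(env_dir)


# Usage:
# create_environment("venv")
```

Activation only affects the current shell session, so it has no direct Python counterpart; from a script you would call the interpreter under `venv/bin/` explicitly.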

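Using a fresh environment makes it easier to include the right dependencies later on. We also create a deployment script and a folder “code” containing two Python files, one for the Lambda handler and one for the unzipping helper, which we will fill in later. The file structure then consists of the virtual environment, the deployment script, and the “code” folder with its two files.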

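To make the layout concrete, the following sketch creates such a skeleton. The file names `deploy.sh`, `code/main.py`, and `code/unzip_torch.py` are hypothetical placeholders chosen for illustration; only the folder name “code” comes from the text.

```python
from pathlib import Path

# Hypothetical file names, chosen for illustration only.
SKELETON = ["deploy.sh", "code/main.py", "code/unzip_torch.py"]


def create_skeleton(root="."):
    """Create the empty files of the project layout under `root`."""
    created = []
    for relative in SKELETON:
        path = Path(root) / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)  # leaves existing files untouched
        created.append(path)
    return created
```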

Next, we install PyTorch (v1.4.0) and TorchVision (v0.5.0) into the virtual environment.

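In a shell this is a plain `pip install` with pinned versions. The sketch below does the same from Python, assuming the environment lives in a directory called `venv`; the helper names are illustrative. Note that wheels for these versions are only published for older Python releases, such as the Python 3.6 used in this article.

```python
import subprocess


def pip_install_command(python_executable="venv/bin/python"):
    """Build the pip command that installs the pinned library versions."""
    return [
        python_executable, "-m", "pip", "install",
        "torch==1.4.0", "torchvision==0.5.0",
    ]


def install_dependencies(python_executable="venv/bin/python"):
    """Run pip with the environment's interpreter; raises if the install fails."""
    subprocess.check_call(pip_install_command(python_executable))
```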

Defining the Lambda function

In the handler file inside the “code” folder, we write the code that the Lambda function will execute. In this example we implement a basic image classifier that returns the class id of the requested image.


Unfortunately, the PyTorch codebase is too large (~370 MB) to fit within the 250 MB size limit of the deployment package. We therefore compress the library into a zip archive, which reduces its size to 120 MB. At runtime we unzip the library into the /tmp directory, which provides roughly 500 MB of additional storage.

The unzipping is defined in the separate helper file and called at the top of the handler file, so that it runs before PyTorch is imported.

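The unzip-on-startup idea can be sketched as follows. The function name `unzip_requirements`, the archive name `torch.zip`, and the target directory `/tmp/requirements` are assumptions for illustration, not necessarily the names used in the original code.

```python
import os
import sys
import zipfile


def unzip_requirements(archive_path, target_dir="/tmp/requirements"):
    """Extract zipped libraries to target_dir once and make them importable."""
    if not os.path.isdir(target_dir):
        # Extract to a staging directory first, so an interrupted extraction
        # is never mistaken for a complete one on the next invocation.
        staging_dir = target_dir + ".partial"
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(staging_dir)
        os.rename(staging_dir, target_dir)
    if target_dir not in sys.path:
        sys.path.insert(0, target_dir)
    return target_dir


# In the handler file, before `import torch`:
# unzip_requirements(os.path.join(os.path.dirname(__file__), "torch.zip"))
```

Because /tmp survives between warm invocations of the same container, the extraction cost is only paid on a cold start.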

Zipping up the deployment package

The Lambda deployment package consists of the Python files located in the code directory, as well as the required libraries, located under /venv/lib64/python3.6/site-packages/. As mentioned, we zip the torch library to adhere to the size limit. We also remove some unnecessary files to make the package a bit smaller. Next, we upload the deployment package to an S3 bucket and update our Lambda function accordingly. All of this is done in the deployment script.

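The packaging and upload steps can be sketched in Python as shown below. This is an illustration under assumptions rather than the original script: the helper names are invented, only bytecode caches are treated as unnecessary files, and the bucket, key, and function name are supplied by the caller. The boto3 calls used are `upload_file` on the S3 client and `update_function_code` on the Lambda client.

```python
import os
import zipfile


def zip_directory(source_dir, archive_path, arc_prefix=""):
    """Zip source_dir recursively, skipping bytecode caches."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder, dirnames, filenames in os.walk(source_dir):
            # Prune __pycache__ in place so os.walk does not descend into it.
            dirnames[:] = [d for d in dirnames if d != "__pycache__"]
            for filename in filenames:
                if filename.endswith(".pyc"):
                    continue
                full_path = os.path.join(folder, filename)
                relative = os.path.relpath(full_path, source_dir)
                archive.write(full_path, os.path.join(arc_prefix, relative))
    return archive_path


def deploy(package_path, bucket, key, function_name):
    """Upload the package to S3 and point the Lambda function at it."""
    import boto3  # imported lazily; only needed when actually deploying
    boto3.client("s3").upload_file(package_path, bucket, key)
    boto3.client("lambda").update_function_code(
        FunctionName=function_name, S3Bucket=bucket, S3Key=key
    )


# Usage (paths as in the text; the archive must live outside source_dir):
# zip_directory("venv/lib64/python3.6/site-packages/torch", "torch.zip",
#               arc_prefix="torch")
```

The `arc_prefix` keeps the top-level `torch/` folder inside the archive, so the library is importable as `torch` once it is extracted at runtime.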


Once the Lambda function is updated, we can test its performance. Here we distinguish between a “cold start”, where no cached container is available, and a “warm start”, where an initialized container is still waiting to be reused. Because our deployment package is quite large, it takes Lambda a long time to create our environment. In our example this initialization took ~30 seconds on average, which is far too slow for a production setting. A warm start, on the other hand, was quite fast, averaging around 670 ms in total, thanks to PyTorch and the model file already being cached in memory.


We managed to include PyTorch in our Lambda functions by adding it as a zipped archive to our deployment package and unzipping it on the fly. However, a cold start averaging around 30 seconds is not practical for production settings. In contrast, a warm start is quite fast (670 ms for ResNet-34) and can be achieved by applying a good “pre-warm” strategy. Hence, if really necessary, you can use PyTorch within your Lambda functions, at the expense of warm-up calls. In all other cases we suggest exporting your PyTorch model to a leaner format like ONNX, which we will discuss in a separate article (stay tuned!).