GPU Cloud Computing Services Compared: AWS, Google Cloud, IBM Nimbix/PowerAI, and Crestle

This technical article was written for The Data Incubator by Tim Pollio, a Fellow of our 2017 Fall cohort in Washington, DC who joined The Data Incubator team as one of our resident Data Scientist instructors.

At The Data Incubator, a data science training and placement company, we're excited about the potential for neural networks and deep learning to transform AI and Big Data. Of course, to run deep learning at a practical speed, ordinary CPUs won't suffice; you'll need GPUs. GPUs can dramatically speed up deep learning algorithms, so it's no surprise that they are becoming increasingly popular and accessible. Amazon, Google, and IBM all offer GPU-enabled options with their cloud computing services, and newer companies like Crestle provide additional options.

We tried four different services (Amazon Web Services, Google Cloud Platform, Nimbix/PowerAI, and Crestle) to find the options with the best performance, price, and convenience. Each service was tested with the same task: 1000 training steps on a TensorFlow neural network designed for text prediction. The code for this benchmark can be found here.

Performance doesn’t vary much, but Nimbix is a bit slower

Runtime was recorded every 50 steps and had very low variance. The bars on the plot above represent 99% confidence intervals for the 1000-step totals.
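The per-block timing and the confidence interval can be sketched as follows. This is a minimal illustration, not the benchmark's actual code: `train_step` stands in for one hypothetical TensorFlow training step, and the interval treats the twenty 50-step block times as independent samples, so the estimated total is 20 times the mean block time with a standard error of s·√n. The hard-coded t value (about 2.861) is only valid for exactly 20 blocks.

```python
import math
import statistics
import time
from typing import Callable, List, Tuple

# Two-sided 99% critical value of Student's t with 19 degrees of freedom
# (20 blocks of 50 steps, minus one). Hard-coded to avoid a SciPy dependency.
T_995_DF19 = 2.861


def time_in_blocks(step_fn: Callable[[], None], total_steps: int = 1000,
                   block: int = 50) -> List[float]:
    """Run step_fn total_steps times; return wall-clock seconds for each full block."""
    splits = []
    start = time.perf_counter()
    for step in range(1, total_steps + 1):
        step_fn()
        if step % block == 0:
            now = time.perf_counter()
            splits.append(now - start)
            start = now
    return splits


def total_time_ci(splits: List[float], t_crit: float = T_995_DF19) -> Tuple[float, float]:
    """Return (low, high) confidence bounds on the expected total runtime.

    Assumes block times are independent draws, so the total is n * mean and its
    standard error is n * s / sqrt(n) = s * sqrt(n).
    """
    n = len(splits)
    mean = statistics.mean(splits)
    s = statistics.stdev(splits)
    total = n * mean
    half_width = t_crit * s * math.sqrt(n)
    return total - half_width, total + half_width
```

With a real training step you would call `time_in_blocks(train_step)` and pass the result to `total_time_ci`; any steps after the last full 50-step block are not recorded.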

AWS, GCP, and Crestle all completed the test in about 3.2 minutes. It's no surprise that these results are so close, since all three services use the NVIDIA Tesla K80, which NVIDIA advertises as the world's most popular GPU. Nimbix uses a Tesla P100 GPU with NVLink and took about 15% longer, but that is still much faster than the 14.3-minute CPU-only run on the author's local machine.
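As a quick check of the arithmetic, using only the rounded figures quoted in the paragraph above: 15% longer than 3.2 minutes is roughly 3.7 minutes, and the CPU-only run is about 4.5 times slower than the K80 services and about 3.9 times slower than Nimbix.

```python
# Rounded figures from the text (minutes per 1000-step run).
K80_MINUTES = 3.2        # AWS, GCP, and Crestle
NIMBIX_SLOWDOWN = 0.15   # Nimbix took "about 15% longer"
CPU_MINUTES = 14.3       # author's local machine, no GPU

nimbix_minutes = K80_MINUTES * (1 + NIMBIX_SLOWDOWN)
print(f"Nimbix runtime: ~{nimbix_minutes:.2f} min")
print(f"K80 speedup over CPU: ~{CPU_MINUTES / K80_MINUTES:.1f}x")
print(f"Nimbix speedup over CPU: ~{CPU_MINUTES / nimbix_minutes:.1f}x")
```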

Nimbix has the lowest price, but Google has the best free trial

Nimbix is the cheapest long-term option at $0.43/hour, but GCP offers a $300 free credit that you can use over the course of a year. AWS is the most expensive option.

Adjusting for runtime doesn't change the rankings: even though it ran about 15% slower, IBM's Nimbix/PowerAI is still the cheapest.
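Cost per run is the hourly price multiplied by the runtime in hours. The sketch below applies this to Nimbix, using the $0.43/hour price and a runtime estimated from the rounded figures above (3.2 minutes plus about 15%), which comes to roughly 2.6 cents per 1000-step run. It assumes billing proportional to the time actually used; real billing granularity differs between providers.

```python
def cost_per_run(price_per_hour: float, runtime_minutes: float) -> float:
    """Dollar cost of one run, assuming you pay only for the minutes used."""
    return price_per_hour * runtime_minutes / 60.0


# Nimbix: $0.43/hour (from the text); runtime ~3.2 min * 1.15 (estimated).
nimbix_cost = cost_per_run(0.43, 3.2 * 1.15)
print(f"Nimbix: ~${nimbix_cost:.3f} per 1000-step run")
```

The same function can rank the other services once you plug in their hourly prices and the shared 3.2-minute K80 runtime.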

Crestle is amazingly convenient, Google is not

Crestle is effortless to set up and use. You'll have browser access to a GPU-enabled Jupyter server immediately after signing up. GPU drivers and TensorFlow come pre-installed, so there are no extra steps to slow you down. The 1-hour free trial doesn't require a credit card and can be split across multiple sessions.

Nimbix/PowerAI is slightly less convenient. Again, you'll have access to a Jupyter server with a deep learning framework pre-installed, but the initial setup requires SSH, which may be a turn-off for those who are less engineering-inclined.

Setup on AWS takes longer because there are many customization options. Free-tier users will also need to request a quota increase for EC2 instances. Using a Deep Learning AMI (Amazon Machine Image) lets you spin up a virtual machine with all of the standard software pre-installed. However, you won't have browser access to the instance; instead, you'll need to connect with an SSH client, using a key pair generated in the AWS console.
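For readers who prefer scripting to clicking through the console, the launch-and-connect steps can be sketched with boto3, the AWS SDK for Python. The AMI ID, key pair name, and region are hypothetical placeholders; the sketch also assumes your default security group allows inbound SSH and that the quota increase has been granted. The Deep Learning AMI (Ubuntu) logs in as the `ubuntu` user.

```python
from typing import List


def ssh_command(key_path: str, host: str, user: str = "ubuntu") -> List[str]:
    """Build the ssh argv for an Ubuntu-based instance."""
    return ["ssh", "-i", key_path, f"{user}@{host}"]


def launch_p2_xlarge(ami_id: str, key_name: str, region: str = "us-east-1") -> str:
    """Launch one p2.xlarge instance and return its public DNS name.

    ami_id and key_name are placeholders: look up the Deep Learning AMI (Ubuntu)
    ID for your region and use a key pair created in the EC2 console.
    """
    import boto3  # imported here so ssh_command works without boto3 installed

    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.run_instances(
        ImageId=ami_id,
        InstanceType="p2.xlarge",
        KeyName=key_name,
        MinCount=1,
        MaxCount=1,
    )
    instance_id = resp["Instances"][0]["InstanceId"]
    # Block until EC2 reports the instance as running.
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    desc = ec2.describe_instances(InstanceIds=[instance_id])
    return desc["Reservations"][0]["Instances"][0]["PublicDnsName"]


# Hypothetical usage (requires AWS credentials and incurs charges):
#   host = launch_p2_xlarge("ami-xxxxxxxx", "my-key")
#   subprocess.run(ssh_command("my-key.pem", host))
```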

Setup on GCP resembles AWS in that there are many customization options. New users will also need to request a quota increase for GPU access. Google Compute Engine lets you spin up a virtual machine, and while you can conveniently reach its command line through the GCP console, the image you get is generic: no deep learning framework and no GPU drivers. This deficiency may be tolerable for long projects, but it's a deal breaker if you're trying to move quickly. Installing everything yourself can be a chore, especially since the CUDA and cuDNN versions that each tensorflow-gpu release requires are a moving target.
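After installing drivers, CUDA, cuDNN, and tensorflow-gpu by hand, it is worth confirming that TensorFlow actually sees the GPU before starting a long job. A minimal check is sketched below, assuming a Python environment where TensorFlow may or may not be installed; it uses `tf.config.list_physical_devices` on TensorFlow 2.1 and later and falls back to `device_lib.list_local_devices` on older 1.x releases.

```python
from typing import List


def visible_gpu_names() -> List[str]:
    """Return names of GPUs TensorFlow can see, or [] if none (or no TensorFlow)."""
    try:
        import tensorflow as tf
    except ImportError:
        return []
    # TensorFlow 2.1+ exposes tf.config.list_physical_devices.
    if hasattr(tf, "config") and hasattr(tf.config, "list_physical_devices"):
        return [d.name for d in tf.config.list_physical_devices("GPU")]
    # Older 1.x releases, such as the tensorflow-gpu builds of the CUDA 8 era.
    from tensorflow.python.client import device_lib
    return [d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"]


gpus = visible_gpu_names()
print(gpus if gpus else "No GPU visible to TensorFlow; check driver, CUDA, and cuDNN versions.")
```

An empty list on a GPU machine usually points to a mismatch between the installed CUDA/cuDNN versions and the ones the TensorFlow build expects.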

Details

This section summarizes the hardware and software options used with each service.

Amazon Web Services (Amazon EC2)

Deep Learning AMI (Ubuntu) running on a p2.xlarge instance
4 vCPUs, 1 NVIDIA Tesla K80
Note: Needed to request a quota increase to change allowed EC2 instances from 0 to 1.

Google Cloud Platform (Compute Engine)

4 vCPUs, 16 GB RAM, 1 NVIDIA Tesla K80
Ubuntu 16.04 LTS, 50 GB disk
Manually installed CUDA 8.0, cuDNN 6.0, tensorflow-gpu
Note: Needed to request a quota increase to change the number of allowed GPUs from 0 to 1.
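The Compute Engine configuration above can be approximated from the command line with `gcloud compute instances create`. The sketch builds that command as a Python list, ready for `subprocess.run`; the instance name and zone are hypothetical, and because it mirrors the 2017 setup, the K80 accelerator type and the Ubuntu 16.04 image family may no longer be offered.

```python
import shlex
from typing import List


def gcloud_create_k80_vm(name: str, zone: str) -> List[str]:
    """Build a gcloud argv approximating the GCP benchmark machine."""
    return [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        "--custom-cpu=4",          # 4 vCPUs
        "--custom-memory=16GB",    # 16 GB RAM
        "--accelerator=type=nvidia-tesla-k80,count=1",
        # GPU VMs cannot live-migrate, so they must stop for host maintenance.
        "--maintenance-policy=TERMINATE",
        "--image-family=ubuntu-1604-lts",
        "--image-project=ubuntu-os-cloud",
        "--boot-disk-size=50GB",
    ]


# Hypothetical name and zone; print the shell-ready command.
print(shlex.join(gcloud_create_k80_vm("tf-k80-benchmark", "us-east1-d")))
```

Running the command still requires the GPU quota increase, and the drivers and TensorFlow must be installed afterwards.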

Crestle

1 NVIDIA Tesla K80

Nimbix/PowerAI

NVIDIA Tesla P100 GPU with NVLink
1 CPU core, 32GB memory
