Running R and Data Science in the Cloud

There are many advantages to moving your data analysis off your local machine and into the cloud. I’ve talked a little bit about them here. This post isn’t intended to convince you further; I’m hoping you’re already convinced. If you are, below is a complete tutorial on what I believe to be the quickest way to get up and running with R in the cloud.

 

Introducing R & RStudio Server

If you’re an R user, you might already be familiar with RStudio, a fantastic integrated development environment (IDE) for R that extends the R console with dynamic R scripts, data management tools, package management tools and a file explorer. RStudio Desktop lets you run RStudio locally. RStudio Server, on the other hand, makes the RStudio IDE available in a web browser, giving users the ability to scale their analyses and decrease compute time. How is that so? RStudio Server can be deployed to many cloud hosting platforms, which can scale up or down depending on user needs.

 

How do I get started?

To use R and RStudio in the cloud, we’ll need to get ourselves a server. Fortunately, the folks over at Digital Ocean make it super easy for us to do just that! Use my link to sign up and get a $10 credit toward your bill! Then we’ll need to install R and RStudio Server on that server, make a couple of configuration changes, and voilà! You’re now well on your way to cloud computing. Let’s start with the server.

 

Creating a cloud computing instance with Digital Ocean

Get a $10 credit to your account by signing up through this link!

Head on over to Digital Ocean, where you’ll be greeted with a nice Sign-Up screen. After filling in your email and password, you’ll need to confirm your email address, update your billing preferences, then create a droplet. When you get to the “create a droplet” stage, swing on back and continue the tutorial!

 

Choosing and creating a droplet

Digital Ocean has many options for creating droplets, from a meager 1 CPU/512 MB RAM up to 20 cores/64 GB RAM, so you can scale to match your computing needs. For the purpose of this tutorial, let’s choose the $10/mo option, which has 1 CPU and 1 GB RAM. R and RStudio will run on the smallest option, but I’ve found that 512 MB of memory isn’t enough to cover the overhead of installing packages and doing maintenance, and hacks are needed to make the system somewhat usable. I should note that I use the 2 CPU/2 GB RAM option for my own instance (the $20/mo one).
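To make the sizing decision a bit more concrete, here is a rough back-of-the-envelope sketch in Python. It relies on the fact that R stores numeric values as 8-byte doubles. Everything else is an assumption for illustration only: the headroom factor of 3 (to leave room for copies and package overhead), the function names, and the plan list, which only contains the sizes mentioned in this post and leaves out the 512 MB droplet for the reasons given above.

```python
# Droplet sizes mentioned in this post (RAM in MB). The 512 MB droplet is
# left out on purpose, since it struggles with package installs.
DROPLET_PLANS = [
    ("1 CPU / 1 GB ($10/mo)", 1024),
    ("2 CPU / 2 GB ($20/mo)", 2048),
    ("20 cores / 64 GB", 65536),
]

BYTES_PER_DOUBLE = 8  # R stores numeric values as 8-byte doubles
HEADROOM = 3          # assumption: room for copies and package overhead


def estimate_data_mb(rows, cols):
    """Rough in-memory size of an all-numeric table, in MB."""
    return rows * cols * BYTES_PER_DOUBLE / (1024 ** 2)


def smallest_plan(rows, cols, headroom=HEADROOM):
    """Return the first listed plan whose RAM covers the data size
    multiplied by the headroom factor, or None if nothing fits."""
    needed_mb = estimate_data_mb(rows, cols) * headroom
    for label, ram_mb in DROPLET_PLANS:
        if ram_mb >= needed_mb:
            return label
    return None


if __name__ == "__main__":
    # One million rows by 20 numeric columns is roughly 153 MB of raw data.
    print(smallest_plan(1_000_000, 20))
```

Under these assumptions, a table of one million rows by 20 numeric columns fits the $10/mo droplet, while four million rows of the same width would call for the 2 GB one. Treat the result as a starting point rather than a guarantee, since real workloads vary a lot.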

 

Creating a Digital Ocean Droplet - Name it RStudio. Choose the $10/mo option (1 CPU/1 GB RAM).


 

The next step is to choose the location of your server. Currently, Digital Ocean provides servers in New York, San Francisco, Amsterdam, London, Frankfurt and Singapore. Since I’m located in California, I’ve chosen San Francisco. If you’re on the East Coast, New York would be a better option. You won’t need to check any of the boxes under Available Settings.

 

Select the region that is closest to you. No need to check any of the boxes under 'available settings.'


 

Finally, under Select Image, choose Ubuntu 14.04 x64. R and RStudio will run on other operating systems, though my main experience has been under Ubuntu, and it’s worked great! No need to add any SSH keys at this step (though for extra security, you can read Digital Ocean’s information on SSH keys). Then click on Create Droplet and we’re done creating our droplet!

 


Select Ubuntu 14.04 x64 and click on Create Droplet. If you know how to use SSH keys, go ahead and add them. Otherwise, read Digital Ocean’s information on SSH keys.
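If you ever want to script this step instead of clicking through it, the choices made above can be summarized as a request body for Digital Ocean’s API. The sketch below only builds and prints the JSON; it does not contact any server. The slug values for region, size and image are my assumptions about how the API names these options, and the helper name is hypothetical, so check the API documentation before relying on any of them.

```python
import json

# The settings chosen in this tutorial. The slug strings are assumptions;
# confirm them against Digital Ocean's API documentation before use.
droplet_request = {
    "name": "RStudio",
    "region": "sfo1",             # San Francisco (assumed slug)
    "size": "1gb",                # 1 CPU / 1 GB RAM, $10/mo (assumed slug)
    "image": "ubuntu-14-04-x64",  # Ubuntu 14.04 x64 (assumed slug)
    "ssh_keys": [],               # no SSH keys added at this step
}


def to_json_body(request):
    """Serialize the droplet settings as a JSON string (hypothetical helper)."""
    return json.dumps(request, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(to_json_body(droplet_request))
```

Even if you never call the API, writing the settings down like this is a handy record of how the droplet was created.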

 

This concludes the first step! Read the next post to learn how to configure your new instance. We’ll then go through installing R, updating R, and installing RStudio.
