Tutorial: Running Stata in Galileo

Written and developed by Matthew Gasperetti

Licensing

Stata is proprietary software licensed by StataCorp LLC. To run Stata on Galileo and comply with our terms of use, you must have a valid Stata 16 license.

We have implemented a bring-your-own-license system. To use Stata with Galileo, add the stata.lic file from the home or office computer where you run Stata 16 to your Galileo project folder. On OS X, the file can typically be found at /Applications/Stata/stata.lic; otherwise, search your file system for it.
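If you are not sure where stata.lic lives, your local Stata can report its own installation directory, which is where the license file normally sits (on OS X, /Applications/Stata/). A minimal sketch to type in the Command window of your local Stata 16; it assumes the license uses the usual name stata.lic and that c(sysdir_stata) ends with a path separator, as it normally does.

```stata
* Run in your LOCAL Stata 16, not on Galileo.
* c(sysdir_stata) holds Stata's installation directory.
display "Stata is installed in: `c(sysdir_stata)'"

* Silent if the file exists; "file ... not found" with r(601) if it does not.
* Assumes the usual file name stata.lic.
confirm file "`c(sysdir_stata)'stata.lic"
```

If confirm file reports r(601), fall back to searching your file system as described above.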

We do not sell Stata licenses. Additional information on Stata can be found at https://www.stata.com. Any other inquiries about Stata or licensing can be directed to support@hypernetlabs.io.

Getting started with Stata in Galileo

To get started, log into your Galileo account using Firefox or Chrome, and download our Stata example from GitHub.

Once the download completes, copy a valid Stata 16 stata.lic file from your home or office computer into the downloaded stata_example folder, as described under Licensing above.

The downloaded folder contains a .do file, a .dta file, and a Dockerfile. We'll first run this folder in Galileo and then take a look at what's happening behind the scenes.

Let’s have a look at our files

First, the stata_example.do script runs a logistic regression on the binary.dta dataset and makes a simple plot.
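The tutorial doesn't list the variables in binary.dta, so the sketch below shows the same pattern on Stata's built-in auto.dta instead (modelling foreign on mpg and weight, and the file name logit_plot.png, are our own hypothetical choices, not the example's). Because Galileo runs the script without a graph window, a plot is only kept if the script writes it to a file; we assume files written to the working directory end up in the filesys folder of your results, and that the Galileo image supports PNG export.

```stata
* Sketch: logistic regression plus a plot saved to disk (auto.dta stands in for binary.dta).
sysuse auto, clear

* Logistic regression of a 0/1 outcome (foreign) on two predictors.
logit foreign mpg weight

* Predicted probabilities for a simple plot.
predict double phat, pr

* Draw the plot, then write it to a file so it survives the batch run.
twoway scatter phat weight, ytitle("Pr(foreign)") xtitle("Weight (lbs.)")
graph export logit_plot.png, replace
```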

Next, our .do file conducts a Monte Carlo experiment that draws 500 observations from a χ²(1) distribution to calculate the sample average and another 500 observations to calculate a maximum likelihood estimate. It repeats this 1,000 times, stores the results in a file called ef_comp.dta, and plots their densities.
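Stata's simulate command is the usual tool for this kind of experiment. The sketch below reproduces only the sample-average half, because the tutorial doesn't say which maximum likelihood estimator the example uses; the program name meansim, the seed, the variable name mean_est, and the file mean_density.png are hypothetical choices of ours.

```stata
* Sketch: Monte Carlo for the average of 500 draws from chi2(1), 1,000 replications.
clear all
set seed 12345

* One replication: draw a fresh sample of 500 and return its average.
program define meansim, rclass
    drop _all
    set obs 500
    generate double y = rchi2(1)
    summarize y, meanonly
    return scalar mean = r(mean)
end

* simulate leaves one row per replication in memory, in variable mean_est.
simulate mean_est = r(mean), reps(1000) nodots: meansim

* Keep the results and plot the estimator's density.
save ef_comp.dta, replace
kdensity mean_est, xtitle("Sample average")
graph export mean_density.png, replace
```

Since a χ²(1) variable has mean 1 and variance 2, the plotted density should be centred near 1 with a standard deviation of roughly √(2/500) ≈ 0.063.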

Finally, the .do file conducts a simple Bayesian regression using the auto.dta dataset that comes with Stata, summarizes the results, and creates diagnostic plots.
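The tutorial doesn't spell out the Bayesian model, so here is a minimal sketch of the same three steps (fit, summarize, diagnose) on auto.dta. Regressing mpg on weight, the seed 16, and the output file name are our own assumptions; the sketch relies on Stata's default priors, and as above we assume PNG export works in the Galileo image.

```stata
* Sketch: simple Bayesian regression on Stata's auto data (model choice is ours).
sysuse auto, clear

* The bayes prefix fits the model by MCMC with Stata's default priors;
* rseed() makes the MCMC run reproducible.
bayes, rseed(16): regress mpg weight

* Posterior summaries: mean, std. dev., MCSE, median, and credible interval.
bayesstats summary

* Trace, autocorrelation, histogram, and density plots for the weight coefficient,
* exported so they are kept after the batch run.
bayesgraph diagnostics {mpg:weight}
graph export bayes_diag_weight.png, replace
```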

Understanding the user interface

When you log into Galileo, the first thing you’ll see is your Dashboard:

View of the Galileo Dashboard

To run the stata_example.do file, drag and drop the entire stata_example folder you downloaded from our GitHub onto the Galilei station at the top of the Dashboard:

Drag and drop the stata_example folder to the Galilei station

After you drag and drop the stata_example folder onto Galileo, you'll see the job running in the Your Recent Jobs panel. The job runs quickly in Galileo; try running it locally to compare.

When the example job completes, hit the Download button under Action to download the results:

Download button

The results are downloaded as a .zip file that contains an output.log file with the results of the analysis and a folder called filesys, where the plots and other files created by the analysis are stored.

The downloaded .zip file contains a folder called filesys and a file called output.log

Let's take a look at the output.log file first, which contains the results of the regression, Monte Carlo, and Bayesian analyses we ran:

Summary of the results of the Bayesian regression we ran

Next, if we look in the filesys folder, we can see the plots we made:

Bayesian Regression Diagnostic Plots

Running your own Stata files in Galileo—A closer look at how it works

A closer study of the files in our stata_example folder will help illustrate how to modify them so we can run other jobs. After that, we’ll have a look at the Galileo Docker Wizard, which helps automate the process. 

How to code a Dockerfile to run Stata in Galileo

Let’s quickly review the example Dockerfile, which you can open with a text editor like Atom.

The first thing to notice is that the file is called Dockerfile, with no extension. It cannot be called anything else: Dockerfile2, Dockerfile copy, or Dockerfile.txt won't work.

Looking at the Dockerfile with our text editor, the first Docker command we see is:

FROM hypernetlabs/stata:16batch

This tells Docker which base image to build from; hypernetlabs/stata:16batch sets up a Stata 16 environment. Leave it as is.

Let's look at the next line of our Dockerfile:

COPY . /data

This copies everything in your project folder (the .) into the /data directory of the image, so that your .do file, data, and license are available to Stata. It should be left as is.

The final command is:

ENV DOFILE=stata_example

This sets the environment variable DOFILE to the name of our .do file. Notice that the .do extension is not included.

Here is the Dockerfile from the stata_example folder in its entirety with comments:

# Base image that provides a Stata 16 environment; leave as is
FROM hypernetlabs/stata:16batch
# Copy everything in the project folder into /data inside the image
COPY . /data
# Name of the .do file to run, without the .do extension
ENV DOFILE=stata_example

Now, let's have a look at our .do file

The stata_example.do file should look familiar. However, there are a couple of important things to note.

The first is that we install our dependencies using ssc install. For example, to install fitstat, we use the command:

ssc install fitstat

The other important thing to note is that we read in the dataset we are using, binary.dta, as if it were in our working directory, with the following command:

use binary.dta, clear

Notice that the path is relative, not absolute. There should not be a path to a directory anywhere in our .do file, because the folders on your own computer do not exist on Galileo. The code below will NOT work and will cause an error:

cd "/Users/Matthew/Desktop/stata_example"
use binary.dta, clear
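The cd belongs outside the .do file. When you test the example on your own computer, one simple way to get the same relative-path behaviour is to change directory in Stata's Command window first and then run the script. The folder path below is the one from the broken example and stands in for wherever you saved the folder.

```stata
* Typed in the Command window of your LOCAL Stata, NOT inside stata_example.do.
cd "/Users/Matthew/Desktop/stata_example"
do stata_example.do
```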

Let’s turn our attention to the datasets next

Both the binary.dta and auto.dta datasets are simple; no surprises here. I just wanted to show you two ways to access data: 1) including binary.dta in the folder you drag and drop to Galileo, and 2) loading the auto.dta dataset that ships with Stata directly from your .do file.
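Side by side, the two approaches look like this. It is a minimal sketch: the confirm-file guard is our own addition that turns a missing binary.dta into a readable message that also prints the working directory; in a real script you may prefer to let the error stop the run.

```stata
* Way 1: a dataset shipped in the project folder, read by a relative path.
capture confirm file "binary.dta"
if _rc == 0 {
    use "binary.dta", clear
    describe, short
}
else {
    display as error "binary.dta is not in the working directory (`c(pwd)')"
}

* Way 2: a dataset that ships with Stata, loaded by name with sysuse.
sysuse auto, clear
describe, short
```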

Using the Docker Wizard to create your own project

If you drag and drop a folder that does not contain a Dockerfile onto Galileo, you will see a Docker Wizard prompt:

The Docker Wizard helps automate creating a Dockerfile

To create a Dockerfile for a .do file called my_project.do that installs mixlogit and fitstat, enter the following settings into the Docker Wizard:

An example showing how to use Galileo’s Docker Wizard

Notice that you do not add the .do extension to my_project. Also note that listing mixlogit and fitstat as dependencies installs them when the Docker image is built, by adding the following commands to the Dockerfile:

RUN stata-mp -b "ssc install mixlogit"
RUN stata-mp -b "ssc install fitstat"

If you install mixlogit and fitstat through Docker, you do not need to include the corresponding ssc install commands in your .do file. This saves time, but you can also leave the dependencies out of the Wizard and install them in your .do file instead. Both approaches work.
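If you want one .do file that works either way (and also locally, where the packages may already be installed), a common Stata idiom is to check for each command with which before calling ssc install. A sketch, assuming each package provides a command of the same name, as mixlogit and fitstat do.

```stata
* Install user-written commands only if they are not already available,
* so the same .do file works with or without Wizard-installed dependencies.
foreach pkg in mixlogit fitstat {
    capture which `pkg'
    if _rc {
        ssc install `pkg'
    }
}
```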

Once you complete your custom Dockerfile, add it to the project folder containing your my_project.do script, your data (if applicable), and your stata.lic file. Your folder should then contain:

Dockerfile
my_project.do
your data files (if applicable)
stata.lic

Now that your folder looks right, drag and drop it onto Galilei in your Dashboard at https://app.galileoapp.io.

I hope this tutorial was helpful. Please let me know if you have any questions or any problems using Galileo. Your feedback is extremely important to us. Contact me anytime at matthew@hypernetlabs.io.