Lecture 4: Basic Programming Structures

In this lecture we will review some basic programming structures by working through a practical application: computing annual population growth rates for all countries with available data between 1960 and 2010.

There are two do-files associated with this lecture:

(1) Lecture 4 notes.do Provides the basic examples outlined in the lecture notes below.
(2) Lecture 4 extended example.do Provides the extended example (computing population growth rates) that we will go through together in class.

Before we get started, however, we need to introduce two concepts: macros and loops. These are relatively advanced techniques that may seem intimidating and hard to understand at first, but once you master them they will vastly improve your ability to perform tasks efficiently in Stata.

4.1 Macros

A macro simply associates a name with some text (or numbers) – macros are objects stored in memory (they are not variables in your dataset!). The macro can be referenced anywhere in a program. So they are great to store “things” that are repeated multiple times in your code, but that might change frequently. (By using macros you avoid having to replace all the instances of that “thing” in your do files).

Here is a stupid but hopefully illustrative example (this example will NOT work in Stata; it is just to demonstrate the concept of macros). Say you were responsible for writing a generic press release about the president in a country where the president changes every day:

Yesterday Mr. Gonzalez was sworn into office as president of Volatilistan. Mr. Gonzalez will only have 24 hours to enact the critical reforms the country needs to move forward. Mr. Gonzalez faces the huge challenge of improving investor confidence…

The next day the president changes and you need to change the name of the president but you don’t want to manually change every instance of the name. Here is where a macro comes in handy. What you do is generalize your code by substituting the specific value of the president’s name with the general class of values it represents, in this case a last name. So you could re-write the paragraph as follows:

define lastname Perez

Yesterday Mr. {lastname} was sworn into office as president of Volatilistan. Mr. {lastname} will only have 24 hours to enact the critical reforms the country needs to move forward. Mr. {lastname} faces the huge challenge of improving investor confidence…

In this way, every day we only need to make one replacement, as {lastname} will refer (or evaluate) to the contents of the macro lastname, which we define above the paragraph.

The key thing to remember is that there are two steps to using macros – (1) Defining macros and (2) Evaluating macros in the places where they will be used to perform a function. Stata has a specific syntax for defining and evaluating macros which we will see next.

Macros can be either local or global in scope. Local macros can have names up to 31 characters long and are available only within one context (e.g. a do-file). In this lecture we will focus on local macros.

4.1.2 Defining and evaluating local macros

Local macros are defined as follows: local name [=] text
Local macros are evaluated as follows: `name'

Important: Please note the use of the backtick <`> to open the macro reference; the closing quote is a regular apostrophe <'>. On most keyboards, the backtick is located at the top left, under the <esc> key.

Note that the = is optional when you define a local, but the two forms behave differently. Without the equal sign, the text is stored exactly as typed, which is useful when you want to store large amounts of text. With the equal sign, the right-hand side is evaluated first (see section 4.1.3).

Let's try an example. Say we need to run a regression to explore the relationship between income and education with a bunch of control variables. You can store those control variables in a local called controls:
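
To make the difference between the two forms concrete, here is a minimal sketch. The macro names a and b are made up for illustration:

```stata
* No equal sign: the text "2 + 2" is stored exactly as typed
local a 2 + 2

* Equal sign: the expression is evaluated first, so 4 is stored
local b = 2 + 2

display "`a'"
display "`b'"
```

The first display shows 2 + 2 and the second shows 4.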

local controls age agesquared male urban

So instead of running the regression as follows:

regress income education age agesquared male urban

you can use the local to reference the controls:

regress income education `controls'

You can see that in a setting where your boss is continually telling you to change the control variables, using locals will be helpful. You could create a “menu” of control variable specifications and deploy them as needed. For example:

local controls1 age agesquared
local controls2 age agesquared male
local controls3 age agesquared male urban
local controls4 age agesquared male urban dependents
regress income education `controls4'

Warning: You must spell the names of your macros correctly. If in the example above you wrote regress income education `cntrls' this would not give you an error as `cntrls' would just evaluate to an empty string.
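
To see the warning in action, the following sketch uses Stata's built-in auto dataset. The choice of price, mpg, weight and length is only for illustration and is not part of the income example above:

```stata
sysuse auto, clear
local controls weight length

* Correct spelling: the model includes mpg, weight and length
regress price mpg `controls'

* Misspelled name: the macro is empty, so Stata silently runs "regress price mpg"
regress price mpg `cntrls'
```

Comparing the two regression tables shows that the second model has dropped the controls without any error message.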

4.1.3 Storing results in local macros

Another very useful application of macros is to capture results from commands that estimate regression coefficients, summary statistics, etc.

To capture results, we use the second type of macro definition: local name = text. The equal sign tells Stata to treat the text on the right-hand side as an expression, evaluate it, and store a representation of the result under the given name.

Let's work through a basic example. Say we want to capture the average value of a variable in a local macro. In this example, we are going to use the system dataset auto.dta to compute the average miles per gallon of the cars in the dataset.

Let's open the dataset and use the sum command on the variable mpg:

sysuse auto, clear
sum mpg

You should get the following output in the result window:

. sum mpg

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         mpg |         74     21.2973    5.785503         12         41

In order to make use of these results, we need to look under the hood to see how Stata stores them in memory. Submit the following command:

return list

You will see the following:

                  r(N) =  74
              r(sum_w) =  74
               r(mean) =  21.2972972972973
                r(Var) =  33.47204738985561
                 r(sd) =  5.785503209735141
                r(min) =  12
                r(max) =  41
                r(sum) =  1576

Here we see that Stata spits out 8 “scalar” quantities. Stata scalars are named entities that store single numbers or strings, which may include missing values. Type help scalar if you want to find out more.

In the output above, we see that the mean of the variable mpg is stored in a scalar called r(mean). So if we want to store that value in our own local, we can do the following:

sum mpg
local meanmpg = r(mean) 
display `meanmpg'

Note how we use the command display to display the contents of the local meanmpg. We would not see the contents if we typed display meanmpg instead. Note the error that you get if you try this, and try to understand why you get the error.

Let's try another example, this time storing the r-squared value of a regression of miles per gallon on the weight of the car (r-squared is just a statistic that tells us how much of the variation in the dependent variable is “explained” by the variation in the independent variable). First let's see what kind of results the regress command stores in memory. Regress is a command of type “e-class”, so we use the command ereturn list after running the regression to see what kinds of results the command generates:

regress mpg weight
ereturn list
                  e(N) =  74
               e(df_m) =  1
               e(df_r) =  72
                  e(F) =  134.618242241237
                 e(r2) =  .6515312529087511
               e(rmse) =  3.438889631047954
                e(mss) =  1591.990203053362
                e(rss) =  851.4692564060979
               e(r2_a) =  .6466914091991505
                 e(ll) =  -195.3886885951764
               e(ll_0) =  -234.3943376482347
               e(rank) =  2

            e(cmdline) : "regress mpg weight"
              e(title) : "Linear regression"
          e(marginsok) : "XB default"
                e(vce) : "ols"
             e(depvar) : "mpg"
                e(cmd) : "regress"
         e(properties) : "b V"
            e(predict) : "regres_p"
              e(model) : "ols"
          e(estat_cmd) : "regress_estat"

                  e(b) :  1 x 2
                  e(V) :  2 x 2


Notice how the regress command stores so much more information than the sum command. Regress stores results as scalars, macros and matrices. In this example, we are only interested in the result stored in e(r2). To capture that result in our own local macro, we do the following:

regress mpg weight
local rsquared = e(r2)
di `rsquared'
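
Once a result is captured in a local, it can be reused in later commands. As a small sketch, the stored mean can be used to create a centered version of mpg. The variable name mpg_centered is made up for illustration:

```stata
sysuse auto, clear
quietly summarize mpg
local meanmpg = r(mean)

* Subtract the stored mean from every observation
generate mpg_centered = mpg - `meanmpg'
summarize mpg mpg_centered
```

The new variable should have a mean of approximately zero and the same standard deviation as mpg.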

4.2 Loops

Loops are used to do repetitive tasks. Stata has commands that allow looping over sequences of numbers and various types of lists, including lists of variables.

4.2.1 Looping over sequences of numbers

The first type of loop we will look at takes the following form:

forvalues i = 1/10 {
    di `i'
}

Here i is a local macro, which you can call whatever you like, and 1/10 is a sequence of numbers that increases from 1 to 10 in steps of 1. You can define the starting point, the end point and the size of the steps however you like. For example, in the following loop we go from 1000 to 2000 in steps of 50:

forvalues number = 1000(50)2000 {
    di `number'
}
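
A common pattern is to update a second local inside the loop body. The sketch below adds up the numbers from 1 to 10. The macro name total is made up for illustration:

```stata
* Start the running total at zero
local total = 0

forvalues i = 1/10 {
    * Add the current value of i to the running total
    local total = `total' + `i'
}

display "Sum of 1 to 10: `total'"
```

The last line displays 55. The same counter idea is used again in the matrix example at the end of these notes.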

4.2.2 Looping over elements in a list

In this type of loop, the list members can be anything.

foreach animal in cats and dogs {
    display "`animal'"
}

foreach level in 2 6 12 {
    display "Do something with `level'"
}

4.2.3 Looping Over Specialized Lists

Looping over variables:

foreach varname of varlist make price mpg rep78 headroom trunk weight length turn displacement gear_ratio foreign {
    describe `varname'
}

Again, note that you can call varname whatever you like – this is the name of the local macro you will evaluate in the loop.

Looping over words in macros:

local controls age agesquared education
foreach control of local controls {
    display "`control'"
}

Looping over a list of numbers:

foreach year of numlist 1980 1985 1995 {
    display "`year'"
}

Ok, that is an overview of the kinds of loops you will encounter and use 99 percent of the time.

To give you a simple example of how macros and loops work well in combination, say you want to obtain the r-squared statistic that measures the strength of the linear association between price and miles per gallon, separately for foreign and domestic cars. Here I introduce the levelsof command; see if you understand what it accomplishes and for what types of variables this operation makes sense. Also pay attention to what the quietly command accomplishes. We could do the following:

gen cartype = "Foreign" if foreign == 1
replace cartype = "Domestic" if foreign == 0
levelsof cartype, local(cartypes)
foreach cartype of local cartypes {
    quietly: regress price mpg if cartype == "`cartype'"
    local rsq = e(r2)
    di "`cartype': rsquared = `rsq'"
}
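
As a hedged variation on the loop above, levelsof can also be applied directly to the numeric variable foreign, which skips the step of building a string variable. The macro names types and t are made up for illustration:

```stata
sysuse auto, clear

* levelsof stores the distinct values of a variable in a local macro
levelsof foreign, local(types)
display "`types'"

foreach t of local types {
    * No quotes around the macro here, because foreign is numeric
    quietly regress price mpg if foreign == `t'
    display "foreign = `t': rsquared = " e(r2)
}
```

Here the local types holds the values 0 and 1, so the loop runs one regression for domestic cars and one for foreign cars.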

Now suppose we wanted to store the r-squared results so that we could generate a Stata dataset of results later. For this, matrices are handy. We start by creating an empty matrix with the appropriate dimensions. In this case we want a matrix that is 2 rows by 2 columns (the first column to identify the car type and the second column to store the values of r-squared). Because Stata matrices hold only numbers, the car type has to be identified by a numeric code. To create an empty matrix (in this example we call it “results” but we could name it anything) we use the matrix command as follows:

matrix results = J(2,2,.)
matrix list results

The matrix list command displays the matrix called “results” in the results window.

To use the matrix “results” to store the r-squared values, we use subscript notation to tell Stata in which cell of the matrix to insert a value. So, for example, results[1,1] refers to the first row, first column cell of the matrix, while results[2,2] refers to the second row, second column cell of the matrix. The following example will make clear how this works:

matrix results = J(2,2,.)
matrix list results
matrix results[1,1] = 1
matrix results[1,2] = 2
matrix results[2,1] = 3
matrix results[2,2] = 4
matrix list results

We could also do the same thing in a loop:

matrix results = J(2,2,.)    /* Defining the matrix */
local i = 1                  /* Starting a counter */
forvalues rows = 1/2 {       /* Row subscripts */
    forvalues cols = 1/2 {   /* Column subscripts */
        display "row: `rows' col: `cols' value: `i'"
        matrix results[`rows',`cols'] = `i'   /* Storing the counter in the matrix */
        local i = `i' + 1                     /* Increasing the counter by 1 */
    }
}
matrix list results

Make sure you understand what is happening at each point in the code above! This example forms the backbone of the extended example we will cover in the lecture (computing population growth rates for each country).
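
To connect the matrix back to the r-squared example, here is one possible sketch of how the results could be stored. It assumes the auto dataset, uses the numeric coding of foreign (0 for domestic, 1 for foreign) as the car-type identifier, and the column names and the macro names f and row are made up for illustration:

```stata
sysuse auto, clear

* One row per car type: column 1 identifies the type, column 2 holds r-squared
matrix results = J(2, 2, .)
matrix colnames results = foreign rsq

local row = 1
forvalues f = 0/1 {
    quietly regress price mpg if foreign == `f'
    matrix results[`row', 1] = `f'
    matrix results[`row', 2] = e(r2)
    local row = `row' + 1
}

matrix list results
```

Each pass through the loop fills one row of the matrix, so after the loop finishes, results holds one r-squared value per car type.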

