
Introduction to STATA

Stata is an all-purpose statistics package. Stata 8.0 is installed on all the lab machines. It is documented in a Reference Manual, a User's Guide, and a Getting Started manual, each of which can be found in the lab. Stata also has a built-in help utility and several online tutorials.

Starting Stata
To start Stata, click the Start button, then point to Programs, then Stata, then Intercooled Stata. (There may also be a Stata icon on the desktop.) Four windows will appear: Review, Variables, Stata Results, and Stata Command.

The cd command
The cd command, typed by itself, tells you what the current working directory is. This is the directory where Stata will look for data, save data, or save a log file. When you first open Stata the working directory is C:\data. You can change the working directory by giving cd the name of a directory. For example, the command

cd c:\

changes the working directory to c:\

The dir command
The dir command lists the contents of the current directory. The command

dir c:\stata

lists the contents of the directory c:\stata

Comments
Stata treats a line preceded by an asterisk as a comment.

* this is a comment

The display command
The command display (abbreviated di) allows you to use Stata as a calculator.

. di 2/3 + 9^(1/2)
3.6666667

Entering data
There are four commonly used ways to enter data into Stata. Which method you use depends on the way the data is made available to you.

A. Entering data from a previously saved Stata dataset.
If the data has been saved in a Stata dataset, the simplest way to open it is to use the Open command from the File menu. If the dataset has been stored with the name filename.dta, it can also be loaded with the command use filename. You should try this. There is a file named auto.dta in the directory c:\stata8. Try the command

use auto

If you get the reply "file auto.dta not found", that means that the file is not in the current directory. You can change the command to

use c:\stata8\auto

and it will load properly.

The command use (or Open from the File menu) replaces any data in memory with the new data. This would involve a loss of data if you have data in memory which has not been saved. To avoid this loss, Stata returns an error message and does not load the new data. To clear the existing data from memory and replace it with new data, enter the command

use filename, clear

(Special note for econ209/210/310: If you are using a file which is saved on the Economics server Irving you can use the command

use \\Irving\Data\econ209\wages

A simpler way in this case is to choose Open from the File menu, then go to Network, then Irving, then Data, then econ209, then the file you are looking for. Similar instructions apply for econ 210 and 310.)

B. Entering data from the keyboard.
Data may be entered from the keyboard using the input command. Examples:

input x1
input x1 x2 x3

After the last observation is entered, type end.

C. Entering data from a file.
Suppose the data consists of 100 observations of variables a, b, and c stored in a file filename.raw in the current working directory. This data can be loaded into Stata using the command

infile a b c using filename

The extension .raw is assumed if you do not specify otherwise. If the data file has another extension, the full filename should be used, but in that case a simpler way is to use Import from the File menu. The File menu's Import command contains other options for other types of data.

D. Entering data using Stata's spreadsheet editor.
Stata has a spreadsheet-like editor which can be accessed by using the command edit. Instructions for using the editor are found in chapter 4 of the Getting Started manual.

E. Data from other programs.
If your data is stored as a special file from another program, Microsoft Excel for example, you can convert the data to a Stata dataset using the program StatTransfer, which is on each of the Econlab machines.
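
As an illustration of method B (keyboard entry), here is a minimal sketch. The variable names x1 and x2, the three observations, and the dataset name mydata are all made up for the example; they are not part of the lab setup.

```stata
* Hypothetical example: enter three observations on two made-up variables
clear
input x1 x2
1 10
2 20
3 30
end

* check what was entered
list

* save the data as mydata.dta in the current working directory
* (mydata is an assumed name; replace overwrites an existing file)
save mydata, replace
```

Once saved this way, the data can be reloaded later with use mydata, clear, as described under method A.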

Stata Commands

Commands to report on data:
The commands list, describe, and summarize can be used to examine the data. The command list lists the variables observation by observation; with no variable names it lists all of them. For example, the command list x1 x2 x5 lists the variables x1, x2, and x5. The command describe describes the nature of all the variables in memory. The command summarize lists, for each of the variables in memory, the number of observations, the mean, the standard deviation, the minimum, and the maximum. Describe and summarize can be restricted to a specific list of variables just like the command list. A few other commands for data reporting include table and tabulate.

Commands to manipulate data:
The most basic Stata commands for manipulating data are the commands generate and replace. Generate (which can be abbreviated g) is used to create a new variable; replace is used to replace the values of a variable which already exists. Generate and replace use the familiar arithmetic operators: +, -, *, /, ^.

generate x3 = x2/x1
generate x4 = x3^2
replace x4 = x3^3
g lny = ln(y)      (or g lny = log(y))
g expy = exp(y)
g sqrtx = sqrt(x)
g y = x + z if t > 0

(dummy variables)

g d = 0
replace d = 1 if t >= 1978

g d = 0
replace d = 1 if age > 21 & age <= 35

g d = 1 if region == "west"

(In the last example region is assumed to be a string variable, so the value is enclosed in double quotes; d is missing for observations outside the west.)

Logical operators appear in many expressions. They are & (and), | (or), and ~ (not). Also, the symbol == (double equal sign) tests for equality rather than assigning a value.

Lagged variables.
Lagged values of variables are frequently used in economics. A lagged variable can be created in the following way. First define a time series index. For example, this can be done with the command g t = _n followed by the command tsset t. Then lagged values can be designated as L.varname. For example, L.y designates a lagged value of the variable y, and L2.invest designates the variable invest lagged twice. The L can be capitalized or not as you prefer.
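
The dummy-variable and lag steps described above can be sketched on a tiny made-up dataset. The variables year and y and their values are invented for illustration, and year is used directly as the time index instead of generating t = _n.

```stata
* Hypothetical annual data, invented for this example
clear
input year y
1976 100
1977 104
1978 110
1979 115
end

* dummy variable equal to 1 from 1978 onward
generate d = 0
replace d = 1 if year >= 1978

* declare the time series index, then use the lag operator
tsset year
generate ylag = L.y
generate growth = ln(y) - ln(L.y)

list
```

The first observation has no predecessor, so ylag and growth are missing for 1976.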
The commands drop and keep:
The commands drop and keep are used to alter the number of variables in the dataset. For example, the command drop x1 x2 x3 eliminates the variables x1, x2, and x3. The command keep x1 x2 x3 keeps the variables x1, x2, and x3 and drops all the rest.

The command egen (extensions to generate):
This command creates variables using a number of built-in functions including mean, standard deviation, median, count, moving average, row mean, row standard deviation, maximum, minimum, etc. (If you need to use egen it is documented in the Reference Manual and in help egen.) Examples of egen follow:

egen xbar = mean(x)
generate xdev = x - xbar
egen z = mean(2*x2 + x4)
egen racesex = group(race sex)

(The deviation xdev is created with generate rather than egen, because x - xbar is an ordinary expression and not an egen function.)

Command Syntax
With few exceptions Stata commands have the following syntax:

[by varlist:] command [varlist] [= exp] [if exp] [in range] [weight] [, options]

The elements enclosed by square brackets are optional.

The by varlist: prefix instructs Stata to perform the indicated operation for every value of the variables which appear in the varlist. For example, the command by y: summarize x1 x2 summarizes the variables x1 and x2 for each value which y takes on. (The data must first be sorted by y, for example with the command sort y.)

The if exp part of a command instructs Stata to perform the command if the expression is nonzero or true. For example, the command generate z = x1 + x2 if y creates the new variable z for all observations with a nonzero y. The command summarize x if region == "west" summarizes the variable x for all observations in region west. (Note: the double equals sign tests for equality, and a string value such as "west" is enclosed in double quotes.)

In range instructs Stata to perform the indicated operation for observations within a specified range. The command list x2 x3 in 34/41 lists the values of the variables x2 and x3 in observations 34 through 41.

Weight specifies the weights for the command. For example, the command summarize y [weight = x2] summarizes the variable y, calculating the mean and standard deviation using the variable x2 as weights. Notice that in the case of weights the square brackets must be included. (See help weight.)

Logging Results
The results of your work can be saved in a log file. The command log using filename, text opens a log file with the name filename.log. The log can be turned off temporarily by using the command log off. The log can be resumed with the command log on. When you are done you can close the log file with the command log close. A log file created in this way is a text file and can be edited with a word processing program or a text editing program. If you simply use the command log using filename, the file will be saved as filename.smcl.
This file is neither readable nor printable as a text file. However, it can be converted to a printable file with the following command:

translate filename.smcl filename.log

Do Files
Do files are programs or macros in Stata. You can use a do file whenever there is a set of commands which you want to repeat. You can create a do file in any text editor such as WordPad or Notepad, or as text output from a word processor. Stata also has a built-in do file editor. When you are working in Stata the commands which you have executed appear in the Review window. You can repeat a command by simply clicking on it. You can save the commands in the Review window as a do file by clicking on the box at the upper left of the Review window and choosing the Save Review Contents option. You can then use the do file editor to edit the commands.
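
A short do file that opens a text log, runs a few commands, and closes the log might look like the sketch below. The file name example is made up. The dataset path is the one this handout gives for auto.dta, and the variables make, price, mpg, and foreign are those in that example dataset.

```stata
* example.do: a hypothetical do file (the name is an assumption)

* close any log left open by an earlier run; capture ignores the error if none is open
capture log close
log using example, text replace

use c:\stata8\auto, clear
describe
summarize price mpg

* by-group summary: the data must be sorted by the by variable first
sort foreign
by foreign: summarize price mpg

* the if and in qualifiers
summarize price if mpg > 25
list make price in 1/5

log close
```

Saved as example.do in the working directory, it can be run with the command do example, and the output will be in example.log.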

Regression
The command

reg y x2 x3 x4

instructs Stata to run the regression of y on a constant and the explanatory variables x2, x3, and x4. If a regression through the origin is desired the command is

reg y x2 x3 x4, noconstant

There are a variety of other options. There are also a variety of post-regression commands.

The command test can be used to test a variety of hypotheses about the coefficients. Examples:

test x2
test x3 x4
test x2 = x4
test x2 = 3
test 2*x3 + x4 = 4

These commands test, respectively, the following hypotheses: the coefficient of x2 = 0; the coefficients of x3 and x4 are both = 0; the coefficient of x2 = the coefficient of x4; the coefficient of x2 = 3; and 2 times the coefficient of x3 + the coefficient of x4 = 4.

The test command can be used to test two or more hypotheses jointly. For example,

test x2 = x3 = x4 = x5

This command instructs Stata to test the joint hypothesis that the coefficients of all these variables are equal. More complex joint hypotheses can be tested using the accumulate option after the test command. For example:

test x2 = x3
test x4 = 3, acc
test x5 = 10*x2, acc

This set of commands tests the joint hypothesis that the coefficients of x2 and x3 are equal, the coefficient of x4 = 3, and the coefficient of x5 = 10 times the coefficient of x2.

Another useful post-regression command is predict. This command can be used to, among other things, find the fitted values of y, find the residuals, or make out-of-sample predictions. The command predict yhat puts the fitted values from the regression in the variable yhat. The command predict e, residuals puts the estimated residuals in a variable called e. The command predict se, stdp stores the estimated standard error of the prediction as the variable se.

In addition to the command predict there are several diagnostic commands available after the regression command: vif, hettest, dwstat, ovtest.
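
To see these post-regression commands together, here is a minimal sketch using the auto dataset mentioned earlier in this handout. The choice of price as the dependent variable and mpg, weight, and length as regressors is only for illustration, as are the new variable names phat, e, and se.

```stata
* illustrative regression on the auto data (path as given in this handout)
use c:\stata8\auto, clear
regress price mpg weight length

* single hypothesis: the coefficient of mpg is zero
test mpg

* single hypothesis: the coefficients of weight and length are equal
test weight = length

* joint hypothesis built up with the accumulate option
test mpg = 0
test weight = length, accumulate

* fitted values, residuals, and standard error of the prediction
predict phat
predict e, residuals
predict se, stdp
summarize price phat e se
```

Each predict command creates a new variable, so it fails if a variable with that name already exists; drop the old variable first if the block is rerun.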
Accessing estimation results:
Every estimation command saves information which can be accessed afterwards. For example, after running the regression reg y x2 x3 x4, the coefficients and standard errors are saved as _b[variable name] and _se[variable name]. For example, the coefficient of x3 is _b[x3].

Graphing:
Stata has a wide array of graphics commands. The simplest command, twoway scatter y x, will produce a graph of y against x. The command twoway scatter y z x will produce a graph of y and z against x.

Other estimation commands:
Stata has a variety of estimation commands which are similar to the regress command. They include probit, oprobit, tobit, logit, poisson, etc.
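
The saved results described under "Accessing estimation results" can be used in later expressions. The sketch below again assumes the auto dataset at the path given in this handout; the regression itself and the variable name fit are made up for illustration.

```stata
use c:\stata8\auto, clear
regress price mpg weight

* coefficient and standard error of mpg
display _b[mpg]
display _se[mpg]

* t statistic for the coefficient of mpg, computed by hand
display _b[mpg]/_se[mpg]

* fitted values rebuilt from the saved coefficients (_cons is the constant)
generate fit = _b[_cons] + _b[mpg]*mpg + _b[weight]*weight

* plot actual and fitted price against mpg
twoway scatter price fit mpg
```

The variable fit should match what predict would produce after the same regression.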
