IV NONLINEAR MODELS AND RELATED TOPICS
We now apply the general methods of Part III to study specific nonlinear models that often arise in applications. Many nonlinear econometric models are intended to explain limited dependent variables. Roughly, a limited dependent variable is a variable whose range is restricted in some important way. Most variables encountered in economics are limited in range, but not all require special treatment. For example, many variables (wage, population, and food consumption, to name just a few) can only take on positive values. If a strictly positive variable takes on numerous values, special econometric methods are rarely called for. Often, taking the log of the variable and then using a linear model suffices.
When the variable to be explained, y, is discrete and takes on a finite number of values, it makes little sense to treat it as an approximately continuous variable. Discreteness of y does not in itself mean that a linear model for $E(y \mid \mathbf{x})$ is inappropriate. However, in Chapter 15 we will see that linear models have certain drawbacks for modeling binary responses, and we will treat nonlinear models such as probit and logit. We also cover basic multinomial response models in Chapter 15, including the case when the response has a natural ordering.
Other kinds of limited dependent variables arise in econometric analysis, especially when modeling choices by individuals, families, or firms. Optimizing behavior often leads to corner solutions for some nontrivial fraction of the population. For example, in any given period, a fairly large fraction of the working-age population does not work outside the home. Annual hours worked has a population distribution spread out over a range of values, but with a pileup at the value zero. While a linear model could be appropriate for modeling expected hours worked, it will likely lead to negative predicted hours worked for some people. Taking the natural log is not possible because of the corner solution at zero. In Chapter 16 we will discuss econometric models that are better suited for describing these kinds of limited dependent variables.
We treat the problem of sample selection in Chapter 17. In many sample selection contexts the underlying population model is linear, but nonlinear econometric methods are required in order to correct for nonrandom sampling. Chapter 17 also covers testing and correcting for attrition in panel data models, as well as methods for dealing with stratified samples.
In Chapter 18 we provide a modern treatment of switching regression models and, more generally, random coefficient models with endogenous explanatory variables. We focus on estimating average treatment effects.
We treat methods for count dependent variables, which take on nonnegative integer values, in Chapter 19. An introduction to modern duration analysis is given in Chapter 20.
15 Discrete Response Models
15.1 Introduction
In qualitative response models, the variable to be explained, y, is a random variable taking on a finite number of outcomes; in practice, the number of outcomes is usually small. The leading case occurs where y is a binary response, taking on the values zero and one, which indicate whether or not a certain event has occurred. For example, $y = 1$ if a person is employed, $y = 0$ otherwise; $y = 1$ if a family contributes to charity during a particular year, $y = 0$ otherwise; $y = 1$ if a firm has a particular type of pension plan, $y = 0$ otherwise. Regardless of the definition of y, it is traditional to refer to $y = 1$ as a success and $y = 0$ as a failure.
As in the case of linear models, we often call y the explained variable, the response variable, the dependent variable, or the endogenous variable; $\mathbf{x} \equiv (x_1, x_2, \ldots, x_K)$ is the vector of explanatory variables, regressors, independent variables, exogenous variables, or covariates.
In binary response models, interest lies primarily in the response probability,
$$p(\mathbf{x}) \equiv P(y = 1 \mid \mathbf{x}) = P(y = 1 \mid x_1, x_2, \ldots, x_K) \tag{15.1}$$
for various values of $\mathbf{x}$. For example, when y is an employment indicator, $\mathbf{x}$ might contain various individual characteristics such as education, age, marital status, and other factors that affect employment status, such as a binary indicator variable for participation in a recent job training program, or measures of past criminal behavior. For a continuous variable $x_j$, the partial effect of $x_j$ on the response probability is

$$\frac{\partial P(y = 1 \mid \mathbf{x})}{\partial x_j} = \frac{\partial p(\mathbf{x})}{\partial x_j} \tag{15.2}$$
When multiplied by $\Delta x_j$, equation (15.2) gives the approximate change in $P(y = 1 \mid \mathbf{x})$ when $x_j$ increases by $\Delta x_j$, holding all other variables fixed (for "small" $\Delta x_j$). Of course if, say, $x_1 \equiv z$ and $x_2 \equiv z^2$ for some variable z (for example, z could be work experience), we would be interested in $\partial p(\mathbf{x})/\partial z$.
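Spelling out the chain rule for this case (a one-line derivation added for illustration), the effect of z combines both partial derivatives:

```latex
\frac{\partial p(\mathbf{x})}{\partial z}
  = \frac{\partial p(\mathbf{x})}{\partial x_1}
  + 2z \, \frac{\partial p(\mathbf{x})}{\partial x_2}
```

For example, if $p(\mathbf{x})$ is linear in $x_1$ and $x_2$ with coefficients $\beta_1$ and $\beta_2$, this is $\beta_1 + 2\beta_2 z$, which varies with z even though each coefficient is constant.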
If $x_K$ is a binary variable, interest lies in

$$p(x_1, x_2, \ldots, x_{K-1}, 1) - p(x_1, x_2, \ldots, x_{K-1}, 0) \tag{15.3}$$

which is the difference in response probabilities when $x_K = 1$ and $x_K = 0$. For most of the models we consider, whether a variable $x_j$ is continuous or discrete, the partial effect of $x_j$ on $p(\mathbf{x})$ depends on all of $\mathbf{x}$.
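A minimal Python sketch of (15.2) and (15.3) for an arbitrary response probability is given below. The function names, the step size `h`, and the two example probability functions (a linear one and a logistic one, both with invented coefficients) are hypothetical illustrations, not taken from the text. In the linear case the partial effects are the same at every $\mathbf{x}$; in the logistic case they change with $\mathbf{x}$, which is the typical situation described above.

```python
import math
from typing import Callable, Sequence

# A response probability maps a covariate vector (x_1, ..., x_K) to P(y = 1 | x).
ResponseProb = Callable[[Sequence[float]], float]


def continuous_partial_effect(p: ResponseProb, x: Sequence[float], j: int,
                              h: float = 1e-6) -> float:
    """Central-difference approximation to dp(x)/dx_j, as in (15.2)."""
    up, down = list(x), list(x)
    up[j] += h
    down[j] -= h
    return (p(up) - p(down)) / (2.0 * h)


def discrete_partial_effect(p: ResponseProb, x: Sequence[float]) -> float:
    """Equation (15.3): switch the last covariate x_K from 0 to 1."""
    head = list(x[:-1])
    return p(head + [1.0]) - p(head + [0.0])


# Hypothetical response probabilities with invented coefficients.
def linear_p(x: Sequence[float]) -> float:
    return 0.2 + 0.05 * x[0] + 0.1 * x[1]


def logistic_p(x: Sequence[float]) -> float:
    index = -1.0 + 0.5 * x[0] + 0.8 * x[1]
    return 1.0 / (1.0 + math.exp(-index))
```

For `linear_p`, the effect of $x_1$ is .05 at every $\mathbf{x}$ and the binary effect is .1. For `logistic_p`, the binary effect is about .18 at $x_1 = 0$ but about .13 at $x_1 = 4$.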
In studying binary response models, we need to recall some basic facts about Bernoulli (zero-one) random variables. The only difference between the setup here and that in basic statistics is the conditioning on $\mathbf{x}$. If $P(y = 1 \mid \mathbf{x}) = p(\mathbf{x})$, then $P(y = 0 \mid \mathbf{x}) = 1 - p(\mathbf{x})$, $E(y \mid \mathbf{x}) = p(\mathbf{x})$, and $\operatorname{Var}(y \mid \mathbf{x}) = p(\mathbf{x})[1 - p(\mathbf{x})]$.
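As a quick numerical check of these facts, the moments can be computed straight from the two-point distribution of y given $\mathbf{x}$ and compared with $p(\mathbf{x})$ and $p(\mathbf{x})[1 - p(\mathbf{x})]$. This is a sketch; the function name is hypothetical.

```python
def bernoulli_conditional_moments(p_x: float) -> tuple[float, float, float]:
    """Given p(x) = P(y = 1 | x), return P(y = 0 | x), E(y | x), Var(y | x),
    computed directly from the two-point distribution of y given x."""
    if not 0.0 <= p_x <= 1.0:
        raise ValueError("p(x) must lie in [0, 1]")
    support = {0: 1.0 - p_x, 1: p_x}  # P(y = v | x) for v in {0, 1}
    mean = sum(v * prob for v, prob in support.items())
    var = sum((v - mean) ** 2 * prob for v, prob in support.items())
    return support[0], mean, var
```

For example, with $p(\mathbf{x}) = .3$ the function returns $(.7, .3, .21)$, and $.21 = .3 \times .7$.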
15.2 The Linear Probability Model for Binary Response
The linear probability model (LPM) for binary response y is specified as
$$P(y = 1 \mid \mathbf{x}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_K x_K \tag{15.4}$$
As usual, the $x_j$ can be functions of underlying explanatory variables, which would simply change the interpretations of the $\beta_j$. Assuming that $x_1$ is not functionally related to the other explanatory variables, $\beta_1 = \partial P(y = 1 \mid \mathbf{x})/\partial x_1$. Therefore, $\beta_1$ is the change in the probability of success given a one-unit increase in $x_1$. If $x_1$ is a binary explanatory variable, $\beta_1$ is just the difference in the probability of success when $x_1 = 1$ and $x_1 = 0$, holding the other $x_j$ fixed.
Using functions such as quadratics, logarithms, and so on among the independent variables causes no new difficulties. The important point is that the $\beta_j$ now measure the effects of the explanatory variables $x_j$ on a particular probability.
Unless the range of $\mathbf{x}$ is severely restricted, the linear probability model cannot be a good description of the population response probability $P(y = 1 \mid \mathbf{x})$. For given values of the population parameters $\beta_j$, there would usually be feasible values of $x_1, \ldots, x_K$ such that $\beta_0 + \beta_1 x_1 + \cdots + \beta_K x_K$ is outside the unit interval. Therefore, the LPM should be seen as a convenient approximation to the underlying response probability. What we hope is that the linear probability approximates the response probability for common values of the covariates. Fortunately, this often turns out to be the case.
In deciding on an appropriate estimation technique, it is useful to derive the conditional mean and variance of y. Since y is a Bernoulli random variable, these are simply
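$$E(y \mid \mathbf{x}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_K x_K \tag{15.5}$$

$$\operatorname{Var}(y \mid \mathbf{x}) = \mathbf{x}\boldsymbol{\beta}(1 - \mathbf{x}\boldsymbol{\beta}) \tag{15.6}$$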
where $\mathbf{x}\boldsymbol{\beta}$ is shorthand for the right-hand side of equation (15.5).
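The variance in (15.6) follows in one line from the Bernoulli property $y^2 = y$, which gives $E(y^2 \mid \mathbf{x}) = E(y \mid \mathbf{x})$ (a derivation step written out here for completeness):

```latex
\operatorname{Var}(y \mid \mathbf{x})
  = E(y^2 \mid \mathbf{x}) - [E(y \mid \mathbf{x})]^2
  = \mathbf{x}\boldsymbol{\beta} - (\mathbf{x}\boldsymbol{\beta})^2
  = \mathbf{x}\boldsymbol{\beta}\,(1 - \mathbf{x}\boldsymbol{\beta})
```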
Equation (15.5) implies that, given a random sample, the OLS regression of y on $1, x_1, x_2, \ldots, x_K$ produces consistent and even unbiased estimators of the $\beta_j$. Equation (15.6) means that heteroskedasticity is present unless all of the slope coefficients $\beta_1, \ldots, \beta_K$ are zero. A nice way to deal with this issue is to use standard heteroskedasticity-robust standard errors and t statistics. Further, robust tests of multiple restrictions should also be used. There is one case where the usual F statistic can be used, and that is to test for joint significance of all variables (leaving the constant unrestricted). This test is asymptotically valid because $\operatorname{Var}(y \mid \mathbf{x})$ is constant under this particular null hypothesis.
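A minimal NumPy sketch of OLS with heteroskedasticity-robust standard errors follows. It computes the White (HC0) form of the robust variance matrix; many packages instead report a version with a degrees-of-freedom correction, so their robust standard errors can differ slightly. The function name `ols_with_robust_se` and the simulated design (including the parameter values 0.2, 0.4, and 0.2) are illustrative assumptions, not from the text.

```python
import numpy as np


def ols_with_robust_se(X: np.ndarray, y: np.ndarray):
    """OLS of y on X, where X already includes a column of ones.

    Returns the coefficients, the usual OLS standard errors, and
    heteroskedasticity-robust (White, HC0) standard errors.
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    # Usual standard errors assume Var(y|x) is constant; (15.6) says it is not.
    sigma2 = (resid @ resid) / (n - k)
    se_usual = np.sqrt(np.diag(sigma2 * XtX_inv))
    # Sandwich: (X'X)^{-1} [sum_i u_i^2 x_i' x_i] (X'X)^{-1}
    meat = (X * resid[:, None] ** 2).T @ X
    se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se_usual, se_robust


# Simulated LPM data with hypothetical parameters (0.2, 0.4, 0.2); the true
# probabilities lie in [0.2, 0.8], so the LPM is correctly specified here.
rng = np.random.default_rng(0)
N = 100_000
x1 = rng.uniform(0.0, 1.0, N)
x2 = rng.integers(0, 2, N).astype(float)
p_true = 0.2 + 0.4 * x1 + 0.2 * x2
y = (rng.uniform(size=N) < p_true).astype(float)
X = np.column_stack([np.ones(N), x1, x2])

beta_hat, se_usual, se_robust = ols_with_robust_se(X, y)
```

Comparing `se_usual` with `se_robust` on simulated data is a quick way to see how much the heteroskedasticity in (15.6) matters for a given design.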
Since the form of the variance is determined by the model for $P(y = 1 \mid \mathbf{x})$, an asymptotically more efficient method is weighted least squares (WLS). Let $\hat{\boldsymbol{\beta}}$ be the OLS estimator, and let $\hat{y}_i$ denote the OLS fitted values. Then, provided $0 < \hat{y}_i < 1$ for all observations i, define the estimated standard deviation as $\hat{s}_i \equiv [\hat{y}_i(1 - \hat{y}_i)]^{1/2}$. Then the WLS estimator, $\boldsymbol{\beta}^{*}$, is obtained from the OLS regression

$$y_i/\hat{s}_i \ \text{on} \ 1/\hat{s}_i, \ x_{i1}/\hat{s}_i, \ldots, x_{iK}/\hat{s}_i, \qquad i = 1, 2, \ldots, N \tag{15.7}$$
The usual standard errors from this regression are valid, as follows from the treatment of weighted least squares in Chapter 12. In addition, all other testing can be done using F statistics or LM statistics from the weighted regressions.
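The two-step procedure in (15.7) can be sketched in NumPy as follows; this is an illustration under stated assumptions, not production code. The function name `lpm_wls` and the simulated parameters (0.2, 0.5) are hypothetical. The function refuses to proceed when an OLS fitted value falls outside $(0, 1)$ rather than applying an ad hoc adjustment, because $\hat{s}_i$ is then undefined or zero.

```python
import numpy as np


def lpm_wls(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Two-step weighted least squares for the LPM, following (15.7).

    X must include a column of ones. Raises ValueError when some OLS fitted
    value lies outside (0, 1), because s_hat_i is then undefined or zero.
    """
    # Step 1: OLS fitted values y_hat_i.
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta_ols
    outside = (y_hat <= 0.0) | (y_hat >= 1.0)
    if outside.any():
        raise ValueError(f"{int(outside.sum())} OLS fitted values outside (0, 1)")
    # Step 2: s_hat_i = sqrt(y_hat_i (1 - y_hat_i)); divide y and every column
    # of X (including the constant) by s_hat_i and rerun OLS.
    s_hat = np.sqrt(y_hat * (1.0 - y_hat))
    beta_wls, *_ = np.linalg.lstsq(X / s_hat[:, None], y / s_hat, rcond=None)
    return beta_wls


# Simulated data with hypothetical parameters (0.2, 0.5).
rng = np.random.default_rng(1)
N = 100_000
x1 = rng.uniform(0.0, 1.0, N)
y = (rng.uniform(size=N) < 0.2 + 0.5 * x1).astype(float)
X = np.column_stack([np.ones(N), x1])
beta_star = lpm_wls(X, y)
```

Standard errors are not returned here; the usual OLS standard errors from the step-2 regression could be added in the standard way, and they are omitted only to keep the sketch short.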
If some of the OLS fitted values are not between zero and one, WLS analysis is not possible without ad hoc adjustments to bring deviant fitted values into the unit interval. Further, since the OLS fitted value $\hat{y}_i$ is an estimate of the conditional probability $P(y_i = 1 \mid \mathbf{x}_i)$, it is somewhat awkward if the predicted probability is negative or above unity.
Aside from the issue of fitted values being outside the unit interval, the LPM implies that a ceteris paribus unit increase in $x_j$ always changes $P(y = 1 \mid \mathbf{x})$ by the same amount, regardless of the initial value of $x_j$. This implication cannot literally be true because continually increasing one of the $x_j$ would eventually drive $P(y = 1 \mid \mathbf{x})$ to be less than zero or greater than one.
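To make this concrete (a short derivation added for illustration), fix the other covariates and suppose $\beta_j > 0$. The LPM probability exceeds one as soon as $x_j$ passes a finite threshold:

```latex
\beta_0 + \sum_{h \neq j} \beta_h x_h + \beta_j x_j > 1
\;\Longleftrightarrow\;
x_j > \frac{1 - \beta_0 - \sum_{h \neq j} \beta_h x_h}{\beta_j}
\qquad (\beta_j > 0)
```

For $\beta_j < 0$ the same argument shows that the probability drops below zero once $x_j > -(\beta_0 + \sum_{h \neq j} \beta_h x_h)/\beta_j$.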
Even with these weaknesses, the LPM often seems to give good estimates of the partial effects on the response probability near the center of the distribution of $\mathbf{x}$. (How good they are can be determined by comparing the coefficients from the LPM with the partial effects estimated from the nonlinear models we cover in Section 15.3.) If the main purpose is to estimate the partial effect of $x_j$ on the response probability, averaged across the distribution of $\mathbf{x}$, then the fact that some predicted values are outside the unit interval may not be very important. The LPM need not provide very good estimates of partial effects at extreme values of $\mathbf{x}$.
Example 15.1 (Married Women’s Labor Force Participation): We use the data from MROZ.RAW to estimate a linear probability model for labor force participation (inlf) of married women. Of the 753 women in the sample, 428 report working nonzero hours during the year. The variables we use to explain labor force participation are age, education, experience, nonwife income in thousands (nwifeinc), number of children less than six years of age (kidslt6), and number of kids between 6 and 18 inclusive (kidsge6); 606 women report having no young children, while 118 report having exactly one young child. The usual OLS standard errors are in parentheses, while the heteroskedasticity-robust standard errors are in brackets:
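$$
\begin{aligned}
\widehat{\text{inlf}} = {} & \underset{(.154)\ [.151]}{.586} - \underset{(.0014)\ [.0015]}{.0034}\,\text{nwifeinc} + \underset{(.007)\ [.007]}{.038}\,\text{educ} + \underset{(.006)\ [.006]}{.039}\,\text{exper} - \underset{(.00018)\ [.00019]}{.00060}\,\text{exper}^2 \\
& - \underset{(.002)\ [.002]}{.016}\,\text{age} - \underset{(.034)\ [.032]}{.262}\,\text{kidslt6} + \underset{(.013)\ [.013]}{.013}\,\text{kidsge6}
\end{aligned}
$$

$N = 753, \; R^2 = .264$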
With the exception of kidsge6, all coefficients have sensible signs and are statistically significant; kidsge6 is neither statistically significant nor practically important. The coefficient on nwifeinc means that if nonwife income increases by 10 ($10,000), the probability of being in the labor force is predicted to fall by .034. This is a small effect given that an increase in income by $10,000 in 1975 dollars is very large in this sample. (The average of nwifeinc is about $20,129 with standard deviation $11,635.) Having one more small child is estimated to reduce the probability of inlf = 1 by about .262, which is a fairly large effect.
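The reported coefficients can be used directly for back-of-the-envelope calculations. The Python sketch below uses the OLS estimates from Example 15.1; the helper names `predicted_inlf` and `exper_partial_effect` are hypothetical, and the covariate values for the "woman" are invented for illustration, not drawn from MROZ.RAW. It reproduces the -.034 effect of a 10-unit (10,000 dollar) increase in nwifeinc and shows that, because exper enters as a quadratic, its partial effect $.039 - 2(.0006)\,\text{exper}$ depends on the level of exper.

```python
# OLS coefficients reported in Example 15.1.
COEF = {
    "const": 0.586, "nwifeinc": -0.0034, "educ": 0.038, "exper": 0.039,
    "expersq": -0.00060, "age": -0.016, "kidslt6": -0.262, "kidsge6": 0.013,
}


def predicted_inlf(nwifeinc, educ, exper, age, kidslt6, kidsge6):
    """Fitted LPM probability; exper enters as a quadratic."""
    return (COEF["const"] + COEF["nwifeinc"] * nwifeinc + COEF["educ"] * educ
            + COEF["exper"] * exper + COEF["expersq"] * exper ** 2
            + COEF["age"] * age + COEF["kidslt6"] * kidslt6
            + COEF["kidsge6"] * kidsge6)


def exper_partial_effect(exper):
    """dP(inlf = 1 | x)/d exper = b_exper + 2 * b_expersq * exper."""
    return COEF["exper"] + 2 * COEF["expersq"] * exper


# A hypothetical woman (values chosen for illustration, not from the data).
woman = dict(nwifeinc=20, educ=12, exper=10, age=40, kidslt6=1, kidsge6=1)
p0 = predicted_inlf(**woman)
p1 = predicted_inlf(**{**woman, "nwifeinc": 30})  # nonwife income up by 10
```

For this hypothetical woman the fitted probability is about .415, and raising nwifeinc by 10 lowers it by .034. The experience effect is about .027 at exper = 10 and reaches zero at exper = 32.5, beyond which the fitted quadratic implies a negative effect.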
Of the 753 fitted probabilities, 33 are outside the unit interval. Rather than using some adjustment to those 33 fitted values and applying weighted least squares, we just use OLS and report heteroskedasticity-robust standard errors. Interestingly, these differ in practically unimportant ways from the usual OLS standard errors.
The case for the LPM is even stronger if most of the $x_j$ are discrete and take on only a few values. In the previous example, to allow a diminishing effect of young children on the probability of labor force participation, we can break kidslt6 into three binary indicators: no young children, one young child, and two or more young children. The last two indicators can be used in place of kidslt6 to allow the first young child to have a larger effect than subsequent young children. (Interestingly, when this method is used, the marginal effects of the first and second young children are virtually the same. The estimated effect of the first child is about -.263, and the additional reduction in the probability of labor force participation for the next child is about -.274.)
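Constructing the two indicators is straightforward; a minimal Python sketch follows (the function name is hypothetical). With these dummies in place of kidslt6, the coefficient on `one_kid` is the effect of the first young child relative to none, and the difference between the `two_plus` and `one_kid` coefficients is the additional effect of moving from one to two or more young children.

```python
def kidslt6_indicators(kidslt6):
    """Split a count of young children into two dummies.

    The omitted category is "no young children", so the two dummies are
    never both 1 for the same woman.
    """
    one_kid = [1 if k == 1 else 0 for k in kidslt6]
    two_plus = [1 if k >= 2 else 0 for k in kidslt6]
    return one_kid, two_plus
```

For example, `kidslt6_indicators([0, 1, 2, 3])` returns `([0, 1, 0, 0], [0, 0, 1, 1])`.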
In the extreme case where the model is saturated (that is, $\mathbf{x}$ contains dummy variables for mutually exclusive and exhaustive categories), the linear probability model
...