16 Corner Solution Outcomes and Censored Regression Models
16.1 Introduction and Motivation
In this chapter we cover a class of models traditionally called censored regression models. Censored regression models generally apply when the variable to be explained is partly continuous but has positive probability mass at one or more points. In order to apply these methods effectively, we must understand that the statistical model underlying censored regression analysis applies to problems that are conceptually very different.
For the most part, censored regression applications can be put into one of two categories. In the first case there is a variable with quantitative meaning, call it \(y^*\), and we are interested in the population regression \(E(y^* \mid x)\). If \(y^*\) and \(x\) were observed for everyone in the population, there would be nothing new: we could use standard regression methods (ordinary or nonlinear least squares). But a data problem arises because \(y^*\) is censored above or below some value; that is, it is not observable for part of the population. An example is top coding in survey data. For example, assume that \(y^*\) is family wealth, and, for a randomly drawn family, the actual value of wealth is recorded up to some threshold, say, $200,000, but above that level only the fact that wealth was more than $200,000 is recorded. Top coding is an example of data censoring, and is analogous to the data-coding problem we discussed in Section 15.10.2 in connection with interval regression.
Example 16.1 (Top Coding of Wealth): In the population of all families in the United States, let \(\mathit{wealth}^*\) denote actual family wealth, measured in thousands of dollars. Suppose that \(\mathit{wealth}^*\) follows the linear regression model \(E(\mathit{wealth}^* \mid x) = x\beta\), where \(x\) is a \(1 \times K\) vector of conditioning variables. However, we observe \(\mathit{wealth}^*\) only when \(\mathit{wealth}^* \le 200\). When \(\mathit{wealth}^*\) is greater than 200 we know that it is, but we do not know the actual value of \(\mathit{wealth}^*\). Define observed wealth as
\[ \mathit{wealth} = \min(\mathit{wealth}^*, 200) \]
The definition \(\mathit{wealth} = 200\) when \(\mathit{wealth}^* > 200\) is arbitrary, but it is useful for defining the statistical model that follows. To estimate \(\beta\) we might assume that \(\mathit{wealth}^*\) given \(x\) has a homoskedastic normal distribution. In error form,
\[ \mathit{wealth}^* = x\beta + u, \qquad u \mid x \sim \mathrm{Normal}(0, \sigma^2) \]
This is a strong assumption about the conditional distribution of \(\mathit{wealth}^*\), something we could avoid entirely if \(\mathit{wealth}^*\) were not censored above 200. Under these assumptions we can write recorded wealth as
\[ \mathit{wealth} = \min(200, x\beta + u) \tag{16.1} \]
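To see why top coding is a data problem rather than a modeling choice, the following minimal simulation sketch generates latent wealth from a linear model, top codes it at 200 as in equation (16.1), and compares OLS slopes. The parameter values, the single uniform regressor, and the helper `ols_slope` are illustrative assumptions, not taken from the text.

```python
import random
import statistics

random.seed(0)

# Hypothetical parameter values chosen only for illustration.
BETA0, BETA1, SIGMA = 100.0, 20.0, 50.0
CAP = 200.0  # top-coding threshold from Example 16.1 (thousands of dollars)


def ols_slope(xs, ys):
    """Slope from a simple OLS regression of ys on xs with an intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


n = 20_000
x = [random.uniform(0.0, 5.0) for _ in range(n)]
# Latent wealth: wealth* = beta0 + beta1 * x + u, with u ~ Normal(0, sigma^2).
wealth_star = [BETA0 + BETA1 * xi + random.gauss(0.0, SIGMA) for xi in x]
# Recorded wealth, equation (16.1): values above the cap are replaced by the cap.
wealth = [min(CAP, w) for w in wealth_star]

share_censored = sum(w == CAP for w in wealth) / n
slope_latent = ols_slope(x, wealth_star)  # infeasible: needs uncensored wealth
slope_observed = ols_slope(x, wealth)     # feasible: uses top-coded wealth

print(f"share top coded:           {share_censored:.3f}")
print(f"OLS slope, latent wealth:  {slope_latent:.2f}")
print(f"OLS slope, recorded wealth:{slope_observed:.2f}")
```

In this design the regression on latent wealth recovers a slope close to the assumed value of 20, while the regression on recorded wealth gives a noticeably smaller slope, because censoring flattens the relationship wherever the cap binds.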
Data censoring also arises in the analysis of duration models, a topic we treat in Chapter 20.
A second kind of application of censored regression models appears more often in econometrics and, unfortunately, is where the label "censored regression" is least appropriate. To describe the situation, let \(y\) be an observable choice or outcome describing some economic agent, such as an individual or a firm, with the following characteristics: \(y\) takes on the value zero with positive probability but is a continuous random variable over strictly positive values. There are many examples of variables that, at least approximately, have these features. Just a few examples include amount of life insurance coverage chosen by an individual, family contributions to an individual retirement account, and firm expenditures on research and development. In each of these examples we can imagine economic agents solving an optimization problem, and for some agents the optimal choice will be the corner solution, \(y = 0\). We will call this kind of response variable a corner solution outcome. For corner solution outcomes, it makes more sense to call the resulting model a corner solution model. Unfortunately, the name "censored regression model" appears to be firmly entrenched.
For corner solution applications, we must understand that the issue is not data observability: we are interested in features of the distribution of \(y\) given \(x\), such as \(E(y \mid x)\) and \(P(y = 0 \mid x)\). If we are interested only in the effect of the \(x_j\) on the mean response, \(E(y \mid x)\), it is natural to ask, Why not just assume \(E(y \mid x) = x\beta\) and apply OLS on a random sample? Theoretically, the problem is that, when \(y \ge 0\), \(E(y \mid x)\) cannot be linear in \(x\) unless the range of \(x\) is fairly limited. A related weakness is that the model implies constant partial effects. Further, for the sample at hand, predicted values for \(y\) can be negative for many combinations of \(x\) and \(\beta\). These are very similar to the shortcomings of the linear probability model for binary responses.
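The negative-prediction problem is easy to reproduce. The sketch below simulates a hypothetical corner solution outcome, \(y = \max(0, -2 + x + u)\) with standard normal \(u\), fits a linear model by OLS, and counts how many fitted values fall below zero. The data-generating process and all numbers are assumptions made for illustration only.

```python
import random
import statistics

random.seed(1)

n = 20_000
x = [random.uniform(0.0, 4.0) for _ in range(n)]
# Hypothetical corner solution outcome: zero with positive probability,
# continuous over strictly positive values.
y = [max(0.0, -2.0 + xi + random.gauss(0.0, 1.0)) for xi in x]

# OLS of y on a constant and x.
mx, my = statistics.fmean(x), statistics.fmean(y)
slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
intercept = my - slope * mx

fitted = [intercept + slope * xi for xi in x]
share_zero = sum(yi == 0.0 for yi in y) / n
share_negative_fit = sum(f < 0.0 for f in fitted) / n

print(f"share of observations with y = 0:   {share_zero:.3f}")
print(f"OLS intercept and slope:            {intercept:.3f}, {slope:.3f}")
print(f"share of fitted values below zero:  {share_negative_fit:.3f}")
```

Even though every observed \(y\) is nonnegative, a nontrivial share of the linear predictions is negative, and the single OLS slope cannot reflect that the effect of \(x\) on the mean is small where most outcomes are zero and larger where most are positive.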
We have already seen functional forms that ensure that \(E(y \mid x)\) is positive for all values of \(x\) and parameters, the leading case being the exponential function, \(E(y \mid x) = \exp(x\beta)\). [We cannot use \(\log(y)\) as the dependent variable in a linear regression because \(\log(0)\) is undefined.] We could then estimate \(\beta\) using nonlinear least squares (NLS), as in Chapter 12. Using an exponential conditional mean function is a reasonable strategy to follow, as it ensures that predicted values are positive and that the parameters are easy to interpret. However, it also has limitations. First, if \(y\) is a corner solution outcome, \(\mathrm{Var}(y \mid x)\) is probably heteroskedastic, and so NLS could be inefficient. While we may be able to partly solve this problem using weighted NLS, any model for the conditional variance would be arbitrary. Probably a more important criticism is that we would not be able to measure the effect of each \(x_j\) on other features of the distribution of \(y\) given \(x\). Two that are commonly of
interest are \(P(y = 0 \mid x)\) and \(E(y \mid x, y > 0)\). By definition, a model for \(E(y \mid x)\) does not allow us to estimate other features of the distribution. If we make a full distributional assumption for \(y\) given \(x\), we can estimate any feature of the conditional distribution. In addition, we will obtain efficient estimates of quantities such as \(E(y \mid x)\).
The following example shows how a simple economic model leads to an econometric model where \(y\) can be zero with positive probability and where the conditional expectation \(E(y \mid x)\) is not a linear function of parameters.
Example 16.2 (Charitable Contributions): Problem 15.1 shows how to derive a probit model from a utility maximization problem for charitable giving, using utility function \(\mathit{util}_i(c, q) = c + a_i \log(1 + q)\), where \(c\) is annual consumption, in dollars, and \(q\) is annual charitable giving. The variable \(a_i\) determines the marginal utility of giving for family \(i\). Maximizing subject to the budget constraint \(c_i + p_i q_i = m_i\) (where \(m_i\) is family income and \(p_i\) is the price of a dollar of charitable contributions) and the inequality constraint \(c, q \ge 0\), the solution \(q_i\) is easily shown to be \(q_i = 0\) if \(a_i/p_i \le 1\) and \(q_i = a_i/p_i - 1\) if \(a_i/p_i > 1\). We can write this relation as \(1 + q_i = \max(1, a_i/p_i)\). If \(a_i = \exp(z_i\gamma + u_i)\), where \(u_i\) is an unobservable independent of \((z_i, p_i, m_i)\) and normally distributed, then charitable contributions are determined by the equation
\[ \log(1 + q_i) = \max[0, z_i\gamma - \log(p_i) + u_i] \tag{16.2} \]
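The step from the corner solution \(1 + q_i = \max(1, a_i/p_i)\) to equation (16.2) is just a logarithm applied to both sides, and it can be checked numerically. The sketch below assumes a scalar \(z_i\), hypothetical values for \(\gamma\), and income large enough that the constraint \(c \ge 0\) never binds; the function name `optimal_giving` is likewise illustrative.

```python
import math
import random

random.seed(1)

# Hypothetical parameter values: intercept and slope on a scalar z.
GAMMA0, GAMMA1 = -0.5, 0.8


def optimal_giving(a, p):
    """Utility-maximizing q for util = c + a*log(1 + q) with c + p*q = m, q >= 0.

    Assumes income m is large enough that c >= 0 does not bind.
    """
    return max(0.0, a / p - 1.0)


n = 5_000
gaps = []
zeros = 0
for _ in range(n):
    z = random.gauss(0.0, 1.0)
    p = random.uniform(0.6, 1.0)   # price of a dollar of giving
    u = random.gauss(0.0, 1.0)     # unobservable, independent of (z, p)
    a = math.exp(GAMMA0 + GAMMA1 * z + u)

    q = optimal_giving(a, p)
    lhs = math.log(1.0 + q)                                  # left side of (16.2)
    rhs = max(0.0, GAMMA0 + GAMMA1 * z - math.log(p) + u)    # right side of (16.2)
    gaps.append(abs(lhs - rhs))
    zeros += q == 0.0

max_gap = max(gaps)
share_zero = zeros / n
print(f"largest |lhs - rhs| across draws: {max_gap:.2e}")
print(f"share of families giving zero:    {share_zero:.3f}")
```

The two sides agree up to floating-point rounding, and a sizable share of simulated families sits at the corner \(q_i = 0\), which is exactly the pile-up at zero that the econometric model has to accommodate.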
Comparing equations (16.2) and (16.1) shows that they have similar statistical structures. In equation (16.2) we are taking a maximum, and the lower threshold is zero, whereas in equation (16.1) we are taking a minimum with an upper threshold of 200. Each problem can be transformed into the same statistical model: for a randomly drawn observation \(i\) from the population,
\[ y_i^* = x_i\beta + u_i, \qquad u_i \mid x_i \sim \mathrm{Normal}(0, \sigma^2) \tag{16.3} \]
\[ y_i = \max(0, y_i^*) \tag{16.4} \]
These equations constitute what is known as the standard censored Tobit model (after Tobin, 1956) or type I Tobit model (which is from Amemiya’s 1985 taxonomy). This is the canonical form of the model in the sense that it is the form usually studied in methodological papers, and it is the default model estimated by many software packages.
The charitable contributions example immediately fits into the standard censored Tobit framework by defining \(x_i = [z_i, \log(p_i)]\) and \(y_i = \log(1 + q_i)\). This particular transformation of \(q_i\) and the restriction that the coefficient on \(\log(p_i)\) is \(-1\) depend critically on the utility function used in the example. In practice, we would probably take \(y_i = q_i\) and allow all parameters to be unrestricted.
The wealth example can be cast as equations (16.3) and (16.4) after a simple transformation:
\[ -(\mathit{wealth}_i - 200) = \max(0, 200 - x_i\beta - u_i) \]
and so the intercept changes, and all slope coefficients have the opposite sign from equation (16.1). For data-censoring problems, it is easier to study the censoring scheme directly, and many econometrics packages support various kinds of data censoring. Problem 16.3 asks you to consider general forms of data censoring, including the case when the censoring point can change with observation, in which case the model is often called the censored normal regression model. (This label properly emphasizes the data-censoring aspect.)
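The algebra behind the transformation of the wealth example follows directly from equation (16.1), using the fact that the negative of a minimum is the maximum of the negatives. The observation subscript \(i\) is added here to match equations (16.3) and (16.4):

```latex
\begin{align*}
-(\mathit{wealth}_i - 200)
  &= 200 - \min(200,\; x_i\beta + u_i) \\
  &= 200 + \max(-200,\; -x_i\beta - u_i) \\
  &= \max(0,\; 200 - x_i\beta - u_i).
\end{align*}
```

With \(y_i = 200 - \mathit{wealth}_i\), the latent variable is \(y_i^* = 200 - x_i\beta - u_i\). If \(x_i\) contains a constant, the 200 is absorbed into the intercept, each slope changes sign, and \(-u_i\) has the same \(\mathrm{Normal}(0, \sigma^2)\) distribution as \(u_i\).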
For the population, we write the standard censored Tobit model as
\[ y^* = x\beta + u, \qquad u \mid x \sim \mathrm{Normal}(0, \sigma^2) \tag{16.5} \]
\[ y = \max(0, y^*) \tag{16.6} \]
where, except in rare cases, \(x\) contains unity. As we saw from the two previous examples, different features of this model are of interest depending on the type of application. In examples with true data censoring, such as Example 16.1, the vector \(\beta\) tells us everything we want to know because \(E(y^* \mid x) = x\beta\) is of interest. For corner solution outcomes, such as Example 16.2, \(\beta\) does not give the entire story. Usually, we are interested in \(E(y \mid x)\) or \(E(y \mid x, y > 0)\). These certainly depend on \(\beta\), but in a nonlinear fashion.
For the statistical model (16.5) and (16.6) to make sense, the variable \(y^*\) should have characteristics of a normal random variable. In data censoring cases this requirement means that the variable of interest \(y^*\) should have a homoskedastic normal distribution. In some cases the logarithmic transformation can be used to make this assumption more plausible. Example 16.1 might be one such case if wealth is positive for all families. See also Problems 16.1 and 16.2.
In corner solution examples, the variable \(y\) should be (roughly) continuous when \(y > 0\). Thus the Tobit model is not appropriate for ordered responses, as in Section 15.10. Similarly, Tobit should not be applied to count variables, especially when the count variable takes on only a small number of values (such as number of patents awarded annually to a firm or the number of times someone is arrested during a year). Poisson regression models, a topic we cover in Chapter 19, are better suited for analyzing count data.
For corner solution outcomes, we must avoid placing too much emphasis on the latent variable \(y^*\). Most of the time \(y^*\) is an artificial construct, and we are not interested in \(E(y^* \mid x)\). In Example 16.2 we derived the model for charitable contributions using utility maximization, and a latent variable never appeared. Viewing \(y^*\) as something like "desired charitable contributions" can only sow confusion: the variable of interest, \(y\), is observed charitable contributions.
16.2 Derivations of Expected Values
In corner solution applications such as the charitable contributions example, interest centers on probabilities or expectations involving \(y\). Most of the time we focus on the expected values \(E(y \mid x, y > 0)\) and \(E(y \mid x)\).
Before deriving these expectations for the Tobit model, it is interesting to derive an inequality that bounds \(E(y \mid x)\) from below. Since the function \(g(z) \equiv \max(0, z)\) is convex, it follows from the conditional Jensen's inequality (see Appendix 2A) that \(E(y \mid x) \ge \max[0, E(y^* \mid x)]\). This condition holds when \(y^*\) has any distribution and for any form of \(E(y^* \mid x)\). If \(E(y^* \mid x) = x\beta\), then
\[ E(y \mid x) \ge \max(0, x\beta) \tag{16.7} \]
which is always nonnegative. Equation (16.7) shows that \(E(y \mid x)\) is bounded from below by the larger of zero and \(x\beta\).
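The Jensen lower bound can be checked numerically for a particular distribution. The sketch below assumes standard normal errors and a handful of hypothetical index values \(x\beta\), and approximates \(E[\max(0, x\beta + \sigma z)]\) with a simple midpoint rule; the bound itself does not depend on normality, which is used here only to have something concrete to integrate.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def expected_censored(xb, sigma=1.0, lo=-8.0, hi=8.0, steps=16_000):
    """Midpoint-rule approximation of E[max(0, xb + sigma*z)], z ~ Normal(0, 1)."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        z = lo + (k + 0.5) * h
        total += max(0.0, xb + sigma * z) * STD_NORMAL.pdf(z)
    return total * h


rows = []
for xb in (-2.0, -1.0, 0.0, 1.0, 2.0):   # hypothetical values of the index
    e_y = expected_censored(xb)
    lower = max(0.0, xb)                  # Jensen lower bound
    rows.append((xb, e_y, lower))
    print(f"xb = {xb:+.1f}   E(y|x) ~ {e_y:.4f}   max(0, xb) = {lower:.4f}")
```

The gap between \(E(y \mid x)\) and the bound is largest near \(x\beta = 0\) and shrinks as the index moves far from zero in either direction, where censoring is either almost certain or almost irrelevant.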
When \(u\) is independent of \(x\) and has a normal distribution, we can find an explicit expression for \(E(y \mid x)\). We first derive \(P(y > 0 \mid x)\) and \(E(y \mid x, y > 0)\), which are of interest in their own right. Then, we use the law of iterated expectations to obtain \(E(y \mid x)\):
\[ \begin{aligned} E(y \mid x) &= P(y = 0 \mid x) \cdot 0 + P(y > 0 \mid x) \cdot E(y \mid x, y > 0) \\ &= P(y > 0 \mid x) \cdot E(y \mid x, y > 0) \end{aligned} \tag{16.8} \]
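The iterated-expectations decomposition has an exact sample analogue: the overall mean of \(y\) equals the share of positive outcomes times the mean among the positive outcomes. A minimal sketch, holding \(x\) fixed at a hypothetical index value and assuming normal errors:

```python
import random

random.seed(2)

XB, SIGMA = 0.5, 1.0   # hypothetical index value and error standard deviation
n = 50_000

# Draws of y = max(0, xb + u) at a fixed x.
y = [max(0.0, XB + random.gauss(0.0, SIGMA)) for _ in range(n)]
positive = [v for v in y if v > 0.0]

mean_y = sum(y) / n                             # sample analogue of E(y|x)
share_positive = len(positive) / n              # sample analogue of P(y>0|x)
mean_positive = sum(positive) / len(positive)   # sample analogue of E(y|x, y>0)

print(f"mean of y:                     {mean_y:.4f}")
print(f"share positive * mean positive:{share_positive * mean_positive:.4f}")
```

The zeros contribute nothing to the sum, so the two printed quantities coincide up to rounding; all the modeling effort therefore goes into the two factors on the right-hand side.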
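Deriving \(P(y > 0 \mid x)\) is easy. Define the binary variable \(w = 1\) if \(y > 0\), \(w = 0\) if \(y = 0\). Then \(w\) follows a probit model:
\[ \begin{aligned} P(w = 1 \mid x) &= P(y^* > 0 \mid x) = P(u > -x\beta \mid x) \\ &= P(u/\sigma > -x\beta/\sigma) = \Phi(x\beta/\sigma) \end{aligned} \tag{16.9} \]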
One implication of equation (16.9) is that \(\gamma \equiv \beta/\sigma\), but not \(\beta\) and \(\sigma\) separately, can be consistently estimated from a probit of \(w\) on \(x\).
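A short simulation sketch illustrates both the probit formula for \(P(y > 0 \mid x)\) and why only the ratio \(\beta/\sigma\) matters for the zero/positive indicator. The index value, the scale factor of 3, and the helper `share_positive` are hypothetical choices for illustration.

```python
import random
from statistics import NormalDist

random.seed(3)

PHI = NormalDist().cdf  # standard normal cdf


def share_positive(xb, sigma, n=20_000):
    """Fraction of draws with latent y* = xb + u > 0, where u ~ Normal(0, sigma^2)."""
    return sum(xb + random.gauss(0.0, sigma) > 0.0 for _ in range(n)) / n


xb, sigma = 0.5, 2.0                              # hypothetical index and scale
prob = PHI(xb / sigma)                            # probit probability
freq = share_positive(xb, sigma)                  # simulated frequency
freq_scaled = share_positive(3.0 * xb, 3.0 * sigma)   # same ratio xb/sigma

print(f"Phi(xb/sigma):                  {prob:.4f}")
print(f"simulated P(y > 0):             {freq:.4f}")
print(f"simulated P(y > 0), both x3:    {freq_scaled:.4f}")
```

Multiplying the index and \(\sigma\) by the same constant leaves the frequency of positive outcomes unchanged up to sampling error, so the binary indicator alone cannot distinguish the two parameterizations.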
To derive \(E(y \mid x, y > 0)\), we need the following fact about the normal distribution: if \(z \sim \mathrm{Normal}(0, 1)\), then, for any constant \(c\),
\[ E(z \mid z > c) = \frac{\phi(c)}{1 - \Phi(c)} \]
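The truncated-mean fact, that \(E(z \mid z > c)\) equals the standard normal density at \(c\) divided by the upper-tail probability \(1 - \Phi(c)\), can be compared against simulated truncated means. In the sketch below the function names, the cutoffs, and the number of draws are illustrative assumptions; the ratio itself is commonly called the inverse Mills ratio.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def inverse_mills(c):
    """phi(c) / [1 - Phi(c)]: mean of a standard normal truncated below at c."""
    return STD_NORMAL.pdf(c) / (1.0 - STD_NORMAL.cdf(c))


def simulated_truncated_mean(c, n=100_000):
    """Average of standard normal draws, keeping only those that exceed c."""
    kept = [z for z in (random.gauss(0.0, 1.0) for _ in range(n)) if z > c]
    return sum(kept) / len(kept)


random.seed(4)
results = {}
for c in (-1.0, 0.0, 1.0):   # hypothetical truncation points
    results[c] = (inverse_mills(c), simulated_truncated_mean(c))
    print(f"c = {c:+.1f}   formula: {results[c][0]:.4f}   simulated: {results[c][1]:.4f}")

# At c = 0 the formula reduces to sqrt(2/pi), the mean of a half-normal.
half_normal_mean = math.sqrt(2.0 / math.pi)
```

The formula and the simulated means agree up to sampling error, and the truncated mean always exceeds \(c\) and rises with it, which is what one expects when the lower part of the distribution is discarded.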