III GENERAL APPROACHES TO NONLINEAR ESTIMATION
In this part we begin our study of nonlinear econometric methods. What we mean by nonlinear needs some explanation because it does not necessarily mean that the underlying model is what we would think of as nonlinear. For example, suppose the population model of interest can be written as y = xβ + u, but, rather than assuming E(u|x) = 0, we assume that the median of u given x is zero for all x. This assumption implies Med(y|x) = xβ, which is a linear model for the conditional median of y given x. [The conditional mean, E(y|x), may or may not be linear in x.] The standard estimator for a conditional median turns out to be least absolute deviations (LAD), not ordinary least squares. Like OLS, the LAD estimator solves a minimization problem: it minimizes the sum of absolute residuals. However, there is a key difference between LAD and OLS: the LAD estimator cannot be obtained in closed form. The lack of a closed-form expression for LAD has implications not only for obtaining the LAD estimates from a sample of data, but also for the asymptotic theory of LAD.
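To make the contrast concrete, here is a minimal sketch of our own (not from the text; the simulated design and all names are illustrative assumptions): OLS has a closed-form solution, while LAD must be computed by numerically minimizing the sum of absolute residuals.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
u = rng.standard_t(df=3, size=n)     # symmetric errors, so Med(u|x) = 0
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])

# OLS: closed form, (X'X)^{-1} X'y
b_ols = np.linalg.solve(X.T @ X, X.T @ y)

# LAD: no closed form; minimize the sum of absolute residuals numerically
sad = lambda b: np.abs(y - X @ b).sum()
b_lad = minimize(sad, x0=b_ols, method="Nelder-Mead").x
print(b_ols, b_lad)
```

With symmetric errors both estimators estimate the same coefficients; the point is that the LAD solution comes from an iterative optimizer rather than a formula, which is exactly why its asymptotic analysis requires more machinery.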
All the estimators we studied in Part II were obtained in closed form, a fact which greatly facilitates asymptotic analysis: we needed nothing more than the weak law of large numbers, the central limit theorem, and the basic algebra of probability limits. When an estimation method does not deliver closed-form solutions, we need to use more advanced asymptotic theory. In what follows, "nonlinear" describes any problem in which the estimators cannot be obtained in closed form.
The three chapters in this part provide the foundation for asymptotic analysis of most nonlinear models encountered in applications with cross section or panel data. We will make certain assumptions concerning continuity and differentiability, and so problems violating these conditions will not be covered. In the general development of M-estimators in Chapter 12, we will mention some of the applications that are ruled out and provide references.
This part of the book is by far the most technical. We will not dwell on the sometimes intricate arguments used to establish consistency and asymptotic normality in nonlinear contexts. For completeness, we do provide some general results on consistency and asymptotic normality for general classes of estimators. However, for specific estimation methods, such as nonlinear least squares, we will only state assumptions that have real impact for performing inference. Unless the underlying regularity conditions (which involve assuming that certain moments of the population random variables are finite, as well as assuming continuity and differentiability of the regression function or log-likelihood function) are obviously false, they are usually just assumed. Where possible, the assumptions will correspond closely with those given previously for linear models.
The analysis of maximum likelihood methods in Chapter 13 is greatly simplified once we have given a general treatment of M-estimators. Chapter 14 contains results for generalized method of moments estimators for models nonlinear in parameters. We also briefly discuss the related topic of minimum distance estimation in Chapter 14.
Readers who are not interested in general approaches to nonlinear estimation might use these chapters only when needed for reference in Part IV.
12 M-Estimation
12.1 Introduction
We begin our study of nonlinear estimation with a general class of estimators known as M-estimators, a term introduced by Huber (1967). (You might think of the "M" as standing for minimization or maximization.) M-estimation methods include maximum likelihood, nonlinear least squares, least absolute deviations, quasi-maximum likelihood, and many other procedures used by econometricians.
This chapter is somewhat abstract and technical, but it is useful to develop a unified theory early on so that it can be applied in a variety of situations. We will carry along the example of nonlinear least squares for cross section data to motivate the general approach.
In a nonlinear regression model, we have a random variable, y, and we would like to model E(y|x) as a function of the explanatory variables x, a K-vector. We already know how to estimate models of E(y|x) when the model is linear in its parameters: OLS produces consistent, asymptotically normal estimators. What happens if the regression function is nonlinear in its parameters?
Generally, let m(x, θ) be a parametric model for E(y|x), where m is a known function of x and θ, and θ is a P × 1 parameter vector. [This is a parametric model because m(·, θ) is assumed to be known up to a finite number of parameters.] The dimension of the parameters, P, can be less than or greater than K. The parameter space, Θ, is a subset of R^P. This is the set of values of θ that we are willing to consider in the regression function. Unlike in linear models, for nonlinear models the asymptotic analysis requires explicit assumptions on the parameter space.
An example of a nonlinear regression function is the exponential regression function, m(x, θ) = exp(xθ), where x is a row vector and contains unity as its first element. This is a useful functional form whenever y ≥ 0. A regression model suitable when the response y is restricted to the unit interval is the logistic function, m(x, θ) = exp(xθ)/[1 + exp(xθ)]. Both the exponential and logistic functions are nonlinear in θ. In any application, there is no guarantee that our chosen model is adequate for E(y|x). We say that we have a correctly specified model for the conditional mean, E(y|x), if, for some θ_o ∈ Θ,

E(y|x) = m(x, θ_o)    (12.1)
We introduce the subscript "o" on θ to distinguish the parameter vector appearing in E(y|x) from other candidates for that vector. (Often, the value θ_o is called "the true value of θ," a phrase that is somewhat loose but still useful as shorthand.) As an example, for y ≥ 0 and a single explanatory variable x, consider the model m(x, θ) = θ_1·x^θ_2. If the population regression function is E(y|x) = 4x^1.5, then
θ_o1 = 4 and θ_o2 = 1.5. We will never know the actual θ_o1 and θ_o2 (unless we somehow control the way the data have been generated), but, if the model is correctly specified, then these values exist, and we would like to estimate them. Generic candidates for θ_o1 and θ_o2 are labeled θ_1 and θ_2, and, without further information, θ_1 is any positive number and θ_2 is any real number: the parameter space is Θ ≡ {(θ_1, θ_2): θ_1 > 0, θ_2 ∈ R}. For an exponential regression model, m(x, θ) = exp(xθ) is a correctly specified model for E(y|x) if and only if there is some K-vector θ_o such that E(y|x) = exp(xθ_o).
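As an illustrative sketch of our own (a simulation, not part of the text; the sample design and noise distribution are assumptions), we can generate data with E(y|x) = 4x^1.5 and recover θ_o1 and θ_o2 by minimizing the sum of squared residuals numerically:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)
n = 1000
x = rng.uniform(0.5, 3.0, size=n)
y = 4.0 * x**1.5 + rng.normal(size=n)   # E(y|x) = 4 x^1.5, so E(u|x) = 0

# Residuals for the power model m(x, theta) = theta1 * x**theta2
def resid(theta):
    return y - theta[0] * x**theta[1]

# No closed form: an iterative least-squares solver from a crude start
theta_hat = least_squares(resid, x0=[1.0, 1.0]).x
print(theta_hat)
```

The fitted values should land near (4, 1.5); unlike OLS, nothing guarantees the optimizer a formulaic solution, which is why the parameter space and starting values matter in nonlinear problems.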
In our analysis of linear models, there was no need to make the distinction between the parameter vector in the population regression function and other candidates for this vector, because the estimators in linear contexts are obtained in closed form, and so their asymptotic properties can be studied directly. As we will see, in our theoretical development we need to distinguish the vector appearing in E(y|x) from a generic element of Θ. We will often drop the subscripting by "o" when studying particular applications because the notation can be cumbersome.
Equation (12.1) is the most general way of thinking about what nonlinear least squares is intended to do: estimate models of conditional expectations. But, as a statistical matter, equation (12.1) is equivalent to a model with an additive, unobservable error with a zero conditional mean:
y = m(x, θ_o) + u,  E(u|x) = 0    (12.2)
Given equation (12.2), equation (12.1) clearly holds. Conversely, given equation (12.1), we obtain equation (12.2) by defining the error to be u ≡ y − m(x, θ_o). In interpreting the model and deciding on appropriate estimation methods, we should not focus on the error form in equation (12.2) because, evidently, the additivity of u has some unintended connotations. In particular, we must remember that, in writing the model in error form, the only thing implied by equation (12.1) is E(u|x) = 0. Depending on the nature of y, the error u may have some unusual properties. For example, if y ≥ 0 then u ≥ −m(x, θ_o), in which case u and x cannot be independent. Heteroskedasticity in the error (that is, Var(u|x) ≠ Var(u)) is present whenever Var(y|x) depends on x, as is very common when y takes on a restricted range of values. Plus, when we introduce randomly sampled observations {(x_i, y_i): i = 1, 2, ..., N}, it is too tempting to write the model and its assumptions as "y_i = m(x_i, θ_o) + u_i, where the u_i are i.i.d. errors." As we discussed in Section 1.4 for the linear model, under random sampling the {u_i} are always i.i.d. What is usually meant is that u_i and x_i are independent, but, for the reasons we just gave, this assumption is often much too strong. The error form of the model does turn out to be useful for defining estimators of asymptotic variances and for obtaining test statistics.
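A small simulation of our own (the distributional choices below are illustrative assumptions, not from the text) shows both features at once: with y ≥ 0 and an exponential conditional mean, the error u = y − m(x, θ_o) is bounded below by −m(x, θ_o), and Var(u|x) varies sharply with x even though E(u|x) = 0.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
x = rng.uniform(0.0, 2.0, size=n)
m = np.exp(0.5 + x)                  # m(x, theta_o) = exp(0.5 + x) > 0
y = rng.exponential(scale=m)         # y >= 0 with E(y|x) = m(x, theta_o)
u = y - m                            # error form: E(u|x) = 0, but u >= -m

lo, hi = x < 0.5, x > 1.5
print(u[lo].var(), u[hi].var())      # conditional variance rises with x
```

Because the support of u depends on x, independence of u and x is impossible here, even though the conditional mean restriction E(u|x) = 0 holds exactly.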
For later reference, we formalize the first nonlinear least squares (NLS) assumption as follows:
Assumption NLS.1: For some θ_o ∈ Θ, E(y|x) = m(x, θ_o).
This form of presentation represents the level at which we will state assumptions for particular econometric methods. In our general development of M-estimators that follows, we will need to add conditions involving moments of m(x, θ) and y, as well as continuity assumptions on m(x, ·).
If we let w ≡ (x, y), then θ_o indexes a feature of the population distribution of w, namely, the conditional mean of y given x. More generally, let w be an M-vector of random variables with some distribution in the population. We let W denote the subset of R^M representing the possible values of w. Let θ_o denote a parameter vector describing some feature of the distribution of w. This could be a conditional mean, a conditional mean and conditional variance, a conditional median, or a conditional distribution. As shorthand, we call θ_o "the true parameter" or "the true value of θ." These phrases simply mean that θ_o is the parameter vector describing the underlying population, something we will make precise later. We assume that θ_o belongs to a known parameter space Θ ⊂ R^P.
We assume that our data come as a random sample of size N from the population; we label this random sample {w_i: i = 1, 2, ...}, where each w_i is an M-vector. This assumption is much more general than it may initially seem. It covers cross section models with many equations, and it also covers panel data settings with small time series dimension. The extension to independently pooled cross sections is almost immediate. In the NLS example, w_i consists of x_i and y_i, the ith draw from the population on x and y.
What allows us to estimate θ_o when it indexes E(y|x)? It is the fact that θ_o is the value of θ that minimizes the expected squared error between y and m(x, θ). That is, θ_o solves the population problem

min_{θ ∈ Θ} E{[y − m(x, θ)]²}    (12.3)
where the expectation is over the joint distribution of (x, y). This conclusion follows immediately from basic properties of conditional expectations (in particular, condition CE.8 in Chapter 2). We will give a slightly different argument here. Write
[y − m(x, θ)]² = [y − m(x, θ_o)]² + 2[m(x, θ_o) − m(x, θ)]u + [m(x, θ_o) − m(x, θ)]²    (12.4)
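To see where this decomposition leads (our own rendering of the standard step, in the notation of equation (12.4)): taking expectations, the cross term vanishes by iterated expectations, because m(x, θ_o) − m(x, θ) is a function of x alone and E(u|x) = 0.

```latex
E\{[y - m(x,\theta)]^2\}
  = E\{[y - m(x,\theta_o)]^2\}
  + 2\,E\bigl\{[m(x,\theta_o) - m(x,\theta)]\,E(u \mid x)\bigr\}
  + E\{[m(x,\theta_o) - m(x,\theta)]^2\}
  \;\ge\; E\{[y - m(x,\theta_o)]^2\}
```

Since the middle term is zero and the last term is nonnegative, every θ ∈ Θ yields an expected squared error at least as large as that at θ_o, so θ_o solves the population problem (12.3).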
...