A term in $\sum x_t^{*2}$ can be cancelled from the numerator and denominator of (4A.29) and, recalling that $x_t^* = (x_t - \bar{x})$, this gives the variance of the slope coefficient as
$$\mathrm{var}(\hat{\beta}) = \frac{s^2}{\sum (x_t - \bar{x})^2} \qquad (4A.30)$$
so that the standard error can be obtained by taking the square root of (4A.30):
$$SE(\hat{\beta}) = s\sqrt{\frac{1}{\sum (x_t - \bar{x})^2}} \qquad (4A.31)$$

Turning now to the derivation of the intercept standard error, this is much more difficult than that of the slope standard error. In fact, both are very much easier using matrix algebra, as shown in the following chapter. This derivation is therefore offered in summary form. It is possible to express $\hat{\alpha}$ as a function of the true $\alpha$ and of the disturbances, $u_t$:
$$\hat{\alpha} = \alpha + \sum u_t \left[\frac{\sum x_t^2 - x_t \sum x_t}{T\left(\sum x_t^2 - \bar{x}\sum x_t\right)}\right] \qquad (4A.32)$$

Denoting all the elements in square brackets as $g_t$, (4A.32) can be written
$$\hat{\alpha} - \alpha = \sum u_t g_t \qquad (4A.33)$$

From (4A.15), the intercept variance would be written
$$\mathrm{var}(\hat{\alpha}) = E\left(\sum u_t g_t\right)^2 = \sum g_t^2 E\left(u_t^2\right) = s^2 \sum g_t^2 \qquad (4A.34)$$

Writing (4A.34) out in full for $g_t^2$ and expanding the brackets,
$$\mathrm{var}(\hat{\alpha}) = \frac{s^2\left[T\left(\sum x_t^2\right)^2 - 2\left(\sum x_t\right)^2 \sum x_t^2 + \sum x_t^2\left(\sum x_t\right)^2\right]}{\left[T\left(\sum x_t^2 - \bar{x}\sum x_t\right)\right]^2} \qquad (4A.35)$$

This looks rather complex but, fortunately, if we take $\sum x_t^2$ outside the square brackets in the numerator, the remaining numerator cancels with a term in the denominator to leave the required result:
$$SE(\hat{\alpha}) = s\sqrt{\frac{\sum x_t^2}{T\sum (x_t - \bar{x})^2}} \qquad (4A.36)$$
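As a numerical sanity check on (4A.31) and (4A.36), the short sketch below fits a bivariate regression on simulated data and evaluates both standard-error formulas directly. It is only an illustration: the data, seed and variable names are invented for the example, and $s^2$ is taken to be the residual sum of squares divided by $T - 2$.

```python
import numpy as np

# Simulate y_t = alpha + beta * x_t + u_t (all parameter values illustrative).
rng = np.random.default_rng(0)
T = 200
x = rng.normal(size=T)
u = rng.normal(scale=2.0, size=T)
y = 1.0 + 0.5 * x + u

# OLS estimates for the bivariate case.
x_bar, y_bar = x.mean(), y.mean()
beta_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
alpha_hat = y_bar - beta_hat * x_bar

# s^2: residual variance estimate with T - 2 degrees of freedom.
u_hat = y - alpha_hat - beta_hat * x
s2 = np.sum(u_hat ** 2) / (T - 2)

se_beta = np.sqrt(s2 / np.sum((x - x_bar) ** 2))                  # (4A.31)
se_alpha = np.sqrt(s2 * np.sum(x ** 2)
                   / (T * np.sum((x - x_bar) ** 2)))              # (4A.36)
print(f"SE(beta) = {se_beta:.4f}, SE(alpha) = {se_alpha:.4f}")
```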
5 Further issues in regression analysis

Learning outcomes
In this chapter, you will learn how to
● construct models with more than one explanatory variable;
● derive the OLS parameter and standard error estimators in the multiple regression context;
● determine how well the model fits the data;
● understand the principles of nested and non-nested models;
● test multiple hypotheses using an F-test;
● form restricted regressions; and
● test for omitted and redundant variables.

5.1 Generalising the simple model to multiple linear regression

Previously, a model of the following form has been used:
$$y_t = \alpha + \beta x_t + u_t, \qquad t = 1, 2, \ldots, T \qquad (5.1)$$

Equation (5.1) is a simple bivariate regression model. That is, changes in the dependent variable are explained by reference to changes in one single explanatory variable, $x$. What if the real estate theory or idea being tested suggests that the dependent variable is influenced by more than one independent variable, however? For example, simple estimation and tests of the capital asset pricing model can be conducted using an equation of the form of (5.1), but arbitrage pricing theory does not presuppose that there is only a single factor affecting stock returns. So, to give one illustration, REIT excess returns might be purported to depend on their sensitivity to unexpected changes in
(1) inflation;
(2) the differences in returns on short- and long-dated bonds;
(3) the dividend yield; or
(4) default risks.
Having just one independent variable would be no good in this case. It would, of course, be possible to use each of the four proposed explanatory factors in separate regressions. It is of greater interest, though, and it is also more valid, to have more than one explanatory variable in the regression equation at the same time, and therefore to examine the effect of all the explanatory variables together on the explained variable.

It is very easy to generalise the simple model to one with $k$ regressors (independent variables). Equation (5.1) becomes
$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \cdots + \beta_k x_{kt} + u_t, \qquad t = 1, 2, \ldots, T \qquad (5.2)$$

The variables $x_{2t}, x_{3t}, \ldots, x_{kt}$ are therefore a set of $k - 1$ explanatory variables that are thought to influence $y$, and the coefficient estimates $\beta_2, \beta_3, \ldots, \beta_k$ are the parameters that quantify the effect of each of these explanatory variables on $y$. The coefficient interpretations are slightly altered in the multiple regression context. Each coefficient is now known as a partial regression coefficient, interpreted as representing the partial effect of the given explanatory variable on the explained variable, after holding constant, or eliminating the effect of, all the other explanatory variables. For example, $\beta_2$ measures the effect of $x_2$ on $y$ after eliminating the effects of $x_3, x_4, \ldots, x_k$. Stating this in other words, each coefficient measures the average change in the dependent variable per unit change in a given independent variable, holding all other independent variables constant at their average values.

5.2 The constant term

In (5.2) above, astute readers will have noticed that the explanatory variables are numbered $x_2, x_3, \ldots$ – i.e. the list starts with $x_2$ and not $x_1$. So, where is $x_1$? In fact, it is the constant term, usually represented by a column of ones of length $T$:
$$x_1 = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} \qquad (5.3)$$

Thus there is a variable implicitly hiding next to $\beta_1$, which is a column vector of ones, the length of which is the number of observations in the sample. The $x_1$ in the regression equation is not usually written, in the same way that one unit of $p$ and two units of $q$ would be written as '$p + 2q$' and not '$1p + 2q$'. $\beta_1$ is the coefficient attached to the constant term (which was called $\alpha$ in the previous chapter). This coefficient can still be referred to as the intercept, which can be interpreted as the average value that $y$ would take if all the explanatory variables took a value of zero.

A tighter definition of $k$, the number of explanatory variables, is probably now necessary. Throughout this book, $k$ is defined as the number of 'explanatory variables' or 'regressors', including the constant term. This is equivalent to the number of parameters that are estimated in the regression equation. Strictly speaking, it is not sensible to call the constant an explanatory variable, since it does not explain anything and it always takes the same values. This definition of $k$ will be employed for notational convenience, however.

Equation (5.2) can be expressed even more compactly by writing it in matrix form:
$$y = X\beta + u \qquad (5.4)$$
where: $y$ is of dimension $T \times 1$; $X$ is of dimension $T \times k$; $\beta$ is of dimension $k \times 1$; and $u$ is of dimension $T \times 1$.

The difference between (5.2) and (5.4) is that all the time observations have been stacked up in a vector, and also that all the different explanatory variables have been squashed together so that there is a column for each in the $X$ matrix. Such a notation may seem unnecessarily complex but, in fact, the matrix notation is usually more compact and convenient. So, for example, if $k$ is two – i.e. there are two regressors, one of which is the constant term (equivalent to a simple bivariate regression $y_t = \alpha + \beta x_t + u_t$) – it is possible to write
$$\underset{T \times 1}{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{bmatrix}} = \underset{T \times 2}{\begin{bmatrix} 1 & x_{21} \\ 1 & x_{22} \\ \vdots & \vdots \\ 1 & x_{2T} \end{bmatrix}} \underset{2 \times 1}{\begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}} + \underset{T \times 1}{\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_T \end{bmatrix}} \qquad (5.5)$$
so that the $x_{ij}$ element of the matrix $X$ represents the $j$th time observation on the $i$th variable. Notice that the matrices written in this way are conformable – in other words, there is a valid matrix multiplication and addition on the RHS.¹

¹ The above presentation is the standard way to express matrices in the time series econometrics literature, although the ordering of the indices is different from that used in the mathematics of matrix algebra (as presented in chapter 2 of this book). In the latter case, $x_{ij}$ would represent the element in row $i$ and column $j$, although, in the notation used from this point of the book onwards, it is the other way around.
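To make the stacking in (5.4) and (5.5) concrete, the minimal sketch below builds the $X$ matrix for the two-regressor case by placing a column of ones (the constant term, $x_1$) next to a single regressor, and checks that the matrix product reproduces the scalar equation row by row. All numbers are invented purely for illustration.

```python
import numpy as np

# Stack y_t = beta_1 + beta_2 * x_{2t} + u_t into y = X beta + u, as in (5.5).
x2 = np.array([2.1, 1.8, 2.4, 2.0, 1.9])   # illustrative regressor values
T = len(x2)
X = np.column_stack([np.ones(T), x2])      # T x 2: column of ones, then x_2
beta = np.array([1.0, 0.5])                # [beta_1, beta_2], a 2 x 1 vector
u = np.zeros(T)                            # disturbances set to zero here
y = X @ beta + u                           # T x 1, so the system is conformable

# Row t of X @ beta equals beta_1 * 1 + beta_2 * x2[t], the RHS of (5.2):
assert np.allclose(y, 1.0 + 0.5 * x2)
```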
5.3 How are the parameters (the elements of the β vector) calculated in the generalised case?

Previously, the residual sum of squares, $\sum \hat{u}_t^2$, was minimised with respect to $\hat{\alpha}$ and $\hat{\beta}$. In the multiple regression context, in order to obtain estimates of the parameters, $\beta_1, \beta_2, \ldots, \beta_k$, the RSS would be minimised with respect to all the elements of $\hat{\beta}$. Now, the residuals can be stacked in a vector:
$$\hat{u} = \begin{bmatrix} \hat{u}_1 \\ \hat{u}_2 \\ \vdots \\ \hat{u}_T \end{bmatrix} \qquad (5.6)$$

The RSS is still the relevant loss function, and would be given in matrix notation by equation (5.7):
$$L = \hat{u}'\hat{u} = \begin{bmatrix} \hat{u}_1 & \hat{u}_2 & \cdots & \hat{u}_T \end{bmatrix}\begin{bmatrix} \hat{u}_1 \\ \hat{u}_2 \\ \vdots \\ \hat{u}_T \end{bmatrix} = \hat{u}_1^2 + \hat{u}_2^2 + \cdots + \hat{u}_T^2 = \sum \hat{u}_t^2 \qquad (5.7)$$

Using a similar procedure to that employed in the bivariate regression case – i.e. substituting into (5.7), and denoting the vector of estimated parameters as $\hat{\beta}$ – it can be shown (see the appendix to this chapter) that the coefficient estimates will be given by the elements of the expression
$$\hat{\beta} = \begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_k \end{bmatrix} = (X'X)^{-1}X'y \qquad (5.8)$$

If one were to check the dimensions of the RHS of (5.8), it would be observed to be $k \times 1$. This is as required, since there are $k$ parameters to be estimated by the formula for $\hat{\beta}$.
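The sketch below is a literal transcription of (5.8) applied to simulated data. It is illustrative rather than production code: an explicit inverse is used to mirror the formula, although a numerically stabler routine such as np.linalg.lstsq would normally be preferred, and all names and values are invented.

```python
import numpy as np

def ols(X, y):
    """OLS via equation (5.8): beta_hat = (X'X)^{-1} X'y."""
    beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y
    u_hat = y - X @ beta_hat          # residual vector, as in (5.6)
    rss = float(u_hat @ u_hat)        # loss function L = u'u of (5.7)
    return beta_hat, rss

# Simulated example with k = 3 regressors, including the constant term.
rng = np.random.default_rng(1)
T, k = 100, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
beta_true = np.array([1.0, 0.5, -0.3])
y = X @ beta_true + rng.normal(size=T)

beta_hat, rss = ols(X, y)
print(beta_hat.shape)   # (k,): k parameters, as the dimension check requires
```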