Real estate analysis: statistical tools

3.1.3 Panel data

Panel data have the dimensions of both time series and cross-sections – e.g. the monthly prices of a number of REITs in the United Kingdom, France and the Netherlands over two years. The estimation of panel regressions is an interesting and developing area, but will not be considered further in this text. Interested readers are directed to chapter 10 of Brooks (2008) and the references therein.

Fortunately, virtually all the standard techniques and analysis in econometrics are equally valid for time series and cross-sectional data. This book concentrates mainly on time series data and applications, however, since these are more prevalent in real estate. For time series data, it is usual to denote the individual observation numbers using the index t and the total number of observations available for analysis by T. For cross-sectional data, the individual observation numbers are indicated using the index i and the total number of observations available for analysis by N. Note that there is, in contrast to the time series case, no natural ordering of the observations in a cross-sectional sample. For example, the observations i might be on city office yields at a particular point in time, ordered alphabetically by city name. So, in the case of cross-sectional data, there is unlikely to be any useful information contained in the fact that Los Angeles follows London in a sample of city yields, since it is purely by chance that their names both begin with the letter ‘L’. On the other hand, in a time series context, the ordering of the data is relevant as the data are usually ordered chronologically. In this book, where the context is not specific to only one type of data or the other, the two types of notation (i and N or t and T) are used interchangeably.

3.1.4 Continuous and discrete data

As well as classifying data as being of the time series or cross-sectional type, we can also distinguish them as being either continuous or discrete, exactly as their labels would suggest. Continuous data can take on any value and are not confined to take specific numbers; their values are limited only by precision. For example, the initial yield on a real estate asset could be 6.2 per cent, 6.24 per cent, or 6.238 per cent, and so on. On the other hand, discrete data can take on only certain values, which are usually integers¹ (whole numbers), and are often defined to be count numbers – for instance, the number of people working in offices, or the number of industrial units transacted in the last quarter. In these cases, having 2,013.5 workers or 6.7 units traded would not make sense.

¹ Discretely measured data do not necessarily have to be integers. For example, until they became ‘decimalised’, many financial asset prices were quoted to the nearest 1/16th or 1/32nd of a dollar.

3.1.5 Cardinal, ordinal and nominal numbers

Another way in which we can classify numbers is according to whether they are cardinal, ordinal or nominal. This distinction is drawn in box 3.2.

Box 3.2 Cardinal, ordinal and nominal numbers

● Cardinal numbers are those for which the actual numerical values that a particular variable takes have meaning, and for which there is an equal distance between the numerical values.
● On the other hand, ordinal numbers can be interpreted only as providing a position or an ordering. Thus, for cardinal numbers, a figure of twelve implies a measure that is ‘twice as good’ as a figure of six. Examples of cardinal numbers would be the price of a REIT or of a building, and the number of houses in a street. On the other hand, for an ordinal scale, a figure of twelve may be viewed as ‘better’ than a figure of six, but could not be considered twice as good. Examples include the ranking of global office markets that real estate research firms may produce. Based on measures of liquidity, transparency, risk and other factors, a score is produced. Usually, in this scoring, an office centre ranking second in transparency cannot be said to be twice as transparent as the office market that ranks fourth.
● The final type of data that can be encountered would be when there is no natural ordering of the values at all, so a figure of twelve is simply different from that of a figure of six, but could not be considered to be better or worse in any sense. Such data often arise when numerical values are arbitrarily assigned, such as telephone numbers or when codings are assigned to qualitative data (e.g., when describing the use of space, ‘1’ might be used to denote offices, ‘2’ to denote retail and ‘3’ to denote industrial, and so on). Sometimes, such variables are called nominal variables.
● Cardinal, ordinal and nominal variables may require different modelling approaches or, at least, different treatments.

3.2 Descriptive statistics

When analysing a series containing many observations, it is useful to be able to describe the most important characteristics of the series using a small number of summary measures. This section discusses the quantities that are most commonly used to describe real estate and other series, which are known as summary statistics or descriptive statistics. Descriptive statistics are calculated from a sample of data rather than being assigned on the basis of theory. Before describing the most important summary statistics used in work with real estate data, we define the terms population and sample, which have precise meanings in statistics.

3.2.1 The population and the sample

The population is the total collection of all objects to be studied. For example, in the context of determining the relationship between risk and return for UK REITs, the population of interest would be all time series observations on all REIT stocks traded on the London Stock Exchange (LSE).

The population may be either finite or infinite, while a sample is a selection of just some items from the population. A population is finite if it contains a fixed number of elements. In general, either all the observations for the entire population will not be available, or they may be so many in number that it is infeasible to work with them, in which case a sample of data is taken for analysis. The sample is usually random, and it should be representative of the population of interest. A random sample is one in which each individual item in the population is equally likely to be drawn. A stratified sample is obtained when the population is split into layers or strata and the number of observations in each layer of the sample is set to try to match the corresponding number of elements in those layers of the population. The size of the sample is the number of observations that are available, or that the researcher decides to use, in estimating the parameters of the model.

3.2.2 Measures of central tendency

The average value of a series is sometimes known as its measure of location or measure of central tendency. The average value is usually thought to measure the ‘typical’ value of a series. There are a number of methods that can be used for calculating averages. The most well known of these is the arithmetic mean (usually just termed ‘the mean’), which is simply calculated as the sum of all values in the series divided by the number of values. The two other methods for calculating the average of a series are the mode and the median.
The mode measures the most frequently occurring value in a series, which is sometimes regarded as a more representative measure of the average than the arithmetic mean. Finally, the median is the middle value in a series when the elements are arranged in an ascending order.

For a symmetric distribution, the mean, mode and median will be coincident. For any non-symmetric distribution of points, however, the three summary measures will in general be different.

Each of these measures of average has its relative merits and demerits. The mean is the most familiar method to most researchers, but can be unduly affected by extreme values, and, in such cases, it may not be representative of most of the data. The mode is, arguably, the easiest to obtain, but it is not suitable for continuous, non-integer data (e.g. returns or yields) or for distributions that incorporate two or more peaks (known as bimodal and multimodal distributions, respectively). The median is often considered to be a useful representation of the ‘typical’ value of a series, but it has the drawback that its calculation is based essentially on one observation. Thus if, for example, we had a series containing ten observations and we were to double the values of the top three data points, the median would be unchanged.

The geometric mean

There exists another method that can be used to estimate the average of a series, known as the geometric mean. It involves calculating the Nth root of the product of N numbers. In other words, if we want to find the geometric mean of six numbers, we multiply them together and take the sixth root (i.e. raise the product to the power of 1/6th).

In real estate investment, we usually deal with returns or percentage changes rather than actual values, and the method for calculating the geometric mean just described cannot handle negative numbers. Therefore we use a slightly different approach in such cases.
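
Before turning to returns, the averages described so far can be illustrated with a minimal Python sketch. The ten observations and the six positive numbers below are invented purely for illustration, and the variable names are our own; the sketch uses only the standard library.

```python
import math
import statistics

# Hypothetical series of ten observations (e.g. units transacted per quarter).
values = [2, 3, 3, 4, 5, 6, 7, 8, 9, 13]

mean = statistics.mean(values)      # sum of the values divided by their number
median = statistics.median(values)  # middle value of the ordered series
mode = statistics.mode(values)      # most frequently occurring value

# Double the top three data points: the mean moves, the median does not.
ordered = sorted(values)
stretched = ordered[:-3] + [2 * v for v in ordered[-3:]]
mean_stretched = statistics.mean(stretched)
median_stretched = statistics.median(stretched)

# Geometric mean of six positive numbers: multiply them together and
# take the sixth root (raise the product to the power 1/6).
six_numbers = [1, 2, 4, 8, 16, 32]
geometric = math.prod(six_numbers) ** (1 / len(six_numbers))

print(f"mean={mean}, median={median}, mode={mode}")
print(f"after doubling the top three: mean={mean_stretched}, median={median_stretched}")
print(f"geometric mean of six numbers: {geometric:.4f}")
```

The doubling step mirrors the ten-observation example in the text: the extreme values pull the mean upwards while the median stays where it was.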
To calculate the geometric mean of a set of N returns, we express them as proportions (i.e. on a (−1,1) scale) rather than percentages (on a (−100,100) scale), and we would use the formula

$$R_G = [(1 + r_1)(1 + r_2) \cdots (1 + r_N)]^{1/N} - 1 \qquad (3.1)$$

where $r_1, r_2, \ldots, r_N$ are the returns and $R_G$ is the calculated value of the geometric mean. Hence, what we would do would be to add one to each return, multiply the resulting expressions together, raise this product to the power 1/N and then subtract one right at the end.

Which method for calculating the mean should we use, therefore? The answer is, as usual, ‘It depends.’ Geometric returns give the fixed return on the asset or portfolio that would have been required to match the actual performance, which is not the case for the arithmetic mean. Thus, if you assumed that the arithmetic mean return had been earned on the asset every year, you would not reach the correct value of the asset or portfolio at the end! It could be shown that the geometric return is always less than or equal to the arithmetic return, however, and so the geometric return is a downward-biased predictor of future performance. Hence, if the objective is to forecast future returns, the arithmetic mean is the one to use.

Finally, it is worth noting that the geometric mean is evidently less intuitive and less commonly used than the arithmetic mean, but it is less affected by extreme outliers than the latter. There is an approximate relationship that holds between the arithmetic and geometric means, calculated using the same set of returns:

$$R_G \approx R_A - \tfrac{1}{2}\sigma^2 \qquad (3.2)$$

where $R_G$ and $R_A$ are the geometric and arithmetic means, respectively, and $\sigma^2$ is the variance of the returns.
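
As a hedged illustration of equations (3.1) and (3.2), the following Python sketch computes both means for a short set of returns. The four annual returns and the starting value of 100 are hypothetical, and the variance is taken as the population variance of the same returns, which is an assumption made here for the approximation.

```python
import math
import statistics

# Hypothetical annual returns expressed as proportions (0.10 = 10 per cent).
returns = [0.10, -0.05, 0.20, 0.03]
n = len(returns)

# Arithmetic mean: sum of the returns divided by their number.
arithmetic = sum(returns) / n

# Equation (3.1): add one to each return, multiply, take the Nth root,
# then subtract one right at the end.
growth = math.prod(1 + r for r in returns)
geometric = growth ** (1 / n) - 1

# Value of an asset worth 100 at the start, under three assumptions.
start = 100.0
end_actual = start * growth                      # actual compounded path
end_geometric = start * (1 + geometric) ** n     # geometric mean every year
end_arithmetic = start * (1 + arithmetic) ** n   # arithmetic mean every year

# Equation (3.2): geometric mean is roughly the arithmetic mean minus
# half the variance of the returns (population variance assumed here).
variance = statistics.pvariance(returns)
approximation = arithmetic - 0.5 * variance

print(f"arithmetic mean: {arithmetic:.4%}")
print(f"geometric mean:  {geometric:.4%}")
print(f"approximation:   {approximation:.4%}")
print(f"end values: actual={end_actual:.2f}, "
      f"geometric={end_geometric:.2f}, arithmetic={end_arithmetic:.2f}")
```

Compounding the geometric mean reproduces the actual end value, whereas compounding the arithmetic mean overstates it, which is the point made in the text about matching actual performance.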

3.2.3 Measures of spread

Usually, the average value of a series will be insufficient to characterise a data series adequately, since two series may have the same average but very different profiles because the observations on one of the series may be much more widely spread about the mean than the other. Hence another important feature of a series is how dispersed its values are. In finance theory, for example, the more widely spread returns are around their mean value the more risky the asset is usually considered to be, and the same principle applies in real estate.

The simplest measure of spread is arguably the range, which is calculated by subtracting the smallest observation from the largest. While the range has some uses, it is fatally flawed as a measure of dispersion by its extreme sensitivity to an outlying observation.

A more reliable measure of spread, although it is not widely employed by quantitative analysts, is the semi-interquartile range, also sometimes known as the quartile deviation. Calculating this measure involves first ordering the data and then splitting the sample into four parts (quartiles)² with equal numbers of observations. The second quartile will be exactly at the halfway point, and is known as the median, as described above. The semi-interquartile range focuses on the first and third quartiles, however, which will be at the quarter and three-quarter points in the ordered series, and which can be calculated respectively by the following:

$$Q_1 = \left(\frac{N + 1}{4}\right)\text{th value} \qquad (3.3)$$

and

$$Q_3 = \frac{3}{4}(N + 1)\text{th value} \qquad (3.4)$$

The interquartile range is then given by the difference between the two:

$$IQR = Q_3 - Q_1 \qquad (3.5)$$

and the semi-interquartile range is half of this difference.

² Note that there are several slightly different formulae that can be used for calculating quartiles, each of which may provide slightly different answers.
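
The range and the quartile-based measures can be sketched in Python as follows. The eleven yields are hypothetical, and the helper functions `value_at` and `quartiles` are our own illustrative names. The quartile positions follow the (N + 1)/4 and 3(N + 1)/4 convention, with linear interpolation between neighbouring observations assumed when a position is not a whole number.

```python
def value_at(ordered, position):
    """Return the value at a 1-based position in an ordered list,
    interpolating linearly when the position is not a whole number."""
    lower = int(position)
    fraction = position - lower
    if lower < 1:
        return ordered[0]
    if lower >= len(ordered):
        return ordered[-1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


def quartiles(data):
    """First and third quartiles at the (N + 1)/4 and 3(N + 1)/4 positions."""
    ordered = sorted(data)
    n = len(ordered)
    q1 = value_at(ordered, (n + 1) / 4)
    q3 = value_at(ordered, 3 * (n + 1) / 4)
    return q1, q3


# Hypothetical office yields (per cent) for eleven cities.
yields = [4.8, 5.1, 5.5, 5.9, 6.2, 6.4, 6.8, 7.0, 7.5, 8.1, 12.0]

data_range = max(yields) - min(yields)  # largest minus smallest observation
q1, q3 = quartiles(yields)
iqr = q3 - q1                           # difference between the quartiles
semi_iqr = iqr / 2                      # quartile deviation

# Push the largest observation further out: the range reacts, the quartiles do not.
with_outlier = yields[:-1] + [20.0]
range_outlier = max(with_outlier) - min(with_outlier)
q1_outlier, q3_outlier = quartiles(with_outlier)

print(f"range={data_range:.1f}, Q1={q1}, Q3={q3}, IQR={iqr}, semi-IQR={semi_iqr}")
print(f"with outlier: range={range_outlier:.1f}, Q1={q1_outlier}, Q3={q3_outlier}")
```

With eleven observations the quartile positions are whole numbers (the third and ninth values), so no interpolation is needed in this example. Other quartile conventions may give slightly different answers for other sample sizes.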