
Chapter 6
Examples

This chapter presents some examples of novel applications of niche modeling, selected primarily to illustrate important counterexamples. Most competent niche model studies model the response of the species as an ‘inverted U’ rather than as a linear response. These examples illustrate some more unusual responses and uses of response functions.

First, we discuss a statistic for evaluating the skill of a niche model based on the idea of a ‘draw’ from a population. The examples follow, illustrating the steps in developing a niche model: the use of presence-only or presence/absence data, the use of a mask, and the selection of variables. There are no hard and fast rules about these alternatives, but they do influence results. The examples are:

• house price increase,
• Brown Treesnake threat, and
• zebra mussel spread.

House price increase is obviously not a biological subject, but it is included to illustrate the generality of the approach.

6.0.1 Model skill

Accuracy is the observed proportion of predictions that are correct. Perfect accuracy would look like Table 6.1. However, accuracy is only meaningful relative to the result expected if the predictions were made at random. For example, if 100% of test items were labeled P, then achieving perfect accuracy would require no skill.

There are many ways of determining the skill of models. In predicting presence and absence, statistics such as the Receiver Operating Characteristic (ROC) [HM82] are effective and popular. Considering the specific case of randomly selected presence and absence points, the area under the ROC curve measures the probability that the model correctly distinguishes a randomly selected presence point (P) from a randomly selected absence point (A).

© 2007 by Taylor and Francis Group, LLC

TABLE 6.1: Example of a contingency table with a perfect score.

        P       A
P       1.00    0.00
A       0.00    1.00

In presence/absence analysis (PA) the absence points are explicitly specified, but often absence points are not available.
In presence-only analysis (P), the absence points are generated in some way, typically by sampling all points (‘background’) at random. When only presence points are available, P analysis must be used with such statistics as the ROC.

A review of the ways of quantifying the skill of a model is not attempted here; instead, an alternative way of viewing the problem is presented. In this view, the survey of the species can be seen as a draw from a population, where the population is the distribution of the species in the environment. Seen in this way, appropriate measures of model skill compare the observed distribution with the distribution that would be expected if the draw were random.

For example, consider an environmental variable not related to the distribution of a species. The distribution of environmental values at the points where the species occurs should be identical to the distribution of environmental values in the whole region. But if the observed distribution of values differs significantly from the expected distribution, there is a basis for suspecting that the occurrence of the species is related to that variable. There are three possibilities:

• insignificant variables
• significant variables
• most significant variables

Provided the set of observations of the species is large enough, and the set of variables includes those potentially causing the observed patterns in the species, it may be inferred that the variables with the most significant differences are causing the distribution. These will also be the most skillful at estimating the probability of finding the species in new areas.

Thus in this view, points of presence and points of absence are not complements, as is usually assumed. Here, absence points are viewed as a separate draw from a population, and potentially identifiable as a separate entity.
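The comparison described in this section can be illustrated with a minimal sketch. It assumes the environmental variable has already been reduced to a small set of categories, and the data, category names and function name are hypothetical, chosen only for illustration. The sketch tabulates the proportion of each category in the whole region (the background) and at the points where the species was found (the draw), so that the two distributions can be compared.

```python
from collections import Counter


def category_proportions(values, categories):
    """Return the proportion of `values` that falls in each category."""
    counts = Counter(values)
    total = len(values)
    return {c: counts.get(c, 0) / total for c in categories}


# Hypothetical data: each entry is the environmental category observed at
# one location.
background = ["low"] * 50 + ["mid"] * 30 + ["high"] * 20  # whole region
presence = ["low"] * 2 + ["mid"] * 3 + ["high"] * 15      # species found here

categories = ["low", "mid", "high"]
B = category_proportions(background, categories)
P = category_proportions(presence, categories)

# A variable unrelated to the species would give differences near zero.
for c in categories:
    print(f"{c:>4}: background {B[c]:.2f}  presence {P[c]:.2f}  "
          f"difference {P[c] - B[c]:+.2f}")
```

In this made-up case the presence points are concentrated in the "high" category far more than the background is, which is the kind of difference that would suggest the variable is related to the occurrence of the species.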
6.0.2 Calculating accuracy

Where categories of environmental values are coded as colors in an image, the set of presence points p and the set of background points b used for determining model skill have probabilities as follows. B_i is the proportion of the background points b that have a specific color i. P_i is the proportion of the presence points p that have color i.

The Chi Square test for differences between distributions, applied over the n environmental categories, is one way to quantify model skill:

\[ \chi^2 = \sum_{i=1}^{n} \frac{(pP_i - bB_i)^2}{bB_i} \]

The expected accuracy Pr is a simple and effective indicator of skill. To calculate Pr, sum the maximum of the presence and background proportions over each environmental category, and divide by two:

\[ Pr = \frac{1}{2} \sum_{i=1}^{n} \max(P_i, B_i) \]

This analog of accuracy is derived from the expected accuracy of a decision rule in which the species is predicted present in category i if P_i > B_i, and absent otherwise. Pr has an expected value of 0.5 when there is no relationship between the variable and the draw, and approaches the maximum value of one when all the drawn points fall into a single category among a large number of environmental categories.

6.1 Predicting house prices

Predicting prices of real estate is similar to predicting species distributions. Niche models can be developed if locations such as cities and their house prices or price increases are correlated with environmental variables. Here we model and predict the increase in house prices in metro areas of the United States in 2004.

The National Association of Realtors publishes housing statistics for metropolitan areas, as an Excel spreadsheet of percent change in the median price of houses for past years and the last four quarters.

Metropolitan_Area, 2002, 2003, 2004, 2004:III, 2004:IV, 2005:I, 2005:II, 2005:III
"Allentown-Bethlehem-Easton, PA-NJ", 161.1, 184.7, 207.3, 222.6, 210.5, 214.8, 242.7, N/A, N/A
...
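Before turning to the coordinates, the expected accuracy Pr of Section 6.0.2 and its decision rule can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and the proportions are hypothetical, and P and B are assumed to be lists of proportions over the same categories, each summing to one.

```python
def expected_accuracy(P, B):
    """Pr = (1/2) * sum over categories of max(P_i, B_i)."""
    if len(P) != len(B):
        raise ValueError("P and B must cover the same categories")
    return 0.5 * sum(max(p_i, b_i) for p_i, b_i in zip(P, B))


def predict_present(P, B):
    """Decision rule: predict presence in category i when P_i > B_i."""
    return [p_i > b_i for p_i, b_i in zip(P, B)]


# Hypothetical proportions over four environmental categories.
B = [0.25, 0.25, 0.25, 0.25]            # background
P_unrelated = [0.25, 0.25, 0.25, 0.25]  # draw matches the background
P_related = [0.70, 0.20, 0.10, 0.00]    # draw concentrated in one category

pr_unrelated = expected_accuracy(P_unrelated, B)
pr_related = expected_accuracy(P_related, B)
rule = predict_present(P_related, B)

print(f"Pr, unrelated variable: {pr_unrelated:.3f}")
print(f"Pr, related variable:   {pr_related:.3f}")
print(f"Predicted present by category: {rule}")
```

When the draw matches the background, the sum of the maxima is one and Pr is 0.5, in line with the expected value stated in Section 6.0.2. As the draw concentrates in fewer categories, Pr rises toward one.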
To perform the analysis we need the decimal coordinates of each metropolitan area. For example:

-117.425 47.65888889
-74.42333333 39.36416667

A free server is available at http://geocoder.us for obtaining these coordinates. This server returns the latitude and longitude of any US address. Here is an example of a query for Phoenix, AZ, where the price of houses increased 55% in 2004:

POST: http://geocoder.us/service/csv/geocode?city=Phoenix&state=AZ
RETURNS: -112.0733333,33.44833333,Maricopa, Phoenix, AZ

The next step is to extract the coordinates of metro areas showing a median price increase greater than the value of interest. In this example, there are 24 metro areas with median house price increases greater than 20%. These points are pasted into the prediction application.

All the data were used in the prediction, and none were held back for validation. With this small number of points, subsetting the points would have produced substantial variation in the variables selected. Had I been interested in more rigorous tests of statistical skill, I would have repeated the analysis a number of times, estimating accuracy on a ‘held back’ set of points.

6.1.1 Analysis

The following are the results of predicting the distribution of metro areas with increases greater than 20% in 2004. The analysis involved the choices listed below; some later experiments show the consequences of these choices.

• dataset size - all terrestrial vs only climate variables
• data type - annual climate vs monthly and other climate
• mask - the areas of ocean are not included in the analysis
• P analysis - the distribution of the P draw is compared with the B population.

FIGURE 6.1: Predicted price increases >20% using altitude 2.5 minute variable selected by WhyWhere from the dataset of 528 All Terrestrial variables.

These were done using WhyWhere with the All Terrestrial dataset, consisting of 528 terrestrial variables. The analysis used the defaults associated with this software.
The results were as follows:

Environmental Data from All Terrestrial (528)
alt_2.5m: altitude 2.5 minutes resolution
Accuracy 0.805

In the predicted distribution shown in Figure 6.1, the lighter the area, the higher the predicted probability. At this fine scale the points of the metro areas are hard to see, but they will be more apparent in later images.

For comparison, here are the results of a run on a smaller dataset of climate variables called ClimateAnnAve, consisting of annual temperature, precipitation and standard deviations. The accuracies and maps are as follows:

Environmental Data for .0 from ClimateAnnAv
0. lwcpr00 Legates Willmott Annual Corrected Precipitation (mm/year)
Range 0 to 6626 millimeters/year
Accuracy 0.787

The results show annual precipitation is the best predictor (Table 6.2). Figure 6.2 shows the predictions resulting from the model as derived from ...