9 Spatial Cluster Analysis, Spatial Regression, and Applications in Toponymical, Cancer, and Homicide Studies

Spatial cluster analysis detects unusual concentrations or nonrandomness of events in space and time. Nonrandomness of events indicates the existence of spatial autocorrelation, and thus necessitates the use of spatial regression in regression analysis of those events. Although these issues were raised several decades ago, applications of spatial cluster analysis and spatial regression were initially limited by their intensive computational requirements. Recent advances in software development, including the availability of many free packages, have stimulated greater interest and wider application. This chapter discusses spatial cluster analysis and spatial regression, and introduces related spatial analysis packages that implement some of the methods.

Two application fields use spatial cluster analysis extensively. In crime studies, it is often referred to as hot-spot analysis. Concentrations of criminal activities, or hot spots, in certain areas may be caused by (1) particular activities, such as drug trading (e.g., Weisburd and Green, 1995); (2) specific land uses, such as skid row areas and bars; or (3) interaction between activities and land uses, such as thefts at bus stops and transit stations (e.g., Block and Block, 1995). Identifying hot spots helps police and crime prevention units target their efforts on limited areas.

Health-related research is another field that makes wide use of spatial cluster analysis. Does the disease exhibit any spatial clustering pattern? What areas experience a high or low prevalence of disease? Elevated disease rates in some areas may arise by chance alone or may be of no public health significance. The pattern generally warrants study only when it is statistically significant (Jacquez, 1998). Spatial cluster analysis is an essential and effective first step in any exploratory investigation.
If spatial cluster patterns of a disease do exist, case-control, retrospective cohort, and other observational studies can follow up.

Rigorous statistical procedures for cluster analysis may be divided into point-based and area-based methods. Point-based methods require exact locations of individual occurrences, whereas area-based methods use aggregated disease rates in regions. Data availability dictates which methods are used. The common belief that point-based methods are better than area-based methods is not well grounded (Oden et al., 1996).

In this chapter, Section 9.1 discusses point-based spatial cluster analysis, followed by a case study of Tai place-names (or toponymical study) in southern China using the software SaTScan in Section 9.2. Section 9.3 covers area-based spatial cluster analysis, followed by a case study of cancer patterns in Illinois in Section 9.4. Area-based spatial cluster analysis is implemented with spatial statistics now available in ArcGIS. Other software, such as CrimeStat (Levine, 2002), provides similar functions. In addition, Section 9.5 introduces spatial regression, and Section 9.6 uses the package GeoDa to illustrate some of the methods in a case study of homicide patterns in Chicago. The chapter concludes with a brief summary in Section 9.7. Unlike ArcGIS, both SaTScan and GeoDa are free software for researchers. There is a wide range of methods for spatial cluster analysis and regression, and this chapter introduces only some exemplary ones, i.e., those most widely used and implemented in the aforementioned packages.

Quantitative Methods and Applications in GIS. © 2006 by Taylor & Francis Group, LLC

9.1 POINT-BASED SPATIAL CLUSTER ANALYSIS

The methods for point-based spatial cluster analysis can be grouped into two categories: tests for global clustering and tests for local clusters.
9.1.1 POINT-BASED TESTS FOR GLOBAL CLUSTERING

Tests for global clustering investigate whether there is clustering throughout the study region. The test by Whittemore et al. (1987) computes the average distance between all cases and the average distance between all individuals (including both cases and controls). Cases represent individuals with the disease (or the events in general) being studied, and controls represent individuals without the disease (or the nonevents in general). If the former is lower than the latter, it indicates clustering. The method works well when cases are abundant in the central part of the study area, but poorly when cases are prevalent in peripheral areas (Kulldorff, 1998, p. 53). The method by Cuzick and Edwards (1990) examines the k nearest neighbors of each case and tests whether there are more cases (not controls) among them than would be expected under the null hypothesis of a purely random configuration. Other tests for global clustering include Diggle and Chetwynd (1991), Grimson and Rose (1991), and others.

9.1.2 POINT-BASED TESTS FOR LOCAL CLUSTERS

For most applications, it is also important to identify cluster locations or local clusters. Even when a global clustering test does not reveal overall clustering in a study region, some places may still exhibit local clusters. The geographical analysis machine (GAM) developed by Openshaw et al. (1987) first generates grid points in a study region, then draws circles of various radii around each grid point, and finally searches for circles containing a significantly high prevalence of cases. One shortcoming of the GAM method is that it tends to generate a high percentage of false positive circles (Fotheringham and Zhan, 1996).
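The GAM search just described can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure of Openshaw et al. (1987): the population is given as point locations that include the cases, the expected count in a circle is the overall case rate times the population inside it, and significance is judged by a one-sided Poisson tail probability. The function names, the grid construction, and the threshold `alpha` are hypothetical choices made for the example.

```python
import math

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed directly from the pmf."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)        # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i          # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def gam_scan(cases, population, grid_step, radii, alpha=0.01):
    """Flag circles whose case count is significantly high.

    cases, population: lists of (x, y); population includes the cases.
    Returns a list of (center, radius, n_cases, p_value) tuples.
    """
    rate = len(cases) / len(population)          # overall case rate
    xs = [x for x, _ in population]
    ys = [y for _, y in population]
    nx = int((max(xs) - min(xs)) / grid_step) + 1
    ny = int((max(ys) - min(ys)) / grid_step) + 1
    flagged = []
    for ix in range(nx):
        for iy in range(ny):
            center = (min(xs) + ix * grid_step, min(ys) + iy * grid_step)
            for r in radii:
                n_pop = sum(math.dist(p, center) <= r for p in population)
                n_cases = sum(math.dist(c, center) <= r for c in cases)
                if n_pop == 0:
                    continue                     # empty circle, nothing to test
                p_value = poisson_upper_tail(n_cases, rate * n_pop)
                if p_value < alpha:
                    flagged.append((center, r, n_cases, p_value))
    return flagged
```

Run on a small cluster of cases, a sketch like this typically flags several neighboring circles that all contain the same few cases, because each circle is tested separately.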
Since many significant circles overlap and contain the same cluster of cases, the Poisson tests that determine each circle's significance are not independent, which leads to the problem of multiple testing.

The test by Besag and Newell (1991) searches for clusters only around cases. Say k is the minimum number of cases needed to constitute a cluster. The method identifies the areas that contain the k - 1 nearest cases (excluding the centroid case), then analyzes whether the total number of cases in these areas is large relative to the total risk population. Common values for k are between 3 and 6 and may be chosen based on sensitivity analysis using different k values. As in the GAM, clusters identified by Besag and Newell's test often appear as overlapping circles. But the method is less likely to identify false positive circles than the GAM, and is also less computationally intensive (Cromley and McLafferty, 2002, p. 153). Other point-based spatial cluster analysis methods not reviewed here include Rushton and Lolonis (1996) and others.

The following discusses the spatial scan statistic by Kulldorff (1997), implemented in SaTScan. SaTScan is a free software program developed by Kulldorff and Information Management Services, available at http://www.satscan.org. Its main use is to evaluate reported spatial or space-time disease clusters and to see whether they are statistically significant. Like the GAM, the spatial scan statistic uses a circular scan window to search the entire study region, but it takes the problem of multiple testing into account. The radius of the window varies continuously, from zero up to the size at which the window contains 50% of the total population at risk. For each circle, the method computes the likelihood that the risk of disease is higher inside the window than outside the window.
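The circular scan described above can be sketched in Python. This is a minimal illustration under stated assumptions, not SaTScan's implementation: windows are centered on the data points themselves, grow one nearest neighbor at a time until they would hold more than half of all individuals, and are scored with the log-likelihood ratio of the Bernoulli model in Equation 9.1, where `m` and `M` count all individuals (cases plus controls) in the window and in the region. Significance comes from randomly permuting the case labels. The function names, the handling of distance ties, and the 99 replications are hypothetical choices for the example.

```python
import math
import random

def xlogy(x, y):
    """Return x * log(y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def bernoulli_llr(n, m, N, M):
    """Log-likelihood ratio of a window with n cases among m individuals,
    given N cases among M individuals in the whole region."""
    p = n / m                    # case share inside the window
    q = (N - n) / (M - m)        # case share outside the window
    if p <= q:                   # only windows with elevated risk are scored
        return 0.0
    alt = (xlogy(n, p) + xlogy(m - n, 1 - p)
           + xlogy(N - n, q) + xlogy((M - m) - (N - n), 1 - q))
    null = xlogy(N, N / M) + xlogy(M - N, 1 - N / M)
    return alt - null

def neighbor_orders(points):
    """For each point, the indices of all points sorted by distance to it."""
    return [sorted(range(len(points)), key=lambda i: math.dist(points[i], c))
            for c in points]

def most_likely_cluster(points, labels, orders, max_share=0.5):
    """Return (llr, center, radius) of the best-scoring circular window."""
    M, N = len(points), sum(labels)
    best = (0.0, None, 0.0)
    for center, order in zip(points, orders):
        n = 0
        for m, i in enumerate(order, start=1):
            if m > max_share * M:        # at most half of all individuals
                break
            n += labels[i]               # labels: 1 = case, 0 = control
            llr = bernoulli_llr(n, m, N, M)
            if llr > best[0]:
                best = (llr, center, math.dist(points[i], center))
    return best

def scan(points, labels, n_sim=99, seed=0):
    """Most likely cluster and its Monte Carlo p value."""
    rng = random.Random(seed)
    orders = neighbor_orders(points)
    observed = most_likely_cluster(points, labels, orders)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)            # random relabeling under the null
        if most_likely_cluster(points, shuffled, orders)[0] >= observed[0]:
            hits += 1
    return observed, (hits + 1) / (n_sim + 1)
```

Because only the single maximum over all windows is compared with its simulated distribution, one test is performed for the whole region, which is how this approach sidesteps the multiple-testing problem of the GAM.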
The spatial scan statistic uses either a Poisson-based model or a Bernoulli model to assess statistical significance. When the risk (base) population is available as aggregated area data, the Poisson-based model is used, and it requires case and population counts by areal units and the geographic coordinates of the points. When binary event data for case-control studies are available, the Bernoulli model is used, and it requires the geographic coordinates of all individuals. The cases are coded as ones and controls as zeros.

For instance, under the Bernoulli model, the likelihood function for a specific window z is

$$L(z, p, q) = p^{n} (1 - p)^{m - n} \, q^{N - n} (1 - q)^{(M - m) - (N - n)} \qquad (9.1)$$

where N is the total number of cases in the study region, n is the number of cases in the window, M is the total number of individuals (cases and controls) in the study region, m is the number of individuals (cases and controls) in the window, p = n / m (probability of being a case within the window), and q = (N - n) / (M - m) (probability of being a case outside the window).

The likelihood function is maximized over all windows, and the "most likely" cluster is the one that is least likely to have occurred by chance. The likelihood ratio for the window is reported and constitutes the maximum likelihood ratio test statistic. Its distribution under the null hypothesis and its corresponding p value are determined by a Monte Carlo simulation approach. The method also detects secondary clusters, i.e., windows with the highest likelihood among those that do not overlap with the most likely cluster or other secondary clusters.

9.2 CASE STUDY 9A: SPATIAL CLUSTER ANALYSIS OF TAI PLACE-NAMES IN SOUTHERN CHINA

This project extends the toponymical study of Tai place-names in southern China, introduced in Sections 3.2 and 3.4, which focus on mapping the spatial patterns based on spatial smoothing and interpolation techniques.
Mapping is merely descriptive and cannot identify whether the concentrations of Tai place-names in some areas are random. The answer relies on rigorous statistical analysis, in this case, point-based spatial cluster analysis. The software SaTScan (the current version is 5.1) is used to implement the study.

The project uses the same datasets as in case studies 3A and 3B: mainly, the point coverage qztai with the item TAI identifying whether a place-name is Tai (= 1) or non-Tai (= 0). In addition, the shapefile qzcnty is provided for mapping the background.

1. Preparing data in ArcGIS for SaTScan: Implementing the Bernoulli model for point-based spatial cluster analysis in SaTScan requires three data files: a case file (containing location ID and number of cases in each location), a control file (containing location ID and number of controls in each location), and a coordinates file (containing location ID and Cartesian coordinates or latitude and longitude). The three files can be read by SaTScan through its Import Wizard. In the attribute table of qztai, the item TAI already defines the case number (= 1) for each location, and thus the case file. To define the control file, open the attribute table of qztai in ArcGIS, add a new field NONTAI, and calculate it as NONTAI=1-TAI. To define the coordinates file, use ArcToolbox > Coverage Tools > Data Management > Tables > Add XY Coordinates to add X-COORD and Y-COORD. Export the attribute table to a dBase file qztai.dbf.

2. Executing spatial cluster analysis in SaTScan: Activate SaTScan and choose Create New Session. A New Session dialog window is shown in Figure 9.1. Under the first tab, Input, use the Import Wizard to define the case file: click the button next to Case File > choose qztai.dbf as the input file > in the SaTScan Input Wizard dialog, choose qztai-id under Source File Variable for Location ID, and similarly TAI for Number of Cases. Define the Control File and the Coordinates File similarly.
Under the second tab, Analysis, click Purely Spatial under Type of Analysis, Bernoulli under Probability Model, and High Rates under "Scan for Areas with." Under the third tab, Output, input Taicluster as the Results File and check all four boxes under dBase. Finally, choose Execute (Ctrl+E) under the main menu Session to run the program. Results are saved in various dBase files sharing the file name Taicluster, where the field CLUSTER identifies whether a place is included in a cluster (= 1 for the primary cluster, = 2 for the secondary cluster, = for those not included in a cluster).

FIGURE 9.1 SaTScan dialog for point-based spatial cluster analysis.

FIGURE 9.2 Spatial clusters of Tai place-names in southern China. (Map legend: Tai place-names shown as non-cluster, cluster 1, and cluster 2; county boundaries; scale bar in kilometers.)

3. Mapping spatial cluster analysis results: In ArcGIS, join the dBase file Taicluster.gis.dbf to the attribute table of qztai using the common key (LOC_ID in Taicluster.gis.dbf and qztai-id in qztai). Figure 9.2 uses different symbols to highlight the places that are included in the primary and secondary clusters. The two circles are drawn by hand to show the approximate extents of the clusters. The spatial cluster analysis confirms that the major concentration of Tai place-names is in the west of Qinzhou, and a minor concentration is in the middle.
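The data preparation in step 1 can also be scripted outside ArcGIS. The sketch below is a minimal illustration, assuming the attribute table of qztai has already been exported to a list of row dictionaries (for example via `csv.DictReader`) rather than to dBase; the column names `QZTAI_ID`, `X_COORD`, and `Y_COORD`, the space-delimited record layout with the location ID first, and the file extensions are assumptions made for the example and should be checked against the SaTScan documentation. Only the derivation NONTAI = 1 - TAI comes directly from the case study.

```python
from pathlib import Path

def satscan_records(rows):
    """Build case, control, and coordinates records from attribute rows.

    rows: iterable of dicts with keys QZTAI_ID, TAI, X_COORD, Y_COORD
    (hypothetical column names). Returns a dict of lists of text lines.
    """
    cases, controls, coords = [], [], []
    for row in rows:
        loc = row["QZTAI_ID"]
        tai = int(row["TAI"])            # 1 = Tai place-name, 0 = non-Tai
        nontai = 1 - tai                 # NONTAI = 1 - TAI, as in step 1
        cases.append(f"{loc} {tai}")
        controls.append(f"{loc} {nontai}")
        coords.append(f"{loc} {row['X_COORD']} {row['Y_COORD']}")
    return {"cas": cases, "ctl": controls, "geo": coords}

def write_satscan_inputs(rows, out_dir, stem="qztai"):
    """Write the three record lists to <stem>.cas, <stem>.ctl, <stem>.geo."""
    paths = {}
    for ext, lines in satscan_records(rows).items():
        path = Path(out_dir) / f"{stem}.{ext}"
        path.write_text("\n".join(lines) + "\n")
        paths[ext] = path
    return paths
```

Keeping the record construction separate from the file writing makes it easy to inspect the three record lists before anything is passed to SaTScan.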