
2 MRF-Based Remote-Sensing Image Classification with Automatic Model Parameter Estimation

Sebastiano B. Serpico and Gabriele Moser

© 2008 by Taylor & Francis Group, LLC

CONTENTS
2.1 Introduction
2.2 Previous Work on MRF Parameter Estimation
2.3 Supervised MRF-Based Classification
2.3.1 MRF Models for Image Classification
2.3.2 Energy Functions
2.3.3 Operational Setting of the Proposed Method
2.3.4 The Proposed MRF Parameter Optimization Method
2.4 Experimental Results
2.4.1 Experiment I: Spatial MRF Model for Single-Date Image Classification
2.4.2 Experiment II: Spatio-Temporal MRF Model for Two-Date Multi-Temporal Image Classification
2.4.3 Experiment III: Spatio-Temporal MRF Model for Multi-Temporal Classification of Image Sequences
2.5 Conclusions
Acknowledgments
References
2.1 Introduction

Within remote-sensing image analysis, Markov random field (MRF) models represent a powerful tool [1], due to their ability to integrate contextual information associated with the image data in the analysis process [2,3,4]. In particular, the use of a global model for the statistical dependence of all the image pixels in a given image-analysis scheme typically turns out to be an intractable task. The MRF approach offers a solution to this issue, as it allows expressing a global model of the contextual information by using only local relations among neighboring pixels [2]. Specifically, due to the Hammersley–Clifford theorem [3], a large class of global contextual models (i.e., the Gibbs random fields, GRFs [2]) can be proved to be equivalent to local MRFs, thus sharply reducing the related model complexity. In particular, MRFs have been used in remote-sensing image analysis for single-date [5], multi-temporal [6–8], multi-source [9], and multi-resolution [10] classification, for denoising [1], segmentation [11–14], anomaly detection [15], texture extraction [2,13,16], and change detection [17–19]. Focusing on the specific problem of image classification, the MRF approach allows one to express a "maximum a posteriori" (MAP) decision task as the minimization of a suitable energy function.
Several techniques have been proposed to deal with this minimization problem, such as simulated annealing (SA), an iterative stochastic optimization algorithm converging to a global minimum of the energy [3] but typically involving long execution times [2,5]; the iterated conditional modes (ICM), an iterative deterministic algorithm converging to a local (but usually good) minimum point [20] and requiring much shorter computation times than SA [2]; and the maximization of posterior marginals (MPM), which approximates the MAP rule by maximizing the marginal posterior distribution of the class label of each pixel instead of the joint posterior distribution of all image labels [2,11,21].

However, an MRF model usually involves the use of one or more internal parameters, thus requiring a preliminary parameter-setting stage before the application of the model itself. In particular, especially in the context of supervised classification, interactive "trial-and-error" procedures are typically employed to choose suitable values for the model parameters [2,5,9,11,17,19], while the problem of fast automatic parameter setting for MRF classifiers is still an open issue in the MRF literature. The lack of effective automatic parameter-setting techniques has represented a significant limitation on the operational use of MRF-based supervised classification architectures, although such methodologies are known for their ability to generate accurate classification maps [9]. On the other hand, the availability of unsupervised parameter-setting procedures (reviewed in Section 2.2) has contributed to an extensive use of MRFs for segmentation and unsupervised classification purposes [22,23]. In the present chapter, an automatic parameter optimization algorithm is proposed to overcome the above limitation in the context of supervised image classification.
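Of the energy-minimization schemes listed above, ICM is the simplest to illustrate. The following is a minimal sketch on a two-class, one-band toy image; the Gaussian data term, the Potts-style smoothness term, and all numeric values (class means, standard deviation, the weight `BETA`) are illustrative assumptions of ours, not the chapter's actual model.

```python
import math

# Illustrative two-class setup: Gaussian class-conditional means and a
# Potts-style smoothness weight (all values are assumptions for this sketch).
MEANS, STD, BETA = [0.0, 2.0], 0.5, 1.0

def local_energy(img, lab, r, c, s):
    """Energy of assigning class s to pixel (r, c): data term + prior term."""
    data = 0.5 * ((img[r][c] - MEANS[s]) / STD) ** 2   # Gaussian minus-log-likelihood
    nbrs = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]  # first-order neighborhood
    prior = sum(BETA for rr, cc in nbrs
                if 0 <= rr < len(img) and 0 <= cc < len(img[0])
                and lab[rr][cc] != s)                  # penalize label disagreement
    return data + prior

def icm(img, n_sweeps=5):
    """Iterated conditional modes: deterministic local energy minimization."""
    # Initialize with the noncontextual maximum-likelihood labels.
    lab = [[min((0, 1), key=lambda s: 0.5 * ((x - MEANS[s]) / STD) ** 2)
            for x in row] for row in img]
    for _ in range(n_sweeps):
        for r in range(len(img)):
            for c in range(len(img[0])):
                lab[r][c] = min((0, 1),
                                key=lambda s: local_energy(img, lab, r, c, s))
    return lab

img = [[0.1, 0.3, 2.2], [0.2, 1.1, 1.8], [0.0, 2.1, 2.0]]
print(icm(img))
```

Each sweep visits every pixel and greedily replaces its label by the one minimizing the local energy given the current neighboring labels, which is why ICM converges quickly but only to a local minimum.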
The method refers to a broad category of MRF models, characterized by energy functions expressed as linear combinations of different energy contributions (e.g., representing different typologies of contextual information) [2,7]. The algorithm exploits this linear dependence to formalize the parameter-setting problem as the solution of a set of linear inequalities, and addresses this problem by extending to the present context the Ho–Kashyap method for linear classifier training [24,25]. The well-known convergence properties of such a method and the absence of parameters to be tuned are among the good features of the proposed technique.

The chapter is organized as follows. Section 2.2 provides an overview of the previous work on the problem of MRF parameter estimation. Section 2.3 describes the methodological issues of the method, and Section 2.4 presents the results of the application of the technique to the classification of real (both single-date and multi-temporal) remote-sensing images. Finally, conclusions are drawn in Section 2.5.

2.2 Previous Work on MRF Parameter Estimation

Several parameter-setting algorithms have been proposed in the context of MRF models for image segmentation (e.g., [12,22,26,27]) or unsupervised classification [23,28,29], although often resulting in a considerable computational burden. In particular, the usual maximum-likelihood (ML) parameter estimation approach [30] exhibits good theoretical consistency properties when applied to GRFs and MRFs [31], but turns out to be computationally very expensive for most MRF models [22] or even intractable [20,32,33], due to the difficulty of analytical computation and numerical maximization of the normalization constants involved by the MRF-based distributions (the so-called "partition functions") [20,23,33].
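The extension of the Ho–Kashyap method referenced above is detailed later in the chapter (Section 2.3.4, beyond this excerpt); as background, the classical Ho–Kashyap iteration for a system of linear inequalities Ya > 0 can be sketched as follows. The toy matrix, step size, and stopping rule here are our own illustrative choices, not the chapter's formulation.

```python
import numpy as np

def ho_kashyap(Y, rho=0.5, n_iter=200, b0=1.0):
    """Classical Ho-Kashyap iteration: seek a weight vector a with Y @ a > 0
    by jointly updating a and a strictly positive margin vector b."""
    n, _ = Y.shape
    b = np.full(n, b0)                    # margins, kept positive throughout
    Y_pinv = np.linalg.pinv(Y)
    a = Y_pinv @ b                        # least-squares solution of Y a = b
    for _ in range(n_iter):
        e = Y @ a - b                     # error vector
        b = b + rho * (e + np.abs(e))     # increase b only where e > 0
        a = Y_pinv @ b
        if np.all(Y @ a > 0):             # all inequalities satisfied
            break
    return a

# Toy separable system: each row dotted with a must come out positive.
Y = np.array([[1.0, 2.0], [2.0, 1.0], [1.5, 1.5]])
a = ho_kashyap(Y)
print(np.all(Y @ a > 0))
```

The appeal noted in the text follows from this structure: the update rule has a known convergence analysis for solvable systems, and nothing beyond the inequality matrix itself needs to be tuned by hand.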
Operatively, the use of ML estimates for MRF parameters turns out to be restricted to specific typologies of MRFs, such as continuous Gaussian [1,13,15,34,35] or generalized Gaussian [36] fields, which allow an ML estimation task to be formulated in an analytically feasible way. Beyond such case-specific techniques, essentially three approaches have been suggested in the literature to address the problem of unsupervised ML estimation of MRF parameters indirectly, namely, Monte Carlo methods, stochastic-gradient approaches, and pseudo-likelihood approximations [23]. The combination of the ML criterion with Monte Carlo simulations has been proposed to overcome the difficulty of computation of the partition function [22,37]; it provides good estimation results, but usually involves long execution times. Stochastic-gradient approaches aim at maximizing the log-likelihood function by integrating a Gibbs stochastic sampling strategy into the gradient-ascent method [23,38]. Combinations of the stochastic-gradient approach with the iterated conditional expectation (ICE) and with an estimation–restoration scheme have also been proposed in Refs. [12,29], respectively. Approximate pseudo-likelihood functionals have been introduced [20,23]; they are numerically feasible, although the resulting estimates are not actual ML estimates (except in the trivial noncontextual case of pixel independence) [20]. Moreover, the pseudo-likelihood approximation may underestimate the interactions between pixels and can provide unsatisfactory results unless the interactions are suitably weak [21,23]. Pseudo-likelihood approaches have also been developed in conjunction with mean-field approximations [26], with the expectation–maximization (EM) [21,39], the Metropolis–Hastings [40], and the ICE [21] algorithms, with Monte Carlo simulations [41], or with multi-resolution analysis [42]. In Ref.
[20] a pseudo-likelihood approach is plugged into the ICM energy-minimization strategy, by iteratively alternating the update of the contextual clustering map and the update of the parameter values. In Ref. [10] this method is integrated with EM in the context of multi-resolution MRFs. A related technique is the "coding method," which is based on a pseudo-likelihood functional computed over a subset of pixels, although these subsets depend on the choice of suitable coding strategies [34].

Several empirical or ad hoc estimators have also been developed. In Ref. [33] a family of MRF models with polynomial energy functions is proved to be dense in the space of all MRFs and is endowed with a case-specific estimation scheme based on the method of moments. In Ref. [43] a least-squares approach is proposed for Ising-type and Potts-type MRF models [22]; it formulates an overdetermined system of linear equations relating the unknown parameters to a set of relative frequencies of pixel-label configurations. The combination of this method with EM is applied in Ref. [44] for sonar image segmentation purposes. A simpler but conceptually similar approach is adopted in Ref. [12] and combined with ICE: a simple algebraic empirical estimator for a one-parameter spatial MRF model is developed, which directly relates the parameter value to the relative frequencies of several class-label configurations in the image.

On the other hand, the literature about MRF parameter setting for supervised image classification is very limited. Any unsupervised MRF parameter estimator can also naturally be applied to a supervised problem, by simply neglecting the training data in the estimation process. However, only a few techniques have been proposed so far that effectively exploit the available training information for MRF parameter-optimization purposes as well. In particular, a heuristic algorithm is developed in Ref.
[7] aiming to automatically optimize the parameters of a multi-temporal MRF model for a joint supervised classification of two-date imagery, and a genetic approach is combined in Ref. [45] with simulated annealing [3] for the estimation of the parameters of a spatial MRF model for multi-source classification.

2.3 Supervised MRF-Based Classification

2.3.1 MRF Models for Image Classification

Let I = {x1, x2, ..., xN} be a given n-band remote-sensing image, modeled as a set of N identically distributed n-variate random vectors. We assume M thematic classes ω1, ω2, ..., ωM to be present in the image; we denote the resulting set of classes by Ω = {ω1, ω2, ..., ωM} and the class label of the k-th image pixel (k = 1, 2, ..., N) by sk ∈ Ω. Operating in the context of supervised image classification, we assume a training set to be available, and we denote the index set of the training pixels by T ⊂ {1, 2, ..., N} and the true class label of the k-th training pixel (k ∈ T) by sk*. When collecting all the feature vectors of the N image pixels in a single (Nn)-dimensional column vector¹ X = col[x1, x2, ..., xN] and all the pixel labels in a discrete random vector S = (s1, s2, ..., sN) ∈ Ω^N, the MAP decision rule (i.e., the Bayes rule for minimum classification error [46]) assigns to the image data X the label vector S that maximizes the joint posterior probability P(S|X), that is,

S = argmax_{S∈Ω^N} P(S|X) = argmax_{S∈Ω^N} [p(X|S) P(S)]    (2.1)

where p(X|S) and P(S) are the joint probability density function (PDF) of the global feature vector X conditioned to the label vector S and the joint probability mass function (PMF) of the label vector itself, respectively. The MRF approach offers a computationally tractable solution to this maximization problem by passing from a global model for the statistical dependence of the class labels to a model of the local image properties, defined according to a given neighborhood system [2,3].
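The intractability of Equation 2.1 for a global model can be made concrete with a brute-force sketch: enumerating all |Ω|^N label vectors is feasible only for toy sizes. The two Gaussian class-conditional PDFs, the Potts-like smoothness prior, and every numeric value below are our own illustrative assumptions.

```python
import itertools
import math

# Hypothetical two-class, one-band toy problem with N = 4 pixels.
CLASSES = [0, 1]                 # class indices standing in for omega_1, omega_2
MEANS, STD = [0.0, 2.0], 0.5     # assumed Gaussian class-conditional PDFs
BETA = 0.8                       # assumed prior smoothness weight

def log_pdf(x, s):
    """log p(x | s): Gaussian with class-dependent mean."""
    return (-0.5 * ((x - MEANS[s]) / STD) ** 2
            - math.log(STD * math.sqrt(2 * math.pi)))

def log_prior(S):
    """log P(S) up to a constant: reward equal labels at adjacent pixels."""
    return BETA * sum(1.0 if S[k] == S[k + 1] else -1.0
                      for k in range(len(S) - 1))

def map_brute_force(X):
    """Exhaustive MAP search over all |Omega|^N label vectors (Equation 2.1)."""
    score = lambda S: sum(log_pdf(x, s) for x, s in zip(X, S)) + log_prior(S)
    return max(itertools.product(CLASSES, repeat=len(X)), key=score)

print(map_brute_force([0.2, 0.4, 2.1, 1.9]))  # exponential in N
```

For M classes and N pixels the search space has M^N elements, which is exactly why the local MRF formulation introduced next is needed in place of this global maximization.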
Specifically, for each k-th image pixel, a neighborhood Nk ⊂ {1, 2, ..., N} is assumed to be defined, such that, for instance, Nk includes the four (first-order neighborhood) or the eight (second-order neighborhood) pixels surrounding the k-th pixel (k = 1, 2, ..., N). More formally, a neighborhood system is a collection {Nk} (k = 1, 2, ..., N) of subsets of pixels such that each pixel is outside its own neighborhood (i.e., k ∉ Nk for all k = 1, 2, ..., N) and neighboring pixels are always mutually neighbors (i.e., k ∈ Nh if and only if h ∈ Nk for all k, h = 1, 2, ..., N, k ≠ h). This simple discrete topological structure attached to the image data is exploited in the MRF framework to model the statistical relationships between the class labels of spatially distinct pixels and to provide a computationally affordable solution to the global MAP classification problem of Equation 2.1. Specifically, we assume the feature vectors x1, x2, ..., xN to be conditionally independent and identically distributed with PDF p(x|s) (x ∈ R^n, s ∈ Ω), that is [2],

p(X|S) = ∏_{k=1}^{N} p(xk|sk)    (2.2)

and the joint prior PMF P(S) to be a Markov random field with respect to the above-mentioned neighborhood system, that is [2,3]:

• the probability distribution of each k-th image label, conditioned to all the other image labels, is equivalent to the distribution of the k-th label conditioned only to the labels of the neighboring pixels (k = 1, 2, ..., N):

P{sk = ωi | sh : h ≠ k} = P{sk = ωi | sh : h ∈ Nk},  i = 1, 2, ..., M    (2.3)

¹ All the vectors in the chapter are implicitly assumed to be column vectors; we denote by ui the i-th component of an m-dimensional vector u ∈ R^m (i = 1, 2, ..., m), by "col" the operator of column-vector juxtaposition (i.e., col[u, v] is the vector obtained by stacking the two vectors u ∈ R^m and v ∈ R^n in a single (m + n)-dimensional column vector), and by the superscript "T" the matrix transpose operator.
• the PMF of S is a strictly positive function on Ω^N, that is, P(S) > 0 for all S ∈ Ω^N.

The Markov assumption expressed by Equation 2.3 allows restricting the statistical relationships among the image labels to the local relationships inside the predefined neighborhood, thus greatly simplifying the spatial-contextual model for the label distribution as compared to a generic global model for the joint PMF of all the image labels. However, as stated in the next subsection, thanks to the so-called Hammersley–Clifford theorem, despite this strong analytical simplification, a large class of contextual models can be accommodated under the MRF formalism.

2.3.2 Energy Functions

Given the neighborhood system {Nk} (k = 1, 2, ..., N), we denote by "clique" a set Q of pixels (Q ⊂ {1, 2, ..., N}) such that, for each pair (k, h) of pixels in Q, k and h turn out to be mutually neighbors, that is,

k ∈ Nh ⟺ h ∈ Nk  for all k, h ∈ Q, k ≠ h    (2.4)

Denoting by Q the collection of all the cliques in the adopted neighborhood system and by SQ the vector of the pixel labels in the clique Q ∈ Q (i.e., SQ = col[sk : k ∈ Q]), the Hammersley–Clifford theorem states that the label configuration S is an MRF if and only if, for any clique Q ∈ Q, there exists a real-valued function VQ(SQ) (usually named "potential function") of the pixel labels in Q, such that the global PMF of S is given by the following Gibbs distribution [3]:

P(S) = (1/Z_prior) exp[−U_prior(S)/θ],  where  U_prior(S) = Σ_{Q∈Q} VQ(SQ)    (2.5)

θ is a positive parameter, and Z_prior is a normalizing constant. Because of the formal similarity between the probability distribution in Equation 2.5 and the well-known Maxwell–Boltzmann distribution introduced in statistical mechanics for canonical ensembles [47], U_prior(·), θ, and Z_prior are usually named "energy function," "temperature," and "partition function," respectively.
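The Gibbs distribution of Equation 2.5 can be evaluated exactly on a toy problem. The sketch below uses pairwise cliques on a short one-dimensional label chain with an assumed Potts-type potential; the brute-force enumeration used to compute the partition function is precisely the step that becomes intractable for real images.

```python
import itertools
import math

# Assumed Potts-type interaction weight and temperature (illustrative values).
BETA, THETA = 1.0, 1.0

def prior_energy(S):
    """U_prior(S): sum of potentials over pairwise cliques of adjacent labels."""
    return sum(-BETA if S[k] == S[k + 1] else BETA for k in range(len(S) - 1))

def gibbs_pmf(all_labelings):
    """Exact Gibbs PMF of Equation 2.5; feasible only because the partition
    function Z is computed by enumerating every possible labeling."""
    w = {S: math.exp(-prior_energy(S) / THETA) for S in all_labelings}
    Z = sum(w.values())                     # the partition function
    return {S: v / Z for S, v in w.items()}

# All 2^3 binary labelings of a 3-pixel chain.
P = gibbs_pmf(list(itertools.product([0, 1], repeat=3)))
print(P[(0, 0, 0)] > P[(0, 1, 0)])   # smooth labelings are more probable
```

Note that the potentials only ever involve labels inside a clique, so the energy is a sum of local terms, while the normalization Z couples all labelings globally; this is the asymmetry the Hammersley–Clifford equivalence exploits.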
A very large class of statistical interactions among spatially distinct pixels can be modeled in this framework, by simply choosing a suitable function U_prior(·), which makes the MRF approach highly flexible. In addition, when coupling the Hammersley–Clifford formulation of the prior PMF P(S) with the conditional-independence assumption stated for the conditional PDF p(X|S) (see Equation 2.2), an energy representation also holds for the global posterior distribution P(S|X), that is [2],

P(S|X) = (1/Z_post,X) exp[−U_post(S|X)/θ]    (2.6)

where Z_post,X is a normalizing constant and U_post(·) is a posterior energy function, given by:

U_post(S|X) = −θ Σ_{k=1}^{N} ln p(xk|sk) + Σ_{Q∈Q} VQ(SQ)    (2.7)

This formulation of the global posterior probability allows addressing the MAP classification task as the minimization of the energy function U_post(·|X), which is locally