
Independent Component Analysis. Aapo Hyvärinen, Juha Karhunen, Erkki Oja. Copyright 2001 John Wiley & Sons, Inc. ISBNs: 0-471-40540-X (Hardback); 0-471-22131-7 (Electronic)

17 Nonlinear ICA

This chapter deals with independent component analysis (ICA) for nonlinear mixing models. A fundamental difficulty in the nonlinear ICA problem is that it is highly nonunique without some extra constraints, which are often realized by using a suitable regularization. We also address the nonlinear blind source separation (BSS) problem. Contrary to the linear case, we consider it different from the respective nonlinear ICA problem. After considering these matters, some methods introduced for solving the nonlinear ICA or BSS problems are discussed in more detail. Special emphasis is given to a Bayesian approach that applies ensemble learning to a flexible multilayer perceptron model for finding the sources and the nonlinear mixing mapping that have most probably given rise to the observed mixed data. The efficiency of this method is demonstrated using both artificial and real-world data. At the end of the chapter, other techniques proposed for solving the nonlinear ICA and BSS problems are reviewed.

17.1 NONLINEAR ICA AND BSS

17.1.1 The nonlinear ICA and BSS problems

In many situations, the basic linear ICA or BSS model

$$\mathbf{x} = \mathbf{A}\mathbf{s} = \sum_{j=1}^{n} \mathbf{a}_j s_j \qquad (17.1)$$

is too simple for describing the observed data $\mathbf{x}$ adequately. Hence, it is natural to consider extension of the linear model to nonlinear mixing models. For instantaneous mixtures, the nonlinear mixing model has the general form

$$\mathbf{x} = \mathbf{f}(\mathbf{s}) \qquad (17.2)$$

where $\mathbf{x}$ is the observed $m$-dimensional data (mixture) vector, $\mathbf{f}$ is an unknown real-valued $m$-component mixing function, and $\mathbf{s}$ is an $n$-vector whose elements are the $n$ unknown independent components.

Assume now for simplicity that the number $n$ of independent components equals the number $m$ of mixtures. The general nonlinear ICA problem then consists of finding a mapping $\mathbf{h}: \mathbb{R}^n \rightarrow \mathbb{R}^n$ that gives components

$$\mathbf{y} = \mathbf{h}(\mathbf{x}) \qquad (17.3)$$

that are statistically independent.
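The general model (17.2) is easy to simulate. The following sketch is not from the text; the particular mixing function and all names are illustrative choices. It builds an MLP-style mixing $\mathbf{f}(\mathbf{s}) = \mathbf{A}_2 \tanh(\mathbf{A}_1 \mathbf{s})$, a form similar in spirit to the multilayer perceptron model discussed later in this chapter, and confirms that no purely linear model (17.1) fits the resulting data:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 2000
s = rng.uniform(-1, 1, size=(2, T))      # n = 2 independent sources

# Hypothetical smooth nonlinear mixing f(s) = A2 tanh(A1 s), MLP-style
A1 = rng.normal(size=(4, 2))
A2 = rng.normal(size=(2, 4))
x = A2 @ np.tanh(A1 @ s)                 # observed mixtures, Eq. (17.2)

# The mixing is genuinely nonlinear: the best linear fit x ~ M s
# leaves a clearly nonzero residual.
M, *_ = np.linalg.lstsq(s.T, x.T, rcond=None)
residual = np.linalg.norm(x - M.T @ s) / np.linalg.norm(x)
print(residual)
```

The nonzero residual of the least-squares fit is what makes the linear theory of the earlier chapters insufficient here.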
A fundamental characteristic of the nonlinear ICA problem is that in the general case, solutions always exist, and they are highly nonunique. One reason for this is that if $x$ and $y$ are two independent random variables, any of their functions $f(x)$ and $g(y)$ are also independent. An even more serious problem is that in the nonlinear case, $x$ and $y$ can be mixed and still be statistically independent, as will be shown below. This is not unlike the case of gaussian ICs in a linear mixing.

In this chapter, we define BSS in a special way to clarify the distinction between finding independent components and finding the original sources. Thus, in the respective nonlinear BSS problem, one should find the original source signals $\mathbf{s}$ that have generated the observed data. This is usually a clearly more meaningful and unique problem than nonlinear ICA as defined above, provided that suitable prior information is available on the sources and/or the mixing mapping. It is worth emphasizing that if some arbitrary independent components are found for the data generated by (17.2), they may be quite different from the true source signals. Hence the situation differs greatly from the basic linear data model (17.1), for which the ICA and BSS problems have the same solution. Generally, solving the nonlinear BSS problem is not easy, and requires additional prior information or suitable regularizing constraints.

An important special case of the general nonlinear mixing model (17.2) consists of so-called post-nonlinear mixtures. There each mixture has the form

$$x_i = f_i\left(\sum_{j=1}^{n} a_{ij} s_j\right), \qquad i = 1, \ldots, n \qquad (17.4)$$

Thus the sources $s_j$, $j = 1, \ldots, n$, are first mixed linearly according to the basic ICA/BSS model (17.1), but after that a nonlinear function $f_i$ is applied to them to get the final observations $x_i$. It can be shown [418] that for the post-nonlinear mixtures, the indeterminacies are usually the same as for the basic linear instantaneous mixing model (17.1).
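A post-nonlinear mixture per (17.4) can be generated as in the following minimal sketch; the particular component-wise distortions $f_i$ are hypothetical choices, picked only to be smooth and invertible:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
s = rng.laplace(size=(3, T))            # independent sources s_j
A = rng.normal(size=(3, 3))             # linear mixing matrix

v = A @ s                               # inner linear stage of Eq. (17.4)
# Component-wise invertible sensor distortions f_i (illustrative choices)
x = np.vstack([np.tanh(v[0]),
               v[1] + 0.1 * v[1]**3,
               np.cbrt(v[2])])          # observed post-nonlinear mixtures
```

Because each $f_i$ acts on a single channel, inverting it channel by channel (here, e.g., cubing the third observation) restores an ordinary linear mixture, which is what makes this special case tractable.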
That is, the sources can be separated or the independent components estimated up to the scaling, permutation, and sign indeterminacies under weak conditions on the mixing matrix $\mathbf{A}$ and the source distributions. The post-nonlinearity assumption is useful and reasonable in many signal processing applications, because it can be thought of as a model for a nonlinear sensor distortion. In more general situations, it is a restrictive and somewhat arbitrary constraint. This model will be treated in more detail below.

Another difficulty in the general nonlinear BSS (or ICA) methods proposed thus far is that they tend to be computationally rather demanding. Moreover, the computational load usually increases very rapidly with the dimensionality of the problem, preventing in practice the application of nonlinear BSS methods to high-dimensional data sets.

The nonlinear BSS and ICA methods presented in the literature can be divided into two broad classes: generative approaches and signal transformation approaches [438]. In the generative approaches, the goal is to find a specific model that explains how the observations were generated. In our case, this amounts to estimating both the source signals $\mathbf{s}$ and the unknown mixing mapping $\mathbf{f}$ that have generated the observed data $\mathbf{x}$ through the general mapping (17.2). In the signal transformation methods, one tries to estimate the sources directly using the inverse transformation (17.3). In these methods, the number of estimated sources is the same as the number of observed mixtures [438].

17.1.2 Existence and uniqueness of nonlinear ICA

The question of existence and uniqueness of solutions for nonlinear independent component analysis has been addressed in [213]. The authors show that there always exists an infinity of solutions if the space of the nonlinear mixing functions $\mathbf{f}$ is not limited. They also present a method for constructing parameterized families of nonlinear ICA solutions.
A unique solution (up to a rotation) can be obtained in the two-dimensional special case if the mixing mapping $\mathbf{f}$ is constrained to be a conformal mapping, together with some other assumptions; see [213] for details.

In the following, we present in more detail the constructive method introduced in [213] that always yields at least one solution to the nonlinear ICA problem. This procedure might be considered as a generalization of the well-known Gram-Schmidt orthogonalization method. Given $m$ independent variables $\mathbf{y} = (y_1, \ldots, y_m)$ and a variable $x$, a new variable $y_{m+1} = g(\mathbf{y}, x)$ is constructed so that the set $y_1, \ldots, y_{m+1}$ is mutually independent.

The construction is defined recursively as follows. Assume that we have already $m$ independent random variables $y_1, \ldots, y_m$ which are jointly uniformly distributed in $[0,1]^m$. Here it is not a restriction to assume that the distributions of the $y_i$ are uniform, since this follows directly from the recursion, as will be seen below; for a single variable, uniformity can be attained by the probability integral transformation; see (2.85). Denote by $x$ any random variable, and by $a_1, \ldots, a_m, b$ some nonrandom scalars. Define

$$g(a_1, \ldots, a_m; b) = P(x \le b \mid y_1 = a_1, \ldots, y_m = a_m) = \frac{\int_{-\infty}^{b} p_{\mathbf{y},x}(a_1, \ldots, a_m, \xi)\, d\xi}{p_{\mathbf{y}}(a_1, \ldots, a_m)} \qquad (17.5)$$

where $p_{\mathbf{y}}$ and $p_{\mathbf{y},x}$ are the marginal probability density of $\mathbf{y}$ and the joint density of $\mathbf{y}$ and $x$, respectively (it is assumed here implicitly that such densities exist), and $P(\cdot \mid \cdot)$ denotes the conditional probability. The $\mathbf{y}$ in the argument of $g$ is to remind that $g$ depends on the joint probability distribution of $\mathbf{y}$ and $x$. For $m = 0$, $g$ is simply the cumulative distribution function of $x$. Now, $g$ as defined above gives a nonlinear decomposition, as stated in the following theorem.

Theorem 17.1 Assume that $y_1, \ldots, y_m$ are independent scalar random variables that have a joint uniform distribution in the unit cube $[0,1]^m$. Let $x$ be any scalar random variable. Define $g$ as in (17.5), and set

$$y_{m+1} = g(y_1, \ldots, y_m; x) \qquad (17.6)$$

Then $y_{m+1}$ is independent from the $y_1, \ldots, y_m$, and the variables $y_1, \ldots, y_{m+1}$ are jointly uniformly distributed in the unit cube $[0,1]^{m+1}$.
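As an illustration of (17.5)-(17.6), consider a tractable case (our own toy example, not from [213]): $y_1$ uniform on $[0,1]$ and $x = y_1 + n$ with $n$ standard gaussian. Then the conditional CDF is $g(a; b) = \Phi(b - a)$, so $y_2 = \Phi(x - y_1) = \Phi(n)$, which should come out uniform on $[0,1]$ and independent of $y_1$ even though $x$ itself depends on $y_1$:

```python
import math
import numpy as np

rng = np.random.default_rng(0)
T = 100_000
y1 = rng.uniform(size=T)                 # already uniform on [0, 1]
n = rng.normal(size=T)
x = y1 + n                               # x depends on y1

# Conditional CDF g(a; b) = P(x <= b | y1 = a) = Phi(b - a)
phi = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))
y2 = phi(x - y1)                         # Eq. (17.6): y2 = g(y1; x)

# y2 should be uniform on [0, 1] and uncorrelated with y1
print(y2.mean(), y2.var(), np.corrcoef(y1, y2)[0, 1])
```

The sample mean and variance of `y2` land near 1/2 and 1/12 (the uniform values), and its correlation with `y1` near zero, in line with Theorem 17.1.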
The theorem is proved in [213]. The constructive method given above can be used to decompose $n$ variables $x_1, \ldots, x_n$ into $n$ independent components $y_1, \ldots, y_n$, giving a solution for the nonlinear ICA problem.

This construction also clearly shows that the decomposition in independent components is by no means unique. For example, we could first apply a linear transformation on $\mathbf{x}$ to obtain another random vector $\mathbf{x}' = \mathbf{L}\mathbf{x}$, and then compute $\mathbf{y}' = \mathbf{g}'(\mathbf{x}')$, with $\mathbf{g}'$ being defined using the above procedure, where $\mathbf{x}$ is replaced by $\mathbf{x}'$. Thus we obtain another decomposition of $\mathbf{x}$ into independent components. The resulting decomposition $\mathbf{y}' = \mathbf{g}'(\mathbf{L}\mathbf{x})$ is in general different from $\mathbf{y}$, and cannot be reduced to $\mathbf{y}$ by any simple transformations. A more rigorous justification of the nonuniqueness property has been given in [213].

Lin [278] has recently derived some interesting theoretical results on ICA that are useful in describing the nonuniqueness of the general nonlinear ICA problem. Let the matrices $\mathbf{H}_s$ and $\mathbf{H}_x$ denote the Hessians of the logarithmic probability densities $\log p_s(\mathbf{s})$ and $\log p_x(\mathbf{x})$ of the source vector $\mathbf{s}$ and mixture (data) vector $\mathbf{x}$, respectively. Then for the basic linear ICA model (17.1) it holds that

$$\mathbf{H}_s = \mathbf{A}^T \mathbf{H}_x \mathbf{A} \qquad (17.7)$$

where $\mathbf{A}$ is the mixing matrix. If the components of $\mathbf{s}$ are truly independent, $\mathbf{H}_s$ should be a diagonal matrix. Due to the symmetry of the Hessian matrices $\mathbf{H}_s$ and $\mathbf{H}_x$, Eq. (17.7) imposes $n(n+1)/2$ constraints for the $n^2$ elements of the matrix $\mathbf{A}$. Thus a constant mixing matrix $\mathbf{A}$ can be solved by estimating $\mathbf{H}_x$ at two different points, and assuming some values for the diagonal elements of $\mathbf{H}_s$.

If the nonlinear mapping (17.2) is twice differentiable, we can approximate it locally at any point by the linear mixing model (17.1). There $\mathbf{A}$ is defined by the first-order term of the Taylor series expansion of $\mathbf{f}(\mathbf{s})$ at the desired point.
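Relation (17.7) can be checked numerically in a case where both Hessians are available in closed form. Gaussian sources are used below purely for tractability (for truly gaussian sources ICA is of course indeterminate, but the identity (17.7) itself still holds, and the Hessians are constant matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
D = np.diag(rng.uniform(0.5, 2.0, size=n))   # s ~ N(0, D), independent components
A = rng.normal(size=(n, n))                   # mixing matrix

# Closed-form Hessians of the log-densities (constant for gaussians):
H_s = -np.linalg.inv(D)                       # diagonal, as independence requires
H_x = -np.linalg.inv(A @ D @ A.T)             # since x = A s ~ N(0, A D A^T)

# Eq. (17.7): H_s = A^T H_x A
print(np.allclose(H_s, A.T @ H_x @ A))
```

The check holds exactly: $\mathbf{A}^T(-(ADA^T)^{-1})\mathbf{A} = -\mathbf{D}^{-1} = \mathbf{H}_s$, a diagonal matrix, as the independence of the sources requires.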
But now $\mathbf{A}$ generally changes from point to point, so that the constraint conditions (17.7) still leave $n(n-1)/2$ degrees of freedom for determining the mixing matrix $\mathbf{A}$ (omitting the diagonal elements). This also shows that the nonlinear ICA problem is highly nonunique.

Taleb and Jutten have considered separability of nonlinear mixtures in [418, 227]. Their general conclusion is the same as earlier: separation is impossible without additional prior knowledge on the model, since the independence assumption alone is not strong enough in the general nonlinear case.

17.2 SEPARATION OF POST-NONLINEAR MIXTURES

Before discussing approaches applicable to general nonlinear mixtures, let us briefly consider blind separation methods proposed for the simpler case of post-nonlinear mixtures (17.4). Especially Taleb and Jutten have developed BSS methods for this case. Their main results have been presented in [418], and a short overview of their studies on this problem can be found in [227]. In the following, we present the main points of their method.

A separation method for the post-nonlinear mixtures (17.4) should generally consist of two subsequent parts or stages:

1. A nonlinear stage, which should cancel the nonlinear distortions $f_i$, $i = 1, \ldots, n$. This part consists of nonlinear functions $g_i(u_i)$. The parameters of each nonlinearity $g_i$ are adjusted so that cancellation is achieved (at least roughly).

2. A linear stage that separates the approximately linear mixtures $\mathbf{v}$ obtained after the nonlinear stage. This is done as usual by learning an $n \times n$ separating matrix $\mathbf{B}$ for which the components of the output vector $\mathbf{y} = \mathbf{B}\mathbf{v}$ of the separating system are statistically independent (or as independent as possible).

Taleb and Jutten [418] use the mutual information $I(\mathbf{y})$ between the components $y_1, \ldots, y_n$ of the output vector $\mathbf{y}$ (see Chapter 10) as the cost function and independence criterion in both stages.
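The two-stage structure can be sketched as follows, assuming for illustration that the distortions $f_i$ are known exactly so that the $g_i$ are simply their inverses (in the actual method the $g_i$ are learned); the names and numbers are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
s = rng.laplace(size=(2, T))
A = np.array([[1.0, 0.5], [0.3, 1.0]])

# Post-nonlinear mixing (17.4): linear mix, then bounded sensor distortion
v_true = A @ s
x = np.tanh(0.2 * v_true)               # mild distortion keeps arctanh stable

# Stage 1: nonlinearities g_i adjusted to cancel f_i (here the exact inverses)
v = np.arctanh(x) / 0.2

# Stage 2 would now learn B so that y = B v has independent components,
# using any linear ICA algorithm from the earlier chapters.
```

Once stage 1 succeeds, `v` is again an ordinary linear mixture, so the whole linear ICA machinery applies unchanged.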
For the linear part, minimization of the mutual information leads to the familiar Bell-Sejnowski algorithm (see Chapters 9 and 10):

$$\frac{\partial I(\mathbf{y})}{\partial \mathbf{B}} = -E\{\boldsymbol{\psi}(\mathbf{y})\mathbf{x}^T\} - (\mathbf{B}^T)^{-1} \qquad (17.8)$$

where the components $\psi_i$ of the vector $\boldsymbol{\psi}(\mathbf{y})$ are the score functions of the components $y_i$ of the output vector $\mathbf{y}$:

$$\psi_i(u) = \frac{d}{du} \log p_i(u) = \frac{p_i'(u)}{p_i(u)} \qquad (17.9)$$
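A minimal sketch of a linear stage driven by (17.8)-(17.9): natural-gradient descent on $I(\mathbf{y})$, i.e., $\Delta\mathbf{B} \propto (\mathbf{I} + E\{\boldsymbol{\psi}(\mathbf{y})\mathbf{y}^T\})\mathbf{B}$, with the score function of super-gaussian (here Laplacian) sources approximated by $-\tanh(u)$. This is an illustrative toy on a purely linear mixture, not Taleb and Jutten's full post-nonlinear method:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 2, 20000
s = rng.laplace(size=(n, T))             # super-gaussian sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])
x = A @ s

# Whiten the mixtures first (standard preprocessing, Chapter 6)
x = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(x))
z = (E @ np.diag(d**-0.5) @ E.T) @ x

# Natural-gradient descent on I(y): Delta B = mu * (I + E{psi(y) y^T}) B,
# approximating the score psi(u) = p'(u)/p(u) of Eq. (17.9) by -tanh(u).
B = np.eye(n)
mu = 0.05
for _ in range(500):
    y = B @ z
    B += mu * (np.eye(n) - np.tanh(y) @ y.T / T) @ B
y = B @ z

# Each output should match one source up to scale, sign, and permutation
C = np.abs(np.corrcoef(np.vstack([y, s]))[:n, n:])
print(C.max(axis=1))
```

The recovered signals correlate strongly with the true sources, with the usual scale, sign, and permutation indeterminacies left unresolved.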