Xem mẫu

Chapter 16 FPGA Design for Real-Time Implementation of Constrained Energy Minimization for Hyperspectral Target Detection Jianwei Wang, University of Maryland, Baltimore County Chein-I Chang, University of Maryland, Baltimore County Contents 16.1 Introduction .......................................................... 380 16.2 Constrained Energy Minimization (CEM) ............................. 381 16.3 Real-Time Implementation of CEM ................................... 382 16.3.1 Method 1: CEM Implementation .............................. 383 16.3.2 Method 2: CEM Implementation .............................. 388 16.4 Simulation Results .................................................... 392 16.5 Conclusions .......................................................... 394 References .................................................................. 395 The constrained energy minimization (CEM) has been widely used for hyperspectral detection and classification. The feasibility of implementing the CEM as a real-time processing algorithm in systolic arrays has also been demonstrated. The main chal-lenge of realizing the CEM in hardware architecture is the computation of the inverse of the data correlation matrix performed in the CEM, which requires a complete set of data samples. In order to cope with this problem, the data correlation matrix must be calculated in a causal manner that only needs data samples up to the sample at the timeitisprocessed.ThischapterpresentsaFieldProgrammableGateArrays(FPGA) design of such a causal CEM. The main feature of the proposed FPGA design is to use the Coordinate Rotation Digital Computer (CORDIC) algorithm, which can con-vert a Givens rotation of a vector to a set of shift-add operations. As a result, the CORDIC algorithm can be easily implemented in hardware architectures, and there-fore in FPGA. Since the computation of the inverse of the data correlation matrix involves a series of Givens rotations, the utility of the CORDIC algorithm allows the causal CEM to perform real-time processing in FPGA. In this chapter, an FPGA 379 © 2008 by Taylor & Francis Group, LLC 380 High-Performance Computing in Remote Sensing implementation of the causal CEM will be studied and its detailed architecture will be also described. 16.1 Introduction The importance of real-time processing has been recently realized and recognized in many applications. In some applications, e.g., on-board spacecraft data processing system, it is very useful to have high levels of processing throughput. Specially, as the data rate generated by spacecraft instruments is increasing, onboard science data processing has been largely absent from remote sensing missions. Many advantages can result from real-time processing. One is the detection of moving targets. This is critical and crucial in battlefields when moving targets such as tanks or missile launching vehicles pose real threats to ground troops. Real-time data processing provides timely intelligence information that can help to reduce causality. Another is onboard data processing. For space-borne satellites, real-time data processing can significantly reduce mass storage of data volume. A third advantage is chip design. It can be implemented in parallel and reduce computation load. Furthermore, it can also reduce payload in aircrafts and satellites. Over the past years, many subpixel detection and mixed pixel algorithms have been developed and shown to be very versatile. However, their applicability to real-time processing problems is generally restricted by the very complex computational workloads. In this chapter, we explore the feasibility of the Field Programmable Gate Arrays (FPGA) design for real-time implementation of a hyperspectral detection and clas-sification algorithm, called constrained energy minimization (CEM) [1], which has shown promise in hyperspectral data exploitation. The issue of real-time processing for CEM was also studied in [2], where its systolic array implementation was devel-oped. In recent years, rapid advances in VLSI technology have had a large impact on modern digital signal processing. Over the past thirty years, we have witnessed that the number of transistors per chip has doubled about once a year. Therefore, VLSI design of complex algorithms becomes more and more feasible. The major difficulty with implementing these algorithms in real time is the computation of the inverse of a matrix. Systolic arrays provide a possibility, but they require a series of Givens rotations to decompose a matrix into triangular matrices that can be implemented in real time. Unfortunately, such Givens rotations cannot be realized in hardware. In ordertoresolvethisissue,theGivensrotationsmustbeperformedbyoperationssuch as adds, ORs, XORs, and shifts that can be realized in hardware architectures. In order to do so, we make use of the Coordinate Rotation Digital Computer (CORDIC) algorithm developed by Volder [3], which allows us to convert a Givens rotation to a series of shifts-adds operations. Using systolic arrays architecture in conjunction with the CORDIC algorithm, we can implement the computation of a matrix inverse in a set of shifts-adds operations. As a result, it makes the FPGA design of CEM possible. This chapter presents the detailed FPGA design layout for CEM. © 2008 by Taylor & Francis Group, LLC FPGA Design for Real-Time Implementation 381 16.2 Constrained Energy Minimization (CEM) Assume that a remotely sensed image is a collection of image pixels denoted by {r1,r2,··· ,rN}, where ri = (ri1,ri2,··· ,riL)T for 1 ≤ i ≤ N is an L-dimensional pixel vector, N is the total number of pixels in the image, and L is the total number of spectral channels. Suppose that d = (d1,··· ,dL)T is the desired target signature of interest in the image. The goal is to design a finite impulse response (FIR) linear filterspecifiedby L filtercoefficients{w1,w2,··· ,wL},denotedbyan L-dimensional vectorw = (w1,··· ,wL)T thatcanbeusedtodetectthesignaturedwithoutknowing the image background. If we assume that yi is the output of the designed FIR filter resulting from the input ri, then yi can be expressed by L yi = wlril = rT w = wT ri (16.1) l=1 In order to detect the desired target signature d using the filter output yi, the FIR filter must be constrained by the following equation: X d w = dlwl = 1 (16.2) l=1 so that the d can pass through the filter while the output energies resulting from other signatures will be minimized. This problem is a well-known linearly constrained adaptive beamforming problem, called minimum variance distortionless response (MVDR), which can be cast as follows: min{wT RL×Lw} subject to the constraint: dT w = 1 (16.3) where RL×L = (1/N)·Pi=1 riri ¸is the autocorrelation sample matrix of the image and · N ¸ · N ¸ µ · N ¸¶ y2 = (rT w)T (rT w) = wT rirT w = wT RL×Lw i=1 i=1 i=1 (16.4) The solution to Eq. 16.4 can be obtained by [1]: −1 ∗ L×L dT RL×Ld (16.5) © 2008 by Taylor & Francis Group, LLC 382 High-Performance Computing in Remote Sensing 16.3 Real-Time Implementation of CEM One of the significant advantages of the CEM is that the correlation matrix RL×L in the optimal weights specified by Eq. 16.5 can be decomposed into a product of a unitary matrix Q and an upper triangular matrix R by either the Givens rotations or the Householder transform. Such a decomposition is commonly referred to as QR-decomposition. Another advantage is that the CEM can be implemented in real time where the correlation matrix can be carried out either line-by-line or pixel-by-pixel from left to right and top to bottom. For illustrative purpose, we assume that the correlation matrix is performed line-by-line and at each line t a data matrix Xt = [rt1,rt2,··· ,rtN] is formed up to this particular line. In this case, the RL×L in Eq. 16.3 is replaced by the data autocorrelation matrix of line t in the image, denoted by t: X · N ¸ · ¸ = rtirti = XtXt (16.6) t i=1 With QR-decomposition, Xt can be expressed by Xr = QtRt (16.7) upper Here, Qt is a unitary matrix with Q−1 = QT and Rt = and R = t is not necessarily of full rank, where 0 is a zero vector and ⎡ ∗ Rupper = ⎢ 0 ⎣ . 0 ∗ ··· ∗ ⎤ ∗ ··· ∗ ⎥ ... ... . ⎦ ··· 0 ∗ (16.8) is an upper triangular matrix, and ∗ in Rupper is a nonzero element. From Eq. 16.6, the inverse of t can be computed as X = N(XXT ) = N ©(Rupper)−1[(Rupper)T ]−1ª (16.9) t where the unitary matrix Qt is canceled out in Eq. 16.6 because Q−1 = QT . Substi-tuting in Eq. 16.9 for RL×L yields w∗ = ©(Rupper)−1[(Rupper)T ]−1ª·d(dT ©(Rupper)−1[(Rupper)T ]−1ªd)−1 (16.10) Since Rupper is an upper triangular matrix, so is (Rupper)−1. Therefore, Eq. 16.10 doesnotrequirecomputationofRL×L.Asaresult,itcanbeimplementedinreal-time processing. In this chapter, a Givens rotation is used to perform QR-decomposition. © 2008 by Taylor & Francis Group, LLC FPGA Design for Real-Time Implementation 383 r13 r12 r11 r23 r22 r21 _ _ _ 13 12 11 r33 r32 r31 _ _ 23 22 _ _ 33 32 CORDIC Circuit Sign CORDIC Circuit Sign CORDIC Circuit Sign _ _ _ 13 12 11 r23 r22 0 ~ ~ ~ 13 12 11 r33 r32 0 ~ ~ 23 22 ~ 33 Figure 16.1 Systolic array for QR-decomposition. 16.3.1 Method 1: CEM Implementation In Method 1, we decompose Eq. 16.5 into several components and each component is implemented by a separated hardware module: RL×Ld (XXT )−1d (Rupper)−1[(Rupper)T ]−1d dT RL×Ld dT (XXT )−1d dT (Rupper)−1[(Rupper)T ]−1]d To implement Eq. 16.11, five modules are required: (16.11) r Array of CORDIC circuits shown in Figure 16.1, where the pixel stream is fed into the module and the upper triangular matrix Rupper is updated in realtime. r Apply backsubstitution to obtain the inverse of Rupper, called invR. r Apply distributed arithmetic in order to calculate c = [(Rupper)T ]−1d = invRT ∗d. r Compute w = invRT ∗c. r The filter output energy can be obtained by applying an FIR filter to the current input pixel streams. The detailed implementation for each of the five modules is described as follows: r Generation of an upper triangular matrix. A set of CORDIC circuits is applied to perform a Givens rotation, as shown in Figure 16.1. For demonstrative pur-pose, let us assume L = 3. At first, two pixel streams (row 1 and row 2) are fed into the CORDIC circuit, and as a result, the first zero, which will occupy the r21 position, is introduced by a Givens rotation. The first pair (r11,r12) is © 2008 by Taylor & Francis Group, LLC ... - tailieumienphi.vn
nguon tai.lieu . vn