Melden

Jose Miguel MorenoFolgen

28. Jan 2013•0 gefällt mir## 0 gefällt mir

•301 Aufrufe## Aufrufe

Sei der Erste, dem dies gefällt

Mehr anzeigen

Aufrufe insgesamt

0

Auf Slideshare

0

Aus Einbettungen

0

Anzahl der Einbettungen

0

Nächste SlideShare

study Streaming Multigrid For Gradient Domain Operations On Large Images

Wird geladen in ... 3

28. Jan 2013•0 gefällt mir## 0 gefällt mir

•301 Aufrufe## Aufrufe

Sei der Erste, dem dies gefällt

Mehr anzeigen

Aufrufe insgesamt

0

Auf Slideshare

0

Aus Einbettungen

0

Anzahl der Einbettungen

0

Melden

Jose Miguel MorenoFolgen

study Streaming Multigrid For Gradient Domain Operations On Large ImagesChiamin Hsu

Slide11 icc2015T. E. BOGALE

Wireless Localization: Ranging (second part)Stefano Severi

A study of the worst case ratio of a simple algorithm for simple assembly lin...narmo

Nyquist criterion for zero ISIGunasekara Reddy

Module 13 Gradient And Area Under A Graphguestcc333c

- Modular Architecture for Ultrasound Beamforming with FPGAs J. F. Cruza, J. Camacho, J. Brizuela, J.M. Moreno, C. Fritsch Consejo Superior de Investigaciones Científicas (CSIC) UMEDIA Group, La Poveda (Arganda), 28500 Madrid, Spain Abstract. Reception beamforming requires a high-resolution time base to keep low delay quantization lobes. Usually signals acquired at the Nyquist sampling rate are up-sampled by a factor of 4 to 16 to achieve the required time- resolution. Although this has been carried out for decades using fractional delay interpolating filters, the interpolation and coherent addition processes use a large silicon area and consume power, mainly when implemented in FPGAs. These drawbacks are reduced if the beamformer is implemented in ASICs, but this precludes any retro-fitting operation. This work presents a beamfoming architecture to overcome these shortcomings, based on the DSP cells incorporated into state-of-the art FPGAs. Since the manufacturing process is similar to that of ASIC, these cells consume low power and processing speed is high. The proposed beamformer interleaves the interpolation and the coherent sum. Besides reducing hardware resources, the design is modular and scalable, which allows to trade off among the number of channels per device, time resolution and resource requirements. It can be easily tailored to very different applications, from the low- channel count systems to the more demanding ones with multi-beamforming capabilities. Keywords: Beamforming, dynamic focusing, interpolation, real-time imaging, parallel beamforming. PACS: 43.58-e, 43.35.Yb, 43.60-Lq INTRODUCTION The conventional delay-and-sum beamformer with dynamic focus introduces time-varying delays to the signals received by every array element to compensate the time-of-flight differences from every focus, usually located along a steering direction. This operation produces the aperture data, which are added together (coherent sum) to provide an output sample of the focused A-scan along the steering direction. High timing resolution is required to reduce the level of the delay quantization lobes [1]. To this purpose, several methods have been proposed, usually falling in one of the following categories: a) Dynamic interpolation techniques, where signals are up-sampled by a factor large enough to achieve the required time resolution and operate with a uniform sampling clock at the Nyquist rate. For this purpose, an interpolating filter for every channel is required before the coherent sum [2-3]. Since the operation is linear, it was suggested to use a single reconstruction filter at the beamformer output [4], but it fails for dynamic focusing because this function is not time-invariant. b) Oversampling ∆Σ techniques, which use low bit count sigma-delta A/Ds operated at a high sampling rate that provide the required time resolution and avoids interpolating hardware [5-6]. However, the availability of multi-channel, high-performance front-end analog processors in a single chip [7] makes this approach less attractive, since they use a similar number of interfacing lines, are less sensitive to digital noise, provide high resolution (up to 14 bits) and can operate with higher frequency transducers. c) Selective sampling techniques, which acquire the signals at the instant they arrive to the array element. These use a separate high resolution clock generator for every channel [8-10] and do not require any interpolation process. However, to take advantage of the new analog front-end processors that share the same clock for several channels, these techniques needs to be combined with an interpolation scheme [11]. So that, with the exception of ∆Σ beamformers, some interpolation process must be carried out. Many approaches are based on fractional delay filters, which allow splitting the sampling period in an integer number of sub-sampling intervals [12]. Here, a bank of L fractional delay filters allow to chose the equivalent sampling instant at 1/L, 2/L, ..., 1 the sampling period. L is closely related to the delay resolution and usually ranges from 4 to 16. On the other hand, every filter in the bank requires a number F of coefficients large enough to achieve the required accuracy (F~L). A common variant is to switch the coefficient set to that required for a given fractional delay.
- The interpolation and coherent sum processes require a large silicon area. For example, a parallel implementation demands L·F multiply-and-add operations per channel. Thus, only a moderate number of channels can be integrated into a single FPGA [13] and the power consumption is relatively high. The alternative is to switch to an ASIC design [14], but this precludes ulterior hardware updates and greatly reduces system flexibility. However, state-of-the art FPGAs include DSP cells intended to implement FIR filters. The DSP cells operate at high speed (above 350 MHz) and include, basically, a multiplier and adder with several possible configurations. The manufacturing process is similar to that used for ASICs, so their power consumption is rather low. This work addresses the design of a beamformer based on the DSP cells currently available in FPGAs. This yields the flexibility of upgrading and low cost of FPGAs together with the low power consumption of ASICs. Furthermore, the design is modular and scalable. The proposed approach allows implementing a complete 128- channel beamformer in a single, moderately sized FPGA. A SYSTOLIC BEAMFORMING ARCHITECTURE The beamforming process for an N-element array performs the following operation: N S (t ) = ∑ wi ri [t −τ i (t )] (1) i =1 where ri(t) is the signal acquired by channel i, wi is a weighting function (apodization) and τi(t) is the dynamically changing focusing delay for channel i. In the discrete time domain, the variable t is substituted by the index k to samples with t=kTS, where TS is the sampling period: N S (k ) = ∑ wi ri [k − Ti (k )] (2) i =1 The value Ti(k) represents the delay, in sampling periods, that must be applied to obtain the k-th output sample S(k) from the sequence of input samples {ri}. Unless Ti(k) is an integer, the value of ri[k- Ti(k)] is not available from the set {ri}, so that it must be estimated by an interpolation process. The interpolation can be performed for band- limited signals by means some kind of FIR or IIR filter or other functions (splines, polynomials, etc.) [12]. To obtain a resolution L times higher than the sampling period, the delay Ti(k) is rounded to the nearest multiple of 1/L, which yields an integer part Mi(k) and a fractional part mi(k)/L. The set of L possible fractional delays is {1/L, 2/L, ...(L-1)/L, 1}. The interpolated sample for a given delay Ti(k) = Mi(k)+mi(k) /L can be approximated by using a (P-1) order FIR filter (P coefficients) as: mi (k ) P ri [k − Ti (k )] = ri [k − M i ( k ) − L ]≈ ∑ hm i ,k ( j )ri [ k − M i (k ) − j ] (3) j =1 The filter is applied to the P samples in the set {ri} ahead the nik = k-Mi(k) sample, which changes dynamically and is handled by the focusing logic. Then, P ri [k − Ti (k )] ≈ ∑ hmi ,k ( j )ri (nik − j ) (4) j =1 Substitution in (2) yields, N P N P S (k ) = ∑ wi ∑ hm i ,k ( j )ri (nik − j ) = ∑∑ hm i ,k ( j ) wi ri ( nik − j ) (5) i =1 j =1 i =1 j =1 Changing the addition order and including the weight wi in the coefficients for channel i, P N S (k ) = ∑∑ hm i ,k ( j )ri (nik − j ) (6) j =1 i =1 Note that the coherent sum can be performed simultaneously with the filtering function in different channels. In a single cycle, a partial result for sample S(k) is obtained as the sum of partial results of the interpolation. After P cycles, the full result for S(k) is obtained. This mechanism allows the construction of an efficient systolic architecture for beamforming, based on the DSP cells included into state-of-the-art FPGAs. Figure 1 shows schematically the processing element for a single channel based on the DSP-48 cells of Xilinx FPGAs [15]. A small FIFO (1Kx18) adjusts the differences on the time of arrival of echoes to the array elements. Writing samples to the FIFO is carried out at the sampling rate clock CKS and is enabled once the integer delay Mi has elapsed. A/D converters of up to 18 bits can be used.
- Si(k) To channel i+1 z-1 48 Mi TC WE 18 Q D si(k) FRAC-FIR Q CH(i) D FIFO RD(i) 18 RD F 48 CKS E(i) 48 EMPTY F b Qi 16 CE F K-FOC ADR MEM-FOC F(i) CKS RD(i-1)·z-1 Si-1(k) FIGURE 1. Processing element for a systolic beamformer The focusing logic must provide the integer part nik as well as the fractional delay for every focus k. Here it is based on the Progressive Focusing Correction (PFC) technique described in [10], where a single bit Qi(k) codes the focusing delay: when Qi(k)=0, the fractional delay remains unchanged; when Qi(k) =1, the next fractional delay is advanced by a quantum 1/L. From some minimum range that sets the delay ni1, the successive delays are defined incrementally as Tk = Tk-1+ (1-Qk)TS, which provides implicitly the nik value. The counter F of b=log2L bits selects the set of filter coefficients for the current fractional delay in the FRAC- FIR filter. Data read from the FIFO pass through this filter, producing the aperture data si(k) that is added to the partial result from the precedent processing element. Note that F is incremented every time Qi = 1 to select the next fractional delay. This way, a round-robin sequence of fractional delays is produced. It is worth to note that, when the fractional delay is ‘1’, the sample at the FIFO passes to the filter output without interpolation. If, in this situation, a focusing code Qi=1 is produced, the next delay will be L-1/L, which is obtained by interpolation of the preceding P samples in the same cycle. In this case, the FIFO read is disabled since the required data is already available in the FIR filter. This exception is easily implemented (not shown). The adders are 48-bits wide, which avoids saturation effects (up to thousand of channels). On the other hand, new data samples are read on FIFO(i) when a read is performed in FIFO(i-1) after a pipeline delay and E(i)=0 (not empty). If the first FIFO read is enabled once the maximum initial delay max(Mi) has elapsed, there is no need to care about the FIFO empty flags, since all of them will have stored one or more samples. An important aspect is the system modularity. An N-channel system is implemented by simply pipelining N processing elements. No logic dependant on system size is required. Therefore, this approach allows using a larger FPGA to integrate more channels or to break the design in several devices for lower cost or other reasons. Figure 2 shows the overall beamformer architecture. In this case it uses a single DSP48 cell per channel, but the actual implementation code allows using D cells per channel. Note that another cell is used as the final accumulator. DSP48-A Accumulator RS Sk Ak FIFO (N) A DSP48-A Q rk rk-1 rk-2 rk-3 RD RDN INH VSN EMPTY EN h0 h1 h2 h3 FIFO (1) A DSP48-A Q rk rk-1 rk-2 rk-3 RD RD1 VS1 INH E1 B EMPTY h0 h1 h2 h3 ‘0’ PRO FIGURE 2. A modular, systolic beamforming architecture
- In the example shown, for a given set of coefficients selected by the focusing logic ({h0, ..., h3}), partial results for sample k are obtained in P steps (P=4 in this case) as: N Step j: cell i output= hi,j·ri, k-j, 1 ≤ i ≤ N → In the accumulator: Ak ( j ) = Ak ( j − 1) + ∑ hi, j ri,k − j (7) i =1 Following the P-th step the value Ak is transferred to register RS that stores the final result Sk. Thus, P processing cycles are required for every output sample. DSP cells operate with clocks above 320 MHz, while ultrasound signals are acquired at a much lower rate, typically below 40 MHz. This 8:1 ratio R on speed can be used to get the output samples at the same rate they are acquired using FIR filters with up to 8 coefficients (order 7 or below). Usually this is enough to yield accurate results using, for instance, Lagrange interpolators with L≤8 [12]. If higher filter orders were required, a number D>1 cells DSP-48 per channel should be used, but this is usually not required since the delay resolution obtained with these figures will be better than 1/32 the signal fundamental period. RESULTS Current VHDL description allows setting, for every processing element, the number D of DSP-48 cells, the number P of filter coefficients and the speed ratio R, where D=P/R. This way, the number of DSP-48 cells required for an N-element implementation, taking into account the final accumulator, is: P NC = N + 1 (8) R Setting N=128 elements, P=8 coefficients and R=8, yields a total NC=129 DSP-48 cells. Other required resources are BRAMs for FIFOs (128) and distributed memory for coefficient and data storage. All this logic fits in a low-cost, moderately sized FPGA of the Xilinx Spartan-6 family XC6SLX75, which has 172 BRAMs and 132 cells DSP-48, leaving room for other processing functions. CONCLUSION A systolic beamformer has been described. It efficiently uses the new hardware Digital Signal Processing (DSP) cells available in state-of-the-art FPGAs, which basically provide a multiplier and adder per cell and operate at high speed. The beamformer interleaves the coherent summation with the sample interpolation process required to achieve a high focusing delay resolution. This allows reducing the resource requirements; in particular, only one accumulator is required for all the process. The high-speed of DSP cells are advantageously used to perform several processing cycles in a single sampling period for real-time operation. Furthermore, every cell is time-multiplexed to perform the fractional filtering function, thus reducing the number of required cells. ACKNOWLEDGMENTS Work supported by project DPI2010-17648 of the Spanish Ministry for Science and Innovation. REFERENCES 1. D. K. Peterson and G. S. Kino, IEEE Trans. on Sonics and Ultrasonics, 31, 4, 337-351, (1984). 2. J.-Y. Lee, H. S. Kim, J. H. Song, J. Cho, T. K. Song, Proc. IEEE Ultrason. Symp., 2, 1396-1399, (2005). 3. C. H. Hu, X. C. Xu, J. M. Cannata, J. T. Yen, K. K. Shung, , IEEE Trans. UFFC, 53, 2, pp. 317-323, (2006). 4. R. A. Mucci, IEEE Trans. Acoust., Speech and Signal Proc., 32, 3, 548-558, (1984). 5. S. R. Freeman et al., IEEE Trans. UFFC, 46, 2, 320-332, (1999). 6. M. Kozak and M. Karaman, IEEE Trans. UFFC, 48, 4, 922–930, (2001). 7. B. Reeder, Analog Dialogue, 41-07, 2007 8. M. O’Donnell and M. G. Magrane, US. Patent No. 4.809.184, (28 Feb. 1989). 9. T. K. Song and J. F. Greenleaf, IEEE Trans. UFFC, 41, 3, 326-332, (1994). 10. C. Fritsch, M. Parrilla, A. Ibáñez, R. Giacchetta and O. Martínez, IEEE Trans. UFFC, 53, 10, 1820-1831, (2006) 11. J. Camacho, A. Ibáñez, M. Parrilla and C. Fritsch, Proc. IEEE Ultrason. Symp, 1631-1634, (2006). 12. T. I. Laakso, V. Valimaki, M. Karjalainen and U.K.Laine, IEEE Sig. Proc. Magazine, 30-58, Jan. 1996. 13. J. Camacho, M. Parrilla and C. Fritsch, ICU’2007, #1229, Vienna, April 9 - 13, (2007). 14. M. Karaman, A. Atalar and H. Köymen, IEEE Trans. Medical Imaging, 12, 4, 711-720, (1993). 15. Xilinx Inc., “XtremeDSP for Virtex-4 FPGAs”, UG 073, 2008.