Comparative Analysis of FFT Algorithms In Sequential and Parallel Form

Michael Balducci, Ajitha Choudary, and Jon Hamaker
ABSTRACT

…algorithm, Radix-2 algorithm, Radix-4 algorithm, and Split-Radix algorithm). All algorithms will be compared in sequential and parallel form based on computational time and complexity. This task will be accomplished by 1) implementing the FFT algorithms in sequential form, 2) implementing the FFT algorithms in parallel form, 3) running a battery of performance tests on each of the implementations, and 4) collecting statistical data by which we may compare the algorithms in each of their forms. Using the data collected, we will determine which algorithm is best suited for a given implementation and data set. The information we obtain will have future implications in the areas of image processing, speech processing, radar applications, and nearly every facet of signal processing.
INTRODUCTION

Background
The FFT is one of the most important and widely used Digital Signal Processing (DSP) algorithms. FFT algorithms are efficient methods of calculating the DFT. The DFT converts the input signal from the discrete time domain, x(n), to the discrete frequency domain, X(w), and vice versa. This is very useful in eliminating unwanted noise from a communication signal (the information-bearing signal) being analyzed: once the signal is converted into the frequency domain, the noise or unwanted signal frequencies can effectively be filtered. Then, by using the inverse FFT, the communication signal can be converted back to the time domain. The FFT has wide-ranging applications in nearly every signal processing field, including speech, image processing, communications, cellular phones, modems, and digital control systems [7].
For most applications, computation time plays a significant role in the use of FFTs. Less time spent computing means less utilization of resources; hence, more data can be processed in the same time. The computation time can be reduced by exploiting the symmetry, periodicity, and other properties of the DFT. It can also be reduced through parallelism: by using the FFT algorithms in parallel, the data set can be separated into smaller blocks, and the different blocks of data can then be processed at the same time on a single processor or by spreading the data across multiple processors. There is, of course, a trade-off with this method: the overhead of communication between processes. To make efficient use of the parallel method, the communication time must be minimized [7].
Theory of FFTs
The Fourier transform H(w) of a signal h(t) is given by the equation of Figure (1a), where h(t) is the time-domain signal, t is the time, and w is the angular frequency. Using the Fourier Transform,
the time-domain signal is converted into the frequency domain. Likewise, the time-domain signal is resolved into exponential components denoting amplitude as the frequency varies. The inverse Fourier transform, which returns the frequency-domain signal back to the time domain, is given by Figure (1b).
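The equation figure itself did not survive extraction; the transform pair referred to as Figures (1a) and (1b) takes the standard form:

```latex
H(\omega) = \int_{-\infty}^{\infty} h(t)\, e^{-j\omega t}\, dt
\qquad \text{(1a)}

h(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} H(\omega)\, e^{j\omega t}\, d\omega
\qquad \text{(1b)}
```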
The Discrete Fourier Transform (DFT) is the digital equivalent of the Fourier Transform. It operates on a bounded-length sequence, which is more practical than the infinite integral of the Fourier Transform because the computations involved are significantly fewer and more manageable. For a periodic signal, the period of the signal is equal to the length of the sequence. The discrete Fourier transform is defined as shown in Figure (2a), and its inverse is shown in Figure (2b) [7].
Computation cost is a factor which greatly affects which DSP tool an engineer uses. In computing the DFT directly, there are 4N^2 multiplications and N(4N-1) additions; therefore the computation time is of order N^2. For large N, the computational costs are extremely high. Fast Fourier Transforms (FFTs) have been developed to reduce these computational costs [7].
Fast Fourier Transforms
FFTs use several different properties of DFTs to minimize the computations. The FFT increases the efficiency of the DFT by splitting the data set into smaller DFTs, exploiting properties of the DFT such as periodicity and the symmetry of the cosine and sine terms. For this project we will be using the following seven algorithms: 1) the Prime Factor algorithm (PFA), 2) the Quick Fourier Transform algorithm (QFT), 3) the Fast Hartley Transform (FHT), 4) the Decimation-In-Time-Frequency FFT algorithm (DITF), 5) the Radix-2 algorithm, 6) the Radix-4 algorithm, and 7) the Split-Radix algorithm.
The Prime Factor Algorithm is used when N can be factored into mutually prime factors, i.e., when N can be represented in the form 2^p * 3^q * 5^r * 7^s * 11^t * 13^u, etc., where each base is a prime number. If the data size is of that type, the Prime Factor Algorithm is considerably faster than the others. Modules are written for lengths 2, 3, 4, 5, 7, etc., and the data is transformed using these in nested loops. The computational complexity of the PFA is of order O(N log2 N) [8].
The Quick Fourier Transform uses the symmetry of the cosine and sine terms in the DFT to decrease the complexity of the calculation. The data set is first broken into real cosine terms and imaginary sine terms, and those into their even and odd parts. The length-(N+1) Discrete Cosine Transform (DCT) terms and length-(N-1) Discrete Sine Transform (DST) terms are recursively broken into (N/2+1) DCTs and (N/2-1) DSTs, respectively. In this manner, the number of additions and multiplications is dramatically decreased. Its developers claim that it is faster by a factor of two over direct methods, providing an order of complexity of O(N log2 N), and that it is the best algorithm for prime-length data sets. The number of multiplications involved in computing the QFT is N^2, and the number of additions is N^2 + 4N [4].
The Hartley Transform is very similar to the FFT. The major difference is that the FHT reduces the computational cost of the FFT's kernel by eliminating the complex arithmetic: it performs real arithmetic instead of the complex calculations performed by the FFT. For a given point, the FFT must perform four multiplications and two additions; for the same point, the FHT performs two additions and only one multiplication. The one drawback of the FHT is that additional computations are required to convert its results to equivalent FFT results. The FHT has a complexity of O(N log2 N) [6].
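For reference (this is the standard definition, not taken from the proposal), the discrete Hartley transform underlying the FHT replaces the complex exponential kernel with the real-valued cas function:

```latex
H(k) = \sum_{n=0}^{N-1} x(n)\,
\operatorname{cas}\!\left(\frac{2\pi k n}{N}\right),
\qquad \operatorname{cas}\theta = \cos\theta + \sin\theta
```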
The Decimation-In-Time-Frequency FFT Algorithm is a combination of the decimation-in-time (DIT) and decimation-in-frequency (DIF) algorithms. The DITF algorithm begins the breakdown of the DFT using the DIT and switches at some intermediate point to the DIF. This is effective because the bulk of the computation time for the DIT is at the end of the algorithm, while the bulk for the DIF is at the beginning; thus, switching at some mid-point allows one to miss both of the bulk regions of computation time. This gives a great decrease in the number of multiplications [5].

The 1996 Mississippi State University Conference on Digital Signal Processing

What: EE 4773/6773 Project Presentations
Where: Simrall Auditorium, Mississippi State University
When: December 2, 1996, 1:00 to 4:00 PM

SUMMARY

The Department of Electrical and Computer Engineering invites you to attend a miniconference on Digital Signal Processing, being given by students in EE 6773: Introduction to Digital Signal Processing. Papers will be presented on:

- parallel implementations of fast Fourier transforms;
- real-time audible frequency detection and classification;
- analysis of forestry images for scenic content.

Students will present their semester-long projects at this conference. Each group will give a 12-minute presentation, followed by 18 minutes of discussion. After the talks, each group will be available for a live-input, real-time demonstration of their project. These projects account for 50% of their course grade, so critical evaluations of the projects are welcome.
Radix-2 and Radix-4 Algorithms
The Radix-2 and Radix-4 algorithms can be implemented either using decimation in time (Cooley and Tukey) or decimation in frequency (Sande and Tukey). These algorithms use data whose length is of the form 2^N; if the data is not of that form, zero padding is done. Using this fact, the summation is split into even and odd components and the computations are then performed. By using these algorithms, the number of computations is greatly reduced compared to the direct Discrete Fourier Transform. The Radix-2 algorithm needs (N/2) log2 N complex multiplications and N log2 N additions for both decimation in time and decimation in frequency. The Radix-4 differs from Radix-2 in the way the samples are split: in the case of Radix-4, they are split into four. The Radix-4 algorithm requires (3N/8) log2 N complex multiplications and (3N/2) log2 N additions for both decimation in time and decimation in frequency. There is an increase in the number of additions, but a decrease in the number of multiplications, compared to Radix-2 [7].
Split-Radix Algorithm
The split-radix algorithm makes use of both radix-2 and radix-4 in one algorithm. Like the radix algorithms, it operates on data of length 2^N, with zero padding used for data not in the required form. It uses radix-2 for the even-indexed samples of the DFT and radix-4 for the odd-indexed samples, since the odd samples require multiplication by the twiddle factor; radix-4 is used for the decomposition of the odd samples to improve efficiency. The split-radix algorithm is of the same order as radix-4, but it requires computations less than or equal to those of radix-4, depending on the samples [7].
Parallel Systems
Concurrent or parallel processing is a process in which a large task is divided into smaller tasks that are distributed over the available processors, or a large data set is divided into smaller data sets that are then processed on various processors. To distribute data or to gather results during or after computation, processors need to communicate. The communication between the processors can be provided by the Message Passing Interface (MPI) [1]. MPI, as the acronym suggests, is a protocol with library calls that passes data and instructions to the processors. The user has to specify the MPI calls to be utilized, with all their parameters. The advantage of using MPI over other message passing systems is that it has the majority of the salient features of other message passing systems, with a few exceptional features of its own. These features include support for libraries and inter-group communication [1][2].
The main issue that has to be considered in writing code using MPI is the communication cost involved in message passing. Even though FFT algorithms exhibit a high degree of parallelism, they require a lot of communication due to the high amount of data dependency. The communication overhead must be kept in view when writing the applications and must be minimized or overlapped with computation to receive the benefits that parallel processing offers. Stability should also be considered when writing the applications: when the communication costs exceed the computation costs, the parallel implementation is no longer a viable alternative to sequential code [1][2].

A proposal for

Comparative Analysis of FFT Algorithms In Sequential and Parallel Form

submitted to fulfill the semester project requirement for

EE 4773/6773: Digital Signal Processing

September 21, 1996

submitted to:

Dr. Joseph Picone
Department of Electrical and Computer Engineering
413 Simrall, Hardy Rd.
Mississippi State University
Box 9571
MS State, MS 39762

submitted by:

Michael Balducci, Ajitha Choudary, Jonathan Hamaker
Parallel DSP Group
NSF Engineering Research Center
The Institute for Signal and Information Processing
Mississippi State University
Box 6176
Mississippi State, Mississippi 39762
Tel: 601-325-2081
Fax: 601-325-7692
email: {balducci, ajitha, hamaker}@erc.msstate.edu
PROJECT SUMMARY
The purpose of this project is to make a performance comparison between FFT algorithms implemented in sequence and those implemented in parallel. This project is being conducted in conjunction with the Parallel DSP Project at Mississippi State University; the two sides of the project are being coordinated by Aravind Ganapathiraju and Dr. Joseph Picone. The main tasks of this project are set forth below:
- Implement the QFT, FHT, PFA, DITF, Radix-2, Radix-4, and Split-Radix FFT algorithms in a sequential form. The code for these algorithms will be written in C++ and compiled under Solaris 2.5.1 using the GNU compiler. A wrapper will be developed as a common interface to each of the algorithms, capable of handling both real and complex single-dimensional data. The majority of each sequential program will be based on public-domain code; each public-domain program must be tested and modified before it can be put in parallel form.
- Implement the same algorithms in a parallel form. The high degree of parallelism in FFTs will be used to develop parallel forms of the sequential code. The parallel code will split the tasks into sub-tasks which will be executed on separate processors. The inter-process communication will be accomplished using MPI (the Message Passing Interface). It is important to ensure that only the structure of the algorithm is changed, and not the algorithm itself.
- Test the sequential and parallel code. In order to test the code objectively, each algorithm will be tested (in sequential and parallel form) with the same data sets. The tests will each be run on equivalent, unloaded Sparc stations. Each algorithm implementation will be analyzed according to how fast it completes each data set and the number of arithmetic operations. Testing will be done on the following architectures: the SuperMSPARC, a 2-CPU Sparc 20, a dual-CPU Intel Pentium Pro, and a multiple-CPU UltraSparc I.
- Analyze the test data. The test data will be compiled to determine the fastest algorithm and the least complex algorithm among the sequential and parallel implementations.
EVALUATION
In order to evaluate anything, a basis on which to compare must be chosen. We have elected to contrast the performance of the various algorithms based on their execution time, complexity, and memory usage.
- Execution time is a key factor in determining the efficiency of an algorithm. The execution time, however, is greatly affected by several factors, chief among them the characteristics of the data (real/complex), the machine on which the code is executed, and the number of parallel processors operating on the data. We will account for these factors by using the same data sets for each algorithm implementation and by running a large number of repetitions for each implementation and data set. Execution time will be measured using the Unix utility gprof on systems where it is available. For tests run on the SuperMSPARC, the SuperMSPARC Performance Monitoring tools at the MSU Engineering Research Center will be used; for all other systems, the Unix utility utime() will be used.
- Complexity is another key factor in choosing an algorithm. Two key indices of algorithm complexity are the number of multiplications and the number of additions. We plan to measure these using functions which take the place of the arithmetic operators: these functions carry out the operations and increment a counter.
- Memory usage is a factor which is hard to quantify but crucial to the performance of an algorithm. For instance, with infinite memory one could simply create a lookup table of all possible output values; since memory is not infinite, it must be considered in algorithm coding and development. We plan to quantify memory by overloading the C++ new() operator. Much like the quantification of arithmetic operations, we will implement a version of the new() operator which increments a count each time it is called.
These evaluations will be made on each implementation of each algorithm and will be analyzed according to the following:
- Sequential vs. parallel: Each algorithm will first be tested to ensure that the data from the parallel implementation is equivalent to the data from the sequential implementation. Next, we will do simple statistical analysis to determine which implementation is faster for each algorithm. We will determine the speedup using the average results of testing. We expect the parallel implementation to be faster, but the communication overhead may prove otherwise.
- Sequential vs. sequential and parallel vs. parallel: Next, we will compare the algorithms against each other. We expect that the order of algorithm speed in sequential form will not be the same as in parallel form.
One of the systems to be used for testing is the SuperMSPARC, a 32-processor multicomputer specially equipped to allow for the monitoring of parallel code execution. The SuperMSPARC's performance monitoring tools allow a parallel application developer to observe performance-related events graphically, such as interprocess communication and execution time, and to link these events to particular locations in the user's code. This better enables the programmer to identify bottlenecks and to examine the communication overhead inherent to parallel processing.
The SuperMSPARC uses a combination of custom-designed hardware and software to collect event data. In order to collect data, the program must first be linked with the probe library at compile time. This allows event data to be transmitted to the Probe Acquisition Board (PAB). The formatting and transmitting of event data are the sole source of intrusiveness. The PAB is a small single-width card that sits on the Sun's SBus; its job is to collect, time-stamp (at a resolution of 100 ns), and transmit the probes to a central recording site. The probe data can then be analyzed by other software tools developed at the ERC.
The data sets will each be comprised of varying data types. Both single-dimensional and two-dimensional data will be tested. Each set will use four-byte float values of either real or complex data, and each data set will vary in length and type. The data type variations will aid in pointing out algorithms that may be more efficient for only certain data sets; by varying the size of the data set, the limitations of the parallel implementations should also be shown.
PROJECT DEMONSTRATION
Real-time analysis of this project's concepts will be made available through a graphical user interface (GUI). The GUI will be largely user-configurable: the user will be able to decide which algorithms are compared, which architecture each is run on, the number of CPUs each will use, and the type of data on which each will operate. The results of the real-time experiment will be available to the user in a series of frames. One frame will contain a real-time communication simulation representing communication between parallel processes and processor utilization. Another frame will give a real-time graphical representation of the number of arithmetic operations and memory accesses that each algorithm performs. Yet another frame will contain a running timer for each algorithm chosen. In addition to giving the user the choice of algorithms, there will be an auto function which will determine the optimum algorithm for a given data set (chosen by the user). This GUI will be extremely useful not only as a visualization tool for novice users, but also as an analytical tool for our group as we attempt to optimize the various algorithms.
oHH(pHHl*ame~ML-7Q`=~ML-ulaMQ-7R=wn MQ-s ad)zioeaHH)eofHHin!!and memoat!@Ye!MSU Argonne Joint Documentation.
$ a [Message Passing Interface Forum. MPI: A Message-Passing Interface Standard. Technical 2riVReport Computer Science Department Technical Report CS-94-230, University of @@l &Tennessee, Knoxville, TN, May 5 1994.
\ tr^Press, William H., Brian P. Flannery, Saul A. Teukolsky, William T. Vetterling. Numerical jtoZRecipes in C, The Art of Scientific Computing. Cambridge University Press, Cambridge, x@Mass., 1988. pp 407-447.
Guo, H., G. A. Sitton, and C. S. Burrus. The Quick Discrete Fourier Transform. ICASSP-94, Digital Signal Processing, vol. III, Institute of Electrical and Electronics Engineers, pp. 445-447, 1994.
Saidi, Ali. Decimation-In-Time-Frequency FFT Algorithm. ICASSP-94, Digital Signal Processing, vol. III, Institute of Electrical and Electronics Engineers, pp. 453-456, 1994.
Implementing 2-D and 3-D Discrete Hartley Transforms on a Massively Parallel SIMD Mesh Computer. Technical Report CS-TR-95-01, University of Central Florida, Orlando, FL.
Proakis, John G., and Dimitris G. Manolakis. Digital Signal Processing: Principles, Algorithms, and Applications, Third Edition. Prentice Hall, Upper Saddle River, New Jersey, 1996, pp. 230-256, 394-494.
Temperton, Clive. Nesting Strategies for Prime Factor FFT Algorithms. Journal of Computational Physics, vol. 82, no. 2, June 1989.
Kolba, D. P., and T. W. Parks. A Prime Factor FFT Algorithm Using High-Speed Convolution.

Box 9571
MS State, MS 39762

submitted by:

David Gray, Craig McKnight, Stephen Wood

Audible Frequency Detection and Classification Group
Department of Electrical and Computer Engineering
Mississippi State University
216 Simrall, Hardy Rd.
Mississippi State, Mississippi 39762
Tel: 601-325-2081
Fax: 601-325-7692
email: {gray,wcm1,srw1}@ece.msstate.edu
ABSTRACT
Current music software relies on external input from MIDI-capable devices. The purpose of this project is to develop a software package for music education utilizing an acoustical instrument interface, so that players of all instruments can begin to utilize the computing power of today's world. Musicians who play tones into a microphone will see those tones analyzed in the areas of relative and absolute pitch.
Tone recognition will be accomplished by transforming time-domain data into frequency-domain data and sifting out the overtones of instruments played into a microphone. The transformation will likely be accomplished using a Fast Fourier Transform (FFT), spectral estimation techniques, or Prony's Method. Once the note or notes are identified, the user will be given information about the frequency content of the tones played.
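A minimal sketch of the FFT route described above (assuming NumPy and A4 = 440 Hz equal temperament; the `identify_note` name is hypothetical): window the samples, take the magnitude spectrum, locate the strongest peak, and round it to the nearest note.

```python
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def identify_note(samples, rate):
    """Find the strongest spectral peak and map it to the nearest
    equal-tempered note name (A4 = 440 Hz)."""
    spectrum = np.abs(np.fft.rfft(samples * np.hanning(len(samples))))
    spectrum[0] = 0.0                                  # ignore the DC bin
    freq = np.argmax(spectrum) * rate / len(samples)   # peak bin -> Hz
    # Semitones relative to A4, rounded to the nearest note.
    semitones = round(12 * np.log2(freq / 440.0))
    return NOTE_NAMES[(semitones + 9) % 12]            # A sits at index 9

# Example: one second of a 440 Hz sine sampled at 8 kHz.
t = np.arange(8000) / 8000.0
print(identify_note(np.sin(2 * np.pi * 440.0 * t), 8000))  # → A
```

A real tuner would also need to pick the fundamental rather than a strong overtone; this sketch simply takes the largest peak.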
INTRODUCTION
There has been an interest in using computers to aid in the creation of music for more than thirty years. Bell Telephone Laboratories developed a program called Music 4 as early as the 1960s. Through various updates and revisions, this program eventually developed into what is now CSound. CSound allows the user to write programs that represent arrangements of music for different instruments that will be simulated by the computer. One of the newer features of the program allows the user to input information into the computer via a MIDI (Musical Instrument Digital Interface) equipped instrument [1].
One area where the use of computers in music holds great promise is in education. Listen is an education software package that teaches students relative pitch, harmony, intervals between notes, and chord structure in an interactive environment. For example, a student studying intervals would be given an interval to play by the computer. The student would then play the interval on the keyboard or other instrument, and the computer would tell the student whether the interval played was correct or whether one of the notes was too high or too low. This package relies on MIDI codes being sent to the computer from a MIDI-capable device [2].
MIDI codes are a form of communication protocol agreed upon by the music manufacturing industry. The codes are transmitted serially at a 31.25 kbit/s data rate and contain information about which key was played, when the event started, when the event stopped, what voice patch to use, and other aesthetics involved with the playing of the note [3].
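For illustration, a Note On event is carried as three serial bytes: a status byte (0x9n for channel n) followed by the key number and the velocity; at 31.25 kbit/s with ten bits per serial byte, such a message takes just under a millisecond to arrive. A hedged sketch of decoding one such message (the `parse_midi_event` helper is hypothetical; running status and longer messages are not handled):

```python
def parse_midi_event(msg):
    """Decode a single 3-byte MIDI channel-voice message.
    A minimal sketch: running status is not handled."""
    status, d1, d2 = msg
    kind, channel = status & 0xF0, status & 0x0F
    if kind == 0x90 and d2 > 0:
        return ("note_on", channel, d1, d2)   # key number, velocity
    if kind == 0x80 or (kind == 0x90 and d2 == 0):
        # Note On with velocity 0 conventionally means Note Off.
        return ("note_off", channel, d1, d2)
    return ("other", channel, d1, d2)

# 0x90 = Note On on channel 0; key 60 is middle C, velocity 100.
print(parse_midi_event(bytes([0x90, 60, 100])))  # → ('note_on', 0, 60, 100)
```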
In the case of a piano-style keyboard connected to a computer using MIDI, the information traveling to the computer is generated digitally, transmitted digitally, and manipulated on a digital computer. The signal is never analog during the entire signal chain from the instrument to the computer. This is an obstacle for musical instruments in general, since most are inherently analog. MIDI controllers for instruments such as the guitar have been developed, and in some cases the entire instruments themselves have been designed as controllers, but an interface to accommodate all instruments without changing hardware is not in wide use.
l:7`=l:ll7a=lLllc7b=c-ccd,7XHH,td HH=@ɀ?display what notes are present, and the interval between them.
Of course, if there are no musicians present at the demonstration, the software can also be tested in each of its modes using the archive that will have been built.
Figure 1. Simple layout of the finished product, including the main modules of the software and the functions they will perform.
EVALUATION
The methods of testing the tuning capability and the interval detection capability of the project will be similar to one another. In both cases, evaluation will proceed in stages. For the tuner, the first stage will be to input a file containing a digitized sinusoid consisting of only one frequency. The frequency will be a known value, but it may be out of tune.
The second stage will be to test the tuner with data files that contain signals with multiple frequencies. These input signals will be data recorded from various instrument sources of interest. When the tuner can accurately classify these multiple-frequency signals, it will be tested using real-time input from instruments representing various instrument families. A commercial tuner will be used to verify the results of the software package in real time.
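The first-stage check, reporting how far a detected frequency lies from the nearest equal-tempered pitch, reduces to the standard cents measure (100 cents per semitone, 1200 per octave). A small sketch (the `cents_off` name is illustrative):

```python
import math

def cents_off(freq, ref=440.0):
    """Deviation of `freq` from the nearest equal-tempered pitch, in
    cents (100 cents = one semitone, A4 = `ref`). Positive means sharp."""
    semitones = 12 * math.log2(freq / ref)
    return 100.0 * (semitones - round(semitones))

# A 445 Hz tone is about 19.6 cents sharp of A4.
print(round(cents_off(445.0), 1))  # → 19.6
```

Comparing this value against the commercial tuner's readout gives a direct pass/fail criterion for the first evaluation stage.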
The interval detection and display function of the project will be tested in a similar manner. First, input files containing intervals of pure sinusoids will be evaluated. Then data files containing signals recorded from various musical instruments will be evaluated. Finally, real-time interval detection and display will be evaluated.
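The pure-sinusoid interval tests could be verified against a helper like the following (a hedged sketch; `interval_name` is a hypothetical name), which rounds the frequency ratio of the two fundamentals to the nearest equal-tempered semitone count:

```python
import math

# Interval names for 0..12 semitones (equal temperament).
INTERVAL_NAMES = ["unison", "minor 2nd", "major 2nd", "minor 3rd",
                  "major 3rd", "perfect 4th", "tritone", "perfect 5th",
                  "minor 6th", "major 6th", "minor 7th", "major 7th",
                  "octave"]

def interval_name(f1, f2):
    """Classify the interval between two fundamentals, assuming it
    spans at most one octave; the order of f1 and f2 does not matter."""
    semitones = round(abs(12 * math.log2(f2 / f1)))
    return INTERVAL_NAMES[min(semitones, 12)]

# A 3:2 frequency ratio rounds to seven semitones.
print(interval_name(440.0, 660.0))  # → perfect 5th
```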
SCHEDULE
Figure 2 indicates the schedule that we plan to observe in order to complete the project by the designated date.
[Figure 2. Project schedule.]
[Figure residue: flowchart with blocks "Input image in PCD format", "PCD to PPM conversion", "Histogram analysis", "Edge detection", "Fourier transform", "Interpretation of image", "Decision box".]

PROJECT SUMMARY
Archive Development
An archive of time-domain data taken from various sources of interest will be key to developing a successful frequency detection and classification software package. An archive of sampled data files will be developed and used in this project for testing and evaluation procedures. It will be a collection of sampled data files from sources which will include a digital function generator, acoustic guitar, electric bass guitar, and trumpet. Each file will focus on a particular aspect of the project. The files will range from containing single-frequency data without instrument overtones to multiple-frequency data with instrument overtones.
File names for the database will be standardized to the following format: