Texas Instruments

Digital Signal Processing Solutions - TMS320C67x

TMS320C67x Single Precision Floating Point Assembly Benchmarks 

 
 

FILTERS

Benchmark Description Formula
Block FIR
  Description: The FIR assumes that the number of filter coefficients (numH) is a multiple of 2 and greater than or equal to 4, and that the number of outputs (numY) is a multiple of 4 and greater than or equal to 4. The input, output, and coefficient arrays must start on the same double-word boundary to avoid memory bank hits.
  Formula: ((2*numH)+10)*(numY/4)+8 cycles
  Example: For numH=64 and numY=64, 2216 cycles or 13.296 µsec
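The Block FIR formula and its example figure can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the 6 ns cycle time is an assumption inferred from the figures on this page (13.296 µsec / 2216 cycles), not a stated device specification.

```python
CYCLE_NS = 6.0  # assumption: inferred from 13.296 µsec / 2216 cycles


def block_fir_cycles(num_h: int, num_y: int) -> int:
    """Evaluate the Block FIR cycle formula after checking the stated constraints."""
    if num_h < 4 or num_h % 2 != 0:
        raise ValueError("numH must be a multiple of 2 and >= 4")
    if num_y < 4 or num_y % 4 != 0:
        raise ValueError("numY must be a multiple of 4 and >= 4")
    return (2 * num_h + 10) * (num_y // 4) + 8


cycles = block_fir_cycles(64, 64)
micros = cycles * CYCLE_NS / 1000.0  # nanoseconds to microseconds
print(cycles, micros)
```

The same conversion factor reproduces the other timings on this page, for example 61 cycles giving 366 nsec.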
Block IIR
  Description: The IIR assumes that the order is a multiple of 2 and greater than or equal to 4, and that the number of outputs (numY) is a multiple of 2 and greater than or equal to order+2. To avoid bank hits, the input and output arrays must be aligned on opposite double-word boundaries, and the a and b coefficient arrays must be aligned on opposite double-word boundaries.
  Formula: (order+10)*(numY-order)+15 cycles
  Example: For order=16 and numY=64, 1263 cycles or 7.578 µsec
Cascaded IIR Biquads
  Description: The Biquad assumes that the number of biquads (numB) is a multiple of 2 and greater than or equal to 2. It processes one input and produces one output. There are no memory bank hits regardless of where the arguments are placed in memory.
  Formula: 4*(numB)+29 cycles
  Example: For numB=8, 61 cycles or 366 nsec
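To make "processes one input and produces one output" concrete, the sketch below pushes a single sample through a chain of biquad sections. It is a minimal illustration, not the benchmarked assembly: the transposed direct form II structure, the coefficient names, and all function names are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Biquad:
    """One second-order section (assumed transposed direct form II)."""
    b0: float
    b1: float
    b2: float
    a1: float
    a2: float
    s1: float = 0.0  # first delay state
    s2: float = 0.0  # second delay state

    def step(self, x: float) -> float:
        y = self.b0 * x + self.s1
        self.s1 = self.b1 * x - self.a1 * y + self.s2
        self.s2 = self.b2 * x - self.a2 * y
        return y


def cascade_step(sections, x: float) -> float:
    """Feed one input sample through every section in order; return one output."""
    for section in sections:
        x = section.step(x)
    return x


def biquad_cycles(num_b: int) -> int:
    """Evaluate the cycle formula 4*numB + 29 under the stated constraints."""
    if num_b < 2 or num_b % 2 != 0:
        raise ValueError("numB must be a multiple of 2 and >= 2")
    return 4 * num_b + 29
```

Calling `cascade_step` once per sample with eight sections corresponds to the numB=8 case, for which `biquad_cycles(8)` gives the 61 cycles quoted above.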
Convolution
  Description: The convolution assumes that the output array length (nr) is a multiple of 4 and greater than or equal to 4, and that the second input array length (nb) is a multiple of 2 and greater than or equal to 4. The first input array length should be (nr+nb-1), where the first nb-1 and last nb-1 values are zero. If all three arrays are aligned on the same double-word boundary and nb is not a multiple of 4, there will be no memory bank hits; if nb is a multiple of 4, there will be nr/4 bank hits.
  Formula: (nb/2)*nr+(nr/2)*5+8 cycles
  Example: For nb=8 and nr=20, 138 cycles or 828 nsec
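The padding rule implies that the first input holds nr-nb+1 data samples with nb-1 zeros on each side, so a full convolution yields exactly nr outputs. The sketch below illustrates that layout in plain Python; the function names are hypothetical and the code is a reference model under that reading, not the benchmarked routine.

```python
def pad_for_convolution(data, nb: int):
    """Surround the data with nb-1 zeros on each side, as the description requires."""
    zeros = [0.0] * (nb - 1)
    return zeros + list(data) + zeros


def convolve_padded(x, h, nr: int):
    """Full convolution over a pre-padded input x of length nr + nb - 1."""
    nb = len(h)
    if len(x) != nr + nb - 1:
        raise ValueError("first input must have length nr + nb - 1")
    # Output i combines h[k] with the sample k positions earlier in the data.
    return [sum(h[k] * x[i + nb - 1 - k] for k in range(nb)) for i in range(nr)]


def convolution_cycles(nb: int, nr: int) -> int:
    """Evaluate the cycle formula (nb/2)*nr + (nr/2)*5 + 8."""
    if nr < 4 or nr % 4 != 0:
        raise ValueError("nr must be a multiple of 4 and >= 4")
    if nb < 4 or nb % 2 != 0:
        raise ValueError("nb must be a multiple of 2 and >= 4")
    return (nb // 2) * nr + (nr // 2) * 5 + 8


x = pad_for_convolution([1.0, 2.0, 3.0, 4.0, 5.0], nb=4)  # 5 = nr - nb + 1 samples
y = convolve_padded(x, [1.0, 1.0, 1.0, 1.0], nr=8)
```

For the example case on this page (nb=8, nr=20), the padded input would carry 13 data samples in an array of 27 values.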
Cross Correlation
  Description: The Correlation assumes that the output array length (nr) is a multiple of 4 and greater than or equal to 4, and that the second input array length (nb) is a multiple of 2 and greater than or equal to 4. The first input array length should be (nr+nb-1), where the first nb-1 and last nb-1 values are zero. If all three arrays are aligned on the same double-word boundary and nb is not a multiple of 4, there will be no memory bank hits; if nb is a multiple of 4, there will be nr/4 bank hits.
  Formula: (nb/2)*nr+(nr/2)*5+8 cycles
  Example: For nb=8 and nr=20, 138 cycles or 828 nsec
Autocorrelation
  Description: Autocorrelation assumes that the correlation is length M, the output array is length M, and the input array is length (M+N), where the first M values are zero. The value of N should be a multiple of 2 and greater than or equal to 4. The value of M should be a multiple of 4 and greater than or equal to 4. To prevent memory bank hits, the input array should be aligned on an even double-word boundary (bank 0), and the output array should be aligned on the next word boundary (bank 2).
  Formula: (N/2)*M+(M/2)*5+9 cycles
  Example: For M=8 and N=18, 101 cycles or 606 nsec
 

VECTOR

Benchmark Description Formula
Dot Product
  Description: The function performs the dot product of two vectors of length N, where N is a multiple of 2 and greater than or equal to 10. No memory bank hits occur if the arrays are aligned on opposite double-word boundaries.
  Formula: N/2 + 24 cycles
  Example: For N=100, 74 cycles or 444 nsec
Matrix-Vector Multiply (any size)
  Description: The function performs the multiplication of an n x m matrix by an m x 1 vector. The a and b arrays should be placed on opposite double-word boundaries to prevent memory bank hits.
  Formula: (n+20)*m+1 cycles
  Example: For m=3 and n=3, 70 cycles or 420 nsec
 
Matrix-Vector Multiply (with even number of columns)
  Description: The function performs the multiplication of an n x m matrix by an m x 1 vector. The column dimension (m) must be greater than or equal to 2 and a multiple of 2. The a and b arrays should be placed on opposite double-word boundaries to prevent memory bank hits.
  Formula: ((n/2)+24)*m+2 cycles
  Example: For m=3 and n=20, 104 cycles or 624 nsec
 
Weighted Vector Sum
  Description: The function performs an N-element vector sum of two vectors, with one vector weighted by a constant. The result is stored in a third vector. The value of N must be a multiple of 2 and greater than or equal to 12. To prevent bank hits, the two input vectors should be aligned on opposite double-word boundaries.
  Formula: N+12 cycles
  Example: For N=100, 112 cycles or 672 nsec
Vector Sum
  Description: The function calculates the sum of two vectors of length N, where N is a multiple of 2 and greater than or equal to 6. To avoid memory bank hits, the vectors should be aligned on opposite double-word boundaries.
  Formula: N+8 cycles
  Example: For N=100, 108 cycles or 648 nsec
Sum of Squares
  Description: The function calculates the sum of the squares of the N elements of the vector. The value of N must be a multiple of 2 and greater than or equal to 12. This function performs extraneous loads.
  Formula: N/2 + 24 cycles
  Example: For N=100, 74 cycles or 444 nsec
 

FFTs

Benchmark Description Formula
Complex Radix 4 FFT
  Description: The function calculates the complex Radix 4 DIF FFT of size N with digit-reversed output and normal-order input.
  Formula: (log4(N))*(14*N/4+23)+20 cycles
  Example: For N=1024, 18,055 cycles or 108.33 µsec
Complex Radix 2 FFT
  Description: The function calculates the complex Radix 2 DIT FFT of size N with bit-reversed output and coefficients, and normal-order input.
  Formula: (log2(N))*(5*N/2+21)+7+(N/4)*(log2(N)) cycles
  Example: For N=1024, 28,377 cycles or 170.26 µsec
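The two FFT formulas can be compared directly by evaluating them for the same N. The sketch below is a worked check only; the function names are hypothetical, the 6 ns cycle time is an assumption inferred from the timings on this page, and the requirement that N be at least 4 in the radix-2 case is a restriction of this sketch (so that N/4 is an integer), not a documented constraint.

```python
CYCLE_NS = 6.0  # assumption: inferred from the cycle/time pairs on this page


def _exact_log(n: int, base: int) -> int:
    """Return k such that base**k == n, or raise if n is not such a power."""
    k, m = 0, n
    while m > 1 and m % base == 0:
        m //= base
        k += 1
    if m != 1 or k == 0:
        raise ValueError(f"N must be a power of {base} greater than 1")
    return k


def radix4_fft_cycles(n: int) -> int:
    """(log4(N)) * (14*N/4 + 23) + 20"""
    return _exact_log(n, 4) * (14 * n // 4 + 23) + 20


def radix2_fft_cycles(n: int) -> int:
    """(log2(N)) * (5*N/2 + 21) + 7 + (N/4) * (log2(N))"""
    stages = _exact_log(n, 2)
    if n < 4:
        raise ValueError("this sketch requires N >= 4")
    return stages * (5 * n // 2 + 21) + 7 + (n // 4) * stages


for name, fn in (("radix 4", radix4_fft_cycles), ("radix 2", radix2_fft_cycles)):
    c = fn(1024)
    print(f"{name}: {c} cycles, {c * CYCLE_NS / 1000.0:.2f} µsec")
```

For N=1024 the radix-4 routine needs roughly 36% fewer cycles than the radix-2 routine, according to these formulas.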
 

SEARCH

Benchmark Description Formula
Vector Max
  Description: The function finds the maximum value in a vector of length N, where N is a multiple of 3 and greater than or equal to 12. No memory bank hits occur regardless of where the arguments are placed in memory.
  Formula: 2*N/3 + 9 cycles
  Example: For N=102, 77 cycles or 462 nsec
 

MATH

Benchmark Description Formula
Single Precision Floating Point Reciprocal
  Description: The function performs the reciprocal using the RCPSP instruction and 2 iterations of the Newton-Raphson algorithm.
  Formula: 28 cycles
Double Precision Floating Point Reciprocal
  Description: The function performs the reciprocal using the RCPDP instruction and 3 iterations of the Newton-Raphson algorithm.
  Formula: 84 cycles
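Each Newton-Raphson step for a reciprocal, x <- x*(2 - a*x), squares the relative error of the estimate, so the number of iterations needed follows from the accuracy of the seed and the target precision. The sketch below models this in Python. It assumes a seed accurate to about 8 bits and imitates it by truncating the true reciprocal; that is an assumption for illustration and not a model of the hardware instructions, and the function names are hypothetical.

```python
import math


def truncate_seed(value: float, bits: int = 8) -> float:
    """Stand-in for a hardware seed: relative error below 2**-bits (assumption)."""
    mantissa, exponent = math.frexp(value)  # mantissa magnitude in [0.5, 1)
    scale = 1 << (bits + 1)                 # one extra bit because mantissa >= 0.5
    return math.ldexp(math.floor(mantissa * scale) / scale, exponent)


def newton_reciprocal(a: float, iterations: int, seed_bits: int = 8) -> float:
    """Refine a low-precision estimate of 1/a with Newton-Raphson steps."""
    x = truncate_seed(1.0 / a, seed_bits)
    for _ in range(iterations):
        x = x * (2.0 - a * x)  # relative error e becomes e**2
    return x


for n in range(4):
    err = abs(newton_reciprocal(3.0, n) * 3.0 - 1.0)
    print(n, err)
```

Under the 8-bit seed assumption the error bound goes from 2^-8 to 2^-16 to 2^-32 to 2^-64, so two iterations are enough for a 24-bit single-precision mantissa and three are needed for a 53-bit double-precision mantissa.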
 

3D GRAPHICS AND IMAGING

Benchmark Description Formula
3D Geometry Transformation
  Description: This function performs the "front end" of a 3D graphics transformation pipeline. It performs geometry transformation, clipping preprocessing, perspective projection, and viewport mapping.
  Performance: Approx 10.4M vertices/second
Collision Detection
  Description: This function takes a vector of 3D points and translates them in one dimension. The 1D distance from the translated point to the parameter "point" is calculated. If the distance is less than the parameter "distance", a collision is detected and the address of the point is returned. There are no memory bank hits regardless of where the function parameters are placed in memory. However, the function performs extraneous loads.
  Formula: (N/2)*3+32 cycles (worst case)
  Example: For N=10,000, 15,032 cycles or 90.192 µsec
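The collision test described above can be sketched as a linear scan that stops at the first hit. This is one possible reading of the description, not the benchmarked routine: the argument names `axis` and `offset` are hypothetical, the distance is taken along the translated axis only, an index is returned in place of an address, and the cycle formula is evaluated for even N.

```python
from typing import Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]


def find_collision(points: Sequence[Point3], axis: int, offset: float,
                   point: Point3, distance: float) -> Optional[int]:
    """Return the index of the first point that collides, or None if none does."""
    for i, p in enumerate(points):
        translated = p[axis] + offset              # translate in one dimension
        if abs(translated - point[axis]) < distance:
            return i                               # collision detected
    return None                                    # worst case: every point scanned


def collision_cycles(n: int) -> int:
    """Worst-case cycle formula (N/2)*3 + 32, assuming N is even."""
    if n % 2 != 0:
        raise ValueError("this sketch assumes an even N")
    return (n // 2) * 3 + 32


pts = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
hit = find_collision(pts, axis=0, offset=1.0, point=(10.2, 0.0, 0.0), distance=0.5)
```

The worst case in the formula corresponds to a scan in which no point collides, so all N points are examined.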

© Copyright 1998 Texas Instruments Incorporated. All rights reserved.