TMS320C67x Single Precision Floating Point Assembly Benchmarks |
FILTERS |
||
| Benchmark | Description | Formula |
| Block FIR | The FIR assumes that the number of filter coefficients (numH) is a multiple of 2 and greater than or equal to 4 and the number of outputs (numY) is a multiple of 4 and greater than or equal to 4. The input, output, and coefficient arrays must start on the same double-word boundary to avoid memory bank hits. | ((2*numH)+10)*(numY/4)+8; for numH=64 and numY=64, 2216 cycles or 13.296 µsec |
| Block IIR | The IIR assumes that the order is a multiple of 2 and greater than or equal to 4, and the number of outputs (numY) is a multiple of 2 and greater than or equal to order+2. To avoid bank hits, the input and output arrays must be aligned on opposite double-word boundaries, and the a and b coefficient arrays must be aligned on opposite double-word boundaries. | (order+10)*(numY-order)+15; for order=16 and numY=64, 1263 cycles or 7.578 µsec |
| Cascaded IIR Biquads | The Biquad assumes that the number of biquads (numB) is a multiple of 2 and greater than or equal to 2, and it processes one input and produces one output. There are no memory bank hits regardless of where the arguments are placed in memory. | 4*(numB)+29; for numB=8, 61 cycles or 366 nsec |
| Convolution | The convolution assumes that the output array length (nr) is a multiple of 4 and greater than or equal to 4, and the second input array length (nb) is a multiple of 2 and greater than or equal to 4. The first input array length should be (nr+nb-1) where the first nb-1 and last nb-1 values are zero. If all three arrays are aligned on the same double-word boundary and nb is not a multiple of 4, there will be no memory bank hits (if it is a multiple of 4, there will be nr/4 bank hits). | (nb/2)*nr+(nr/2)*5+8; for nb=8 and nr=20, 138 cycles or 828 nsec |
| Cross Correlation | The Correlation assumes that the output array length (nr) is a multiple of 4 and greater than or equal to 4, and the second input array length (nb) is a multiple of 2 and greater than or equal to 4. The first input array length should be (nr+nb-1) where the first nb-1 and last nb-1 values are zero. If all three arrays are aligned on the same double-word boundary and nb is not a multiple of 4, there will be no memory bank hits (if it is a multiple of 4, there will be nr/4 bank hits). | (nb/2)*nr+(nr/2)*5+8; for nb=8 and nr=20, 138 cycles or 828 nsec |
| Autocorrelation | Autocorrelation assumes that the correlation is length M, the output array is length M and the input array is length (M+N) where the first M values are zero. The value of N should be a multiple of 2 and greater than or equal to 4. The value of M should be a multiple of 4 and greater than or equal to 4. To prevent memory bank hits, the input array should be aligned on an even double-word boundary (bank 0), and the output array should be aligned on the next word boundary (bank 2). | (N/2)*M+(M/2)*5+9; for M=8 and N=18, 101 cycles or 606 nsec |
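The cycle counts quoted in the Filters table can be reproduced directly from the formulas. The following is an illustrative Python sketch, not part of the benchmark code. The function names are hypothetical, and the 6 ns cycle time is an assumption inferred from the table's own figures (13.296 µsec for 2216 cycles).

```python
# Cycle-count formulas from the Filters table, written as plain functions.
# Integer division is used because the table requires the lengths to be
# multiples of 2 or 4, so the divisions are exact for valid arguments.

CYCLE_NS = 6.0  # assumption: inferred from 13.296 usec / 2216 cycles


def fir_cycles(num_h, num_y):
    return (2 * num_h + 10) * (num_y // 4) + 8


def iir_cycles(order, num_y):
    return (order + 10) * (num_y - order) + 15


def biquad_cycles(num_b):
    return 4 * num_b + 29


def conv_cycles(nb, nr):
    # The same formula is listed for convolution and cross correlation.
    return (nb // 2) * nr + (nr // 2) * 5 + 8


def autocorr_cycles(n, m):
    return (n // 2) * m + (m // 2) * 5 + 9


def cycles_to_usec(cycles):
    """Convert a cycle count to microseconds at the assumed cycle time."""
    return cycles * CYCLE_NS / 1000.0


if __name__ == "__main__":
    print(fir_cycles(64, 64), cycles_to_usec(fir_cycles(64, 64)))
    print(iir_cycles(16, 64), cycles_to_usec(iir_cycles(16, 64)))
```

Evaluating the functions at the table's example sizes gives the listed cycle counts, which is a quick way to check a formula before budgeting a different filter length.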
VECTOR |
||
| Benchmark | Description | Formula |
| Dot product | The function performs the dot product of two vectors of length N where N is a multiple of 2 and greater than or equal to 10. No memory bank hits occur if the arrays are aligned on opposite double-word boundaries. | N/2 + 24; for N=100, 74 cycles or 444 nsec |
| Matrix-Vector Multiply (any size) | The function performs the multiplication of an n x m matrix by an m x 1 vector. The a and b arrays should be placed on opposite double-word boundaries to prevent memory bank hits. | (n+20)*m+1; for m=3 and n=3, 70 cycles or 420 nsec |
| Matrix-Vector Multiply (with even number of columns) | The function performs the multiplication of an n x m matrix by an m x 1 vector. The column dimension (m) must be greater than or equal to 2 and a multiple of 2. The a and b arrays should be placed on opposite double-word boundaries to prevent memory bank hits. | ((n/2)+24)*m+2; for m=3 and n=20, 104 cycles or 624 nsec |
| Weighted vector sum | The function performs an N element vector sum of two vectors with one vector weighted by a constant. The result is stored in a third vector. The value of N must be a multiple of 2 and greater than or equal to 12. To prevent bank hits, the two input vectors should be aligned on opposite double-word boundaries. | N+12; for N=100, 112 cycles or 672 nsec |
| Vector Sum | The function calculates the sum of two vectors of length N where N is a multiple of 2 and greater than or equal to 6. To avoid memory bank hits, the vectors should be aligned on opposite double-word boundaries. | N+8; for N=100, 108 cycles or 648 nsec |
| Sum of squares | The function calculates the sum of the squares of the N elements of the vector. The value N must be a multiple of 2 and greater than or equal to 12. This function performs extraneous loads. | N/2 + 24; for N=100, 74 cycles or 444 nsec |
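Each vector routine constrains N to a multiple of some value with a stated minimum. One way to handle an arbitrary length is to round it up to the next valid size before calling the routine. The sketch below illustrates this for the dot product; the helper names are hypothetical, and zero padding is an assumption that suits the dot product (zeros do not change the result) but may not suit every routine.

```python
def is_valid_length(n, multiple, minimum):
    """True when n satisfies a 'multiple of X and >= Y' constraint."""
    return n >= minimum and n % multiple == 0


def padded_length(n, multiple, minimum):
    """Smallest length >= n that satisfies the constraint."""
    n = max(n, minimum)
    return -(-n // multiple) * multiple  # round up to the next multiple


def dot_product_cycles(n):
    """Cycle count N/2 + 24 from the table; N must be even and >= 10."""
    if not is_valid_length(n, 2, 10):
        raise ValueError("N must be a multiple of 2 and at least 10")
    return n // 2 + 24


if __name__ == "__main__":
    n = padded_length(13, 2, 10)      # 13 is odd, so pad to 14
    print(n, dot_product_cycles(n))
```

Padding a 13-element vector to 14 elements costs one extra load pair but keeps the call within the documented constraints.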
FFTs |
||
| Benchmark | Description | Formula |
| Complex Radix 4 FFT | The function calculates the complex Radix 4 DIF FFT of size N with digit-reversed output and normal order input. | (log4(N))*(14*N/4+23)+20; for N=1024, 18,055 cycles or 108.33 µsec |
| Complex Radix 2 FFT | The function calculates the complex Radix 2 DIT FFT of size N with bit-reversed output and coefficients, and normal order input. | (log2(N))*(5*N/2+21)+7+(N/4)*(log2(N)); for N=1024, 28,377 cycles or 170.26 µsec |
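The two FFT formulas can be compared for any supported size with a short script. This is a sketch only; the function names are hypothetical, and the logarithms are computed with integer arithmetic to avoid floating-point rounding.

```python
def log2_exact(n):
    """Return log2(n) for a power of two, otherwise raise ValueError."""
    if n < 1 or n & (n - 1):
        raise ValueError("N must be a power of 2")
    return n.bit_length() - 1


def radix4_cycles(n):
    """(log4(N))*(14*N/4+23)+20, valid when N is a power of 4."""
    bits = log2_exact(n)
    if bits % 2:
        raise ValueError("N must be a power of 4")
    return (bits // 2) * (14 * n // 4 + 23) + 20


def radix2_cycles(n):
    """(log2(N))*(5*N/2+21)+7+(N/4)*(log2(N)), valid for a power of 2."""
    stages = log2_exact(n)
    return stages * (5 * n // 2 + 21) + 7 + (n // 4) * stages


if __name__ == "__main__":
    for size in (64, 256, 1024):
        print(size, radix4_cycles(size), radix2_cycles(size))
```

For N=1024 the script reproduces the two cycle counts in the table, with the radix 4 routine needing roughly two thirds of the radix 2 cycles.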
SEARCH |
||
| Benchmark | Description | Formula |
| Vector Max | The function finds the maximum value in a vector of length N where N is a multiple of 3 and greater than or equal to 12. No memory bank hits occur regardless of where arguments are in memory. | 2*N/3 + 9; for N=102, 77 cycles or 462 nsec |
MATH |
||
| Benchmark | Description | Formula |
| Single Precision Floating Point Reciprocal | The function performs the reciprocal using the RCPSP instruction and 2 iterations of the Newton-Raphson algorithm. | 28 cycles |
| Double Precision Floating Point Reciprocal | The function performs the reciprocal using the RCPDP instruction and 3 iterations of the Newton-Raphson algorithm. | 84 cycles |
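The reciprocal routines refine a low-precision hardware estimate with Newton-Raphson steps of the form x = x*(2 - a*x), each of which roughly doubles the number of correct bits. The Python sketch below only illustrates that convergence. The `seed_reciprocal` helper is hypothetical and emulates the hardware estimate by truncating the mantissa of 1/a to 8 bits, which is an assumption about the seed accuracy rather than a model of the actual instruction.

```python
import math


def seed_reciprocal(a, bits=8):
    """Emulated coarse estimate of 1/a (assumption: about 8 good bits)."""
    mant, exp = math.frexp(1.0 / a)   # 1/a == mant * 2**exp, 0.5 <= |mant| < 1
    scale = 1 << bits
    return math.ldexp(math.floor(mant * scale) / scale, exp)


def reciprocal(a, iterations):
    """Refine the seed with the given number of Newton-Raphson steps."""
    x = seed_reciprocal(a)
    for _ in range(iterations):
        x = x * (2.0 - a * x)         # relative error is squared each step
    return x


if __name__ == "__main__":
    a = 3.0
    for k in range(4):
        err = abs(reciprocal(a, k) * a - 1.0)
        print(k, err)
```

With a seed good to about 8 bits, two steps reach roughly 32 correct bits, which covers a single-precision mantissa, and a third step is needed to cover a double-precision mantissa.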
3D GRAPHICS AND IMAGING |
||
| Benchmark | Description | Formula |
| 3D Geometry Transformation | This function performs the "front end" of a 3D graphics transformation pipeline. It performs geometry transformation, clipping preprocessing, perspective projection, and viewport mapping. | Approx 10.4M vertices/second |
| Collision Detection | This function takes a vector of 3D points and translates them in one dimension. The 1D distance from the translated point to the parameter "point" is calculated. If the distance is less than the parameter "distance", a collision is detected and the address of point is returned. There are no memory bank hits regardless of where the function parameters are placed in memory, but the function performs extraneous loads. | (N/2)*3+32 (worst case); for N=10,000, 15,032 cycles or 90.192 µsec |
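The collision test described in the table can be sketched in a few lines of Python. This is one reading of the description, not the benchmark's implementation. It assumes the translation is along the x axis, that the first matching point is the one reported, and that a list index stands in for the returned address; all names are hypothetical.

```python
def find_collision(points, offset, target_x, distance):
    """Return the index of the first point whose translated x coordinate
    lies within `distance` of `target_x`, or None if there is no collision.
    """
    for i, (x, _y, _z) in enumerate(points):
        if abs((x + offset) - target_x) < distance:
            return i
    return None


def collision_cycles(n):
    """Worst-case cycle count (N/2)*3+32 from the table."""
    return (n // 2) * 3 + 32


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (5.0, 1.0, 1.0), (9.0, 2.0, 2.0)]
    print(find_collision(pts, 1.0, 6.2, 0.5))
    print(collision_cycles(10000))
```

The worst case in the formula corresponds to scanning all N points without finding a collision.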




