FILTERS |
||
| Benchmark | Description | Formula |
| FIR-coefficients a multiple of 4 | This FIR assumes the number of filter coefficients is a multiple of 4 and the number of output samples is a multiple of 2. It operates on 16-bit data with a 32-bit accumulate. This routine has no memory hits regardless of where x, h, and y arrays are located in memory. The filter is M output samples and N coefficients. | M*(N+8)/2 + 6 For N=32 and M=100 2006 cycles or 10.03 µsec |
| FIR-coefficients a multiple of 8 | This FIR assumes the number of filter coefficients is a multiple of 8 and the number of output samples is a multiple of 2. It operates on 16-bit data with a 32-bit accumulate. This routine has no memory hits regardless of where x, h, and y arrays are located in memory. The filter is M output samples and N coefficients. | M*N/2 + 13 For N=32 and M=100 1613 cycles or 8.06 µsec |
| Complex FIR | FIR operates on complex 16-bit data with a complex 32-bit accumulate. This routine has no memory hits regardless of where x, h, and y arrays are located in memory. The filter is M output samples and N coefficients. | 2*M*N + 10 For M = 100 and N = 32: 6410 cycles or 32 µsec |
| LMS FIR - coefficients a multiple of 2 | Least Mean Square Adaptive Filter. Computes an update of all N coefficients by adding the weighted error times the inputs to the original coefficients. This assumes single sample input followed by the last N-1 inputs and N coefficients. | 1.5*N+16 For N=30 61 cycles or 305 nsec |
| LMS FIR - coefficients a multiple of 8 | Least Mean Square Adaptive Filter. Computes an update of all N coefficients by adding the weighted error times the inputs to the original coefficients followed by an FIR with N coefficients and M output samples and an error calculation. This assumes that N is a multiple of 8. (N=number of data samples, multiple of 8 >=8) | M*(9/8*N+15)+5 |
| IIR filter | Performs an Auto-regressive moving-average (ARMA) filter with 4 auto-regressive filter coefficients and 5 moving-average filter coefficients for M output samples. Output vector is stored to two locations. This routine is used as a high pass filter in the VSELP vocoder. | (M*5 + 16) For M = 160: 816 cycles or 4.08 µsec |
| FIR Circular | Finite Impulse Response Filter. Uses circular addressing with initial index. Performs filtering 2 samples at a time. (N=number of data samples, even >=2) (M=number of filter coefficients, multiple of 4 >=4) | M*(N+11)/2+13 For N=32 and M=32 701 cycles or 3.505 µsec |
| Lattice Analysis | Lattice Filter - Inverse - Analysis. (N=number of coefficients) | 1.5*N+10 For N=10 25 cycles or 125 nsec |
| Lattice Synthesis | Lattice Filter - Forward - Synthesis. (N=number of data samples, even >= 6) | 2N+18 For N=10 38 cycles or 190 nsec |
| IIR with 4 biquads cascaded | Infinite Impulse Response Filter. Direct Form II - 4 Multiplies. Processes 2 samples at a time. (N=number of cascaded biquads) | 4N+16 For N=10 56 cycles or 280 nsec |
| Autocorrelation | Performs autocorrelation of a 16-bit vector. Nested loop with M inner loop multiply accumulates and outer loops. | (N/2)*M + 16 + M/4 For N=10 and M=160: 816 cycles or 4.08 µsec |
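The FIR formulas above can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and the 5 ns cycle time is an assumption inferred from the table's own figures (2006 cycles against 10.03 µsec).

```python
# Assumption: 5 ns per cycle, inferred from "2006 cycles or 10.03 usec" in the table.
CYCLE_NS = 5


def fir_mult4_cycles(m, n):
    """FIR, coefficients a multiple of 4: M*(N+8)/2 + 6 (M even, N multiple of 4)."""
    return m * (n + 8) // 2 + 6


def fir_mult8_cycles(m, n):
    """FIR, coefficients a multiple of 8: M*N/2 + 13 (M even, N multiple of 8)."""
    return m * n // 2 + 13


def complex_fir_cycles(m, n):
    """Complex FIR: 2*M*N + 10."""
    return 2 * m * n + 10


def cycles_to_usec(cycles):
    """Convert a cycle count to microseconds under the assumed cycle time."""
    return cycles * CYCLE_NS / 1000


if __name__ == "__main__":
    for name, fn in [("FIR x4", fir_mult4_cycles),
                     ("FIR x8", fir_mult8_cycles),
                     ("Complex FIR", complex_fir_cycles)]:
        c = fn(100, 32)
        print(f"{name}: {c} cycles, {cycles_to_usec(c):.3f} usec")
```

With M = 100 and N = 32 the three functions return the cycle counts quoted in the table rows.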
VECTOR |
||
| Benchmark | Description | Formula |
| dot product | Dot product of two vectors of length N | N/2 + 8 For N = 100 58 cycles or 290 nsec |
| Weighted vector sum | Performs an N element vector sum of two vectors with one vector weighted by constant. The result is stored in a third vector. | N+10 For N = 40: 49 cycles or 245 nsec |
| Vector dot product and square | Performs an N element dot product and each of the N elements of one of the vectors is squared and accumulated. This is used to compute G in the VSELP coder. | N + 8 For N = 40: 48 cycles or 240 nsec |
| Block move | Move N 16-bit elements from one memory location to another. | N/2 + 5 For N = 40: 25 cycles or 125 nsec |
| Sum of squares | Each of N elements in a vector is squared and accumulated. This particular loop is used to compute Gl in the VSELP vocoder codebook search. | (N-1)/2 + 9 For N = 21: 19 cycles |
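As a reading aid for the vector benchmarks, here is a plain-integer reference for what each routine computes. It is a sketch under stated assumptions, not the benchmarked code: it ignores the fixed-point scaling and saturation a 16-bit DSP routine would need, and the choice of which vector is squared in `dot_and_square` is a guess, since the table does not say.

```python
def dot_product(x, y):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def weighted_vector_sum(x, y, w):
    """Element-wise w*x[i] + y[i]; the result is a new (third) vector."""
    return [w * a + b for a, b in zip(x, y)]


def dot_and_square(x, y):
    """Return (sum of x*y, sum of y*y) in one pass.

    Assumption: the second vector is the one that is squared.
    """
    dot = 0
    energy = 0
    for a, b in zip(x, y):
        dot += a * b
        energy += b * b
    return dot, energy


def sum_of_squares(x):
    """Each element is squared and accumulated."""
    return sum(a * a for a in x)
```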
FFTs |
||
| Benchmark | Description | Formula |
| Complex Radix 4 FFT | Complex Radix 4 FFT of size N | Log(base4)N * (10 * N/4 + 33) + 7 + N/4 For N = 1024: 13228 cycles or 66 µsec |
| Complex Radix 2 FFT | Complex Radix 2 FFT of size N | Log(base2)N * (4 * N/2 + 7) + 9 + N/4 For N = 1024: 20815 cycles or 104 µsec |
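The two FFT formulas can be compared directly for a given size. The following sketch evaluates them with an exact integer logarithm; the helper and function names are hypothetical and only the formulas themselves come from the table.

```python
def ilog(n, base):
    """Exact integer logarithm; raises ValueError unless n is a power of base."""
    if n < 1:
        raise ValueError("n must be positive")
    k = 0
    while n > 1:
        if n % base:
            raise ValueError(f"{n} is not a power of {base}")
        n //= base
        k += 1
    return k


def radix4_fft_cycles(n):
    """Log4(N) * (10*N/4 + 33) + 7 + N/4."""
    return ilog(n, 4) * (10 * n // 4 + 33) + 7 + n // 4


def radix2_fft_cycles(n):
    """Log2(N) * (4*N/2 + 7) + 9 + N/4."""
    return ilog(n, 2) * (4 * n // 2 + 7) + 9 + n // 4


if __name__ == "__main__":
    n = 1024
    print(radix4_fft_cycles(n), radix2_fft_cycles(n))
```

For N = 1024 this reproduces the 13228 and 20815 cycle figures, so by these formulas the radix 4 routine needs roughly two thirds of the cycles of the radix 2 routine at that size.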
SEARCH |
||
| Benchmark | Description | Formula |
| Minimum energy error search | Performs a dot product on 256 pairs of 9 element vectors and searches for the pair of vectors which produces the maximum dot product result. This is a large part of the VSELP vocoder codebook search. | (256/2)*9 + 14 1166 cycles or 5.83 µsec |
| Vector Max | Finds the maximum value in a vector of length N. | N/2 + 13 For N = 100: 64 cycles or 320 nsec |
| Vector Max Index | Finds the maximum value in a vector of length N and stores the index of that location. | 2N/3 + 12 For N = 100: 79 cycles or 395 nsec |
| codebook search for VSELP | Performs VSELP vocoder codebook search. The C source code for this was written by Motorola Systems Research Laboratories and is authorized by Motorola for the use of development of North American digital cellular standards. As such, the C code cannot be shown here. This routine performs the entire v_srch.c function as written by Motorola. It involves calculating correlations between weighted basis vectors and weighted speech vector (Rm's), C0, and 0.25 * sum of Djj for G0. It then calculates all Dmj and finishes calculating G0. It then initializes the best vector to be code vector zero and performs search by finding the vector that produces the highest C^2/G value. | Loop1 + Loop2 + Loop3: 342 + 639 + 2087 = 3068 cycles |
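The search routines above reduce to a running-maximum loop. The sketch below shows that idea in plain Python; it is not the benchmarked code, the function names are hypothetical, and the tie-breaking rule (first occurrence wins) is an assumption the table does not specify.

```python
def vector_max_index(v):
    """Return (max value, index of its first occurrence) for a non-empty vector."""
    best_i = 0
    for i in range(1, len(v)):
        if v[i] > v[best_i]:  # strict '>' keeps the first of equal maxima
            best_i = i
    return v[best_i], best_i


def best_pair(xs, ys):
    """Return (index, dot product) of the vector pair with the largest dot product.

    xs and ys are equal-length lists of vectors; pair i is (xs[i], ys[i]).
    """
    best_i = 0
    best_dot = sum(a * b for a, b in zip(xs[0], ys[0]))
    for i in range(1, len(xs)):
        dot = sum(a * b for a, b in zip(xs[i], ys[i]))
        if dot > best_dot:
            best_i, best_dot = i, dot
    return best_i, best_dot
```

In the "Minimum energy error search" row, `xs` and `ys` would each hold 256 vectors of 9 elements.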
MATH |
||
| Benchmark | Description | Formula |
| ADD40 | Adds two 40-bit values to produce a 40-bit result. This code sample is NOT a complete function! | N/A |
| ADD64 | Adds two 64-bit values to produce a 64-bit result. This code sample is not a complete function! | N/A |
| SUB40 | Subtracts one 40-bit value from another 40-bit value to produce a 40-bit result. This code sample is NOT a complete function! | N/A |
| SUB64 | Subtracts one 64-bit value from another 64-bit value to produce a 64-bit result. This code sample is NOT a complete function! | N/A |
| DIVMOD32 | This routine divides two 32-bit values and returns their quotient and remainder. The inputs are 32-bit numbers, and the result is a 32-bit number. Cycles (Min execution 16 cycles, Max execution 41 cycles). This code sample is NOT a complete function! | N/A |
| DIVMODU32 | This routine divides two unsigned 32-bit values and returns their quotient and remainder. The inputs are unsigned 32-bit numbers, and the result is an unsigned 32-bit number. Cycles (Min execution 18 cycles, Max execution 42 cycles). This code sample is NOT a complete function! | N/A |
| MPY32 | This routine takes two 32-bit integer values and calculates their product. The inputs are 32-bit integers, and the result is a 32-bit integer. Cycles (See routine). This code sample is NOT a complete function! | N/A |
| MPY3240 | This routine takes two 32-bit integer values and calculates their product. The inputs are 32-bit integers, and the result is a 40-bit integer. Cycles (See routine). This code sample is NOT a complete function! | N/A |
| MPYU3240 | This routine takes two 32-bit unsigned integer values and calculates their product. The inputs are 32-bit unsigned integers, and the result is a 40-bit unsigned integer. Cycles (See routine). This code sample is NOT a complete function! | N/A |
| MPY40 | This routine takes two 40-bit integer values and calculates their product. The inputs are 40-bit integers, and the result is a 40-bit integer. Cycles (See routine). This code sample is NOT a complete function! | N/A |
| MPY3264 | This routine takes two 32-bit integer values and calculates their product. The inputs are 32-bit integers, and the result is a 64-bit integer. Cycles (See routine) | N/A |
| MPYU3264 | This routine takes two 32-bit unsigned integer values and calculates their product. The inputs are 32-bit unsigned integers, and the result is a 64-bit unsigned integer. Cycles (See routine) | N/A |
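Wide arithmetic like ADD64 and MPYU3264 is typically assembled from narrower pieces. The sketch below shows one common way to do that, assuming 32-bit words for the addition and 16-bit halves for the multiply; it does not claim to reproduce the instruction sequence of the routines listed above, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK16 = 0xFFFF


def add64(a_hi, a_lo, b_hi, b_lo):
    """Add two 64-bit values given as (high word, low word) 32-bit pairs.

    Returns (hi, lo); a carry out of bit 63 is discarded.
    """
    lo = a_lo + b_lo
    carry = lo >> 32                      # 1 if the low words overflowed
    hi = (a_hi + b_hi + carry) & MASK32
    return hi, lo & MASK32


def mpyu32_64(a, b):
    """Unsigned 32x32 -> 64-bit product built from four 16x16 partial products.

    Returns (hi, lo) as two 32-bit words.
    """
    a_hi, a_lo = a >> 16, a & MASK16
    b_hi, b_lo = b >> 16, b & MASK16
    low = a_lo * b_lo                     # contributes to bits 0..31
    mid = a_hi * b_lo + a_lo * b_hi       # contributes from bit 16 upward
    high = a_hi * b_hi                    # contributes from bit 32 upward
    product = (high << 32) + (mid << 16) + low
    return product >> 32, product & MASK32
```

Python integers do not overflow, so the masks stand in for the fixed register widths a real implementation would have.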
GRAPHICS |
||
| Benchmark | Description | Formula |
| IDCT | Inverse Discrete Cosine Transform. 2D, 8x8, 16-bit input, no rounding (Source-Chen IDCT) | For 1 8x8 block of 16-bit values 230 cycles or 1.150 µsec |
| DCT | Discrete Cosine Transform. 2D, 8x8, 16-bit input, no rounding (Source-Independent JPEG Group, Thomas G. Lane) | For 1 8x8 block of 16-bit values 226 cycles or 1.130 µsec |
| Gouraud | Gouraud Shading of a scanline of pixels. Four pixels of a line at a time are processed. (N=pixels >=4, multiple of 4 pixels) | 2N+7 For 1024 pixels taken 4 pixels at a time 2055 cycles or 10.275 µsec |
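For readers unfamiliar with Gouraud shading, the per-scanline work is a linear interpolation of intensity between the two ends of the line. The sketch below illustrates that with a fixed-point accumulator; the 16-bit fraction, the function name, and the one-pixel-at-a-time loop are all assumptions for illustration (the benchmarked routine processes four pixels at a time).

```python
FRAC_BITS = 16  # assumption: 16 fractional bits for the running intensity


def gouraud_scanline(c_left, c_right, n):
    """Interpolate n pixel intensities from c_left to c_right inclusive."""
    if n < 2:
        raise ValueError("need at least two pixels to interpolate")
    # Per-pixel increment in fixed point; floor division may truncate slightly.
    step = ((c_right - c_left) << FRAC_BITS) // (n - 1)
    acc = c_left << FRAC_BITS
    out = []
    for _ in range(n):
        out.append(acc >> FRAC_BITS)  # drop the fractional part
        acc += step
    return out
```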
TELECOM |
||
| Benchmark | Description | Formula |
| Viterbi Equalization | Viterbi Equalizer - GSM (N=number of data points) | 43N + 2 For N=120 5162 cycles or 25.810 µsec |
| Viterbi GSM | Viterbi Channel Decoder (GSM) (N=number of data points) | 38N + 12 + N/4 For N=189 7242 cycles or 36.21 µsec |
| Viterbi IS54 | Viterbi Channel Decoder (IS54) (N=number of data points) | 66.5*N+16 For N=189 5934 cycles or 29.67 µsec |
| Viterbi V.32 | Viterbi V.32 PSTN Trellis Decoder. (N=number of data points) | 64 cycles or 320 nsec |
Bit Reversal |
||
| Benchmark | Description | CYCLES |
| Linear Time Lookup Table | The Bit-Reverse routine performs a length-N bit reversal on an array of N 16-bit complex data elements. | 7*(N/4 + 2) + 14 For N = 1024: 1820 cycles or 9.1 µsec. Lookup table size: 32 halfwords (64 bytes) |
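A 32-entry table for N = 1024 suggests a table of size sqrt(N) that reverses half of the index bits at a time. The sketch below shows one way such a scheme can work; this is an assumption about the approach, not the benchmarked routine, and it only handles sizes whose bit count is even (N = 4^k). All names are hypothetical.

```python
def make_reverse_table(half_bits):
    """Table mapping each half_bits-wide value to its bit-reversed value."""
    table = []
    for i in range(1 << half_bits):
        r = 0
        for b in range(half_bits):
            if i & (1 << b):
                r |= 1 << (half_bits - 1 - b)
        table.append(r)
    return table


def bit_reverse_permute(data):
    """Return a copy of data in bit-reversed index order (len must be 4**k)."""
    n = len(data)
    bits = n.bit_length() - 1
    if n < 1 or n != 1 << bits or bits % 2:
        raise ValueError("this sketch handles only N = 4**k")
    half = bits // 2
    table = make_reverse_table(half)
    mask = (1 << half) - 1
    out = list(data)
    for i in range(n):
        # Reverse the low half into the high position and vice versa.
        j = (table[i & mask] << half) | table[i >> half]
        if i < j:  # swap each pair exactly once
            out[i], out[j] = out[j], out[i]
    return out
```

For N = 1024 the table built by `make_reverse_table(5)` has 32 entries, which matches the table size quoted above; each index is reversed with two lookups, so the whole permutation is linear in N.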