Texas Instruments

TMS320C62x Assembly Benchmarks
Times are quoted at 5 ns per cycle (a 200 MHz clock).


FILTERS

Benchmark Description Formula
FIR (coefficients a multiple of 4)
This FIR assumes that the number of filter coefficients N is a multiple of 4 and the number of output samples M is a multiple of 2. It operates on 16-bit data with a 32-bit accumulator. This routine has no memory hits regardless of where the x, h, and y arrays are located in memory.
Formula: M*(N+8)/2 + 6
For N = 32 and M = 100: 2006 cycles or 10.03 µsec
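The following is a minimal plain-C reference model of what a 16-bit FIR with a 32-bit accumulator computes, useful for checking an optimized routine's output. It is not TI's assembly or interface. The name `fir_ref`, the array layout, and the final Q15 shift are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical reference model, not TI's routine or interface.
 * Assumed layout: x holds n_out + n_coef - 1 samples, h holds n_coef
 * coefficients, and each output is the 32-bit sum shifted right by 15
 * (Q15 scaling). The caller keeps inputs small enough that the
 * accumulator does not overflow. */
void fir_ref(const int16_t *x, const int16_t *h, int16_t *y,
             int n_coef, int n_out)
{
    for (int j = 0; j < n_out; j++) {
        int32_t acc = 0;                          /* 32-bit accumulator */
        for (int i = 0; i < n_coef; i++)
            acc += (int32_t)x[j + i] * h[i];      /* 16 x 16 -> 32-bit product */
        y[j] = (int16_t)(acc >> 15);              /* rescale to 16 bits */
    }
}
```

This plain loop has no multiple-of-4 or multiple-of-2 restriction. Those constraints belong to the unrolled assembly.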
FIR (coefficients a multiple of 8)
This FIR assumes that the number of filter coefficients N is a multiple of 8 and the number of output samples M is a multiple of 2. It operates on 16-bit data with a 32-bit accumulator. This routine has no memory hits regardless of where the x, h, and y arrays are located in memory.
Formula: M*N/2 + 13
For N = 32 and M = 100: 1613 cycles or 8.06 µsec
Complex FIR
This FIR operates on complex 16-bit data with a complex 32-bit accumulator. This routine has no memory hits regardless of where the x, h, and y arrays are located in memory. The filter produces M output samples using N coefficients.
Formula: 2*M*N + 10
For M = 100 and N = 32: 6410 cycles or 32 µsec
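A complex tap needs four real multiplies, four times the work of a real tap. That ratio matches the 2*M*N and M*N/2 leading terms of the two FIR formulas. The sketch below is a hypothetical C reference model of the complex FIR. The interleaved {re, im} storage, the name `cfir_ref`, and the Q15 output shift are assumptions.

```c
#include <stdint.h>

/* Hypothetical reference model: complex samples stored interleaved as
 * {re, im} pairs. x holds n_out + n_coef - 1 complex samples, h holds
 * n_coef complex coefficients, outputs are shifted right by 15 (Q15). */
void cfir_ref(const int16_t *x, const int16_t *h, int16_t *y,
              int n_coef, int n_out)
{
    for (int j = 0; j < n_out; j++) {
        int32_t re = 0, im = 0;                   /* complex 32-bit accumulator */
        for (int i = 0; i < n_coef; i++) {
            int32_t xr = x[2 * (j + i)], xi = x[2 * (j + i) + 1];
            int32_t hr = h[2 * i],       hi = h[2 * i + 1];
            re += xr * hr - xi * hi;              /* real part of x * h */
            im += xr * hi + xi * hr;              /* imaginary part of x * h */
        }
        y[2 * j]     = (int16_t)(re >> 15);
        y[2 * j + 1] = (int16_t)(im >> 15);
    }
}
```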
LMS FIR (coefficients a multiple of 2)
Least Mean Square adaptive filter. Updates all N coefficients by adding the weighted error times the inputs to the original coefficients. This assumes a single new input sample followed by the last N-1 inputs, and N coefficients.
Formula: 1.5*N + 16
For N = 30: 61 cycles or 305 nsec
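The coefficient update described above can be sketched in C as follows. It assumes Q15 data. It also assumes the caller passes the error already multiplied by the step size. The name `lms_update_ref` and that scaling are illustrative assumptions, not TI's interface.

```c
#include <stdint.h>

/* Hypothetical reference model of the LMS coefficient update in Q15.
 * x[0] is the newest input sample and x[1..n-1] are the previous n-1 inputs;
 * weighted_err is assumed to be the step size times the error, already in Q15. */
void lms_update_ref(int16_t *h, const int16_t *x, int16_t weighted_err, int n)
{
    for (int i = 0; i < n; i++) {
        int32_t delta = ((int32_t)weighted_err * x[i]) >> 15;  /* err * input */
        h[i] = (int16_t)(h[i] + delta);                         /* add to coefficient */
    }
}
```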
LMS FIR (coefficients a multiple of 8)
Least Mean Square adaptive filter. Updates all N coefficients by adding the weighted error times the inputs to the original coefficients. It then runs an FIR with N coefficients and M output samples and calculates the error. N is the number of coefficients and must be a multiple of 8, with N >= 8.
Formula: M*(9/8*N + 15) + 5
IIR filter
Performs an autoregressive moving-average (ARMA) filter with 4 autoregressive coefficients and 5 moving-average coefficients for M output samples. The output vector is stored to two locations. This routine is used as a high-pass filter in the VSELP vocoder.
Formula: M*5 + 16
For M = 160: 816 cycles or 4.08 µsec
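Here is a minimal C sketch of a 4-pole, 5-zero ARMA filter in Q15. The history layout, the minus sign on the autoregressive terms, and the name `arma_ref` are assumptions. The benchmarked routine also writes its output to a second location, which the sketch does not model.

```c
#include <stdint.h>

/* Hypothetical reference model of a 4-pole, 5-zero ARMA filter in Q15.
 * On entry x[0..3] and y[0..3] hold the four most recent past inputs and
 * outputs; x[4..m+3] are new inputs and y[4..m+3] receive m new outputs.
 *   y[n] = (sum_{k=0..4} b[k]*x[n-k] - sum_{k=0..3} a[k]*y[n-1-k]) >> 15
 * The sign convention for a[] is an assumption. */
void arma_ref(const int16_t *x, int16_t *y,
              const int16_t *b, const int16_t *a, int m)
{
    for (int n = 4; n < m + 4; n++) {
        int32_t acc = 0;
        for (int k = 0; k < 5; k++)
            acc += (int32_t)b[k] * x[n - k];        /* moving-average part */
        for (int k = 0; k < 4; k++)
            acc -= (int32_t)a[k] * y[n - 1 - k];    /* autoregressive part */
        y[n] = (int16_t)(acc >> 15);
    }
}
```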
FIR Circular
Finite Impulse Response filter. Uses circular addressing with an initial index and filters 2 samples at a time.
(N = number of data samples, even, >= 2)
(M = number of filter coefficients, multiple of 4, >= 4)
Formula: M*(N+11)/2 + 13
For N = 32 and M = 32: 701 cycles or 3.505 µsec
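The sketch below models circular addressing with a starting index in C. A modulo on the buffer length stands in for the hardware's address wrap. The name `fir_circ_ref`, the argument order, and the Q15 shift are hypothetical.

```c
#include <stdint.h>

/* Hypothetical reference model of an FIR reading from a circular buffer.
 * buf holds len samples; start is the index of the oldest sample used by
 * the first output. Output j uses buf[(start + j + i) % len], i = 0..n_coef-1. */
void fir_circ_ref(const int16_t *buf, int len, int start,
                  const int16_t *h, int n_coef, int16_t *y, int n_out)
{
    for (int j = 0; j < n_out; j++) {
        int32_t acc = 0;
        for (int i = 0; i < n_coef; i++) {
            int idx = (start + j + i) % len;      /* wrap like circular addressing */
            acc += (int32_t)buf[idx] * h[i];
        }
        y[j] = (int16_t)(acc >> 15);              /* Q15 rescale */
    }
}
```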
Lattice Analysis
Lattice filter, inverse (analysis).
(N = number of coefficients)
Formula: 1.5*N + 10
For N = 10: 25 cycles or 125 nsec
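A standard all-zero lattice (analysis) recursion is f_m(n) = f_{m-1}(n) + k_m*b_{m-1}(n-1) and b_m(n) = k_m*f_{m-1}(n) + b_{m-1}(n-1), starting from f_0 = b_0 = x. The sketch below applies it to one sample in Q15. The sign convention, the state layout, and the name `lattice_analysis_ref` are assumptions, not taken from TI's code.

```c
#include <stdint.h>

/* Hypothetical reference model: one sample through an n_stages lattice
 * inverse (analysis) filter. state[m] holds b_m(n-1); k[m] is the Q15
 * reflection coefficient of stage m+1. Returns the forward error f_N(n). */
int16_t lattice_analysis_ref(int16_t x, const int16_t *k,
                             int16_t *state, int n_stages)
{
    int32_t f = x;                                /* f_0(n) */
    int32_t b = x;                                /* b_0(n) */
    for (int m = 0; m < n_stages; m++) {
        int32_t b_old = state[m];                 /* b_m(n-1) */
        state[m] = (int16_t)b;                    /* save b_m(n) for the next sample */
        int32_t f_new = f + ((k[m] * b_old) >> 15);
        b = ((k[m] * f) >> 15) + b_old;           /* b_{m+1}(n) */
        f = f_new;                                /* f_{m+1}(n) */
    }
    return (int16_t)f;
}
```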
Lattice Synthesis
Lattice filter, forward (synthesis).
(N = number of data samples, even, >= 6)
Formula: 2*N + 18
For N = 10: 38 cycles or 190 nsec
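The synthesis lattice runs the recursion backwards. It starts from f_N(n) = e(n) and computes f_{m-1}(n) = f_m(n) - k_m*b_{m-1}(n-1) down to the output f_0(n). Along the way it refreshes b_m(n) = k_m*f_{m-1}(n) + b_{m-1}(n-1). The following hypothetical C sketch processes one sample per call in Q15. With matching coefficients and state, it undoes the analysis recursion exactly. The name and state layout are assumptions.

```c
#include <stdint.h>

/* Hypothetical reference model: one sample through an n_stages lattice
 * synthesis (all-pole) filter. state[m] holds b_m(n-1); k[m] is the Q15
 * reflection coefficient of stage m+1. Returns the output f_0(n). */
int16_t lattice_synthesis_ref(int16_t e, const int16_t *k,
                              int16_t *state, int n_stages)
{
    int32_t f = e;                                      /* f_N(n) */
    for (int m = n_stages - 1; m >= 0; m--) {
        f = f - ((k[m] * (int32_t)state[m]) >> 15);     /* f_m(n) */
        if (m + 1 < n_stages)                           /* b_N(n) is never needed */
            state[m + 1] = (int16_t)(((k[m] * f) >> 15) + state[m]);  /* b_{m+1}(n) */
    }
    state[0] = (int16_t)f;                              /* b_0(n) = f_0(n) */
    return (int16_t)f;
}
```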
IIR with 4 biquads cascaded
Infinite Impulse Response filter, Direct Form II, 4 multiplies. Processes 2 samples at a time.
(N = number of cascaded biquads)
Formula: 4*N + 16
For N = 10: 56 cycles or 280 nsec
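Here is a floating-point sketch of cascaded Direct Form II biquads, written for readability rather than in fixed point. It assumes "4 multiplies" means a1, a2, b1, and b2 per section with b0 fixed at 1. That reading, the coefficient layout, and the name `biquad_cascade_ref` are assumptions.

```c
/* Hypothetical reference model of cascaded Direct Form II biquads.
 * coef[4*s .. 4*s+3] = {a1, a2, b1, b2} for section s (b0 assumed 1);
 * state[2*s], state[2*s+1] = w[n-1], w[n-2] for section s. */
double biquad_cascade_ref(double x, const double *coef,
                          double *state, int n_sections)
{
    for (int s = 0; s < n_sections; s++) {
        const double *c = &coef[4 * s];
        double *w = &state[2 * s];
        double w0 = x - c[0] * w[0] - c[1] * w[1];   /* feedback: 2 multiplies */
        double y = w0 + c[2] * w[0] + c[3] * w[1];   /* feedforward: 2 multiplies */
        w[1] = w[0];                                 /* shift the delay line */
        w[0] = w0;
        x = y;                                       /* output feeds next section */
    }
    return x;
}
```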
Autocorrelation
Performs autocorrelation of a 16-bit vector using a nested loop: the inner loop performs M multiply-accumulates for each pass of the outer loop.
Formula: (N/2)*M + 16 + M/4
For N = 10 and M = 160: 816 cycles or 4.08 µsec
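The nested loop described above can be modeled in C with M multiply-accumulates per lag in the inner loop. Treating N as the number of lags, the buffer layout, and the name `autocorr_ref` are all assumptions. Results are left as raw 32-bit sums.

```c
#include <stdint.h>

/* Hypothetical convention: x holds n_samp + n_lags - 1 samples and
 *   r[k] = sum_{j=0}^{n_samp-1} x[j] * x[j + k],  k = 0..n_lags-1.
 * The inner loop is the n_samp multiply-accumulates per lag. */
void autocorr_ref(const int16_t *x, int32_t *r, int n_lags, int n_samp)
{
    for (int k = 0; k < n_lags; k++) {            /* outer loop: one lag */
        int32_t acc = 0;
        for (int j = 0; j < n_samp; j++)          /* inner loop: MACs */
            acc += (int32_t)x[j] * x[j + k];
        r[k] = acc;
    }
}
```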

VECTOR

Benchmark Description Formula
Dot product
Dot product of two vectors of length N.
Formula: N/2 + 8
For N = 100: 58 cycles or 290 nsec
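Below, a reference dot product sits alongside the table's cycle formula, so the N = 100 figure can be reproduced. The function names are hypothetical. For odd N, the integer division in `dotp_cycles` is an assumption.

```c
#include <stdint.h>

/* Reference model: 32-bit dot product of two 16-bit vectors. */
int32_t dotp_ref(const int16_t *a, const int16_t *b, int n)
{
    int32_t sum = 0;
    for (int i = 0; i < n; i++)
        sum += (int32_t)a[i] * b[i];
    return sum;
}

/* Cycle formula from the table, N/2 + 8 (integer division assumed). */
int dotp_cycles(int n)
{
    return n / 2 + 8;
}
```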
Weighted vector sum
Performs an N-element sum of two vectors, with one vector weighted by a constant. The result is stored in a third vector.
Formula: N + 10
For N = 40: 49 cycles or 245 nsec
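The following is a hypothetical C model of the weighted vector sum. It assumes the constant weight is in Q15 and applies it to the first vector. It ignores saturation. The name `w_vec_sum_ref` is illustrative.

```c
#include <stdint.h>

/* Hypothetical reference model: c[i] = ((w * a[i]) >> 15) + b[i],
 * with the weight w in Q15. Saturation is not modeled. */
void w_vec_sum_ref(const int16_t *a, const int16_t *b, int16_t w,
                   int16_t *c, int n)
{
    for (int i = 0; i < n; i++)
        c[i] = (int16_t)((((int32_t)w * a[i]) >> 15) + b[i]);
}
```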
Vector dot product and square
Performs an N-element dot product while squaring and accumulating each of the N elements of one of the vectors. This is used to compute G in the VSELP coder.
Formula: N + 8
For N = 40: 48 cycles or 240 nsec
Block move
Moves N 16-bit elements from one memory location to another.
Formula: N/2 + 5
For N = 40: 25 cycles or 125 nsec
Sum of squares
Squares each of the N elements in a vector and accumulates the results. This loop is used to compute Gl in the VSELP vocoder codebook search.
Formula: (N-1)/2 + 9
For N = 21: 19 cycles

FFTs

Benchmark Description Formula
Complex Radix 4 FFT
Complex radix-4 FFT of size N.
Formula: log4(N) * (10*N/4 + 33) + 7 + N/4
For N = 1024: 13228 cycles or 66 µsec
Complex Radix 2 FFT
Complex radix-2 FFT of size N.
Formula: log2(N) * (4*N/2 + 7) + 9 + N/4
For N = 1024: 20815 cycles or 104 µsec
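A textbook iterative radix-2 decimation-in-time FFT makes the formula's structure concrete. It runs log2(N) stages, each with N/2 butterflies, which mirrors the log2(N) * (4*N/2 + 7) term. This sketch uses double precision rather than TI's 16-bit data. Its twiddle table is supplied by the caller; that interface and the name `fft_radix2_ref` are assumptions, and it says nothing about TI's scaling or output ordering.

```c
/* In-place iterative radix-2 decimation-in-time FFT of n = 2^k points.
 * re/im hold the real and imaginary parts. tw_re[k] = cos(2*pi*k/n) and
 * tw_im[k] = -sin(2*pi*k/n) for k = 0..n/2-1 (a hypothetical, caller-supplied
 * twiddle table). Output is in natural order. */
void fft_radix2_ref(double *re, double *im,
                    const double *tw_re, const double *tw_im, int n)
{
    /* 1. Bit-reversal permutation of the input. */
    for (int i = 1, j = 0; i < n; i++) {
        int bit = n >> 1;
        for (; j & bit; bit >>= 1)
            j ^= bit;
        j ^= bit;
        if (i < j) {
            double t = re[i]; re[i] = re[j]; re[j] = t;
            t = im[i]; im[i] = im[j]; im[j] = t;
        }
    }
    /* 2. log2(n) stages of butterflies; each stage does n/2 butterflies. */
    for (int len = 2; len <= n; len <<= 1) {
        int step = n / len;                        /* twiddle table stride */
        for (int start = 0; start < n; start += len) {
            for (int k = 0; k < len / 2; k++) {
                double wr = tw_re[k * step], wi = tw_im[k * step];
                int p = start + k, q = p + len / 2;
                double tr = re[q] * wr - im[q] * wi;   /* t = x[q] * w */
                double ti = re[q] * wi + im[q] * wr;
                re[q] = re[p] - tr;  im[q] = im[p] - ti;
                re[p] += tr;         im[p] += ti;
            }
        }
    }
}
```

In practice the twiddle table would be filled once with `cos` and `sin` from `<math.h>` and reused for every transform of that size.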

SEARCH

Benchmark Description Formula
Minimum energy error search
Performs a dot product on 256 pairs of 9-element vectors and searches for the pair of vectors that produces the maximum dot product. This is a large part of the VSELP vocoder codebook search.
Formula: (256/2)*9 + 14
1166 cycles or 5.83 µsec
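The search can be modeled in C by taking the dot product of each vector pair and keeping the index of the largest. The back-to-back storage layout, the tie-breaking rule, and the name `max_dotp_search_ref` are hypothetical. The benchmark's case corresponds to n_pairs = 256 and len = 9.

```c
#include <stdint.h>

/* Hypothetical layout: a and b each hold n_pairs vectors of len elements,
 * stored back to back. Returns the index of the pair with the largest
 * dot product; the first such pair wins on ties. */
int max_dotp_search_ref(const int16_t *a, const int16_t *b,
                        int n_pairs, int len)
{
    int best = 0;
    int32_t best_val = INT32_MIN;
    for (int p = 0; p < n_pairs; p++) {
        int32_t acc = 0;
        for (int i = 0; i < len; i++)
            acc += (int32_t)a[p * len + i] * b[p * len + i];
        if (acc > best_val) {                      /* new maximum found */
            best_val = acc;
            best = p;
        }
    }
    return best;
}
```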
Vector Max
Finds the maximum value in a vector of length N.
Formula: N/2 + 13
For N = 100: 64 cycles or 320 nsec
Vector Max Index
Finds the maximum value in a vector of length N and stores the index of that location.
Formula: 2N/3 + 12
For N = 100: 79 cycles or 395 nsec
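Here is a plain C model of the max-index search. The maximum value itself is then v[index], which covers the Vector Max benchmark as well. The name and the first-occurrence tie rule are assumptions.

```c
#include <stdint.h>

/* Reference model: index of the largest element of v (n >= 1).
 * The strict comparison keeps the first occurrence of the maximum. */
int vec_max_index_ref(const int16_t *v, int n)
{
    int idx = 0;
    for (int i = 1; i < n; i++)
        if (v[i] > v[idx])
            idx = i;
    return idx;
}
```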
Codebook search for VSELP
Performs the VSELP vocoder codebook search. The C source code for this routine was written by Motorola Systems Research Laboratories and is authorized by Motorola for use in developing North American digital cellular standards, so it cannot be shown here. This routine performs the entire v_srch.c function as written by Motorola. It calculates the correlations between the weighted basis vectors and the weighted speech vector (the Rm's), C0, and 0.25 * sum of Djj for G0. It then calculates all Dmj and finishes calculating G0. Finally, it initializes the best vector to code vector zero and searches for the vector that produces the highest C^2/G value.
Loop 1 + Loop 2 + Loop 3: 342 + 639 + 2087 = 3068 cycles

MATH

Benchmark Description Formula
ADD40
Adds two 40-bit values to produce a 40-bit result. This code sample is NOT a complete function!
Formula: N/A
ADD64
Adds two 64-bit values to produce a 64-bit result. This code sample is NOT a complete function!
Formula: N/A
SUB40
Subtracts one 40-bit value from another 40-bit value to produce a 40-bit result. This code sample is NOT a complete function!
Formula: N/A
SUB64
Subtracts one 64-bit value from another 64-bit value to produce a 64-bit result. This code sample is NOT a complete function!
Formula: N/A
DIVMOD32
This routine divides two 32-bit values and returns their quotient and remainder. The inputs are 32-bit numbers, and the quotient and remainder are 32-bit numbers. Cycles: 16 minimum, 41 maximum. This code sample is NOT a complete function!
Formula: N/A
DIVMODU32
This routine divides two unsigned 32-bit values and returns their quotient and remainder. The inputs are unsigned 32-bit numbers, and the quotient and remainder are unsigned 32-bit numbers. Cycles: 18 minimum, 42 maximum. This code sample is NOT a complete function!
Formula: N/A
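The C62x has no hardware divide, so division is typically built from repeated conditional subtraction (the SUBC instruction), one quotient bit per step. Below is a plain C model of that shift-and-subtract (restoring) division for the unsigned case. It is not TI's code. The name `divmodu32_ref` is hypothetical, and the sketch always runs 32 iterations.

```c
#include <stdint.h>

/* Restoring (shift-and-subtract) division: one quotient bit per iteration.
 * den must be nonzero. */
void divmodu32_ref(uint32_t num, uint32_t den, uint32_t *quot, uint32_t *rem)
{
    uint32_t q = 0;
    uint64_t r = 0;                         /* 64-bit so (r << 1) cannot overflow */
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((num >> i) & 1u);   /* bring down the next dividend bit */
        q <<= 1;
        if (r >= den) {                     /* trial subtraction succeeds */
            r -= den;
            q |= 1u;
        }
    }
    *quot = q;
    *rem = (uint32_t)r;
}
```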
MPY32
This routine takes two 32-bit integer values and calculates their product. The inputs are 32-bit integers, and the result is a 32-bit integer. Cycles: see routine. This code sample is NOT a complete function!
Formula: N/A
MPY3240
This routine takes two 32-bit integer values and calculates their product. The inputs are 32-bit integers, and the result is a 40-bit integer. Cycles: see routine. This code sample is NOT a complete function!
Formula: N/A
MPYU3240
This routine takes two 32-bit unsigned integer values and calculates their product. The inputs are 32-bit unsigned integers, and the result is a 40-bit unsigned integer. Cycles: see routine. This code sample is NOT a complete function!
Formula: N/A
MPY40
This routine takes two 40-bit integer values and calculates their product. The inputs are 40-bit integers, and the result is a 40-bit integer. Cycles: see routine. This code sample is NOT a complete function!
Formula: N/A
MPY3264
This routine takes two 32-bit integer values and calculates their product. The inputs are 32-bit integers, and the result is a 64-bit integer. Cycles: see routine.
Formula: N/A
MPYU3264
This routine takes two 32-bit unsigned integer values and calculates their product. The inputs are 32-bit unsigned integers, and the result is a 64-bit unsigned integer. Cycles: see routine.
Formula: N/A
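The C62x multipliers are 16 x 16 bits, so a wide product such as MPYU3264 has to be assembled from partial products. The following C sketch shows the idea with four 16 x 16 -> 32-bit products. The name `mpyu3264_ref` is hypothetical, and the instruction schedule of TI's routine is not represented.

```c
#include <stdint.h>

/* 32 x 32 -> 64-bit unsigned multiply built from four 16 x 16 -> 32-bit
 * partial products (each fits in 32 bits without overflow). */
uint64_t mpyu3264_ref(uint32_t a, uint32_t b)
{
    uint32_t al = a & 0xFFFFu, ah = a >> 16;
    uint32_t bl = b & 0xFFFFu, bh = b >> 16;
    uint64_t ll = (uint64_t)(al * bl);   /* low  x low  */
    uint64_t lh = (uint64_t)(al * bh);   /* low  x high */
    uint64_t hl = (uint64_t)(ah * bl);   /* high x low  */
    uint64_t hh = (uint64_t)(ah * bh);   /* high x high */
    return ll + ((lh + hl) << 16) + (hh << 32);
}
```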

GRAPHICS

Benchmark Description Formula
IDCT
Inverse Discrete Cosine Transform: 2D, 8x8, 16-bit input, no rounding (source: Chen IDCT).
For one 8x8 block of 16-bit values: 230 cycles or 1.150 µsec
DCT
Discrete Cosine Transform: 2D, 8x8, 16-bit input, no rounding (source: Independent JPEG Group, Thomas G. Lane).
For one 8x8 block of 16-bit values: 226 cycles or 1.130 µsec
Gouraud
Gouraud shading of a scanline of pixels, processing four pixels of the line at a time.
(N = number of pixels, a multiple of 4, >= 4)
Formula: 2*N + 7
For 1024 pixels taken 4 at a time: 2055 cycles or 10.275 µsec
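Gouraud shading interpolates intensity linearly along the scanline, so each pixel's intensity is the previous one plus a constant step. The sketch below shows this for a single channel. The 8.8 fixed-point intensity format, the 8-bit output, and the name `gouraud_ref` are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical reference model: i0 is the starting intensity and di the
 * per-pixel step, both in 8.8 fixed point; each pixel gets the integer part. */
void gouraud_ref(uint8_t *pix, int n, int32_t i0, int32_t di)
{
    int32_t acc = i0;
    for (int x = 0; x < n; x++) {
        pix[x] = (uint8_t)(acc >> 8);   /* drop the fractional bits */
        acc += di;                      /* linear interpolation step */
    }
}
```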

TELECOM

Benchmark Description Formula
Viterbi Equalization
Viterbi equalizer for GSM.
(N = number of data points)
Formula: 43*N + 2
For N = 120: 5162 cycles or 25.810 µsec
Viterbi GSM
Viterbi channel decoder for GSM.
(N = number of data points)
Formula: 38*N + 12 + N/4
For N = 189: 7242 cycles or 36.21 µsec
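GSM full-rate channel coding uses a rate-1/2, constraint-length-5 convolutional code with G0 = 1 + D^3 + D^4 and G1 = 1 + D + D^3 + D^4. Its 189 coder input bits (182 class-1 bits, 3 parity bits, 4 tail bits) are consistent with the N = 189 example. The sketch below pairs an encoder with a hard-decision Viterbi decoder for that code. It is a simplification: receivers commonly use soft decisions. The function names and the `MAX_STEPS` limit are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define CONV_K 5
#define NSTATES (1 << (CONV_K - 1))   /* 16 trellis states */
#define CONV_G0 0x19u                 /* 1 + D^3 + D^4 */
#define CONV_G1 0x1Bu                 /* 1 + D + D^3 + D^4 */
#define MAX_STEPS 256                 /* hypothetical traceback buffer limit */

static int parity(unsigned v)
{
    int p = 0;
    while (v) { p ^= (int)(v & 1u); v >>= 1; }
    return p;
}

/* Rate-1/2 encoder: writes 2*n coded bits (c0, c1 per input bit). */
void conv_encode(const uint8_t *in, int n, uint8_t *out)
{
    unsigned state = 0;                            /* previous K-1 input bits */
    for (int t = 0; t < n; t++) {
        unsigned r = (state << 1) | in[t];         /* bit 0 = newest input */
        out[2 * t] = (uint8_t)parity(r & CONV_G0);
        out[2 * t + 1] = (uint8_t)parity(r & CONV_G1);
        state = r & (NSTATES - 1);
    }
}

/* Hard-decision Viterbi decoder. Assumes the encoder starts in state 0 and
 * is driven back to state 0 by tail bits. Returns 0, or -1 if n is too large. */
int viterbi_decode(const uint8_t *rx, int n, uint8_t *out)
{
    static uint8_t prev[MAX_STEPS][NSTATES];       /* survivor predecessors */
    unsigned pm[NSTATES], npm[NSTATES];            /* path metrics */
    if (n > MAX_STEPS) return -1;
    for (int s = 0; s < NSTATES; s++) pm[s] = (s == 0) ? 0u : 1000000u;
    for (int t = 0; t < n; t++) {
        for (int s = 0; s < NSTATES; s++) npm[s] = 0xFFFFFFFFu;
        for (int s = 0; s < NSTATES; s++) {
            for (unsigned u = 0; u < 2; u++) {     /* add-compare-select */
                unsigned r = ((unsigned)s << 1) | u;
                unsigned ns = r & (NSTATES - 1);
                unsigned bm = (unsigned)(parity(r & CONV_G0) != rx[2 * t]) +
                              (unsigned)(parity(r & CONV_G1) != rx[2 * t + 1]);
                if (pm[s] + bm < npm[ns]) { npm[ns] = pm[s] + bm; prev[t][ns] = (uint8_t)s; }
            }
        }
        memcpy(pm, npm, sizeof pm);
    }
    unsigned state = 0;                            /* tail bits end in state 0 */
    for (int t = n - 1; t >= 0; t--) {
        out[t] = (uint8_t)(state & 1u);            /* newest input is the state LSB */
        state = prev[t][state];
    }
    return 0;
}
```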
Viterbi IS54
Viterbi channel decoder for IS54.
(N = number of data points)
Formula: 66.5*N + 16
For N = 89: 5934 cycles or 29.67 µsec
Viterbi V.32
Viterbi V.32 PSTN trellis decoder.
(N = number of data points)
64 cycles or 320 nsec

BIT REVERSAL

Benchmark Description Formula
Linear Time Lookup Table
Performs a bit-reversal of length N on an array of N 16-bit complex data values.
Formula: 7*(N/4 + 2) + 14
For N = 1024: 1820 cycles or 9.1 µsec
Lookup table size: 32 halfwords (64 bytes)
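The sketch below performs the same bit-reversal permutation in linear time with a small lookup table. It uses a hypothetical 16-entry nibble-reversal table rather than TI's 32-halfword table, whose contents are not given here. The interleaved {re, im} layout and the function names are assumptions.

```c
#include <stdint.h>

/* rev4[v] is the 4-bit value v with its bits reversed. */
static const uint8_t rev4[16] = {0, 8, 4, 12, 2, 10, 6, 14,
                                 1, 9, 5, 13, 3, 11, 7, 15};

/* Reverse the low `bits` bits of i (bits <= 16) by reversing all 16 bits
 * one nibble at a time through rev4, then shifting the result down. */
static unsigned bitrev16(unsigned i, int bits)
{
    unsigned r = ((unsigned)rev4[i & 15u] << 12) |
                 ((unsigned)rev4[(i >> 4) & 15u] << 8) |
                 ((unsigned)rev4[(i >> 8) & 15u] << 4) |
                  (unsigned)rev4[(i >> 12) & 15u];
    return r >> (16 - bits);
}

/* In-place bit-reversal permutation of n complex values stored as
 * interleaved {re, im} int16 pairs; n must be a power of two <= 65536. */
void bitrev_cplx_ref(int16_t *x, int n)
{
    int bits = 0;
    while ((1 << bits) < n) bits++;                /* bits = log2(n) */
    for (int i = 0; i < n; i++) {
        int j = (int)bitrev16((unsigned)i, bits);
        if (i < j) {                               /* swap each pair only once */
            int16_t t = x[2 * i];
            x[2 * i] = x[2 * j];
            x[2 * j] = t;
            t = x[2 * i + 1];
            x[2 * i + 1] = x[2 * j + 1];
            x[2 * j + 1] = t;
        }
    }
}
```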


© Copyright 1998 Texas Instruments Incorporated. All rights reserved.