DSP engine drives time to market

Tensilica introduces the ConnX D2 16-bit dual-MAC digital signal processor (DSP) engine for its Xtensa LX dataplane processor cores for system-on-chip (SOC) designs. It is said to run virtually any C program unmodified, resulting in faster time to market. With the large ITU software code base available, the ConnX D2 DSP engine is ideal for telecom infrastructure and VoIP (Voice over Internet Protocol) applications. With its small size (less than 70,000 gates), the customizable ConnX D2 DSP engine is also ideal for a wide variety of low-power portable consumer applications including mobile wireless devices, next-generation disk drives and data storage, home entertainment devices, and computer peripherals.

‘The ConnX D2 DSP engine is an important step for Tensilica into the broader communications DSP market,’ stated Steve Roddy, Tensilica’s vice president of marketing and business development. ‘Our customizable dataplane processors have long been used as DSPs in many applications, including our market-leading 24-bit HiFi audio engine DSPs. Now, with the ConnX D2 DSP engine joining the other members of the ConnX DSP family, we have communications DSPs at all major performance points, all with the full benefit of our extensive optimizing compiler technology.’

Advanced C Compiler – No Assembly Required

Tensilica’s proven Xtensa C/C++ compiler (XCC) produces optimized instruction streams for the ConnX D2 DSP engine directly from C code. Many other DSPs require extensive assembly language programming for maximum performance. For example, the ConnX D2 DSP engine running compiled C code requires 20 percent fewer cycles for a 256-point complex FFT (Fast Fourier Transform) algorithm than the dual-MAC TI C55x DSP running hand-optimized assembly code.

Tensilica also tested the performance of the AMR-NB (Adaptive Multi Rate compression, narrow band) encoder and decoder algorithm, which required just 28.5 MHz on the ConnX D2 DSP engine when compiled from the original ITU reference code. This is about twice the performance of competitive licensable DSP cores using the pure reference ITU code.

Directly compiling C code without the need for extensive iterations at the assembly code level lets the ConnX D2 DSP offer designers a shorter development cycle, which gets new products to market faster. It also lets designers use the large existing library of proven code immediately on the ConnX D2 DSP engine.

High-Performance Architecture

The ConnX D2 DSP engine option adds dual 16-bit MAC units and an 8-entry, 40-bit register file to the base architecture of the Xtensa LX DPU (dataplane processing unit). The ConnX D2 DSP engine utilizes two-way SIMD (Single Instruction, Multiple Data) instructions to provide high performance on vectorizable C code.

The ConnX D2 DSP engine is also implemented with an improved form of VLIW (Very Long Instruction Word) instructions that delivers parallel performance without the code size bloat associated with most VLIW DSPs. This allows for parallelization of code across the two MACs/ALUs when vectorization is not feasible. This choice of vectorization or parallelization is used extensively by the compiler for fast performance on any algorithm.
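As an illustration of the kind of vectorizable C code described above, here is a minimal sketch of a 16-bit dot product written in plain, portable C. It uses no Tensilica intrinsics. The function names `dot16` and `sat40` are hypothetical, and the 40-bit accumulator is only modelled in a 64-bit integer with saturation, which is an assumption made for the example rather than a statement about how the ConnX D2 registers behave. The two independent multiply-accumulates per iteration are the pattern a dual-MAC compiler could map onto a two-way SIMD operation or onto two parallel MAC slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clamp a value to the signed 40-bit range, modelling
   a 40-bit accumulator held in an int64_t (saturation is an assumption). */
static int64_t sat40(int64_t v)
{
    const int64_t max40 = ((int64_t)1 << 39) - 1;
    const int64_t min40 = -((int64_t)1 << 39);
    if (v > max40) return max40;
    if (v < min40) return min40;
    return v;
}

/* Dot product of two 16-bit vectors. Each loop iteration performs two
   independent multiply-accumulates into separate accumulators. */
int64_t dot16(const int16_t *a, const int16_t *b, size_t n)
{
    assert(a != NULL && b != NULL);

    int64_t acc0 = 0;
    int64_t acc1 = 0;
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        acc0 = sat40(acc0 + (int32_t)a[i] * b[i]);
        acc1 = sat40(acc1 + (int32_t)a[i + 1] * b[i + 1]);
    }
    if (i < n) {
        /* Odd-length tail: one remaining element. */
        acc0 = sat40(acc0 + (int32_t)a[i] * b[i]);
    }
    return sat40(acc0 + acc1);
}
```

Splitting the sum across two accumulators removes the dependency between consecutive multiply-accumulates, which is what gives a compiler the freedom to vectorize or parallelize the loop.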

Designed for DSP Acceleration

The ConnX D2 DSP engine supports a wide range of data types (e.g., 16-, 32-, and 40-bit integer and fixed point; 16-bit complex; 8- and 16-bit vector), seven addressing schemes, and data manipulation instructions including shifting, swapping, and logical operations to provide outstanding performance on DSP algorithms. For specific DSP algorithm acceleration, the ConnX D2 engine instructions include Add-Compare-Exchange (used with Viterbi), Add Modulo, Add Subtract, and Add Bit Reverse Base. Used in conjunction with a bit reversed addressing scheme, this instruction set delivers extremely efficient FFT implementations.
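To make two of these operations concrete, the following sketch shows in portable C what bit-reversed addressing and a Viterbi path-metric update compute. It is not ConnX D2 code: `bit_reverse`, `bit_reverse_permute`, `acs` and the `cplx16` type are hypothetical names, and `acs` implements the textbook add-compare-select step, since the exact semantics of the Add-Compare-Exchange instruction are not given here. Dedicated instructions and addressing modes exist to avoid spending cycles on loops like these.

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit complex sample (hypothetical layout for this example). */
typedef struct {
    int16_t re;
    int16_t im;
} cplx16;

/* Reverse the low `bits` bits of `i`; with bits = 3, binary 001 becomes 100. */
unsigned bit_reverse(unsigned i, unsigned bits)
{
    unsigned r = 0;
    for (unsigned k = 0; k < bits; k++) {
        r = (r << 1) | (i & 1u);
        i >>= 1;
    }
    return r;
}

/* Reorder n samples in place into bit-reversed order, as a radix-2 FFT
   needs on its input or output. n must be a power of two. */
void bit_reverse_permute(cplx16 *x, unsigned n)
{
    assert(n != 0 && (n & (n - 1)) == 0);

    unsigned bits = 0;
    while ((1u << bits) < n) {
        bits++;
    }
    for (unsigned i = 0; i < n; i++) {
        unsigned j = bit_reverse(i, bits);
        if (j > i) {            /* swap each pair only once */
            cplx16 tmp = x[i];
            x[i] = x[j];
            x[j] = tmp;
        }
    }
}

/* Textbook add-compare-select: add a branch metric to each of two
   competing path metrics, keep the smaller sum, and record which
   path won (0 or 1) for later traceback. */
int32_t acs(int32_t pm0, int32_t bm0, int32_t pm1, int32_t bm1,
            unsigned *decision)
{
    int32_t c0 = pm0 + bm0;
    int32_t c1 = pm1 + bm1;
    *decision = (c1 < c0) ? 1u : 0u;
    return (c1 < c0) ? c1 : c0;
}
```

For an 8-point transform, `bit_reverse_permute` turns sample order 0, 1, 2, 3, 4, 5, 6, 7 into 0, 4, 2, 6, 1, 5, 3, 7.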

The ConnX D2 SIMD unit is supported by a comprehensive set of instructions for vector loads and stores that support multiple data widths and SIMD data register loading orders, which can be aligned or unaligned.

A Customizable DSP

If designers have specific optimizations in mind that are not included in the ConnX D2 DSP engine and Xtensa LX instruction sets, they can easily add multi-cycle execution units, registers, register files, and more using the automated Tensilica Instruction Extension (TIE) methodology.

Complete Tool Support

Every Xtensa LX DPU with (or without) the ConnX D2 DSP engine is automatically generated with a complete set of software development and modeling tools matched to the exact DPU configuration. Designers use Tensilica’s Xtensa Xplorer Eclipse-based GUI (graphical user interface) as the cockpit for the entire design experience. From Xtensa Xplorer, designers can profile their application code and make the changes in the processor necessary to speed up that code. Designers can also pick from options for processor interfaces, memories, operating systems support, EDA scripts, debug and trace, and more. Tensilica also provides a comprehensive collection of code generation and analysis tools that speed the software application development process.

Performance and Power Consumption

When optimized for high-frequency operation, an Xtensa processor with the ConnX D2 DSP engine delivers clock speeds up to 600 MHz in 65nm GP. When optimized for low area in cost-sensitive applications, a fully configured Xtensa LX with the ConnX D2 engine can occupy as little as 0.18 mm² (fully routed) in 65nm GP process technology.

The ConnX D2 DSP option is very power efficient. Core power consumption will of course vary with the SOC designer’s choice of process technology and synthesis optimization targets. One example data point: a fully configured Xtensa LX core with the ConnX D2 DSP engine consumes only 52 µW/MHz in 65nm GP process technology (measured running an AMR-NB (VAD2) algorithm).
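As a rough worked example, the per-MHz figure can be combined with the 28.5 MHz quoted earlier for the AMR-NB encoder and decoder to estimate core power for that workload. This is a back-of-the-envelope sketch only: it assumes both numbers apply to the same core configuration and process, which the figures above do not confirm, and the helper name `core_power_mw` is hypothetical.

```c
/* Estimated core power in milliwatts, given a power density in
   microwatts per MHz and a required clock rate in MHz.
   Assumes power scales linearly with clock rate. */
double core_power_mw(double uw_per_mhz, double mhz)
{
    return uw_per_mhz * mhz / 1000.0;   /* 1000 uW = 1 mW */
}
```

Under those assumptions, `core_power_mw(52.0, 28.5)` gives about 1.48 mW for the core alone, excluding memories and the rest of the SOC.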


The ConnX D2 DSP option for the Xtensa LX processor will be available in October 2009.