Multi-standard speech coding and its DSP implementation

In communication equipment, real-time speech compression is usually implemented on a DSP. A system built around a single coding algorithm has a fixed rate and is therefore inflexible. A growing number of communication services, such as software radio, IP telephony and multimedia terminals, need several coding algorithms so that a range of coding rates is available.

G.729a is a high-quality medium-rate speech coding standard developed by the ITU-T. It operates at 8 kbps and has been adopted in many communication systems. CVSD at 16/32 kbps is highly robust to channel errors and is widely used in military and aerospace communications. 32 kbps ADPCM is a simple waveform-coding algorithm with good speech quality and good noise robustness. It is widely used in satellite communications and digital circuit multiplication systems. A coding system that combines these three algorithms can operate flexibly at rates from 8 kbps to 32 kbps.


Speech compression does not place heavy demands on computation, memory or numerical precision. For cost reasons, a fixed-point DSP is therefore sufficient for a speech codec. This paper uses TI's TMS320VC5409 fixed-point DSP to implement the three codecs above, and the DSP implementations passed the relevant tests. G.729a and ADPCM were verified with the ITU-T test sequences. CVSD was tested according to the applicable Chinese standards.

After briefly introducing the three speech coding algorithms and the TMS320VC5409, the paper describes the hardware and software implementation. It then reports the computational load and hardware resource usage of each algorithm.

1 DSP chip and speech coding algorithms

(1) Introduction to TMS320VC5409

The TMS320VC5409 is a cost-effective fixed-point DSP from TI, available in 80 MIPS and 100 MIPS versions. It combines a CPU with a modified Harvard architecture, on-chip memory (32 KB of ROM and 64 KB of DARAM), on-chip peripherals and a specialized instruction set. Its main advantages are:

· One program bus and three data buses. Together with memory that can supply two operands per cycle, they support single-cycle instructions with up to three operands, which improves program efficiency and flexibility;

· Advanced, application-oriented CPU hardware logic improves performance;

· A highly specialized instruction set allows faster algorithm implementation and easier optimization;

· On-chip peripherals include three McBSPs (multichannel buffered serial ports), a six-channel DMA controller, an 8-bit HPI (host port interface) and a phase-locked-loop clock generator;

· The modular architecture supports rapid derivative development;

· Advanced IC process technology gives high performance with low power consumption, and static CMOS technology reduces power consumption further.

(2) G.729a algorithm

G.729 is the ITU-T standard for 8 kbps speech coding. It uses Conjugate-Structure Algebraic-Code-Excited Linear Prediction (CS-ACELP), which combines the strengths of waveform coding and parametric coding. The algorithm is built on linear predictive coding and employs vector quantization, analysis-by-synthesis and perceptual weighting. G.729a (Annex A) reduces the computational complexity of G.729 while remaining bit-stream compatible with it, with only a slight loss of quality.
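G.729 derives its 10th-order linear-prediction filter from the autocorrelation of the windowed speech using the Levinson-Durbin recursion. The sketch below shows only the textbook floating-point form of that recursion. The standard's fixed-point arithmetic, lag windowing and LSP conversion are left out, and the function name is illustrative.

```python
def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelations r[0..order].

    Returns (a, err): a[0] == 1 and the prediction-error filter is
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order; err is the final
    prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient k_i from the order-(i-1) solution
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        # Update a[1..i-1] from the previous-order coefficients
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k           # error energy shrinks at every order
    return a, err
```

Consider an autocorrelation of the form r[k] = 0.5^k, which describes a first-order process. For this input the recursion returns a first-order predictor and zeros for all higher coefficients.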

(3) 32 kbps ADPCM algorithm

G.726 is the ITU-T standard for adaptive differential pulse code modulation (ADPCM). It defines four rates (16, 24, 32 and 40 kbps), and this project uses 32 kbps. ADPCM is a waveform-coding method that extends PCM with prediction and differencing: only the difference between the actual sample and its predicted value is encoded. The current sample is predicted from past samples, and the predictor coefficients are adapted to keep the prediction error small. This maintains good coding quality at a reduced bit rate.
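The predict-then-quantize-the-difference loop described above can be illustrated with a deliberately simplified 4-bit ADPCM coder. It is not G.726. Here the predictor is simply the previous reconstructed sample, and the step size adapts through a hypothetical table of multipliers. G.726, by contrast, uses a pole-zero adaptive predictor and a more elaborate quantizer adaptation. All names and constants are illustrative.

```python
# Hypothetical step-size multipliers, indexed by the 3-bit magnitude
STEP_SCALE = [0.9, 0.9, 0.9, 1.0, 1.2, 1.6, 2.0, 2.4]


def _adpcm_update(pred, step, code):
    """Shared encoder/decoder update: rebuild the sample and adapt the step."""
    mag = code & 0x7
    dq = mag * step + step // 2                  # reconstructed difference (mid-point)
    if code & 0x8:                               # bit 3 is the sign bit
        dq = -dq
    pred = max(-32768, min(32767, pred + dq))    # reconstruction = next prediction
    step = max(1, min(2048, int(step * STEP_SCALE[mag])))
    return pred, step


def adpcm_encode(samples, step0=16):
    """Return (4-bit codes, locally reconstructed samples) for integer input."""
    pred, step = 0, step0
    codes, recon = [], []
    for x in samples:
        diff = x - pred                          # prediction error
        code = (0x8 if diff < 0 else 0) | min(abs(diff) // step, 7)
        codes.append(code)
        pred, step = _adpcm_update(pred, step, code)
        recon.append(pred)
    return codes, recon


def adpcm_decode(codes, step0=16):
    pred, step = 0, step0
    out = []
    for code in codes:
        pred, step = _adpcm_update(pred, step, code)
        out.append(pred)
    return out
```

The encoder and decoder call the same update function. The decoder therefore reproduces the encoder's local reconstruction exactly, which is what keeps an ADPCM pair in step without side information.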

(4) CVSD (32kbps/16kbps) algorithm

Continuously variable slope delta modulation (CVSD) is a 1-bit differential waveform-coding method. Its step size adapts to the statistics of the signal, so it maintains a good signal-to-noise ratio over a wide dynamic range. It is also easy to implement, with simple circuitry.

Its key technique is the detection of three consecutive 1s or three consecutive 0s. Such a run in the bit stream indicates that the signal is rising or falling faster than the current step size can follow, so the step size is increased to track the change.
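The run-of-three rule can be seen in a minimal floating-point CVSD codec. The step limits, growth and decay factors and integrator leak are hypothetical values chosen for illustration. They are not taken from any standard or from the implementation described here.

```python
# Hypothetical CVSD parameters (illustrative, not from any standard)
STEP_MIN, STEP_MAX = 10.0, 1280.0   # step-size limits
STEP_GROW = 1.5                     # multiplier after three identical bits
STEP_DECAY = 0.9                    # multiplier otherwise
LEAK = 0.99                         # integrator leak factor


def _cvsd_update(state, bit):
    """Advance the shared encoder/decoder state by one output bit."""
    est, step, last = state
    last = (last + (bit,))[-3:]                  # the three most recent bits
    if len(last) == 3 and last[0] == last[1] == last[2]:
        step = min(step * STEP_GROW, STEP_MAX)   # run of three: slope overload
    else:
        step = max(step * STEP_DECAY, STEP_MIN)  # no overload: let the step shrink
    est = LEAK * est + (step if bit else -step)  # integrate the signed step
    return est, step, last


def cvsd_encode(samples):
    state = (0.0, STEP_MIN, ())
    bits = []
    for x in samples:
        bit = 1 if x >= state[0] else 0          # 1 = input above the estimate
        bits.append(bit)
        state = _cvsd_update(state, bit)
    return bits


def cvsd_decode(bits):
    state = (0.0, STEP_MIN, ())
    out = []
    for bit in bits:
        state = _cvsd_update(state, bit)
        out.append(state[0])
    return out
```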

2 Hardware system

(1) Introduction to the hardware board

At the transmitting end, the analog signal passes through the front-end circuitry and is sampled by the A/D converter into 8-bit A-law PCM. In the TMS320VC5409, the logarithmic PCM is expanded to linear samples and then compressed. The resulting G.729a, ADPCM or CVSD bit stream is transmitted over the channel.
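The A-law-to-linear expansion can be sketched with the common table-free G.711 decoding routine. On the VC5409 the McBSP companding hardware can perform this step as well. The Python function below only shows the bit manipulation involved, and its name is illustrative.

```python
def alaw_to_linear(code):
    """Expand one 8-bit G.711 A-law code to a 16-bit linear sample."""
    code ^= 0x55                      # undo the even-bit inversion
    mantissa = (code & 0x0F) << 4     # 4-bit step within the segment
    segment = (code & 0x70) >> 4      # 3-bit segment (exponent)
    if segment == 0:
        value = mantissa + 8
    elif segment == 1:
        value = mantissa + 0x108
    else:
        value = (mantissa + 0x108) << (segment - 1)
    # In A-law a set sign bit denotes a positive sample
    return value if code & 0x80 else -value
```

The output spans ±32256, which is the 13-bit A-law range left-justified in a 16-bit word.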

At the receiving end, the DSP decodes the received bit stream back into logarithmic PCM. This then passes through the D/A converter and the subscriber line circuit to recover analog speech. A CPLD generates the 8 kHz frame synchronization signal that keeps the hardware devices working in step.

A single MC14557 chip performs both A/D and D/A conversion. Figure 1 shows the hardware block diagram for one channel.

(2) Hardware selection of the algorithm

The program defines two flag variables, flag1 and flag2. The algorithm is selected through the maskable external interrupts INT0-INT3 [1] of the VC5409. The interrupt service routine sets the two flags, which then control branching in the main program.

After power-up, an interrupt request is asserted on one of the pins INT0-INT3. When the program detects the interrupt, it runs the coding algorithm associated with that interrupt. The main program then sets the IMR register to mask these interrupts until the next system reset. During testing, INT0 selects a pass-through path with no transcoding; in the deployed application it selects 32 kbps CVSD. Table 1 lists the interrupts and flag settings used for algorithm selection.
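The select-once-then-mask behaviour can be modelled in a few lines. The text does not give the actual assignments in Table 1, so the mapping from interrupt line to (flag1, flag2) and from flags to algorithm below is hypothetical. The Python class only stands in for the ISR, the main-program check and the IMR masking.

```python
# Hypothetical stand-in for Table 1: interrupt line -> (flag1, flag2)
FLAGS_FOR_INT = {"INT0": (0, 0), "INT1": (0, 1), "INT2": (1, 0), "INT3": (1, 1)}
# Hypothetical decoding of the two flags in the main program
ALGORITHM_FOR_FLAGS = {
    (0, 0): "CVSD 32 kbps",
    (0, 1): "CVSD 16 kbps",
    (1, 0): "ADPCM 32 kbps",
    (1, 1): "G.729a 8 kbps",
}


class AlgorithmSelector:
    def __init__(self):
        self.flags = None       # (flag1, flag2) once an interrupt has been taken
        self.masked = False     # models clearing the INT0-INT3 bits in IMR

    def isr(self, line):
        """Interrupt service routine: record the requested mode."""
        if not self.masked:     # masked requests never reach the ISR
            self.flags = FLAGS_FOR_INT[line]

    def poll(self):
        """Main-program check: latch the selection and mask INT0-INT3."""
        if self.flags is not None:
            self.masked = True  # stays masked until the next reset
        return ALGORITHM_FOR_FLAGS.get(self.flags)
```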

(3) Data stream input and output

The VC5409 provides three McBSPs (multichannel buffered serial ports) [2] with built-in hardware A-law/μ-law companding. Transmission is double-buffered and reception triple-buffered, which keeps the data stream continuous. Serial words can be 8, 12, 16, 20, 24 or 32 bits long, with up to 128 words per frame. Table 2 shows the serial-port configuration used in this project.

For every algorithm the four-channel codec must run in full duplex, so all three McBSPs are used. McBSP0 transmits and receives the PCM stream. Since this stream is four channels of 8-bit A-law, its word length is 8 bits. McBSP1 transmits and receives the G.729 bit stream. G.729 codes speech in 10 ms frames of 80 bits. To make transfers simple and efficient, McBSP1's word length is set to 16 bits. Every five serial frames then carry one G.729 frame for each channel: 16 × 5 × 4 (channels) = 80 × 4 bits.
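The word count works because one 80-bit G.729a frame fills exactly five 16-bit serial words. The sketch below packs one 10-byte frame per channel into the 20 words that cross McBSP1 per 10 ms frame period. It rests on two hypothetical assumptions: each serial frame carries one word per channel in channel order, and frames are packed MSB first. The function names are illustrative.

```python
def pack_g729_frames(frames):
    """Interleave four 10-byte G.729a frames into 20 16-bit serial words.

    Serial frame k (k = 0..4) carries word k of every channel, channel 0 first.
    """
    assert len(frames) == 4 and all(len(f) == 10 for f in frames)
    words = []
    for k in range(5):                    # five serial frames per G.729 frame
        for frame in frames:              # one 16-bit word per channel
            words.append((frame[2 * k] << 8) | frame[2 * k + 1])
    return words


def unpack_g729_frames(words):
    """Inverse of pack_g729_frames: recover the four 10-byte frames."""
    assert len(words) == 20
    frames = [bytearray() for _ in range(4)]
    for i, w in enumerate(words):
        frames[i % 4] += bytes((w >> 8, w & 0xFF))
    return [bytes(f) for f in frames]
```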

To keep the data formats uniform and simplify serial transfer, ADPCM and CVSD use the same bit-stream format, which is transmitted and received through McBSP2, as shown in Figure 2.

32 kbps ADPCM produces a 4-bit code per sample. In the bit stream each codeword is sent twice, giving 8 bits per channel and 32 bits for four channels. 16 kbps and 32 kbps CVSD produce 2 and 4 bits, respectively, per 8 kHz sample period. Each coded bit is therefore repeated four times or twice, respectively, which again fills 8 bits per channel and 32 bits for four channels.
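Under one reading of the rule above, the 8-bit channel slots and the 32-bit McBSP2 word could be built as follows. In this reading the ADPCM codeword is duplicated whole, and each CVSD bit is repeated 4 or 2 times with the earliest bit in the most significant position. The bit order, the channel order within the 32-bit word and the function names are assumptions.

```python
def adpcm_slot(code):
    """32 kbps ADPCM: the 4-bit codeword is sent twice, e.g. 1011 -> 10111011."""
    assert 0 <= code < 16
    return (code << 4) | code


def cvsd_slot(bits):
    """CVSD: 2 bits (16 kbps) or 4 bits (32 kbps) per 125 us sample period,
    each repeated 4 or 2 times so that every channel occupies 8 bits."""
    assert len(bits) in (2, 4)
    repeat = 8 // len(bits)
    slot = 0
    for b in bits:                 # earliest bit ends up in the MSBs
        for _ in range(repeat):
            slot = (slot << 1) | (b & 1)
    return slot


def mcbsp2_word(slots):
    """Concatenate four 8-bit channel slots, channel 0 in the top byte."""
    assert len(slots) == 4 and all(0 <= s < 256 for s in slots)
    word = 0
    for s in slots:
        word = (word << 8) | s
    return word
```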

(4) Data stream transmission (serial port and storage area)

The VC5409 provides six DMA channels [2]. For each channel the user can set the source address, destination address, transfer count, synchronization event and interrupt mode.

Table 3 shows the configuration of each DMA channel in this project.

(5) Control of data transmission

As shown in Figure 3, the McBSP receives the serial data stream on its DR pin and transmits on its DX pin. Transmission and reception are triggered by the frame synchronization signal, which the CPLD provides. The bit clock comes from an external crystal oscillator.

Data is moved between the serial port and memory either by the CPU or by the DMA controller. When the data receive register (DRR) has been filled or the data transmit register (DXR) has been emptied, the serial port issues a signal indicating that a transfer can take place. This is either a synchronization event (REVT/XEVT) to the DMA controller or an interrupt request (RINT/XINT) to the CPU.

For the PCM and G.729 bit streams, reads and writes of the serial ports (McBSP0 and McBSP1) are handled by DMA.

Because G.729 codes in frames, each encode or decode call handles a relatively large block of data. To prevent the continuous stream from overrunning while the DMA is reading, the buffer is doubled. The two buffers operate in ping-pong fashion: while the DMA transfers the data in one buffer, the other receives the next block from the serial port or the CPU. The VC5409 DMA controller can raise an interrupt when a block transfer is half complete as well as when it is complete. Placing the two buffers contiguously in memory therefore makes ping-pong operation straightforward and avoids overruns.
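The ping-pong scheme relies on the DMA signalling both the half-full and the full point of one contiguous block. The simulation below models that behaviour with one buffer of two halves. A half-block "interrupt" fires after the first half and a block "interrupt" after the second, and each hands the just-completed half to the CPU. The class and callback names are hypothetical; on the VC5409 this behaviour is set up through the DMA channel registers, not a software API.

```python
class PingPongDMA:
    """Simulated DMA channel filling one contiguous buffer of 2 * half_len words."""

    def __init__(self, half_len, on_half_ready):
        self.buf = [0] * (2 * half_len)
        self.half_len = half_len
        self.pos = 0
        self.on_half_ready = on_half_ready     # stands in for the DMA ISR

    def receive(self, word):
        """One serial-port sync event: the DMA moves one word into the buffer."""
        self.buf[self.pos] = word
        self.pos += 1
        if self.pos == self.half_len:                 # half-block interrupt
            self.on_half_ready(self.buf[:self.half_len])
        elif self.pos == 2 * self.half_len:           # block interrupt
            self.on_half_ready(self.buf[self.half_len:])
            self.pos = 0                              # auto-reinit: wrap to start
```

The design constraint follows directly from the model. The CPU must finish with one half before the DMA wraps around and overwrites it. If each half holds one 10 ms frame, that means within 10 ms.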

For the ADPCM and CVSD bit streams, each transfer is short (32 bits). The CPU therefore reads and writes McBSP2 directly in the interrupt service routine instead of using DMA.

3 Software system

(1) Modification of the CVSD algorithm

CVSD encodes each sample with one bit, so 32 kbps and 16 kbps CVSD require PCM input sampled at 32 kHz and 16 kHz, respectively. The PCM actually delivered to the CVSD encoder, however, is always sampled at 8 kHz. To meet the algorithm's requirements, the input PCM stream is therefore interpolated and low-pass filtered.

By the sampling theorem, a suitable low-pass filter can reconstruct the original signal from the interpolated sequence without distortion. As a compromise between signal quality and computational cost, a fifth-order elliptic IIR filter was designed.

1:2 interpolation (16 kbps)

Numerator coefficients:

[1.02295e-01, 1.14533e-01, 2.41943e-01, 2.41943e-01, 1.45325e-01, 1.02295e-01]

Denominator coefficients:

[1.00000e+00, -1.26125e+00, 1.91846e+00, -1.21680e+00, 6.79321e+01, -1.54358e+01]

1:4 interpolation (32 kbps)

Numerator coefficients:

[4.48200e+02, -6.9309e+02, 4.68041e+02, 4.68041e+02, -6.9309e+02, 4.48200e+02]

Denominator coefficients:

[1.00000e+00, -3.56926e+00, 5.66631e+00, -4.83285e+00, 2.20789e+00, -4.2822e+01]
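The interpolation stage amounts to zero-stuffing followed by the IIR low-pass filter. The sketch below assumes a direct-form-II-transposed realization in floating point. It also assumes a gain equal to the interpolation factor, which restores the amplitude lost to zero-stuffing. The numerator and denominator vectors are passed in as parameters rather than hard-coded, and the function names are illustrative.

```python
def iir_filter(b, a, x):
    """Direct-form-II-transposed IIR filter; a[0] must be 1."""
    assert a[0] == 1.0
    n = max(len(a), len(b))
    b = list(b) + [0.0] * (n - len(b))
    a = list(a) + [0.0] * (n - len(a))
    z = [0.0] * (n - 1)                      # filter state
    y = []
    for xn in x:
        yn = b[0] * xn + (z[0] if z else 0.0)
        for i in range(n - 2):
            z[i] = b[i + 1] * xn - a[i + 1] * yn + z[i + 1]
        if n > 1:
            z[n - 2] = b[n - 1] * xn - a[n - 1] * yn
        y.append(yn)
    return y


def interpolate(x, factor, b, a):
    """Raise the sample rate by `factor` (2 for 16 kbps CVSD, 4 for 32 kbps)."""
    up = []
    for xn in x:
        up.append(factor * xn)               # scale to keep unity passband gain
        up.extend([0.0] * (factor - 1))      # insert factor - 1 zeros
    return iir_filter(b, a, up)
```

For example, with the trivial two-tap filter b = [0.5, 0.5], a = [1.0], 1:2 interpolation reduces to repeating each sample.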

In addition, a spectral test of the coded signal showed that its level around 3 kHz was more than 3 dB too high. A spectrum-adjustment module was therefore added at the encoder to attenuate the spectrum around 3 kHz by 3 dB as compensation.
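The text does not say how the 3 dB correction was realized. One common choice, shown here purely as an assumption, is a second-order peaking equalizer from R. Bristow-Johnson's "Audio EQ Cookbook", centred at 3 kHz with a gain of -3 dB. The Q value, the 8 kHz sample rate and the function names are hypothetical.

```python
import cmath
import math


def peaking_eq(f0, gain_db, q, fs):
    """Biquad peaking-EQ coefficients (b, a), normalised so that a[0] == 1."""
    amp = 10 ** (gain_db / 40)               # square root of the linear gain
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    cosw = math.cos(w0)
    b = [1 + alpha * amp, -2 * cosw, 1 - alpha * amp]
    a = [1 + alpha / amp, -2 * cosw, 1 - alpha / amp]
    return [c / a[0] for c in b], [c / a[0] for c in a]


def gain_at(b, a, f, fs):
    """Magnitude response |H(e^jw)| of the biquad at frequency f."""
    zi = cmath.exp(-1j * 2 * math.pi * f / fs)   # z^-1 on the unit circle
    num = b[0] + b[1] * zi + b[2] * zi * zi
    den = a[0] + a[1] * zi + a[2] * zi * zi
    return abs(num / den)
```

This design gives exactly the requested gain at f0 and unity gain at DC and at the Nyquist frequency. The rest of the band is therefore left essentially untouched.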

(2) Echo cancellation for the G.729 algorithm

The coding delay of G.729 is 15 ms (a 10 ms frame plus 5 ms of look-ahead), which makes echo clearly audible. An echo cancellation algorithm is therefore needed to suppress it.

In a typical adaptive echo canceller, the coefficient update accounts for most of the computation, and the cost grows with the filter order. Balancing the DSP's capacity against the algorithm's needs, a 128-tap adaptive filter is used.

The echo cancellation module has two inputs: the current frame of input speech and the synthesized speech previously output by the decoder. The canceller uses the decoder's synthesized speech to compensate the echo in one frame of the input signal. It then passes the echo-cancelled frame to the encoder as its input.
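The text specifies only a 128-tap adaptive filter. As an illustration, the sketch below assumes the normalized LMS (NLMS) update, a common choice for echo cancellers. The step size `mu`, the regularizer `eps` and the function name are hypothetical, and floating point stands in for the DSP's fixed-point arithmetic. `reference` plays the role of the decoder's synthesized speech, and `near_input` is the input speech that contains the echo.

```python
def nlms_echo_canceller(reference, near_input, taps=128, mu=0.5, eps=1e-6):
    """Return e[n] = near_input[n] - estimated echo, sample by sample."""
    w = [0.0] * taps                 # adaptive FIR model of the echo path
    x = [0.0] * taps                 # reference delay line, newest sample first
    out = []
    for ref, d in zip(reference, near_input):
        x.insert(0, ref)
        x.pop()
        echo_est = sum(wi * xi for wi, xi in zip(w, x))
        e = d - echo_est             # echo-cancelled sample passed to the encoder
        g = mu * e / (eps + sum(xi * xi for xi in x))
        for i in range(taps):        # coefficient update: the costly part
            w[i] += g * x[i]
        out.append(e)
    return out
```

The per-sample cost is dominated by the update loop over all taps, which is why the text identifies the coefficient update as the main expense and limits the order to 128.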

(3) Memory optimization [4]

1) Because direct (DP-based) addressing is used, a variable name specifies only an offset within the current data page. The four channels are processed in turn by the same program code using the same variables. Each channel can therefore use identically named variables on a different page without mixing up the channels' data, which keeps program memory small.

For the ADPCM program, the encoder and decoder state variables occupy 25 words each, and the working variables shared by the codec occupy 14 words. A DP page holds 128 words, so the variables of the first two channels can share one page and those of the last two channels another. Two channels then need 25 × 2 × 2 + 14 = 114 words < 128 words, which fits in one page.

Because two channels' variables now share one page, their names (offsets) must differ. Each codec program module therefore needs two versions, and the total program memory is twice that of a single-channel implementation.
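The page arithmetic can be checked with a short sketch. It relies on the C54x direct-addressing rule: the data address is the 9-bit DP concatenated with a 7-bit offset, giving 128-word pages. The layout is hypothetical. Each page holds two channels (labelled A and B), with a 25-word encoder state and a 25-word decoder state per channel plus one shared 14-word scratch area.

```python
PAGE_WORDS = 128        # 7-bit direct-address offset -> 128-word page
STATE_WORDS = 25        # one encoder or decoder state (from the text)
SCRATCH_WORDS = 14      # shared working variables (from the text)


def page_layout():
    """Offsets within one DP page shared by two channels (hypothetical order)."""
    layout, offset = {}, 0
    for ch in ("A", "B"):                       # the two channels on this page
        for part in ("enc", "dec"):
            layout[f"{part}_{ch}"] = offset     # distinct names -> distinct offsets
            offset += STATE_WORDS
    layout["scratch"] = offset
    offset += SCRATCH_WORDS
    assert offset <= PAGE_WORDS, "page overflow"
    return layout, offset


def data_address(dp, offset):
    """Direct-addressed data address: DP (9 bits) concatenated with offset (7 bits)."""
    assert 0 <= dp < 512 and 0 <= offset < PAGE_WORDS
    return (dp << 7) | offset
```

Under this layout, the "A" version of the code would serve one channel on each page and the "B" version the other. Only DP changes between the two pages, which matches the two-versions-of-the-code cost described above.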

2) The CVSD and ADPCM algorithms do not process in frames (they work sample by sample) and use the same bit-stream format, so they can share the input and output memory locations.

(4) Code optimization [3]

The TMS320C54x provides a powerful hardware architecture and instruction set for basic data-processing operations. In assembly language, making full use of the instruction set greatly reduces program complexity and increases speed. For example:

· The multiply-accumulate instructions MAC, MAS, etc. perform one multiplication and one addition (subtraction) in a single clock cycle;

· The DELAY instruction updates a delay-line variable in one cycle, simplifying filter implementation;

· With circular addressing, FIR and IIR filters need their buffer base address set only once in the main program, which reduces overhead;

· RPT+MVDD performs block moves, reducing the overhead of frequent memory reads and writes;

· Double-word instructions such as DADD and DADSUB operate on 32-bit variables;

· EXP+NORM quickly compute the exponent and mantissa of a fixed-point number;

· RPT+SUBC efficiently implements fixed-point division;

· RPT+FIRS efficiently implements symmetric FIR filters.
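Fixed-point division on the C54x is normally built from a repeated conditional subtract (RPT with SUBC). The sketch below emulates sixteen SUBC steps for unsigned 16-bit operands. After the sixteenth step the quotient sits in the low half of the accumulator and the remainder in the high half. Only the arithmetic is modelled; the 40-bit accumulator, status flags and signed-operand handling are not.

```python
def subc_divide(dividend, divisor):
    """Unsigned 16-bit division by sixteen SUBC steps (RPT #15 / SUBC).

    Returns (quotient, remainder) as they would appear in AL and AH.
    """
    assert 0 <= dividend <= 0xFFFF and 0 < divisor <= 0x7FFF
    acc = dividend
    for _ in range(16):
        diff = acc - (divisor << 15)    # ALU result of one SUBC step
        if diff >= 0:
            acc = (diff << 1) + 1       # subtraction fits: shift in a quotient 1
        else:
            acc <<= 1                   # otherwise shift in a quotient 0
    return acc & 0xFFFF, acc >> 16
```

This is ordinary restoring division, one quotient bit per step. That is why a single repeated instruction is enough to produce the full 16-bit quotient.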

4 Implementation results

Table 4 shows the computational complexity, memory usage and hardware resources of each algorithm as implemented.

A single VC5409 therefore provides enough hardware resources for all of the algorithms. The performance of each algorithm in the actual system has also passed the relevant standard tests.
