Real-Time Speech Compression Using Transform Coding on DSP TMS320E15


ABSTRACT:

Speech is the basic means of communication. Speech digitization opens many avenues, such as voice storage, digital transmission, error correction and encryption. There are many techniques of voice digitization. Broadly speaking, they fall into three categories: waveform coding, parametric coding and transform coding.

PCM, DPCM, ADPCM, DM, ADM, etc. are waveform coding techniques. They aim to reconstruct the speech waveform itself and exploit the sample-to-sample correlation of speech in the time domain. They provide good speech quality at bit rates from 64 kbps down to 16 kbps, with low codec complexity.
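To make "exploiting sample-to-sample correlation" concrete, here is a minimal first-order DPCM sketch: each sample is predicted from the previous reconstructed sample and only the quantized prediction error is transmitted. The step size, the 4-bit code range, the 8 kHz sampling rate and the function names are all illustrative assumptions, not parameters taken from any particular codec.

```python
import math

STEP = 0.1                  # hypothetical quantizer step for the residual
CODE_MIN, CODE_MAX = -8, 7  # assumed 4-bit code range


def dpcm_encode(x, step=STEP):
    """First-order DPCM: predict each sample by the previous reconstructed
    sample and transmit only the quantized prediction error."""
    codes, pred = [], 0.0
    for sample in x:
        code = round((sample - pred) / step)
        code = max(CODE_MIN, min(CODE_MAX, code))  # clip to 4 bits
        codes.append(code)
        pred += code * step  # mirror the decoder so both stay in step
    return codes


def dpcm_decode(codes, step=STEP):
    """Accumulate the dequantized residuals to rebuild the waveform."""
    out, pred = [], 0.0
    for code in codes:
        pred += code * step
        out.append(pred)
    return out


# One 64-sample block of a 500 Hz tone, assuming 8 kHz sampling.
x = [math.sin(2 * math.pi * 500 * n / 8000) for n in range(64)]
codes = dpcm_encode(x)
y = dpcm_decode(codes)
max_err = max(abs(a - b) for a, b in zip(x, y))
print(f"max |x - y| = {max_err:.4f}")
```

At 4 bits per sample and 8 kHz sampling this would correspond to 32 kbps. As long as the residual never needs clipping, the reconstruction error stays within half a quantizer step; a smaller step lowers granular noise but makes slope overload (clipping) more likely.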

Parametric coding techniques are based on the voice production mechanism: they code the parameters of a model of voice production rather than the waveform. These techniques provide bit rates from 2.4 kbps to 16 kbps at the cost of greater codec complexity. LPC, CELP, AC-CELP and LD-CELP are some of the well-known parametric coding techniques.
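As an illustration of what "coding the parameters" means, the sketch below shows only the analysis step of LPC: computing predictor coefficients (the vocal-tract parameters a vocoder would quantize and send) from a frame's autocorrelation using the Levinson-Durbin recursion. Pitch, gain and quantization are omitted, and the function names are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of one analysis frame (no normalization)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for a[1..order] such that
    x[n] is predicted as sum(a[k] * x[n - k]).

    Returns (coefficients, residual_energy, reflection_coefficients)."""
    a = [0.0] * (order + 1)  # a[0] is unused
    energy = r[0]
    reflection = []
    for i in range(1, order + 1):
        # Part of r[i] not already explained by the order-(i-1) predictor
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / energy
        reflection.append(k)
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        energy *= 1.0 - k * k  # prediction error shrinks at each order
    return a[1:], energy, reflection


# Normalized autocorrelation of a first-order process with coefficient 0.5.
coeffs, energy, refl = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(coeffs, energy, refl)  # [0.5, 0.0] 0.75 [0.5, 0.0]
```

For a real speech frame one would call `levinson_durbin(autocorrelation(frame, p), p)` with a typical order of around 10; the recursion needs only O(p^2) operations, which is one reason it suits small processors.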

Transform domain coding techniques operate at bit rates between those of the two families above, i.e., 16 kbps to 32 kbps. These techniques are of medium complexity and exploit the properties of speech in the transform domain.
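The transform-domain property most often exploited is energy compaction: a short block of speech concentrates most of its energy in a few frequency bins, so the coder can spend bits only where the energy is. The sketch below shows the principle on a synthetic two-tone block. The selection rule `keep_strongest` is a hypothetical stand-in, not the coding scheme used in this project, and a real coder would also quantize the kept coefficients and transmit their positions.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) DFT of one block."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse DFT; returns the real part, since the input block was real."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def keep_strongest(X, n_keep):
    """Hypothetical coefficient selection: keep the n_keep strongest bins in
    0..N/2 plus their conjugate mirrors, and zero everything else."""
    n = len(X)
    ranked = sorted(range(n // 2 + 1), key=lambda k: abs(X[k]), reverse=True)
    Y = [0j] * n
    for k in ranked[:n_keep]:
        Y[k] = X[k]
        Y[-k % n] = X[-k % n]  # mirror bin keeps the output real
    return Y


# 64-sample block (8 ms at an assumed 8 kHz): 500 Hz and 1500 Hz tones.
x = [math.sin(2 * math.pi * 500 * t / 8000) + 0.3 * math.sin(2 * math.pi * 1500 * t / 8000)
     for t in range(64)]
X = dft(x)
for n_keep in (1, 2):
    y = idft(keep_strongest(X, n_keep))
    err = max(abs(a - b) for a, b in zip(x, y))
    print(f"bins kept: {n_keep}, max error: {err:.3g}")
```

Keeping one bin pair leaves the weaker 1500 Hz tone as error; keeping two reconstructs the block to within rounding. Real speech is not this sparse, which is why the number of bins and bits per bin govern the quality/bit-rate trade-off.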

The project involved real-time speech compression using transform coding. The short-time Fourier transform of 8 ms speech blocks provides a frequency resolution of 125 Hz. These frequency components are judiciously coded to obtain bit rates ranging from 15 kbps to 30 kbps. It has been found that, whereas 15 kbps speech lacks intelligibility, 30 kbps speech provides almost natural quality. 25 kbps speech has been found to offer the best compromise between bit rate and intelligibility.
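The block length, frequency resolution and bit rates quoted above are linked by two simple relations: the DFT bin spacing is the reciprocal of the block duration, and the bits available per block are the bit rate times the block duration. The sampling rate is not stated here, so the sample count below assumes 8 kHz, as is typical for telephone-band speech.

```latex
\begin{aligned}
\Delta f &= \frac{1}{T_b} = \frac{1}{8\,\text{ms}} = 125\,\text{Hz} \\
N &= f_s\,T_b = 8000\,\text{Hz} \times 8\,\text{ms} = 64 \text{ samples} \quad (f_s = 8\,\text{kHz assumed}) \\
B &= R\,T_b: \quad 15\,\text{kbps} \to 120,\quad 25\,\text{kbps} \to 200,\quad 30\,\text{kbps} \to 240 \text{ bits per block}
\end{aligned}
```

At the preferred 25 kbps, the coder therefore has 200 bits per 8 ms block. Under the 8 kHz assumption, a real 64-sample block has 33 distinct bins from 0 to 4 kHz, so coding every bin would leave only about 6 bits per bin on average.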

The Fast Fourier Transform (FFT) algorithm is used for the DFT and IDFT computations on Texas Instruments' TMS320E15 digital signal processor. The hardware is built around the TMS320E15 with a 12-bit ADC/DAC and 8k x 16 RAM as data memory. The complete codec occupies about 1k x 16 bits of embedded program memory. For efficient implementation of the algorithms, the programs are written in assembly language, using the Evaluation Module (EVM) as the development and debugging tool. Processing 8 ms of speech takes about 7.2 ms, so real-time execution is achieved with a single processor. The sonograph, which represents speech in the frequency domain, has been used extensively to compare the output speech with the input speech and to arrive at the optimum bit rate.
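The FFT is what makes the 7.2 ms per 8 ms budget (about 90% processor load) feasible. The codec itself is hand-written assembly for the 16-bit fixed-point TMS320E15; the floating-point Python below is only a sketch of the radix-2 decimation-in-time algorithm and its inverse, and the 64-point size again assumes 8 kHz sampling.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]  # twiddle factor * odd term
        out[k] = even[k] + t           # butterfly, upper output
        out[k + n // 2] = even[k] - t  # butterfly, lower output
    return out


def ifft(X):
    """Inverse FFT via conjugation: ifft(X) = conj(fft(conj(X))) / N."""
    n = len(X)
    return [v.conjugate() / n for v in fft([c.conjugate() for c in X])]


# One 64-sample block (8 ms at an assumed 8 kHz) of a 500 Hz tone.
N = 64
x = [math.sin(2 * math.pi * 500 * t / 8000) for t in range(N)]
X = fft(x)
y = ifft(X)
print("|X[4]| =", round(abs(X[4]), 6))  # 500 Hz / 125 Hz per bin = bin 4
print("round-trip error:", max(abs(a - b) for a, b in zip(x, y)))
print("butterflies per block:", (N // 2) * int(math.log2(N)))
print(f"processor load: {7.2 / 8.0:.0%}")
```

For N = 64, the FFT needs (N/2) log2 N = 192 butterflies, each with one complex multiplication, versus N^2 = 4096 complex multiply-adds for a direct DFT; that reduction is what leaves room for coefficient coding within the 8 ms frame.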



