PTCEC356 Speech Processing Syllabus – Anna University Part time Regulation 2023

COURSE OBJECTIVES:

 Study the fundamentals of speech signals and extract various speech features
 Understand different speech coding techniques for speech compression applications
 Learn to build speech enhancement and text-to-speech synthesis systems

UNIT I FUNDAMENTALS OF SPEECH

Human speech production mechanism, Discrete-time model of speech production, Speech perception – human auditory system, Phonetics – articulatory phonetics, acoustic phonetics, and auditory phonetics, Categorization of speech sounds, Spectrographic analysis of speech sounds, Pitch frequency, Pitch period measurement in the spectral and cepstral domains, Formants, Evaluation of formants for voiced and unvoiced speech.

UNIT II SPEECH FEATURES AND DISTORTION MEASURES

Significance of speech features in speech-based applications, Speech Features – Cepstral Coefficients, Mel Frequency Cepstral Coefficients (MFCCs), Perceptual Linear Prediction (PLP), Log Frequency Power Coefficients (LFPCs), Speech distortion measures – Simplified distance measure, LPC-based distance measure, Spectral distortion measure, Perceptual distortion measure.
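As an illustration of the mel warping that underlies MFCCs, the following is a minimal Python sketch, not part of the official syllabus. It assumes the widely used 2595·log10(1 + f/700) mel formula, which the syllabus itself does not specify, and the function names are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    """Convert Hz to mels using the common 2595*log10(1 + f/700) convention (an assumption)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_center_frequencies(num_filters, f_low, f_high):
    """Center frequencies (Hz) of num_filters filters spaced evenly on the mel scale."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (num_filters + 1)
    return [mel_to_hz(m_low + step * (i + 1)) for i in range(num_filters)]

# Ten filter centers between 0 Hz and 4 kHz (values chosen only for illustration)
centers = mel_center_frequencies(10, 0.0, 4000.0)
```

Because the spacing is uniform in mels, the centers are packed closely at low frequencies and spread out at high frequencies, which roughly mirrors the frequency resolution of human hearing.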

UNIT III SPEECH CODING

Need for speech coding, Waveform coding of speech – PCM, Adaptive PCM, DPCM, ADPCM, Delta Modulation, Adaptive Delta Modulation, G.726 Standard for ADPCM, Parametric Speech Coding – Channel Vocoders, Linear Prediction Based Vocoders, Code Excited Linear Prediction (CELP) based Vocoders, Sinusoidal speech coding techniques, Hybrid coder, Transform domain coding of speech.
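To make the simplest of the waveform coders listed above concrete, here is a minimal Python sketch of linear (fixed step size) delta modulation. It is an illustrative example only; the function names, step size, and test signal are assumptions and do not come from the syllabus.

```python
import math

def dm_encode(samples, step):
    """Linear delta modulation: emit 1 if the input is at or above the running estimate, else 0."""
    bits = []
    estimate = 0.0
    for x in samples:
        bit = 1 if x >= estimate else 0
        estimate += step if bit else -step  # staircase approximation tracks the input
        bits.append(bit)
    return bits

def dm_decode(bits, step):
    """Rebuild the staircase approximation from the 1-bit stream."""
    out = []
    estimate = 0.0
    for bit in bits:
        estimate += step if bit else -step
        out.append(estimate)
    return out

# Slowly varying sine: the per-sample change stays below the step, so no slope overload occurs
signal = [math.sin(2.0 * math.pi * n / 200.0) for n in range(400)]
bits = dm_encode(signal, 0.1)
decoded = dm_decode(bits, 0.1)
```

With a fixed step, a fast-changing input would outrun the staircase (slope overload), while a large step increases granular noise; adaptive delta modulation varies the step to trade these off.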

UNIT IV SPEECH ENHANCEMENT

Classes of Speech Enhancement Algorithms, Spectral-Subtractive Algorithms – Multiband Spectral Subtraction, MMSE Spectral Subtraction Algorithm, Spectral Subtraction Based on Perceptual Properties, Wiener Filtering – Wiener Filters in the Time Domain, Wiener Filters in the Frequency Domain, Wiener Filters for Noise Reduction, Maximum-Likelihood Estimators, Bayesian Estimators, MMSE and Log-MMSE Estimator, Subspace Algorithms.
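The frequency-domain Wiener filter can be sketched as a per-bin gain applied to the noisy spectrum. The Python example below is a simplified illustration under stated assumptions: the clean-speech power is estimated by power spectral subtraction, the noise power per bin is assumed to be known, and the function name and `floor` parameter are hypothetical.

```python
def wiener_gain(noisy_power, noise_power, floor=0.0):
    """Per-bin Wiener gain G = Ps / (Ps + Pn), with Ps estimated as max(Py - Pn, 0)."""
    gains = []
    for p_y, p_n in zip(noisy_power, noise_power):
        p_s = max(p_y - p_n, 0.0)          # estimated clean-speech power in this bin
        denom = p_s + p_n
        g = p_s / denom if denom > 0.0 else 0.0
        gains.append(max(g, floor))        # optional gain floor to limit musical noise
    return gains

# Three bins with unit noise power: high SNR, zero SNR, and noise-dominated
gains = wiener_gain([4.0, 1.0, 0.5], [1.0, 1.0, 1.0])
```

Bins dominated by speech receive a gain close to 1, while bins at or below the noise level are attenuated toward the floor.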

UNIT V SPEECH SYNTHESIS AND APPLICATION

Text-to-Speech (TTS) systems, Synthesizer technologies – Concatenative synthesis, Use of Formants for concatenative synthesis, Use of LPC for concatenative synthesis, HMM-based synthesis, Sinewave synthesis, Speech transformations, Watermarking for authentication of speech, Emotion recognition from speech.

30 PERIODS
PRACTICAL EXERCISES: 30 PERIODS

1. Write a MATLAB Program to classify voiced and unvoiced segments of speech using various time-domain measures
2. Write a MATLAB Program to calculate the MFCC for a speech signal
3. Implement ITU-T G.722 Speech encoder in MATLAB
4. Write a MATLAB Program to implement Wiener Filters for Noise Reduction
5. Design a speech emotion recognition system using DCT and WPT in MATLAB
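As a starting point for the first exercise, the following sketch shows one possible voiced/unvoiced decision based on two time-domain measures, short-time energy and zero-crossing rate. It is written in Python rather than MATLAB purely for illustration; the thresholds, frame length, and synthetic test frames are assumptions, and a real classifier would tune them on recorded speech.

```python
import math
import random

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def classify_frame(frame, energy_thresh, zcr_thresh):
    """Label a frame 'voiced' if it has high energy and a low zero-crossing rate."""
    energy = short_time_energy(frame)
    zcr = zero_crossing_rate(frame)
    return "voiced" if energy >= energy_thresh and zcr <= zcr_thresh else "unvoiced"

# Synthetic 30 ms frames at an assumed 8 kHz sampling rate
fs = 8000
voiced_like = [0.5 * math.sin(2.0 * math.pi * 100.0 * n / fs) for n in range(240)]
rng = random.Random(0)
unvoiced_like = [rng.uniform(-0.05, 0.05) for _ in range(240)]

labels = [classify_frame(f, 0.01, 0.25) for f in (voiced_like, unvoiced_like)]
```

Voiced speech is quasi-periodic with most of its energy at low frequencies, so it tends to show high energy and few zero crossings; unvoiced speech is noise-like, with lower energy and many crossings.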

HARDWARE & SOFTWARE SUPPORT TOOLS:

 Personal Computer with MATLAB
 Microphone and Speakers

COURSE OUTCOMES:

At the end of this course, the students will be able to:
CO1: Understand the fundamentals of speech.
CO2: Extract various speech features for speech-related applications.
CO3: Choose an appropriate speech coder for a given application.
CO4: Build a speech enhancement system.
CO5: Build a text-to-speech synthesis system for various applications.

TOTAL: 60 PERIODS
TEXT BOOKS:

1. Shaila D. Apte, Speech and Audio Processing, Wiley India (P) Ltd, New Delhi, 2012
2. Philipos C. Loizou, Speech Enhancement Theory and Practice, Second Edition, CRC Press, Inc., United States, 2013

REFERENCES:

1. Rabiner L. R. and Juang B. H., Fundamentals of Speech Recognition, Pearson Education, 2003
2. Thomas F. Quatieri, Discrete-time speech signal processing – Principles and practice, Pearson, 2012.