Large Vocabulary in Continuous Speech Recognition Using HMM and Normal Fit
Hemakumar G, Punithavalli M, Thippeswamy K "Large Vocabulary in Continuous Speech Recognition Using HMM and Normal Fit". International Journal of Computer Trends and Technology (IJCTT) V42(2):102-107, December 2016. ISSN:2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.
Abstract
This paper addresses the problem of large-vocabulary, speaker-independent continuous speech recognition using phonemes, the Hidden Markov Model (HMM) and the Normal fit method. We first detect the voiced part of the speech signal by computing a dynamic threshold in each frame. Real cepstrum coefficients are extracted as features from the voiced frames. The Baum–Welch algorithm is applied to train on those features. The Normal fit technique is then applied, and the output values are labelled with the corresponding phoneme or syllable. The model is tested on five languages, namely English, Kannada, Hindi, Tamil and Telugu. The automatic segmentation of speech signals achieves an average accuracy of 95.42% and a miss rate of about 4.58%. For the large vocabulary, the average Word Recognition Rate (WRR) is 85.16% and the average Word Error Rate (WER) is 14.84%. All computations are done using MATLAB.
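The front-end steps named in the abstract (voiced-frame detection, real cepstrum extraction and a normal fit) can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation. The frame length, hop size, Hamming window, number of coefficients and the energy-based threshold rule (a fraction `k` of the mean frame energy) are all assumptions chosen for illustration, as are the function names. The abstract does not specify how the dynamic threshold is computed or which values the normal fit is applied to. HMM training with Baum–Welch is omitted.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def voiced_mask(frames, k=0.5):
    """Mark frames whose energy exceeds k times the mean frame energy.

    Assumption: this energy rule merely stands in for the paper's
    dynamic threshold, whose exact definition the abstract does not give.
    """
    energy = np.sum(frames ** 2, axis=1)
    return energy > k * energy.mean()

def real_cepstrum(frame, n_coeffs=12):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    windowed = frame * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.fft(windowed)) + 1e-12)  # avoid log(0)
    return np.fft.ifft(log_mag).real[:n_coeffs]

def normal_fit(values):
    """Estimate mean and sample standard deviation of a normal distribution."""
    values = np.asarray(values, dtype=float)
    return values.mean(), values.std(ddof=1)

# Synthetic example: half a second of faint noise followed by a 200 Hz tone.
fs = 8000
rng = np.random.default_rng(0)
t = np.arange(4000) / fs
signal = np.concatenate([0.001 * rng.standard_normal(4000),
                         0.5 * np.sin(2 * np.pi * 200 * t)])

frames = frame_signal(signal)
mask = voiced_mask(frames)
features = np.array([real_cepstrum(f) for f in frames[mask]])
mu, sigma = normal_fit(features[:, 1])  # fit one cepstral coefficient across frames
```

In a full recogniser, the cepstral features of the voiced frames would feed HMM training. Applying the fit to a single coefficient across frames is shown only to illustrate the estimation step.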
Keywords
Automatic Speech Recognition (ASR), Speech Enhancement, Speech Perception, HMM, Normal fit method.