Automatic speech recognition
Master
In Maynard (USA)
Description
-
Type
Master
-
Location
Maynard (USA)
-
Start date
Different dates available
6.345 introduces students to the rapidly developing field of automatic speech recognition. Its content is divided into three parts. Part I deals with background material in the acoustic theory of speech production, acoustic-phonetics, and signal representation. Part II describes algorithmic aspects of speech recognition systems including pattern classification, search algorithms, stochastic modelling, and language modelling techniques. Part III compares and contrasts the various approaches to speech recognition, and describes advanced techniques used for acoustic-phonetic modelling, robust speech recognition, speaker adaptation, processing paralinguistic information, speech understanding, and multimodal processing.
Facilities
Location
Start date
Start date
Reviews
Subjects
- Production
- Systems
- Materials
- Phonetics
- Algorithms
Course programme
Lectures: 2 sessions / week, 1.5 hours / session
This course introduces students to the rapidly developing field of automatic speech recognition. Its content is divided into three parts. Part I deals with background material in the acoustic theory of speech production, acoustic-phonetics, and signal representation. Part II describes algorithmic aspects of speech recognition systems including pattern classification, search algorithms, stochastic modelling, and language modelling techniques. Part III compares and contrasts the various approaches to speech recognition, and describes advanced techniques used for acoustic-phonetic modelling, robust speech recognition, speaker adaptation, processing paralinguistic information, speech understanding, and multimodal processing.
There will be two 90 minute lectures per week. To facilitate the coverage of a large quantity of material, copies of the lecture viewgraphs will be handed out. There will be no final exam for the course. Instead there will be two in-class quizzes each counting approximately 15% towards the final grade.
There will be weekly assignments consisting of both problems and mandatory laboratory work, so that students will be able to gain hands-on experience with the materials covered. Linux workstations will be made available to conduct laboratory work. A sign-up mechanism will be available via the 6.345 web-site to reserve time on these machines. Assignments must be turned in by the due date. Solutions will be provided along with the graded assignments. Each of the nine assignments will count approximately 5% towards the final grade.
During the last quarter of the course, assignments will end, and students will work on a term project that will count approximately 25% towards the final grade. Projects will be chosen in consultation with staff members, and typically involve creating and evaluating a speech recognizer along a dimension of interest to the student. Tool kits of key recognizer components will be provided, so that minimal programming skills are necessary.
A detailed outline of the class lectures and assignments is also available.
Lecturer: Jim Glass
Huang, Acero, and Hon. Spoken Language Processing. Upper Saddle River, NJ: Prentice-Hall, 2001. ISBN: 0130226165.
Jelinek. Statistical Methods for Speech Recognition. Cambridge, MA: MIT Press, 1998. ISBN: 0262100665.
Rabiner & Juang. Fundamentals of Speech Recognition. Upper Saddle River, NJ: Prentice-Hall, 1993. ISBN: 0130151572.
Duda, Hart, and Stork. Pattern Classification. New York, NY: Wiley & Sons, 2000. ISBN: 0471056693.
Stevens. Acoustic Phonetics. MIT Press, 1998. ISBN: 0262692503.
Don't show me this again
This is one of over 2,200 courses on OCW. Find materials for this course in the pages linked along the left.
MIT OpenCourseWare is a free & open publication of material from thousands of MIT courses, covering the entire MIT curriculum.
No enrollment or registration. Freely browse and use OCW materials at your own pace. There's no signup, and no start or end dates.
Knowledge is your reward. Use OCW to guide your own life-long learning, or to teach others. We don't offer credit or certification for using OCW.
Made for sharing. Download files for later. Send to friends and colleagues. Modify, remix, and reuse (just remember to cite OCW as the source.)
Learn more at Get Started with MIT OpenCourseWare
Automatic speech recognition