Automatic speech recognition

Master

In Maynard (USA)

Price on request

Description

  • Start date

    Different dates available

6.345 introduces students to the rapidly developing field of automatic speech recognition. Its content is divided into three parts. Part I deals with background material in the acoustic theory of speech production, acoustic-phonetics, and signal representation. Part II describes algorithmic aspects of speech recognition systems including pattern classification, search algorithms, stochastic modelling, and language modelling techniques. Part III compares and contrasts the various approaches to speech recognition, and describes advanced techniques used for acoustic-phonetic modelling, robust speech recognition, speaker adaptation, processing paralinguistic information, speech understanding, and multimodal processing.

Subjects

  • Production
  • Systems
  • Materials
  • Phonetics
  • Algorithms

Course programme

Lectures: 2 sessions / week, 1.5 hours / session


There will be two 90-minute lectures per week. To help cover a large quantity of material, copies of the lecture viewgraphs will be handed out. There will be no final exam for the course. Instead, there will be two in-class quizzes, each counting approximately 15% towards the final grade.


There will be weekly assignments consisting of both problems and mandatory laboratory work, so that students gain hands-on experience with the material covered. Linux workstations will be made available for laboratory work, and a sign-up mechanism on the 6.345 website will let students reserve time on these machines. Assignments must be turned in by the due date. Solutions will be provided along with the graded assignments. Each of the nine assignments will count approximately 5% towards the final grade.


During the last quarter of the course, assignments will end, and students will work on a term project that will count approximately 25% towards the final grade. Projects will be chosen in consultation with staff members, and typically involve creating and evaluating a speech recognizer along a dimension of interest to the student. Tool kits of key recognizer components will be provided, so that only minimal programming skills are necessary.
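The grading scheme described above (two quizzes at about 15% each, nine assignments at about 5% each, and a term project at about 25%) can be checked with a little arithmetic. The sketch below is only an illustration: the weights are the approximate figures quoted in the text, and the names `WEIGHTS`, `component_totals`, and `final_grade`, as well as the 0 to 100 score scale, are assumptions made for the example rather than anything the course defines.

```python
# Approximate weights from the course description, in percent per item.
# Each entry is (number of items, percent per item). Names are hypothetical.
WEIGHTS = {
    "quizzes": (2, 15),        # two in-class quizzes, ~15% each
    "assignments": (9, 5),     # nine assignments, ~5% each
    "term_project": (1, 25),   # one term project, ~25%
}


def component_totals(weights):
    """Return the total percentage contributed by each component."""
    return {name: count * pct for name, (count, pct) in weights.items()}


def final_grade(scores, weights=WEIGHTS):
    """Weighted final grade, assuming every score is on a 0-100 scale.

    `scores` maps each component name to a list with one score per item.
    """
    total = 0
    for name, (count, pct) in weights.items():
        items = scores[name]
        if len(items) != count:
            raise ValueError(f"expected {count} scores for {name!r}")
        total += pct * sum(items)
    return total / 100


totals = component_totals(WEIGHTS)
print(totals)                # {'quizzes': 30, 'assignments': 45, 'term_project': 25}
print(sum(totals.values()))  # 100

example = {
    "quizzes": [80, 90],
    "assignments": [100] * 9,
    "term_project": [70],
}
print(final_grade(example))  # 88.0
```

Under these figures the quizzes account for about 30% of the grade, the assignments for about 45%, and the project for about 25%, which together make up the full 100%.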


A detailed outline of the class lectures and assignments is also available.


Lecturer: Jim Glass


Huang, Acero, and Hon. Spoken Language Processing. Upper Saddle River, NJ: Prentice-Hall, 2001. ISBN: 0130226165.


Jelinek. Statistical Methods for Speech Recognition. Cambridge, MA: MIT Press, 1998. ISBN: 0262100665.


Rabiner & Juang. Fundamentals of Speech Recognition. Upper Saddle River, NJ: Prentice-Hall, 1993. ISBN: 0130151572.


Duda, Hart, and Stork. Pattern Classification. New York, NY: Wiley & Sons, 2000. ISBN: 0471056693.


Stevens. Acoustic Phonetics. Cambridge, MA: MIT Press, 1998. ISBN: 0262692503.

