Statistics and Data Science (B.A. or B.S.)

Postgraduate

In New Haven (USA)

Price on request

Description

  • Type

    Postgraduate

  • Location

    New Haven (USA)

Director of undergraduate studies: Sekhar Tatikonda, Rm. 338, 17 Hillhouse Ave., 432-4714; statistics.yale.edu; Major FAQ and guide; undergraduate major checklist

Facilities

Location

New Haven (USA)
06520

Start date

On request

About this course

Students who wish to major in Statistics and Data Science are encouraged to take S&DS 220 or a 100-level course followed by S&DS 230. Students should complete the calculus prerequisite and linear algebra requirement (MATH 222 or 225) as early as possible, as they provide mathematical background that is required in many courses. To fulfill the requirements of the certificate, students must take five courses from four different areas of statistical data analysis. No course may be applied to satisfy the requirements of both a major and the certificate. No single course may count for...

Subjects

  • Probability
  • Computational
  • Programming
  • Confidence Training
  • Medical training
  • Medical
  • Simulation
  • Statistics
  • Algorithms
  • Data analysis
  • Networks
  • Testing
  • Computing
  • Staff

Course programme

S&DS 101–106, Introduction to Statistics and Data Science

A basic introduction to statistics, including numerical and graphical summaries of data, probability, hypothesis testing, confidence intervals, and regression. Each course in this group focuses on applications to a particular field of study and is taught jointly by two instructors, one specializing in statistics and the other in the relevant area of application. The first seven weeks of classes are attended by all students in S&DS 101–106 together, as general concepts and methods of statistics are developed. The remaining weeks are divided into field-specific sections that develop the concepts with examples and applications. Computers are used for data analysis. These courses are alternatives; they do not form a sequence and only one may be taken for credit. No prerequisites beyond high school algebra. May not be taken after S&DS 100 or 109.

Students enrolled in S&DS 101–106 who wish to change to S&DS 109, or those enrolled in S&DS 109 who wish to change to S&DS 101–106, must submit a course change notice, signed by the instructor, to their residential college dean by Monday, October 2. The approval of the Committee on Honors and Academic Standing is not required.

S&DS 101a / E&EB 210a, Introduction to Statistics: Life Sciences  Jonathan Reuning-Scherer

Statistical and probabilistic analysis of biological problems, presented with a unified foundation in basic statistical theory. Problems are drawn from genetics, ecology, epidemiology, and bioinformatics.  QR
TTh 1pm-2:15pm

S&DS 102a / EP&E 203a / PLSC 452a, Introduction to Statistics: Political Science  Jonathan Reuning-Scherer

Statistical analysis of politics, elections, and political psychology. Problems presented with reference to a wide array of examples: public opinion, campaign finance, racially motivated crime, and public policy.  QR
TTh 1pm-2:15pm

S&DS 103a / EP&E 209a / PLSC 453a, Introduction to Statistics: Social Sciences  Jonathan Reuning-Scherer

Descriptive and inferential statistics applied to analysis of data from the social sciences. Introduction of concepts and skills for understanding and conducting quantitative research.  QR
TTh 1pm-2:15pm

S&DS 105a, Introduction to Statistics: Medicine  Jonathan Reuning-Scherer and Russell Barbour

Statistical methods used in medicine and medical research. Practice in reading medical literature competently and critically, as well as practical experience performing statistical analysis of medical data.  QR
TTh 1pm-2:15pm

S&DS 106a, Introduction to Statistics: Data Analysis  Jonathan Reuning-Scherer and William Brinda

An introduction to probability and statistics with emphasis on data analysis.  QR
TTh 1pm-2:15pm

Courses in Statistics and Data Science

S&DS 100b, Introductory Statistics  Staff

An introduction to statistical reasoning. Topics include numerical and graphical summaries of data, data acquisition and experimental design, probability, hypothesis testing, confidence intervals, correlation and regression. Application of statistical concepts to data; analysis of real-world problems. May not be taken after S&DS 101–106 or 109.  QR
MWF 10:30am-11:20am

S&DS 109a, Introduction to Statistics: Fundamentals  Jonathan Reuning-Scherer

General concepts and methods in statistics. Meets for the first half of the term only. May not be taken after S&DS 100 or 101–106.  ½ Course cr
TTh 1pm-2:15pm

[ S&DS 110, An Introduction to R for Statistical Computing and Data Science ]

S&DS 123b / CPSC 123b / S&DS 523b, YData: An Introduction to Data Science  Jessi Cisewski-Kehe

Computational, programming, and statistical skills are no longer optional in our increasingly data-driven world; these skills are essential for opening doors to manifold research and career opportunities. This course aims to dramatically enhance knowledge and capabilities in fundamental ideas and skills in data science, especially computational and programming skills along with inferential thinking. YData is an introduction to Data Science that emphasizes the development of these skills while providing opportunities for hands-on experience and practice. YData is accessible to students with little or no background in computing, programming, or statistics, but is also engaging for more technically oriented students through extensive use of examples and hands-on data analysis. Python 3, a popular and widely used computing language, is the language used in this course. The computing materials will be hosted on a special purpose web server.  QR
MWF 10:30am-11:20am

* S&DS 150b, Data Science Ethics  Elisa Celis

In this course, we introduce, discuss, and analyze ethical issues, algorithmic challenges, and policy decisions that arise when addressing real-world problems via the lens of data science. We grapple with the normative questions of what constitutes bias, fairness, discrimination, or ethics when it comes to data science and machine learning in applications such as policing, health, journalism, and employment. We incorporate technical precision by introducing quantitative measures that allow us to study how algorithms codify, exacerbate and/or introduce biases of their own, and study analytic methods of correcting for or eliminating these biases. Lastly, we study the social implications of these decisions, and understand the legal, political and policy decisions that could be used to govern data-driven decision making by making them transparent and auditable. We read critical commentary by practitioners, state-of-the-art technical papers by data scientists and computer scientists, and samples of legal scholarship, moral and ethical philosophy, readings in sociology, and policy documents. We often ground our discussions around recent case studies, controversies, and current events. Prerequisites: One from S&DS 238, S&DS 241, S&DS 242, or the equivalent; and one from S&DS 230, ECON 131, or the equivalent. Suggested courses: one from: CPSC 470, S&DS 365, ECON 429, CPSC 365, CPSC 366, or equivalent; and one from: EP&E 215, PHIL 175, PHIL 177, SOCY 144, PLSC 262, PLSC 320, or equivalent.  SO
TTh 1pm-2:15pm

* S&DS 160b / AMTH 160b / MATH 160b, The Structure of Networks  Ronald Coifman

Network structures and network dynamics described through examples and applications ranging from marketing to epidemics and the world climate. Study of social and biological networks as well as networks in the humanities. Mathematical graphs provide a simple common language to describe the variety of networks and their properties.  QR
TTh 11:35am-12:50pm

* S&DS 171b, YData: Text Data Science: An Introduction  Staff

Written language is the primary means by which humans document their observations of the world, including scientific discoveries, interpretations of history and art, health diagnoses, analyses of political events and economic trends, social interactions, and many others. Increasingly, this rapidly growing transcript is readily available in electronic form, and is being used in commercial applications and to advance scientific knowledge. Text Data Science is an introduction to computational and inferential methods that use text. The focus is on simple but often powerful text processing techniques that do not require linguistic analyses, to gain familiarity with working with text data. Sources used in the seminar include political speeches, Twitter feeds, scientific journals, online FAQ and discussion boards, Wikipedia, news articles, and consumer product reviews. Methodologies include scraping, wrangling, hashing, sorting, regressing, embedding, and probabilistic modeling. The course is based on the Python programming language within a cloud computing platform, and is paced to be accessible to students who have previously taken or are currently enrolled in YData (S&DS 123). Prerequisite: S&DS 123, which may be taken concurrently.  QR  ½ Course cr
Th 9:25am-11:15am

* S&DS 172b / EP&E 328b / PLSC 347b, YData: Data Science for Political Campaigns  Joshua Kalla

Political campaigns have become increasingly data driven. Data science is used to inform where campaigns compete, which messages they use, how they deliver them, and among which voters. In this course, we explore how data science is being used to design winning campaigns. Students gain an understanding of what data is available to campaigns, how campaigns use this data to identify supporters, and the use of experiments in campaigns. This course provides students with an introduction to political campaigns, an introduction to data science tools necessary for studying politics, and opportunities to practice the data science skills presented in S&DS 123, YData.
Prerequisite: S&DS 123, which may be taken concurrently.  QR  ½ Course cr
Th 9:25am-11:15am

S&DS 220b, Introductory Statistics, Intensive  Joseph Chang

Introduction to statistical reasoning for students with particular interest in data science and computing. Using the R language, topics include exploratory data analysis, probability, hypothesis testing, confidence intervals, regression, statistical modeling, and simulation. Computing taught and used extensively, as well as application of statistical concepts to analysis of real-world data science problems. MATH 115 is helpful but not required. While no particular prior experience in computing is required, strong motivation to practice and learn computing is desirable.  QR
TTh 9am-10:15am

S&DS 230a or b, Data Exploration and Analysis  Staff

Survey of statistical methods: plots, transformations, regression, analysis of variance, clustering, principal components, contingency tables, and time series analysis. The R computing language and Web data sources are used. Prerequisite: a 100-level Statistics course or equivalent, or with permission of instructor.  QR
HTBA

S&DS 238a, Probability and Statistics  Joseph Chang

Fundamental principles and techniques of probabilistic thinking, statistical modeling, and data analysis. Essentials of probability, including conditional probability, random variables, distributions, law of large numbers, central limit theorem, and Markov chains. Statistical inference with emphasis on the Bayesian approach: parameter estimation, likelihood, prior and posterior distributions, Bayesian inference using Markov chain Monte Carlo. Introduction to regression and linear models. Computers are used for calculations, simulations, and analysis of data. After or concurrently with MATH 118 or 120.  QR
TTh 1pm-2:15pm

S&DS 241a / MATH 241a, Probability Theory  Winston Lin

Introduction to probability theory. Topics include probability spaces, random variables, expectations and probabilities, conditional probability, independence, discrete and continuous distributions, central limit theorem, Markov chains, and probabilistic modeling. After or concurrently with MATH 120 or equivalent.  QR
MW 9am-10:15am

S&DS 242b / MATH 242b, Theory of Statistics  Andrew Barron

Study of the principles of statistical analysis. Topics include maximum likelihood, sampling distributions, estimation, confidence intervals, tests of significance, regression, analysis of variance, and the method of least squares. Some statistical computing. After S&DS 241 and concurrently with or after MATH 222 or 225, or equivalents.  QR
MWF 9:25am-10:15am

S&DS 262a / AMTH 262a / CPSC 362a, Computational Tools for Data Science  Roy Lederman

Introduction to the core ideas and principles that arise in modern data analysis, bridging statistics and computer science and providing students the tools to grow and adapt as methods and techniques change. Topics include principal component analysis, independent component analysis, dictionary learning, neural networks and optimization, as well as scalable computing for large datasets. Assignments will include implementation, data analysis and theory. Students require background in linear algebra, multivariable calculus, probability and programming. Prerequisites: after or concurrently with MATH 222, 225, or 231; after or concurrently with MATH 120, 230, or ENAS 151; after or concurrently with CPSC 100, 112, or ENAS 130; after S&DS 100-108 or S&DS 230 or S&DS 241 or S&DS 242.  QR
HTBA

S&DS 312a, Linear Models  Joseph Chang

The geometry of least squares; distribution theory for normal errors; regression, analysis of variance, and designed experiments; numerical algorithms, with particular reference to the R statistical language. After S&DS 242 and MATH 222 or 225.  QR
MW 11:35am-12:50pm

* S&DS 314b, Introduction to Causal Inference  Winston Lin

Introduction to causal inference with applications to the social and health sciences. Topics include randomized experiments, matching and propensity score methods, sensitivity analysis, instrumental variables, and regression discontinuity designs. Mathematical problems, data analysis in R, and critical discussions of published applied research. Prerequisite: S&DS 242 and some programming experience in R.  QR
HTBA

S&DS 315a / PLSC 340, Measuring Impact and Opinion Change  Joshua Kalla

This course introduces students to measuring impact. Political campaigns, marketers, governments, and non-profit organizations regularly try to produce opinion change through TV, radio, online ads, mail, and door-to-door canvassing. Are these efforts successful at producing opinion change? In this course, we learn how to use experiments and natural experiments to measure the impact of opinion change efforts, and how to be appropriately skeptical of findings that claim to measure impact. This course also teaches data analysis skills in R. Prerequisite: S&DS 242 and some programming experience in R.  QR
HTBA

S&DS 351b / EENG 434b / MATH 251b, Stochastic Processes  Amin Karbasi

Introduction to the study of random processes including linear prediction and Kalman filtering, Poisson counting process and renewal processes, Markov chains, branching processes, birth-death processes, Markov random fields, martingales, and random walks. Applications chosen from communications, networking, image reconstruction, Bayesian statistics, finance, probabilistic analysis of algorithms, and genetics and evolution. Prerequisite: S&DS 241 or equivalent.  QR
MW 1pm-2:15pm

S&DS 352b / MB&B 452b / MCDB 452b, Biomedical Data Science, Mining and Modeling  Mark Gerstein and Matthew Simon

Techniques in data mining and simulation applied to bioinformatics, the computational analysis of gene sequences, macromolecular structures, and functional genomics data on a large scale. Sequence alignment, comparative genomics and phylogenetics, biological databases, geometric analysis of protein structure, molecular-dynamics simulation, biological networks, microarray normalization, and machine-learning approaches to data integration. Prerequisites: MB&B 301 and MATH 115, or permission of instructor.  SC
MW 1pm-2:15pm

S&DS 355a, Introductory Machine Learning  John Lafferty

This course covers the key ideas and techniques in machine learning without the use of advanced mathematics. Basic methodology and relevant concepts are presented in lectures, including the intuition behind the methods. Assignments give students hands-on experience with the methods on different types of data. Topics include linear regression and classification, tree-based methods, clustering, topic models, word embeddings, recurrent neural networks, dictionary learning and deep learning. Examples come from a variety of sources including political speeches, archives of scientific articles, real estate listings, natural images, and several others. Programming is central to the course, and is based on the Python programming language. Prerequisites: Two of the following courses: S&DS 230, 238, 240, 241 and 242; previous programming experience (e.g., R, Matlab, Python, C++), Python preferred.  QR
HTBA

S&DS 361b / AMTH 361b, Data Analysis  Staff

Selected topics in statistics explored through analysis of data sets using the R statistical computing language. Topics include linear and nonlinear models, maximum likelihood, resampling methods, curve estimation, model selection, classification, and clustering. After S&DS 242 and MATH 222 or 225, or equivalents.  QR
MW 2:30pm-3:45pm

S&DS 363b, Multivariate Statistics for Social Sciences  Jonathan Reuning-Scherer

Introduction to the analysis of multivariate data as applied to examples from the social sciences. Topics include principal components analysis, factor analysis, cluster analysis (hierarchical clustering, k-means), discriminant analysis, multidimensional scaling, and structural equations modeling. Extensive computer work using either SAS or SPSS programming software. Prerequisites: knowledge of basic inferential procedures and experience with linear models.  QR
TTh 1pm-2:15pm

S&DS 364b / AMTH 364b / EENG 454b, Information Theory  Andrew Barron
