The field of data science deals with the theories, methodologies and tools for applying statistical concepts and computational techniques to data analysis problems in science, engineering, medicine, business and other areas. Data science is a highly interdisciplinary field, with extensive applications in economics, biology and health care, as well as in quantitative social science, global health, environmental science and the humanities (e.g., digital media). The Data Science program equips students with the concepts, theorems, algorithms and methodologies that are essential for data analysis, knowledge discovery and information extraction in practical applications.

Courses listed below are recommended electives for the major. Students can also select other courses in different divisions as electives.

This major prepares students for a variety of careers requiring expertise in public administration, international development, political risk analysis and multinational investment, as well as for work in the non-profit sector, at both the domestic and international levels. Graduates may also pursue further studies in economics, management, public policy, politics and other areas.

The fundamental concepts and tools of calculus, probability, and linear algebra are essential to modern sciences, from the theories of physics and chemistry that have long been tightly coupled to mathematical ideas, to the collection and analysis of data on complex biological systems. Given the emerging technologies for collecting and sharing large data sets, some familiarity with computational and statistical methods is now also essential for modeling biological and physical systems and interpreting experimental results. MF1 is an introduction to differential and integral calculus that focuses on the concepts necessary for understanding the meaning of differential equations and their solutions. It includes an introduction to a software package for the numerical solution of ordinary differential equations.

This course focuses on the concept of energy and its relevance for explaining the behavior of natural systems. The conservation of energy and the transformations of energy from one form to another are crucial to the function of all systems, including familiar mechanical devices, molecular structures and reactions, and living organisms and ecosystems. By integrating perspectives from physics, chemistry, and biology, this course helps students see both the elegant simplicity of universal laws governing all physical systems and the intricate mechanisms at play in the biosphere. Topics include kinetic energy, potential energy, quantization of energy, energy conservation, and cosmological and ecological processes.

The fundamental concepts and tools of calculus, probability, and linear algebra are essential to modern sciences, from the theories of physics and chemistry that have long been tightly coupled to mathematical ideas, to the collection and analysis of data on complex biological systems. Given the emerging technologies for collecting and sharing large data sets, some familiarity with computational and statistical methods is now also essential for modeling biological and physical systems and interpreting experimental results. MF2 is an introduction to probability and statistics with an emphasis on concepts relevant to the analysis of complex data sets. It includes an introduction to the fundamental concepts of matrices, eigenvectors, and eigenvalues.

This course focuses on the collective behavior of systems composed of many interacting components. The phenomena of interest range from the simple relaxation of a gas into an equilibrium state of well-defined pressure and temperature to the emergence of ever-increasing complexity in living organisms and the biosphere. The course provides an overview of some fundamental differences between traditional disciplines, as well as indications of how they complement each other in some important contexts. Topics include thermodynamic (statistical mechanical) equilibrium; the fundamental concepts of temperature, entropy, free energy, and chemical equilibrium; driven systems; and the fundamentals of biological and ecological systems.

Integrated Science 3 emphasizes the physics and chemistry concepts of oscillating systems, waves, and fields, and includes applications to human perception. In addition to their fundamental importance to physics and chemistry proper, these ideas are essential for developing an awareness of the principles employed by engineers in the construction of the electrical and optical devices that are ubiquitous in modern civilization. Topics include harmonic oscillators, sound waves, light, and reaction-diffusion patterns.

This major prepares graduates for advanced study in computer science, mathematics, statistics and related areas, and for careers in fields such as science, engineering, health care, finance and economics, as well as quantitative social science.

Integrated Science 4 has more of a chemistry/biology emphasis, with physics brought to bear as needed. It treats topics relevant to understanding organisms, biochemical engineering, and the environment. Topics include evolution, modern biology, ecosystems, hydrology, and climate.

This course covers key areas of scientific communication that scientists need to understand and master in order to successfully promote their research and careers. Students will learn to recognize and construct logical arguments and will become familiar with the structure of common publication formats. The course helps students advance their skills in communicating findings in textual, visual and verbal formats to a variety of audiences.

This course covers data and representations, functions, conditionals, loops, strings, lists, sets, maps, hash tables, trees, stacks, graphs, object-oriented programming, programming interfaces and software engineering.

This course covers maximum likelihood estimation, linear discriminant analysis, logistic regression, support vector machines, decision trees, linear regression, Bayesian inference, unsupervised learning, and semi-supervised learning.

This course covers statistical inference, parametric methods, sparsity, nonparametric methods, learning theory, kernel methods, computational algorithms and advanced learning topics.

This course introduces the principles and methodologies of data acquisition and visualization, along with the tools and techniques used to clean and process data for visual analysis. It also covers practical software tools and languages such as Tableau, OpenRefine and Python/MATLAB.

This course covers interdisciplinary applications of data analysis in social science, behavioral modeling, health care, financial modeling, advanced manufacturing and other fields. In their course projects, students are expected to solve a number of practical problems by implementing data analysis algorithms in R.

This course covers probability models, random variables with discrete and continuous distributions, independence, joint distributions, conditional distributions, expectations, functions of random variables, the central limit theorem, stochastic processes, random walks, and Markov chains.

This course covers the pseudoinverse, inner products, vector spaces and subspaces, orthogonality, linear transformations and operators, projections, matrix factorization, and singular value decomposition.

This course covers Gaussian elimination, LU factorization, Cholesky decomposition, QR decomposition, the Newton-Raphson method, binary search, convex functions, convex sets, gradient methods, Newton's method, Lagrange duality, KKT conditions, interior-point methods, the conjugate gradient method, random walks, and stochastic optimization.

This course covers sorting, order statistics, binary search, dynamic programming, greedy algorithms, graph algorithms, minimum spanning trees, shortest paths, SQL, file organization, hashing, queries, schemas, transaction management, concurrency control, crash recovery, distributed databases, and database as a service.

This course covers Bayesian inference, prior and posterior distributions, multilevel models, model checking and selection, and stochastic simulation by Markov chain Monte Carlo.

This course covers image formation and representation, camera geometry and calibration, multi-view geometry, stereo, 3D reconstruction from images, motion analysis, image segmentation, and object recognition.

This course covers Boolean retrieval, dictionaries, indexes, the vector space model, scoring, queries, XML retrieval, language models, text classification, clustering, and web search.

This course covers speech production and perception, template-based recognition, hidden Markov modeling, language processing, robust recognition, speech inference, multimodal interfaces, and applications.

This course covers cloud infrastructures, virtualization, distributed file systems, software-defined networks and storage, cloud storage, and programming models such as MapReduce and Spark.

This course covers neural networks, deep belief networks, Boltzmann machines, convolutional neural networks, recurrent neural networks, and deep learning applications to speech, images, video and other domains.

This course covers Bayesian networks, Markov random fields, Gaussian graphical models, message passing, generalized linear models, expectation-maximization, factor analysis, state space models, conditional random fields, variational inference, approximate inference, Dirichlet processes, kernel graphical models and spectral algorithms.

This course covers uninformed search, informed search, constraint satisfaction, classical planning, neural networks, deep learning, hidden Markov models, Bayesian networks, Markov decision processes, reinforcement learning, active learning and game theory.

As an introductory course in data science, this course shows students both the big picture of data science and the essential skills of loading, cleaning, manipulating, visualizing, analyzing and interpreting data, through hands-on programming experience.

This course introduces the logical structure of digital media and uses the Python programming language to explore computational media manipulation and transformation. Topics include spatial and temporal resolution, color, texture, filtering, compression and feature detection.

Prerequisite(s): STATS 302 Principles of Machine