Majors

At Duke Kunshan University, each major consists of an interdisciplinary set of courses that integrate different forms of knowledge and a distinct set of disciplinary courses that provide expertise in specific areas.

DATA SCIENCE
The field of data science deals with the theories, methodologies and tools for applying statistical concepts and computational techniques to data analysis problems in science, engineering, medicine, business, and other areas. Data science is a highly interdisciplinary field. It can be applied extensively to economics, biology, and health care, as well as to quantitative social science, including global health, environmental science, and the humanities (e.g., digital media). The Data Science program equips students with the concepts, theorems, algorithms and methodologies that are essential for data analysis, knowledge learning and information extraction in practical applications.

Required Courses

Divisional Foundation Courses
Interdisciplinary Courses
Disciplinary Courses

Recommended Electives for the Major

Courses listed below are recommended electives for the major. Students can also select other courses in different divisions as electives.

Career Path

This major prepares graduates for advanced study in computer science, math, and statistics and for careers in fields such as science, engineering, health care, finance and economics as well as quantitative social science.

Mathematical Foundations 1

The fundamental concepts and tools of calculus, probability, and linear algebra are essential to modern sciences, from the theories of physics and chemistry that have long been tightly coupled to mathematical ideas, to the collection and analysis of data on complex biological systems. Given the emerging technologies for collecting and sharing large data sets, some familiarity with computational and statistical methods is now also essential for modeling biological and physical systems and interpreting experimental results. MF1 is an introduction to differential and integral calculus that focuses on the concepts necessary for understanding the meaning of differential equations and their solutions. It includes an introduction to a software package for numerical solution of ordinary differential equations.

Integrated Science 1

This course focuses on the concept of energy and its relevance for explaining the behavior of natural systems. The conservation of energy and the transformations of energy from one form to another are crucial to the function of all systems, including familiar mechanical devices, molecular structures and reactions, and living organisms and ecosystems. By integrating perspectives from physics, chemistry, and biology, this course helps students see both the elegant simplicity of universal laws governing all physical systems and the intricate mechanisms at play in the biosphere. Topics include kinetic energy, potential energy, quantization of energy, energy conservation, and cosmological and ecological processes.

Mathematical Foundations 2

The fundamental concepts and tools of calculus, probability, and linear algebra are essential to modern sciences, from the theories of physics and chemistry that have long been tightly coupled to mathematical ideas, to the collection and analysis of data on complex biological systems. Given the emerging technologies for collecting and sharing large data sets, some familiarity with computational and statistical methods is now also essential for modeling biological and physical systems and interpreting experimental results. MF2 is an introduction to probability and statistics with an emphasis on concepts relevant for the analysis of complex data sets. It includes an introduction to the fundamental concepts of matrices, eigenvectors, and eigenvalues.

Integrated Science 2

This course focuses on the collective behavior of systems composed of many interacting components. The phenomena of interest range from the simple relaxation of a gas into an equilibrium state of well-defined pressure and temperature to the emergence of ever-increasing complexity in living organisms and the biosphere. The course provides an overview of some fundamental differences between traditional disciplines as well as indications of how they complement each other in some important contexts. Topics include thermodynamic (statistical mechanical) equilibrium; fundamental concepts of temperature, entropy, free energy, and chemical equilibrium; driven systems; and fundamentals of biological and ecological systems.

Integrated Science 3

Integrated Science 3 emphasizes the physics and chemistry concepts of oscillating systems, waves, and fields, and includes applications to human perception. In addition to their fundamental importance to physics and chemistry proper, these ideas are essential for developing an awareness of the principles employed by engineers in the construction of the electrical and optical devices that are ubiquitous in modern civilization. Topics include harmonic oscillators, sound waves, light, and reaction-diffusion patterns.

Integrated Science 4

Integrated Science 4 has more of a chemistry/biology emphasis, with physics brought to bear as needed. It treats topics relevant to understanding organisms, biochemical engineering, and the environment. Topics include evolution, modern biology, ecosystems, hydrology, and climate.

Scientific Writing and Presentations II

Scientific Writing and Presentations covers areas of scientific communication that a scientist needs to know and master in order to successfully promote his or her research and career. Students will learn to recognize and construct logical arguments and become familiar with the structure of common publication formats. The course helps students advance their skills in communicating findings in textual, visual and verbal formats for a variety of audiences.

Scientific Writing and Presentations I

Scientific Writing and Presentations covers areas of scientific communication that a scientist needs to know and master in order to successfully promote his or her research and career. Students will learn to recognize and construct logical arguments and become familiar with the structure of common publication formats. The course helps students advance their skills in communicating findings in textual, visual and verbal formats for a variety of audiences.

Introduction to Programming and Data Structures

This course covers data and representations, functions, conditions, loops, strings, lists, sets, maps, hash tables, trees, stacks, graphs, object-oriented programming, programming interface and software engineering.

Principles of Machine Learning

This course covers maximum likelihood estimation, linear discriminant analysis, logistic regression, support vector machine, decision tree, linear regression, Bayesian inference, unsupervised learning, and semi-supervised learning.

Statistical Machine Learning

This course covers statistical inference, parametric methods, sparsity, nonparametric methods, learning theory, kernel methods, computational algorithms and advanced learning topics.

Data Acquisition and Visualization

This course introduces the principles and methodologies for data acquisition and visualization, along with tools and techniques used to clean and process data for visual analysis. It also covers practical software tools and languages such as Tableau, OpenRefine and Python/MATLAB.

Interdisciplinary Data Analysis

This course covers interdisciplinary applications of data analysis for social science, behavioral modeling, health care, financial modeling, advanced manufacturing, etc. Students are expected to solve a number of practical problems by implementing data algorithms with R during their course projects.

Probability, Random Variables and Stochastic Processes

This course covers probability models, random variables with discrete and continuous distributions, independence, joint distributions, conditional distributions, expectations, functions of random variables, central limit theorem, stochastic processes, random walks, and Markov chains.

Advanced Linear Algebra

This course covers pseudo inverse, inner product, vector spaces and subspaces, orthogonality, linear transformations and operators, projections, matrix factorization, and singular value decomposition.

Numerical Analysis and Optimization

This course covers Gaussian elimination, LU factorization, Cholesky decomposition, QR decomposition, Newton-Raphson method, binary search, convex function, convex set, gradient method, Newton method, Lagrange dual, KKT condition, interior point method, conjugate gradient method, random walk, and stochastic optimization.

Algorithms and Databases

This course covers sorting, order statistics, binary search, dynamic programming, greedy algorithms, graph algorithms, minimum spanning trees, shortest paths, SQL, file organization, hashing, sorting, query, schema, transaction management, concurrency control, crash recovery, distributed databases, and database as a service.

Bayesian and Modern Statistics

This course covers Bayesian inference, prior and posterior distributions, multi-level models, model checking and selection, and stochastic simulation by Markov Chain Monte Carlo.

Computer Vision

This course covers image formation and representation, camera geometry and calibration, multi-view geometry, stereo, 3D reconstruction from images, motion analysis, image segmentation, and object recognition.

Search Engines

This course covers Boolean retrieval, dictionary, index, vector space model, score, query, XML, language model, text classification, clustering, and web search.

Speech Recognition

This course covers speech production and perception, template-based recognition, hidden Markov modeling, language processing, robust recognition, speech inference, multimodal interface and applications.

Cloud Computing

This course covers cloud infrastructures, virtualization, distributed file system, software defined networks and storage, cloud storage, and programming models such as MapReduce and Spark.

Deep Learning

This course covers neural network, deep belief network, Boltzmann machine, convolutional neural network, recurrent neural network, and deep learning applications for speech, image, video, etc.

Probabilistic Graphical Models

This course covers Bayesian network, Markov random field, Gaussian graphical model, message passing, generalized linear model, expectation-maximization, factor analysis, state space model, conditional random field, variational inference, approximate inference, Dirichlet process, kernel graphical model and spectral algorithm.

Artificial Intelligence

This course covers uninformed search, informed search, constraint satisfaction, classical planning, neural network, deep learning, hidden Markov model, Bayesian network, Markov decision process, reinforcement learning, active learning and game theory.

Introduction to Data Science

This introductory course shows students not only the big picture of data science but also the essential skills of loading, cleaning, manipulating, visualizing, analyzing and interpreting data through hands-on programming experience.

Image Data Science

This course introduces the logical structure of digital media and explores computational media manipulation. The course uses the Python programming language to explore media manipulation and transformation. Topics include spatial and temporal resolution, color, texture, filtering, compression and feature detection.

Computer Vision

This course covers image formation and representation, camera geometry and calibration, multi-view geometry, stereo, 3D reconstruction from images, motion analysis, image segmentation, and object recognition.

Prerequisite(s): STATS 302 Principles of Machine