MA3402 Mathematical Foundations for Data Science Syllabus – Anna University Regulation 2021
UNIT I VECTOR SPACES & LINEAR MAPS
Vector Spaces: Definition of Vector Spaces, Subspaces, Sums of Subspaces, Direct Sums, Span and Linear Independence, Bases, Dimension – Definition of Linear Maps – Algebraic Operations on L(V,W) – Null Spaces and Ranges – Fundamental Theorem of Linear Maps – Representing a Linear Map by a Matrix – Invertible Linear Maps – Isomorphic Vector Spaces – Linear Map as Matrix Multiplication – Operators – Products and Quotients of Vector Spaces.
Suggested Activities:
● Exploring the use of vector spaces and linear maps
● Solving a problem by choosing an appropriate representation for the given data.
Suggested Evaluation Methods:
● Assignments on problem solving using vector space and linear maps
● Tutorials on vector space and linear maps
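As an illustration of the Unit I topics, here is a minimal pure-Python sketch that finds a basis and the dimension of a span by row reduction, and checks the Fundamental Theorem of Linear Maps, dim V = dim null T + dim range T, for a map T(x) = Ax. It assumes the map is given by a matrix with rational entries; the helper names `rref`, `span_basis`, and `rank_nullity` are hypothetical and not part of any prescribed library.

```python
from fractions import Fraction

# Hypothetical helpers for illustration; exact Fraction arithmetic avoids rounding.


def rref(rows):
    """Return the reduced row echelon form of a matrix and its pivot columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    pivots = []
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break
        # Look for a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Scale the pivot row so that the pivot entry becomes 1.
        p = m[r][c]
        m[r] = [x / p for x in m[r]]
        # Clear column c in every other row.
        for i in range(n_rows):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots


def span_basis(vectors):
    """Return a linearly independent sublist of `vectors` with the same span."""
    # Place the vectors as columns; the pivot columns mark the independent ones.
    columns_as_matrix = [list(row) for row in zip(*vectors)]
    _, pivots = rref(columns_as_matrix)
    return [vectors[j] for j in pivots]


def rank_nullity(matrix):
    """For T(x) = A x with A of size m x n, return (dim range T, dim null T)."""
    _, pivots = rref(matrix)
    rank = len(pivots)
    return rank, len(matrix[0]) - rank


if __name__ == "__main__":
    vectors = [[1, 2, 3], [2, 4, 6], [0, 1, 1]]
    basis = span_basis(vectors)
    print(basis, "dimension:", len(basis))  # [[1, 2, 3], [0, 1, 1]] dimension: 2
    print(rank_nullity([[1, 2, 3], [2, 4, 6]]))  # (1, 2), and 1 + 2 = 3 = dim R^3
```

The second vector is twice the first, so it is dropped from the basis; for the 2 x 3 matrix, the range is 1-dimensional and the null space is 2-dimensional, adding up to the dimension of the domain R^3.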
UNIT II EIGENVALUES, EIGENVECTORS, AND INNER PRODUCT SPACES
Eigenvalues and Eigenvectors – Eigenvectors and Upper-Triangular Matrices – Eigenspaces and Diagonal Matrices – Inner Products and Norms – Linear Functionals on Inner Product Spaces – Orthogonal Complements and Minimization Problems – Adjoints – Self-Adjoint Operators – Normal Operators.
Suggested Activities:
● Exploring the use of eigenvalues and eigenvectors
● External learning – different structures on norms and inner products
● Identifying the relationship between inner products and norms
Suggested Evaluation Methods:
● Assignments on problem solving using eigenvalues and eigenvectors
● Tutorials on inner products and norms
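For the Unit II topics, the sketch below assumes the standard Euclidean inner product on R^n. It computes the eigenvalues and eigenvectors of a real symmetric 2 x 2 matrix (a self-adjoint operator on R^2, so its eigenvalues are real and eigenvectors for distinct eigenvalues are orthogonal), and uses Gram-Schmidt to compute an orthogonal projection, which solves the minimization problem of finding the point of a subspace closest to a given vector. The function names are hypothetical; real work would normally use a numerical library.

```python
import math

# Hypothetical helpers for illustration, using the Euclidean inner product.


def dot(u, v):
    """Standard inner product <u, v> on R^n."""
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    """Norm induced by the inner product: ||v|| = sqrt(<v, v>)."""
    return math.sqrt(dot(v, v))


def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal list with the same span (modified Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = list(v)
        # Remove the component of w along each orthonormal vector found so far.
        for e in basis:
            c = dot(w, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        n = norm(w)
        if n > tol:  # skip vectors that already lie in the span
            basis.append([wi / n for wi in w])
    return basis


def project(v, vectors):
    """Orthogonal projection of v onto span(vectors): the closest point to v."""
    p = [0.0] * len(v)
    for e in gram_schmidt(vectors):
        c = dot(v, e)
        p = [pi + c * ei for pi, ei in zip(p, e)]
    return p


def symmetric_2x2_eigen(a, b, d):
    """Eigenpairs of the self-adjoint matrix [[a, b], [b, d]].

    The eigenvalues are the roots of lam^2 - (a + d) lam + (ad - b^2); the
    discriminant is ((a - d)/2)^2 + b^2 >= 0, so both roots are real.
    """
    if b == 0:
        # Already diagonal: the standard basis vectors are eigenvectors.
        return [(a, [1.0, 0.0]), (d, [0.0, 1.0])]
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    # [b, lam - a] solves (a - lam) x + b y = 0, the first row of (A - lam I) v = 0.
    return [(lam, [b, lam - a]) for lam in (mean + radius, mean - radius)]


if __name__ == "__main__":
    for lam, vec in symmetric_2x2_eigen(2.0, 1.0, 2.0):
        print(lam, vec)  # 3.0 [1.0, 1.0] then 1.0 [1.0, -1.0]
    print(project([1.0, 2.0, 3.0], [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]))  # [1.0, 2.0, 0.0]
```

The two eigenvectors printed for [[2, 1], [1, 2]] have inner product 0, as expected for a self-adjoint operator, and projecting (1, 2, 3) onto the xy-plane drops only the component in the orthogonal complement.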
UNIT III PROBABILITY, RANDOM PROCESS AND STATISTICAL METHODS
Probability Theory and Axioms – Random Variables – Discrete and Continuous Random Variables and Probability Distributions – Joint Probability Distributions – Random Processes: Definitions – Stationarity in Random Processes – Strict-Sense and Wide-Sense Stationary Processes – Autocorrelation and Its Properties – Poisson and Gaussian Processes and Their Properties – Expectations and Moments; Covariance and Correlation – Statistics and Sampling Distributions – Method of Moments.
Suggested Activities:
● Demonstrating the probability distribution of different data sets.
● Exploring the correlations between different features of the data
● Identifying the use of statistical methods in data analysis
Suggested Evaluation Methods:
● Assignments on probability and random process
● Tutorials on statistical methods in data analysis
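For covariance, correlation, and the method of moments in Unit III, here is a minimal sketch using only Python's standard library. The Gamma distribution is an assumed example, chosen because its first two moments, E[X] = kθ and Var[X] = kθ², can be solved for k and θ in closed form; the syllabus does not prescribe a particular distribution, and the function names are hypothetical.

```python
import math
import random

# Hypothetical helpers for illustration.


def mean(xs):
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Sample covariance with the 1/(n - 1) normalisation."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


def gamma_method_of_moments(xs):
    """Method-of-moments estimates (k, theta) for a Gamma(shape k, scale theta) sample.

    Equates E[X] = k*theta and Var[X] = k*theta^2 with the sample mean and the
    second central sample moment, then solves: theta = m2/m1, k = m1^2/m2.
    """
    m1 = mean(xs)
    m2 = sum((x - m1) ** 2 for x in xs) / len(xs)
    theta = m2 / m1
    return m1 / theta, theta


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for a reproducible run
    sample = [rng.gammavariate(2.0, 3.0) for _ in range(50_000)]  # shape 2, scale 3
    k_hat, theta_hat = gamma_method_of_moments(sample)
    print(f"k_hat = {k_hat:.2f}, theta_hat = {theta_hat:.2f}")  # close to 2 and 3
    print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0 up to rounding
```

Python's `random.gammavariate(alpha, beta)` has mean alpha * beta, so the simulated sample has k = 2 and θ = 3, and the estimates should land near those values for a sample this large.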
UNIT IV HIGH-DIMENSIONAL SPACE
The Law of Large Numbers – The Geometry of High Dimensions – Properties of the Unit Ball – Generating Points Uniformly at Random from a Ball – Gaussians in High Dimension – Random Projection and the Johnson-Lindenstrauss Lemma – Separating Gaussians – Fitting a Spherical Gaussian to Data.
Suggested Activities:
● Exploring the Geometry of High Dimensions
● Application of Random Projection and the Johnson-Lindenstrauss Lemma
Suggested Evaluation Methods:
● Assignments on high dimensional data representation and analysis
● Tutorials on Random Projection
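Two of the Unit IV topics can be tried out with a short pure-Python sketch: sampling uniformly from the unit ball (a Gaussian vector gives a uniform direction, and a radius of U^(1/d) makes the density uniform in volume) and a Gaussian random projection in the style of the Johnson-Lindenstrauss lemma. The 1/sqrt(k) scaling and the function names are assumptions of this sketch; the lemma is usually stated for the unscaled map, with distances compared against sqrt(k) times the originals.

```python
import math
import random

# Hypothetical helpers for illustration.


def random_point_in_ball(d, rng):
    """Draw a point uniformly at random from the unit ball in R^d.

    i.i.d. standard Gaussian coordinates give a uniformly random direction;
    the volume within radius r grows like r^d, so the radius is U^(1/d).
    """
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    n = math.sqrt(sum(x * x for x in g))
    r = rng.random() ** (1.0 / d)
    return [r * x / n for x in g]


def random_projection(points, k, rng):
    """Map d-dimensional points to k dimensions: f(v) = (u_1.v, ..., u_k.v) / sqrt(k).

    Each u_i has i.i.d. N(0, 1) entries. With k of order log(n) / eps^2,
    pairwise distances are preserved within a factor 1 +/- eps with high probability.
    """
    d = len(points[0])
    u = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    scale = 1.0 / math.sqrt(k)
    return [[scale * sum(ui[j] * p[j] for j in range(d)) for ui in u] for p in points]


if __name__ == "__main__":
    rng = random.Random(0)
    d = 100
    pts = [random_point_in_ball(d, rng) for _ in range(1000)]
    near_surface = sum(math.dist(p, [0.0] * d) > 0.95 for p in pts) / len(pts)
    print(f"fraction with norm > 0.95 in {d} dims: {near_surface:.3f}")  # about 1 - 0.95**100
    src = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(5)]
    proj = random_projection(src, 200, rng)
    print(math.dist(proj[0], proj[1]) / math.dist(src[0], src[1]))  # close to 1
```

The first printout reflects the fact that in high dimensions almost all of the volume of the unit ball lies in a thin shell near its surface.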
UNIT V SINGULAR VALUE DECOMPOSITION
Singular Vectors – Singular Value Decomposition (SVD) – Best Rank-k Approximations – Left Singular Vectors – Power Method for Singular Value Decomposition – Applications of Singular Value Decomposition.
Suggested Activities:
● Exploring the Singular Value Decomposition
● Application of Singular Value Decomposition
Suggested Evaluation Methods:
● Assignments on Singular Value Decomposition
● Tutorials on Best Rank-k Approximations
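The Unit V power method and best rank-k approximations can be sketched in pure Python: iterate x ← AᵀAx to approximate the top right singular vector v₁, set σ₁ = |Av₁| and u₁ = Av₁/σ₁, then subtract σ₁u₁v₁ᵀ and repeat. This sketch assumes a nonzero real matrix, distinct leading singular values, and k ≤ rank(A); the helper names are hypothetical, and in practice a library routine such as `numpy.linalg.svd` would be used instead.

```python
import math
import random

# Hypothetical helpers for illustration (lists of lists as matrices).


def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def normalize(x):
    n = math.sqrt(sum(v * v for v in x))
    if n == 0.0:
        raise ValueError("zero vector: A may be zero or k may exceed rank(A)")
    return [v / n for v in x]


def top_singular_triplet(a, iters=500, seed=0):
    """Power method: repeated multiplication by A^T A converges to v_1."""
    rng = random.Random(seed)
    at = transpose(a)
    x = normalize([rng.gauss(0.0, 1.0) for _ in range(len(a[0]))])
    for _ in range(iters):
        x = normalize(matvec(at, matvec(a, x)))
    av = matvec(a, x)
    sigma = math.sqrt(sum(v * v for v in av))
    return sigma, [v / sigma for v in av], x  # (sigma_1, u_1, v_1)


def rank_k_approximation(a, k, iters=500):
    """Return the first k singular values and A_k = sum of sigma_i u_i v_i^T (via deflation)."""
    residual = [[float(x) for x in row] for row in a]
    approx = [[0.0] * len(a[0]) for _ in a]
    singular_values = []
    for i in range(k):
        sigma, u, v = top_singular_triplet(residual, iters, seed=i)
        singular_values.append(sigma)
        for r in range(len(a)):
            for c in range(len(a[0])):
                approx[r][c] += sigma * u[r] * v[c]
                residual[r][c] -= sigma * u[r] * v[c]
    return singular_values, approx


if __name__ == "__main__":
    a = [[3.0, 2.0, 2.0], [2.0, 3.0, -2.0]]
    svals, a1 = rank_k_approximation(a, 1)
    err = math.sqrt(sum((a[r][c] - a1[r][c]) ** 2 for r in range(2) for c in range(3)))
    print(svals, err)  # about [5.0] and 3.0
```

For this matrix the singular values are 5 and 3, so the Frobenius-norm error of the rank-1 approximation is σ₂ = 3, matching the Eckart-Young theorem: the best rank-k error is sqrt(σ²ₖ₊₁ + σ²ₖ₊₂ + ...).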
TOTAL: 45 PERIODS
COURSE OUTCOMES:
CO1: Find bases and dimensions of vector spaces, and of the null spaces and ranges of linear maps
CO2: Obtain the eigenvalues and eigenvectors of data and represent them in an inner product space
CO3: Apply probability and random process concepts in data analysis
CO4: Represent high-dimensional data in a high-dimensional space and analyze it
CO5: Apply Singular Value Decomposition to data to simplify the problem
CO6: Demonstrate the use of mathematics in data science through a case study.
TEXT BOOKS:
1. S. Axler, Linear Algebra Done Right, Springer, 2017.
2. P. Olofsson and M. Andersson, Probability, Statistics, and Stochastic Processes, 2nd Edition, John Wiley & Sons, Hoboken, New Jersey, 2012.
3. A. Blum, J. Hopcroft, and R. Kannan, Foundations of Data Science, 1st Edition, Cambridge University Press, 2020.
REFERENCES:
1. E. Davis, Linear Algebra and Probability for Computer Science Applications, CRC Press, 2012.
2. J. V. Kepner and J. R. Gilbert, Graph Algorithms in the Language of Linear Algebra, Society for Industrial and Applied Mathematics, 2011.
3. L. Eldén, Matrix Methods in Data Mining and Pattern Recognition, Society for Industrial and Applied Mathematics, 2007.
