PTCCS346 Exploratory Data Analysis Syllabus – Anna University Part time Regulation 2023
COURSE OBJECTIVES:
To provide an overview of exploratory data analysis.
To implement data visualization using Matplotlib.
To perform univariate data exploration and analysis.
To apply bivariate data exploration and analysis.
To use data exploration and visualization techniques for multivariate and time series data.
UNIT I EXPLORATORY DATA ANALYSIS
EDA fundamentals – Understanding data science – Significance of EDA – Making sense of data – Comparing EDA with classical and Bayesian analysis – Software tools for EDA – Visual Aids for EDA – Data transformation techniques: merging databases, reshaping and pivoting, transformation techniques.
UNIT II EDA USING PYTHON
Data Manipulation using Pandas – Pandas Objects – Data Indexing and Selection – Operating on Data – Handling Missing Data – Hierarchical Indexing – Combining datasets – Concat, Append, Merge and Join – Aggregation and grouping – Pivot Tables – Vectorized String Operations.
UNIT III UNIVARIATE ANALYSIS
Introduction to a Single Variable: Distributions and Variables – Numerical Summaries of Level and Spread – Scaling and Standardizing – Inequality.
UNIT IV BIVARIATE ANALYSIS
Relationships between Two Variables – Percentage Tables – Analysing Contingency Tables – Handling Several Batches – Scatterplots and Resistant Lines.
UNIT V MULTIVARIATE AND TIME SERIES ANALYSIS
Introducing a Third Variable – Causal Explanations – Three-Variable Contingency Tables and Beyond – Fundamentals of TSA – Characteristics of time series data – Data Cleaning – Time-based indexing – Visualizing – Grouping – Resampling.
30 PERIODS
PRACTICAL EXERCISES: 30 PERIODS
1. Install a data analysis and visualization tool: R / Python / Tableau Public / Power BI.
2. Perform exploratory data analysis (EDA) on datasets such as an email dataset. Export all your emails as a dataset, import them into a pandas DataFrame, visualize them, and derive different insights from the data.
3. Work with NumPy arrays and pandas DataFrames, and create basic plots using Matplotlib.
4. Explore various variable and row filters in R for cleaning data. Apply various plot features in R to sample datasets and visualize the results.
5. Perform time series analysis and apply various visualization techniques.
6. Perform data analysis and representation on a map using various map datasets, with mouse rollover effects, user interaction, etc.
7. Build cartographic visualizations for multiple datasets involving various countries of the world, states and districts in India, etc.
8. Perform EDA on Wine Quality Data Set.
9. Take a case study on a dataset, apply various EDA and visualization techniques, and present an analysis report.
COURSE OUTCOMES:
At the end of this course, the students will be able to:
CO1: Understand the fundamentals of exploratory data analysis.
CO2: Implement data visualization using Matplotlib.
CO3: Perform univariate data exploration and analysis.
CO4: Apply bivariate data exploration and analysis.
CO5: Use data exploration and visualization techniques for multivariate and time series data.
TOTAL: 60 PERIODS
TEXT BOOKS:
1. Suresh Kumar Mukhiya, Usman Ahmed, “Hands-On Exploratory Data Analysis with Python”, Packt Publishing, 2020. (Unit 1)
2. Jake VanderPlas, “Python Data Science Handbook: Essential Tools for Working with Data”, First Edition, O’Reilly, 2017. (Unit 2)
3. Catherine Marsh, Jane Elliott, “Exploring Data: An Introduction to Data Analysis for Social Scientists”, Wiley Publications, 2nd Edition, 2008. (Units 3, 4, 5)
REFERENCES:
1. Eric Pimpler, “Data Visualization and Exploration with R”, GeoSpatial Training Services, 2017.
2. Claus O. Wilke, “Fundamentals of Data Visualization”, O’Reilly, 2019.
3. Matthew O. Ward, Georges Grinstein, Daniel Keim, “Interactive Data Visualization: Foundations, Techniques, and Applications”, 2nd Edition, CRC Press, 2015.
