Data Analysis using Python
Course Title: Data Analysis using Python
Duration: 15 Weeks (3 lectures per week)
Total Lectures: 45
Target Audience: Beginners to Intermediate learners
Prerequisites: Basic Python knowledge
Week 1: Introduction to Data Analysis & Python Basics
Lecture 1: What is Data Analysis? Applications and Use Cases
Lecture 2: Python Refresher (Data Types, Loops, Functions)
Lecture 3: Introduction to Jupyter Notebook, Pandas, and Numpy
Week 2: Working with Data in Pandas
Lecture 4: Loading Data from CSV, Excel, and Databases
Lecture 5: DataFrames and Series - Basic Operations
Lecture 6: Data Cleaning (Handling Missing Values, Duplicates)
Week 3: Data Transformation and Preprocessing
Lecture 7: Filtering, Sorting, and Aggregating Data
Lecture 8: Applying Functions and Lambda Expressions in Pandas
Lecture 9: Merging, Joining, and Concatenating DataFrames
Week 4: Exploratory Data Analysis (EDA) & Visualization
Lecture 10: Introduction to Data Visualization (Matplotlib & Seaborn)
Lecture 11: Univariate and Bivariate Analysis
Lecture 12: Advanced Plotting (Boxplots, Heatmaps, Pair Plots)
Week 5: Statistical Analysis and Probability
Lecture 13: Basic Statistics (Mean, Median, Mode, Variance, Standard Deviation)
Lecture 14: Probability Distributions and Sampling Techniques
Lecture 15: Hypothesis Testing (t-tests, Chi-square tests)
Week 6: Working with Time Series Data
Lecture 16: Introduction to Time Series Data
Lecture 17: Time Series Analysis (Rolling Mean, Resampling)
Lecture 18: Time Series Forecasting Basics
Week 7: Introduction to SQL for Data Analysis
Lecture 19: Basics of SQL and Connecting Python with SQL Databases
Lecture 20: Performing CRUD Operations using SQL
Lecture 21: Aggregations, Joins, and Subqueries
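A minimal sketch of the Week 7 workflow, assuming Python's built-in sqlite3 module and an in-memory database. The customers/orders tables and their rows are hypothetical. It runs all four CRUD operations, then reads results into pandas with a join, an aggregation, and a subquery.

```python
import sqlite3

import pandas as pd

# In-memory database; the schema and rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    """
)

# Create: parameterized inserts (avoid building SQL with string formatting).
conn.executemany(
    "INSERT INTO customers (id, name, region) VALUES (?, ?, ?)",
    [(1, "Ana", "North"), (2, "Ben", "South"), (3, "Chen", "North")],
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(1, 120.0), (1, 80.0), (2, 200.0), (3, 50.0), (2, 5.0)],
)

# Update and Delete.
conn.execute("UPDATE orders SET amount = 60.0 WHERE customer_id = 3")
conn.execute("DELETE FROM orders WHERE amount < 10")
conn.commit()

# Read: join + aggregation, returned as a DataFrame.
summary = pd.read_sql_query(
    """
    SELECT c.region, COUNT(o.id) AS n_orders, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.region
    ORDER BY c.region
    """,
    conn,
)

# Read: subquery selecting customers with at least one order above 100.
big_spenders = pd.read_sql_query(
    """
    SELECT name FROM customers
    WHERE id IN (SELECT customer_id FROM orders WHERE amount > 100)
    ORDER BY name
    """,
    conn,
)
conn.close()
print(summary)
print(big_spenders)
```

The same read_sql_query call works with other database drivers; SQLite is used here only because it needs no server.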
Week 8: Feature Engineering and Data Preprocessing
Lecture 22: Handling Outliers and Feature Scaling
Lecture 23: Encoding Categorical Variables
Lecture 24: Feature Selection Techniques
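One way to sketch Lectures 22-23 with pandas and scikit-learn, assuming a hypothetical income/city table: the 1.5 * IQR rule caps the outlier, StandardScaler rescales the column, and pd.get_dummies one-hot encodes the category. IQR capping is only one of several possible outlier strategies.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical data: one numeric column with an obvious outlier (300).
df = pd.DataFrame({
    "income": [42.0, 45.0, 47.0, 50.0, 52.0, 300.0],
    "city": ["north", "south", "north", "east", "south", "north"],
})

# Outliers: cap values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
df["income_capped"] = df["income"].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

# Scaling: zero mean, unit variance (in real projects, fit on training data only).
scaler = StandardScaler()
df["income_scaled"] = scaler.fit_transform(df[["income_capped"]]).ravel()

# Encoding: one 0/1 column per category.
city_dummies = pd.get_dummies(df["city"], prefix="city", dtype=int)
features = pd.concat([df[["income_scaled"]], city_dummies], axis=1)
print(features)
```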
Week 9: Introduction to Machine Learning for Data Analysis
Lecture 25: Supervised vs Unsupervised Learning
Lecture 26: Introduction to Scikit-learn and ML Workflow
Lecture 27: Building a Simple Linear Regression Model
Week 10: Classification and Clustering Techniques
Lecture 28: Logistic Regression and Decision Trees
Lecture 29: k-Means and Hierarchical Clustering
Lecture 30: Evaluating ML Models (Precision, Recall, F1-score)
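The Lecture 30 metrics reduce to counts from the confusion matrix. This small sketch uses hypothetical labels, computes the metrics by hand, and cross-checks them against scikit-learn's precision_score, recall_score, and f1_score.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical ground truth and predictions (1 = positive class).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives
fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives

precision = tp / (tp + fp)  # of predicted positives, how many are real
recall = tp / (tp + fn)     # of real positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

# Cross-check against scikit-learn.
assert abs(precision - precision_score(y_true, y_pred)) < 1e-12
assert abs(recall - recall_score(y_true, y_pred)) < 1e-12
assert abs(f1 - f1_score(y_true, y_pred)) < 1e-12
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```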
Week 11: Web Scraping and APIs for Data Collection
Lecture 31: Web Scraping using BeautifulSoup
Lecture 32: Introduction to APIs (REST, JSON)
Lecture 33: Extracting Data from APIs using Python
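For Lectures 32-33, a common pattern is to parse a JSON response and flatten it with pd.json_normalize. The sketch below assumes a hypothetical payload shape and endpoint. It uses a hardcoded string so it runs offline, and it shows the equivalent requests call in a comment.

```python
import json

import pandas as pd

# Online, the payload would usually come from an HTTP GET, e.g.
#   import requests
#   resp = requests.get("https://api.example.com/users", timeout=10)  # hypothetical URL
#   resp.raise_for_status()
#   payload = resp.json()
raw = """
{"results": [
    {"id": 1, "name": "Ana", "address": {"city": "Oslo", "zip": "0150"}},
    {"id": 2, "name": "Ben", "address": {"city": "Lyon", "zip": "69001"}}
]}
"""
payload = json.loads(raw)

# Nested objects become dotted columns such as "address.city".
users = pd.json_normalize(payload["results"])
print(users)
```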
Week 12: Big Data and Cloud Platforms
Lecture 34: Introduction to Big Data and Hadoop
Lecture 35: Working with Google Colab & Kaggle Datasets
Lecture 36: Basics of AWS and Google Cloud for Data Analysis
Week 13: Automation & Data Pipelines
Lecture 37: Automating Reports with Pandas and Excel
Lecture 38: Introduction to Airflow and ETL Pipelines
Lecture 39: Deploying Data Analysis Scripts on Cloud
Week 14: Real-World Projects and Case Studies
Lecture 40: Analyzing Social Media Data (Twitter, Facebook)
Lecture 41: Customer Segmentation using Clustering
Lecture 42: Predictive Analysis on Real-World Datasets
Week 15: Capstone Project & Career Guidance
Lecture 43: Capstone Project Work
Lecture 44: Presentations & Peer Reviews
Lecture 45: Career Guidance - Resume Building, Interview Preparation
Final Deliverables:
One Capstone Project
Hands-on Assignments
A Portfolio with Real-World Projects
Here’s a streamlined 15-week schedule for Data Analysis Using Python, excluding Python basics, designed for learners already proficient in Python syntax and core programming concepts. The focus is on data analysis tools, statistical methods, machine learning, and real-world applications:
Week 1: Advanced Python for Data Analysis
- Lecture 1: Functional Programming & Efficiency
- Lambda functions, map/filter/reduce, decorators, and generators.
- Lecture 2: Working with Jupyter & Data Tools
- Jupyter Notebook/Lab workflows, magic commands (%timeit, %%capture), and Markdown integration.
- Lecture 3: Data Analysis Ecosystem Overview
- Introduction to Pandas, NumPy, and visualization libraries.
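The following sketch ties Lecture 1's topics together, assuming string input such as lines read from a file. A generator parses values lazily, filter/map/reduce compose the computation, and a decorator records how long the call took. All function names are hypothetical.

```python
import functools
import time


def timed(func):
    """Decorator that stores the duration of the most recent call."""
    @functools.wraps(func)  # keep the original name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        wrapper.last_elapsed = time.perf_counter() - start
        return result

    wrapper.last_elapsed = None
    return wrapper


def parse_numbers(lines):
    """Generator: yield floats one at a time, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield float(line)


@timed
def sum_of_even_squares(lines):
    evens = filter(lambda x: x % 2 == 0, parse_numbers(lines))     # keep even values
    squares = map(lambda x: x * x, evens)                          # square them
    return functools.reduce(lambda acc, x: acc + x, squares, 0.0)  # add them up


total = sum_of_even_squares(["1", "2", "", "3", "4"])
print(total, sum_of_even_squares.last_elapsed)
```

In everyday code the built-in sum() would replace reduce here; reduce is shown only to illustrate the pattern.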
Week 2: NumPy for Numerical Computing
- Lecture 1: Array Operations & Vectorization
- Creating arrays, universal functions (ufuncs), and broadcasting.
- Lecture 2: Advanced Indexing & Masking
- Boolean arrays, structured arrays, and memory optimization.
- Lecture 3: Numerical Applications
- Solving linear equations, Fourier transforms, and random sampling.
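A compact sketch of the Week 2 topics on small hypothetical arrays: broadcasting, a ufunc, boolean masking, solving a linear system, a Fourier transform, and seeded random sampling.

```python
import numpy as np

data = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])

# Broadcasting: (3, 2) minus (2,) applies the column means to every row.
z = (data - data.mean(axis=0)) / data.std(axis=0)

# Ufuncs operate element-wise without Python loops.
log_data = np.log(data)

# Boolean masking: keep rows whose first column is greater than 1.
mask = data[:, 0] > 1
subset = data[mask]

# Linear equations: 2x + y = 5 and x - y = 1.
A = np.array([[2.0, 1.0], [1.0, -1.0]])
b = np.array([5.0, 1.0])
solution = np.linalg.solve(A, b)

# Fourier transform: recover the frequency of a 5 Hz sine sampled at 100 Hz for 1 s.
t = np.arange(100) / 100.0
signal = np.sin(2 * np.pi * 5 * t)
freqs = np.fft.rfftfreq(t.size, d=1 / 100)
peak_hz = freqs[np.argmax(np.abs(np.fft.rfft(signal)))]

# Random sampling with a seeded Generator for reproducibility.
rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=0.0, scale=1.0, size=1_000)
```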
Week 3: Pandas Fundamentals
- Lecture 1: Series & DataFrames Deep Dive
- Indexing, hierarchical indexing (MultiIndex), and missing data handling.
- Lecture 2: Data Ingestion & Export
- Reading/writing CSV, Excel, JSON, SQL databases, and web APIs.
- Lecture 3: Exploratory Data Analysis (EDA)
- Summary stats, correlation matrices, and basic visualizations.
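Here is a minimal sketch of the MultiIndex and missing-data topics from Week 3 using a hypothetical sales table. Filling a gap with its group's mean is one simple choice among many imputation strategies.

```python
import numpy as np
import pandas as pd

# Hypothetical sales table with one missing revenue value.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "year": [2022, 2023, 2022, 2023],
    "revenue": [100.0, np.nan, 80.0, 95.0],
})

# Hierarchical index: each row is identified by a (region, year) pair.
indexed = sales.set_index(["region", "year"]).sort_index()
north_2022 = indexed.loc[("North", 2022), "revenue"]
south_rows = indexed.xs("South", level="region")  # cross-section by one level

# Missing data: count the gaps, then fill each with its region's mean.
n_missing = int(indexed["revenue"].isna().sum())
region_mean = indexed.groupby(level="region")["revenue"].transform("mean")
filled = indexed["revenue"].fillna(region_mean)

# Quick EDA: summary statistics of the cleaned column.
print(filled.describe())
```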
Week 4: Data Manipulation & Cleaning
- Lecture 1: Advanced Filtering & Grouping
- Boolean indexing, query() method, and groupby aggregations.
- Lecture 2: Merging & Reshaping Data
- merge, concat, pivot_table, and melt.
- Lecture 3: Time Series Handling
- datetime conversions, resampling, and rolling statistics.
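The following short sketch covers the Week 4 operations on hypothetical order and product tables: query() and groupby, a merge, pivot_table and melt for reshaping, and resample/rolling on a datetime index.

```python
import pandas as pd

# Hypothetical daily orders and a product lookup table.
orders = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-08", "2024-01-09"]),
    "product_id": [1, 2, 1, 2],
    "qty": [3, 5, 2, 4],
})
products = pd.DataFrame({"product_id": [1, 2], "category": ["tea", "coffee"]})

# Filtering with query() and aggregating with groupby.
big = orders.query("qty >= 3")
qty_by_product = orders.groupby("product_id")["qty"].sum()

# Merging, then reshaping long -> wide -> long.
merged = orders.merge(products, on="product_id", how="left")
wide = merged.pivot_table(index="date", columns="category", values="qty",
                          aggfunc="sum", fill_value=0)
long = wide.reset_index().melt(id_vars="date", var_name="category", value_name="qty")

# Time series: weekly totals (weeks ending Sunday) and a 2-row rolling mean.
series = orders.set_index("date")["qty"]
weekly = series.resample("W").sum()
rolling = series.rolling(window=2).mean()
print(weekly)
```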
Week 5: Data Visualization
- Lecture 1: Matplotlib Customization
- Subplots, annotations, and styles.
- Lecture 2: Advanced Seaborn
- Heatmaps, pair plots, and regression plots.
- Lecture 3: Interactive Visualizations (Plotly)
- Dashboards, animations, and 3D plots.
Week 6: Statistical Analysis
- Lecture 1: Probability Distributions & Fitting
- Normal, binomial, and Poisson distributions; QQ plots.
- Lecture 2: Hypothesis Testing & Confidence Intervals
- t-tests, ANOVA, chi-square tests, and bootstrapping.
- Lecture 3: Bayesian Inference Basics
- Bayes’ theorem, prior/posterior distributions, and A/B testing.
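This sketch illustrates two Week 6 techniques on simulated data, which stands in for a real A/B test. It runs Welch's two-sample t-test (scipy.stats.ttest_ind with equal_var=False) and computes a percentile-bootstrap 95% confidence interval for the difference in means.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)

# Simulated samples for two hypothetical variants (e.g. task time in seconds).
group_a = rng.normal(loc=30.0, scale=5.0, size=200)
group_b = rng.normal(loc=28.0, scale=5.0, size=200)

# Welch's t-test: does not assume equal variances.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)

# Percentile bootstrap: resample each group with replacement many times.
n_boot = 5_000
diffs = np.empty(n_boot)
for i in range(n_boot):
    a = rng.choice(group_a, size=group_a.size, replace=True)
    b = rng.choice(group_b, size=group_b.size, replace=True)
    diffs[i] = a.mean() - b.mean()
ci_low, ci_high = np.percentile(diffs, [2.5, 97.5])

print(f"t={t_stat:.2f}, p={p_value:.4f}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")
```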
Week 7: Data Preprocessing
- Lecture 1: Advanced Missing Data Techniques
- MICE (Multiple Imputation by Chained Equations), KNN imputation.
- Lecture 2: Outlier Detection
- Mahalanobis distance, DBSCAN, and Isolation Forest.
- Lecture 3: Feature Engineering
- Polynomial features, interaction terms, and target encoding.
Week 8: Machine Learning Foundations
- Lecture 1: Scikit-Learn Pipelines
- Preprocessing, feature unions, and cross-validation.
- Lecture 2: Regression Models
- Linear regression, regularization (Ridge/Lasso/ElasticNet).
- Lecture 3: Model Evaluation
- Metrics (MSE, R², MAE), learning curves, and bias-variance tradeoff.
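This sketch shows one way to combine the Week 8 topics on synthetic data. A scikit-learn Pipeline chains scaling with Ridge regression, cross_val_score estimates R² across folds, and a held-out split reports MSE, MAE, and R².

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: y depends on features 0 and 1; feature 2 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Scaling lives inside the pipeline, so each CV fold fits it on training rows only.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
cv_r2 = cross_val_score(model, X, y, cv=5, scoring="r2")

# Held-out evaluation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)
mse = mean_squared_error(y_test, pred)
mae = mean_absolute_error(y_test, pred)
r2 = r2_score(y_test, pred)
coef = model[-1].coef_  # Ridge coefficients on the scaled features

print(f"CV R2={cv_r2.mean():.3f}  test MSE={mse:.3f} MAE={mae:.3f} R2={r2:.3f}")
```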
Week 9: Classification & Clustering
- Lecture 1: Logistic Regression & SVM
- Decision boundaries, kernel methods, and class imbalance.
- Lecture 2: Decision Trees & Random Forests
- Splitting criteria, feature importance, and ensemble methods.
- Lecture 3: Clustering Algorithms
- K-means, hierarchical clustering, and silhouette analysis.
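A small sketch of silhouette analysis for choosing k in K-means. It assumes three well-separated synthetic blobs, so the highest silhouette score should point to k = 3.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: 50 points around each of three centres.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Fit K-means for several k and score each clustering.
scores = {}
for k in range(2, 6):
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels = km.fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)
```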
Week 10: Advanced Machine Learning
- Lecture 1: Gradient Boosting Machines (GBMs)
- XGBoost, LightGBM, and CatBoost.
- Lecture 2: Hyperparameter Tuning
- Grid search, random search, and Bayesian optimization.
- Lecture 3: Case Study: Predictive Modeling
- End-to-end project (e.g., credit risk prediction).
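Grid search for a gradient-boosting model can be sketched as follows. As an assumption, scikit-learn's own GradientBoostingClassifier stands in for XGBoost, LightGBM, or CatBoost so the example needs no extra packages, and the data come from make_classification.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic binary classification data.
X, y = make_classification(n_samples=400, n_features=8, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Every combination below is evaluated with 5-fold cross-validation.
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0), param_grid, cv=5, scoring="f1"
)
search.fit(X_train, y_train)

# search.score uses the refit best model and the same F1 scorer.
test_score = search.score(X_test, y_test)
print(search.best_params_, f"test F1={test_score:.3f}")
```

RandomizedSearchCV accepts the same kind of estimator and can replace GridSearchCV when the grid becomes large.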
Week 11: NLP & Text Analysis
- Lecture 1: Text Preprocessing
- Tokenization, lemmatization, and stopword removal.
- Lecture 2: Vectorization Techniques
- Bag-of-words, TF-IDF, and word embeddings (Word2Vec, GloVe).
- Lecture 3: Sentiment Analysis & Topic Modeling
- Using spaCy for text preprocessing and Gensim for LDA topic modeling.
Week 12: Time Series Analysis
- Lecture 1: Time Series Decomposition
- Trend, seasonality, and noise decomposition.
- Lecture 2: Forecasting Models
- ARIMA, SARIMA, and Facebook Prophet.
- Lecture 3: Case Study: Demand Forecasting
- Using real-world sales data.
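Classical additive decomposition splits a series into trend + seasonality + noise. This sketch implements it by hand on simulated monthly data; statsmodels' seasonal_decompose offers a packaged version. The series and its parameters are hypothetical.

```python
import numpy as np
import pandas as pd

# Simulated monthly sales: linear trend + 12-month seasonal pattern + noise.
rng = np.random.default_rng(7)
months = pd.date_range("2020-01-01", periods=48, freq="MS")
true_seasonal = 10 * np.sin(2 * np.pi * np.arange(12) / 12)
sales = pd.Series(
    100 + 2.0 * np.arange(48) + np.tile(true_seasonal, 4) + rng.normal(scale=1.0, size=48),
    index=months,
)

# 1) Trend: centred 2x12 moving average (weights 1/24, 1/12, ..., 1/12, 1/24).
weights = np.r_[0.5, np.ones(11), 0.5] / 12
trend = sales.rolling(window=13, center=True).apply(lambda w: np.dot(w, weights), raw=True)

# 2) Seasonality: average detrended value per calendar month, centred on zero.
detrended = sales - trend
month_effect = detrended.groupby(detrended.index.month).mean()
month_effect -= month_effect.mean()
seasonal = pd.Series(month_effect.loc[sales.index.month].to_numpy(), index=sales.index)

# 3) Noise: whatever trend and seasonality do not explain.
residual = sales - trend - seasonal
print(month_effect.round(1))
```

The first and last six months have no trend estimate, which is the usual cost of a centred moving average.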
Week 13: Big Data & Cloud Computing
- Lecture 1: Parallel Computing with Dask
- Handling large datasets in Pandas-like workflows.
- Lecture 2: PySpark Basics
- Spark DataFrames, MLlib, and cluster computing concepts.
- Lecture 3: Cloud Integration (AWS/GCP)
- Storing data in S3/BigQuery and running analysis on EC2/Cloud VMs.
Week 14: Ethics & Deployment
- Lecture 1: Ethics in Data Analysis
- Bias detection, fairness metrics, and GDPR compliance.
- Lecture 2: Model Deployment
- Building APIs with Flask/FastAPI and Docker containers.
- Lecture 3: Visualization Dashboards
- Using Dash or Streamlit for interactive reporting.
Week 15: Final Project & Presentations
- Lectures: Capstone Project
- Students tackle a real-world dataset (e.g., COVID-19 trends, stock market analysis).
- Deliverables: Cleaned dataset, visualizations, model(s), and insights.
- Peer reviews and final presentations.
Assessment Structure
- Weekly Assignments: Focused on that week’s tools (e.g., Pandas manipulations, Seaborn plots).
- Midterm Project (Week 8): Exploratory analysis of a dataset with a written report.
- Final Project (Weeks 14-15): 40% of the grade, emphasizing end-to-end workflow and presentation.
This schedule prioritizes applied learning with minimal theory lectures. Adjustments can include:
- Adding Kaggle competitions for practical challenges.
- Incorporating SQL for data extraction (e.g., Week 13).
- Expanding on cloud tools if needed.