Learn with Yasir

Data Analysis using Python


Course Title: Data Analysis using Python

Duration: 15 Weeks (3 lectures per week)

Total Lectures: 45

Target Audience: Beginners to Intermediate learners

Prerequisites: Basic Python knowledge


Week 1: Introduction to Data Analysis & Python Basics

Lecture 1: What is Data Analysis? Applications and Use Cases

Lecture 2: Python Refresher (Data Types, Loops, Functions)

Lecture 3: Introduction to Jupyter Notebook, Pandas, and Numpy


Week 2: Working with Data in Pandas

Lecture 4: Loading Data from CSV, Excel, and Databases

Lecture 5: DataFrames and Series - Basic Operations

Lecture 6: Data Cleaning (Handling Missing Values, Duplicates)


Week 3: Data Transformation and Preprocessing

Lecture 7: Filtering, Sorting, and Aggregating Data

Lecture 8: Applying Functions and Lambda Expressions in Pandas

Lecture 9: Merging, Joining, and Concatenating DataFrames


Week 4: Exploratory Data Analysis (EDA) & Visualization

Lecture 10: Introduction to Data Visualization (Matplotlib & Seaborn)

Lecture 11: Univariate and Bivariate Analysis

Lecture 12: Advanced Plotting (Boxplots, Heatmaps, Pair Plots)


Week 5: Statistical Analysis and Probability

Lecture 13: Basic Statistics (Mean, Median, Mode, Variance, Standard Deviation)

Lecture 14: Probability Distributions and Sampling Techniques

Lecture 15: Hypothesis Testing (T-tests, Chi-square tests)


Week 6: Working with Time Series Data

Lecture 16: Introduction to Time Series Data

Lecture 17: Time Series Analysis (Rolling Mean, Resampling)

Lecture 18: Time Series Forecasting Basics


Week 7: Introduction to SQL for Data Analysis

Lecture 19: Basics of SQL and Connecting Python with SQL Databases

Lecture 20: Performing CRUD Operations using SQL

Lecture 21: Aggregations, Joins, and Subqueries


Week 8: Feature Engineering and Data Preprocessing

Lecture 22: Handling Outliers and Feature Scaling

Lecture 23: Encoding Categorical Variables

Lecture 24: Feature Selection Techniques


Week 9: Introduction to Machine Learning for Data Analysis

Lecture 25: Supervised vs Unsupervised Learning

Lecture 26: Introduction to Scikit-learn and ML Workflow

Lecture 27: Building a Simple Linear Regression Model


Week 10: Classification and Clustering Techniques

Lecture 28: Logistic Regression and Decision Trees

Lecture 29: k-Means and Hierarchical Clustering

Lecture 30: Evaluating ML Models (Precision, Recall, F1-score)


Week 11: Web Scraping and APIs for Data Collection

Lecture 31: Web Scraping using BeautifulSoup

Lecture 32: Introduction to APIs (REST, JSON)

Lecture 33: Extracting Data from APIs using Python


Week 12: Big Data and Cloud Platforms

Lecture 34: Introduction to Big Data and Hadoop

Lecture 35: Working with Google Colab & Kaggle Datasets

Lecture 36: Basics of AWS and Google Cloud for Data Analysis


Week 13: Automation & Data Pipelines

Lecture 37: Automating Reports with Pandas and Excel

Lecture 38: Introduction to Airflow and ETL Pipelines

Lecture 39: Deploying Data Analysis Scripts on Cloud


Week 14: Real-World Projects and Case Studies

Lecture 40: Analyzing Social Media Data (Twitter, Facebook)

Lecture 41: Customer Segmentation using Clustering

Lecture 42: Predictive Analysis on Real-World Datasets


Week 15: Capstone Project & Career Guidance

Lecture 43: Capstone Project Work

Lecture 44: Presentations & Peer Reviews

Lecture 45: Career Guidance - Resume Building, Interview Preparation


Final Deliverables:

One Capstone Project

Hands-on Assignments

A Portfolio with Real-World Projects

Streamlined 15-week schedule for Data Analysis Using Python. This version omits Python basics and is designed for learners already proficient in Python syntax and core programming concepts. It focuses on data analysis tools, statistical methods, machine learning, and real-world applications:


Week 1: Advanced Python for Data Analysis

  • Lecture 1: Functional Programming & Efficiency
    • Lambda functions, map/filter/reduce, decorators, and generators.
  • Lecture 2: Working with Jupyter & Data Tools
    • Jupyter Notebook/Lab workflows, magic commands (%timeit, %%capture), and Markdown integration.
  • Lecture 3: Data Analysis Ecosystem Overview
    • Introduction to Pandas, NumPy, and visualization libraries.
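
As an illustration of the Lecture 1 topics, the sketch below combines a decorator, a generator, and map/filter/reduce in one small pipeline. The names `timed`, `read_values`, and `sum_of_squares_above` and the sample rows are hypothetical, chosen only for this example.

```python
import time
from functools import reduce, wraps


def timed(func):
    """Decorator that records how long the most recent call took."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        wrapper.last_elapsed = time.perf_counter() - start
        return result
    wrapper.last_elapsed = None
    return wrapper


def read_values(rows):
    """Generator: lazily yield numeric values, skipping blank rows."""
    for row in rows:
        if row.strip():
            yield float(row)


@timed
def sum_of_squares_above(rows, threshold):
    values = read_values(rows)                          # nothing is read yet
    kept = filter(lambda v: v > threshold, values)      # lazy filter
    squared = map(lambda v: v * v, kept)                # lazy map
    return reduce(lambda acc, v: acc + v, squared, 0.0)


raw_rows = ["1.5", "", "4", "2.5", " ", "10"]  # made-up input
total = sum_of_squares_above(raw_rows, 2.0)
print(total)  # 122.25 (16 + 6.25 + 100)
```

In a notebook, the `%timeit` magic from Lecture 2 would usually replace a hand-written timing decorator; the decorator here only demonstrates the pattern.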

Week 2: NumPy for Numerical Computing

  • Lecture 1: Array Operations & Vectorization
    • Creating arrays, universal functions (ufuncs), and broadcasting.
  • Lecture 2: Advanced Indexing & Masking
    • Boolean arrays, structured arrays, and memory optimization.
  • Lecture 3: Numerical Applications
    • Solving linear equations, Fourier transforms, and random sampling.
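
A minimal sketch of the Week 2 ideas, assuming NumPy is installed: broadcasting to center columns, a boolean mask to select rows, and `np.linalg.solve` for a small linear system. The arrays are made-up illustration data.

```python
import numpy as np

# Broadcasting: subtract each column's mean from every row.
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])
col_means = data.mean(axis=0)   # shape (2,)
centered = data - col_means     # (3, 2) minus (2,) broadcasts across rows

# Universal function (ufunc): applied element-wise without a Python loop.
roots = np.sqrt(data)

# Boolean masking: keep rows whose second column exceeds 15.
mask = data[:, 1] > 15.0
selected = data[mask]

# Linear equations: 2x + y = 5 and x + 3y = 10, so x = 1 and y = 3.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])
solution = np.linalg.solve(A, b)

# Random sampling with a seeded generator for reproducibility.
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=5)
```

Each step replaces an explicit loop with an array expression, which is the sense in which the lecture title uses "vectorization".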

Week 3: Pandas Fundamentals

  • Lecture 1: Series & DataFrames Deep Dive
    • Indexing, hierarchical indexing (MultiIndex), and missing data handling.
  • Lecture 2: Data Ingestion & Export
    • Reading/writing CSV, Excel, JSON, SQL databases, and web APIs.
  • Lecture 3: Exploratory Data Analysis (EDA)
    • Summary stats, correlation matrices, and basic visualizations.

Week 4: Data Manipulation & Cleaning

  • Lecture 1: Advanced Filtering & Grouping
    • Boolean indexing, query() method, and groupby aggregations.
  • Lecture 2: Merging & Reshaping Data
    • merge, concat, pivot_table, and melt.
  • Lecture 3: Time Series Handling
    • datetime conversions, resampling, and rolling statistics.
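
The Week 4 operations can be sketched on a toy dataset as follows, assuming pandas is installed. The `sales` and `stores` frames and their column names are hypothetical.

```python
import pandas as pd

sales = pd.DataFrame({
    "date": pd.to_datetime(
        ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-08", "2024-01-09"]
    ),
    "store": ["A", "B", "A", "A", "B"],
    "units": [10, 5, 7, 3, 8],
})
stores = pd.DataFrame({"store": ["A", "B"], "region": ["North", "South"]})

# Filtering: boolean indexing and the equivalent query() call.
big = sales[sales["units"] >= 7]
big_q = sales.query("units >= 7")

# Grouping: total units per store.
units_by_store = sales.groupby("store")["units"].sum()

# Merging: attach the region to every sale.
merged = sales.merge(stores, on="store", how="left")

# Reshaping: long -> wide with pivot_table, then wide -> long with melt.
wide = merged.pivot_table(
    index="date", columns="region", values="units", aggfunc="sum", fill_value=0
)
long = wide.reset_index().melt(
    id_vars="date", var_name="region", value_name="units"
)

# Time series: weekly totals and a two-observation rolling mean.
daily = sales.set_index("date")["units"]
weekly = daily.resample("W").sum()
rolling = daily.rolling(window=2).mean()
```

Note that `rolling(window=2)` counts rows, not calendar days; a time-based window such as `rolling("2D")` behaves differently when dates are missing.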

Week 5: Data Visualization

  • Lecture 1: Matplotlib Customization
    • Subplots, annotations, and styles.
  • Lecture 2: Advanced Seaborn
    • Heatmaps, pair plots, and regression plots.
  • Lecture 3: Interactive Visualizations (Plotly)
    • Dashboards, animations, and 3D plots.

Week 6: Statistical Analysis

  • Lecture 1: Probability Distributions & Fitting
    • Normal, binomial, and Poisson distributions; QQ plots.
  • Lecture 2: Hypothesis Testing & Confidence Intervals
    • t-tests, ANOVA, chi-square tests, and bootstrapping.
  • Lecture 3: Bayesian Inference Basics
    • Bayes’ theorem, prior/posterior distributions, and A/B testing.
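
Two of the Week 6 ideas fit in a few lines of standard-library Python: a percentile bootstrap confidence interval for a mean, and Bayes' theorem applied to a diagnostic test. This is a sketch under stated assumptions; the function names, the sample, and the test rates are all made up for illustration.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def posterior(prior, likelihood, false_positive_rate):
    """Bayes' theorem: P(condition | positive test)."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


sample = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.9, 5.8, 4.4]  # made-up data
ci_low, ci_high = bootstrap_mean_ci(sample)

# 1% prevalence, 95% sensitivity, 5% false positives: roughly 0.161.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
```

The low posterior despite a sensitive test is the usual motivation for discussing priors before moving on to A/B testing.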

Week 7: Data Preprocessing

  • Lecture 1: Advanced Missing Data Techniques
    • MICE (Multiple Imputation by Chained Equations) and KNN imputation.
  • Lecture 2: Outlier Detection
    • Mahalanobis distance, DBSCAN, and Isolation Forest.
  • Lecture 3: Feature Engineering
    • Polynomial features, interaction terms, and target encoding.
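
As one example from Lecture 2, the sketch below flags outliers by Mahalanobis distance using NumPy only. The synthetic data, the planted outlier, and the cut-off of 3.5 are assumptions made for illustration, not recommended defaults.

```python
import numpy as np

rng = np.random.default_rng(0)

# 300 correlated 2-D points plus one planted point that runs against
# the correlation, so it is unusual even though each coordinate is modest.
points = rng.multivariate_normal(
    mean=[0.0, 0.0], cov=[[1.0, 0.8], [0.8, 1.0]], size=300
)
points = np.vstack([points, [[3.0, -3.0]]])

mean = points.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(points, rowvar=False))

# Mahalanobis distance: sqrt((x - mean)^T * cov_inv * (x - mean)) per row.
diff = points - mean
squared = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
distances = np.sqrt(squared)

threshold = 3.5  # hypothetical cut-off
outliers = np.flatnonzero(distances > threshold)
```

Unlike a per-column z-score, this distance accounts for correlation between features, which is why the planted point stands out.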

Week 8: Machine Learning Foundations

  • Lecture 1: Scikit-Learn Pipelines
    • Preprocessing, feature unions, and cross-validation.
  • Lecture 2: Regression Models
    • Linear regression and regularization (Ridge/Lasso/ElasticNet).
  • Lecture 3: Model Evaluation
    • Metrics (MSE, R², MAE), learning curves, and bias-variance tradeoff.
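
The regression and evaluation topics can be illustrated without scikit-learn. The sketch below swaps in a closed-form ridge solution written in plain NumPy (with `alpha=0` reducing to ordinary least squares) and computes MSE, MAE, and R² by hand. The synthetic data and the helper names `fit_ridge`, `predict`, and `regression_metrics` are assumptions for this example; in the course itself the scikit-learn estimators would normally be used.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: y = 3 + 2*x1 - 1*x2 + 0.5*x3 + noise.
X = rng.uniform(-1, 1, size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 3.0 + rng.normal(scale=0.1, size=200)

# Simple hold-out split (rows are already in random order).
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]


def fit_ridge(X, y, alpha=0.0):
    """Closed-form ridge regression; the intercept is not penalized."""
    Xb = np.column_stack([np.ones(len(X)), X])
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)


def predict(coef, X):
    return coef[0] + X @ coef[1:]


def regression_metrics(y_true, y_pred):
    resid = y_true - y_pred
    ss_res = float(np.sum(resid ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "mse": ss_res / len(y_true),
        "mae": float(np.mean(np.abs(resid))),
        "r2": 1.0 - ss_res / ss_tot,
    }


ols = fit_ridge(X_train, y_train, alpha=0.0)
ridge = fit_ridge(X_train, y_train, alpha=10.0)
ols_scores = regression_metrics(y_test, predict(ols, X_test))
ridge_scores = regression_metrics(y_test, predict(ridge, X_test))
```

Comparing `ols` with `ridge` shows the shrinkage effect of the penalty: the ridge slopes are pulled toward zero, trading a little bias for lower variance.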

Week 9: Classification & Clustering

  • Lecture 1: Logistic Regression & SVM
    • Decision boundaries, kernel methods, and class imbalance.
  • Lecture 2: Decision Trees & Random Forests
    • Splitting criteria, feature importance, and ensemble methods.
  • Lecture 3: Clustering Algorithms
    • K-means, hierarchical clustering, and silhouette analysis.

Week 10: Advanced Machine Learning

  • Lecture 1: Gradient Boosting Machines (GBMs)
    • XGBoost, LightGBM, and CatBoost.
  • Lecture 2: Hyperparameter Tuning
    • Grid search, random search, and Bayesian optimization.
  • Lecture 3: Case Study: Predictive Modeling
    • End-to-end project (e.g., credit risk prediction).

Week 11: NLP & Text Analysis

  • Lecture 1: Text Preprocessing
    • Tokenization, lemmatization, and stopword removal.
  • Lecture 2: Vectorization Techniques
    • Bag-of-words, TF-IDF, and word embeddings (Word2Vec, GloVe).
  • Lecture 3: Sentiment Analysis & Topic Modeling
    • Using spaCy for text processing and Gensim for LDA topic modeling.
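
To make the vectorization lecture concrete, here is a minimal TF-IDF sketch in standard-library Python covering tokenization and stopword removal (lemmatization is left out). The stopword list and documents are made up, and the weighting is the plain textbook form tf × log(N / df); scikit-learn's `TfidfVectorizer` uses a smoothed, normalized variant by default, so its numbers will differ.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "a", "is", "and", "of", "on"}  # tiny illustrative list


def tokenize(text):
    """Lowercase, split on non-letters, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    "The cat sat on the mat",
    "The dog sat on the log",
    "The cat chased the dog",
]
weights = tf_idf(docs)
```

Terms that appear in only one document ("mat", "log", "chased") receive the highest weights, while terms shared across documents are down-weighted.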

Week 12: Time Series Analysis

  • Lecture 1: Time Series Decomposition
    • Trend, seasonality, and noise decomposition.
  • Lecture 2: Forecasting Models
    • ARIMA, SARIMA, and Facebook Prophet.
  • Lecture 3: Case Study: Demand Forecasting
    • Using real-world sales data.

Week 13: Big Data & Cloud Tools

  • Lecture 1: Parallel Computing with Dask
    • Handling large datasets in Pandas-like workflows.
  • Lecture 2: PySpark Basics
    • Spark DataFrames, MLlib, and cluster computing concepts.
  • Lecture 3: Cloud Integration (AWS/GCP)
    • Storing data in S3/BigQuery and running analysis on EC2/Cloud VMs.

Week 14: Ethics & Deployment

  • Lecture 1: Ethics in Data Analysis
    • Bias detection, fairness metrics, and GDPR compliance.
  • Lecture 2: Model Deployment
    • Building APIs with Flask/FastAPI and Docker containers.
  • Lecture 3: Visualization Dashboards
    • Using Dash or Streamlit for interactive reporting.

Week 15: Final Project & Presentations

  • Lectures 1-3: Capstone Project
    • Students tackle a real-world dataset (e.g., COVID-19 trends, stock market analysis).
    • Deliverables: Cleaned dataset, visualizations, model(s), and insights.
    • Peer reviews and final presentations.

Assessment Structure

  • Weekly Assignments: Focused on that week’s tools (e.g., Pandas manipulations, Seaborn plots).
  • Midterm Project (Week 8): Exploratory analysis of a dataset with a written report.
  • Final Project (Weeks 14-15): 40% of the grade, emphasizing end-to-end workflow and presentation.

This schedule prioritizes applied learning with minimal theory lectures. Adjustments can include:

  • Adding Kaggle competitions for practical challenges.
  • Incorporating SQL for data extraction (e.g., Week 13).
  • Expanding on cloud tools if needed.