IT Training
Data Science & Analytics
Data Science and Analytics are pivotal disciplines at the forefront of modern decision-making and innovation. At Sazan Consulting, we leverage these disciplines to extract meaningful insights from data, guiding businesses towards informed strategies and smarter operations. From predictive modeling to machine learning algorithms, our expertise empowers organizations to uncover trends, optimize processes, and drive sustainable growth in today’s data-driven world.
The training primarily focuses on enhancing industry-level skills for working in the Data Science and Data Engineering domains. It emphasizes practical exercises, hands-on projects, and real-world applications rather than theoretical concepts.
- Duration: 6-7 weeks
- Classes only on Weekends
Program Outline
Pre-requisite for Program: Good communication skills, Python
Job roles: Data Scientist, Machine Learning Engineer, Data Analyst, Business Intelligence Analyst, AI Specialist, Research Scientist (AI/ML), NLP Engineer, Analytics Consultant, Data Product Manager.
Module 1: Introduction to Data Science
1) What is Data Science?
Definition, Overview, and Role of a Data Scientist
Data Science vs. Data Analytics vs. Business Intelligence
Real-World Use Case: Airbnb’s Data Science for Price Optimization
2) Why Learn Data Science?
Importance of Data-Driven Decisions
Industry Applications (Finance, Healthcare, E-commerce, etc.)
Case Study: How Netflix Uses Data Science to Enhance User Experience
3) Data Science Workflow
Data Collection, Preparation, Modeling, Evaluation, and Deployment
Tools and Technologies (Python, R, SQL, Excel, etc.)
Module 2: Python for Data Science
1) Introduction to Python Programming
Python Basics: Variables, Data Types, Control Structures
Data Structures: Lists, Dictionaries, Tuples, Sets
Functions, Loops, and Conditionals
2) NumPy for Numerical Computing
Arrays, Element-Wise Operations, Array Manipulation
Case Study: Simulating Data for Stock Market Predictions
3) Pandas for Data Manipulation
DataFrames, Series, Filtering, Merging, Grouping
Use Case: Analyzing Sales Data for Retail Companies
Module 3: Data Wrangling & Cleaning
1) Data Cleaning
Handling Missing Data, Duplicates, Outliers, and Inconsistent Data
Tools: Pandas, NumPy, scikit-learn
2) Feature Engineering
Creating New Features, Encoding Categorical Variables, Scaling, and Normalization
Use Case: Building a Credit Risk Model for a Bank
3) Data Transformation
Log Transform, Binning, Polynomial Features
Use Case: House Price Prediction by Transforming Features for Better Accuracy
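As a taste of the exercises in this module, the log transform and binning steps listed above can be sketched in plain Python. This is a minimal illustration, not the course's actual material: the function names `log_transform` and `equal_width_bins` and the sample prices are hypothetical, and in class the same steps would more likely be done with Pandas or NumPy.

```python
import math


def log_transform(values):
    """Apply log(1 + x) to compress a right-skewed, non-negative feature."""
    return [math.log1p(v) for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width bins, numbered from 0."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so everything falls into a single bin.
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical house prices (in thousands), made up for illustration.
prices = [100, 150, 200, 400, 1000]
log_prices = log_transform(prices)
price_bins = equal_width_bins(prices, 3)
print(price_bins)  # [0, 0, 0, 1, 2]
```

The log transform pulls the single very large price closer to the rest, which is the usual motivation for applying it before fitting a linear model to skewed data.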
Module 4: Data Visualization
1) Introduction to Data Visualization
Importance of Visualization in Data Science
Tools: Matplotlib, Seaborn
2) Exploratory Data Analysis (EDA)
Creating Histograms, Box Plots, Pair Plots, Heatmaps
Case Study: Visualization of Customer Churn Data for a Telecom Company
3) Advanced Visualization Techniques
Using Plotly and Tableau for Interactive Dashboards
Case Study: Building a Sales Dashboard for a Retail Company
Module 5: Statistics & Probability for Data Science
1) Descriptive Statistics
Measures of Central Tendency (Mean, Median, Mode)
Measures of Dispersion (Variance, Standard Deviation, Skewness, Kurtosis)
2) Probability Distributions
Normal Distribution, Poisson, Binomial, Uniform
Use Case: Predicting Sales Trends Using Probability Distributions
3) Hypothesis Testing
Null and Alternative Hypothesis, T-tests, Chi-Square, P-Values
Use Case: A/B Testing for Website Optimization
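The descriptive measures from item 1 of this module can be computed with Python's built-in `statistics` module. The sketch below is only illustrative; the `sales` figures are invented sample data, not course data.

```python
import statistics

# Hypothetical daily sales counts, made up for illustration.
sales = [12, 15, 15, 18, 20, 22, 38]

# Measures of central tendency
mean = statistics.mean(sales)
median = statistics.median(sales)
mode = statistics.mode(sales)

# Measures of dispersion (sample versions, dividing by n - 1)
variance = statistics.variance(sales)
std_dev = statistics.stdev(sales)

print(mean, median, mode)  # 20 18 15
```

Here the mean (20) sits above the median (18) because the single large value of 38 pulls it upward, a simple sign of right skew.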
Module 6: Machine Learning Basics
1) Introduction to Machine Learning
Supervised vs. Unsupervised Learning, Terminology, and Concepts
2) Supervised Learning Algorithms
Linear Regression, Logistic Regression
Use Case: Predicting House Prices Using Linear Regression
3) Unsupervised Learning Algorithms
Clustering (K-Means, Hierarchical), Dimensionality Reduction (PCA)
Use Case: Customer Segmentation Using K-Means Clustering
4) Model Evaluation
Train/Test Split, Cross-Validation, Metrics (Accuracy, Precision, Recall, F1-Score)
Use Case: Evaluating a Fraud Detection Model in Banking
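The evaluation metrics named in item 4 follow directly from the counts of true and false positives and negatives. The sketch below computes them by hand under simple assumptions: `classification_metrics` is a hypothetical helper, and the labels are invented, with 1 standing for a fraudulent transaction. In practice scikit-learn provides equivalent functions.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = fraud, 0 = legitimate.
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics["accuracy"])  # 0.8
```

In this made-up example accuracy is 0.8, while precision and recall are both 2/3. That gap is why accuracy alone can mislead on imbalanced problems such as fraud detection.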
Module 7: Advanced Machine Learning
1) Decision Trees and Random Forests
Building Trees, Feature Importance, Overfitting
Use Case: Predicting Employee Attrition Using Random Forests
2) Gradient Boosting & XGBoost
Boosting Techniques, Hyperparameter Tuning
Use Case: Predicting Loan Default Using XGBoost
3) Support Vector Machines
Concepts, Kernels, and Hyperplane
Use Case: Image Classification Using SVM
Module 8: Deep Learning
1) Introduction to Neural Networks
Structure of a Neural Network, Forward and Backpropagation
Use Case: Handwritten Digit Classification Using Neural Networks
2) Convolutional Neural Networks (CNNs)
Convolutions, Pooling, Dropout, and Architectures (LeNet, VGG)
Use Case: Image Recognition for Retail Product Detection
3) Recurrent Neural Networks (RNNs) and LSTMs
Sequential Data, Long Short-Term Memory (LSTM)
Use Case: Predicting Stock Prices Using LSTMs
Module 9: Natural Language Processing (NLP)
1) Introduction to NLP
Tokenization, Stop Words, Lemmatization, and Stemming
Use Case: Sentiment Analysis of Movie Reviews
2) Text Vectorization
TF-IDF, Word2Vec, Embeddings
Use Case: Building a Text Classification Model for Spam Detection
3) Advanced NLP Techniques
BERT, GPT, Transformers
Use Case: Building a Chatbot Using Transformer Models
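To make the TF-IDF idea from item 2 concrete, here is a minimal from-scratch sketch. It assumes whitespace tokenization and the plain weighting tf × log(N / df); libraries often use smoothed or normalized variants, so the numbers will not necessarily match theirs. The `tf_idf` function and the sample messages are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: the number of documents containing each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical messages, made up for illustration.
messages = ["win a free prize now", "meeting at noon", "free lunch at noon"]
weights = tf_idf(messages)
```

In the first message, "win" receives a higher weight than "free" because "free" also appears in another message. Terms that are distinctive to one document score higher, which is what makes them useful features for a spam classifier.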
Module 10: Big Data & Cloud Computing
1) Introduction to Big Data
Apache Spark, Distributed Computing
Use Case: Processing Large Datasets in Financial Services
2) Data Science in the Cloud
AWS, Google Cloud, Azure for Data Science
Use Case: Deploying Machine Learning Models in AWS SageMaker
Module 11: Model Deployment & MLOps
1) Introduction to MLOps
CI/CD for ML Models, Model Monitoring, and Management
Use Case: Deploying a Real-Time Fraud Detection Model in Production using Docker and Kubernetes
2) Model Deployment Techniques
Flask, FastAPI, Docker, Kubernetes for Model Serving
Use Case: Building a REST API for a Prediction Model
Module 12: Capstone Project & Real-World Use Cases
1) Capstone Project
Choose a Real-World Data Science Problem (Predictive Analytics, NLP, or Computer Vision)
Full Pipeline: Data Collection, Cleaning, Modeling, Evaluation, and Deployment
2) Real-World Use Cases will be discussed in class.
Pre-requisite for Program:
- Familiarity with a programming language such as Python.
- Familiarity with SQL and NoSQL databases.
- Basic understanding of Mathematics and Statistics.
- Basic Git knowledge (optional).
- Awareness of cloud resources such as Google Colab.
- Must be available for 8 hours of class per week, and at least 2 hours