Data Science Full Stack Roadmap 2022
Python, Data Structure, Pandas, Numpy, Matplotlib, Statistics, Machine Learning, NLP, Computer Vision, PyTorch, SQL, Big Data, PySpark, Azure
Table of contents
- 1 | Python Programming and Logic Building
- 2 | Data Structure & Algorithms
- 3 | Pandas Numpy Matplotlib
- 4 | Statistics
- 5 | Machine Learning
- 6 | Natural Language Processing
- 7 | Computer Vision
- 8 | Data Visualization with Tableau
- 9 | Structured Query Language (SQL)
- 10 | Big Data and PySpark
- 11 | Development Operations with Azure
- 12 | Five Major Projects and Git
I completed my Master of Technology in Data Science, and there is no doubt it is an amazing field. I studied 18 different subjects and completed one thesis and one capstone project during my two-year MTech journey.
Drawing on all those subjects, I have built a roadmap for anyone who wants a kick start as a Data Scientist.
If you don't want to spend two years on a master's degree and would rather learn on your own, this roadmap is for you. It gives you a bird's-eye view of where you are now and where you want to be in the future.
The roadmap is divided into 12 sections:
- Python Programming and Logic Building
- Data Structure & Algorithms
- Pandas Numpy Matplotlib
- Statistics
- Machine Learning
- Natural Language Processing
- Computer Vision with PyTorch
- Data Visualization with Tableau
- Structured Query Language (SQL)
- Big Data and PySpark
- Development Operations with Azure
- Five Major Projects and Git
To understand the complexities of any technology, get the fundamentals clear first.
Let's Go!!
1 | Python Programming and Logic Building
I prefer the Python programming language: it is an excellent choice for starting your programming journey. Here is the Python roadmap for logic building:
- Python basics, Variables, Operators, Conditional Statements
- Lists and Strings
- While Loop, Nested Loops, Loop Else
- For Loop, Break, and Continue statements
- Functions, Return Statement, Recursion
- Dictionary, Tuple, Set
- File Handling, Exception Handling
- Object-Oriented Programming
- Modules and Packages
Get a detailed Python Core Roadmap
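To make the list concrete, here is a minimal sketch that touches several of these topics in one place: recursion, a for loop with break and the loop-else clause, continue, dictionaries, exception handling, and a small class. The function and class names (factorial, first_even, count_words, safe_divide, Tally) are hypothetical, chosen only for illustration.

```python
def factorial(n: int) -> int:
    """Recursion: n! = n * (n - 1)!, with 0! = 1 as the base case."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1 if n == 0 else n * factorial(n - 1)


def first_even(numbers):
    """For loop with break plus a loop-else clause."""
    for x in numbers:
        if x % 2 == 0:
            break            # found one: leave the loop and skip the else block
    else:
        return None          # runs only if the loop finished without hitting break
    return x


def count_words(text: str) -> dict:
    """Strings, dictionaries, and the continue statement."""
    counts = {}
    for word in text.lower().split():
        if not word.isalpha():
            continue         # skip tokens such as numbers
        counts[word] = counts.get(word, 0) + 1
    return counts


def safe_divide(a: float, b: float):
    """Exception handling: return None instead of crashing on division by zero."""
    try:
        return a / b
    except ZeroDivisionError:
        return None


class Tally:
    """Object-oriented programming: state (count) bundled with behaviour (increment)."""

    def __init__(self):
        self.count = 0

    def increment(self, step: int = 1) -> int:
        self.count += step
        return self.count


print(factorial(5))                          # 120
print(first_even([1, 3, 8, 5]))              # 8
print(count_words("the cat and the 2 dogs"))
print(safe_divide(1, 0))                     # None
```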
2 | Data Structure & Algorithms
Data structures are among the most important topics to learn, not only for data scientists but for everyone working in computer science. Studying them gives you an internal understanding of how everything in software works.
Understand these topics:
- Types of Algorithm Analysis
- Asymptotic Notation, Big-O, Omega, Theta
- Stacks
- Queues
- Linked List
- Trees
- Graphs
- Sorting
- Searching
- Hashing
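As a small illustration of several of these structures, the sketch below uses a Python list as a stack, collections.deque as a queue for a breadth-first graph traversal, binary search for O(log n) searching in sorted data, and a set (a hash table under the hood) for O(n) pair lookup. The function names and the example graph are hypothetical.

```python
from collections import deque


def is_balanced(expr: str) -> bool:
    """Stack (LIFO): check that brackets are properly nested."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in expr:
        if ch in "([{":
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack


def bfs_order(graph: dict, start) -> list:
    """Queue (FIFO) driving a breadth-first traversal of a graph (adjacency lists)."""
    order, seen = [start], {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order


def binary_search(sorted_items, target) -> int:
    """Searching in O(log n): halve the range each step. Returns the index or -1."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


def has_pair_with_sum(nums, target) -> bool:
    """Hashing: set membership is O(1) on average, so this runs in O(n) overall."""
    seen = set()
    for x in nums:
        if target - x in seen:
            return True
        seen.add(x)
    return False


graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(bfs_order(graph, "A"))                      # ['A', 'B', 'C', 'D']
print(binary_search([1, 3, 5, 7, 9, 11], 7))      # 3
```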
3 | Pandas Numpy Matplotlib
NumPy gives Python n-dimensional arrays. For two-dimensional (tabular) data, Pandas is the go-to library for analysis. You can use other tools, but drag-and-drop tools have limitations; with Pandas you write code, so you can customize the analysis to fit the real-life problem.
Numpy
- Vectors, Matrix
- Operations on Matrix
- Mean, Variance, and Standard Deviation
- Reshaping Arrays
- Transpose and Determinant of Matrix
- Diagonal Operations, Trace
- Add, Subtract, Multiply, Dot, and Cross Product
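The NumPy topics above map onto a handful of array operations. This is a minimal sketch with small hand-picked vectors and a 2x2 matrix, so the results can be checked by hand; note that np.var and np.std use the population formulas (ddof=0) by default.

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])              # vectors
w = np.array([4.0, 5.0, 6.0])
A = np.array([[1.0, 2.0], [3.0, 4.0]])     # a 2x2 matrix

# Element-wise add, subtract, multiply; dot and cross products
added, subtracted, multiplied = v + w, v - w, v * w   # [5, 7, 9], [-3, -3, -3], [4, 10, 18]
dot = np.dot(v, w)                         # 1*4 + 2*5 + 3*6 = 32
cross = np.cross(v, w)                     # [-3, 6, -3]
matmul = A @ A                             # matrix product: [[7, 10], [15, 22]]

# Mean, variance, standard deviation (population formulas, ddof=0)
mean, var, std = v.mean(), v.var(), v.std()   # 2.0, 2/3, sqrt(2/3)

# Reshaping and transposing
M = np.arange(6).reshape(2, 3)             # [[0, 1, 2], [3, 4, 5]]
Mt = M.T                                   # shape (3, 2)

# Determinant, diagonal, trace
det = np.linalg.det(A)                     # about -2.0 (floating point, not exactly -2)
diagonal = np.diag(A)                      # [1, 4]
trace = np.trace(A)                        # 1 + 4 = 5
```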
Pandas
- Series and DataFrames
- Slicing, Rows, and Columns
- Operations on DataFrame
- Different ways to create DataFrame
- Read, Write Operations with CSV files
- Handling Missing values, replace values, and Regular Expression
- GroupBy and Concatenation
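Here is a minimal Pandas sketch that walks through the list above on a small, hypothetical sales table. To keep it self-contained it reads the CSV from an in-memory string via io.StringIO; with a real file you would pass its path to read_csv instead.

```python
import io

import pandas as pd

# Hypothetical sales data as CSV text; "n/a" and the empty field become missing values
csv_text = """region,product,units,price
North,A,10,2.5
North,B,,4.0
South,A,7,2.5
South,B,3,n/a
"""
df = pd.read_csv(io.StringIO(csv_text), na_values=["n/a"])

# Handling missing values: units -> 0, price -> column mean
df["units"] = df["units"].fillna(0)
df["price"] = df["price"].fillna(df["price"].mean())

# Replacing values with a regular expression ("A" -> "Product A")
df["product"] = df["product"].str.replace(r"^([AB])$", r"Product \1", regex=True)

# Operations on columns, then slicing rows and columns by label
df["revenue"] = df["units"] * df["price"]
north = df.loc[df["region"] == "North", ["product", "revenue"]]

# GroupBy: total revenue per region
totals = df.groupby("region")["revenue"].sum()

# Another way to create a DataFrame (dict of lists), then concatenation
extra = pd.DataFrame({"region": ["East"], "product": ["Product A"], "units": [5.0],
                      "price": [2.5], "revenue": [12.5]})
combined = pd.concat([df, extra], ignore_index=True)

# Write to CSV (returned as a string here; pass a file path to write to disk)
out_csv = combined.to_csv(index=False)
print(totals)
```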
Matplotlib
- Graph Basics
- Format Strings in Plots
- Label Parameters, Legend
- Bar Chart, Pie Chart, Histogram, Scatter Plot
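The Matplotlib basics above fit in one figure. This sketch assumes the non-interactive "Agg" backend so it can run without a display, and saves to a hypothetical file name, charts.png. It shows format strings ("r--", "bo"), axis labels, a legend, and the four chart types from the list.

```python
import matplotlib
matplotlib.use("Agg")                # non-interactive backend: draw to a file, no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)

fig, axes = plt.subplots(2, 3, figsize=(12, 7))

# Line plot with format strings: "r--" = red dashed line, "bo" = blue circle markers
axes[0, 0].plot(x, np.sin(x), "r--", label="sin(x)")
axes[0, 0].plot(x[::5], np.cos(x[::5]), "bo", label="cos(x)")
axes[0, 0].set_xlabel("x")
axes[0, 0].set_ylabel("value")
axes[0, 0].set_title("Line plot")
axes[0, 0].legend(loc="upper right")

# Bar chart
axes[0, 1].bar(["A", "B", "C"], [5, 3, 7], color="tab:green")
axes[0, 1].set_title("Bar chart")

# Pie chart with percentage labels
axes[0, 2].pie([40, 35, 25], labels=["X", "Y", "Z"], autopct="%1.0f%%")
axes[0, 2].set_title("Pie chart")

# Histogram of 1,000 normally distributed samples
axes[1, 0].hist(rng.normal(size=1000), bins=30)
axes[1, 0].set_title("Histogram")

# Scatter plot
axes[1, 1].scatter(rng.random(50), rng.random(50), alpha=0.6)
axes[1, 1].set_title("Scatter plot")

axes[1, 2].axis("off")               # unused panel

fig.tight_layout()
fig.savefig("charts.png", dpi=100)
```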
4 | Statistics
Descriptive Statistics
- Measures of Frequency and Central Tendency
- Measures of Dispersion
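A quick way to practise descriptive statistics is Python's built-in statistics module. This sketch uses a small hypothetical sample whose values are easy to verify by hand, and shows the difference between the population variance (divide by n) and the sample variance (divide by n - 1).

```python
import statistics
from collections import Counter

data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical sample

# Frequency: how often each value occurs
freq = Counter(data)                     # 4 appears 3 times, 5 twice, the rest once

# Central tendency
mean = statistics.mean(data)             # 40 / 8 = 5
median = statistics.median(data)         # (4 + 5) / 2 = 4.5
mode = statistics.mode(data)             # 4

# Dispersion
data_range = max(data) - min(data)       # 9 - 2 = 7
pop_var = statistics.pvariance(data)     # 32 / 8 = 4
pop_std = statistics.pstdev(data)        # sqrt(4) = 2.0
sample_var = statistics.variance(data)   # 32 / 7, about 4.571
```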
Probability Distribution
- Gaussian Normal Distribution
- Skewness and Kurtosis
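Skewness and kurtosis are built from central moments m_k = mean((x - mean(x))^k): skewness is m3 / m2^1.5 and excess kurtosis is m4 / m2^2 - 3, so both are near 0 for a normal distribution. This sketch computes them by hand and cross-checks against scipy.stats.skew and scipy.stats.kurtosis (whose defaults, bias=True and fisher=True, use the same formulas). The simulated samples are assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
normal = rng.normal(loc=0.0, scale=1.0, size=10_000)   # Gaussian sample
right_skewed = rng.exponential(scale=1.0, size=10_000) # long right tail


def moment_skewness(x):
    """Third central moment divided by the second central moment to the power 1.5."""
    d = x - x.mean()
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5


def excess_kurtosis(x):
    """Fourth central moment over squared variance, minus 3 (the normal's kurtosis)."""
    d = x - x.mean()
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2 - 3.0


print(moment_skewness(normal), excess_kurtosis(normal))              # both close to 0
print(moment_skewness(right_skewed), excess_kurtosis(right_skewed))  # clearly positive
```

For an exponential distribution the theoretical skewness is 2 and the excess kurtosis is 6, so the second line should land in that neighbourhood, with sampling noise.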
Regression Analysis
- Continuous and Discrete Functions
- Goodness of Fit
- Normality Test
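Goodness of fit for a regression is commonly summarised by R² = 1 - SS_res / SS_tot, and a normality test on the residuals checks one of linear regression's assumptions. This sketch fits a straight line with np.polyfit on hypothetical noisy linear data, computes R² by hand, cross-checks it with scipy.stats.linregress, and runs the Shapiro-Wilk test (scipy.stats.shapiro) on the residuals.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = 3.0 * x + 2.0 + rng.normal(scale=1.0, size=x.size)  # hypothetical line plus noise

slope, intercept = np.polyfit(x, y, deg=1)   # least-squares straight line
y_hat = slope * x + intercept
residuals = y - y_hat

# Goodness of fit: share of the variance in y explained by the model
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# Normality test on residuals: a small p-value suggests they are not normal
shapiro_stat, shapiro_p = stats.shapiro(residuals)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r_squared:.4f}, p={shapiro_p:.3f}")
```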
ANOVA
- Homoscedasticity
- Linear and Non-Linear Relationship with Regression
Inferential Statistics
- t-Test
- z-Test
Hypothesis Testing
- Type I and Type II errors
- t-Test and its types
- One way ANOVA
- Two way ANOVA
- Chi-Square Test
- Implementing the tests on continuous and categorical data
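Most of these tests are one function call in scipy.stats. This is a sketch on simulated, hypothetical groups, not real data: t-tests (one-sample, Welch two-sample, paired), a z-test computed by hand under an assumed known population standard deviation, Levene's test for homoscedasticity, one-way ANOVA, and a chi-square test of independence on a hypothetical contingency table. The significance level alpha is the accepted probability of a Type I error.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical continuous data: three groups of 30 measurements each
group_a = rng.normal(loc=50, scale=5, size=30)
group_b = rng.normal(loc=53, scale=5, size=30)
group_c = rng.normal(loc=50, scale=5, size=30)
alpha = 0.05   # significance level = accepted Type I error rate

# One-sample t-test: does the mean of group_a differ from 50?
t1, p1 = stats.ttest_1samp(group_a, popmean=50)

# Two-sample t-test (Welch's version does not assume equal variances)
t2, p2 = stats.ttest_ind(group_a, group_b, equal_var=False)

# Paired t-test: the same subjects measured before and after
before = group_a
after = group_a + rng.normal(loc=1.0, scale=1.0, size=30)
t3, p3 = stats.ttest_rel(before, after)

# z-test by hand, assuming the population standard deviation is known (sigma = 5)
z = (group_a.mean() - 50) / (5 / np.sqrt(group_a.size))
p_z = 2 * stats.norm.sf(abs(z))          # two-sided p-value

# Homoscedasticity check (Levene), then one-way ANOVA across the three groups
lev_stat, p_levene = stats.levene(group_a, group_b, group_c)
f_stat, p_anova = stats.f_oneway(group_a, group_b, group_c)

# Chi-square test of independence on categorical counts (hypothetical 2x2 table)
table = np.array([[30, 10],
                  [20, 20]])
chi2, p_chi, dof, expected = stats.chi2_contingency(table)

for name, p in [("one-sample t", p1), ("Welch t", p2), ("paired t", p3), ("z", p_z),
                ("Levene", p_levene), ("ANOVA", p_anova), ("chi-square", p_chi)]:
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{name}: p = {p:.4f} -> {decision}")
```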
5 | Machine Learning
The best way to master machine learning algorithms is to work with the scikit-learn library. Scikit-learn provides ready-made implementations of the algorithms: you create an object of the algorithm's class and call its methods, such as fit and predict. These are the algorithms you must know, covering both supervised and unsupervised machine learning:
- Linear Regression
- Logistic Regression
- Decision Tree
- Gradient Descent
- Random Forest
- Ridge and Lasso Regression
- Naive Bayes
- Support Vector Machine
- KMeans Clustering
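The "create an object, then fit and predict" pattern looks like this in practice. The sketch first shows gradient descent written from scratch for a one-variable linear regression (the function name gradient_descent and its learning rate and epoch count are assumptions), then trains several scikit-learn classifiers on the bundled Iris dataset and clusters the same data with KMeans without using the labels.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def gradient_descent(x, y, lr=0.02, epochs=5000):
    """Fit y = w*x + b by repeatedly stepping against the gradient of the mean squared error."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        error = (w * x + b) - y
        w -= lr * (2 / n) * np.sum(error * x)   # dMSE/dw
        b -= lr * (2 / n) * np.sum(error)       # dMSE/db
    return w, b


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w, b = gradient_descent(x, 2 * x + 1)           # should end up close to w = 2, b = 1

# Supervised learning: every estimator follows create -> fit -> predict/score
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(kernel="rbf"),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy for classifiers
    print(f"{name}: test accuracy = {scores[name]:.3f}")

# Unsupervised learning: K-Means sees only X, never the labels y
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)
```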
Other Concepts and Topics for ML
- Measuring Accuracy
- Bias-Variance Trade-off
- Applying Regularization
- Elastic Net Regression
- Predictive Analytics
- Exploratory Data Analysis
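Regularization trades a little extra bias for lower variance, and cross-validation is a more honest way to measure accuracy than a single train/test split. This sketch compares ordinary least squares with Ridge (L2), Lasso (L1) and Elastic Net (a mix of both) using 5-fold cross-validated R² on synthetic data from make_regression; the penalty strength alpha=1.0 and the dataset shape are assumptions, not tuned values.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 50 features, only 5 of which actually influence the target
X, y = make_regression(n_samples=100, n_features=50, n_informative=5,
                       noise=10.0, random_state=0)

models = {
    "OLS (no penalty)": LinearRegression(),
    "Ridge (L2)": Ridge(alpha=1.0),
    "Lasso (L1)": Lasso(alpha=1.0),
    "Elastic Net (L1 + L2)": ElasticNet(alpha=1.0, l1_ratio=0.5),
}
cv_r2 = {}
for name, model in models.items():
    # Scale features first so the penalty treats every coefficient on the same footing
    pipe = make_pipeline(StandardScaler(), model)
    cv_r2[name] = cross_val_score(pipe, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean 5-fold R^2 = {cv_r2[name]:.3f}")

# The L1 penalty drives some coefficients exactly to zero (implicit feature selection)
lasso = make_pipeline(StandardScaler(), Lasso(alpha=1.0)).fit(X, y)
n_zero = int(np.sum(lasso[-1].coef_ == 0))
print(f"Lasso set {n_zero} of {X.shape[1]} coefficients to zero")
```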
6 | Natural Language Processing
If you are interested in working with text, do some of the work an NLP engineer does and understand how language models work.
- Sentiment analysis
- POS Tagging, Parsing
- Text preprocessing
- Stemming and Lemmatization
- Sentiment classification using Naive Bayes
- TF-IDF, N-grams
- Machine Translation, BLEU Score
- Text Generation, Summarization, ROUGE Score
- Language Modeling, Perplexity
- Building a text classifier
- Identifying the gender
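Several of these topics (text preprocessing, TF-IDF with n-grams, and sentiment classification using Naive Bayes) combine into a tiny text classifier. This is a sketch with scikit-learn and a hypothetical six-sentence training set; real sentiment models need far more data. The preprocess function is an illustrative assumption, not a standard API.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def preprocess(text: str) -> str:
    """Minimal preprocessing: lowercase, drop non-letters, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical toy training set (1 = positive, 0 = negative)
train_texts = [
    "I loved this movie, the acting was great!",
    "What a wonderful and moving story.",
    "Great soundtrack and a great cast.",
    "I hated it. The plot was boring.",
    "Terrible acting and a boring script.",
    "Awful movie, a complete waste of time.",
]
train_labels = [1, 1, 1, 0, 0, 0]

# TF-IDF over unigrams and bigrams (English stop words removed), then multinomial Naive Bayes
clf = make_pipeline(
    TfidfVectorizer(preprocessor=preprocess, ngram_range=(1, 2), stop_words="english"),
    MultinomialNB(),
)
clf.fit(train_texts, train_labels)

predictions = clf.predict(["a great and wonderful cast", "boring and awful"])
print(predictions)   # expected: positive (1) then negative (0)
```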
7 | Computer Vision
To work on image and video analytics, master computer vision. To work on computer vision, you first have to understand images.
- PyTorch Tensors
- Understanding pretrained models like AlexNet and ResNet, and the ImageNet dataset they are trained on
- Neural Networks
- Building a perceptron
- Building a single layer neural network
- Building a deep neural network
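The progression from tensors to a perceptron to a deep network can be sketched in a few lines of PyTorch. The data and labels here are random and hypothetical. Strictly, Rosenblatt's perceptron uses a step activation; the sigmoid below is a differentiable stand-in so the layer can be trained with gradients.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Tensors: n-dimensional arrays that can also track gradients
x = torch.randn(8, 4)                      # hypothetical batch: 8 samples, 4 features each
targets = torch.randint(0, 3, (8,))        # hypothetical class labels 0, 1 or 2

# Perceptron-style single-layer network: one Linear layer plus an activation
single_layer = nn.Sequential(nn.Linear(4, 1), nn.Sigmoid())
probabilities = single_layer(x)            # shape (8, 1), values in (0, 1)

# Deep network: several Linear layers with non-linear activations in between
deep_net = nn.Sequential(
    nn.Linear(4, 16), nn.ReLU(),
    nn.Linear(16, 16), nn.ReLU(),
    nn.Linear(16, 3),                      # 3 raw class scores (logits)
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(deep_net.parameters(), lr=0.1)

losses = []
for step in range(200):
    loss = loss_fn(deep_net(x), targets)
    optimizer.zero_grad()                  # clear gradients from the previous step
    loss.backward()                        # autograd computes gradients of the loss
    optimizer.step()                       # gradient-descent update of the weights
    losses.append(loss.item())
```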
Recurrent Neural Networks for Sequential Data Analysis
Convolutional Neural Networks
- Understanding the ConvNet topology
- Convolution layers
- Pooling layers
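A convolution layer's output size follows out = floor((H + 2*padding - kernel) / stride) + 1, so a 3x3 kernel with padding 1 and stride 1 keeps a 28x28 image at 28x28, and 2x2 max pooling then halves it. This PyTorch sketch checks those shapes on a hypothetical batch of random grayscale images and stacks the layers into a small ConvNet.

```python
import torch
from torch import nn

torch.manual_seed(0)
images = torch.randn(4, 1, 28, 28)     # hypothetical batch: (N, C, H, W) = 4 grayscale 28x28 images

conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
pool = nn.MaxPool2d(kernel_size=2)

features = torch.relu(conv(images))    # (4, 8, 28, 28): (28 + 2 - 3) / 1 + 1 = 28
pooled = pool(features)                # (4, 8, 14, 14): 2x2 pooling halves height and width

# A small ConvNet topology: conv -> pool -> conv -> pool -> flatten -> fully connected
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),    # -> 8 x 14 x 14
    nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # -> 16 x 7 x 7
    nn.Flatten(),
    nn.Linear(16 * 7 * 7, 10),         # 10 class scores
)
logits = model(images)                 # (4, 10)
```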
Image Content Analysis
- Operating on images using OpenCV-Python
- Detecting edges
- Histogram equalization
- Detecting corners
- Detecting SIFT feature points
8 | Data Visualization with Tableau
Visual Perception and How to Use It
Tableau
- What is it, How it works, Why Tableau
- Connecting to Data
- Building charts
- Calculations
Dashboards
- Sharing our work
- Advanced Charts, Calculated Fields, Calculated Aggregations
- Conditional Calculation, Parameterized Calculation
9 | Structured Query Language (SQL)
Set up a SQL server
- Basics of SQL
- Writing queries
- Data Types
Select
- Creating and deleting tables
- Filtering data
- Ordering results (ORDER BY)
- Aggregations
Truncate
Primary Key
- Foreign Key
- Union
MySQL
Complex Questions
- Solving Interview Questions
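You can practise most of these SQL topics without installing a server by using SQLite through Python's built-in sqlite3 module. This sketch uses a hypothetical customers/orders schema. Two SQLite specifics are worth knowing: foreign keys are enforced only after PRAGMA foreign_keys = ON, and SQLite has no TRUNCATE statement (DELETE without a WHERE clause plays that role), whereas MySQL and SQL Server do support TRUNCATE TABLE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")         # throwaway in-memory database
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
cur = conn.cursor()

# Creating tables with data types, a primary key, and a foreign key
cur.execute("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        city TEXT
    )""")
cur.execute("""
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL
    )""")
cur.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                [(1, "Asha", "Pune"), (2, "Ben", "Delhi"), (3, "Chen", "Pune")])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 1, 250.0), (2, 1, 100.0), (3, 2, 75.0), (4, 3, 300.0)])

# The foreign key rejects an order for a customer that does not exist
try:
    cur.execute("INSERT INTO orders VALUES (5, 99, 10.0)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

# Filtering and ordering
pune = cur.execute(
    "SELECT name FROM customers WHERE city = 'Pune' ORDER BY name DESC").fetchall()

# Aggregation with a JOIN and GROUP BY
totals = cur.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY total DESC""").fetchall()

# UNION combines the rows of two queries and removes duplicates
cities = cur.execute(
    "SELECT city FROM customers WHERE id = 1 UNION SELECT city FROM customers WHERE id = 3"
).fetchall()

# TRUNCATE equivalent in SQLite, then deleting (dropping) the table
cur.execute("DELETE FROM orders")
remaining = cur.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
cur.execute("DROP TABLE orders")
conn.close()
```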
10 | Big Data and PySpark
Big Data
- What is Big Data?
- How is Big Data applied in business?
PySpark
- Resilient Distributed Datasets
- Schema
- Lambda Expressions
- Transformations
Actions
Data Modeling
- Duplicate Data
- Descriptive Analysis on Data
Visualizations
MLlib
- ML Packages
Pipelines
Streaming
Packaging Spark Applications
11 | Development Operations with Azure
Foundations of Data Systems
- Data Models
- Storage
- Encoding
Distributed Data
- Replication
- Partitioning
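Partitioning and replication can be illustrated with a toy scheme: hash each key to pick a partition, then store copies on the next few nodes in a ring. This is a minimal sketch, not how any particular database or Azure service works; the node names and both functions are hypothetical. It uses MD5 only for its stable, well-spread output, because Python's built-in hash() of strings is randomized per process.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Hash partitioning: map a key deterministically to one of num_partitions buckets."""
    digest = hashlib.md5(key.encode("utf-8")).digest()   # not for security, just spreading
    return int.from_bytes(digest[:8], "big") % num_partitions


def replicas_for(key: str, nodes: list, replication_factor: int = 3) -> list:
    """Toy replication: the key's home node plus the following nodes in a ring."""
    start = partition_for(key, len(nodes))
    count = min(replication_factor, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(count)]


nodes = ["node-a", "node-b", "node-c", "node-d"]   # hypothetical cluster
for user_id in ["user-17", "user-42", "user-99"]:
    print(user_id, "->", replicas_for(user_id, nodes))
```

One design caveat: with plain "hash mod N", changing the number of nodes moves most keys to a different partition, which is why production systems often use fixed partition counts or consistent hashing instead.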
Derived Data
- Batch Processing
- Stream Processing
Microsoft Azure
- Azure Data Workloads
- Azure Data Factory
- Azure HDInsight
- Azure Databricks
- Azure Synapse Analytics
- Relational Databases in Azure
- Non-relational Databases in Azure
12 | Five Major Projects and Git
Git - Version Control System
We follow project-based learning and we will work on all the projects in parallel.
Join the Data Science & ML Full Stack WhatsApp Group here:
chat.whatsapp.com/IzkKGbimpB50Sxyg2mgn6E
Connect with me on these platforms:
Twitter: twitter.com/hemansnation
LinkedIn: linkedin.com/in/hemansnation
GitHub: github.com/hemansnation
Instagram: instagram.com/masterdexter.ai