A complete educational journey through classical Machine Learning — built from scratch, line by line.
Understand the math. Build the code. Train the mind.
⭐ Star this repo • 🍴 Fork it • 📚 Documentation • 🚀 Get Started
- 🎯 Overview
- ✨ Features
- 🏗️ Tech Stack
- 📂 Repository Structure
- 🚀 Quick Start
- 🧮 Algorithms Implemented
- 📊 Dataset Collection
- 📸 Visualizations
- 🎓 Learning Path
- 🤝 Contributing
- 👨‍💻 Author
- 📄 License
- ⭐ Support
Welcome to Machine Learning Algorithms — your comprehensive playground for mastering classical ML! This repository features hand-crafted implementations of every major algorithm, from the ground up.
- Learn by Building: Every algorithm implemented from scratch using pure NumPy
- Compare & Contrast: Side-by-side comparisons with industry-standard Scikit-Learn
- Visual Learning: Beautiful plots and visualizations that bring theory to life
- Real-World Applications: Applied examples on diverse datasets
- Educational Focus: Clear documentation, math explanations, and code comments
💡 Perfect for students, developers, data scientists, and AI enthusiasts who want to truly understand Machine Learning from first principles.
| Category | Technologies |
|---|---|
| Language | Python 3.9+ |
| Core Libraries | NumPy, Pandas |
| ML Framework | Scikit-Learn |
| Visualization | Matplotlib |
| Environment | Jupyter Notebook |
| Version Control | Git |
```
Machine-learning-Algorithm/
│
├── 📁 Supervised Learning
│   ├── 📂 Regression
│   │   ├── LinearRegression/        # Simple & Multiple Linear Regression
│   │   ├── PolynomialRegression/    # Polynomial Regression
│   │   └── GradientDescent/         # Batch, Mini-Batch, Stochastic GD
│   │
│   ├── 📂 Classification
│   │   ├── LogisticRegression/      # Binary & Multi-class Classification
│   │   ├── KNN/                     # K-Nearest Neighbors
│   │   ├── NaiveBayes/              # Gaussian, Multinomial, Bernoulli
│   │   ├── SupportVectorMachines/   # SVM with Kernel Tricks
│   │   ├── DecisionTrees/           # CART Algorithm
│   │   └── NeuralNetworks/          # Perceptron & MLP
│   │
│   └── 📂 Ensemble Methods
│       ├── RandomForest/            # Random Forest Classifier & Regressor
│       ├── Bagging/                 # Bootstrap Aggregating
│       ├── AdaBoost/                # Adaptive Boosting
│       ├── GradientBoosting/        # Gradient Boosting Machines
│       └── XGBoost/                 # Extreme Gradient Boosting
│
├── 📁 Unsupervised Learning
│   ├── 📂 Clustering
│   │   ├── K-Means-clustering/      # K-Means from Scratch
│   │   ├── HierarchicalClustering/  # Agglomerative & Divisive
│   │   └── DBSCAN/                  # Density-Based Clustering
│   │
│   └── 📂 Dimensionality Reduction
│       └── PCA/                     # Principal Component Analysis
│
├── 📁 DataSets/                     # Curated Real-World Datasets
│   ├── iris.csv                     # Classification Dataset
│   ├── heart.csv                    # Healthcare Dataset
│   ├── Social_Network_Ads.csv       # Marketing Dataset
│   ├── ipl-matches.csv              # Sports Analytics
│   ├── zomato.csv                   # Restaurant Data
│   └── student_clustering.csv       # Educational Data
│
├── 📁 Visualizations/               # Plots & Charts
├── 📄 requirements.txt              # Python Dependencies
├── 📄 CONTRIBUTING.md               # Contribution Guidelines
├── 📄 LICENSE                       # MIT License
└── 📄 README.md                     # You are here!
```
- Python 3.9 or higher
- pip package manager
- Git
Clone the repository:

```bash
git clone https://github.com/Nitin-Prata/Machine-learning-Algorithm.git
cd Machine-learning-Algorithm
```

Create and activate a virtual environment.

Windows (PowerShell):

```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
```

macOS / Linux:

```bash
python3 -m venv .venv
source .venv/bin/activate
```

Install the dependencies and launch Jupyter:

```bash
pip install -r requirements.txt
jupyter notebook
```

Your browser will open automatically at http://localhost:8888.
- Navigate to any algorithm folder (e.g., `LinearRegression/`)
- Open the Jupyter Notebook (`.ipynb` file)
- Run the cells sequentially to see the implementation
- Experiment with parameters and datasets
- Compare the scratch implementation with Scikit-Learn
📊 Regression Algorithms (5)
| Algorithm | From Scratch | Scikit-Learn | Notebook |
|---|---|---|---|
| Linear Regression | ✅ | ✅ | 📓 |
| Multiple Linear Regression | ✅ | ✅ | 📓 |
| Polynomial Regression | ✅ | ✅ | 📓 |
| Ridge Regression | ✅ | ✅ | 📓 |
| Lasso Regression | ✅ | ✅ | 📓 |
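To give a flavor of what the from-scratch regression notebooks cover, here is a minimal NumPy sketch of ordinary least squares and ridge regression via the normal equation. This is an illustrative sketch on synthetic data, not the repo's actual notebook code; all variable names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=100)

# Ordinary least squares via the normal equation: w = (X'X)^-1 X'y
Xb = np.c_[np.ones(len(X)), X]          # prepend a bias column
w_ols = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

# Ridge regression adds an L2 penalty (the bias term is not penalized)
alpha = 1.0
I = np.eye(Xb.shape[1])
I[0, 0] = 0.0
w_ridge = np.linalg.solve(Xb.T @ Xb + alpha * I, Xb.T @ y)

print(w_ols.round(2))   # close to the true weights [3., 2., -1., 0.5]
```

The notebooks then compare such closed-form solutions against `sklearn.linear_model.LinearRegression` and `Ridge` on the same data.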
🎯 Classification Algorithms (8)
| Algorithm | From Scratch | Scikit-Learn | Notebook |
|---|---|---|---|
| Logistic Regression | ✅ | ✅ | 📓 |
| K-Nearest Neighbors (KNN) | ✅ | ✅ | 📓 |
| Naive Bayes | ✅ | ✅ | 📓 |
| Support Vector Machine (SVM) | ✅ | ✅ | 📓 |
| Decision Trees | ✅ | ✅ | 📓 |
| Random Forest | ✅ | ✅ | 📓 |
| Neural Networks (MLP) | ✅ | ✅ | 📓 |
| Softmax Regression | ✅ | ✅ | 📓 |
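As an example of the from-scratch style used throughout, here is a compact k-nearest-neighbors classifier in pure NumPy. The function name `knn_predict` is illustrative, not the repo's actual API.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Majority-vote k-nearest-neighbors classifier (Euclidean distance)."""
    # Pairwise distances, shape (n_test, n_train)
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    # Indices of the k closest training points for each test point
    nearest = np.argsort(d, axis=1)[:, :k]
    votes = y_train[nearest]
    # Most common label per row
    return np.array([np.bincount(v).argmax() for v in votes])

X_train = np.array([[0, 0], [0, 1], [5, 5], [5, 6]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([[0.2, 0.1], [5.1, 5.2]])))  # [0 1]
```

The corresponding notebook compares this against `sklearn.neighbors.KNeighborsClassifier` on real datasets.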
🌳 Ensemble Methods (5)
| Algorithm | From Scratch | Scikit-Learn | Notebook |
|---|---|---|---|
| Random Forest | ✅ | ✅ | 📓 |
| Bagging | ✅ | ✅ | 📓 |
| AdaBoost | ✅ | ✅ | 📓 |
| Gradient Boosting | ✅ | ✅ | 📓 |
| XGBoost | ✅ | ✅ | 📓 |
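The core idea behind the ensemble notebooks is bootstrap aggregating: train each base learner on a resampled copy of the data, then vote. A hedged sketch (using scikit-learn trees as base learners; the loop itself is illustrative, not the repo's code):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

# Bootstrap aggregating: each tree sees a resampled copy of the data
trees = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))   # sample with replacement
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Majority vote across the ensemble
votes = np.stack([t.predict(X) for t in trees])
pred = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print((pred == y).mean())
```

Random Forest adds per-split feature subsampling on top of this; boosting methods instead fit learners sequentially on the residual errors.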
🔍 Clustering Algorithms (3)
| Algorithm | From Scratch | Scikit-Learn | Notebook |
|---|---|---|---|
| K-Means Clustering | ✅ | ✅ | 📓 |
| Hierarchical Clustering | ✅ | ✅ | 📓 |
| DBSCAN | ✅ | ✅ | 📓 |
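The K-Means notebook builds Lloyd's algorithm from scratch: alternate assigning points to their nearest centroid and moving each centroid to the mean of its points. A minimal sketch on synthetic blobs (names and data are illustrative):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid
        labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels

# Two well-separated Gaussian blobs
X = np.vstack([np.random.default_rng(1).normal(0, 0.3, (20, 2)),
               np.random.default_rng(2).normal(5, 0.3, (20, 2))])
centroids, labels = kmeans(X, k=2)
```

Hierarchical clustering and DBSCAN replace this centroid loop with linkage merging and density reachability, respectively.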
📉 Dimensionality Reduction (1)
| Algorithm | From Scratch | Scikit-Learn | Notebook |
|---|---|---|---|
| Principal Component Analysis (PCA) | ✅ | ✅ | 📓 |
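The PCA notebook derives principal components from the eigendecomposition of the covariance matrix. A small self-contained sketch of that idea (the `pca` helper is illustrative, not the repo's API):

```python
import numpy as np

def pca(X, n_components):
    """PCA via eigendecomposition of the covariance matrix."""
    Xc = X - X.mean(axis=0)                    # center the data
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    # Project onto the top components; also report explained-variance ratios
    return Xc @ components, eigvals[order] / eigvals.sum()

rng = np.random.default_rng(0)
# 2-D data whose variance lies almost entirely along one direction
X = rng.normal(size=(200, 1)) @ np.array([[3.0, 1.0]]) + rng.normal(scale=0.1, size=(200, 2))
Z, ratio = pca(X, n_components=1)
```

The first component should capture nearly all the variance here, which is what `sklearn.decomposition.PCA` reports as `explained_variance_ratio_`.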
⚡ Optimization Algorithms (3)
| Algorithm | From Scratch | Notebook |
|---|---|---|
| Batch Gradient Descent | ✅ | 📓 |
| Mini-Batch Gradient Descent | ✅ | 📓 |
| Stochastic Gradient Descent | ✅ | 📓 |
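All three variants share one update rule and differ only in how many samples feed each gradient step. A hedged sketch on a toy linear-regression problem (variable names are illustrative): with `batch_size = len(X)` this is batch GD, with `batch_size = 1` it is stochastic GD, and anything in between is mini-batch GD.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(200), rng.normal(size=(200, 1))]   # bias column + one feature
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.05, size=200)

def gradient_step(w, Xb, yb, lr):
    # MSE gradient on the (mini-)batch: (2/n) X'(Xw - y)
    return w - lr * 2 / len(yb) * Xb.T @ (Xb @ w - yb)

w = np.zeros(2)
lr, batch_size = 0.1, 32
for epoch in range(200):
    idx = rng.permutation(len(X))                    # shuffle each epoch
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]
        w = gradient_step(w, X[b], y[b], lr)

print(w.round(1))   # close to the true weights [1., 2.]
```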
| Category | Count | Implementation Status |
|---|---|---|
| Regression | 5 | ✅ Complete |
| Classification | 8 | ✅ Complete |
| Ensemble Methods | 5 | ✅ Complete |
| Clustering | 3 | ✅ Complete |
| Dimensionality Reduction | 1 | ✅ Complete |
| Optimization | 3 | ✅ Complete |
| TOTAL | 25 | ✅ Complete |
All datasets are curated, cleaned, and ready to use in the /DataSets folder.
| Dataset | Size | Features | Use Case | Domain |
|---|---|---|---|---|
| `iris.csv` | 150 | 4 | Multi-class Classification | Botany |
| `heart.csv` | 303 | 13 | Binary Classification | Healthcare |
| `Social_Network_Ads.csv` | 400 | 4 | Marketing Classification | Business |
| `ipl-matches.csv` | 756 | 18 | Regression & Analysis | Sports |
| `zomato.csv` | 9551 | 21 | Clustering | Food Industry |
| `student_clustering.csv` | 2000 | 7 | Clustering | Education |
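A typical first cell in any notebook loads a dataset and makes a stratified train/test split. The sketch below uses scikit-learn's built-in iris data as a stand-in for `DataSets/iris.csv` so it runs anywhere; inside the repo you would use `pd.read_csv("DataSets/iris.csv")` instead.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Stand-in for pd.read_csv("DataSets/iris.csv") — same 150 x 4 data
df = load_iris(as_frame=True).frame

X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="target"), df["target"],
    test_size=0.2, stratify=df["target"], random_state=42)

print(X_train.shape, X_test.shape)   # (120, 4) (30, 4)
```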
Decision boundaries, loss curves, clustering plots, and feature importance visualizations are included in each notebook.
- Week 1-2: Linear & Logistic Regression
  - Start with `LinearRegression/`
  - Move to `LogisticRegression/`
  - Understand cost functions and gradient descent
- Week 3: Classification Basics
  - Explore `KNN/`
  - Study `NaiveBayes/`
  - Practice on the iris dataset
- Week 4: Tree-Based Methods
  - Learn `DecisionTrees/`
  - Build intuition with visualizations
- Week 5-6: Advanced Classification
  - Master `SupportVectorMachines/` and understand kernel tricks
  - Implement `NeuralNetworks/`
- Week 7: Ensemble Methods
  - Study `RandomForest/`
  - Compare with `Bagging/`
  - Understand bootstrap aggregating
- Week 8: Clustering
  - Implement `K-Means-clustering/`
  - Explore `HierarchicalClustering/`
  - Try `DBSCAN/`
- Week 9-10: Boosting Algorithms
  - Deep dive into `AdaBoost/`
  - Master `GradientBoosting/`
  - Optimize with `XGBoost/`
- Week 11: Dimensionality Reduction
  - Understand `PCA/`
  - Apply it to high-dimensional data
- Week 12: Optimization Techniques
  - Compare gradient descent variants
  - Implement custom optimizers
  - Tune hyperparameters
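For the hyperparameter-tuning step, a cross-validated grid search is the standard tool. A short sketch using scikit-learn (the parameter grid here is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 5-fold cross-validated search over k for a KNN classifier
grid = GridSearchCV(KNeighborsClassifier(),
                    param_grid={"n_neighbors": [1, 3, 5, 7, 9]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

The same pattern applies to any estimator: swap in a different model and parameter grid, and `GridSearchCV` handles the cross-validation bookkeeping.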
After completing this repository, you will:
- ✅ Understand the mathematical foundations of ML algorithms
- ✅ Implement algorithms from scratch using NumPy
- ✅ Debug and optimize ML code effectively
- ✅ Compare custom implementations with Scikit-Learn
- ✅ Visualize model behavior and decision boundaries
- ✅ Apply algorithms to real-world datasets
- ✅ Choose the right algorithm for specific problems
- ✅ Tune hyperparameters for optimal performance
Contributions are always welcome! Here's how you can help:
- 🐛 Report bugs and issues
- 💡 Suggest new algorithms to implement
- 📝 Improve documentation
- 🎨 Add visualizations
- 📊 Contribute new datasets
- ✨ Optimize existing code
- 🧪 Add unit tests
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Please read CONTRIBUTING.md for detailed guidelines.
🎓 B.Tech in Computer Science (AI) | India 🇮🇳
💼 Machine Learning, AI Education & Open Source
"Learn the math. Build the code. Train the mind." 🧠
This project is licensed under the MIT License - see the LICENSE file for details.
MIT License
Copyright (c) 2024 Nitin Pratap Singh
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files...
If you find this repository helpful, please consider:
| Action | Why? |
|---|---|
| ⭐ Star this repository | Show appreciation & help others discover it |
| 🍴 Fork it | Create your own version & experiment |
| 👀 Watch | Get notified of updates |
| 💬 Share | Help the ML community learn |
| 🐛 Report Issues | Help improve the project |
| 🤝 Contribute | Make it even better |
Special thanks to:
- Andrew Ng for his legendary Machine Learning course that inspired this project
- CampusX for their exceptional ML tutorials and educational content
- Scikit-Learn team for the amazing library
- NumPy contributors for the numerical computing foundation
- The open-source community for inspiration
- You for taking the time to explore this repository!
- Pattern Recognition and Machine Learning by Christopher Bishop
- The Elements of Statistical Learning by Hastie, Tibshirani, Friedman
- Machine Learning Yearning by Andrew Ng
Machine Learning is not just about using libraries — it's about understanding the principles that make those libraries work. This repository bridges the gap between theory and practice, empowering you to not just use ML, but to truly understand it.