This project predicts student performance from academic and personal attributes such as study hours, parental education level, and prior grades. It trains several regression algorithms, scores each one with R² (the coefficient of determination), automatically selects the best-performing model, and deploys it as a live cloud application.
The pipeline covers the complete machine learning lifecycle: data preprocessing, model training, evaluation, artifact generation, containerization, and automated cloud deployment.
The selected model runs in a production-ready Docker environment hosted on AWS EC2. Every stage from build to deployment is automated with Docker and a GitHub Actions CI/CD pipeline.
Key Features
Automated Model Training Pipeline
Trains multiple regression algorithms
Evaluates models using R² score
Selects the best-performing model automatically
Stores the trained model as an artifact (model.pkl)
Containerized Application
Dockerized ML application
Reproducible environment
Lightweight and portable setup
CI/CD with GitHub Actions
Automatic build on code push
Image pushed to Amazon ECR
Continuous deployment to EC2
Cloud Deployment
Hosted on AWS EC2
Docker container running production server
Secure authentication with IAM
Architecture Overview
Developer pushes code to GitHub
Continuous Integration workflow runs tests
Docker image is built
Image pushed to Amazon Elastic Container Registry
Self-hosted GitHub Actions runner on EC2 pulls the latest image
Container restarts with the updated model
Application becomes live automatically
After the push, deployment requires no manual steps.
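The source does not say what the tests in the Continuous Integration step cover. One plausible check, sketched below with the standard library only, is a smoke test confirming that the pickled model loads and returns one finite prediction per input row. `ConstantModel`, `load_model`, and the sample rows are hypothetical stand-ins; a real test would open the project's own model.pkl instead of an in-memory buffer.

```python
import io
import math
import pickle
import unittest


class ConstantModel:
    """Hypothetical stand-in for the trained regressor."""

    def __init__(self, value):
        self.value = value

    def predict(self, rows):
        # One prediction per input row, like a scikit-learn style regressor.
        return [self.value for _ in rows]


def load_model(stream):
    """Deserialize a model from a binary file-like object."""
    return pickle.load(stream)


class ArtifactSmokeTest(unittest.TestCase):
    def setUp(self):
        # Assumption: CI would open the real artifact file here. An in-memory
        # buffer keeps this sketch self-contained.
        buffer = io.BytesIO()
        pickle.dump(ConstantModel(72.5), buffer)
        buffer.seek(0)
        self.model = load_model(buffer)
        self.rows = [[2.0, 1.0], [5.0, 0.0], [8.0, 1.0]]

    def test_one_prediction_per_row(self):
        self.assertEqual(len(self.model.predict(self.rows)), len(self.rows))

    def test_predictions_are_finite(self):
        for prediction in self.model.predict(self.rows):
            self.assertTrue(math.isfinite(prediction))


# Run the suite programmatically so the result can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ArtifactSmokeTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

A failing test stops the workflow before the image build, so a broken artifact never reaches the registry.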
Technology Stack
Python
scikit-learn
XGBoost
Docker
GitHub Actions
Amazon EC2
Amazon ECR
AWS IAM
Model Training Process
Several regression models are trained: Linear Regression, Decision Tree, Random Forest, Gradient Boosting, and XGBoost.
Each model is evaluated using R² score.
The model with the highest R² is selected automatically and saved.
The final trained model is stored in the artifacts folder as model.pkl and packaged into the Docker image for production inference.
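The selection step can be sketched as follows. R² is computed as 1 - SS_res / SS_tot, so a perfect fit scores 1.0 and a model worse than predicting the mean scores below 0. To keep the sketch runnable with the standard library alone, it uses two small stand-in models that expose the same fit/predict interface as scikit-learn estimators; in the project the candidates would be the regressors listed above. The class names, helper names, and toy data are all assumptions made for illustration.

```python
import pickle
import tempfile
from pathlib import Path
from statistics import fmean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


class MeanBaseline:
    """Stand-in model: always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = fmean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class OneFeatureLinear:
    """Stand-in model: least-squares line on the first feature."""

    def fit(self, X, y):
        xs = [row[0] for row in X]
        mean_x, mean_y = fmean(xs), fmean(y)
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (t - mean_y) for x, t in zip(xs, y))
        self.slope_ = sxy / sxx
        self.intercept_ = mean_y - self.slope_ * mean_x
        return self

    def predict(self, X):
        return [self.intercept_ + self.slope_ * row[0] for row in X]


def select_best(candidates, X_train, y_train, X_test, y_test):
    """Fit every candidate; return the name, model, and all R² scores."""
    fitted, scores = {}, {}
    for name, model in candidates.items():
        fitted[name] = model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, fitted[name].predict(X_test))
    best = max(scores, key=scores.get)
    return best, fitted[best], scores


def save_model(model, path):
    """Pickle the model, creating the parent folder if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(model, fh)


# Toy data (made up): score = 40 + 5 * study_hours.
X_train = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y_train = [45.0, 50.0, 55.0, 60.0, 65.0, 70.0]
X_test = [[2.5], [4.5], [7.0]]
y_test = [52.5, 62.5, 75.0]

candidates = {
    "mean_baseline": MeanBaseline(),
    "one_feature_linear": OneFeatureLinear(),
}
best_name, best_model, scores = select_best(
    candidates, X_train, y_train, X_test, y_test
)

# A temporary directory stands in for the project root in this sketch.
with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "artifacts" / "model.pkl"
    save_model(best_model, artifact)
    with artifact.open("rb") as fh:
        restored = pickle.load(fh)
```

Because `select_best` relies only on `fit` and `predict`, swapping in scikit-learn or XGBoost regressors should not require changes to the selection logic.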
Deployment Pipeline
Every push to the main branch triggers the following steps:
Code checkout
Unit testing
Docker image build
Authentication with AWS
Image push to ECR
EC2 runner pulls the latest image
Container restart
Because every release runs the same scripted steps, deployments are reliable and repeatable.
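The last two steps, pulling the new image and restarting the container, can be sketched as a short sequence of Docker CLI calls. The version below builds the commands as lists and prints them by default, so it is safe to run without Docker installed. The image URI, container name, and port numbers are placeholders rather than project values, and the sketch assumes the runner has already authenticated against ECR.

```python
import subprocess

# Hypothetical values for illustration; none of them come from the project.
IMAGE_URI = "123456789012.dkr.ecr.us-east-1.amazonaws.com/student-performance:latest"
CONTAINER_NAME = "student-performance"
HOST_PORT = 8080
CONTAINER_PORT = 8080


def redeploy_commands(image_uri, container_name, host_port, container_port):
    """Build the Docker CLI calls that replace a running container."""
    return [
        # Fetch the image that CI just pushed to ECR.
        ["docker", "pull", image_uri],
        # Remove the previous container, stopping it first if it is running.
        ["docker", "rm", "--force", container_name],
        # Start a fresh container from the new image in the background.
        ["docker", "run", "--detach", "--name", container_name,
         "--publish", f"{host_port}:{container_port}", image_uri],
    ]


def redeploy(commands, dry_run=True):
    """Print the commands, or execute them in order when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
            continue
        # "docker rm" can exit non-zero when no container exists yet (first
        # deployment), so only the pull and run steps are treated as fatal.
        subprocess.run(cmd, check=cmd[1] != "rm")


commands = redeploy_commands(IMAGE_URI, CONTAINER_NAME, HOST_PORT, CONTAINER_PORT)
redeploy(commands, dry_run=True)
```

Keeping command construction separate from execution lets the sequence be unit tested without a Docker daemon.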
Why This Project Matters
Demonstrates a real-world MLOps implementation
Bridges the gap between Data Science and Cloud Engineering
Automates the deployment workflow
Production-ready architecture
Scalable cloud infrastructure
The project demonstrates practical Machine Learning Engineering and DevOps skills.
Use Case
Beyond student performance prediction, the same pipeline can be extended to deploy:
Sales forecasting
Risk assessment models
Demand prediction systems
Any regression-based ML system can be integrated in the same way.