Student Performance Prediction | End-to-End MLOps Project

Introduction

This project predicts student performance from academic and personal attributes such as study hours, parental education level, and prior grades. It trains several regression algorithms, evaluates each with the R2 score, automatically selects the best-performing model, and deploys it as a live cloud application.

The project covers the complete machine learning lifecycle: data preprocessing, model training, evaluation, artifact generation, containerization, and automated cloud deployment. Every stage, from preprocessing to deployment on AWS EC2, is automated with Docker and a GitHub Actions CI/CD pipeline, and the selected model runs in a production-ready Docker container hosted on AWS.

Key Features

Automated Model Training Pipeline
Trains multiple regression algorithms
Evaluates models using R2 score
Selects best performing model automatically
Stores the trained model as artifacts/model.pkl

Containerized Application
Dockerized ML application
Reproducible environment
Lightweight and portable setup

CI/CD with GitHub Actions
Automatic build on code push
Image pushed to Amazon ECR
Continuous deployment to EC2

Cloud Deployment
Hosted on AWS EC2
Docker container running production server
Secure authentication with IAM

Architecture Overview

Developer pushes code to GitHub
Continuous Integration workflow runs tests
Docker image is built
Image pushed to Amazon Elastic Container Registry
Self-hosted EC2 runner pulls the latest image
Container restarts with updated model
Application becomes live automatically

This ensures zero manual deployment steps.

Technology Stack

Python
scikit-learn
Docker
GitHub Actions
Amazon EC2
Amazon ECR
AWS IAM

Model Training Process

Several regression models are trained, including Linear Regression, Decision Tree, Random Forest, Gradient Boosting, and XGBoost.

Each model is evaluated using R2 score.
The highest-scoring model is automatically selected and saved.
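
As a rough illustration of the evaluate-and-select step, the sketch below fits each candidate, scores it with R2 on held-out data, and keeps the highest scorer. It is not the project's actual training code: MeanModel, LineModel, and select_best_model are hypothetical stand-ins written in plain Python so the example runs without extra dependencies, and the held-out split is an assumption. Any estimator that exposes fit and predict, as scikit-learn regressors do, could be passed in the same way.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


class MeanModel:
    """Hypothetical baseline: always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class LineModel:
    """Hypothetical one-feature least-squares line."""

    def fit(self, X, y):
        xs = [row[0] for row in X]
        x_bar, y_bar = mean(xs), mean(y)
        sxx = sum((x - x_bar) ** 2 for x in xs)
        sxy = sum((x - x_bar) * (t - y_bar) for x, t in zip(xs, y))
        self.slope_ = sxy / sxx
        self.intercept_ = y_bar - self.slope_ * x_bar
        return self

    def predict(self, X):
        return [self.intercept_ + self.slope_ * row[0] for row in X]


def select_best_model(models, X_train, y_train, X_test, y_test):
    """Fit every candidate, score it on held-out data, return the best one."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    best_name = max(scores, key=scores.get)
    return best_name, models[best_name], scores
```

Calling select_best_model({"mean": MeanModel(), "line": LineModel()}, X_train, y_train, X_test, y_test) returns the winning name, the fitted model, and the full score table, which is handy for logging why a model was chosen.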

The final trained model is stored in the artifacts folder and packaged into the Docker image for production inference.
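
A minimal sketch of how such an artifact could be written and read back, assuming the model is serialized with Python's pickle module (the .pkl extension suggests this, but the project's actual serializer is not shown here). The helper names save_model and load_model are hypothetical.

```python
import pickle
from pathlib import Path


def save_model(model, path="artifacts/model.pkl"):
    """Serialize a trained model, creating the artifacts folder if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(model, f)
    return path


def load_model(path="artifacts/model.pkl"):
    """Load a previously saved model for inference."""
    with Path(path).open("rb") as f:
        return pickle.load(f)
```

Two caveats apply to any pickle-based artifact: only load files from a trusted source, since unpickling can execute arbitrary code, and keep the library versions in the Docker image the same as those used for training.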

Deployment Pipeline

Every push to the main branch triggers the following steps:

Code checkout
Unit testing
Docker image build
Authentication with AWS
Image push to ECR
EC2 pulls latest image
Container restarted automatically

This ensures reliable and repeatable deployments.
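
To make the last few steps concrete, here is a hedged sketch of the shell commands a runner on the EC2 host might execute to log in to ECR, pull the new image, and restart the container. It only builds the command strings and does not run them. The function name deploy_commands, the container name, the port, the region, and the registry address used in the usage note are all placeholders, not values from this project.

```python
def deploy_commands(registry, repository, tag="latest", region="us-east-1",
                    container="student-app", port=8080):
    """Return the shell commands for one deployment, in order."""
    image = f"{registry}/{repository}:{tag}"
    return [
        # Authenticate Docker against the ECR registry.
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}",
        # Fetch the image that the CI job just pushed.
        f"docker pull {image}",
        # Remove the old container if one exists.
        f"docker rm -f {container} || true",
        # Start the updated container in the background.
        f"docker run -d --name {container} -p {port}:{port} {image}",
    ]
```

For example, deploy_commands("123456789012.dkr.ecr.us-east-1.amazonaws.com", "student-performance") yields four commands that could be placed in a workflow step on the self-hosted runner.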

Why This Project Matters

Demonstrates real world MLOps implementation
Bridges the gap between Data Science and Cloud Engineering
Automates the deployment workflow
Production-ready architecture
Scalable cloud infrastructure

This project reflects strong skills in Machine Learning Engineering and DevOps practices.

Use Case

Beyond student performance prediction, this system can be extended to deploy:

Sales forecasting
Risk assessment models
Demand prediction systems

Any regression-based ML system can be plugged into the same pipeline.
