Wine Quality Prediction

Introduction

Wine Quality Prediction Using Machine Learning

This project predicts the quality of wine from physicochemical properties such as acidity, alcohol percentage, pH level, density, and sulphate content. The goal is to understand which chemical factors influence wine quality and to build an accurate machine learning model using real-world data.

Wine quality datasets are widely used in the food industry and in research to ensure consistency, improve production processes, and evaluate wine without relying only on human tasters.

Project Overview

The dataset contains numerical features and a target variable called quality, which is typically rated between 0 and 10.
The notebook demonstrates a complete workflow, including exploratory analysis, feature correlation, data preprocessing, model building, and performance evaluation.

Libraries Used

The notebook uses the following libraries, exactly as imported in the code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

Data Preprocessing

The dataset was checked for missing values and all columns were confirmed to be numeric.
Since the data did not require categorical encoding, the preprocessing focused on these steps:

  • Verified feature types and structure
  • Selected quality as the target variable
  • Standardized the data where an algorithm needs it (tree-based models such as Random Forest do not depend on feature scaling)
  • Ensured all values were clean and ready for analysis
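The checks listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the small DataFrame is made up to stand in for the real dataset, and the variable names are assumptions.

```python
import pandas as pd

# Tiny made-up frame standing in for the wine data (illustrative values only).
df = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.5, 11.2],
    "volatile acidity": [0.70, 0.88, 0.28, 0.36],
    "quality": [5, 5, 6, 7],
})

# Count missing values per column.
missing_per_column = df.isnull().sum()

# Confirm every column is numeric, so no categorical encoding is required.
all_numeric = all(pd.api.types.is_numeric_dtype(dtype) for dtype in df.dtypes)

# Separate the features from the target variable.
X = df.drop(columns="quality")
y = df["quality"]

print(missing_per_column)
print("All columns numeric:", all_numeric)
```

With the real data, the DataFrame would be loaded from the dataset file instead of being built by hand.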

Exploratory Data Analysis

Visualizations were used to understand patterns between wine features and quality.

Key observations include:

  • Higher alcohol content is generally associated with higher wine quality
  • Increased volatile acidity is associated with lower quality
  • Sulphates show a mild positive influence
  • Citric acid also contributes slightly to improved scores

Bar charts and distribution plots helped reveal how each feature is distributed and how it relates to the final rating.
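As a rough sketch of the kind of summary a bar chart would display, the mean of a feature can be computed per quality score. The values below are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Made-up sample rows (illustrative only, not real measurements).
df = pd.DataFrame({
    "quality": [5, 5, 6, 6, 7, 7],
    "alcohol": [9.4, 9.8, 10.2, 10.6, 11.4, 11.8],
})

# Average alcohol content for each quality score.
mean_alcohol_by_quality = df.groupby("quality")["alcohol"].mean()
print(mean_alcohol_by_quality)

# To draw the bar chart (requires matplotlib):
# mean_alcohol_by_quality.plot(kind="bar")
```

The same groupby pattern applies to any other feature column, such as volatile acidity or sulphates.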

Correlation Insights

A heatmap was plotted to reveal how strongly each feature relates to the quality score.
Important findings from the correlation analysis include:

  • Alcohol has the strongest positive correlation
  • Volatile acidity has a strong negative correlation
  • Sulphates and citric acid show moderate positive effects
  • Density shows a weak negative correlation
  • pH, residual sugar, and chlorides have minimal impact

These insights helped guide feature selection and understanding of wine chemistry.
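A correlation ranking of this kind can be sketched as below. The data is synthetic and deliberately constructed so that alcohol raises the score and volatile acidity lowers it, so the signs mirror the findings above by construction and are not evidence for them. All coefficients and column values are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; the relationships are built in on purpose.
rng = np.random.default_rng(0)
n = 500
alcohol = rng.normal(10.5, 1.0, n)
volatile_acidity = rng.normal(0.5, 0.15, n)
ph = rng.normal(3.3, 0.15, n)  # generated independently of quality
quality = (
    5.5
    + 0.8 * (alcohol - 10.5)
    - 3.0 * (volatile_acidity - 0.5)
    + rng.normal(0.0, 0.5, n)
).round()

df = pd.DataFrame({
    "alcohol": alcohol,
    "volatile acidity": volatile_acidity,
    "pH": ph,
    "quality": quality,
})

# Pearson correlation matrix, then each feature's correlation with quality.
corr = df.corr()
corr_with_quality = corr["quality"].drop("quality").sort_values(ascending=False)
print(corr_with_quality)

# To draw the heatmap (requires seaborn and matplotlib):
# import seaborn as sns
# sns.heatmap(corr, annot=True, cmap="coolwarm")
```

Sorting the quality column of the matrix gives the same ranking a reader would otherwise pick out of the heatmap by eye.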

Feature Selection

Based on correlation values and overall relevance, the following features were selected as the most impactful predictors of wine quality:

alcohol
volatile acidity
sulphates
citric acid
density
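Restricting the feature matrix to these columns might look like the sketch below. The two-row DataFrame is made up, and the exact column names are assumed to match the labels listed above.

```python
import pandas as pd

# Made-up rows; the column names are assumed to match the dataset's labels.
df = pd.DataFrame({
    "alcohol": [9.4, 11.2],
    "volatile acidity": [0.70, 0.36],
    "sulphates": [0.56, 0.75],
    "citric acid": [0.00, 0.40],
    "density": [0.9978, 0.9950],
    "pH": [3.51, 3.20],
    "quality": [5, 7],
})

# The five predictors named in this section.
selected_features = ["alcohol", "volatile acidity", "sulphates", "citric acid", "density"]

# Keep only the selected predictors; the target stays separate.
X = df[selected_features]
y = df["quality"]
print(X.columns.tolist())
```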

Model Training

This project frames the problem as classification and trains a Random Forest classifier rather than a regression model. The training flow follows these steps:

  • Split data into training and test sets using train_test_split
  • Instantiate RandomForestClassifier and fit on the training data
  • Predict quality labels on the test set using the trained classifier

Example code snippet:

model = RandomForestClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

Random Forest is used because it handles complex nonlinear relationships and works well with feature sets of this type.
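For readers who want to run the training flow end to end, here is a self-contained sketch that includes the split. The data is synthetic, and the 80/20 split and random_state values are assumptions, since the write-up does not state which settings the notebook used.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the wine features and integer quality labels.
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "alcohol": rng.normal(10.5, 1.0, n),
    "volatile acidity": rng.normal(0.5, 0.15, n),
})
score = (
    5.5
    + 0.8 * (X["alcohol"] - 10.5)
    - 3.0 * (X["volatile acidity"] - 0.5)
    + rng.normal(0.0, 0.5, n)
)
y = score.round().clip(3, 8).astype(int)

# Hold out 20% of the rows for testing (assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the classifier on the training rows, then predict the held-out rows.
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print(predictions[:10])
```

Fixing random_state in both the split and the model makes repeated runs comparable.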

Model Evaluation

Model performance is evaluated using accuracy, which is appropriate for classification tasks and provides a clear measure of correct label prediction. Additional evaluation steps can include precision, recall, and F1 score to account for class imbalance.

  • accuracy_score was used to compute the overall prediction accuracy

The Random Forest classifier achieved strong classification accuracy compared to a simple baseline.
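One way to make a baseline comparison concrete is sketched below. The write-up does not say which baseline was used, so the majority-class DummyClassifier here is an assumption, and the data is synthetic. The numbers this prints say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (illustrative only).
rng = np.random.default_rng(1)
n = 300
X = pd.DataFrame({
    "alcohol": rng.normal(10.5, 1.0, n),
    "volatile acidity": rng.normal(0.5, 0.15, n),
})
score = (
    5.5
    + 0.8 * (X["alcohol"] - 10.5)
    - 3.0 * (X["volatile acidity"] - 0.5)
    + rng.normal(0.0, 0.5, n)
)
y = score.round().clip(3, 8).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Assumed baseline: always predict the most frequent training label.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_accuracy = accuracy_score(y_test, baseline.predict(X_test))

# Random Forest accuracy on the same held-out rows.
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
predictions = model.predict(X_test)
model_accuracy = accuracy_score(y_test, predictions)

# Per-class precision, recall, and F1; zero_division=0 avoids warnings
# for labels that are never predicted.
report = classification_report(y_test, predictions, zero_division=0)

print(f"Baseline accuracy: {baseline_accuracy:.3f}")
print(f"Random Forest accuracy: {model_accuracy:.3f}")
print(report)
```

The per-class report is useful when a few quality scores dominate the data, because overall accuracy can hide poor performance on the rarer scores.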

Key Results

  • Alcohol content is the most powerful indicator of wine quality
  • Higher volatile acidity tends to reduce quality
  • Sulphates and citric acid show positive influence
  • Random Forest classifier delivered the best predictive performance in this workflow

Conclusion

The Wine Quality Prediction project highlights which chemical characteristics matter most in determining wine quality. The Random Forest classifier proved to be an effective model for predicting quality labels, and the insights gained align with real-world wine chemistry. This project provides a strong foundation for predictive modeling in food science and quality control.

Future Improvements

  • Test advanced classification models such as Gradient Boosting or XGBoost classifiers
  • Apply cross-validation for more reliable scoring
  • Consider converting quality into categorical bins such as good, average, and poor to simplify interpretation
  • Optimize hyperparameters using GridSearchCV or RandomizedSearchCV
  • Deploy the classifier with Flask or Streamlit for real-time use
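
Three of these ideas (binning, cross-validation, and grid search) could be combined roughly as follows. This is a sketch on synthetic data: the bin edges, fold count, and parameter grid are all assumptions chosen for illustration and are not tuned recommendations.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic stand-in data (illustrative only).
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "alcohol": rng.normal(10.5, 1.0, n),
    "volatile acidity": rng.normal(0.5, 0.15, n),
})
score = (
    5.5
    + 0.8 * (X["alcohol"] - 10.5)
    - 3.0 * (X["volatile acidity"] - 0.5)
    + rng.normal(0.0, 0.5, n)
)

# Convert the numeric score into three bins (assumed cut points).
y = pd.cut(
    score,
    bins=[-np.inf, 5.0, 6.0, np.inf],
    labels=["poor", "average", "good"],
).astype(str)

# 3-fold cross-validated accuracy with default hyperparameters.
cv_scores = cross_val_score(
    RandomForestClassifier(random_state=0), X, y, cv=3, scoring="accuracy"
)
print("CV accuracy per fold:", cv_scores)

# Small grid search over two hyperparameters (assumed grid).
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)
print("Best parameters:", search.best_params_)
```

RandomizedSearchCV takes a similar setup and samples from the parameter space instead of trying every combination, which helps when the grid grows large.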