Car price estimation is a valuable ML application, widely used by resale valuation platforms, automobile dealerships, and second-hand car marketplaces.
This project predicts the selling price of a car using machine learning models based on features such as fuel type, transmission, present price, and age of the car.
The goal is to understand which factors affect pricing most and to build a model that predicts real-world car value with good accuracy.
Project Overview
- Predicts resale price of used cars using regression
- Two ML models implemented and compared
- Linear Regression and Lasso Regression
- Lasso selected for better regularization and feature control
Libraries Used
- Pandas and NumPy for dataset handling
- Matplotlib and Seaborn for data visualization
- Scikit Learn for model training and evaluation
- R2, MAE, MSE, and RMSE metrics for performance comparison
Dataset and Feature Description
- Dataset contains Fuel Type, Transmission, Seller Type, Present Price, Model Year, etc.
- New feature Car Age created from Year to represent depreciation
- Car Name column removed as it adds little predictive value
- Non-numeric attributes encoded before model input
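The Car Age step above can be sketched as follows. This is a minimal illustration, not the project's exact code: the column names (Car_Name, Year, Present_Price, Selling_Price), the toy rows, and the REFERENCE_YEAR constant are all assumptions made for the example.

```python
import pandas as pd

# Toy rows standing in for the real dataset; names and values are assumed.
cars = pd.DataFrame({
    "Car_Name": ["ritz", "sx4", "ciaz"],
    "Year": [2014, 2013, 2017],
    "Present_Price": [5.59, 9.54, 9.85],
    "Selling_Price": [3.35, 4.75, 7.25],
})

# Assumed reference year; the project may use a different one.
REFERENCE_YEAR = 2024

# Car age in years acts as a direct proxy for depreciation.
cars["Car_Age"] = REFERENCE_YEAR - cars["Year"]

# Drop the free-text name and the now redundant Year column.
cars = cars.drop(columns=["Car_Name", "Year"])
print(cars)
```

Using an age instead of a raw year keeps the feature meaningful to a linear model: a larger value always means an older car.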
Data Preprocessing
- Checked null values and removed duplicates
- Converted categorical columns through encoding
- Dropped Year after computing Car Age
- Features (X) and target (Y) separated, then split into training and test sets
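A rough sketch of these preprocessing steps is shown below. The toy data, the column names, one-hot encoding via pd.get_dummies, and the 80/20 split with random_state=42 are assumptions for illustration; the project may encode categories or split the data differently.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data with assumed column names; the last row duplicates the first.
cars = pd.DataFrame({
    "Present_Price": [5.59, 9.54, 9.85, 4.15, 6.87, 5.59],
    "Car_Age": [10, 11, 7, 13, 6, 10],
    "Fuel_Type": ["Petrol", "Diesel", "Petrol", "Petrol", "Diesel", "Petrol"],
    "Seller_Type": ["Dealer", "Dealer", "Dealer", "Individual", "Dealer", "Dealer"],
    "Transmission": ["Manual", "Manual", "Manual", "Manual", "Automatic", "Manual"],
    "Selling_Price": [3.35, 4.75, 7.25, 2.85, 4.60, 3.35],
})

# Count missing values per column, then remove exact duplicate rows.
print(cars.isnull().sum())
cars = cars.drop_duplicates()

# One-hot encode the categorical columns (assumed encoding scheme).
encoded = pd.get_dummies(
    cars,
    columns=["Fuel_Type", "Seller_Type", "Transmission"],
    drop_first=True,
)

# Separate the features from the target column.
x = encoded.drop(columns="Selling_Price")
y = encoded["Selling_Price"]

# Hold out 20% of the rows for testing (assumed ratio and seed).
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42
)
print(x_train.shape, x_test.shape)
```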
EDA Observations
- Older cars lose value more quickly
- Diesel and automatic-transmission cars are often priced higher
- Present Price has the strongest correlation with Selling Price
- Heatmap visualization confirms feature influence
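One way to check the correlation observation is to rank features by their correlation with the target before drawing the heatmap. The sketch below uses made-up values and assumed column names; plot_heatmap is a hypothetical helper, and its colour settings are illustrative choices.

```python
import pandas as pd

# Made-up numeric rows with assumed column names.
cars = pd.DataFrame({
    "Present_Price": [5.59, 9.54, 9.85, 4.15, 6.87],
    "Car_Age": [10, 11, 7, 13, 6],
    "Selling_Price": [3.35, 4.75, 7.25, 2.85, 4.60],
})

# Pairwise Pearson correlation between all numeric columns.
corr = cars.corr()

# Rank features by the absolute strength of their correlation with the target.
target_corr = (
    corr["Selling_Price"]
    .drop("Selling_Price")
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(target_corr)


def plot_heatmap(corr_matrix):
    """Hypothetical helper: draw the correlation matrix as an annotated heatmap."""
    # Imported here so the ranking above runs without the plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
    plt.tight_layout()
    plt.show()
```

Calling plot_heatmap(corr) renders the same matrix visually; the sorted series is easier to read when only the relationship to Selling Price matters.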
Feature Selection
Most impactful factors selected for model training:
- Present Price
- Car Age
- Fuel Type
- Transmission
- Seller Type
Machine Learning Models Used
1: Linear Regression
- Acts as baseline model for price prediction
- Simple, fast, and interpretable
- Works well only when the relationship between features and price is approximately linear
from sklearn.linear_model import LinearRegression

lin_reg_model = LinearRegression()
lin_reg_model.fit(x_train, y_train)
lin_predictions = lin_reg_model.predict(x_test)
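To illustrate what "interpretable" means here, the sketch below fits a linear model on synthetic data where the true relationship is known, then reads the learned coefficients back. The data, the feature names, and the coefficients (0.6, -0.2, intercept 2.0) are invented for the example and say nothing about the real dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic features; ranges are arbitrary illustrative choices.
rng = np.random.default_rng(0)
present_price = rng.uniform(2, 15, size=50)
car_age = rng.integers(1, 15, size=50)

# Known, exactly linear relationship (invented for this example).
selling_price = 0.6 * present_price - 0.2 * car_age + 2.0

X = np.column_stack([present_price, car_age])
model = LinearRegression().fit(X, selling_price)

# Each coefficient is the change in predicted price per unit change
# in that feature, holding the other feature fixed.
for name, coef in zip(["Present_Price", "Car_Age"], model.coef_):
    print(f"{name}: {coef:+.3f}")
print(f"intercept: {model.intercept_:.3f}")
```

On real data the coefficients will not be recovered this cleanly, but they are read the same way: a negative weight on Car Age means each extra year lowers the predicted price.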
2: Lasso Regression
- Linear model with L1 regularization
- Shrinks weak coefficients toward zero, making the model more stable
- Controls overfitting and improves generalization
from sklearn.linear_model import Lasso

lasso_reg_model = Lasso()
lasso_reg_model.fit(x_train, y_train)
lasso_predictions = lasso_reg_model.predict(x_test)
-> Lasso performed better than plain Linear Regression, as its L1 penalty dampens noisy coefficients and limits overfitting.
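The shrinkage effect can be seen in a small experiment. This is a sketch on synthetic data, not the project's results: only the first two of five features carry signal, and alpha=0.3 is an illustrative penalty strength (the Lasso() call above uses scikit-learn's default of alpha=1.0).

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Synthetic data: features 0 and 1 are informative, features 2-4 are pure noise.
rng = np.random.default_rng(42)
n_samples = 200
X = rng.normal(size=(n_samples, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n_samples)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.3).fit(X, y)  # alpha chosen for illustration only

# Plain least squares assigns small nonzero weights to the noise features;
# the L1 penalty pushes them to exactly zero and shrinks the rest.
print("OLS coefficients:  ", np.round(ols.coef_, 3))
print("Lasso coefficients:", np.round(lasso.coef_, 3))
print("Features dropped by Lasso:", int(np.sum(lasso.coef_ == 0)))
```

A larger alpha removes more features but also biases the remaining coefficients, so in practice the value is usually tuned, for example with cross-validation.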
Evaluation Results
Metrics for both models:
- R2 Score
- Mean Absolute Error
- Mean Squared Error
- Root Mean Squared Error
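The four metrics listed above can be computed with scikit-learn as sketched below. The regression_report helper and the sample values are hypothetical and only show the mechanics; they are not the project's results.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred):
    """Hypothetical helper: return R2, MAE, MSE, and RMSE for one model."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "R2": r2_score(y_true, y_pred),
        "MAE": mean_absolute_error(y_true, y_pred),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),  # RMSE is the square root of MSE
    }


# Made-up actual and predicted prices.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 8.0]

report = regression_report(y_true, y_pred)
for metric, value in report.items():
    print(f"{metric}: {value:.4f}")
```

Calling the same helper on lin_predictions and lasso_predictions against y_test would give a like-for-like comparison of the two models.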
Outcome Summary
- Linear Regression gives good baseline accuracy
- Lasso Regression performs better by constraining coefficients and reducing prediction error
- Final selected model: Lasso Regression
Final Insights
- Present Price and Car Age are the most influential factors for pricing trends
- Lasso is more stable and more predictive than Linear Regression
- ML can be used effectively to estimate used car prices
Future Enhancements
- Try advanced models like Random Forest, XGBoost, and Gradient Boosting
- Add features like location, service history, ownership count, and accident record
- Deploy the model using Flask or Streamlit for real-time price prediction