Linear Regression: How Much Is My Car Worth?

Not long ago, a friend asked me, “How could machine learning be useful for the average person?” That question inspired this article! Machine learning does not have to be a mysterious or obscure topic. It can be valuable and practical in everyday life.

Today, I want to use machine learning to value a car. Not just any car. My very own car!

More than one of my mates has asked me to help them find a good car at a good price. So I made them a simple Google Sheets tool that estimates a car’s value. With this tool, you can spot which cars are over- or under-priced.

Now, you could go hard and gather thousands of data points and train a fancy model, but there are time and data constraints! Instead, we need a quick and dirty (but robust) solution. This is where a good old multiple linear regression model comes in handy.

What Is Multiple Linear Regression?

Multiple linear regression is, in essence, a mathematical equation. Think back to school, when you were taught y = mx + c. This formula describes a straight line, where c is the intercept, m is the slope, and x is the input (independent) variable; y is the value of the line at that x.

Multiple linear regression builds on this idea, but it uses several input variables, each with its own coefficient. Essentially, we describe a dependent variable (y) as a linear combination of multiple independent variables.

For example, you could say that a house’s price is a linear function of several variables: the number of bedrooms and bathrooms, the floor and section areas, and a random ‘error’ term.
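Written out, the general form and the house example look roughly like this. The β symbols are standard notation rather than the article’s own: β₀ plays the role of c in y = mx + c, each other β is the coefficient on one independent variable, and ε is the error term.

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$$

$$\text{price} = \beta_0 + \beta_1(\text{bedrooms}) + \beta_2(\text{bathrooms}) + \beta_3(\text{floor area}) + \beta_4(\text{section area}) + \varepsilon$$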

At this stage, the formula is just theoretical. We would need to fit the model to real data to see whether there is, indeed, any significant relationship between these factors and house prices. Once we have fitted the model, we can use it to predict the prices of other houses.
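To make “fitting the model” concrete, here is a minimal sketch of ordinary least squares in plain Python, solving the normal equations with Gaussian elimination. The house data is made up, and the prices are generated exactly from a known formula with no error term, so the fit should recover that formula’s coefficients. It shows the fitting and prediction steps only, not significance testing, and the helper names (`fit_ols`, `predict`, and so on) are hypothetical.

```python
# Minimal ordinary least squares (OLS) in pure Python via the normal equations.
# The house data is made up for illustration; it is not from the article.

def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two list-of-lists matrices."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution from the last row upwards.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_ols(features, targets):
    """Return [intercept, coef_1, ..., coef_p] minimising the sum of squared errors."""
    X = [[1.0] + list(row) for row in features]  # leading 1s column gives the intercept
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(x * y for x, y in zip(col, targets)) for col in Xt]
    return solve(XtX, Xty)  # normal equations: (X^T X) beta = X^T y


def predict(beta, row):
    """Predict a target from a fitted coefficient list and one row of features."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


# Hypothetical houses: (bedrooms, bathrooms), prices in $1000s generated exactly
# from price = 100 + 50 * bedrooms + 30 * bathrooms (no error term).
houses = [(2, 1), (3, 1), (3, 2), (4, 2), (4, 3), (5, 3)]
prices = [100 + 50 * bed + 30 * bath for bed, bath in houses]

beta = fit_ols(houses, prices)
print([round(b, 6) for b in beta])
print(round(predict(beta, (3, 3)), 6))
```

With real listings the error term is not zero, so the recovered coefficients would only approximate the underlying relationship, which is why checking significance matters.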

Pricing That Car

To start off with, I had a set of hypotheses. From experience, I guessed that the number of kms travelled would affect the car’s value. The car’s age and number of previous owners might also affect the price. A few other candidate variables were engine size, transmission type, colour, and whether the car is a four-wheel-drive (4WD).

I then wrote out a full model containing every variable I thought could be a significant predictor of a car’s value.
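As a rough sketch (not necessarily the exact equation used), assuming one term for each variable hypothesised above and treating transmission, colour, and 4WD as indicator (0/1) variables, such a full model might look like this. In practice, colour would need one indicator per colour rather than a single term.

$$\text{price} = \beta_0 + \beta_1(\text{kms}) + \beta_2(\text{age}) + \beta_3(\text{owners}) + \beta_4(\text{engine size}) + \beta_5(\text{transmission}) + \beta_6(\text{colour}) + \beta_7(\text{4WD}) + \varepsilon$$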

This model was only a starting point. I then went through a standard machine learning process to develop it. This is how it went:

  1. I gathered data from live listings of cars like mine and used it to train the model that predicts a car’s value.
  2. I then cleaned the data to remove missing data points, incorrect data types, and other imperfections.
  3. I preprocessed the data so that it carried meaningful information. This included normalising the data for more consistent results and rounding some values to reduce the number of categories (e.g., treating a 1998cc engine as a 2000cc engine).
  4. I did some feature engineering to derive the predictor variables I thought might be significant.
  5. I then used feature selection to choose the best subset of variables. This removed weak predictors and improved the overall model by keeping only the strongest ones.
  6. I fitted the final model and used it to predict my car’s value.
  7. Finally, I ran some robustness tests to make sure that the model was trustworthy.
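Steps 2 to 4 can be sketched in plain Python as follows. Everything here is an assumption for illustration rather than the actual spreadsheet logic: the listings are made up, the field names (`price`, `kms`, `engine_cc`) are hypothetical, engines are bucketed to the nearest 100cc, min-max scaling stands in for “normalising”, and `is_2000cc` is one possible engineered feature.

```python
# Sketch of cleaning (step 2), preprocessing (step 3) and a simple engineered
# feature (step 4). All listings and field names are hypothetical.

raw_listings = [
    {"price": "7,200", "kms": "118,000", "engine_cc": "1998"},
    {"price": "8,900", "kms": "64,500", "engine_cc": "2000"},
    {"price": None, "kms": "150,000", "engine_cc": "1797"},  # missing price
    {"price": "5,400", "kms": "201,300", "engine_cc": "1798"},
]


def to_number(text):
    """Fix the data type: turn strings like '118,000' into floats."""
    return float(text.replace(",", ""))


def clean(listings):
    """Drop listings with missing fields and convert every field to a number."""
    cleaned = []
    for row in listings:
        if any(value in (None, "") for value in row.values()):
            continue  # skip incomplete listings
        cleaned.append({key: to_number(value) for key, value in row.items()})
    return cleaned


def round_engine(cc, step=100):
    """Bucket engine sizes to the nearest `step` cc, e.g. 1998cc -> 2000cc.

    Python's round() rounds exact halves to the nearest even number.
    """
    return int(round(cc / step) * step)


def min_max(values):
    """Rescale values to [0, 1]; assumes the values are not all identical."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


rows = clean(raw_listings)
for row in rows:
    row["engine_cc"] = round_engine(row["engine_cc"])
    # Indicator feature: 1 for a 2000cc engine, 0 otherwise.
    row["is_2000cc"] = 1 if row["engine_cc"] == 2000 else 0

kms_scaled = min_max([row["kms"] for row in rows])
print(rows)
print(kms_scaled)
```

Bucketing before building the indicator is what lets a 1998cc engine and a 2000cc engine land in the same category.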

Significant Predictors

The number of previous owners and the kms on the odometer were highly correlated, so the two variables carried much of the same information. Because of this (without getting into a discussion of multicollinearity), I dropped the number of previous owners, which was not a significant predictor.
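The correlation check can be sketched like this. The listings and the 0.8 cut-off are hypothetical, and the decision described above also rested on the number of owners not being significant, which this sketch does not test.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical listings, not the article's data.
features = {
    "owners": [1, 1, 2, 2, 3, 4],
    "kms": [40_000, 60_000, 110_000, 130_000, 180_000, 230_000],
}

CORRELATION_THRESHOLD = 0.8  # hypothetical cut-off, not taken from the article

r = pearson(features["owners"], features["kms"])
print(f"correlation between owners and kms: {r:.3f}")

if abs(r) > CORRELATION_THRESHOLD:
    # The pair carries overlapping information, so keep kms and drop owners.
    del features["owners"]

print(sorted(features))
```

Keeping kms rather than owners mirrors the choice in the article; a different dataset could justify the opposite choice.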

The final model was simple yet effective. Only two predictors remained: kms on the odometer and engine size. You can see the training data’s relationship between kms and price below:

As the car travels further, its market value goes down on average. With all the wear and tear, this relationship makes sense!

Making a Prediction

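My car has travelled 122,000 km and does not have a 2000cc engine. Measuring distance in units of 100,000 km (so 122,000 km becomes 1.22), coding the engine as 1 for a 2000cc engine and 0 otherwise, and working in thousands of dollars, the estimated value comes to:

$$\hat{y} = 9.5 - 2.43(1.22) + 0.99(0) \approx 6.55$$

That is, about $6,550.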
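The same arithmetic can be written as a small function using the rounded coefficients given above. The constant and function names are hypothetical, and the units (thousands of dollars, distance per 100,000 km) follow the worked calculation.

```python
# Coefficients as printed in the article (rounded), in thousands of dollars.
INTERCEPT = 9.5          # baseline value: $9,500
KMS_COEF = -2.43         # change in value per 100,000 km travelled
ENGINE_2000_COEF = 0.99  # extra value for a 2000cc engine


def estimate_value(kms, has_2000cc_engine):
    """Estimated car value in dollars from the two-predictor model."""
    kms_units = kms / 100_000  # e.g. 122,000 km -> 1.22
    engine = 1 if has_2000cc_engine else 0
    value_thousands = INTERCEPT + KMS_COEF * kms_units + ENGINE_2000_COEF * engine
    return value_thousands * 1000


print(round(estimate_value(122_000, False)))
```

With these rounded coefficients the function returns about $6,535 for my car, close to the article’s estimate of roughly $6,550; the small gap is presumably due to rounding in the printed coefficients.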

Accuracy

Conclusion

We go into detail on how to prepare and fit models like this one in our new machine learning micro-credential, provided by Queenstown Resort College (QRC).

Find out more about our Machine Learning Fundamentals micro-credential here.
