
What’s your worth?

Every time I applied for a job, I inevitably got asked the awkward salary question: “What salary are you expecting?” I am not going to lie: a few years ago, I was not sure how to answer. Now, however, the answer is simple. It’s your market value!

What is your market value? Simply put, it is the going rate employers are willing to pay for people with your skills, experience, and education. Of course, there are other variables that come into the equation, but those are often the first things employers look at. The first impression, if you will.

So, when asking for a salary, you should have a sense of what other employers are paying. That way, the market stays competitive, and no one gets conned into paying more, or accepting less, than the fair market rate.

What does this all have to do with machine learning? Well, we can estimate our market value simply by observing the data. And this is what I did! I took thousands of current job listings and trained a model to predict the fair salary for an input CV. So how did I do it?

Collecting the Data

I gathered a sample of job listings relevant to me, along with their salary ranges.

Employers must enter a salary range when they list a job, so the range can be read as the salary they are willing to offer. The midpoint of that range serves as the training label for the predictive model.

I narrowed the sample down to job categories relevant to me (for example, I know I have no tradie skills, unfortunately!). That left approximately 4,000 job listings.
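A minimal sketch of these two steps (keeping only relevant categories and labelling each listing with the midpoint of its salary range) is below. The record fields (`category`, `salary_min`, `salary_max`), the category names and the example listings are all hypothetical; they are not the scraped data.

```python
# Hypothetical listing records; field names are assumptions for illustration.
listings = [
    {"title": "Data Analyst", "category": "Information & Communication Technology",
     "salary_min": 80_000, "salary_max": 100_000},
    {"title": "Senior Data Scientist", "category": "Science & Technology",
     "salary_min": 130_000, "salary_max": 150_000},
    {"title": "Carpenter", "category": "Trades & Services",
     "salary_min": 60_000, "salary_max": 70_000},
]

# Categories judged relevant to the applicant (hypothetical selection).
RELEVANT_CATEGORIES = {
    "Information & Communication Technology",
    "Science & Technology",
}

def midpoint(listing):
    """Training label: the middle of the advertised salary range."""
    return (listing["salary_min"] + listing["salary_max"]) / 2

def build_training_set(listings, categories=RELEVANT_CATEGORIES):
    """Keep only relevant categories and attach the midpoint label to each row."""
    return [
        {**item, "label": midpoint(item)}
        for item in listings
        if item["category"] in categories
    ]

training = build_training_set(listings)
print([row["label"] for row in training])  # [90000.0, 140000.0]
```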

Feature Engineering

Years of experience, or seniority, should be a strong predictor of salary. Unfortunately, listings have no field for ‘required years of experience’, so I had to infer it from the text of the job title or description.

Seniority

I tried modelling the market value without this variable and, as expected, the model performed worse than one that also takes your seniority as an input.
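Since there is no experience field, one simple way to infer seniority is keyword matching on the title, falling back to the description. The sketch below assumes a hypothetical four-level scheme (junior, mid, senior, executive) and a made-up keyword list; it is not necessarily how the article derived the variable.

```python
import re

# Hypothetical keyword patterns, checked from most to least senior so that a
# title such as "Senior Director" maps to executive rather than senior.
SENIORITY_PATTERNS = [
    ("executive", r"\b(head|director|chief|vp|vice president)\b"),
    ("senior", r"\b(senior|sr|lead|principal)\b"),
    ("junior", r"\b(junior|jr|graduate|entry[- ]level|intern)\b"),
]

def infer_seniority(title, description="", default="mid"):
    """Guess a seniority level from keywords, checking the title before the description."""
    for text in (title, description):
        lowered = text.lower()
        for level, pattern in SENIORITY_PATTERNS:
            if re.search(pattern, lowered):
                return level
    # No keyword found in either field: assume a mid-level role.
    return default

print(infer_seniority("Senior Data Scientist"))               # senior
print(infer_seniority("Data Engineer", "Entry-level role"))   # junior
```

Keyword matching is crude: a description that mentions “senior stakeholders” would tag a mid-level role as senior, which is one reason to give the title priority.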

Job Categories

Below, I have calculated the average salary offer for several top categories relevant to data analytics. On average, Information and communication technology jobs offered higher salaries than the other categories.
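The per-category averages boil down to a group-by on the midpoint label. Here it is in plain Python on a few made-up rows; the category names and salaries are illustrative, not the figures behind the chart.

```python
from collections import defaultdict
from statistics import mean

def average_by_category(rows):
    """Mean midpoint salary per job category, sorted from highest to lowest."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["category"]].append(row["label"])
    averages = {category: mean(values) for category, values in groups.items()}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)

# Made-up rows for illustration only (not the article's data).
rows = [
    {"category": "ICT", "label": 120_000},
    {"category": "ICT", "label": 100_000},
    {"category": "Banking & Financial Services", "label": 95_000},
    {"category": "Government & Defence", "label": 85_000},
    {"category": "Government & Defence", "label": 95_000},
]

for category, avg in average_by_category(rows):
    print(f"{category}: ${avg:,.0f}")
```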

Job Requirements and Desired Skills

“Strong experience with Azure Databricks, R and Python. Proven experience in writing algorithms.”

From this requirement, I extract the following key phrases: [‘experience with Azure Databricks’, ‘R and Python’, ‘writing algorithms’].

Do that over a whole job listing, and you cut down the words to the most critical bits of information.
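The article does not say which extraction method produced those phrases. As a swapped-in illustration, here is a RAKE-style candidate splitter (RAKE's scoring step omitted) that breaks text at punctuation and at a hypothetical list of filler words. It splits more aggressively than the output above, so treat it as a sketch of the idea rather than a reproduction.

```python
import re

# Hypothetical filler-word list: generic words in job ads that carry no skill
# information. This list is an assumption, not taken from the article.
STOPWORDS = {
    "a", "an", "and", "the", "in", "of", "to", "with", "for", "on",
    "strong", "proven", "experience", "excellent", "ability",
}

def candidate_phrases(text, stopwords=STOPWORDS):
    """Split text into phrases at punctuation and stopwords (RAKE-style candidates)."""
    phrases = []
    for fragment in re.split(r"[,.;:!?()\n]", text):
        current = []
        for word in fragment.split():
            if word.lower() in stopwords:
                # A stopword ends the current phrase, if any.
                if current:
                    phrases.append(" ".join(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(" ".join(current))
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(phrases))

text = ("Strong experience with Azure Databricks, R and Python. "
        "Proven experience in writing algorithms.")
print(candidate_phrases(text))
# ['Azure Databricks', 'R', 'Python', 'writing algorithms']
```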

Requirements/Skills Embeddings

In essence, I convert text into vectors like so:

Word vectors in 3D.
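As a toy version of this step, the sketch below represents a listing by the average of its words' vectors. The 3-D vectors are invented for illustration; a real pipeline would load pretrained embeddings with far more dimensions, and the article does not specify which embedding it used.

```python
import numpy as np

# Invented 3-D word vectors, for illustration only. Real pretrained
# embeddings (e.g. word2vec or GloVe) have hundreds of dimensions.
WORD_VECTORS = {
    "python": np.array([0.9, 0.1, 0.0]),
    "r": np.array([0.8, 0.2, 0.1]),
    "azure": np.array([0.1, 0.9, 0.2]),
    "databricks": np.array([0.2, 0.8, 0.3]),
    "writing": np.array([0.0, 0.1, 0.9]),
    "algorithms": np.array([0.5, 0.1, 0.7]),
}

def embed_phrases(phrases, vectors=WORD_VECTORS):
    """Average the vectors of every known word in a listing's skill phrases."""
    words = [w.lower() for phrase in phrases for w in phrase.split()]
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        # No recognised words: fall back to a zero vector of the right size.
        dim = len(next(iter(vectors.values())))
        return np.zeros(dim)
    return np.mean(known, axis=0)

listing_vector = embed_phrases(["Azure Databricks", "R", "Python", "writing algorithms"])
print(listing_vector.round(3))
```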

To distil the information contained in the text vectors, I ran the skills vectors through a dimension reduction algorithm to reduce noise in the data and the number of dimensions for later training of the model. In two dimensions, the listings show some ‘groupings’ of similar jobs.
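The article does not name the dimension reduction algorithm. Principal component analysis (PCA) is one common choice, so here is a minimal NumPy version via SVD, applied to random stand-in vectors rather than real listings.

```python
import numpy as np

def pca_reduce(X, n_components=2):
    """Project the rows of X onto their top principal components (PCA via SVD)."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:n_components].T

# Random stand-in for 100 listings x 50-dimensional skill vectors.
rng = np.random.default_rng(0)
skill_vectors = rng.normal(size=(100, 50))
coords_2d = pca_reduce(skill_vectors, n_components=2)
print(coords_2d.shape)  # (100, 2)
```

Plotting the two columns of `coords_2d` gives the kind of 2-D scatter described above; with random inputs, of course, no meaningful groupings will appear.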

Choosing the Features

Fitting a Model

To start off with, it is wise to train up a simple model like multiple linear regression to gain a better understanding of the data and create a baseline performance. Right off the bat, it was clear that seniority is a significant predictor (unsurprisingly). Job categories are good predictors too.
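A baseline of this kind takes only a few lines. The sketch below fits ordinary least squares with `np.linalg.lstsq` on synthetic data in which seniority (coded 0 to 3) and an ICT-category flag drive salary. All numbers are invented, and coding seniority as a single number assumes equal steps between levels; one-hot columns per level would let each step differ.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Synthetic stand-in data (not the article's): seniority level 0-3 and a
# binary "is ICT category" flag, plus Gaussian noise.
seniority = rng.integers(0, 4, size=n)
is_ict = rng.integers(0, 2, size=n)
salary = 70_000 + 25_000 * seniority + 10_000 * is_ict + rng.normal(0, 8_000, size=n)

# Design matrix with an intercept column, then an ordinary least-squares fit.
X = np.column_stack([np.ones(n), seniority, is_ict])
coef, *_ = np.linalg.lstsq(X, salary, rcond=None)
intercept, per_level, ict_premium = coef
print(f"intercept={intercept:,.0f}  per level={per_level:,.0f}  ICT premium={ict_premium:,.0f}")
```

The fitted coefficients should land close to the values used to generate the data, which is exactly the kind of sanity check a baseline is for.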

With all the features included, the model performed relatively well. Out-of-sample predictions were usually within ±$10,000 of the actual mid-point salary offer (that is, within the typical advertised range, which spans about $20,000). I have plotted the final model's predicted vs. actual test data below:
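The “within ±$10,000” figure can be computed as the share of test predictions whose absolute error is at most $10,000. A small helper, with illustrative numbers only:

```python
import numpy as np

def share_within(y_true, y_pred, tolerance=10_000):
    """Fraction of predictions whose absolute error is at most `tolerance` dollars."""
    errors = np.abs(np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float))
    return float(np.mean(errors <= tolerance))

# Illustrative numbers only.
actual = [90_000, 120_000, 65_000, 160_000]
predicted = [95_000, 108_000, 70_000, 151_000]
print(share_within(actual, predicted))  # 0.75
```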

I then fitted more than ten different regression models to see which one had the best cross-validated performance. LightGBM worked particularly well, so I kept it.
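Comparing candidate models by cross-validated error can be sketched as a k-fold loop over interchangeable fit-and-predict functions. Only a mean baseline and linear regression are included here, on synthetic data; other models (LightGBM among them) would slot into the same `models` dictionary through a wrapper function with the same signature. Using mean absolute error as the score is an assumption.

```python
import numpy as np

def k_fold_mae(fit_predict, X, y, k=5, seed=0):
    """Mean absolute error averaged over k shuffled folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        preds = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        scores.append(np.mean(np.abs(preds - y[test_idx])))
    return float(np.mean(scores))

def mean_baseline(X_train, y_train, X_test):
    """Predict the training mean for every test row."""
    return np.full(len(X_test), y_train.mean())

def linear_regression(X_train, y_train, X_test):
    """Ordinary least squares with an intercept column."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef

# Synthetic data: salary depends linearly on two of three features.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = 100_000 + X @ np.array([20_000, 5_000, 0.0]) + rng.normal(0, 5_000, size=300)

models = {"mean baseline": mean_baseline, "linear regression": linear_regression}
scores = {name: k_fold_mae(fn, X, y) for name, fn in models.items()}
best = min(scores, key=scores.get)
print(scores, "best:", best)
```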

Hyper-parameter Tuning

I split the data into training, validation, and test samples, then used an awesome library called Optuna, an optimisation engine, to automatically search for the set of hyperparameters that maximised the model's performance on the validation sample.
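Optuna automates a search like the one below: you write an objective function that suggests parameters (for example with `trial.suggest_float(..., log=True)`) and returns the validation score, then call `study.optimize`, and its default TPE sampler concentrates on promising regions instead of sampling blindly. To keep this sketch runnable with NumPy alone, it swaps in plain random search over the penalty of a closed-form ridge regression instead of tuning LightGBM, on synthetic data.

```python
import numpy as np

def split_indices(n, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle row indices and cut them into train/validation/test parts."""
    idx = np.random.default_rng(seed).permutation(n)
    n_test, n_val = int(n * test_frac), int(n * val_frac)
    return idx[n_test + n_val:], idx[n_test:n_test + n_val], idx[:n_test]

def fit_ridge(X, y, alpha):
    """Closed-form ridge regression; the intercept column is not penalised."""
    A = np.column_stack([np.ones(len(X)), X])
    penalty = alpha * np.eye(A.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(A.T @ A + penalty, A.T @ y)

def predict(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

# Synthetic data: only the first of ten features matters.
rng = np.random.default_rng(7)
X = rng.normal(size=(400, 10))
y = 100_000 + X[:, 0] * 15_000 + rng.normal(0, 10_000, size=400)
train, val, test = split_indices(len(y))

# Random search: sample alpha on a log scale, keep the best validation score.
best_alpha, best_val = None, np.inf
for alpha in 10 ** rng.uniform(-3, 4, size=30):
    coef = fit_ridge(X[train], y[train], alpha)
    score = mae(y[val], predict(coef, X[val]))
    if score < best_val:
        best_alpha, best_val = alpha, score

# Score the chosen setting once on the untouched test split.
final = fit_ridge(X[train], y[train], best_alpha)
test_mae = mae(y[test], predict(final, X[test]))
print(f"alpha={best_alpha:.3g}  val MAE={best_val:,.0f}  test MAE={test_mae:,.0f}")
```

Keeping the test split out of the search loop is the point of the three-way split: the validation score guides tuning, while the test score stays an honest estimate.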

Once complete, my model was ready to make some predictions!

Testing

Interestingly, there are some salary clumps around $65k, $100k, $130k, $160k and $210k+. I postulate that employers gravitate toward round numbers and compete at those price points.
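One quick way to check for clumps like these is to round each midpoint to the nearest $5,000 and count the most common values. The bin width and the sample midpoints below are illustrative assumptions.

```python
from collections import Counter

def salary_clumps(salaries, bin_width=5_000, top=5):
    """Round each salary to the nearest bin and return the most common bins."""
    binned = [round(s / bin_width) * bin_width for s in salaries]
    return Counter(binned).most_common(top)

# Made-up midpoints for illustration.
midpoints = [64_000, 65_000, 65_000, 66_000, 99_000,
             100_000, 100_000, 115_000, 130_000, 131_000]
print(salary_clumps(midpoints, top=3))  # [(65000, 4), (100000, 3), (130000, 2)]
```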

Predicting

Because seniority was one of the strongest predictors, the predicted salary rises in jumps with each level (as you can see in the plot above). The model suggests that adding skillsets may not add as much to your market value as moving from junior to senior.

Now, becoming a senior or an executive comes with time AND skills, so it would be interesting to rerun the model without the seniority variables. This way, you would be able to see what one’s market value is without specifying the seniority.

Conclusion

As in any machine learning project, I started by collecting relevant, high-quality data. From there, I picked and engineered predictors based on intuition, reduced them to the strongest set, trained several models, and tuned the best one. After that, it was a matter of making relevant predictions.

At Queenstown Resort College (QRC), we are launching our new micro-credential in machine learning in beautiful New Zealand. Come join us and learn the skills you need for your next career steps!

Find out more about our Machine Learning Fundamentals micro-credential here.
