
When entering the machine learning world, there is a LOT of information. There are lots of different courses, techniques, terms, and algorithms. How would one know what to focus on when learning?
That’s what I want to help you with today! So you do not have to go down too many rabbit holes, I present my favourite machine learning algorithms!
1. Linear Regression
This algorithm is easily one of the most widely used in data science and machine learning. It models the response variable as a linear function of one or more predictor variables.
This algorithm is so common that I have yet to have a job where this tool was not useful! With linear regression, we can do all kinds of things. I have used it to help my friends value cars. I have used it to forecast revenues, estimate marketing effectiveness, and much more!
Linear regression is the backbone of finance, business and science. It should definitely be one of the first algorithms you master before getting into other ones.
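To make the idea concrete, here is a minimal sketch of least-squares linear regression with a single predictor, written in plain Python. The car ages and prices are made-up numbers for illustration only, and `fit_line` is a hypothetical helper name, not a library function.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both up to the same constant).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: car age in years vs. price in thousands of dollars.
ages = [1, 2, 3, 4, 5]
prices = [18.0, 16.0, 14.0, 12.0, 10.0]

slope, intercept = fit_line(ages, prices)
predicted_price = slope * 6 + intercept  # estimated price of a 6-year-old car
print(slope, intercept, predicted_price)
```

In practice you would usually reach for a library such as scikit-learn or statsmodels, which also handle several predictors at once.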
2. Logistic Regression
A close runner-up, logistic regression is also quite common. Unlike linear regression, logistic regression passes a linear combination of the predictors through the non-linear logistic function to come up with a probability between 0 and 1 (0% and 100%). For example, below I have plotted a logistic function fitted to predict male versus other gender from weight:
This model is handy for predicting binary (e.g., yes/no, positive/negative) outcomes, such as whether the sentiment of a review is positive or whether a customer will return.
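As a rough sketch of how this works, the snippet below fits a one-feature logistic regression with plain gradient descent. The data (number of previous purchases vs. whether the customer returned) are invented for illustration, and the learning rate and step count are arbitrary choices that happen to suit this small example.

```python
import math


def sigmoid(z):
    """The logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, labels, learning_rate=0.1, steps=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, labels):
            error = sigmoid(w * x + b) - y  # predicted probability minus label
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


# Hypothetical data: previous purchases vs. whether the customer returned (1) or not (0).
purchases = [0, 1, 2, 3, 4, 5, 6, 7]
returned = [0, 0, 1, 0, 1, 1, 1, 1]

w, b = fit_logistic(purchases, returned)
p_low = sigmoid(w * 0 + b)   # probability of returning with no previous purchases
p_high = sigmoid(w * 7 + b)  # probability of returning with seven previous purchases
print(p_low, p_high)
```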
3. K-Means
K-means is another of those quintessential algorithms that I frequently use in my workflow. It is a simple yet effective algorithm for finding ‘clusters’ in data. The algorithm repeatedly assigns each data point to its nearest cluster centre and then recomputes each centre as the mean of its points, until the cluster allocation stops changing.
This tool is so useful on a day-to-day basis. I have used it for grouping customer feedback to work out common ‘themes’ in the comments. I used it to find common job groupings when job hunting! This tool, along with linear regression, should definitely be part of your machine learning Swiss army knife.
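The assign-then-update loop can be sketched in a few lines of plain Python. The points and starting centres below are made up, and starting from two hand-picked points is a simplification; real implementations usually choose starting centres more carefully.

```python
def kmeans(points, centroids, max_iterations=100):
    """Basic k-means: assign points to the nearest centre, then move each centre."""
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centre.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: each centre becomes the mean of its points
        # (an empty cluster keeps its old centre).
        new_centroids = [
            (tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c)
            for cluster, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # nothing moved, so we have converged
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2D data with two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.6), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centres, groups = kmeans(points, centroids=[points[0], points[3]])
print(centres)
print(groups)
```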
4. K Nearest Neighbours (KNN)
This algorithm is another handy tool. KNN measures how ‘close’ one set of variables is to another and makes a prediction for a new data point from its k closest neighbours. The intuition is that similar inputs should produce similar outputs. How the similarity is calculated can differ; however, the principle stays the same.
I have used KNN quite a few times, especially in the data preprocessing stage. It can be used to fill erroneous or missing data points. It is also often used as a baseline algorithm for recommendation systems.
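Here is a minimal sketch of KNN used as a classifier, assuming Euclidean distance as the similarity measure and a simple majority vote among the neighbours. The points and the "budget" / "premium" labels are invented for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical training data: two features per item, with a known label.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["budget", "budget", "budget", "premium", "premium", "premium"]

print(knn_predict(train_points, train_labels, (1.5, 1.5)))
print(knn_predict(train_points, train_labels, (6.5, 6.5)))
```

The same idea carries over to filling in a missing numeric value: instead of voting, you would average that value over the k nearest complete rows.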
5. Decision Trees
Decision trees are the foundation for many other fascinating and practical machine learning tools. They are also handy models that can be easily explained to others. A decision tree is essentially a flow chart describing the sequence of decisions made before settling on the final result. Take a simple house price model, for example:
Decision trees are a must-have in your machine learning tool belt. Few models are as simple and as easy to explain as this one.
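Written as code, a fitted decision tree is just a set of nested if/else checks. The sketch below is a hand-written, hypothetical house price tree; the features, thresholds and prices are all made up to show the shape of the model, not learned from any data.

```python
def estimate_house_price(bedrooms, has_garage, distance_to_city_km):
    """A hand-written decision tree; every threshold and price here is hypothetical."""
    if bedrooms >= 3:
        if has_garage:
            return 850_000
        return 750_000
    # Smaller houses: location decides the estimate.
    if distance_to_city_km <= 5:
        return 600_000
    return 450_000


print(estimate_house_price(bedrooms=4, has_garage=True, distance_to_city_km=12))
print(estimate_house_price(bedrooms=2, has_garage=False, distance_to_city_km=3))
```

A decision tree algorithm learns the questions and thresholds from the data rather than having them written by hand, but the result can be read in exactly this way.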
6. Random Forests
Now, random forests are where machine learning starts to do some heavy lifting! This algorithm is what they call an ensemble learning algorithm. It builds many decision trees, each trained on a random sample of the data and a random subset of the features, and the trees then ‘vote’ on what the answer should be.
As they say, “There is wisdom in the counsel of many.” This algorithm is quite a powerful regression and classification algorithm. If linear and logistic regression are the baselines, this algorithm steps it up.
I have used random forests when pricing property, as it was the best predictor of all the regression models I tried. After learning linear and logistic regression, this algorithm is well worth learning about!
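The voting idea can be sketched with a heavily simplified forest. Instead of full decision trees, the code below uses one-split trees ("stumps"), each trained on a bootstrap sample and one randomly chosen feature, so treat it as an illustration of bagging and voting rather than a real random forest. The data, labels and function names are all hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def train_stump(rows, labels, feature):
    """Find the single threshold on `feature` with the fewest training errors."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        left_label = majority(left)
        right_label = majority(right) if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, feature, threshold, left_label, right_label)
    return best[1:]


def train_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw as many rows as we have, with replacement.
        indices = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in indices]
        sample_labels = [labels[i] for i in indices]
        # A random feature per tree adds further variety between trees.
        feature = rng.randrange(len(rows[0]))
        forest.append(train_stump(sample_rows, sample_labels, feature))
    return forest


def forest_predict(forest, row):
    """Every tree votes; the most common answer wins."""
    votes = [
        left_label if row[feature] <= threshold else right_label
        for feature, threshold, left_label, right_label in forest
    ]
    return majority(votes)


# Hypothetical data: two features per property, labelled cheap or expensive.
rows = [(1, 2), (2, 1), (3, 3), (2, 4), (4, 2),
        (11, 12), (12, 11), (13, 13), (12, 14), (14, 12)]
labels = ["cheap"] * 5 + ["expensive"] * 5

forest = train_forest(rows, labels)
print(forest_predict(forest, (0, 0)), forest_predict(forest, (20, 20)))
```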
7. Boosting Algorithms
These algorithms are extremely common among machine learning enthusiasts, not only because they are accurate but also because they are fast to train. The intuition behind how they work is that they build a sequence of trees, each one fitted to the residuals left by the previous iteration. Boosting algorithms are perfect examples of algorithms that get smarter with every iteration.
Though there are quite a few different implementations, I would recommend XGBoost, which is one of the most widely used. Some others include AdaBoost, Gradient Boosting, and CatBoost.
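The "fit the residuals" intuition can be sketched as a bare-bones gradient boosting loop for regression. This is a simplified illustration using one-split regression trees and squared error, not how XGBoost is implemented; the data, the number of rounds and the learning rate are arbitrary assumptions.

```python
def fit_stump(xs, residuals):
    """One-split regression tree: predict the mean residual on each side of a threshold."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the largest x so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        loss = (sum((r - left_mean) ** 2 for r in left)
                + sum((r - right_mean) ** 2 for r in right))
        if best is None or loss < best[0]:
            best = (loss, threshold, left_mean, right_mean)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=50, learning_rate=0.3):
    base = sum(ys) / len(ys)  # start from the mean of the targets
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # Each round fits a new stump to what the model still gets wrong.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_predict(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate


def boost_predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def mse(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


# Hypothetical non-linear data.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 4, 9, 16, 25, 36, 49, 64]

model = boost(xs, ys)
baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mse(ys, [boost_predict(model, x) for x in xs])
print(baseline_mse, boosted_mse)
```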
8. Principal Component Analysis (PCA)
PCA is easily one of the most common dimension reduction techniques out there. Sometimes the data are so large and high-dimensional that we, mere humans, cannot understand them. They need to be presented in 2 or 3 dimensions for us to make sense of them.
PCA, basically, computes principal components, which are vectors that are linear combinations of the original variables. You can then take the first few principal components to explain a large part of the variation in the data. Below, I have plotted some generated data in 3D and then in 2D after keeping the first two principal components.
As you can see above, the data are reduced down to 2D whilst keeping most of the information. PCA is a critical tool to have in your tool belt!
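For the curious, here is a rough plain-Python sketch of the same idea: take 3D data, find the first two principal components of its covariance matrix, and project onto them. It uses power iteration with deflation, which is just one simple way to find the leading eigenvectors; libraries normally use more robust routines. The generated points lie exactly on a plane by construction, which is an assumption made so that two components capture essentially all of the variation.

```python
import math


def power_iteration(matrix, iterations=500):
    """Approximate the largest eigenvalue and its eigenvector of a symmetric matrix."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(a * b for a, b in zip(v, mv))
    return eigenvalue, v


def pca(rows, n_components=2):
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in rows]
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    total_variance = sum(cov[i][i] for i in range(d))

    components, explained = [], []
    for _ in range(n_components):
        eigenvalue, v = power_iteration(cov)
        components.append(v)
        explained.append(eigenvalue / total_variance)
        # Deflation: remove the component just found so the next pass finds the following one.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]

    # Each projected row holds the point's coordinates along the kept components.
    projected = [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in centred]
    return projected, explained


# Hypothetical generated data: 3D points that lie on a plane.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 1, 0, -1, 0, 1, 0, -1]
points = [(x, y, 0.5 * x + 0.2 * y) for x, y in zip(xs, ys)]

projected, explained = pca(points, n_components=2)
print(explained)  # share of the total variance explained by each component
```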
9. Neural Networks
Neural networks come in many shapes and sizes. There are so many different architectures that I do not have enough time to get into them all. A neural network is a large collection of interconnected nodes, or neurons, that all work together to fit a function. They are highly adaptable and flexible, so you can use them in all kinds of scenarios.
Once you have a grasp of the more common algorithms, these babies are a must! They will meet the challenge in regression, classification, signal processing, and more!
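To show what "interconnected nodes fitting a function" looks like, here is a tiny network with two hidden neurons that computes XOR, a function no single straight line can fit. The weights below are hand-picked for illustration; a real network would learn them from data through training, which this sketch leaves out.

```python
def relu(values):
    """A common activation function: negative values become zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: each node outputs a weighted sum of all inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(node_weights, inputs)) + b
        for node_weights, b in zip(weights, biases)
    ]


# Hand-picked (not learned) weights: 2 inputs -> 2 hidden nodes -> 1 output.
HIDDEN_WEIGHTS = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_BIASES = [0.0, -1.0]
OUTPUT_WEIGHTS = [[1.0, -2.0]]
OUTPUT_BIASES = [0.0]


def xor_network(x1, x2):
    hidden = relu(dense([x1, x2], HIDDEN_WEIGHTS, HIDDEN_BIASES))
    return dense(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)[0]


for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_network(a, b))
```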
10. Support-Vector Machines (SVMs)
Before boosting algorithms such as XGBoost became so widespread, support-vector machines (SVMs) were very popular. This algorithm is used for regression and classification tasks. For classification, it works by separating the classes using a straight line (a hyperplane in higher dimensions) that maximises the margin between the classes.
The model is beautifully simple and effective if the data are linearly separable. Later, the algorithm was improved to include kernels that would transform the input data space. This allows for ‘non-linear’ separations between classes.
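A linear SVM can be sketched as gradient descent on the hinge loss, which penalises points that fall on the wrong side of the margin. This is a simplified illustration (full-batch subgradient descent, no kernels), not how production SVM solvers work; the data, learning rate, regularisation strength and step count are all assumptions chosen for this small example.

```python
def train_linear_svm(points, labels, learning_rate=0.01, regularisation=0.01, steps=5000):
    """Subgradient descent on the regularised hinge loss. Labels must be +1 or -1."""
    dims = len(points[0])
    w = [0.0] * dims
    b = 0.0
    n = len(points)
    for _ in range(steps):
        # The regularisation term keeps the weights small, which widens the margin.
        grad_w = [regularisation * wj for wj in w]
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:  # the point is inside the margin or misclassified
                for j in range(dims):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - learning_rate * gj for wj, gj in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b


def svm_predict(w, b, x):
    """Which side of the separating line does x fall on?"""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable data.
points = [(1, 1), (2, 1), (1, 2), (5, 5), (6, 5), (5, 6)]
labels = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(points, labels)
print(svm_predict(w, b, (0, 0)), svm_predict(w, b, (8, 8)))
```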
Conclusion
Machine learning is a growing field with so many exciting and new algorithms and models. The industry is booming and data analytics skills are in hot demand. If you are looking to get into analytics, machine learning is a key skill set you should have in your tool belt.
Each of these algorithms and models has its strengths and use cases, so it is important to understand which tool suits which problem!
At Queenstown Resort College (QRC), we are launching our new micro-credential in machine learning in beautiful New Zealand. Come join us and learn the skills you need for your next career steps!
Find out more about our Machine Learning Fundamentals micro-credential here.