4 Easy to Understand Machine Learning Methods

Data analysis in the 21st century increasingly relies upon the field of “machine learning.” It has had incredible success in a wide range of applications, especially in business and finance. The field is also an area of computer science that is full of mathematics and hard-to-decipher lingo. If you’re not sure what those machine learning gurus are talking about, the following is an excellent place to start.

MACHINE LEARNING?

Machine learning is a subfield of computer science focused on designing algorithms that can “learn” from data. The discipline is closely intertwined with the field of artificial intelligence.

There are two main kinds of machine learning: unsupervised learning and supervised learning. Both are useful in data analysis.

Unsupervised learning involves an algorithm looking at unlabeled data and finding patterns in it, frequently referred to as “features.” Supervised learning, on the other hand, applies when you have input data as well as the correct output labels for that data. The goal of supervised learning is to learn a mapping from input data to output labels, so that the algorithm can label inputs it has never seen. This method is widely used in commercial applications.

Four easy-to-understand (and commonly used) machine learning methods are:

1. Linear Classifiers

A linear classifier is a simple method for classifying data. In essence, it is an algorithm that learns to draw straight lines between groups of data points. All points on one side of a line or set of lines get the same label.

When an algorithm gets a data point it hasn’t seen before, it figures out which side of the various lines the new data point falls on. The algorithm then labels the data point using the name it has learned for that particular region.
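To see how a line can be “learned,” here is a minimal sketch of one classic linear classifier, the perceptron. It is only one of several ways to fit a linear classifier, and the data points, labels (+1 and -1), and function names below are made up for illustration.

```python
def train_perceptron(points, labels, epochs=20):
    """Learn weights w and a bias b so that the sign of (w . x + b) matches each label."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:
                # The point is on the wrong side of the line (or on it),
                # so nudge the line toward classifying it correctly.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b


def predict(w, b, x):
    """Label a point by checking which side of the learned line it falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else -1


# Hypothetical training data: two well-separated groups of 2-D points.
train_points = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (6.0, 5.0), (7.0, 6.5), (6.5, 7.0)]
train_labels = [-1, -1, -1, 1, 1, 1]

w, b = train_perceptron(train_points, train_labels)
print(predict(w, b, (1.2, 1.8)))  # a new point near the first group
print(predict(w, b, (6.8, 6.0)))  # a new point near the second group
```

The pair (w, b) is the learned line. Classifying a new point only requires checking the sign of its score, which is why linear classifiers are fast at prediction time.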

2. Linear Regression

Linear regression is simply an algorithm for predicting a value based on a straight line. It uses the same y = mx + b equation taught in high schools. Give it some training data, let it learn the slope “m” and intercept “b” of the line of best fit, and that’s it! It can then interpolate (or extrapolate) along that straight line to make its best guess, “y,” given a new, unseen data point, “x.”
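As a rough illustration, the line of best fit for a single input variable can be computed with the standard least-squares formulas. The sketch below assumes made-up training data that happens to lie exactly on y = 2x + 1, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Return the slope m and intercept b of the least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def predict_y(m, b, x):
    """Apply y = mx + b to a new data point."""
    return m * x + b


# Hypothetical training data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

m, b = fit_line(xs, ys)
print(m, b)                  # slope and intercept of the fitted line
print(predict_y(m, b, 5.0))  # extrapolating beyond the training data
```

Real data rarely sits exactly on a line. The same formulas still apply, and they return the line that minimizes the total squared vertical distance to the points.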

3. Nearest Neighbors

The “nearest neighbors” method is suitable for classification tasks and works exactly as its name implies.

First, it memorizes all the training data you give it. Then, when you test it with an unseen data point, it looks for the most similar data points in the training set. Whichever class is most common among those similar data points becomes its guess.

The algorithm calculates a “distance” to determine how different two data points are, hence the name “nearest neighbors.” A common choice is the straight-line (Euclidean) distance, which generalizes the Pythagorean Theorem taught in high school to any number of dimensions. Changing the number of nearest neighbors the algorithm consults can alter its performance.
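The whole method fits in a few lines. The sketch below assumes Euclidean distance and a simple majority vote among the k closest training points. The training data, the labels, and the default of k = 3 are illustrative choices only.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance: the Pythagorean Theorem applied across every coordinate."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Label the query point by majority vote among its k nearest training points."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: "memorizing" it just means keeping these lists around.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["small", "small", "small", "large", "large", "large"]

print(knn_predict(train_points, train_labels, (1.5, 1.5)))
print(knn_predict(train_points, train_labels, (6.5, 6.5)))
```

Note that there is no training step at all. All the work happens at prediction time, which can make the method slow on very large training sets.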

4. Decision Trees

Decision trees are another aptly named machine learning method. A decision tree is just an algorithm that learns to classify things by working through a tree of decisions. It uses the same logic as the popular game/toy “20 Questions,” where a person (or computer) has to correctly guess an object using a series of yes/no questions. Tweaking the maximum depth of your decision tree (i.e., how many “questions” it can ask in a row) can help improve accuracy.

To make this more concrete, take the example of sorting apples, oranges, and bananas, and suppose the fruit in question is an orange. The first branching point in the tree could be: is the fruit round? Yes, it is. The tree now knows that the fruit is not a banana. Then the decision tree could ask if it’s red. No, it’s orange in color. Now the tree knows that it’s not an apple, but an orange. It’s that simple.
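Written as code, the fruit example is nothing more than nested yes/no checks. The sketch below hard-codes the two questions by hand to show what a finished tree looks like. A real decision-tree algorithm would pick the questions automatically from training data, and the function and parameter names here are made up.

```python
def classify_fruit(is_round, is_red):
    """Walk a two-question decision tree for apples, oranges, and bananas."""
    # Question 1: is the fruit round? If not, it can only be a banana.
    if not is_round:
        return "banana"
    # Question 2: is it red? Among the round fruits, only the apple is red.
    if is_red:
        return "apple"
    return "orange"


print(classify_fruit(is_round=True, is_red=False))   # the orange from the example
print(classify_fruit(is_round=False, is_red=False))
print(classify_fruit(is_round=True, is_red=True))
```

This tree has a depth of two, because it asks at most two questions before reaching an answer.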

THE FUTURE OF DATA-DRIVEN BUSINESS

Machine learning is becoming more common in business with each passing year. If your company is to stay competitive, it cannot just collect data; it must figure out how to leverage it. Understanding these four machine learning methods will be crucial to helping you ensure an edge over your competition.