Top 3 Effective Feature Selection Strategies in Machine Learning

After gathering data, feature selection is one of the most critical steps in the machine learning process. As necessary as it is, many guides and tutorials skip it entirely. Check out Becoming a Machine Learning Engineer | Step 2: Pick a Process to learn more about what a good ML process looks like.

“Garbage Data in, Garbage results out”—  Every ML Engineer.

I’m going to share some excellent methods to perform feature selection, so you can up your Machine Learning game.

What is feature selection? In real-world problems, it is not always clear which features, if any, will help you model the problem you are trying to solve. On top of that, the data you have is often redundant or only loosely relevant.

Feature selection is the field of research concerned with algorithmically picking out the features that matter.

Why not just throw all of your features into a Machine Learning model and call it a day?

Real-world problems rarely come with a clean, open-source data set, and the data you do have does not always contain the information you need. Faced with these realities, feature selection helps you maximize feature relevance while reducing feature redundancy. That raises your chances of building a good model and shrinks the overall model size.

Say we want to predict future water park ticket sales. To do this, we decided to look at weather data, ice cream consumption, coffee consumption, and season.

From the graph, it is easy to see that more tickets are sold during the summer months than at any other time, and that none are sold during the winter. Coffee consumption stays roughly constant over the year, while ice cream is consumed year-round but peaks in June.

Table 1: Made-up data we are using for this example

Graph 1: Graph of made-up data

We want to predict ticket sales, but we likely do not need all of this data to get the best results. Among our N features there is some subset of size K ≤ N that gives the best results. The catch is that the number of possible feature subsets is enormous.

Our goal is to reduce the number of dimensions without losing predictive power. Let’s take a step back and look at the tools available to us for this use.

1. Exhaustive Search

This technique is guaranteed to find the best possible feature subset for a model, because it evaluates every possible combination of features and keeps the one that gives the lowest loss.

In our case, with four features, there are 2^4 − 1 = 15 possible feature combinations to search through (the general formula is 2^n − 1). This works fine for a handful of features, but the count grows exponentially and quickly gets out of hand if you have 3,000 features.
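An exhaustive search can be sketched in a few lines of Python. This is a minimal illustration, assuming NumPy: the data, the per-feature penalty, and the least-squares scoring are all made up for the demo, not part of any library's API.

```python
from itertools import combinations
import numpy as np

def exhaustive_search(X, y, penalty=0.1):
    """Evaluate every non-empty feature subset (2^n - 1 of them) and
    return the one with the lowest penalized training error."""
    n = X.shape[1]
    best_subset, best_score = None, float("inf")
    searched = 0
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            searched += 1
            A = X[:, subset]
            # Least-squares fit; training MSE plus a small per-feature
            # penalty so larger subsets must earn their keep.
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            mse = np.mean((A @ coef - y) ** 2)
            score = mse + penalty * k
            if score < best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score, searched

# Toy stand-in for our four made-up features; only feature 1 drives y.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = 3 * X[:, 1] + rng.normal(scale=0.1, size=50)

best, score, searched = exhaustive_search(X, y)
# With 4 features, the loop visits 2^4 - 1 = 15 subsets.
```

Note that the inner loop body runs 2^n − 1 times, which is exactly why this approach collapses once n grows.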

Luckily for everyone involved, there is a slightly better way to do this.

2. Random Feature Selection

Random feature selection works well enough in a surprising number of situations. Say you want to reduce your features by 50%: randomly select 50% of the features and remove them.

After training your model, check its performance. Repeat until you are satisfied. Sadly, this is still a brute-force way to solve the problem.
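The try-and-repeat loop above can be sketched as follows. This is a toy sketch, assuming NumPy; `mse_score` and the trial count are illustrative choices, and any model-evaluation function could be plugged in instead.

```python
import numpy as np

def random_feature_selection(X, y, evaluate, keep_frac=0.5,
                             n_trials=20, seed=0):
    """Repeatedly keep a random fraction of the features and remember
    the best-scoring trial. A brute-force baseline, not an optimizer."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    k = max(1, int(n * keep_frac))
    best_subset, best_score = None, float("inf")
    for _ in range(n_trials):
        subset = tuple(sorted(rng.choice(n, size=k, replace=False)))
        score = evaluate(X[:, subset], y)  # lower is better
        if score < best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score

def mse_score(A, y):
    # Simple least-squares training error as a stand-in for a real
    # model-evaluation step.
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.mean((A @ coef - y) ** 2)

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 6))
y = 2 * X[:, 0] - X[:, 3] + rng.normal(scale=0.1, size=60)
subset, score = random_feature_selection(X, y, mse_score)
```

Each trial is independent, which is exactly the weakness: nothing learned from one draw informs the next.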

What can you do if you have a large set of features, which just can’t be cut down to size?

3. Minimum Redundancy Maximum Relevance Feature Selection

mRMR (minimum Redundancy Maximum Relevance) feature selection brings these ideas together into one algorithm. The idea is to minimize the redundancy among the selected features while maximizing their relevance to the target.

To do this, we need formulas for relevance and redundancy. The relevance of a feature set S for a target c is the average mutual information between each feature and the target: D = (1/|S|) Σ I(f_i; c). The redundancy of S is the average mutual information between pairs of features in it: R = (1/|S|²) Σ I(f_i; f_j). mRMR picks features that maximize the difference D − R.

Let's write a quick script to implement mRMR using our made-up data.

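Here is a minimal greedy mRMR sketch in Python, assuming NumPy. Absolute Pearson correlation stands in for mutual information (a common simplification), and the made-up data mirrors the example: ice cream consumption tracks sales closely, temperature only loosely, coffee not at all. Column names and data are illustrative.

```python
import numpy as np

def mrmr(X, y, n_select):
    """Greedy mRMR: at each step, add the feature with the highest
    relevance to the target minus its mean redundancy with the
    already-selected features. |Pearson correlation| stands in for
    mutual information."""
    n = X.shape[1]
    relevance = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                          for j in range(n)])
    selected = [int(np.argmax(relevance))]  # most relevant feature first
    while len(selected) < n_select:
        best_j, best_score = None, -np.inf
        for j in range(n):
            if j in selected:
                continue
            redundancy = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                                  for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected

# Made-up data in the spirit of the example.
rng = np.random.default_rng(2)
sales = rng.normal(size=100)
ice_cream = sales + rng.normal(scale=0.1, size=100)    # tracks sales
temperature = sales + rng.normal(scale=1.0, size=100)  # weaker signal
coffee = rng.normal(size=100)                          # unrelated
X = np.column_stack([temperature, ice_cream, coffee])

order = mrmr(X, sales, n_select=2)  # column 1 is ice cream
```

On data built this way, ice cream consumption comes out as the most relevant single feature, matching the result discussed below.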

I was not expecting this outcome. Ice cream consumption models ticket sales well, whereas temperature does not. In this example, it looks like we only need one variable to accurately model ticket sales, but this could be completely different in your problem.

Get the mRMR code here


You should now have a better understanding of feature selection methods that reduce the total number of features to the ones that best support modeling your target.

Don't forget to share what you learn with your ML tribe. Don't know what a Machine Learning tribe is? Check out this article to learn more about ML tribes and why it's important to be part of one.

Read: The Engineers Guide to Machine Learning Data Types

Read: How I Created a 40,000 Labeled Audio Dataset in 4 Hours of Work and $500

Thanks for reading 🙂 Let's also connect on Medium, Twitter, LinkedIn, or email