After a few applied machine learning problems, you usually develop a pattern or process for quickly getting started and achieving good results. Once you have this process, it is vital to use it again and again on project after project. The more developed your process, the faster you can get results!
Let me give you a head start and teach you a 5-step systematic process that I developed while becoming a machine learning engineer. This is just a starting point and you should feel free to change it to suit your needs.
Define the problem
This step is all about learning more about the problem at hand. Familiarize yourself with the domain and understand why you are building this solution.
To help facilitate this, always ask yourself the questions below.
- What is the problem? Describe the problem formally and informally. Make sure you list the assumptions you are making and any similar problems.
- Why does the problem need to be solved? List your motivations for solving the problem. What benefits does a solution bring, and how would you use it?
- How would I solve the problem? Describe how the problem would be solved manually to build up domain knowledge.
Prepare the data
Do you understand the data you have been given? Lots of people skip this step because it is often tedious, but it is super important. This work forces you to think about the data in the context of the problem before it gets lost in the craziness of algorithms.
- Data Selection: Consider what data is available to you. Is there any data missing? Can you remove any data?
- Data Preprocessing: Organize your selected data. Format it, clean it, and take a sample from it.
- Data Transformation: Get your preprocessed data ready for machine learning by engineering its features using scaling, attribute decomposition, and attribute aggregation.
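The three data preparation steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a definitive recipe: the records, the field names (`signup`, `logins_web`, `logins_app`, `spend`), and the helper functions `select`, `preprocess`, and `transform` are all invented for the example, and on a real project you would more likely reach for a data library.

```python
import random
from datetime import date

# Hypothetical raw records; every field name here is an assumption.
raw_rows = [
    {"id": 1, "signup": "2021-03-14", "logins_web": 5, "logins_app": 2, "spend": 120.0},
    {"id": 2, "signup": "2020-11-02", "logins_web": 0, "logins_app": 9, "spend": None},
    {"id": 3, "signup": "2022-01-30", "logins_web": 3, "logins_app": 3, "spend": 40.0},
    {"id": 4, "signup": "2021-07-08", "logins_web": 8, "logins_app": 1, "spend": 200.0},
]


def select(rows):
    """Data selection: drop rows with missing values and the non-predictive id."""
    kept = [r for r in rows if all(v is not None for v in r.values())]
    return [{k: v for k, v in r.items() if k != "id"} for r in kept]


def preprocess(rows, sample_size, seed=0):
    """Data preprocessing: parse text into typed values, then take a sample."""
    cleaned = [{**r, "signup": date.fromisoformat(r["signup"])} for r in rows]
    rng = random.Random(seed)
    return rng.sample(cleaned, min(sample_size, len(cleaned)))


def transform(rows):
    """Data transformation: decompose, aggregate, and scale attributes."""
    spends = [r["spend"] for r in rows]
    lo, hi = min(spends), max(spends)
    out = []
    for r in rows:
        out.append({
            # Attribute decomposition: split a date into simpler parts.
            "signup_year": r["signup"].year,
            "signup_month": r["signup"].month,
            # Attribute aggregation: combine related counts into one feature.
            "logins_total": r["logins_web"] + r["logins_app"],
            # Scaling: min-max scale spend into the range [0, 1].
            "spend_scaled": (r["spend"] - lo) / (hi - lo) if hi > lo else 0.0,
        })
    return out


prepared = transform(preprocess(select(raw_rows), sample_size=3))
for row in prepared:
    print(row)
```

Keeping each step in its own function makes it easy to produce several prepared versions of the same dataset, which comes in handy when you start comparing algorithms.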
Explore different Algorithms
Now that you have your data, it’s time to try out a bunch of different standard machine learning algorithms.
Typically, you would run 10–20 standard algorithms on the transformed and scaled versions of the dataset you prepared in the last step.
The main goal of trying all of these different algorithm and dataset combinations is to spread your net far and wide. See what works and what doesn’t, then go from there. More detailed exploration of the well-performing algorithms will follow.
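To make the idea of casting a wide net concrete, here is a small sketch that scores several candidate models with the same k-fold cross-validation loop and ranks them. It is an illustration under stated assumptions: the toy dataset from `make_data` stands in for your prepared data, the three tiny models are written by hand so the example needs only the standard library, and in practice you would plug in 10–20 algorithms from a machine learning library instead.

```python
import math
import random


def make_data(n_per_class=30, seed=0):
    """Build a toy two-class dataset (an assumption standing in for real data)."""
    rng = random.Random(seed)
    rows = []
    for label, center in ((0, 0.0), (1, 5.0)):
        for _ in range(n_per_class):
            rows.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    rng.shuffle(rows)
    return rows


def majority_class(train):
    """Baseline: always predict the most common training label."""
    labels = [y for _, y in train]
    winner = max(set(labels), key=labels.count)
    return lambda x: winner


def nearest_neighbor(train):
    """1-nearest-neighbour: copy the label of the closest training point."""
    return lambda x: min(train, key=lambda row: math.dist(row[0], x))[1]


def nearest_centroid(train):
    """Predict the class whose mean feature vector is closest."""
    centroids = {}
    for label in {y for _, y in train}:
        points = [x for x, y in train if y == label]
        centroids[label] = [sum(col) / len(points) for col in zip(*points)]
    return lambda x: min(centroids, key=lambda c: math.dist(centroids[c], x))


def cross_val_accuracy(fit, rows, k=5):
    """Mean accuracy over k folds; each fold is held out once for testing."""
    scores = []
    for i in range(k):
        test = rows[i::k]
        train = [r for j, r in enumerate(rows) if j % k != i]
        predict = fit(train)
        scores.append(sum(predict(x) == y for x, y in test) / len(test))
    return sum(scores) / k


rows = make_data()
candidates = {
    "majority_class": majority_class,
    "nearest_neighbor": nearest_neighbor,
    "nearest_centroid": nearest_centroid,
}
results = {name: cross_val_accuracy(fit, rows) for name, fit in candidates.items()}

# Rank the candidates so the promising ones stand out.
for name, score in sorted(results.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.2f}")
```

Including a trivial baseline such as `majority_class` is a deliberate choice: any algorithm that cannot beat it is not worth exploring further.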
Improve the results
After you have finished exploring the different algorithms and picked one that works well for your dataset, it is time to squeeze the best results out of it.
You can do this in a few ways, but it’s important to make sure that your results are significant at this point because hyper-parameter tuning isn’t going to turn a crap result into a good result. It will just help you squeeze out a bit more performance.
Here are some standard ways to improve an already working algorithm.
- Hyper-parameter Tuning: Most algorithms have hyper-parameters, and setting them well is key to getting the best performance.
- Ensemble Methods: Predictions are made by combining the outputs of multiple models.
- Extreme Feature Engineering: The attribute decomposition and aggregation seen in data preparation are pushed to the limits.
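The first two techniques in this list can be sketched together. The example below is a hedged illustration using only the standard library: it tunes a single hyper-parameter (`k` in a hand-written k-nearest-neighbours classifier) with a simple grid search on held-out data, then builds a majority-vote ensemble from the same models. The dataset from `make_data`, the grid values, and the helper names are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def make_data(n_per_class=40, seed=1):
    """Toy overlapping two-class data (an assumption standing in for real data)."""
    rng = random.Random(seed)
    rows = [([rng.gauss(c, 1.5), rng.gauss(c, 1.5)], label)
            for label, c in ((0, 0.0), (1, 3.0))
            for _ in range(n_per_class)]
    rng.shuffle(rows)
    return rows


def knn_predict(train, x, k):
    """Vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


def accuracy(predict, rows):
    """Fraction of rows whose label the predict function gets right."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


rows = make_data()
train, valid = rows[:60], rows[60:]

# Hyper-parameter tuning: score every value in a small grid on held-out data.
grid = [1, 3, 5, 7, 9]
scores = {k: accuracy(lambda x, k=k: knn_predict(train, x, k), valid) for k in grid}
best_k = max(scores, key=scores.get)


# Ensemble method: combine the models from the grid by majority vote.
def ensemble_predict(x):
    votes = [knn_predict(train, x, k) for k in grid]
    return Counter(votes).most_common(1)[0][0]


ensemble_score = accuracy(ensemble_predict, valid)
print(f"best k = {best_k}, validation accuracy = {scores[best_k]:.2f}")
print(f"ensemble validation accuracy = {ensemble_score:.2f}")
```

Because `best_k` is chosen on the validation rows, its validation score is an optimistic estimate. Keep a separate test set untouched if you want an honest final number.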
Present the results
The results of a complex machine learning problem are often meaningless in a vacuum. It’s important to put them in context.
This typically means a presentation to stakeholders, and it applies just as much to online competitions as to big meetings with CEOs. It’s good practice, and it gives everyone involved a good understanding of the problem and how you solved it.
Here is a quick template for you to present your results:
- Why: Define the environment that the problem exists in and set up a motivation for the solution.
- Question: Describe the problem as a question that you went out and answered.
- Solution: Concisely describe the solution as an answer to the question you just posed.
- Findings: List all of the discoveries you made while solving the problem.
- Limitations: Clearly go over the limitations of the model. What is it not good at, and what could be done better?
- Conclusions: Go back to the why, question, and solutions and tie them together in a way that makes it easy to remember.
Remember that this is not the be-all and end-all of processes, but it is a good step towards becoming a machine learning engineer.
If you have a technique that worked for you when picking a process, or would like to point out something I’ve missed, do let us know in the comments section below. We love hearing from our readers, and your remarks are much appreciated.
If you have any machine learning questions, contact us here. We are more than happy to discuss your case. Happy Machine Learning!