Becoming a Machine Learning Engineer | Follow a Process

Let me give you a head start and teach you a 5-step systematic process that I developed while becoming a machine learning engineer. This is just a starting point and you should feel free to change it to suit your needs.

Define the Problem

To help define the problem, always ask yourself the questions below.

  • What is the problem? Describe the problem both formally and informally. List the assumptions you are making and any similar problems.
  • Why does the problem need to be solved? List any motivations for solving the problem. What are the benefits a solution brings and how would you use it?
  • How would I solve the problem? Describe how the problem would be solved manually to build up domain knowledge.

Prepare Data

Do you understand the data you have been given? Many people skip this step because it is often tedious, but it is critically important. This work forces you to think about the data in the context of the problem before it gets lost in the craziness of algorithms.

  • Data Selection: Consider what data is available to you. Is there any data missing? Can you remove any data?
  • Data Preprocessing: Organize your selected data. Format it, clean it, and take a sample from it.
  • Data Transformation: Get your preprocessed data ready for machine learning by engineering its features using scaling, attribute decomposition, and attribute aggregation.
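As a rough illustration of the three transformations named above, here is a minimal sketch in plain Python. The records, field names (`user`, `signup`, `income`) and helper functions are all made up for this example, and min-max scaling is just one of several possible scaling choices.

```python
from datetime import date

# Hypothetical raw records; the field names are invented for illustration.
rows = [
    {"user": "a", "signup": date(2021, 3, 14), "income": 30000.0},
    {"user": "b", "signup": date(2021, 7, 2), "income": 90000.0},
    {"user": "c", "signup": date(2022, 1, 20), "income": 60000.0},
]
# Hypothetical purchase log: (user, amount) pairs.
purchases = [("a", 20.0), ("a", 35.0), ("b", 10.0)]


def min_max_scale(values):
    """Scaling: map values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def decompose_date(d):
    """Attribute decomposition: split one date into simpler parts."""
    return {"year": d.year, "month": d.month, "weekday": d.weekday()}


def total_per_user(events):
    """Attribute aggregation: combine many purchase rows into one total per user."""
    totals = {}
    for user, amount in events:
        totals[user] = totals.get(user, 0.0) + amount
    return totals


scaled_income = min_max_scale([r["income"] for r in rows])
totals = total_per_user(purchases)

# Build one feature dictionary per original row.
features = []
for row, income in zip(rows, scaled_income):
    feature_row = {
        "income_scaled": income,
        "total_spent": totals.get(row["user"], 0.0),
    }
    feature_row.update(decompose_date(row["signup"]))
    features.append(feature_row)

for feature_row in features:
    print(feature_row)
```

In a real project you would fit the scaling bounds on the training data only and reuse them on new data, so that information from the test set does not leak into training.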

Explore Different Algorithms

Typically, you would run 10–20 standard algorithms on the transformed and scaled versions of the dataset you prepared in the last step.

The main goal of trying all of these different algorithm and dataset combinations is to cast your net far and wide. See what works and what doesn’t, then go from there. More detailed exploration will follow with the well-performing algorithms.
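To make the spot-checking idea concrete, here is a small sketch of a test harness in plain Python. It is only an illustration: the three toy classifiers, the synthetic two-cluster dataset, and the fold scheme are assumptions chosen to keep the example self-contained. In practice you would plug in 10–20 algorithms from the library of your choice.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def majority_class(train_X, train_y, test_X):
    """Baseline: always predict the most common training label."""
    most_common = max(set(train_y), key=train_y.count)
    return [most_common] * len(test_X)


def nearest_neighbor(train_X, train_y, test_X):
    """Predict the label of the single closest training point."""
    preds = []
    for point in test_X:
        best = min(range(len(train_X)), key=lambda i: sq_dist(train_X[i], point))
        preds.append(train_y[best])
    return preds


def nearest_centroid(train_X, train_y, test_X):
    """Predict the class whose mean point is closest."""
    centroids = {}
    for label in set(train_y):
        members = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return [min(centroids, key=lambda c: sq_dist(centroids[c], p)) for p in test_X]


def cross_val_accuracy(model, X, y, k=5):
    """Mean accuracy over k folds; fold i holds out every k-th row starting at i."""
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, len(X), k))
        train_X = [x for i, x in enumerate(X) if i not in test_idx]
        train_y = [t for i, t in enumerate(y) if i not in test_idx]
        test_X = [X[i] for i in sorted(test_idx)]
        test_y = [y[i] for i in sorted(test_idx)]
        preds = model(train_X, train_y, test_X)
        scores.append(sum(p == t for p, t in zip(preds, test_y)) / len(test_y))
    return sum(scores) / k


# Synthetic data: two Gaussian clusters with alternating labels 0 and 1.
rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [(rng.gauss(5.0 * label, 1.0), rng.gauss(5.0 * label, 1.0)) for label in y]

algorithms = {
    "majority_class": majority_class,
    "nearest_neighbor": nearest_neighbor,
    "nearest_centroid": nearest_centroid,
}
results = {name: cross_val_accuracy(model, X, y) for name, model in algorithms.items()}

# Rank the algorithms so the promising ones stand out.
for name, score in sorted(results.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.2f}")
```

Keeping a trivial baseline such as `majority_class` in the mix is a useful habit: any algorithm that cannot beat it is not worth exploring further.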

Improve Results

Here are some standard ways to improve an already working algorithm.

  • Hyper-parameter Tuning: Most algorithms have hyper-parameters, and tuning them well is key to getting the best performance.
  • Ensemble Methods: Combine the predictions of multiple models into a single prediction.
  • Extreme Feature Engineering: Push the attribute decomposition and aggregation from data preparation to their limits.
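The first two ideas can be sketched in a few lines of plain Python. This is an illustrative example under stated assumptions: the k-nearest-neighbors classifier, the candidate values of `k`, the synthetic data, and the simple train/validation split are all choices made for the sketch, not a prescribed method.

```python
import random
from collections import Counter


def knn_predict(train, point, k):
    """Classify `point` by majority vote among its k nearest training rows."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], point)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(predict, data):
    """Fraction of (features, label) rows that `predict` gets right."""
    return sum(predict(x) == y for x, y in data) / len(data)


# Synthetic data: two overlapping Gaussian clusters with alternating labels.
rng = random.Random(0)
data = [((rng.gauss(c * 2, 1.0), rng.gauss(c * 2, 1.0)), c) for c in (0, 1) * 60]
train, valid = data[:80], data[80:]

# Hyper-parameter tuning: try each candidate k and keep the best on validation data.
candidate_ks = (1, 3, 5, 7, 9)
scores = {}
for k in candidate_ks:
    scores[k] = accuracy(lambda x, k=k: knn_predict(train, x, k), valid)
best_k = max(scores, key=scores.get)


def ensemble_predict(point, ks=(1, 5, 9)):
    """Ensemble: majority vote across several k-NN models with different k."""
    votes = [knn_predict(train, point, k) for k in ks]
    return Counter(votes).most_common(1)[0][0]


ensemble_score = accuracy(ensemble_predict, valid)

print("validation accuracy per k:", scores)
print("best k:", best_k)
print("ensemble validation accuracy:", round(ensemble_score, 2))
```

Because `best_k` is chosen on the validation rows, its validation score is an optimistic estimate. A separate, untouched test set is needed for an honest final number.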

Present Results

The results of a complex machine learning problem are often meaningless in a vacuum. It’s important to put them in context.

This typically means a presentation to stakeholders, and it applies equally to big meetings with CEOs and to online competitions. It’s good practice and gives everyone involved a good understanding of the problem and how you solved it.

Here is a quick template for you to present your results:

  • Why: Define the environment that the problem exists in and set up a motivation for the solution.
  • Question: Describe the problem as a question that you went out and answered.
  • Solution: Concisely describe the solution as an answer to the question you just posed.
  • Findings: List out all of the discoveries you made while solving the problem.
  • Limitations: Clearly go over the limitations of the model. What is it not good at, and what could be done better?
  • Conclusions: Go back to the why, question, and solutions and tie them together in a way that makes it easy to remember.

Remember that this process is not the be-all and end-all, but it is a good step towards becoming a machine learning engineer.

