7 Stages of Machine Learning

A step-by-step guide to building ML models

Posted by Admin on 2022-04-02 20:23:10

We live in a world drowning in data. Companies generate huge amounts of data, which can be analyzed to find relationships between parameters or to make data-driven business decisions. That is where machine learning shines. Its applications are widespread, and it is fast becoming an integral part of fields such as medicine, e-commerce, and banking. These businesses may be entirely different, but machine learning follows some very specific stages that can be applied to any business case.

Let's dive in to understand the stages of machine learning, from inception to practical application!

There are 7 major stages in machine learning:

  1. Problem Definition
  2. Data Collection
  3. Data Preparation
  4. Data Visualization
  5. ML Modeling
  6. Model Evaluation
  7. Model Deployment

7 Stages of ML

1. Problem Definition

You need to understand the problem thoroughly, because it is very important to be clear about what the stakeholders expect. You need domain knowledge as well as an understanding of the current business situation. Curiosity drives invention, so ask questions until you have clarity on the business and the problem it is facing.

Some examples:

  1. What is the business?
  2. What are the parameters that determine the solution?
  3. What is the expected outcome?
  4. What is the measurable business goal?
  5. Any domain-specific questions

2. Data Collection

This is one of the most crucial parts of machine learning. The quality of a model's predictions depends heavily on the input you provide: the more reliable the data, the better the model can find the correct patterns in it. We can collect data from stakeholders or from publicly available sources such as libraries and data hosting sites. Also, certain business problems require fresh data, because a model's prediction quality decreases over time if it is not retrained on fresh data.

3. Data Preparation

Once the data is collected, the next step is to prepare it for modeling. Real-world data contains a lot of noise, which degrades its quality. Hence, we need to perform certain preprocessing operations on the gathered data before feeding it to the model. Removing unwanted features, handling missing values, handling outliers, and encoding categorical features are some of the important preprocessing steps that ensure the model gets clean data to learn from and predict with.
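
As a rough illustration of three of these steps (missing values, outliers, categorical features), here is a minimal sketch in plain Python. The records, column names, and clipping limits are invented for the example, and real projects would typically use a data library instead of hand-written helpers.

```python
from statistics import median

# Hypothetical raw records; None marks a missing value.
rows = [
    {"age": 25, "income": 40000, "city": "Pune"},
    {"age": None, "income": 52000, "city": "Delhi"},
    {"age": 31, "income": 1000000, "city": "Pune"},  # income looks like an outlier
    {"age": 40, "income": 61000, "city": "Mumbai"},
]


def fill_missing_with_median(rows, column):
    """Replace None in `column` with the median of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def clip_outliers(rows, column, lower, upper):
    """Clamp values in `column` to the range [lower, upper]."""
    return [{**r, column: min(max(r[column], lower), upper)} for r in rows]


def one_hot_encode(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for c in categories:
            new_row[f"{column}_{c}"] = 1 if r[column] == c else 0
        encoded.append(new_row)
    return encoded


clean = fill_missing_with_median(rows, "age")
clean = clip_outliers(clean, "income", lower=0, upper=100000)  # assumed limits
clean = one_hot_encode(clean, "city")
```

Median imputation and clipping are only two of several possible choices here; which one is appropriate depends on the data and the business problem.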

4. Data Visualization

Visualization can reveal tons of hidden relationships among the input features, as well as between the features and the target variable (the output feature). It is a technique for representing data pictorially. Also, if you are giving a presentation, visualizations help you convey your thoughts more effectively.

5. ML Modeling

This is where the actual magic happens. You need to identify the type of problem you are dealing with. If you want to predict a continuous value from the relationship between the features and the target variable, go with a regression model. If you want to predict a class label for a given set of data, go with a classification model. The basic idea is to identify the features that help most in predicting the target variable. It is advisable to split the dataset into train and test samples, so that the model is trained on the train samples and then tested on the test samples to estimate how well it performs on data it has not seen.
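
To make the train/test idea concrete, here is a small sketch in plain Python. It assumes a toy, noise-free dataset and a one-feature least-squares line as the regression model. The helper names, the 80/20 ratio, and the seed are illustrative choices, not something prescribed above.

```python
import random


def train_test_split(samples, test_ratio=0.2, seed=42):
    """Shuffle a copy of `samples` and split it into (train, test) lists."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: y is exactly 3x + 2, so a perfect fit is possible.
data = [(x, 3 * x + 2) for x in range(20)]
train, test = train_test_split(data)

# Fit on the train samples only.
slope, intercept = fit_line(train)

# Measure the error only on the held-out test samples.
test_errors = [abs((slope * x + intercept) - y) for x, y in test]
```

Because the test samples play no part in fitting, the error on them is a fairer estimate of how the model behaves on new data than the error on the train samples.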

6. Model Evaluation

In real-world machine learning problems, you will encounter many different types of metrics. Sometimes you may need to create a metric that suits the business problem. Once the model is built, we need to measure how well it predicts, and the right measure depends on the case. Mostly, we try to understand the model's predictions with respect to the desired outcome. Based on the type of modeling, we can use the following metrics to evaluate the model.

For Classification Problems:

  1. Accuracy
  2. Precision
  3. Recall
  4. F1 Score
  5. AUC / ROC curve
  6. Log Loss, etc.
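
The first four metrics in this list can be computed directly from counts of correct and incorrect predictions. The sketch below assumes a binary problem with labels 0 and 1; the sample labels are made up. AUC and log loss are left out because they need predicted scores or probabilities rather than hard labels.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 score for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical true labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = classification_metrics(y_true, y_pred)
```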

For Regression Problems:

  1. Mean Absolute Error (MAE)
  2. Mean Squared Error (MSE)
  3. Root Mean Squared Error (RMSE)
  4. R-squared, etc.
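
The regression metrics above follow directly from the prediction errors. Here is a minimal sketch using the standard definitions, with made-up values, and assuming the true values are not all identical (otherwise R-squared is undefined).

```python
import math


def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and R-squared from true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # R-squared compares the model's squared error with that of
    # always predicting the mean of the true values.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Hypothetical true values and model predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]
metrics = regression_metrics(y_true, y_pred)
```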

You may consider fine-tuning the hyperparameters if the evaluation metrics indicate poor predictions. You can also drop unwanted or highly correlated features, or create a new feature from two or more existing ones, which may increase accuracy. Multiple iterations of such changes, along with hyperparameter tuning, can be made until the model's predictions are satisfactory.
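
One simple way to tune a hyperparameter is to try a few candidate values and keep the one that scores best on held-out data. The sketch below assumes a tiny one-feature k-nearest-neighbours classifier, where k is the hyperparameter. The data, the candidate values, and the use of a separate validation set are assumptions made for illustration.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Predict the majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, validation, k):
    """Fraction of validation points whose label is predicted correctly."""
    hits = sum(1 for x, label in validation if knn_predict(train, x, k) == label)
    return hits / len(validation)


# Hypothetical 1-D data as (feature, label). The point (2.0, "b") is a
# deliberately noisy label inside the "a" region.
train = [(1.0, "a"), (1.5, "a"), (2.0, "b"), (2.5, "a"), (3.0, "a"),
         (4.0, "b"), (4.5, "b"), (5.0, "b"), (5.5, "b")]
validation = [(1.9, "a"), (2.1, "a"), (4.2, "b"), (5.2, "b")]

# Try each candidate k and keep the first one with the highest accuracy.
results = {k: accuracy(train, validation, k) for k in (1, 3, 5)}
best_k = max(results, key=results.get)
```

With k = 1 the model copies the noisy label, while larger values of k outvote it. Scoring candidates on a validation set rather than the test set keeps the test samples untouched for the final evaluation.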

7. Model Deployment

This is the final stage in machine learning. If the model's prediction capability is satisfactory, we can deploy it in the live environment. The model may need to be re-evaluated regularly, as its performance may decrease over time.
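
What deployment looks like depends heavily on the environment, so the following is only a sketch of two ideas from this stage: saving a trained model so that another process can load and use it, and re-checking it on fresh data. It assumes the model is just the two parameters of a fitted line stored as JSON. The function names, the file name, and the error threshold are hypothetical.

```python
import json
import os
import tempfile

# Hypothetical trained model: parameters of the line y = slope * x + intercept.
model = {"slope": 3.0, "intercept": 2.0}


def save_model(params, path):
    """Write the model parameters to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(params, f)


def load_model(path):
    """Read the model parameters back, as a serving process would."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def predict(params, x):
    return params["slope"] * x + params["intercept"]


def needs_retraining(params, recent_samples, max_mae):
    """Flag the model when its mean absolute error on fresh data exceeds max_mae."""
    errors = [abs(predict(params, x) - y) for x, y in recent_samples]
    return sum(errors) / len(errors) > max_mae


path = os.path.join(tempfile.mkdtemp(), "model.json")
save_model(model, path)
served = load_model(path)
```

Calling `needs_retraining(served, recent_samples, max_mae=0.5)` on newly collected `(x, y)` pairs is one way to notice when performance has dropped over time.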

Final Notes

Performing all these operations on a small dataset is not a challenge. But with a huge dataset, it is better to create a pipeline to automate the work and handle the data systematically. A pipeline is a way to codify and automate the workflow it takes to build a machine learning model. It may consist of sequential operations, from data preprocessing to model deployment.
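
The core idea of a pipeline, running a fixed sequence of steps where each step consumes the previous step's output, can be sketched in a few lines of plain Python. The `Pipeline` class and the three steps below are hypothetical stand-ins written for this example; ML libraries provide their own, more complete pipeline tools.

```python
class Pipeline:
    """Run named steps in order, passing each step's output to the next."""

    def __init__(self, steps):
        self.steps = steps  # list of (name, function) pairs

    def run(self, data):
        for _name, step in self.steps:
            data = step(data)
        return data


def drop_missing(values):
    """Remove missing (None) entries."""
    return [v for v in values if v is not None]


def scale_to_unit_range(values):
    """Min-max scale the values into [0, 1]; assumes they are not all equal."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


def summarize(values):
    """Stand-in for a final modeling or reporting step."""
    return {"count": len(values), "mean": sum(values) / len(values)}


pipeline = Pipeline([
    ("drop_missing", drop_missing),
    ("scale", scale_to_unit_range),
    ("summarize", summarize),
])
report = pipeline.run([10, None, 20, 30, None, 50])
```

Because the steps are declared once and run in the same order every time, the same preparation is applied consistently to training data and to new data.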