Image Generated by DALL-E 2
The current trend in the machine-learning world is all about advanced models. The movement is fueled mainly by courses whose go-to models are complex ones; it simply looks much more impressive to use a model such as deep learning or LLMs. Business people haven't helped with this notion either, as they mostly follow the popular trend.
Simplicity doesn’t mean underwhelming results. A simple model only means that the steps it uses to deliver the solution are less complicated than those of an advanced model. It might use fewer parameters or simpler optimization methods, but a simple model is still valid.
In philosophy, Occam’s Razor (also called the Law of Parsimony) states that the simplest explanation is usually the best one. It implies that most problems can usually be solved through the most straightforward approach. That’s why a simple model’s value lies in its simplicity.
A simple model is as important as any other kind of model. That is the crucial message this article wants to convey, and we will explore why. So, let’s get into it.
When we talk about simple models, what constitutes a simple model? Logistic regression or naive Bayes is often called a simple model, while neural networks are complex; how about random forest? Is it a simple or complex model?
Generally, we don’t classify random forest as a simple model, but we often hesitate to call it complex either. That’s because there are no strict rules governing how to classify a model’s level of simplicity. However, a few aspects can help classify a model:
– Number of Parameters,
– Computational efficiency.
These aspects also shape the advantages each kind of model offers. Let’s discuss them in more detail.
Number of Parameters
A parameter is an internal model configuration that is learned or estimated during the training process. Unlike a hyperparameter, a parameter can’t be set by the user up front, although it is affected by the hyperparameter choices.
Examples of parameters include Linear Regression coefficients, Neural Network weights and biases, and K-means cluster centroids. As you can see, the values of the model parameters change on their own as the model learns from the data. The parameter values are constantly updated across training iterations until the final model is produced.
Linear Regression is a simple model because it has few parameters: its coefficients and its intercept. Depending on the number of features we train on, Linear Regression has n + 1 parameters (n feature coefficients plus 1 for the intercept).
A Neural Network, by comparison, is more complex. Its parameters consist of weights and biases. The number of weights in a fully connected layer depends on the layer’s inputs (n) and its neurons (p), giving n*p weight parameters. Each neuron also has its own bias, so for p neurons there are p biases. In total, a layer has (n*p) + p parameters. The complexity then increases with depth, since each additional layer adds another (n*p) + p parameters.
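The parameter counts above can be checked with quick arithmetic. A minimal sketch (the layer sizes here are made-up examples, not from any real model):

```python
def linear_regression_params(n_features):
    """n coefficients plus 1 intercept."""
    return n_features + 1

def dense_layer_params(n_inputs, n_neurons):
    """(n * p) weights plus p biases for one fully connected layer."""
    return n_inputs * n_neurons + n_neurons

def mlp_params(layer_sizes):
    """Total parameters of a fully connected network, e.g. [4, 16, 8, 1]."""
    return sum(dense_layer_params(n, p)
               for n, p in zip(layer_sizes, layer_sizes[1:]))

print(linear_regression_params(4))  # 4 + 1 = 5
print(dense_layer_params(4, 16))    # 4*16 + 16 = 80
print(mlp_params([4, 16, 8, 1]))    # 80 + 136 + 9 = 225
```

Even a tiny three-layer network already has 45 times the parameters of the linear model on the same 4 features.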
We have seen that the number of parameters affects model complexity, but how does it affect the overall model performance? The most crucial effect is on the risk of overfitting.
Overfitting happens when a model generalizes poorly because it has learned the noise in a dataset. With more parameters, a model can capture more complex patterns in the data, but it also picks up the noise, treating it as if it were signal. In contrast, a model with fewer parameters has limited capacity, which makes it harder to overfit.
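A rough illustration of that trade-off (the dataset and polynomial degrees are invented for this sketch): a 10-parameter polynomial has enough capacity to chase the noise in a tiny training set, while a 2-parameter line captures only the underlying trend and behaves far better on an unseen input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny training set: a truly linear relationship plus noise.
x_train = np.linspace(0, 3, 10)
y_train = 2 * x_train + 1 + rng.normal(scale=0.3, size=x_train.size)

# Simple model: degree 1, 2 parameters. Complex model: degree 9, 10 parameters.
simple = np.polyfit(x_train, y_train, deg=1)
complex_ = np.polyfit(x_train, y_train, deg=9)

# The 10-parameter fit passes almost exactly through every noisy point...
print(np.abs(np.polyval(complex_, x_train) - y_train).max())  # near zero

# ...but on an unseen input it typically swings far off, while the line
# stays close to the true value y = 2*4 + 1 = 9.
x_new, y_new = 4.0, 9.0
print(abs(np.polyval(simple, x_new) - y_new))    # typically small
print(abs(np.polyval(complex_, x_new) - y_new))  # typically large
```

The complex model “wins” on the training points precisely because it memorizes their noise, which is exactly what costs it on new data.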
There are also direct effects on interpretability and computational efficiency, which we will discuss further.
Interpretability

Interpretability is a machine-learning concept that refers to the ability to explain a model’s output; basically, it is how well the user can understand the model’s behaviour from its output. A simple model’s most significant value is its interpretability, and it’s a direct effect of having a smaller number of parameters.
With fewer parameters, a simple model’s interpretability is higher because the model is easier to explain. The model’s inner workings are also more transparent, since each parameter’s role is easier to understand than in a complex model.
For example, Linear Regression coefficients are straightforward to explain because each coefficient directly describes its feature’s influence on the prediction. In contrast, for a complex model such as an NN, it is challenging to explain the direct contribution of any single parameter to the prediction output.
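A quick sketch of that direct reading (the house-price features and coefficients below are invented for illustration; the data is noise-free so the fit recovers them exactly):

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented data: price = 3.0 * area + 20.0 * rooms + 50.0 baseline.
area = rng.uniform(30, 120, size=50)
rooms = rng.integers(1, 6, size=50).astype(float)
price = 3.0 * area + 20.0 * rooms + 50.0

# Design matrix with an intercept column: n + 1 = 3 parameters in total.
X = np.column_stack([area, rooms, np.ones_like(area)])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)

# Every parameter has a plain-language meaning: one extra unit of area
# adds ~3.0 to the price, one extra room adds ~20.0, baseline is ~50.0.
print(coef)  # approximately [3.0, 20.0, 50.0]
```

Each number in `coef` answers a stakeholder question directly; nothing comparable can be read off a single weight inside a neural network.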
Interpretability carries enormous value in many lines of business and projects, as certain businesses require that the output can be explained. For example, predictions in the medical field require explainability, since medical experts need to be confident in the result; it affects individual lives, after all.
Avoiding bias in model decisions is another reason many prefer a simple model. Imagine a loan company training a model on a dataset full of biases; the output would reflect those biases. We want to eliminate biases because they are unethical, so explainability is vital for detecting them.
Computational Efficiency

Another direct effect of having fewer parameters is an increase in computational efficiency. A smaller number of parameters means less time to estimate them and less computational power.
In production, a model with higher computational efficiency is easier to deploy and has a shorter inference time in the application. This effect also means simple models are more easily deployed on resource-constrained devices such as smartphones.
Overall, a simple model uses fewer resources, which translates to less money spent on processing and deployment.
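As a rough, made-up illustration of the cost gap, compare one batch of predictions from a 501-parameter linear model against a fully connected network with roughly 500,000 parameters (all sizes below are arbitrary): the linear model does a small fraction of the arithmetic, so its inference time is far shorter.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 500))  # one batch: 5,000 inputs, 500 features

# Simple model: 500 coefficients + 1 intercept = 501 parameters.
w, b = rng.normal(size=500), 0.1

# Complex model: two hidden layers of 500 neurons each -> ~501,501 parameters.
W1, b1 = rng.normal(size=(500, 500)), rng.normal(size=500)
W2, b2 = rng.normal(size=(500, 500)), rng.normal(size=500)
W3, b3 = rng.normal(size=(500, 1)), rng.normal(size=1)

t0 = time.perf_counter()
y_simple = X @ w + b                  # one matrix-vector product
t_simple = time.perf_counter() - t0

t0 = time.perf_counter()
h = np.maximum(X @ W1 + b1, 0)        # ReLU hidden layer 1
h = np.maximum(h @ W2 + b2, 0)        # ReLU hidden layer 2
y_complex = (h @ W3 + b3).ravel()
t_complex = time.perf_counter() - t0

print(f"linear: {t_simple:.4f}s, network: {t_complex:.4f}s")
```

The exact timings depend on the machine, but the network performs on the order of a thousand times more floating-point operations per prediction here, and that gap is what drives serving costs.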
We might undervalue a simple model because it doesn’t look fancy or doesn’t provide the most optimal metrics. However, there is a lot of value we can take from a simple model. Looking at the aspects that classify model simplicity, a simple model brings these values:
– Simple models have a smaller number of parameters, which also decreases the risk of overfitting,
– With fewer parameters, simple models provide higher explainability,
– Fewer parameters also mean that simple models are computationally efficient.