7 Tips to Future-Proof Machine Learning Projects | by Destin Gong | Feb, 2024

There can be a knowledge gap when moving from exploratory Machine Learning projects, typical in research and study, to industry-level projects. Industry projects generally have three additional goals: they should be collaborative, reproducible, and reusable, which enhances business continuity, increases efficiency, and reduces cost. Although I am nowhere near finding a perfect solution, I would like to document some tips for transforming exploratory, notebook-based ML code into an industry-ready project designed for scalability and sustainability.

I have categorized these tips into three key strategies:

Improvement 1: Modularization — Break Down Code into Smaller Pieces
Improvement 2: Versioning — Data, Code and Model Versioning
Improvement 3: Consistency — Consistent Structure and Naming Convention

Problem Statement

One struggle I have faced is keeping the entire data science project in a single notebook, which is common while learning data science. As you may have experienced, a data science lifecycle contains repeatable code components; for instance, the same data preprocessing steps are applied to both training data and inference data. If not handled properly, different versions of the same function end up copied and reused in multiple locations. Not only does this make the code less consistent, it also makes troubleshooting the entire notebook more challenging.

Bad Example

train_data = train_data.drop(['Evaporation', 'Sunshine', 'Cloud3pm', 'Cloud9am'], axis=1)
numeric_cols = ['MinTemp', 'MaxTemp', 'Rainfall', 'WindGustSpeed', 'WindSpeed9am']
train_data[numeric_cols] = train_data[numeric_cols].fillna(train_data[numeric_cols].mean())
train_data['Month'] = pd.to_datetime(train_data['Date']).dt.month.apply(str)
inference_data = inference_data.drop(['Evaporation', 'Sunshine'…
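
One possible way to remove the duplication in the bad example is to move the shared steps into a single function and call it for both datasets. The sketch below is only an illustration: the function name preprocess, the constants DROP_COLS and NUMERIC_COLS, and the toy rows are hypothetical, and reusing the training means to fill the inference data is an assumption on my part, since the original snippet is cut off before that step.

```python
import pandas as pd

# Column lists copied from the bad example.
DROP_COLS = ['Evaporation', 'Sunshine', 'Cloud3pm', 'Cloud9am']
NUMERIC_COLS = ['MinTemp', 'MaxTemp', 'Rainfall', 'WindGustSpeed', 'WindSpeed9am']


def preprocess(data: pd.DataFrame, fill_values: pd.Series) -> pd.DataFrame:
    """Apply the shared preprocessing steps and return a new DataFrame."""
    # drop() returns a new frame, so the caller's DataFrame is left untouched.
    data = data.drop(DROP_COLS, axis=1)
    # Fill missing numeric values column by column from the supplied Series.
    data[NUMERIC_COLS] = data[NUMERIC_COLS].fillna(fill_values)
    # Derive the month from the date and store it as a string category.
    data['Month'] = pd.to_datetime(data['Date']).dt.month.apply(str)
    return data


# Hypothetical toy data, for illustration only.
train_data = pd.DataFrame({
    'Date': ['2024-01-15', '2024-02-20', '2024-03-05'],
    'MinTemp': [10.0, None, 14.0],
    'MaxTemp': [20.0, 22.0, 24.0],
    'Rainfall': [0.0, 1.2, 0.0],
    'WindGustSpeed': [30.0, 35.0, 40.0],
    'WindSpeed9am': [10.0, 12.0, 14.0],
    'Evaporation': [1.0, 2.0, 3.0],
    'Sunshine': [5.0, 6.0, 7.0],
    'Cloud3pm': [1.0, 2.0, 3.0],
    'Cloud9am': [1.0, 2.0, 3.0],
})
inference_data = pd.DataFrame({
    'Date': ['2024-07-01'],
    'MinTemp': [None],
    'MaxTemp': [18.0],
    'Rainfall': [0.4],
    'WindGustSpeed': [28.0],
    'WindSpeed9am': [9.0],
    'Evaporation': [1.5],
    'Sunshine': [4.0],
    'Cloud3pm': [6.0],
    'Cloud9am': [7.0],
})

# Assumption: statistics are computed once on the training data and reused.
train_means = train_data[NUMERIC_COLS].mean()
train_clean = preprocess(train_data, train_means)
inference_clean = preprocess(inference_data, train_means)
```

With this layout, a change to the preprocessing logic is made in one place and automatically applies to both the training and the inference path, which is the consistency problem the single-notebook approach tends to run into.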
