SQL can now replace Python for most supervised ML tasks. Should you make the switch?

Dario Radečić

Towards Data Science

How to Train a Decision Tree Classifier… In SQL | by Dario Radečić | Apr, 2024
Photo by Resource Database on Unsplash

When it comes to machine learning, I’m an avid fan of attacking data where it lives. 90%+ of the time, that’s going to be a relational database, assuming we’re talking about supervised machine learning.

Python is amazing, but pulling dozens of gigabytes of data every time you want to train a model is a huge bottleneck, especially if you need to retrain it frequently. Eliminating data movement makes a lot of sense. SQL is your friend.

For this article, I’ll use an always-free Oracle Database 21c provisioned on Oracle Cloud. I’m not sure whether the logic translates to other database vendors. Oracle works like a charm, and the database you provision won’t cost you a dime, ever.

I’ll leave the comparison of Python and Oracle for machine learning on huge datasets for another time. Today, it’s all about getting back to basics.

I’ll use the following dataset today:

  • Fisher, R.A. (1936). The use of multiple measurements in taxonomic problems. University of California, Irvine, School of Information and Computer Sciences. Retrieved…
