Image created by Author with DALL•E 3
Are you looking for handy quick references for a variety of topics on data science, machine learning, Python programming, data engineering, and AI? Do you want to stay updated while enhancing your skills in these areas? The collection of cheat sheets that KDnuggets has created over the course of 2023 aims to help you accomplish these goals.
You will find these cheat sheets to be valuable resources for keeping you at the forefront of some of the most useful and relevant tools, technologies and concepts of this year. Whether you’re a seasoned data scientist, a budding machine learning enthusiast, or a data engineering professional, these professionally-crafted resources will undoubtedly provide nugget-sized bullet points of importance.
From the practical applications of ChatGPT in data science to mastering valuable data tools such as GitHub CLI, Plotly Express, and cuDF, each cheat sheet is designed to offer concise, actionable insights. Learn machine learning with Streamlit. Explore data cleaning with Python. Venture into the realm of AI with helpful Chrome extensions and generative AI tools. Consider this collection your gateway to mastering (and reinforcing over time) complex concepts and tools, ensuring you stay ahead in the field.
So go ahead and check out the following cheat sheets from KDnuggets and see what insights are available.
ChatGPT (and, indeed, the latest and most robust versions of GPT-3) is meant to assist (that’s right… assist!) humans who choose to use it as such. With a little help from your friends at KDnuggets, you will be able to hone your prompt engineering skills to do useful things like generate code, assist in your research process, and analyze data.
The GitHub CLI, unsurprisingly, is the GitHub tool that lets you interact with the GitHub platform from the command line. Mastering the most-used commands will help you become a productive member of a development team, be that a web app development team or, more specifically for our purposes, a data science, data engineering, or machine learning engineering team.
The cheat sheet first addresses getting started, such as installing the library and its basic syntax. Next, the resource covers creating common chart types with Plotly Express, including scatter plots, histograms, density heatmaps, pie charts, and box plots. Finally, you will gain some exposure to plot customization, including adjusting markers and layouts.
Getting started with cuDF is straightforward, especially if you have experience using Python and libraries like Pandas. While both cuDF and Pandas offer similar APIs for data manipulation, there are specific types of problems in which cuDF can provide significant performance improvements over Pandas, including large scale datasets, data preprocessing and engineering, real-time analytics, and, of course, parallel processing. The bigger the dataset, the greater the performance benefits.
Mastering data science interviews is a skill all its own, and preparation is the key to success. I was once told that learning how to write university examinations is a skill of its own, beyond learning the material on which you are being tested; specialized technical job interviews are very similar.
For an overview of what we believe to be 10 of the best ChatGPT plugins for data science, check out our latest cheat sheet, conveniently named 10 ChatGPT Plugins for Data Science Cheat Sheet. You’ll find plugins for coding, analysis, web searching, document interrogation, and more.
Putting machine learning and Streamlit together is a popular option for data scientists and other data professionals looking to experiment on data, prototype, or share results. Knowing how to quickly turn around data apps is becoming an essential skill for data folks, and this combination certainly allows for this. If you don’t know how to use Streamlit, we suggest you learn now.
With ChatGPT, building a machine learning project has never been easier. By simply writing follow-up prompts and analyzing the results, you can quickly and easily train the model to respond to user queries and provide helpful insights. In this cheat sheet, learn how to use ChatGPT to assist with the following machine learning tasks: project planning, feature engineering, data preprocessing, model selection, hyperparameter tuning, experiment tracking, and MLOps.
Scikit-learn’s unified API makes learning how to implement a variety of algorithms and tasks much easier than it would otherwise be. Once you learn the pattern of how to make Scikit-learn calls, you are off and running. The only thing you need after this, beyond your imagination and determination, is a handy reference. This cheat sheet covers the basics of what is needed to learn how to use Scikit-learn for machine learning, and provides a reference for moving ahead with your machine learning projects.
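The calling pattern in question can be sketched in a few lines. The example below is only an illustration, not content from the cheat sheet: the iris dataset, the two estimators, and the split settings are assumptions chosen to show that different algorithms share the same `fit`, `predict`, and `score` calls.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: the built-in iris data stands in for a real project dataset.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Two unrelated algorithms, driven by exactly the same three calls.
models = [LogisticRegression(max_iter=1000), KNeighborsClassifier(n_neighbors=5)]
scores = {}
for model in models:
    model.fit(X_train, y_train)             # learn from the training split
    predictions = model.predict(X_test)     # predict labels for unseen rows
    scores[type(model).__name__] = model.score(X_test, y_test)  # mean accuracy
```

Swapping in another classifier generally means changing only the line that constructs it.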
Docker has become an essential data science tool to assist in the building of reproducible and scalable environments. Docker allows code and dependencies to be packaged in containers, which lets data scientists distribute their models across different platforms. This assists in both development and production, and works to prevent errors and inconsistencies that can arise from different versions of software or hardware configurations.
In graph queries we lose some syntax from SQL and gain other syntax. SELECT has been replaced by MATCH. FROM and JOIN have been discarded. But the WHERE and ORDER BY commands are used in the same way. Aggregate functions like SUM and AVG are all there, but GROUP BY has been discarded. Most importantly, though, we gain the ability to query patterns in the graph using the node relationships. In the attached cheat sheet, you will see a list of the most commonly used query approaches.
In this cheat sheet, we go from detecting and handling missing data, finding and dealing with duplicates, outlier detection, and label encoding and one-hot encoding of categorical features, to transformations such as min-max normalization and standardization. Moreover, this guide draws on the methods provided by three of the most popular Python libraries: Pandas, Scikit-Learn, and Seaborn (for displaying plots).
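Several of these cleaning steps can be sketched with Pandas alone. The following is a rough illustration rather than the cheat sheet's own code: the toy data is invented, the 1.5 × IQR rule is just one common outlier heuristic, and the scaling is written by hand instead of using Scikit-Learn.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
raw = pd.DataFrame({
    "age": [22.0, 25.0, 27.0, 30.0, 32.0, 32.0, None, 120.0],
    "city": ["Oslo", "Paris", "Lima", "Oslo", "Lima", "Lima", "Paris", "Oslo"],
})

# 1. Detect missing values per column, then fill the gap with the median.
missing_counts = raw.isna().sum()
filled = raw.fillna({"age": raw["age"].median()})

# 2. Drop exact duplicate rows.
deduped = filled.drop_duplicates()

# 3. Keep only rows inside the 1.5 * IQR fences (a common outlier heuristic).
q1, q3 = deduped["age"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = deduped["age"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
cleaned = deduped[in_range].copy()

# 4. Min-max normalization to the [0, 1] range.
age_min, age_max = cleaned["age"].min(), cleaned["age"].max()
cleaned["age_scaled"] = (cleaned["age"] - age_min) / (age_max - age_min)

# 5. One-hot encode the categorical column.
encoded = pd.get_dummies(cleaned, columns=["city"])
```

On real data you would inspect each intermediate frame before moving on, since every step here discards or rewrites values.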
The state of flow control has come a long way since the days of goto. There are numerous common execution patterns that are available in the majority of modern programming languages, though their syntax differs from language to language. Python has its own, generally quite readable, set of flow controls, and that’s what our latest cheat sheet focuses on. Get ready to learn flow control, and to have a handy reference moving forward as you conquer the world of coding.
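As a small taste of Python's flow control constructs, here is a short sketch. The function names and inputs are hypothetical, made up for this illustration, and are not drawn from the cheat sheet.

```python
def classify(n):
    """Branch with if / elif / else."""
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"


def first_even(numbers):
    """Loop with for / break / else: the else runs only if no break occurred."""
    for n in numbers:
        if n % 2 == 0:
            found = n
            break
    else:
        found = None
    return found


def countdown(start):
    """Loop with while, using continue to skip the value 2."""
    steps = []
    while start > 0:
        if start == 2:
            start -= 1
            continue
        steps.append(start)
        start -= 1
    return steps


def safe_divide(a, b):
    """Handle an error path with try / except."""
    try:
        return a / b
    except ZeroDivisionError:
        return None
```

For example, `first_even([1, 3, 5])` falls through to the loop's `else` branch and returns `None`, while `countdown(4)` returns `[4, 3, 1]`.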
The selection of tools presented on this cheat sheet includes SciSpace Copilot, an AI-powered research assistant designed to help you understand the text, math, and tables in scientific literature. Fireflies, an AI assistant powered by GPT-4, is also featured. This revolutionary tool can surf the web and summarize various types of content, including articles, YouTube videos, and emails, with human-like efficiency. And more.
Some highlights covered include OpenAI for accessing models like ChatGPT, Transformers for training and fine-tuning, Gradio for quickly building UIs to demo models, LangChain for chaining multiple models together, and LlamaIndex for ingesting and managing private data. Overall, this cheat sheet packs a wealth of practical guidance into one page. Both beginners looking to get started with generative AI in Python and experienced practitioners can benefit from having this condensed reference to the best tools and libraries at their fingertips.
With LangChain, developers can build capable AI language-based apps without reinventing the wheel. Its composable structure makes it easy to mix and match components like LLMs, prompt templates, external tools, and memory. This accelerates prototyping and allows seamless integration of new capabilities over time. Whether you’re looking to create a chatbot, QA bot, or multi-step reasoning agent, LangChain provides the building blocks to assemble advanced AI rapidly.
The cheat sheet links to tutorials for each project, walking through step-by-step implementation leveraging ChatGPT’s conversational prompts. Highlights include using ChatGPT for a loan approval classifier model, resume parser, real-time language translator, exploratory data analysis, and even integrating its capabilities into Google Sheets. Whether you’re new to ChatGPT or looking to push its boundaries, this collection of projects acts as a launch pad to boost productivity and accelerate AI-assisted development.
Matthew Mayo (@mattmayo13) holds a Master’s degree in computer science and a graduate diploma in data mining. As Editor-in-Chief of KDnuggets, Matthew aims to make complex data science concepts accessible. His professional interests include natural language processing, machine learning algorithms, and exploring emerging AI. He is driven by a mission to democratize knowledge in the data science community. Matthew has been coding since he was 6 years old.