Natural Language Processing | Machine Learning | ChatGPT

Exploring the architecture of OpenAI’s Generative Pre-trained Transformers.

“Mixture Expert” by the author using MidJourney. All images by the author unless otherwise specified.

In this article we’ll explore the evolution of OpenAI’s GPT models. We’ll briefly cover the transformer, describe the variations of the transformer that led to the first GPT model, then work through GPT-1, GPT-2, GPT-3, and GPT-4 to build a complete conceptual understanding of the state of the art.

Who is this useful for? Anyone interested in natural language processing (NLP) or cutting-edge AI advancements.

How advanced is this post? This is not a complex post; it’s mostly conceptual. That said, there are a lot of concepts, so it may be daunting to less experienced data scientists.

Prerequisites: I’ll briefly cover transformers in this article, but you can refer to my dedicated article on the subject for more information.

Before we get into GPT, I want to briefly go over the transformer. In its most basic sense, the transformer is an encoder-decoder style model.

A transformer working on a translation task. The input (“I am a manager”) is compressed to some abstract representation that encodes the meaning of the entire input. The decoder works recurrently, by feeding into itself, to construct the output. From my article on transformers.

The encoder converts an input into an abstract representation which the decoder uses to iteratively generate output.
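To make that encode-then-decode loop concrete, here is a minimal sketch using PyTorch’s built-in nn.Transformer. The vocabulary size, dimensions, start token, and toy input are assumptions for illustration only; real models are far larger and trained rather than randomly initialized.

```python
# A minimal sketch of the encoder-decoder flow with PyTorch's nn.Transformer.
# All sizes and token ids below are illustrative assumptions.
import torch
import torch.nn as nn

vocab_size, d_model = 1000, 64
embed = nn.Embedding(vocab_size, d_model)
transformer = nn.Transformer(d_model=d_model, nhead=4,
                             num_encoder_layers=2, num_decoder_layers=2,
                             batch_first=True)
to_logits = nn.Linear(d_model, vocab_size)

src = torch.randint(0, vocab_size, (1, 5))   # e.g. the tokenized input "I am a manager"
memory = transformer.encoder(embed(src))     # abstract representation of the whole input

# The decoder builds the output one token at a time, feeding its own
# predictions back in while referencing the encoded input ("memory").
generated = torch.tensor([[1]])              # assumed start-of-sequence token id
for _ in range(10):
    tgt = embed(generated)
    mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
    out = transformer.decoder(tgt, memory, tgt_mask=mask)
    next_token = to_logits(out[:, -1]).argmax(-1, keepdim=True)
    generated = torch.cat([generated, next_token], dim=1)
```

The key pattern is that the encoder runs once over the whole input, while the decoder is called repeatedly, each step consuming its own previous predictions alongside the encoded input.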

A high-level representation of how the output of the encoder relates to the decoder. The decoder references the encoded input on every recursive loop of the output. From my article on transformers.

Both the encoder and the decoder build an abstract representation of text, which is created using multi-headed self-attention.
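For a concrete picture of that attention step, the sketch below uses PyTorch’s nn.MultiheadAttention. The dimensions are arbitrary placeholders; passing the same tensor as query, key, and value is what makes it *self*-attention.

```python
# A minimal sketch of multi-headed self-attention with nn.MultiheadAttention.
# Sequence length and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

d_model, n_heads, seq_len = 64, 4, 5
attention = nn.MultiheadAttention(embed_dim=d_model, num_heads=n_heads, batch_first=True)

x = torch.randn(1, seq_len, d_model)      # embedded input tokens
context, weights = attention(x, x, x)     # query = key = value = x

print(context.shape)   # (1, 5, 64): each token now mixes in information from every other token
print(weights.shape)   # (1, 5, 5): attention weights, averaged over the 4 heads
```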
