Natural Language Processing | Machine Learning | ChatGPT
Exploring the architecture of OpenAI’s Generative Pre-trained Transformers.
In this article we’ll explore the evolution of OpenAI’s GPT models. We’ll briefly cover the transformer, describe the variations of the transformer that led to the first GPT model, then go through GPT-1, GPT-2, GPT-3, and GPT-4 to build a complete conceptual understanding of the state of the art.
Who is this useful for? Anyone interested in natural language processing (NLP) or cutting-edge AI advancements.
How advanced is this post? This is not a complex post; it’s mostly conceptual. That said, there are a lot of concepts, so it might be daunting to less experienced data scientists.
Pre-requisites: I’ll briefly cover transformers in this article, but you can refer to my dedicated article on the subject for more information.
Before we get into GPT I want to briefly go over the transformer. In its most basic sense, the transformer is an encoder-decoder style model.
The encoder converts an input into an abstract representation which the decoder uses to iteratively generate output.
Both the encoder and decoder build abstract representations of text using multi-headed self-attention.
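To make multi-headed self-attention concrete, here is a minimal NumPy sketch. It is an illustrative toy, not a production implementation: the weight matrices are random stand-ins for learned parameters, and real transformers add details like masking, dropout, and layer normalization that are omitted here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, n_heads):
    """Toy multi-headed self-attention over one sequence.

    x: (seq_len, d_model); each weight matrix: (d_model, d_model).
    """
    seq_len, d_model = x.shape
    d_head = d_model // n_heads
    # Project the input into queries, keys, and values, then split into heads:
    # shape becomes (n_heads, seq_len, d_head).
    q = (x @ w_q).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    k = (x @ w_k).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    v = (x @ w_v).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    # Scaled dot-product attention, computed independently per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (n_heads, seq, seq)
    weights = softmax(scores, axis=-1)                   # rows sum to 1
    heads = weights @ v                                  # (n_heads, seq, d_head)
    # Concatenate the heads back together and apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o

# Demo with random (untrained) weights, purely to show the shapes involved.
rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 4, 8, 2
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v, w_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))
out = multi_head_self_attention(x, w_q, w_k, w_v, w_o, n_heads)
print(out.shape)  # (4, 8): one d_model-sized vector per input token
```

The key idea is that each head attends over the full sequence in its own lower-dimensional subspace, letting different heads specialize in different relationships between tokens before their outputs are recombined.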