Generative AI: Fundamentals and Applications
What is Generative AI?
Generative AI is a sub-field of artificial intelligence that focuses on machines creating new and original content, such as images, music, or text. Unlike traditional AI, which is designed to analyze and make decisions based on existing data, generative AI is designed to create new data on its own.
The goal of generative AI is to teach machines to generate content that is indistinguishable from human-created content. These models don't necessarily know whether the things they produce are accurate, and we have little way of knowing where the information has come from and how it has been processed by the algorithms to generate content.
Fig. 1: Illustration of generative AI using Midjourney
What can Generative AI provide?
Content generation: OpenAI's GPT-3 can be used to generate articles, essays, and even entire books.
Chatbots and Virtual Assistants: these systems can be used in a variety of contexts, such as customer service, healthcare, and education.
Creative writing: generative AI models can assist writers in the creative process by suggesting ideas, generating story-lines, and even writing entire paragraphs of text.
Medical research: design of new drug molecules and prediction of the properties of existing drugs, both of which can help researchers in drug discovery.
Simulation and prediction: simulation of complex systems and predictions about the future behavior of those systems, such as urban traffic flow prediction or demand prediction of a given consumer good.
Image and video synthesis: create realistic images and videos of people, places, and objects.
Video game development: generative AI can be used to generate game content such as maps, quests, and items.
Data mining and analytics: text summarization, sentiment analysis, named entity recognition, speech recognition and synthesis, image annotation, spell correction, machine translation, recommendation systems, fraud detection, and code generation.
Fig. 2: Illustrative image of Wiimer's head office generated by Midjourney
How does it work?
Generative AI uses a deep learning model to generate new data that resembles the data it was trained on. Training generally involves feeding the model input data so that it learns the patterns and relationships within that data.
In other words, the model first learns the underlying patterns and relationships in the input data, and then uses those patterns to generate new data that is similar to it.
Two of the main types of generative AI techniques are autoregressive models and generative adversarial networks (GANs). Autoregressive models generate new data by iteratively predicting the next value in a sequence based on the preceding values. Examples include GPT (Generative Pre-trained Transformer) models and recurrent neural networks (RNNs), including LSTM (Long Short-Term Memory) networks.
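The iterative next-value prediction described above can be illustrated with a deliberately tiny sketch. It is not how GPT works internally: it is a bigram model that conditions only on the single preceding token, whereas large models condition on a long context. The function names and the toy corpus are made up for illustration.

```python
import random
from collections import Counter, defaultdict


def train_bigram_model(tokens):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, start, max_tokens, rng):
    """Generate a sequence one token at a time, each conditioned on the previous one."""
    sequence = [start]
    for _ in range(max_tokens - 1):
        followers = counts.get(sequence[-1])
        if not followers:  # no known continuation: stop early
            break
        candidates = list(followers.keys())
        weights = list(followers.values())
        # Sample the next token in proportion to how often it followed the last one.
        sequence.append(rng.choices(candidates, weights=weights, k=1)[0])
    return sequence


# Toy training data (illustrative only).
corpus = "the cat sat on the mat and the cat ate the fish".split()
model = train_bigram_model(corpus)
sample = generate(model, start="the", max_tokens=8, rng=random.Random(0))
print(" ".join(sample))
```

Every generated pair of neighbouring words already occurs somewhere in the training text, which is the sense in which the output is "similar to" the training data.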
GANs, in turn, generate new data by training two neural networks simultaneously: a generator network and a discriminator network. The generator network creates new samples of data, while the discriminator network evaluates the authenticity of the generated samples by trying to tell them apart from real ones. The generator is trained to fool the discriminator, so its samples become more realistic as training progresses. GANs have been used for a variety of applications, such as generating realistic images and videos, and creating 3D models.
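To make the generator/discriminator interplay more concrete, the sketch below computes the two training objectives from discriminator scores. No networks are trained here: the scores are hypothetical numbers, and the generator objective shown is the commonly used "non-saturating" variant, which is an assumption rather than something stated in this article.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: real samples should score near 1, generated ones near 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating generator loss: the generator wants its samples scored near 1."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# Hypothetical discriminator outputs (probability that a sample is real).
confident = {"d_real": [0.9, 0.95], "d_fake": [0.1, 0.05]}  # discriminator is winning
fooled = {"d_real": [0.5, 0.5], "d_fake": [0.5, 0.5]}       # discriminator cannot tell

for name, scores in (("confident", confident), ("fooled", fooled)):
    d_loss = discriminator_loss(scores["d_real"], scores["d_fake"])
    g_loss = generator_loss(scores["d_fake"])
    print(f"{name}: discriminator loss={d_loss:.3f}, generator loss={g_loss:.3f}")
```

When the discriminator separates real from generated samples easily, its loss is low and the generator's is high; as the generator improves, the balance shifts the other way.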
Autoregressive models tend to be better for generating sequences of data, while GANs are better suited for generating complex, high-dimensional data like images and videos.
Fig. 3: Fractal illustration of generative AI algorithms using Midjourney
Training Generative AI
Data collection and preprocessing
The first step is to gather the training data set, that is, the data the model will be trained on. The data can come from various sources such as books, websites, articles, and open datasets. Popular public sources of datasets include Kaggle, Google Dataset Search, Hugging Face, and Wikipedia.
The data then needs to be cleaned and prepared for training. For text, this may involve converting it to lowercase, removing stop words, and tokenizing it into the sequences of tokens that make up the text. After this processing, the data can also be called a “corpus”.
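The cleaning steps mentioned above (lowercasing, stop-word removal, tokenization) might look like the following minimal sketch. The stop-word list and the word-level regular expression are simplifications chosen for illustration; large language models typically rely on subword tokenizers instead.

```python
import re

# A tiny illustrative stop-word list; real pipelines use much longer ones.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "on", "to", "in"}


def preprocess(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    lowered = text.lower()
    tokens = re.findall(r"[a-z0-9']+", lowered)  # simple word-level tokenizer
    return [tok for tok in tokens if tok not in STOP_WORDS]


documents = [
    "The cat sat on the mat.",
    "Generative AI is a sub-field of artificial intelligence.",
]
corpus = [preprocess(doc) for doc in documents]
print(corpus)
```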
Large models such as Google’s BERT and OpenAI’s GPT-3 both use the transformer deep learning architecture. The key elements to configure are:
Number of layers (transformer blocks): the number of times the transformer block, with its attention and feed-forward sub-layers, is stacked. Increasing the number of layers can increase the model's capacity to learn complex patterns, but it can also make the model harder to train and more prone to over-fitting.
Number of attention heads: determines how many ways the model can attend to the input, and thus how much information it can extract from it. Using more heads can improve the model's ability to capture complex patterns in the input, but it can also increase the model's computational complexity and memory usage.
Loss function: a measure of how far the model's predictions are from the true labels or targets. The goal of training a machine learning model is to minimize the loss function.
Hyperparameters: these are parameters set by the user before the training begins. Choosing appropriate hyperparameters can greatly impact the performance of the model and is often done through trial and error or by using automated methods such as grid search or random search.
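The elements listed above can be tied together in a small sketch. Everything named here is hypothetical: `TransformerConfig`, `d_model`, and the candidate values are illustrative, and `placeholder_validation_loss` merely stands in for "train the model and measure its validation loss", so the "best" result carries no real meaning. Cross-entropy is shown as one common choice of loss function for next-token prediction.

```python
import itertools
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformerConfig:
    n_layers: int  # number of stacked transformer blocks
    n_heads: int   # attention heads per block
    d_model: int   # width of the hidden representation (assumed name)

    @property
    def head_dim(self):
        # Each head works on an equal slice of the hidden representation.
        if self.d_model % self.n_heads != 0:
            raise ValueError("d_model must be divisible by n_heads")
        return self.d_model // self.n_heads


def cross_entropy(predicted_probs, target_index):
    """Loss for one prediction: small when the true token gets high probability."""
    return -math.log(predicted_probs[target_index])


def placeholder_validation_loss(config, learning_rate):
    """Stand-in for training and evaluating a model. Not a real measurement."""
    return (abs(config.n_layers - 4)
            + abs(config.n_heads - 8)
            + abs(math.log10(learning_rate) + 3))


# Grid search: try every combination of the candidate hyperparameter values.
grid = {"n_layers": [2, 4, 6], "n_heads": [4, 8], "learning_rate": [1e-2, 1e-3, 1e-4]}
best = None
for n_layers, n_heads, lr in itertools.product(*grid.values()):
    config = TransformerConfig(n_layers=n_layers, n_heads=n_heads, d_model=64)
    loss = placeholder_validation_loss(config, lr)
    if best is None or loss < best[0]:
        best = (loss, config, lr)

print(best)
print(cross_entropy([0.1, 0.7, 0.2], target_index=1))
```

Random search would replace the exhaustive `itertools.product` loop with a fixed number of randomly drawn combinations, which is often cheaper when the grid is large.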
Sources: ChatGPT, zdnet, AI Multiple, Visual Capitalist