Overview:
- An AI-based software program principally consists of a mathematical model with trained parameters.
- The mathematical model typically has the form of an artificial neural network (ANN) comprising several layers.
Here is a very simple example:
The illustrated exemplary neural network is a feed forward network, i.e. the input is fed in on the left side and the output leaves on the right side. Each node in this example is connected to all nodes of the neighboring layers (= fully connected network). It is called “neural” because it has been inspired by the biological brain: the nodes represent “neurons” and the connecting edges represent “synapses”.
The nodes typically comprise activation functions, such as ReLU or Sigmoid, which are able to “fire” (almost like neurons):
If the input to a node is high (cf. the X-axis of the Sigmoid function), the node “fires”, i.e. it outputs a value close to (but below) 1 (cf. the Y-axis of the Sigmoid function).
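To make this a bit more concrete, here is a minimal sketch (plain Python with NumPy, purely for illustration; the function names are our own) of the two activation functions mentioned above and how they react to low vs. high inputs:

```python
import numpy as np

def sigmoid(x):
    # Squashes any input into the range (0, 1): strongly negative inputs stay
    # close to 0 (the node is "silent"), large positive inputs approach 1 (the node "fires").
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Outputs 0 for negative inputs and passes positive inputs through unchanged.
    return np.maximum(0.0, x)

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # ~[0.007, 0.5, 0.993]
print(relu(np.array([-5.0, 0.0, 5.0])))     # [0.0, 0.0, 5.0]
```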
The edges connecting the nodes merely transfer the respective output values from one node to another. However, along the edges the transferred values are multiplied by “learnable” parameters, so-called “weights”. We will see below what that means. Note that the illustrated exemplary network is small, with only a few edges, i.e. parameters. As noted, GPT-4 has 1 trillion of them… have fun drawing that model on a single page!
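As a toy illustration of what the weights on the edges do (a sketch with made-up numbers, not any real model), consider a single node receiving three values from the previous layer:

```python
import numpy as np

inputs = np.array([0.2, 0.8, 0.5])     # outputs of the three nodes in the previous layer
weights = np.array([1.5, -0.3, 0.9])   # one learnable weight per incoming edge
bias = 0.1                             # an additional learnable offset

# Each edge multiplies the transferred value by its weight; the receiving node
# sums everything up and applies its activation function (here: sigmoid).
z = np.dot(inputs, weights) + bias
output = 1.0 / (1.0 + np.exp(-z))
print(z, output)   # 0.61 and ~0.648
```

During training, it is exactly these weights (and biases) that get adjusted.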
Any layers between the input and output layers are referred to as “hidden layers”. Because of these several layers, the neural network forms a “DEEP” neural network. The network is thus an exemplary “Deep Learning” application. Today, practically all neural networks in use are “deep”, i.e. have several hidden layers. Hence, if an AI-based technology uses a neural network, you can assume that the AI concerns deep learning. Anyway, note that deep learning and machine learning are often used interchangeably by data scientists (it’s not a precise science regarding terminology, as you will see later). Moreover, the term “neural network” is often simply replaced by the term “model”, short for mathematical model (we will use this term from here on).
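Purely to show what “several hidden layers” means in practice, here is a sketch of a tiny fully connected network with two hidden layers (random weights, no training yet; the layer sizes are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# Layer sizes: 4 inputs -> 8 hidden -> 8 hidden -> 3 outputs (arbitrary choice)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 8)), np.zeros(8)
W3, b3 = rng.normal(size=(8, 3)), np.zeros(3)

def forward(x):
    # Each layer is just "multiply by weights, add bias, apply activation".
    h1 = relu(x @ W1 + b1)   # first hidden layer
    h2 = relu(h1 @ W2 + b2)  # second hidden layer
    return h2 @ W3 + b3      # output layer

print(forward(rng.normal(size=4)))
```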
By the way, there is nothing mysterious about hidden layers in a neural network (model) besides the fact that you cannot directly measure their behavior at the model’s output. For this reason, there is a whole field of research on the interpretability of machine learning, i.e. how the individual hidden layers contribute to the final output of the model. Anyway, the outputs of hidden layers can also be measured and evaluated, see the following illustration of the individual processing steps in a deep neural model processing (classifying) an image:
Accordingly, the image (i.e. its pixel data) is fed through the individual layers of the model, wherein each layer extracts features at an increasingly abstract level: starting from simple features (edges, etc.) and moving to more and more abstract features, until the output provides a completely abstract concept, e.g. “this photo is classified as showing an American tourist”.
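Coming back to the point that hidden-layer outputs can indeed be measured: here is a minimal sketch (random weights, made-up sizes, purely illustrative) that simply records each hidden layer’s intermediate output during the forward pass, which is the basic idea behind inspecting what the layers are doing:

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(0.0, x)

# A tiny network whose hidden-layer outputs we want to inspect (random weights).
layers = [rng.normal(size=(16, 8)), rng.normal(size=(8, 8)), rng.normal(size=(8, 3))]

def forward_with_activations(x):
    activations = []
    for W in layers[:-1]:
        x = relu(x @ W)
        activations.append(x)          # record each hidden layer's output
    return x @ layers[-1], activations  # final output plus the intermediate results

output, hidden = forward_with_activations(rng.normal(size=16))
for i, h in enumerate(hidden, start=1):
    print(f"hidden layer {i} output:", np.round(h, 2))
print("final output:", np.round(output, 2))
```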
Great, so we now have a rough idea of what’s happening inside an AI model, i.e. a neural network. But how can the AI model (neural network) actually learn to distinguish an American tourist from, e.g., a Persian cat? Have a look at AI Basics Part 3!
author: Christoph Hewel
email: hewel@paustian.de
(photo: Roussillon, PACA, France. Maybe the AI core looks like a giant snail?)