In its recent decision T 1669/21, the EPO has provided clear guidance on the sufficiency of disclosure (Art. 83 EPC) for inventions in the field of applied AI. The emphasis is on specifying input and output parameters and their relationships in a way that ensures a clear and plausible connection. Importantly, it’s not about providing actual datasets but about making the invention implementable by detailing input/output formats, their technical content, and their interrelations.

In AI, the quality of disclosed relationships between inputs and outputs is critical. Without meaningful real-world correlations, even sophisticated machine learning models fail to deliver, reinforcing the principle of “nonsense in, nonsense out.” The term “data science” underscores that great AI depends on solid, relevant data.

On the positive side, if input and output parameters are properly described (and claimed), the AI model itself can often be briefly specified in the description and defined broadly in claim 1, such as a “machine learning model.” This approach allows inventors to secure protection for a “blackbox” AI solution in applied AI, where the focus is on the AI’s application field rather than the model’s internal workings.

Key takeaways for developers, inventors and patent practitioners:

  1. Input-Output Data: The Magic Sauce

    • Be Specific, at least in dependent claims:

      If input is “production data”, get granular: “temperature at Point X, pressure at Time Y, using Sensor Z.” Specificity wins here. The same goes for outputs: clearly define what your model predicts.

    • Explain the Process:

      Share in the description how data is collected. What’s being measured? Where and when? What tools or sensors? Concrete details = happy examiners.

    • Correlation is Key:

      Make it clear why your inputs predict your output. Got experiments or existing field knowledge? Use them. You don’t need to prove 99% accuracy, but you do need plausible reasoning (e.g., “Output A spikes when Input B changes”).

    • Brag About Your Novelty:

      Found a new input-output relationship? Share it! A novel parameter or unique correlation can make the difference between rejection and getting your AI blackbox patented.

  2. Training Data: Teach Your Model Well

    • Explain How It Learns:

      Mention training methods in the description like “supervised learning,” the loss function, and any unique tricks.

    • Name the Source:

      Where’s the training data from? If it’s the same as inference data, say so. For supervised learning, explain how it’s labeled.

    • Cover Real-World Variations:

      Show your training data is diverse enough to handle real-world scenarios. No need for perfection—just demonstrate it’s realistic and covers relevant parameter variations. For example, if your model is to learn to distinguish between dogs, cats and humans, the described training data should cover all three.

  3. Taming the AI Model

    • Don’t Be Too Vague:

      Say “machine learning model” in claim 1. That is still a blackbox, but it can most probably dodge an over-breadth objection.

    • Show Some Concrete Concepts:

      Mention real-world stuff in the description like OpenCV, GitHub libraries, or basic architectures (e.g., CNN + classifier). This keeps it relatable for developers—and credible for examiners.

    • DIY if Needed:

      Don’t have an implementation? No sweat! Just outline a simple AI model that matches your input and output. Remember, the bar is low: it just needs to work, not win Kaggle competitions.

    • Highlight What’s New:

      Did you tweak the input layer or discover a cool new feature? Explain it! Novel AI features aren’t just good for Art. 83—they boost your inventive step argument too.

Let’s have a deeper look at the decision:

 

Summary of T 1669/21:

In decision T 1669/21, the EPO Board of Appeal confirmed the revocation of EP patent EP 2 789 960 B1 due to insufficient disclosure under Art. 83 EPC. The patent, which pertained to an applied AI invention, was initially revoked following opposition. The patentee’s appeal was rejected, as the Board found that the invention was not sufficiently disclosed to enable the skilled person to implement it.

The decision provides significant guidance on the requirements for sufficient disclosure (Art. 83 EPC) concerning inventions in applied AI (cf. the related article on Patentability of AI – Part 3: Claim directed to technical application field (“Applied AI” / Dimension 2))

 

Some background to Art. 83 EPC:

According to Art. 83 EPC, “The European patent application shall disclose the invention in a manner sufficiently clear and complete for it to be carried out by a person skilled in the art.”

Importantly, Art. 83 EPC refers to the entire patent (application) and not only to the claims or claim 1. Hence, any information given in the specification (e.g. the description of the figures) must also be considered.

As further stated by the EPC Guidelines in this context (cf. EPC GL F-III, 1):

A detailed description of at least one way of carrying out the invention must be given. Since the application is addressed to the person skilled in the art, it is neither necessary nor desirable that details of well-known ancillary features are given, but the description must disclose any feature essential for carrying out the invention in sufficient detail to render it apparent to the skilled person how to put the invention into practice. A single example may suffice, but where the claims cover a broad field, the application is not usually regarded as satisfying the requirements of Art. 83 unless the description gives a number of examples or describes alternative embodiments or variations extending over the area protected by the claims.

Hence, according to the EPC Guidelines a single embodiment which provides concrete details about the new aspects of the claimed invention can be sufficient to fulfill Art. 83 EPC. However, in case claim 1 is very broad (like in the case of EP 2 789 960 B1), the claimed scope must be covered by various embodiments over its whole extent.

These guidelines are confirmed by the discussed decision T 1669/21.

 

The invention of EP 2 789 960 B1

EP 2 789 960 B1 refers to a method for determining the condition of a fire-resistant lining of a metallurgical melting vessel. The method uses a calculation model (such as a neural network, cf. granted claim 9) for this determination.

In particular, claim 1 according to the Main Request in the Appeal defines:

(1a) A method for determining the state of the refractory lining of a vessel containing the molten metal, wherein

(1b) data of this refractory lining (12), such as materials, wall thickness, type of installation and others are detected or measured and evaluated,

 characterised in that

(1c) the following measured or established data of each vessel (10) are all collected and stored in a data structure, namely

 (1d) the initial refractory construction of the inner vessel lining (12), such as materials, material properties, wall thicknesses of blocks and/or injected materials as maintenance data;

 (1e) production data during use, such as amount of molten mass, temperature, composition of the molten mass or the slag and its thickness, tapping times, temperature profiles, treatment times and/or metallurgical parameters;

 (1f) wall thicknesses of the lining after using a vessel (10), at least at points with the greatest degree of wear;

 (1g) additional process parameters such as the manner of pouring or tapping the molten metal into or out of the vessel (10);

 (1h) that a calculation model is generated from at least some of the measured or ascertained data or parameters of the maintenance data, the production data, the wall thicknesses and the process parameters, by means of which these data or parameters are evaluated by means of calculations and subsequent analyses;

 (1i) wherein the calculation model is adapted from the measurements of the wall thicknesses of the lining (12) after a number of tappings by means of a regression analysis,

 (1j) by means of which the wear can be calculated taking into account the collected and structured data.

According to the Board, the collected data of features (1d) to (1g) constitute the input to the calculation model, while the calculated wear of feature (1j) is the model’s output:

Schematic illustration of the claimed calculation model with its input/output

A closer look at the claimed method reveals two weaknesses:

  • The “calculation model” is unspecified (only claim 9 mentions a neural network).
  • The input data are described vaguely.

Additionally, the patent description is brief (two pages in the published version), which contributes to its revocation for insufficient disclosure. However, it is the quality and detail of the disclosure, not its length, that determines compliance with Art. 83 EPC.

 

The Board’s reasoning 

The Board identifies multiple aspects of the claimed invention as not meeting the requirements of Art. 83 EPC, with particular focus on the input / output data of the calculation model.

Insufficient disclosure of the “Calculation model”:

The Board criticized claim 1 for failing to specify the type of calculation model, such as a machine learning model (cf. point 1.2.3). While claim 9 mentions a neural network, the patent provides no details about it—neither the network’s design, node arrangement, connections, nor activation functions.

Furthermore, while various neural network designs existed before the priority date, they were not tailored to the problem addressed by the invention (cf. point 1.3.4). This leaves the burden of selecting and configuring a suitable model entirely on the skilled person.

This objection seems justified, as the patent lacks any information about the model’s structure, not even generic boilerplate on a potential neural network architecture.

However, adding a brief description of a possible ML model or network design could likely have resolved this issue. Just as a thought experiment, imagine EP 2 789 960 B1 had specified a concrete neural network design in its description, such as:

“The neural network may have several input branches, each dedicated to one type of data (e.g. refractory lining data, maintenance data, and process parameters). Each branch may have one fully connected layer with several nodes and ReLU activation, followed by a concatenation layer to merge the outputs. The concatenated output may then pass through two dense layers: the first with several nodes and ReLU activation, and the second with 1 node and a sigmoid activation function to output a predicted value between 0 and 1, representing the wall thickness.”
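To see how thin the required disclosure layer really is, here is what such a paragraph could look like as code. This is purely my own illustration of the hypothetical architecture quoted above, written in PyTorch; all layer sizes and feature counts are invented placeholders, not values from the patent:

```python
import torch
import torch.nn as nn

class WearPredictor(nn.Module):
    """Illustrative multi-branch network matching the hypothetical paragraph above.
    All dimensions are invented placeholders (the patent discloses none)."""
    def __init__(self, n_lining=4, n_maintenance=6, n_process=5, hidden=16):
        super().__init__()
        # One input branch per data category: fully connected layer + ReLU
        self.lining_branch = nn.Sequential(nn.Linear(n_lining, hidden), nn.ReLU())
        self.maintenance_branch = nn.Sequential(nn.Linear(n_maintenance, hidden), nn.ReLU())
        self.process_branch = nn.Sequential(nn.Linear(n_process, hidden), nn.ReLU())
        # Concatenation followed by two dense layers; the sigmoid outputs a value
        # between 0 and 1 representing the (normalized) wall thickness
        self.head = nn.Sequential(
            nn.Linear(3 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, lining, maintenance, process):
        merged = torch.cat([self.lining_branch(lining),
                            self.maintenance_branch(maintenance),
                            self.process_branch(process)], dim=1)
        return self.head(merged)

model = WearPredictor()
predicted_thickness = model(torch.rand(1, 4), torch.rand(1, 6), torch.rand(1, 5))
```

A dozen lines of generic boilerplate like this, mirrored in prose, might already have given the Board something concrete to hold on to.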

It remains speculative whether a short paragraph would have prevented the Board’s objection. However, including such a paragraph would likely have made it more challenging for the Board to revoke the patent due to insufficient disclosure of the model / neural network.

How to avoid Art. 83 objections against the claimed AI model

  • Limit the AI model in dependent claims:

If claim 1 refers to a generalized “AI model,” add fallback positions in dependent claims or the description, such as “machine learning model,” “deep learning model,” or “neural network,” to address any concerns about over-breadth, as seen in T 1669/21.

  • Specify the AI model in the description

Provide at least one specific embodiment of the model in the description:

    • Check if the invention uses software modules from public toolboxes or repositories (e.g., OpenCV, GitHub). If so, mention these modules briefly as an example. Consider whether alternative modules or basic model architectures (e.g., CNN with classifier) could serve the same purpose. This is common, as most developers today apply existing AI models to new technical problems, rather than creating models from scratch.
    • If you’re unsure how to implement the invention or which tools to use, create your own solution! Under Art. 83 EPC, sufficient disclosure only requires that the implementation works, not that it’s perfect. Focus on defining your input data, the required output, and design a model that matches input and output. See example above.
  • Focus on new aspects of the AI model

Emphasize the novel aspects of the model (if any). For instance, if using an existing AI model but modifying its input layer for your specific data, provide a detailed description of these adaptations. Remember, it’s not just about meeting Art. 83 EPC requirements, but also addressing the inventive step. Highlight any innovative features in detail, as they could serve as fallback positions to differentiate your claim from prior art.

Insufficient disclosure of the input data

No specification of concrete input parameters

Although claim 1 outlines the inputs to the calculation model (i.e., refractory lining data, maintenance data, and process parameters), the Board argues that these are defined as categories rather than specific parameters. The Board contends that the input is overly broad, using generic terms that encompass many possible concrete measurements. Additionally, some of these parameters are time-variable, necessitating clarification of when (e.g., during which stage of the production process) the measurements should be taken. This deficiency is highlighted by the Board, for example, in point 1.6.2 of the decision:

“The patent does not contain any information or a single example of which specific measurements could be selected within the categories or are particularly meaningful for wear”.

Hence, the Board emphasizes that the specification of concrete measurements, i.e. of measured physical values, as input data to the model would be required.

This view matches the EPC’s requirement for the technical character of a computer-implemented invention. Hence, by fulfilling the Art. 83 requirements as outlined by the Board in T 1669/21, the technical character and inventive step of the claimed invention can also be strengthened, cf. e.g. EPC case law, D-I 9.2.4:

A technical character results either from the physical features of an entity or (for a method) from the use of technical means (T 641/00, OJ 2003, 352; T 1543/06).

According to T 208/84 (OJ 1987, 14), one indication of technical character is that the method has an overall technical effect, such as controlling some physical process (see also T 313/10).

Correspondingly, EPC GL G-II 3.3 states in the context of mathematical methods:

“If steps of a mathematical method are used to derive or predict the physical state of an existing real object from measurements of physical properties, as in the case of indirect measurements, those steps make a technical contribution regardless of what use is made of the results”.

Likewise, EPC GL G-II 3.3.2 states in the context of simulations:

“Computer-implemented simulations that comprise features representing an interaction with an external physical reality at the level of their input or output may provide a technical effect related to this interaction. A computer-implemented simulation that uses measurements as input may form part of an indirect measurement method that calculates or predicts the physical state of an existing real object and thus make a technical contribution regardless of what use is made of the results”.

No relationship disclosed between input parameters and predicted output

The Board further emphasizes the need to plausibly demonstrate that the disclosed input parameters (if they were provided in the patent) are relevant for predicting the wear of the refractory lining. In particular, as stated at the end of point 1.6.5, the patent must at least disclose the most relevant input parameters (implying that it is not required to disclose every conceivably relevant parameter).

In this context, a concrete embodiment should show that the input parameters enable a fundamentally successful prediction of the wear of the refractory lining. In my view, “fundamentally successful prediction” means that the prediction should be generally correct, i.e. more often right than wrong. However, the Board does not seem to require concrete evidence, such as experimental data or quantitative benchmarking of model accuracy.

The Board rejects the Patentee’s argument that the model can independently learn during training which input parameters are relevant for a successful prediction. Even though the skilled person is able to identify numerous input parameters, relevant input parameters might be missing, as the patent does not reveal the relevant ones. Consequently, no reliable correlation could be made by the trained model between the fed input parameters and actual wear of the refractory lining.

I agree with the Board’s insistence on the importance of disclosing these relationships. Essentially, the Board relies on the critical role of data science in AI. In machine learning (AI), it is essential that the input data is meaningfully correlated with the output (i.e., the parameter to be predicted) in the real world. AI models merely learn to capture and model these correlations to identify patterns in the input data. If no correlation exists between the chosen input parameters and the output, even the most sophisticated AI model cannot yield accurate results (aka “nonsense in, nonsense out”). This fundamental reliance on data is why the field is aptly named “data science” rather than “AI science”.

How to avoid Art. 83 objections against the claimed input/output data

  • Limit the input data in dependent claims:

If claim 1 refers to generalized input categories, add some concrete parameters as fallback positions in the dependent claims. It goes without saying that the model output should also be briefly specified.

  • Describe specific input parameters and how they are obtained

Add at least one concrete and workable embodiment which specifies concrete input parameters to the model.

    • Provide a detailed explanation of how the input parameters are obtained, focusing on the physical process involved. Specify the physical properties represented by these parameters, including when, where, and how they are measured, and the type of sensors or devices used. Emphasize that these physical properties underpin the technical character of the invention, contributing to its inventive step.
    • In this context, a “workable embodiment” means providing the skilled person with concrete guidance or instructions, ensuring they do not need to figure out all relevant details independently to implement the claimed invention. 
  • Describe the relationship between input and output

Explain why the disclosed input parameters are actually the most relevant ones to predict the model’s output.

    • The reasoning should be plausible, and explicit evidence, such as experiments, is generally not required. While proving the model’s accuracy through benchmarking is not necessary (but can be advantageous), it should be plausible that the model’s predictions are basically correct (i.e. more often right than wrong).
    • In this context, “plausible reasoning” means that either the correlation between the claimed inputs and outputs of the model is established in the prior art, or the patent description provides at least implicit reasoning for the correlation, such as technical field knowledge or tests conducted by the applicant.
  • Focus on new aspects of the input / output data

Highlight the novel aspects of the input/output data and their relationship. For example, if your model uses a new input parameter or uncovers a new connection between an input and the predicted output, describe these relationships in detail. This is crucial not only for meeting Art. 83 EPC but also for addressing inventive step. Ideally, the EPO will recognize the relationship as novel and inventive, allowing you to secure a patent with an AI “blackbox” approach, offering strong protection since input/output relationships are often easier to detect than internal model properties.

Strong patent protection by focusing on input/output and keeping AI model generalized

Insufficient disclosure of training data

The Board further objects that the training data, including its origin and properties, is insufficiently disclosed. As a result, it is not plausibly demonstrated by the patent disclosure that the trained calculation model can reliably predict the wear of the refractory lining.

While the Board’s reasoning appears both technically and legally questionable, it offers clear takeaways. I will focus on the key lessons learned from the Board’s reasoning before addressing its shortcomings in detail.

How to avoid Art. 83 objections against the disclosed training data / training method

  • Briefly Explain the Training Method:

Indicate how the model is trained, e.g. by supervised learning, and if applicable, specify any unique aspects, such as the preferred loss function used during training.

  • Specify the Training Data Source:

Identify the origin of the training data (as required by the Board in point 1.7.1). If the training data are obtained in the same way as the input data during inference (e.g. using the same measuring techniques / sensors), simply mention this. Indicate how the training data are labelled (for supervised learning).

  • Relate Training Data to Intended Application:

Clarify that the training data are obtained (measured) to cover representative variations that span the parameter space relevant to the intended application. For instance, depending on the application’s scope, the training data could be collected from diverse sites or under varying process conditions. Additionally, highlight that measured data naturally include variations across individual parameters due to inherent measurement differences, as no parameter is artificially fixed.

If the data origin is properly explained, many of the Board’s objections regarding the training data may no longer be relevant.

Challenging the Board’s Reasoning on Training Data Disclosure Requirements

The Board argues that training data collected during normal steel plant operations would lack sufficient variation, potentially leading to undesirable learning of random correlations (see, e.g., point 1.7.5). However, this perspective needs technical refinement. It is not strictly necessary for training data to exhibit large variations if appropriate techniques are applied. For example, methods like contrastive learning with a triplet loss can enhance model training by focusing on relative relationships, even when data variation is limited. Such approaches enable models to generalize beyond the immediate dataset. It is worth noting, however, that the patent itself does not elaborate on the training process, including the loss function.
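For readers unfamiliar with the technique: a triplet loss trains on relative comparisons (an anchor versus a similar and a dissimilar sample) rather than on absolute labels. A minimal sketch, assuming PyTorch and arbitrary embedding sizes (PyTorch also ships this as torch.nn.TripletMarginLoss):

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Pull the anchor towards the similar (positive) sample and push it away
    from the dissimilar (negative) one; learning works on *relative* relations,
    which is why limited absolute data variation is less of a problem."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

# Toy usage: batches of 8 embeddings (e.g. produced by any encoder network)
loss = triplet_loss(torch.randn(8, 32), torch.randn(8, 32), torch.randn(8, 32))
```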

Besides this, the Board’s objection that insufficient variation in the training data prevents “successful training” (cf. point 1.7.5) appears misplaced in view of Art. 83 EPC. Art. 83 EPC focuses on whether the claimed invention is implementable and functional in principle, rather than on achieving satisfying (successful) training performance.

To illustrate this distinction, consider a binary classifier designed to differentiate between acceptable and excessive wear of refractory lining. If the classifier achieves an accuracy of 50.1% after training, this may be deemed a poor, i.e. unsuccessful training outcome. However, the model still demonstrates basic functionality by statistically outperforming random guessing. Therefore, it satisfies the requirements of Art. 83 EPC, which demands feasibility and basic operation rather than a specific performance level. This example underscores that Art. 83 EPC imposes a qualitative standard: the invention must work, even if imperfectly, rather than meet a high-performance threshold.
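As a side note, whether 50.1% accuracy “statistically outperforms” random guessing depends on the size of the test set, which is easy to check with a binomial test (the figures below are purely illustrative):

```python
from scipy.stats import binomtest

# Does 50.1% accuracy beat random guessing (p = 0.5) at the 5% level?
for n in (1_000, 1_000_000):                 # illustrative test-set sizes
    correct = round(0.501 * n)               # number of correct classifications
    p = binomtest(correct, n, p=0.5, alternative="greater").pvalue
    print(f"n={n}: p-value={p:.3f}")         # ~0.49 (not significant) vs. ~0.02 (significant)
```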

Thus, the Board’s concerns about the success of training (cf. point 1.7.5) seem to overstate the requirements for compliance with Art. 83 EPC. Limited variation in the training data might lead to a model with low accuracy or robustness, but it would anyway work in compliance with Art. 83 EPC.

Title photo: Highlighting the transition from vague to precise measurement data, which is what counts for a strong patent. Left image: Predicting the state of a cheese fondue vessel employing rather vague and far-fetched measurement concepts. Right image: The JET magnetic fusion experiment, demonstrating the use of precise measurement parameters taken from its doughnut-shaped vessel (image under Creative Commons Attribution-Share Alike 3.0 Unported license). Note: Neither vessel relates to the one described in EP 2 789 960 B1.


Overview:

Three different aspects in context of AI model training are patentable:

  a. Generating training data for use in training the AI model;

  b. Training the AI model using the training data (see AI Basics Part 3: How can the AI model (neural network) learn from data? and Patentability of AI – Part 3: Claim directed to technical application field (“Applied AI” – Dimension 2)); and

  c. Using the trained AI model during inference (see Patentability of AI – Part 3: Claim directed to technical application field (“Applied AI” – Dimension 2)).

Most AI-based inventions concern applied AI (i.e. applied machine learning). These inventions rely on open-source frameworks (e.g. from OpenCV or Huggingface) and thus do not differ from the prior art within the used mathematical model / neural network.

However, the used open-source AI models are often trained or fine-tuned for a particular task using a tailored training dataset. Hence, in supervised machine learning the actual innovation often lies simply in the training dataset used.

Therefore, two questions may arise:

  1. Is a training dataset patentable or is a model trained with this dataset patentable? In brief: yes, we will show you below how.

  2. Is it required to publish the complete dataset? Happily no, but consider our recommendations below.

Step by step:

As you may already have guessed, an invention related to a particular training dataset concerns Dimension 2: Invention directed to a technical application field (“Applied AI”).

The EPC guidelines point out that the generation of a training dataset and the training method using the dataset can be technical: “Where a classification method serves a technical purpose, the steps of generating the training set and training the classifier may also contribute to the technical character of the invention if they support achieving that technical purpose” (EPC Guidelines G-II, 3.3.1)

Accordingly, it is decisive whether the purpose of the underlying dataset is technical or not. Hence, it does not matter that the data samples contained in the dataset might also be usable in non-technical applications, as long as the dataset has a technical purpose. It appears important to understand the difference between application and purpose here: we do not care what is actually done, i.e. the “application” (in any case, a dataset never does anything by itself, or have you ever seen a telephone book make a call?). It is only important what can be done using the dataset (e.g. calling all the people in the telephone book and inviting them to your best friend’s house party).

Moreover, it seems advisable to describe the algorithm for generating/obtaining the dataset in detail in the patent application. Assuming your dataset has a technical purpose, each feature of said algorithm is to be considered as supporting the achievement of this technical purpose. Therefore, each of these features is to be considered when assessing the inventive step. Accordingly, each of these features can be a valuable fallback position for claim 1. Likewise, any particular characteristics of the training set itself, e.g. the properties and format of the training labels, can contribute to achieving the technical purpose and should thus also be described in detail. For example, think of a particular form of a segmentation mask (= the label) used to annotate the image samples of the training set.

Beside the generation of the training set, the method of training the AI model can also be patentable, as well as using the trained model at inference (i.e. in its intended application). Respective claims may merely refer back to the training set generated in a method according to claim 1. However, in some cases the training method may have its own inventive features, cf. the example discussed in the blog article “Patentability of AI – Part 3: Claim directed to technical application field (“Applied AI” – Dimension 2)“.

Importantly, it is NOT required to publish the whole dataset (which often constitutes the actual value for the Applicant). As pointed out in the EPC Guidelines G-II, 3.3.1, if the technical effect is dependent on particular characteristics of the training dataset used, those characteristics that are required to reproduce the technical effect must be disclosed unless the skilled person can determine them without undue burden using common general knowledge. However, in general, there is no need to disclose the specific training dataset itself (also cf. the blog article “Patentability of AI – Part 4: Requirements to the disclosure of the AI invention (Art. 83 EPC)“).

Have a look at the following example for dimension 2:

Claim 1 of PCT/EP2019/063650:

  1. A method for training a generative adversarial model generating image samples of different brightness levels, comprising the steps of:
    a – obtaining (S01) a set of training image data comprising for each of a plurality of training images an input image sample and a target image sample representing the same image but in a different brightness level,
    b – providing (S02) an image generating model having an encoder configured to receive the input image sample and a plurality of decoder branches configured to output each a generated output sample,
    c – providing (S03) a discriminator model,
    d – training (S04) the image generating model using the set of training image data based on a predefined loss function, wherein for each training image among the decoder branches only the decoder branch whose generated output sample has a minimum loss compared to all other decoder branches is optimized based on said predefined loss function, and
    wherein the training step (S04) of d is augmented by an adversarial loss which is based on the output of the discriminator model.
  • The technical character of the claimed invention was not objected to in the ISR

    • Generated image samples can have a technical purpose: train an automated driving system to learn driving in different daylight conditions

    • However: Claim 1 is NOT limited to the technical application, i.e. the automated driving system (not objected by the EPO!)

Said invention proposes to train a GAN model to generate night-time images based on provided daytime images. The motivation is straightforward: there exist large datasets of annotated training images taken at daytime for training an automated driving system, but only a few such datasets taken at nighttime or at twilight are available. Accordingly, the invention makes it possible to train an AI model on different daylight conditions without the need to manually annotate nighttime images.

Note however that claim 1 leaves it open whether the generated images are actually used to train an automated driving system or not. In other words, claim 1 actually also covers non-technical applications! However, with a convincing and detailed description which focuses on the technical purpose(s) of the generated dataset, the EPO does not seem to require any further limitation in claim 1. In other words, the GAN model of claim 1 can also be used to generate images without a technical purpose, e.g. a full-moon version of Da Vinci’s Mona Lisa for a poetry collection.
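The most distinctive training step, optimizing only the decoder branch with the minimum loss for each training image (step d), can be illustrated in a few lines. This is a simplified sketch of my own with stand-in branches; the adversarial loss augmentation of the claim is omitted:

```python
import torch
import torch.nn.functional as F

def winner_takes_all_step(decoder_outputs, target, optimizer):
    """Among all decoder branches, optimize only the one whose generated
    output sample has the minimum loss (cf. step d of claim 1)."""
    losses = [F.l1_loss(out, target) for out in decoder_outputs]
    best = min(range(len(losses)), key=lambda i: losses[i].item())
    optimizer.zero_grad()
    losses[best].backward()  # gradients reach only the winning branch (and any shared encoder)
    optimizer.step()
    return best

# Toy usage: three "decoder branches", each reduced to one learnable vector
branches = [torch.randn(4, requires_grad=True) for _ in range(3)]
outputs = [b * 1.0 for b in branches]        # stand-ins for generated output samples
opt = torch.optim.SGD(branches, lr=0.1)
winner_takes_all_step(outputs, torch.zeros(4), opt)
```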

Short excursion: What actually is a GAN (generative adversarial network) model?

A GAN is basically trained to generate images (or other data samples). The concept was initially developed by Ian Goodfellow and his colleagues in June 2014 (have a look at his paper).

Yann LeCun, Facebook’s chief AI scientist, has called GANs “the coolest idea in deep learning in the last 20 years.”

A GAN uses two neural networks to competitively improve an image’s quality. A “generator” network creates a synthetic image based on an initial set of images, such as a collection of faces or, in our example, nighttime images. A “discriminator” network tries to detect whether the generator’s output is real or fake. In case the discriminator “wins” (i.e. correctly discriminates the fake image from a real image), the generator is penalized (i.e. optimized) and vice versa. This cycle is repeated several times, until the discriminator can no longer distinguish between the fakes generated by its opponent and the real thing. The ability to create high-quality generated imagery has increased rapidly.

The GAN (generative adversarial network) architecture

After this concurrent training of generator and discriminator, the generator can be used to produce realistic fake images. In other words, the discriminator is only used for training of the generator.
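For the technically inclined, the adversarial cycle described above boils down to a surprisingly small training loop. The following is a minimal, generic sketch (toy modules and sizes of my own choosing, not any particular patent’s implementation):

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 64))  # generator: noise -> fake sample
D = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))   # discriminator: sample -> real/fake logit
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for real in [torch.randn(8, 64) for _ in range(100)]:   # stand-in for real samples
    # 1) Train the discriminator: classify real as 1, fake as 0
    fake = G(torch.randn(8, 16))
    loss_d = bce(D(real), torch.ones(8, 1)) + bce(D(fake.detach()), torch.zeros(8, 1))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()
    # 2) Train the generator: "fool" the discriminator into predicting real
    loss_g = bce(D(fake), torch.ones(8, 1))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
# After training, only G is kept; D served merely as the sparring partner
```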

But wait: Why is this GAN approach relevant for understanding the patentability of datasets?

Because a GAN can also generate fake images which do not have a technical purpose!

For example, the organization Obvious has trained a GAN model on a set of 15,000 portraits from online art encyclopedia WikiArt, spanning the 14th to the 19th century. The trained GAN model was used in 2018 to generate the portrait painting “Edmond de Belamy” which has been sold in a Christie’s auction for $432,500.

However, in contrast to the generated nighttime images of the example above, “Edmond de Belamy” is rather an aesthetic creation in the sense of Art. 52(2)(b) EPC without any technical purpose.

Hence, the same (GAN) technology is used once for serving a technical purpose and thus constituting patentable subject-matter, and once not.

***

Summarized, it mainly depends on the way you describe the (technical) purpose of your generated training dataset. A good and detailed explanation of the possible technical purpose can be decisive for the patent grant, even though the claimed technology could also be used in non-technical applications. Hence, regarding the technical application of the invention: keep claim 1 general and the description specific!

author: Christoph Hewel
email: hewel@paustian.de

(photo: Pierrevert, PACA, France. Training data is as valuable as lavender oil. Only it doesn’t smell as good.)


Overview:

  • If the invention concerns core AI, emphasize in the patent application that image processing is a primary application, and add text processing as a secondary application.

  • In case the AI model of the invention only does text processing, check whether it can have any technical purpose, for example in the context of a user interface (cf. EPC GL G-II, 3.7.1).

According to the EPC Guidelines, the classification of digital images, videos, audio or speech signals based on low-level features (e.g. edges or pixel attributes for images) is a typical technical application of classification algorithms. However, classifying text documents according to their content is a mere linguistic task and thus does not have a technical purpose (cf. EPC Guidelines G-II, 3.3.1).

Put simply, processing image data (and likewise video and audio data) seems to be a method with a technical purpose, while processing text data is considered as not implying a technical purpose.

Where does this bias come from? And is it indeed justified or not? My take is: yes, it is justified in many cases, but actually not in all cases. There also exist text processing applications which DO have a technical purpose and image processing applications which have NO technical purpose.

Image data (and likewise audio data) typically represent signal data of real-world objects (e.g. photos) and may thus be regarded as a specific type of measured sensor data. For this reason, an AI model processing such image data can be understood as a post processing unit of a sensor system, e.g. for detecting particular features in the image (signal) data.

In this sense, the early decision T 208/84 (Vicom) came to the conclusion that a method of digitally filtering an image is technical because the image is a representation of a physical object.

However, image data may also have been generated in an artificial manner and therefore do not necessarily represent a real-world object, i.e. measured signal data, cf. e.g. the portrait painting “Edmond de Belamy” generated by a GAN:

For example, the organization Obvious has trained a GAN model on a set of 15,000 portraits from online art encyclopedia WikiArt, spanning the 14th to the 19th century. The trained GAN model was used in 2018 to generate the portrait painting “Edmond de Belamy” which has been sold in a Christie’s auction for $432,500.

Examples of artificial images with and without technical purpose

For more information see the blog article “Patentability of AI – Part 6: Can a training dataset be patentable?”

Likewise, image data can be processed by a generative AI model and not necessarily by a predictive AI model (see also AI Basics Part 4 for some more info about generative AI). Accordingly, processing image data does not necessarily represent a post-processing stage of a sensor system, e.g. by detecting particular features in the image (signal) data. Instead, the fed image data may also be processed to generate other images, as has been done to create the portrait painting “Edmond de Belamy” mentioned above.

Furthermore, the mere classification of photos into the categories “cats” and “dogs” (e.g. to create corresponding photo albums) does not necessarily imply a technical purpose either in my opinion. Thus, processing image data often has a technical purpose but is NOT limited to this.

On the other hand, text documents consist in many cases of mere linguistic/cognitive content, e.g. a set of emails from an attorney (many words, no technical relevance…). Hence, processing such text documents would not have a technical purpose. For example, in T 1177/97 the Board found the claimed subject-matter (related to machine translation) to be unpatentable, stating “Features or aspects of the method which reflect only peculiarities of the field of linguistics, however, must be ignored in assessing inventive step.”

However, processing text data can also have a technical character. For example, in T 1028/14, the Board stated that a method classifying messages as SPAM based upon factors such as the IP address from which they originated involves technical features.

Beside this, text data can also comprise technical content (and not only linguistic/cognitive content). Such technical content may for example include programming code or pseudo-code which can control a machine. Even natural language text may have the purpose of controlling a machine, i.e. have a technical purpose. Just think of prompt engineering using a generative LLM (large language model) like GPT-4, where natural language prompts can be fed to the LLM to generate code snippets. With the advent of LLMs, it even appears probable that machine commands will increasingly be based on natural language, a development which can already be seen in virtual assistants such as Amazon’s Alexa, Apple’s Siri or the recent GPT-4o. As a consequence, an AI model for processing text data can have a technical purpose, for example in the context of a user interface. Features which specify a mechanism enabling user input, such as entering text, are normally considered technical (cf. EPC GL G-II, 3.7.1).

Coming back to AI inventions for image processing, the EPO typically does not require specifying the (technical) content of images in a claimed classification method, as long as the invention concerns low-level feature extraction. However, AI models for processing text data also extract low-level features of the text data, e.g. by using an attention mechanism. In a nutshell, attention mechanisms are inspired by human visual processing and allow the model to selectively focus on the parts of the text input that are most important for making a prediction (e.g. a classification), and to ignore the less relevant parts. Attention mechanisms have become a standard in Natural Language Processing, as Transformer LLMs relying on attention have prevailed.
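To illustrate the point, the core of an attention mechanism fits into a few lines. A minimal sketch of scaled dot-product attention (the building block of Transformers), with arbitrary toy dimensions of my own choosing:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    """Each output position is a weighted mix of the values V; the weights
    express how strongly the model "attends" to each input token."""
    scores = Q @ K.transpose(-2, -1) / K.shape[-1] ** 0.5  # pairwise relevance
    weights = F.softmax(scores, dim=-1)                    # normalized attention weights
    return weights @ V, weights

# Toy usage: self-attention over 5 tokens with 8-dimensional embeddings
x = torch.randn(5, 8)
out, attn = scaled_dot_product_attention(x, x, x)  # Q = K = V = x
```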

So why should it be required to specify the (technical) content of the text data in the patent claim? Wouldn’t it be sufficient to define text data in general in the claim and describe the technical content of the text in the description?

***

In summary, in case the invention concerns core AI and is able to process different data modalities (i.e. both text and image data), it is recommended to describe an image processing method as the primary application in the patent specification and to add text processing as an (only) secondary application.

In case the AI model of the invention only does text processing, it is recommended to investigate whether any potential technical purposes of the AI model can be identified, for example in the context of a user interface (cf. EPC GL G-II, 3.7.1). If possible, it should be further pointed out in the patent specification that the text data to be processed can have a technical content which contributes to the technical purpose of the method.

author: Christoph Hewel
email: hewel@paustian.de

(photo: Barre des Ecrins, PACA, France. Maybe there is less bias than it seems. One could also walk up in sneakers.)

 


According to Art. 83 EPC, an application shall disclose the invention in a manner sufficiently clear and complete for it to be carried out by a skilled person.

  • The description must disclose any feature essential for carrying out the invention in sufficient detail to render it apparent to the skilled person how to put the invention into practice (T 2574/16)

  • Depending on the claimed AI-related invention this could require disclosure of underlying algorithms and/or corresponding training steps (T 161/18)

In view of the recent decision G 2/21 of the Enlarged Board of Appeal (evidence standard for inventive step/plausibility), the EPC Guidelines have also been updated in 2024 regarding the disclosure requirements of AI inventions:

As pointed out in the EPC Guidelines G-II, 3.3.1, the technical effect that a machine learning algorithm achieves may be readily apparent or established by explanations, mathematical proof, experimental data or the like. While mere allegations are not enough, comprehensive proof is not required either. If the technical effect is dependent on particular characteristics of the training dataset used, those characteristics that are required to reproduce the technical effect must be disclosed unless the skilled person can determine them without undue burden using common general knowledge. However, in general, there is no need to disclose the specific training dataset itself.

As further stipulated in EPC Guidelines F-III, 3, sufficiency of disclosure cannot be acknowledged if the skilled person has to carry out a research programme based on trial and error to reproduce the results of the invention, with limited chances of success (T 38/11, Reasons 2.6). This applies to the field of artificial intelligence if the mathematical methods and the training datasets are disclosed in insufficient detail to reproduce the technical effect over the whole range claimed. Such a lack of detail may result in a disclosure that is more like an invitation to a research programme (see also EPC Guidelines G-II, 3.3.1).

What does this mean in concrete terms? To me, these statements in the Guidelines appear rather vague and do not seem to provide helpful guidance. By the way, the latter seems to be true for the decision G 2/21 itself (the guidance of said decision is also rather vague).

However, it appears from the Board of Appeal decision in T 0116/18 (i.e. the referring Board of Appeal case behind G 2/21) that a technical effect may be solely supported by post-published data, i.e. it does not need to be explicitly disclosed in the application as filed. Accordingly, following T 0116/18, it can be sufficient to provide evidence for a technical effect of an AI invention only after filing, e.g. during the examination or opposition proceedings, in order for it to be considered in the inventive step assessment. In case this rule is confirmed by future EPC case law, it will make life much easier for applicants of AI inventions, regardless of whether they concern dimension 1 (see blog: Patentability of AI – Part 2: Claim directed to specific technical implementation (“Core AI” – Dimension 1)) or dimension 2 (see blog: Patentability of AI – Part 3: Claim directed to technical application field (“Applied AI” – Dimension 2)).

As a best practice, I would generally recommend using all available information from the planned publication, in case the inventors plan to publish their work elsewhere, e.g. as a scientific paper or on GitHub. Typically, the publication is made shortly after filing the application, e.g. to share the work with other scientists or to advertise the related products. Conference papers usually require a more detailed and deeper level of disclosure than what is required for patents in view of Art. 83 EPC. For example, conference papers are often only accepted if they disclose in detail the AI model used and its training procedure (including performance numbers of the tests against a baseline model). In contrast, the standards for patents are typically lower (cf. EPC Guidelines G-II, 3.3.1: “comprehensive proof is not required”, “no need to disclose the specific training dataset”).

By the way, there exist some powerful tools to convert your paper draft (typically in .tex file format) into a .docx file (i.e. the format of your patent attorney who will draft and file the application). In particular, these tools are able to convert all the equations you have assembled in your paper, a task at which MS Word mostly fails in my experience.

author: Christoph Hewel
email: hewel@paustian.de

(photo: Camargue, France. How much training data is required to make a pink aircraft fly?)


Overview:

Dimension 2: Claim directed to technical application field (“Applied AI”)

  • Claim features contribute to the technical character of the AI invention, when they serve a technical purpose:

    • By technical application, i.e. to solve a technical problem in a technical field

    • The claims need to be functionally limited to the technical purpose

  • AI (Neural network) may be defined as “black box” by its specific input and output data

 applied AI invention

Examples of patentable technical applications include:

  a) Image / speech processing,

  b) Fault detection – predictive maintenance,

  c) Medical analysis,

  d) Self-driving cars,

  zz) Further examples may be found in the list under EPC Guidelines G-II, 3.3

Note that processing image data is rather considered a method with a technical purpose than processing text data. The question remains to me whether the EPO has an (un)justified bias in this regard (see Patentability of AI – Part 5: Image data vs. text data – Does the EPO have a (un)justified bias?).

Most AI-based technologies concern applied machine learning. The SW developers or data scientists use an open-source AI model (e.g. from OpenCV, Huggingface or GitHub) without modifying e.g. the internal network architecture. Happily, applied AI can also be patentable, in case the application to a particular technical field is not yet known or suggested by the prior art.

However, the following two conditions must be fulfilled for proving that the (new) features of claim 1 provide a technical contribution:

1. Claim features must contribute to the Technical Character:

The features mentioned in the claim must serve a technical purpose, meaning they are instrumental in solving a technical problem within a specific technical field. If a clear technical benefit or advancement is demonstrated in the description, the feature should be taken into account when assessing the inventive step of the claimed method.

Interestingly, the features do not need to have a technical character on their own: They only need to provide a possible technical purpose in the invention (which should be pointed out with care in the description).

Think of an AI-based technology which changes the colors in images. Does such a feature have a technical character? Not necessarily. But now imagine the technology has the purpose of generating a dataset for training a self-driving car to drive at night. In view of this technical purpose, the claim feature is to be taken into account when assessing the inventive step of the claimed method.

applied AI invention for dimension 2

Hence, also a specific training dataset can contribute to the technical character of the invention if the dataset supports achieving that technical purpose (also cf. EPC Guidelines G-II, 3.3.1).

2. Functional Limitation to Technical Purpose:

When drafting claims for applied AI inventions, they shall be functionally limited to the technical purpose they serve. This means that the scope of the claim should be precisely defined in terms of how the AI technology addresses a particular technical problem or achieves a specific technical outcome. However, I would recommend providing a rather broad and generalized definition in originally filed claim 1 and providing further functional limitations in dependent claims or the description. In this way, the scope of protection remains broad.

Note that merely defining the nature of the data input to an AI model does not necessarily imply that the claimed method contributes to the technical character of the invention (T 2035/11, T 1029/06, T 1161/04).

However, if steps of an AI model are used to derive or predict the physical state of an existing real object from measurements of physical properties, as in the case of indirect measurements, those steps make a technical contribution regardless of what use is made of the results (cf. EPC Guidelines G-II, 3.3.3). In other words, in case the AI model is used to process/enhance any measurement data of a real object (e.g. data captured by a camera or a sensor), there is no need to specify any further technical purpose of the output of the AI model.

This assessment principally makes sense to me, since the AI model may be considered as a (post-processing) unit of a sensor system (also cf. the example of WO2020053611 below). However, in this case care should be taken to suitably formulate the output of the AI model: the claimed output should still at least implicitly relate to the initially measured object. For example, in case the input to the AI model is an image of the object, and the output is any kind of heat map of the object or a set of coefficients related to the object, the AI model makes a technical contribution to the invention. In contrast, in case the output is e.g. a completely different image (e.g. produced by a generative AI model), it would be necessary to specify a technical purpose of the generated image.

Is there also an advantage of dimension 2 over dimension 1? Yes!

The claimed AI model may be a “Black Box”: the claimed invention is thus not limited to a particular AI model, which not only broadens the scope of protection but also makes proof of infringement much easier. Likewise, the “black box” definition of the AI model makes it possible to leave open in the claim whether and how the AI model has been trained, see the example below.

AI, particularly neural networks, is often described as a “black box” due to its complex internal workings, where the relationships between input and output data are not readily interpretable by humans. This is particularly true in case the software developers have utilized an open-source model which

a) is not new (and thus not patentable on its own), and

b) has an internal function the developers do not know in detail.

Despite these challenges, the specific input-output data configuration of the AI system can still be defined and claimed in a patent application. By specifying the input and output data characteristics relevant to the technical application field, the patent claim provides clarity on how the AI technology interacts with its environment to achieve the desired technical objectives.

Have a look at the following example for dimension 2:

Claim 1 of WO2020053611:

  1. An electronic device for determining a semantic grid of an environment of a vehicle,
    the electronic device being configured to:
    receive first image data of an optical sensor, the first image data comprising a 2D image of the environment,
    perform a semantic segmentation of the 2D image and project the resulting semantic image into at least one predetermined semantic plane,
    receive an occupancy grid representing an allocentric bird eye’s view of the environment, wherein
    the control device further comprises:
    a neural network configured to determine a semantic grid by fusing the occupancy grid with the at least one predetermined semantic plane.

Applied AI invention

  • Claimed invention concerns Dimension 2: Application of the AI technology to a specific field of technology:

    • Solves a technical problem in a technical field

      • Process 2D image data to obtain a bird’s eye view

    • Functionally limited to the technical purpose

      • “… for determining a semantic grid of an environment of a vehicle”

    • The neural network is defined as a “black box”, i.e. merely by its input (occupancy grid and semantic plane) and output (semantic grid). It is not even specified whether the model is trained, even though the claim implies using a trained model at inference time

Preliminary opinion in ISR established by EPO on Dec 5, 2018

  • Examiner considers claimed invention as new and inventive due to the specific use of the neural network
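To see how little internal detail such a “black box” claim actually fixes, consider that any network matching the claimed input/output interface would fall under it. A purely illustrative fusion network (assumed shapes and class count, not the applicant’s implementation):

```python
import torch
import torch.nn as nn

n_sem = 10  # assumed number of semantic classes
# Any network mapping (occupancy grid + semantic planes) -> semantic grid fits the claim
fusion_net = nn.Sequential(
    nn.Conv2d(1 + n_sem, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, n_sem, kernel_size=1),        # per-cell semantic class scores
)

occupancy_grid = torch.rand(1, 1, 64, 64)       # allocentric bird's eye view
semantic_planes = torch.rand(1, n_sem, 64, 64)  # projected semantic segmentation
semantic_grid = fusion_net(torch.cat([occupancy_grid, semantic_planes], dim=1))
```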

Applied AI inventions (dimension 2) may relate to the training method of an AI model

As mentioned, SW-developers or data scientists often use an open-source AI model (e.g. from OpenCV, Huggingface or Github) without making modifications of e.g. the internal network architecture. Instead, they train the model in a new way, e.g. to reduce the computational costs of training or increase the accuracy of the model.

Such a training method can also concern dimension 2, even though during inference (i.e. when exploited) the trained AI model does not necessarily have any new features (e.g. in the network architecture) in view of the prior art. In particular, the invention may define how the AI model is better adapted to a particular task, i.e. to fulfill a particular technical purpose. For example, this may be achieved by using a particular dataset (see Patentability of AI – Part 6: Can a training dataset be patentable?) or by adapting the steps of the training method.

Also compare the EPC Guidelines in this respect: “Where a classification method serves a technical purpose, the steps of generating the training set and training the classifier may also contribute to the technical character of the invention if they support achieving that technical purpose” (EPC Guidelines G-II, 3.3.1). Accordingly, the requirements for the training method are in principle the same as for the (trained) AI model belonging to dimension 2: the (new) features of the training method must contribute to achieving the technical purpose.

See also the following example for a training method related to dimension 2:

Claim 1 of WO2020057753:

A method for training a semantic segmentation model performing semantic segmentation of images taken at nighttime, comprising:
a – obtaining (S01) a first set of labelled images (101) taken at daylight, the labelled images being annotated with predefined semantic segmentation labels,
b – training (S02) a semantic segmentation model using the first set of labelled images,
c – applying (S03) the semantic segmentation model of step b to a second set of unlabeled images (102) taken at twilight of a first predefined degree, where solar illumination is less than at daylight and more than at nighttime, to obtain semantic segmentations (102’) of the images of the second set,
d – labelling (S04) the second set of unlabeled images (102) with the semantic segmentations (102’) of the images of the second set to obtain a second set of labelled images (102”), and
e – training (S05) the semantic segmentation model using the first set of labelled images (101) and the second set of labelled images (102”).

The trained model allows a reliable segmentation of nighttime images (the specific technical purpose) without requiring a large dataset of labelled nighttime images, which is rarely available.
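A minimal sketch of the claimed training steps a) to e) as a loop; the model, the data and the train/predict routines are trivial hypothetical stand-ins, so only the control flow of the claim is illustrated:

    # Hypothetical sketch of claim steps a)-e); 'train' and 'predict' stand
    # in for an arbitrary semantic segmentation framework.
    def train(model, labelled_set):
        model["trained_on"] += [label for _, label in labelled_set]

    def predict(model, images):
        return [f"segmentation_of_{img}" for img in images]

    model = {"trained_on": []}  # placeholder model

    # a) obtain a first set of labelled daylight images
    daylight = [("day_img_1", "label_1"), ("day_img_2", "label_2")]

    # b) train the segmentation model on the daylight set
    train(model, daylight)

    # c) apply the model to unlabelled twilight images of a first degree
    twilight_images = ["twilight_img_1", "twilight_img_2"]
    segmentations = predict(model, twilight_images)

    # d) self-label the twilight images with the obtained segmentations
    twilight = list(zip(twilight_images, segmentations))

    # e) retrain on the daylight set plus the self-labelled twilight set
    train(model, daylight + twilight)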

author: Christoph Hewel
email: hewel@paustian.de

(photo: Pelvoux, PACA, France. Glacial pressure for a bottling station. The invention lies in the use of gravity)


Overview:

Dimension 1: Claim directed to specific technical implementation

  • AI model must be specifically adapted to this technical implementation by:

    • AI design must be motivated by technical considerations of the internal functioning of the computer

core AI invention

Simplified, dimension 1 refers to core AI inventions (attention: this term is only partially correct, as explained below).

Examples:

  • AI model adapted for parallel processing on several processors

  • AI model adapted to hardware properties of the executing computer for efficient use of computer storage capacity or network bandwidth

The AI model must be particularly adapted for a technical implementation in that its design is motivated by technical considerations of the internal functioning of the computer system or network (T 1358/09; G 1/19). This may happen if the mathematical method is designed to exploit particular technical properties of the technical system on which it is implemented to bring about a technical effect such as efficient use of computer storage capacity or network bandwidth.

Hence, such features related to a specific technical implementation, which differ from the prior art, must be taken into account when assessing the inventive step of the AI-based technology. If these features are not rendered obvious by the prior art (i.e. are sufficiently different from what is known), the AI-based technology is patentable.

However, an increased efficiency of the AI model does not contribute to the technical character of the invention if the AI model does not go beyond a generic technical implementation (cf. the EPC Guidelines G-II, 3.3). The EPC Guidelines note that exceptions exist and refer to the examples mentioned by the Guidelines in the context of database systems (cf. the EPC Guidelines G-II, 3.6.4). It appears obvious to me that the Guidelines, and probably also the present case law, are still incomplete in this context. (Hard-coded) database systems are obviously a completely different technical domain than computer-implemented mathematical models, such as (trained) AI models. Consequently, the requirements for an increased efficiency in the two domains also differ substantially.

A simplified test to find out whether an AI invention concerns core AI and might thus fall under dimension 1 could be to ask the inventor: “Did you change anything within the used AI model (e.g. in the (ANN) network architecture or in a hyperparameter) compared to the prior art?”

Attention, not all core AI inventions concern dimension 1:

  • If the core AI invention adapts the AI model to the hardware (“internal functioning of the computer”), it is patentable according to dimension 1.
  • If the core AI invention adapts the AI model to the tasks to be performed, it is NOT patentable according to dimension 1 but might still concern dimension 2 (consider the further requirements of dimension 2!)

Personally, I would be in favor of making both types of core AI inventions patentable in accordance with dimension 1 (i.e. without the additional requirements of dimension 2, such as claiming a specific technical application). Dimension 1 should also cover those cases where the AI design adapts the internal functioning of the computer to the task to be performed, i.e. to the real world. Inventions like LSTM (adding a memory to artificial neurons to compensate for the vanishing gradient problem), the attention mechanism (focusing on the parts of the input data that are most important for making a prediction) or GAN (training a generative model in competition with a discriminator model to better imitate real images, see Patentability of AI – Part 6: Can a training dataset be patentable?) do not adapt the AI design to the internal functioning of the computer but, on the contrary, adapt the computer to better understand the real world. The computer can thus better accomplish any kind of task originating from the real world. Isn’t that already technical, independently of any (technical) application of the task, i.e. without adding an explicit technical purpose of the AI invention?

This view rather seems to correspond to the former case law of the EPO, where such core AI patents were allowed (cf. e.g. EP 0 554 083 B1, granted in 1999).

See the following example of a core AI invention which however does not concern dimension 1:

Claim 1 of PCT/EP2018/064534 filed on June 1, 2018

1. A method for training a prediction system,
the prediction system comprises a hidden variable model using a hidden random variable for sequence prediction,
the method comprising the steps of:
multiple input of a sequence input (x) into the hidden variable model which outputs in response multiple distinct samples (y) conditioned by the random variable,
use of the best of the multiple samples (y) to train the model, the best sample being the closest to the ground truth.

The claimed invention addresses properties of the used AI model (let’s ignore for simplicity that claim 1 refers to training the AI model).
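A minimal sketch (assuming PyTorch) of the claimed “best of many samples” idea; the toy model, the dimensions and the distance measure are my own assumptions:

    # Hypothetical sketch: sample K predictions conditioned on a random
    # hidden variable, then train only on the sample closest to the truth.
    import torch

    class HiddenVariableModel(torch.nn.Module):
        """Toy stand-in: the prediction depends on input x and a random z."""
        def __init__(self, dim: int = 4):
            super().__init__()
            self.layer = torch.nn.Linear(2 * dim, dim)
            self.dim = dim

        def forward(self, x):
            z = torch.randn(self.dim)             # hidden random variable
            return self.layer(torch.cat([x, z]))  # sample conditioned on z

    model = HiddenVariableModel()
    x, ground_truth = torch.randn(4), torch.randn(4)

    K = 10
    samples = torch.stack([model(x) for _ in range(K)])     # K distinct samples
    distances = ((samples - ground_truth) ** 2).sum(dim=1)  # distance to truth
    loss = distances.min()    # loss of the best (closest) sample only
    loss.backward()           # gradients flow through the best sample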

Essentially, the EPO Examiner is quite favorable regarding the patentability of the invention (cf. the preliminary opinion in the ISR established by the EPO on Feb 5, 2019).

However, he objects to claim 1 for an obvious reason. Can you find it? Claim 1 is objected to as being a mere mathematical operation without any technical character (= hurdle 1 failed). Happily, this objection can easily be overcome, as also stated by the Examiner. As a takeaway, make sure the application clearly defines that the claimed method is computer-implemented (if not already defined in claim 1).

Furthermore, the Examiner requests that a technical purpose be specified in claim 1 (in the present case: prediction of a future trajectory of a detected object, e.g. a pedestrian, cf. [0045]). It can be left open to what extent such a technical purpose must be added to a claim 1 falling under dimension 1, i.e. one which already differs from the prior art by the internal functioning of the AI model. Actually, specifying the technical purpose is a requirement of dimension 2 (see the blog article “Patentability of AI – Part 3: Claim directed to technical application field (“Applied AI” – Dimension 2)“).

But wait! Isn’t this example actually falling under dimension 2 (at best)? As we have just learnt, dimension 1 requires an AI design motivated by technical considerations of the internal functioning of the computer. Just having a quick look at par. [0012] of PCT/EP2018/064534, we read that the AI model of the invention “leads to more accurate and more diverse predictions that better capture the true variations in real-world sequence data”.

The exemplary core AI invention thus actually concerns dimension 2, i.e. it is an applied AI invention. Sounds strange, but it is in line with EPO practice.

Indeed, the AI model of PCT/EP2018/064534 seems to be motivated rather by considerations of the properties of real-world data than by the internal functioning of the computer. However, doesn’t the AI model contribute to the technical character of the invention by adapting the internal functioning of the computer to make more accurate predictions of the real world (see my thoughts above)? It will be interesting to see whether the EPO’s practice on the patentability of AI inventions will evolve in such a direction in the future.

Anyway, we generally recommend explaining in detail any possible technical purpose in the description of the patent application. In the end, when filing a patent application, it cannot be foreseen which features might differ from the prior art cited in the examination proceedings. Accordingly, it is possible that the used AI design (according to dimension 1) is in fact not new, but at least relates to a new technical application field (i.e. also falls under dimension 2).

Furthermore, we recommend explaining all potential technical purposes of the core AI in the application. In this way, the risk can also be reduced that claim 1 is objected to as being unduly broad and thus unclear (see EPC Guidelines F-IV, 4.22).

Is there an advantage of dimension 1 over dimension 2? Yes!

A claimed AI model related to dimension 2 must be limited to a technical purpose. In contrast, a claimed AI model related to dimension 1 may also cover non-technical purposes. This can make an important difference if the scope of protection is also to cover other technical (and possibly even non-technical) applications.

Core AI inventions (of dimension 1) may relate to the training method of an AI model

In many cases, SW developers or data scientists use an open-source AI model (e.g. from OpenCV, Hugging Face or GitHub) without modifying e.g. the internal network architecture. Instead, they train the model in a new way, e.g. to reduce the computational costs of training or to increase the accuracy of the model.

In my opinion, such a training method can also concern dimension 1, even though at inference time (i.e. when the model is exploited) the trained AI model does not necessarily have any new features (e.g. in the network architecture) over the prior art. In particular, the invention may define how the model parameters are optimized during training, e.g. by defining the loss function.

What is a loss function?

In simple terms, the loss function is a method of evaluating how well your algorithm is modeling your dataset. It is a mathematical function of the parameters of the machine learning algorithm. Loss functions serve as the basis for model training, guiding algorithms to adjust model parameters in a direction that minimizes the loss and improves predictive accuracy.
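As a minimal (hypothetical) example, here is the classic mean squared error loss; the data values are arbitrary:

    # Mean squared error: a mathematical function of the model's predictions
    # (and thus, indirectly, of its parameters).
    def mse_loss(predictions, targets):
        return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

    print(mse_loss([2.5, 0.0], [3.0, 0.0]))  # 0.125: small error, small loss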

If it can be argued that the loss function of the invention is motivated by technical considerations of the internal functioning of the computer (e.g. it makes the training process faster by memory optimization or by parallel processing on several GPUs), the invention belongs to dimension 1. Accordingly, it is not required to limit the trained AI model to any specific technical purpose.

author: Christoph Hewel
email: hewel@paustian.de

(photo: Gorges du Verdon, PACA, France. Water flow is motivated by technical considerations of the internal functioning of the world. You may also call it “gravity”.)


In this blog post we will give some insights into the patentability of AI (artificial intelligence) inventions in Europe (according to EPO practice). Can AI be patented and under which circumstances?

But first, in order to assess these questions, let’s try to define “AI” (artificial intelligence).

Introduction to AI / Machine Learning / Deep Learning

  • Conventional Programming: “hard coding”

    • Defining explicit instructions in a programming language

  • New approach: AI / Machine Learning / Deep Learning

    • Data driven approach = training a mathematical model with training examples

    • Many tunable parameters to be optimized during training (e.g. GPT-3: 175B, GPT-4: 1 trillion)

    • Mathematical model in the form of a neural network with several layers

When we refer nowadays to “AI”, we typically mean a software program that is obtained by machine learning, more particularly deep learning.

We have all known software programs since at least the early 90s. However, they were implemented by conventional programming, which involves manually coding explicit instructions and rules to solve problems. Imagine a program for an automatic door: “If the door is closed and the sensor’s light barrier is blocked, open the door”.

In contrast, machine learning uses a data-driven approach: it trains a mathematical model having many tunable parameters (e.g. GPT-3: 175B, GPT-4: 1 trillion) with a large set of data samples. In other words, instead of manually defining and coding any rules, the mathematical model learns by itself from the data. Imagine you want to define all rules of proper German grammar. That’s practically impossible, unless you use a data-driven approach (the reason why machine translations were not usable until DeepL and other companies started to train neural networks on the translation task).
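To make the contrast concrete, here is a minimal sketch: the door example is hard-coded, while the learned variant fits a tiny model to labelled examples (the toy data and the choice of a decision tree are my own assumptions):

    # Conventional programming: the rule is hard-coded by a human.
    def automatic_door(door_closed: bool, light_barrier_blocked: bool) -> bool:
        return door_closed and light_barrier_blocked

    # Machine learning: the "rule" is learned from labelled examples instead
    # (assuming scikit-learn is available).
    from sklearn.tree import DecisionTreeClassifier

    X = [[1, 1], [1, 0], [0, 1], [0, 0]]  # [door_closed, barrier_blocked]
    y = [1, 0, 0, 0]                      # open only when closed AND blocked
    model = DecisionTreeClassifier().fit(X, y)
    print(model.predict([[1, 1]]))        # -> [1], i.e. "open the door"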

An AI-based software program thus principally consists of a mathematical model with trained parameters. The mathematical model typically has the form of an artificial neural network (ANN) comprising several layers.


Artificial neural network with an input layer, a hidden layer and an output layer

Any layers between the input and output layers are referred to as “hidden layers”. There is nothing mysterious about these hidden layers. But when we have them, the neural network becomes a “DEEP” neural network, so we can also speak of “Deep Learning”, not only “Machine Learning”.

If you wish to learn more about the concepts behind AI, have a look at AI Basics Parts 1 to 4!

Patentability of AI inventions – same requirements as for other computer-implemented inventions

So, since we now have a better understanding of AI, i.e. machine learning, we can tackle the question of its patentability.

The EPO applies the same criteria to AI inventions as to any other kind of computer-implemented invention, e.g. conventional software programs. Hence, even though AI-based software is obtained in a completely different manner than conventional software, both are handled in the same manner in terms of patentability.

Basically, there are two hurdles to take:

1st hurdle (Art. 52(2),(3) EPC)

  • Mathematical methods are excluded from patentability (Art. 52 (2)(a) EPC and EPC Guidelines G-II, 3.3.1)

  • But: no exclusion if the invention has technical character (cf. EPC Guidelines F-IV, 3.9)

–> Solution: Claims are directed to computer-implemented methods / systems

2nd hurdle (Art. 54, 56 EPC)

  • Only the features contributing to the technical character are taken into account for assessment of inventive step (Art. 56 EPC)

So what does this mean in practice for AI-based inventions? How can we fulfill this condition?

Simply start claim 1 with: “A computer-implemented method…”.

By defining that the method (i.e. its mathematical model) is computer-implemented, a technical character is added to the invention. So the AI-based method can no longer be excluded according to Art. 52(2)(a) EPC. That’s simple, but wait: there is a second hurdle to take!

Second hurdle (Art. 54, 56 EPC)

  • Only those (new) features of claim 1, which contribute to the technical character of the invention, are taken into account for assessment of inventive step (Art. 56 EPC).

Accordingly, in a first step it has to be determined which features of claim 1 actually differ from the closest prior art, or in other words: which features are new? Then, these features are assessed in view of their contribution to the technical character of the invention.

For example, if claim 1 relates to a computer-implemented method for a new healthcare app but does not contain any new features which contribute to a technical character of the method, the patent application would be refused as not inventive. The simple reason is that computer-implemented methods per se (i.e. software programs) are well known from the prior art, and any potentially new features of the healthcare app are not regarded as providing a technical contribution.

In fact, the reason for rejection of a large majority of AI-related patent applications is an alleged lack of inventive step.

It is thus important to understand which features of an AI-based technology can actually contribute to the technical character. In many cases the second hurdle of patentability can be taken if these features are described in detail and their technical contribution is sufficiently pointed out in the patent application.

So the question is: Which features contribute to the technical character?

There are two “dimensions” of how such technical contribution may be achieved:

Hence, we may prove an inventive step either due to a specific technical implementation of our AI-based technology or due to its application to a specific technical field.

In both cases we must also clear the inventive-step threshold. In other words, the technically contributing feature of claim 1 must be non-obvious (= inventive) in view of the prior art.

Dimension 1 – “Core AI” (simplified term!, see my blog post re. dimension 1): Claim directed to specific technical implementation

  • AI model must be specifically adapted to this technical implementation by:

    • AI design must be motivated by technical considerations of the internal functioning of the computer

If you want to learn more about Dimension 1, have a look here!

core AI invention

Dimension 2 (Applied AI): Claim directed to technical application field

  • Claim features contribute to the technical character of the AI invention, when they serve a technical purpose:

    • By technical application, i.e. to solve a technical problem in a technical field

    • The claims need to be functionally limited to the technical purpose

    • AI (Neural network) may be defined as “black box” by its specific input and output data

applied AI invention

If you want to learn more about Dimension 2, have a look here!

The scope of protection of AI patents and their enforceability

It appears straightforward to focus on claims defining the underlying AI model of the invention as a black box which is merely defined by its input and its output. If these input and output are detectable, e.g. in the form of a camera interface and a particular driving behavior of a vehicle, a patent infringement can be proven without any investigation of the internal functioning of the competitor’s AI. This makes patents of dimension 2 (applied AI) quite attractive.

However, patents related to dimension 1 (i.e. core AI) or to particular training methods or training datasets can also have an important value: since many developments in the AI field are published as open source, e.g. on GitHub, including by the inventors themselves, third parties may tend to use the same technology. In the end, a patented invention may become an industry standard, and it can then be difficult to develop a workaround.

Likewise, it appears that cloud service providers like AWS, Microsoft (OpenAI) or Google are about to establish de-facto standards in AI by developing and offering foundational LLMs (GPT, Claude, Gemini, etc.). Since these LLMs can outperform most customized (smaller) models, they have already become a kind of standard for many downstream applications. Furthermore, these models are at least partially open source, and users tend to publish which model they use for offering AI-based services. Consequently, it might become increasingly easier to prove a patent infringement in AI if the patent covers an essential aspect of the concerned model.

Note that this topic will need to be explored further in future blog articles. Similarly, I expect that key case law will evolve in this field.

author: Christoph Hewel
email: hewel@paustian.de

(photo: St. Tulle, PACA, France. AI patentability is a vast field and it is only just beginning to blossom)


Which tasks can be performed by AI?

Generally, probably the most prominent AI applications include image processing (i.e. computer vision) and natural language processing (“NLP”, i.e. language comprehension).

Typical tasks performed by a deep learning method include information prediction (“predictive AI”) and content generation (“generative AI”). Sub-tasks of information prediction include regression and classification. While classification predicts a categorical value (e.g. “cat” or “dog”), regression predicts a continuous value (e.g. 0.0001–0.9999). Particular embodiments include image segmentation, text recognition or medical diagnosis. Content generation includes e.g. text generation (using e.g. GPT-4) or image generation (using e.g. a generative adversarial network or DALL-E).

Simplified, predictive neural networks are trained to perform pattern recognition. That said, generative networks can also be quite strong in pattern recognition, based on which they generate new content (e.g. GPT-4 in understanding and summarizing a text). Similarly, text translation may be regarded as a type of predictive AI or of generative AI. In my personal view, since generative neural networks in particular evolve quickly, the differentiation made above will also change.

Which neural network types exist?

There exist many different types of neural networks (i.e. network architectures), which however all follow the essential principles described in the previous blog posts about AI basics. Some prominent examples (among many others) include:

  • convolutional neural networks (“CNN”; mainly used for feature extraction in images), and
  • transformer models (mainly used in Natural Language Processing for text comprehension, e.g. in BERT or in more recent LLMs (large language models) like GPT-4).

What is unsupervised learning?

Besides “supervised learning” there also exists “unsupervised learning”, where the samples do not contain labels, i.e. there is no human supervision. Unsupervised learning can be used to train a model to identify e.g. patterns or clusters within the data. For example, the trained model may then be able to assign any unknown input data sample to one of these clusters. However, most machine learning applications use supervised learning, as the models are intended to learn tasks which have formerly been carried out by humans (e.g. translations).
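A minimal clustering sketch (assuming scikit-learn; the two-cloud toy data is my own assumption):

    # Unsupervised learning: no labels are given, the algorithm finds
    # structure (here: two clusters) in the data by itself.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    # two hypothetical "clouds" of 2D data points, e.g. sensor readings
    data = np.vstack([rng.normal(0, 0.5, (50, 2)),
                      rng.normal(3, 0.5, (50, 2))])

    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
    # the trained model can assign an unseen sample to one of the clusters
    print(kmeans.predict([[2.8, 3.1]]))  # likely the cluster around (3, 3)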

How long has deep learning (neural networks) existed?

The basics of deep learning have existed for a long time. For example, the convolutional neural network (CNN), a specific neural network type, was already developed in 1989 by LeCun among others. The LSTM (long short-term memory), another specific neural network type, was invented in 1997 by Hochreiter et al. However, I would say the actual breakthrough came only in 2012, when a CNN significantly outperformed all other technologies at the ImageNet Challenge. Why so late? Because only at that time were both sufficient computational resources and large datasets publicly available.

Who is actually working in AI, and what do they call their activity?

Since machine learning is mainly about data modeling (or in other words: pattern recognition in data clouds), it usually requires less coding work and more math (and, by the way, lots of very unfancy data cleaning). For this reason, the people working in this field are mainly data scientists with a strong background in statistics. Their activity is thus called data science.

However, there are various further tasks, e.g. pre- or post-processing data, or building an operational software program. These tasks require specialized software engineers, full-stack developers and other kinds of coders.

Why is the terminology in AI so confusing?

If you have this feeling you are not alone. Indeed, the terminology used in AI often seems vague and arbitrary to me.

As explained in Part 1, “AI”, “machine learning” and “deep learning” are used pretty interchangeably. The same is true for the terms “neural network” and “model”.

You have heard about “SOTA models” and “vanilla models” and wonder whether this is about AI in the food industry? Nope, it means “State Of The Art model” (the currently most performant, “best in class” neural network) and “regular neural network without fancy stuff” (= vanilla). So, when you are asked, you had better say that you are always using the SOTA model for your specific task.

Furthermore, the field of “deep learning” (i.e. referring to “deep” neural networks) evolved in some works to “very deep neural networks” (e.g. very deep CNNs, probably pointing to the circumstance that further layers have been added). The question remains: how “deep” is “very deep”?

Recently, the term “LLM” (= large language model, not “master of laws”) has become popular in the field of NLP. But when does a language model actually become “large”? A data scientist I have been working with considers BERT (published in 2018 and having 110 million parameters) an LLM. However, I personally cannot remember having heard this term until the rise of generative AI/foundational models (= ChatGPT) in late 2022. The underlying models, such as GPT-3.5 or GPT-4, have 175B to 1 trillion parameters, i.e. 1,500 to 10,000 times more than BERT (correct me if my calc is wrong). GPT-3.5 and GPT-4 are thus comparatively “large” in view of BERT.

You probably have to be a data scientist who works all day with quantifiable data to come up with such quantitatively meaningless terms. Anyway, the trend seems to be going from “deep” towards “large”. Let’s see what comes next, maybe “high”?

How can you learn more about AI?

You want to understand how a neural network actually works, in a nice and gentle way? Then I strongly recommend that you read the book “Make Your Own Neural Network” by Tariq Rashid, 2017. The book guides you step by step through the math going on in a neural network. The nice thing: you don’t have to be a math pro; basically addition and multiplication are enough (that’s essentially all that’s happening in a neural network).

You want to learn more about the internal mechanisms of neural networks?

Then have a look at https://ayearofai.com/ . The blog posts are indeed some years old, but they give easy-to-understand insights into different topics, such as CNNs, backpropagation, or logistic regression. If you really want to understand AI (SOTA technology like GPT-4o included), you should know and understand these things.

You want to dive deeper into the theories behind machine learning and deep learning?

Then I recommend the book “Deep Learning” by Ian Goodfellow et al. from 2016, which can be accessed for free here: https://www.deeplearningbook.org/ . I suggest buying the book anyway. It has a nice cover and looks impressive on any bookshelf.

You are into NLP, want to understand Transformer models and even get some hands-on experience?

Then I recommend “Natural Language Processing with Transformers: Building Language Applications with Hugging Face” by Tunstall et al., 2022.

I must admit though that I have not found a good book about SOTA LLMs yet. Suggestions are very welcome!

author: Christoph Hewel
email: hewel@paustian.de

(photo: Hanging out in Calanque de Sormiou, Marseille, PACA, France. If AI could ever be so fancy)


How can an (AI) model learn such complex and abstract tasks, like distinguishing an American tourist from a Persian cat?

This is done by training the model using training samples (e.g. example images annotated/labelled with a class, e.g. “American tourist”). Initially, the model contains arbitrary parameters and is thus useless. However, by feeding in the training samples and measuring the model’s output against the actually correct output (i.e. according to the annotation), the parameters can be successively optimized until the mathematical model has “learnt” the task.

The cool thing: you do not need to tune the parameters manually; the model does it automatically during training (more than a nice-to-have if you use a model with several billion parameters).

The model is typically trained on a large dataset and thereby automatically learns features from the data in each layer. As a result, the model learns to generalize on the task given in the dataset. The term “generalization” means that once the model is trained, it will be able to perform the task on unseen input data. For example, the trained model can classify an unseen image (i.e. one which was not present in the training dataset). This makes deep learning very different from e.g. rule-based or knowledge-based algorithms, which cannot handle input data that does not correspond to the internal rules or knowledge. Think of a state-of-the-art machine translator (e.g. DeepL). Thanks to generalization, the translator is able to translate almost any unseen text, e.g. from German into English. Previous rule-based approaches failed dramatically at this task (mostly you were not even able to understand the output text).

Training a neural network using supervised learning

The figure above depicts so-called “supervised learning”. This means that the training samples comprise labels, i.e. the true result expected from the model (these labels may also be referred to as “ground truth”). For example, a training sample may be an image and the label may be a class, for example “cat”.

What do training labels look like?

The training labels usually have the same format as the model output. For example, if the model is a binary classifier outputting two possible classes (e.g. dog and cat), the training labels will contain these two classes. If the model outputs a translated English text (e.g. the translation of a German input text), the training labels will also contain an English translation. Likewise, the input samples of the training set usually represent the same type of information and have the same format as the input to the trained model (i.e., in our examples above, images or German texts).

How can a neural network, i.e. a mathematical model, actually handle text?

The words are typically transformed into numbers in vector form. The mathematical model can then process these vectors, i.e. make tons of calculations, and output one or more numbers or vectors. Depending on the task (e.g. in the case of a translator), these numbers are transformed back into words.
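A minimal sketch of this word-to-vector transformation (the toy vocabulary and the random embedding matrix are my own assumptions; real models learn these values during training):

    # Words are mapped to indices, and each index to a vector.
    import numpy as np

    vocab = {"the": 0, "cat": 1, "sits": 2}        # hypothetical vocabulary
    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(len(vocab), 4))  # one 4-dim vector per word

    sentence = ["the", "cat", "sits"]
    vectors = np.array([embeddings[vocab[w]] for w in sentence])
    print(vectors.shape)  # (3, 4): three words, each as a 4-dim vector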

What is an error function / loss function?

The actual training, i.e. the network optimization, is done using an error/loss function. In simple terms, the loss function is a method of evaluating how well your algorithm is modeling your dataset. The loss function serves as the basis for model training in an optimization method, i.e. to adjust the parameters in a direction that minimizes the loss and improves predictive accuracy.
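A minimal sketch of loss-driven training, assuming a one-parameter toy model y = w·x with a mean squared error loss (the data and the learning rate are my own assumptions):

    # Gradient descent: adjust w in the direction that lowers the MSE loss.
    xs = [1.0, 2.0, 3.0]
    ys = [2.0, 4.0, 6.0]  # the "true" relation is y = 2x

    w = 0.0               # arbitrary initial parameter (the untrained model)
    lr = 0.05             # learning rate

    for step in range(100):
        # gradient of the MSE loss with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad    # one optimization step

    print(round(w, 3))    # converges towards 2.0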

How large must a training dataset be?

We mentioned above that a “large dataset” is necessary for training. How large? 100, 10,000 or 1 million training samples? Well, it depends on the task. The more complex the task (its so-called “dimensionality”), the more tunable parameters are needed (i.e. a larger model is required), and at the same time the larger the dataset must be in order to sufficiently optimize the model parameters. Unfortunately, the required data volume increases exponentially as the complexity of the task increases. This is known as the curse of dimensionality. So why is it possible to successfully train deep learning models anyway? Because the data space we are actually interested in is very small compared to the mathematically possible space. Think of the following example:

Let’s take an image of white noise, i.e. a random selection of white and black pixels.

White noise

If you try all possible combinations of black and white pixels, you will, at some point in the almost infinite future, also arrive at a meaningful image, let’s say of the Empire State Building or the Mona Lisa:

meaningful white noise image

How many black and white pixel combinations need to be tried until you see a face?

However, since the total number of possible black and white pixel distributions is probably higher than the number of sand grains in the universe, the current universe will probably collapse before you arrive at this particular distribution. Hence, compared to the mathematically possible data space, our data space of interest (meaningful images) is very small. In other words, the probability distribution over images that occur in real life is highly concentrated. By the way, the same is true for other data types, like text strings or sounds. Due to this high concentration of the probability distribution of the training data, the model will also learn within this particular space, which is the only relevant one. Or in simple terms: statistically, real-life images are all very similar to each other. Only humans feel that the Empire State Building looks different from the Mona Lisa.
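A (hypothetical) back-of-the-envelope calculation makes the point: even a tiny 100×100 black-and-white image already allows an absurd number of pixel combinations:

    # Every pixel is either black or white: 2^(100*100) combinations.
    combinations = 2 ** (100 * 100)
    print(len(str(combinations)))  # about 3011 digits, i.e. roughly 10^3010
    # for comparison: the observable universe holds an estimated ~10^80 atoms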

Hence, deep learning models need less training data than would be necessary in view of the complexity of the tasks to be learnt. The sizes can still be impressive: ImageNet-21k contains 14,197,122 images divided into 21,841 classes. That requires tons of crowdsourced work.

Happily, there are techniques to significantly reduce the amount of required training data. For example, models can be “pre-trained” on a large dataset, e.g. ImageNet-21k, and then specialized on a specific task, e.g. the recognition of different cat species. This technique is called transfer learning or fine-tuning.

Is there a difference between transfer learning and fine-tuning? Yes!

Fine-tuning means that the parameters of the pretrained model are slightly adapted during fine-tuning, i.e. they are fine-tuned. In the case of a large model with, let’s say, 1 billion parameters, this can still be computationally quite expensive. In contrast, transfer learning means that the pretrained model (i.e. its parameters) is not changed, i.e. the parameters are “frozen”. Additional layers, e.g. a new classifier layer, are put on top of the pretrained model and trained on the specific task. Hence, only the parameters of the additional layers need to be trained. Besides possibly reduced computational costs, transfer learning can be advantageous in case the new task is different from the original training task of the pretrained model; the pretrained model is merely used as a feature extractor in this scenario. Besides computer vision applications (e.g. image recognition), popular applications comprise NLP (natural language processing): typically, a language model (e.g. BERT) is used to extract information from an input text (i.e. to “understand” the language), and e.g. an added classifier layer is trained to classify the text into different text types (let’s say patent literature and non-patent literature).

In suitable cases only a few hundred training samples are needed for fine-tuning or transfer learning.
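A minimal transfer learning sketch (assuming PyTorch/torchvision; the pretrained ResNet-18 backbone and the 5-class head are my own assumptions). For fine-tuning one would instead leave all parameters trainable, typically with a small learning rate:

    # Transfer learning: freeze the pretrained backbone, train only a new head.
    import torch
    import torchvision

    model = torchvision.models.resnet18(weights="IMAGENET1K_V1")  # pretrained

    for param in model.parameters():
        param.requires_grad = False  # "freeze" the pretrained parameters

    # replace the final layer with a new classifier head, e.g. 5 cat species
    model.fc = torch.nn.Linear(model.fc.in_features, 5)

    # only the new head's parameters are handed to the optimizer and trained
    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)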

So, we now know what the core of an AI model (neural network) looks like and how it can learn from data. But there is much more to know than can be covered in this brief overview. Still many AI buzzwords you would like to understand, and you generally want to dive deeper into AI? Have a look at AI Basics Part 4, where I explain some more concepts and give some book tips.

author: Christoph Hewel
email: hewel@paustian.de

(photo: Can AI models learn by experience, like kids do?)


Overview:

  • An AI-based software program principally consists of a mathematical model with trained parameters.

  • The mathematical model typically has the form of an artificial neural network (ANN) comprising several layers.

Here is a very simple example:


Artificial neural network with an input layer, a hidden layer and an output layer

The illustrated exemplary neural network is a feed-forward network, i.e. the input is fed in on the left side and the output leaves on the right side. Each node in this example is connected to all nodes of the neighboring layers (= fully connected network). It is called “neural” as it has been inspired by the biological brain, wherein the nodes represent “neurons” and the connecting edges represent “synapses”.

The nodes typically comprise activation functions, such as ReLU or Sigmoid, which are able to “fire” (almost like neurons):

activation function for a neural network

If the input to a node is high (cf. the x-axis of the sigmoid function), the node “fires”, i.e. outputs a high value approaching (but staying below) 1 (cf. the y-axis of the sigmoid function).

The edges connecting the nodes merely transfer the respective output values from one node to another. However, in the edges the transferred values are multiplied by “learnable” parameters, so-called “weights”. We will see below what that means. Note that the illustrated exemplary network is small, with only a few edges, i.e. parameters. As noted, GPT-4 has 1 trillion of them… have fun drawing that model on a single page!
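A tiny numeric sketch of one forward pass through such a network (2 inputs, 3 hidden nodes, 1 output; the random weights are my own assumption, training would adjust them):

    # One forward pass through a minimal fully connected network with
    # sigmoid activations; values flow along weighted edges between layers.
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(3, 2))   # weights on the edges input -> hidden
    W2 = rng.normal(size=(1, 3))   # weights on the edges hidden -> output

    x = np.array([0.5, -1.2])      # an arbitrary input vector
    hidden = sigmoid(W1 @ x)       # each hidden node "fires" based on input
    output = sigmoid(W2 @ hidden)  # the single output node
    print(output)                  # a value between 0 and 1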

Any layers between the input and output layers are referred to as “hidden layers”. Due to these several layers, the neural network forms a “DEEP” neural network; the network is thus an exemplary “Deep Learning” application. Today, practically all neural networks in use are “deep”, i.e. have several hidden layers. Hence, if an AI-based technology uses a neural network, you can assume that the AI concerns deep learning. Anyway, note that deep learning and machine learning are often used interchangeably by data scientists (the terminology used is not a precise science, as you will see later). Moreover, the term “neural network” is often simply replaced by the term “model”, the short version of mathematical model (we will use this term from here on).

By the way, there is nothing mysterious about hidden layers in a neural network (model) besides the fact that you cannot directly measure their behavior at the model’s output. For this reason, there is a whole field of research on the interpretability of machine learning, i.e. on how the individual hidden layers contribute to the final output of the model. Anyway, the outputs of hidden layers can also be measured and evaluated; see the following illustration of the individual processing steps in a deep neural model processing (classifying) an image:

Feature extraction in several layers of a neural network which classifies images (Deep Learning, Ian J. Goodfellow, Yoshua Bengio and Aaron Courville, MIT Press, 2016, p. 6.)

Accordingly, the image (i.e. its pixel data) is fed through the individual layers of the model, wherein each layer extracts features of an increasing abstraction level: starting from simple features (edges, etc.) and moving to more and more abstract features, until the output provides a completely abstract concept, e.g. “this photo is classified as showing an American tourist”.

Great, so we now have a rough idea of what’s happening inside an AI model, i.e. a neural network. But how can the AI model (neural network) actually learn to distinguish American tourists from e.g. a Persian cat? Have a look at AI Basics Part 3!

author: Christoph Hewel
email: hewel@paustian.de

(photo: Roussillon, PACA, France. Maybe the AI core looks like a giant snail?)

