• If the invention concerns core AI, emphasize in the patent application that image processing is the primary application, and add text processing as a secondary application.

  • If the AI model of the invention only does text processing, check whether it can serve a technical purpose, for example in the context of a user interface (cf. EPC Guidelines G-II, 3.7.1).

According to the EPC Guidelines, the classification of digital images, videos, audio or speech signals based on low-level features (e.g. edges or pixel attributes for images) is a typical technical application of classification algorithms. However, classifying text documents solely according to their textual content is a mere linguistic task and thus does not have a technical purpose (cf. EPC Guidelines G-II, 3.3.1).

Put simply, processing image data (and likewise video and audio data) is treated as a method with a technical purpose, whereas processing text data is considered not to imply a technical purpose.

Where does this bias come from? And is it justified? My take is: yes, it is justified in many cases, but not in all. There are text processing applications which DO have a technical purpose and image processing applications which have NO technical purpose.

Image data (and likewise audio data) typically represent signals captured from real-world objects (e.g. photos) and may thus be regarded as a specific type of measured sensor data. For this reason, an AI model processing such image data can be understood as a post-processing unit of a sensor system, e.g. for detecting particular features in the image (signal) data.

In this sense, the early decision T 208/84 (Vicom) came to the conclusion that a method of digitally filtering an image is technical because the image is a representation of a physical object.

However, image data may also have been generated artificially. In that case they do not necessarily represent a real-world object, i.e. they are not measured signal data. One example is the portrait painting “Edmond de Belamy”, which was generated by a GAN:

For example, the organization Obvious trained a GAN model on a set of 15,000 portraits from the online art encyclopedia WikiArt, spanning the 14th to the 19th century. In 2018, the trained GAN model was used to generate the portrait painting “Edmond de Belamy”, which was sold at a Christie’s auction for $432,500.

Examples of artificial images with and without technical purpose

For more information, see the blog article “Patentability of AI – Part 6: Can a training dataset be patentable?”

Likewise, image data can be processed by a generative AI model rather than a predictive one (see also AI Basics Part 4 for more information about generative AI). Accordingly, processing image data does not necessarily represent a post-processing stage of a sensor system, e.g. one that detects particular features in the image (signal) data. Instead, the input image data may also be processed in order to generate other images, as was done for the portrait painting “Edmond de Belamy” mentioned above, where the GAN model was trained on 15,000 existing portraits.

Furthermore, in my opinion, the mere classification of photos into the categories “cats” and “dogs” (e.g. to create corresponding photo albums) does not necessarily imply a technical purpose either. Thus, processing image data often has a technical purpose, but NOT always.

On the other hand, text documents in many cases consist of mere linguistic/cognitive content, e.g. a set of emails from an attorney (many words, no technical relevance…). Hence, processing such text documents would not have a technical purpose. For example, in T 1177/97 the Board found the claimed subject-matter (related to machine translation) to be unpatentable, stating “Features or aspects of the method which reflect only peculiarities of the field of linguistics, however, must be ignored in assessing inventive step.”

However, processing text data can also have a technical character. For example, in T 1028/14, the Board stated that a method classifying messages as spam based upon factors such as the IP address from which they originated consists of technical features.

Besides, text data can also comprise technical content (and not only linguistic/cognitive content). Such technical content may for example include programming code or pseudo-code which can control a machine. Even natural language text may have the purpose of controlling a machine, i.e. have a technical purpose. Just think of prompt engineering with a generative LLM (large language model) like GPT-4, where natural language prompts can be fed to the LLM to generate code snippets. With the advent of LLMs, it even appears probable that machine commands will increasingly be based on natural language, a development which can already be seen in virtual assistants such as Amazon’s Alexa, Apple’s Siri or the more recent GPT-4o. As a consequence, an AI model for processing text data can have a technical purpose, for example in the context of a user interface. Features which specify a mechanism enabling user input, such as entering text, are normally considered technical (cf. EPC Guidelines G-II, 3.7.1).

Coming back to AI inventions for image processing, the EPO typically does not require specifying the (technical) content of images in a claimed classification method, as long as the invention concerns low-level feature extraction. However, AI models for processing text data also extract low-level features of the text data, e.g. by using an attention mechanism. In a nutshell, attention mechanisms are inspired by human visual processing and allow the model to selectively focus on the parts of the text input that are most important for making a prediction (e.g. a classification), and to ignore the less relevant parts. Attention mechanisms have become a standard in Natural Language Processing, as Transformer LLMs relying on attention have prevailed.
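To make the idea of "selectively focusing" a bit more concrete, here is a minimal sketch of scaled dot-product attention, the variant commonly used in Transformer models. It is only an illustration under stated assumptions: the three tokens, their two-dimensional vectors and the query vector are made-up toy values, not the output of any real model, and the function names are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    # Similarity of the query to each token, scaled by sqrt(dimension).
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    # Weighted average of the value vectors: relevant tokens dominate.
    dim = len(values[0])
    output = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(dim)
    ]
    return weights, output

# Toy example (assumed values): three tokens with 2-dimensional vectors.
tokens = ["free", "meeting", "prize"]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.2]]
values = keys            # for simplicity, the values equal the keys
query = [1.0, 0.0]       # hypothetical query vector

weights, output = attention(query, keys, values)
for token, weight in zip(tokens, weights):
    print(f"{token}: {weight:.2f}")
```

In this toy setup, the tokens whose vectors are most similar to the query receive the largest weights, while the dissimilar token is largely ignored. Note that the computation operates purely on numerical vectors and is indifferent to what the text means to a human reader.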

So why should it be required to specify the (technical) content of the text data in the patent claim? Wouldn’t it be sufficient to define text data in general in the claim and describe the technical content of the text in the description?


In summary, if the invention concerns core AI and is able to process different data modalities (i.e. both text and image data), it is recommended to describe an image processing method as the primary application in the patent specification and to add text processing as an (only) secondary application.

If the AI model of the invention only does text processing, it is recommended to investigate whether any potential technical purpose of the AI model can be identified, for example in the context of a user interface (cf. EPC Guidelines G-II, 3.7.1). If possible, the patent specification should further point out that the text data to be processed can have technical content which contributes to the technical purpose of the method.

author: Christoph Hewel
email: hewel@paustian.de

(photo: Barre des Ecrins, PACA, France. Maybe there is less bias than it looks like. One could also walk up in sneakers.)

