Food identification method combining label semantic embedding and attention fusion

A recognition method and attention technology, applied in character and pattern recognition, neural learning methods, biological neural network models, etc., to achieve high versatility, reduced acquisition, and high recognition accuracy

Pending Publication Date: 2022-07-12

SOUTH CHINA UNIV OF TECH

0 Cites, 5 Cited by

Problems solved by technology

The window attention fusion module integrates the inherent self-attention weights of the Swin Transformer so that the model adaptively focuses on local key regions and learns fine-grained food features, addressing the fine-grained classification problem in food recognition.

Examples

Embodiment 1

[0071] As shown in Figure 1 and Figure 4, a food recognition method combining label semantic embedding and attention fusion includes the following steps:

[0072] The food identification process is as follows:

[0073] S1. The network model is trained on a public food dataset (such as Food101, VireoFood172, or ChineseFoodNet) or on a self-built food dataset, combining label semantic embedding and attention fusion.

[0074] S2. The original image (raw image) is input into the trained backbone network to obtain the first classification result vector C_1;

[0075] S3. The window attention fusion module is called. This module extracts the window attention weights of each layer of the backbone network and fuses them to generate an attention mask. The original image is then cropped to the extent of the largest connected region on the attention mask to obtain a local image;

[0076] S4. Input the local image to the backbone network to obtain the second classifica...
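The cropping in step S3 can be illustrated with a minimal sketch. It assumes the per-layer window attention weights have already been fused into a single 2D map; the excerpt does not give the fusion rule or the mask threshold, so thresholding at the mean is an assumption made here for illustration, and the function names `largest_region_bbox` and `attention_crop_box` are hypothetical.

```python
from collections import deque


def largest_region_bbox(mask):
    """Return (top, left, bottom, right), inclusive, of the largest
    4-connected region of truthy cells, or None if there is none."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best_size, best_box = 0, None
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over one connected region.
            queue = deque([(r, c)])
            seen[r][c] = True
            size = 0
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                size += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size > best_size:
                best_size, best_box = size, (top, left, bottom, right)
    return best_box


def attention_crop_box(attention, image_height, image_width):
    """Threshold a fused attention map at its mean (an assumed rule) and
    scale the largest region's bounding box to image pixel coordinates."""
    rows, cols = len(attention), len(attention[0])
    mean = sum(sum(row) for row in attention) / (rows * cols)
    mask = [[value > mean for value in row] for row in attention]
    box = largest_region_bbox(mask)
    if box is None:  # no cell above the mean: fall back to the full image
        return 0, 0, image_height, image_width
    top, left, bottom, right = box
    sy, sx = image_height / rows, image_width / cols
    # bottom and right are inclusive cell indices, so add 1 before scaling.
    return (int(top * sy), int(left * sx),
            int((bottom + 1) * sy), int((right + 1) * sx))
```

The returned box is in (top, left, bottom, right) pixel order with exclusive bottom and right edges, so it can be used directly as slice bounds when cropping the raw image before the second pass through the backbone.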

Embodiment 2

[0136] The BERT model is a pre-trained word vector representation model for natural language processing tasks, which can be used to extract the semantic embedding of words or sentences. The word vector representation model used in this example to extract the contextual semantic embedding of food text labels can be flexibly replaced. If the labels are in English, the bert-base-uncased model is used. If the labels are in Chinese, the Chinese pre-trained language model MacBERT can be used to extract the embeddings.
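The language-dependent choice of encoder described above might be expressed as a small helper. This is only a sketch: the function names are hypothetical, routing by the presence of CJK characters is an assumption, and `hfl/chinese-macbert-base` is one publicly available MacBERT checkpoint chosen here for illustration (the source names only the model family). Loading the checkpoint and extracting the embedding are left out.

```python
def contains_cjk(text):
    """Return True if any character is in the CJK Unified Ideographs block."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


def pick_label_encoder(label):
    """Choose a pretrained checkpoint name for a food text label.

    Checkpoint ids are assumptions for illustration; the source names
    only bert-base-uncased for English and MacBERT for Chinese.
    """
    if contains_cjk(label):
        return "hfl/chinese-macbert-base"  # assumed MacBERT checkpoint id
    return "bert-base-uncased"
```

In practice the choice would more likely be made once per dataset than once per label, since a dataset such as Food101 uses English labels throughout.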

Embodiment 3

[0138] A food recognition method combining label semantic embedding and attention fusion, including the following steps:

[0139] S1. Train the backbone network on the food dataset, combining label semantic embedding and attention fusion;

[0140] S2. The original image is input into the trained backbone network, and the first classification result vector is obtained;

[0141] S3. Use the window attention fusion module to extract the window attention weights of each layer of the backbone network and fuse them to generate an attention mask; crop the original image to the largest connected region on the attention mask to obtain a local image;

[0142] S4. Input the local image to the backbone network to obtain the second classification result vector;

[0143] S5. Add the two classification result vectors to obtain the final classification vector, take the index of the largest value in the final classification vector, and obtain the final class name of the cur...
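Step S5 amounts to an element-wise sum followed by an argmax. A minimal sketch, assuming the two classification vectors are plain lists of scores in the same class order; the function name `fuse_and_predict` and the example class names are hypothetical.

```python
def fuse_and_predict(first_scores, second_scores, class_names):
    """Add two classification vectors and return (index, class name)
    of the largest fused score."""
    if not (len(first_scores) == len(second_scores) == len(class_names)):
        raise ValueError("score vectors and class names must have equal length")
    # Element-wise sum of the raw-image and local-image results.
    fused = [a + b for a, b in zip(first_scores, second_scores)]
    # Index of the largest fused value (first one wins on ties).
    best_index = max(range(len(fused)), key=fused.__getitem__)
    return best_index, class_names[best_index]


# Example: the raw image alone favors index 1, but the local image
# shifts the fused decision to index 2.
prediction = fuse_and_predict(
    [0.2, 0.5, 0.3], [0.1, 0.3, 0.6], ["ramen", "pizza", "sushi"]
)
```

Here `prediction` is `(2, "sushi")`, because the fused scores are largest at index 2.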

Abstract

The invention provides a food identification method combining label semantic embedding and attention fusion. A window attention fusion module uses the self-attention mechanism of the Transformer to adaptively select discriminative regions, with no need for additional bounding-box annotations during training. The module fuses the window attention of the Swin Transformer, crops the attended region from the original image, and enlarges it as the input to the next network pass so that more discriminative features are learned. In addition, the names of food categories contain important textual information such as main ingredients, places of origin, and cooking methods; this information is easy to obtain and helpful for food recognition. A context-sensitive semantic center loss is therefore proposed, which uses the semantic embedding of the food label as the center of the feature space and so guides the image representation to learn fine-grained semantic information. Combining the two improves food recognition accuracy.
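The idea of using label embeddings as feature-space centers can be sketched as follows. The excerpt does not give the exact formula, so this borrows the usual center-loss form (half the squared Euclidean distance to the class center, averaged over the batch) as an assumption. It also assumes image features and label embeddings already share one dimension; the function name `semantic_center_loss` is hypothetical.

```python
def semantic_center_loss(features, label_embeddings, labels):
    """Mean of half the squared Euclidean distance between each image
    feature and the semantic embedding of its ground-truth label.

    features:         list of feature vectors, one per image
    label_embeddings: list of embedding vectors, one per class
    labels:           list of class indices, one per image
    """
    total = 0.0
    for feature, label in zip(features, labels):
        center = label_embeddings[label]  # fixed semantic center of the class
        total += 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))
    return total / len(features)
```

Unlike a conventional center loss, where the centers are learned alongside the features, the centers here come from the text encoder, so minimizing the loss pulls image features toward the semantics of their label.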

Description

Technical field

[0001] The invention relates to the technical field of image processing and analysis, in particular to a food recognition method combining label semantic embedding and attention fusion.

Background technique

[0002] Food plays an important role in people's life and health. Food identification is the basic task of food applications. Once the food category is determined, tasks such as dietary management and nutritional analysis can be performed. With the development of deep learning, food image recognition has advanced considerably.

[0003] The food recognition task is a fine-grained recognition task, that is, one of distinguishing subordinate categories. Unlike common fine-grained categories such as birds, cars, and airplanes, food has no fixed spatial structure or shared semantic patterns, so relational constraints cannot be used for feature extraction, which makes most of the existing fine-grained classification methods una...

Application Information

IPC(8): G06V10/44, G06V10/74, G06V10/764, G06V10/82, G06K9/62, G06N3/04, G06N3/08

CPC: G06N3/08, G06N3/047, G06F18/22, G06F18/241, G06F18/2415

Inventor: 康文雄, 周泳鑫, 曾明, 张雄

Owner SOUTH CHINA UNIV OF TECH
