Food identification method combining label semantic embedding and attention fusion

A food recognition method with attention fusion, applicable to character and pattern recognition, neural learning methods, and biological neural network models, that aims for high versatility, reduced acquisition, and high recognition accuracy.

Pending Publication Date: 2022-07-12
SOUTH CHINA UNIV OF TECH

AI Technical Summary

Problems solved by technology

The window attention fusion module fuses the self-attention weights inherent to the Swin Transformer so that the model adaptively focuses on local key regions and learns fine-grained food features, which addresses the fine-grained classification problem in food recognition.



Examples


Embodiment 1

[0071] As shown in Figure 1 and Figure 4, a food recognition method combining label semantic embedding and attention fusion includes the following steps:

[0072] The food identification process is as follows:

[0073] S1. The network model is trained on a public food dataset, such as Food101, VireoFood172, or ChineseFoodNet, or on a self-built food dataset, by combining label semantic embedding and attention fusion.

[0074] S2. The original image (Raw Image) is input into the trained backbone network, and the first classification result vector C1 is obtained;

[0075] S3. The window attention fusion module is called. This module extracts the window attention weights of each layer of the backbone network and fuses them to generate an attention mask. The original image is cropped to the extent of the largest connected region in the attention mask to obtain a local image;

[0076] S4. Input the local image to the backbone network to obtain the second classifica...
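The cropping in step S3 can be illustrated with a minimal sketch. It assumes the per-layer window attention weights have already been reshaped into 2-D maps; the fusion rule (a plain average), the threshold (the mask mean), and every function name here are illustrative assumptions, not the operations specified in the patent.

```python
from collections import deque

import numpy as np


def fuse_attention(maps, size):
    """Resize each 2-D map to (size, size) by nearest neighbour and average them.

    Averaging is an assumed fusion rule, chosen only for illustration.
    """
    fused = np.zeros((size, size), dtype=float)
    for m in maps:
        m = np.asarray(m, dtype=float)
        rows = np.arange(size) * m.shape[0] // size
        cols = np.arange(size) * m.shape[1] // size
        fused += m[np.ix_(rows, cols)]
    return fused / len(maps)


def largest_component_bbox(mask):
    """Return the inclusive (top, left, bottom, right) box of the largest
    4-connected True region, or None if the mask is empty."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    best_size, best_box = 0, None
    for r0 in range(h):
        for c0 in range(w):
            if not mask[r0, c0] or seen[r0, c0]:
                continue
            # Breadth-first search over one connected region.
            queue = deque([(r0, c0)])
            seen[r0, c0] = True
            cells = []
            while queue:
                r, c = queue.popleft()
                cells.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < h and 0 <= nc < w and mask[nr, nc] and not seen[nr, nc]:
                        seen[nr, nc] = True
                        queue.append((nr, nc))
            if len(cells) > best_size:
                rs, cs = zip(*cells)
                best_size = len(cells)
                best_box = (min(rs), min(cs), max(rs), max(cs))
    return best_box


def crop_local_image(image, maps, size=8):
    """Crop the image to the largest above-average attention region."""
    fused = fuse_attention(maps, size)
    mask = fused > fused.mean()  # assumed threshold
    box = largest_component_bbox(mask)
    if box is None:
        return image
    top, left, bottom, right = box
    h, w = image.shape[:2]
    # Scale mask coordinates back to pixel coordinates.
    y0, y1 = top * h // size, (bottom + 1) * h // size
    x0, x1 = left * w // size, (right + 1) * w // size
    return image[y0:y1, x0:x1]
```

The sketch stops at the crop; enlarging the local image back to the network input size before the second pass is left out.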

Embodiment 2

[0136] BERT is a pre-trained language representation model for natural language processing tasks that can be used to extract semantic embeddings of words or sentences. The model used in this example to extract the contextual semantic embedding of food text labels can be flexibly replaced. If the labels are in English, the bert-base-uncased model is used. If the labels are in Chinese, the Chinese pre-trained language model MacBERT can be used instead.
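As a rough sketch of how label embeddings might be extracted, the code below picks a model by label language and averages the token vectors of each label. It assumes the Hugging Face transformers library and PyTorch are installed; the checkpoint name hfl/chinese-macbert-base, the mean pooling, and all function names are assumptions made for illustration, since the text does not say how the embedding is pooled.

```python
import numpy as np

# "bert-base-uncased" is the English model named in the text;
# "hfl/chinese-macbert-base" is an assumed MacBERT checkpoint name.
MODEL_BY_LANGUAGE = {"en": "bert-base-uncased", "zh": "hfl/chinese-macbert-base"}


def pick_model(language):
    """Return the checkpoint name for a label language ("en" or "zh")."""
    return MODEL_BY_LANGUAGE[language]


def masked_mean(token_embeddings, attention_mask):
    """Average token vectors over real (non-padding) tokens only."""
    emb = np.asarray(token_embeddings, dtype=float)        # (batch, tokens, dim)
    mask = np.asarray(attention_mask, dtype=float)[..., None]
    return (emb * mask).sum(axis=1) / np.clip(mask.sum(axis=1), 1e-9, None)


def embed_labels(labels, language="en"):
    """Encode label strings; downloads model weights on first use."""
    import torch
    from transformers import AutoModel, AutoTokenizer

    name = pick_model(language)
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name).eval()
    batch = tokenizer(labels, padding=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state          # (batch, tokens, dim)
    return masked_mean(hidden.numpy(), batch["attention_mask"].numpy())
```

A call such as embed_labels(["fried rice", "beef noodle soup"]) would return one vector per label; the two label strings are made-up examples.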

Embodiment 3

[0138] A food recognition method combining label semantic embedding and attention fusion, including the following steps:

[0139] S1. Train the backbone network on the food dataset by combining label semantic embedding and attention fusion;

[0140] S2. The original image is input into the trained backbone network, and the first classification result vector is obtained;

[0141] S3. Use the window attention fusion module to extract the window attention weights of each layer of the backbone network and fuse them to generate an attention mask; crop the original image to the largest connected region in the attention mask to obtain a local image;

[0142] S4. Input the local image to the backbone network to obtain the second classification result vector;

[0143] S5. Add the two classification result vectors to obtain the final classification vector, take the index of the largest value in the final classification vector, and obtain the final class name of the cur...
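Step S5 amounts to an element-wise sum followed by an argmax. A minimal sketch, in which the function name and the class names in the usage note are hypothetical:

```python
import numpy as np


def final_class(c1, c2, class_names):
    """Add the two classification vectors and return (index, class name)."""
    final = np.asarray(c1, dtype=float) + np.asarray(c2, dtype=float)
    index = int(np.argmax(final))
    return index, class_names[index]
```

For example, with c1 = [0.2, 0.5, 0.3] from the original image and c2 = [0.1, 0.2, 0.7] from the local image, the summed vector is [0.3, 0.7, 1.0], so the third class is selected even though the first pass alone favoured the second.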


Abstract

The invention provides a food recognition method combining label semantic embedding and attention fusion. A window attention fusion module adaptively selects discriminative regions using the Transformer's self-attention mechanism and needs no extra bounding-box annotation for training. The module fuses the window attention of the Swin Transformer, crops the attended region from the original image, and enlarges it as input to the next network so that more discriminative features are learned. Food category names contain important textual information such as main ingredients, place of origin, and cooking method; they are easy to obtain and helpful for food recognition. A context-sensitive semantic center loss is therefore proposed, in which the semantic embedding of the food label serves as the center of the feature space and guides the image representation to learn fine-grained semantic information. Combining the two improves food recognition accuracy.
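The idea of using label embeddings as feature-space centers can be sketched with the standard center-loss form, in which each image feature is pulled toward the fixed embedding of its class label. The exact loss in the patent is not shown in this excerpt, so the formula, the function name, and the assumption that image features and label embeddings share one dimension are all illustrative.

```python
import numpy as np


def semantic_center_loss(features, labels, label_embeddings):
    """Mean squared distance between each feature and its class label embedding.

    features:         (batch, dim) image features
    labels:           (batch,) integer class indices
    label_embeddings: (num_classes, dim) fixed semantic centers
    """
    features = np.asarray(features, dtype=float)
    centers = np.asarray(label_embeddings, dtype=float)[np.asarray(labels)]
    return float(np.mean(np.sum((features - centers) ** 2, axis=1)))
```

In practice a projection layer would usually map image features into the text embedding space before the distance is taken; that step is omitted here.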

Description

Technical field

[0001] The invention relates to the technical field of image processing and analysis, in particular to a food recognition method combining label semantic embedding and attention fusion.

Background

[0002] Food plays an important role in people's life and health. Food recognition is the basic task of food applications. Once the food category is determined, tasks such as dietary management and nutritional analysis can be performed. With the development of deep learning, food image recognition has advanced considerably.

[0003] Food recognition is a fine-grained recognition task, that is, a task of distinguishing subordinate categories. Unlike common fine-grained categories such as birds, cars, and airplanes, food has no fixed spatial structure or shared semantic patterns, so relational constraints cannot be used for feature extraction, which makes most of the existing fine-grained classification methods una...


Application Information

IPC(8): G06V10/44, G06V10/74, G06V10/764, G06V10/82, G06K9/62, G06N3/04, G06N3/08
CPC: G06N3/08, G06N3/047, G06F18/22, G06F18/241, G06F18/2415
Inventor: 康文雄, 周泳鑫, 曾明, 张雄
Owner SOUTH CHINA UNIV OF TECH