Food identification method combining label semantic embedding and attention fusion
A food recognition method using attention technology, applied in the fields of character and pattern recognition, neural learning methods, and biological neural network models, to achieve high versatility, reduced acquisition, and high recognition accuracy.
Examples
Embodiment 1
[0071] As shown in Figure 1 and Figure 4, a food recognition method combining label semantic embedding and attention fusion includes the following steps:
[0072] The food identification process is as follows:
[0073] S1. Using a public food dataset, such as Food101, VireoFood172, or ChineseFoodNet, or a self-built food dataset, the network model is trained by combining label semantic embedding and attention fusion;
[0074] S2. The original image (Raw Image) is input into the trained backbone network, and the first classification result vector C_1 is obtained;
[0075] S3. Call the window attention fusion module. This module extracts the window attention weights of each layer of the backbone network and fuses them to generate an attention mask. The original image is cropped to the extent of the largest connected region on the attention mask to obtain a local image;
[0076] S4. Input the local image to the backbone network to obtain the second classifica...
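Step S3 can be illustrated with a minimal sketch. The text does not specify how the per-layer window attention weights are fused or how the mask is binarized, so the sketch below assumes element-wise averaging and a threshold at the mean value; both choices, and the helper names `fuse_attention`, `largest_component_bbox`, and `scale_bbox`, are hypothetical. Attention maps are represented as plain nested lists of equal size.

```python
from collections import deque


def fuse_attention(maps):
    """Average equally sized 2D attention maps element-wise (assumed fusion rule)."""
    n = len(maps)
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) / n for c in range(w)] for r in range(h)]


def largest_component_bbox(fused):
    """Binarize at the mean (assumed threshold) and return the bounding box
    (top, left, bottom, right), inclusive, of the largest 4-connected region."""
    h, w = len(fused), len(fused[0])
    mean = sum(map(sum, fused)) / (h * w)
    mask = [[value > mean for value in row] for row in fused]
    seen = [[False] * w for _ in range(h)]
    best = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search over one connected region of the mask.
            seen[r][c] = True
            queue, region = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > len(best):
                best = region
    if not best:
        return None  # a flat map has no region above its own mean
    ys = [p[0] for p in best]
    xs = [p[1] for p in best]
    return min(ys), min(xs), max(ys), max(xs)


def scale_bbox(bbox, grid_hw, image_hw):
    """Map an inclusive box on the attention grid to pixel bounds
    (top, left, bottom, right), with bottom and right exclusive."""
    top, left, bottom, right = bbox
    gh, gw = grid_hw
    ih, iw = image_hw
    return (top * ih // gh, left * iw // gw,
            (bottom + 1) * ih // gh, (right + 1) * iw // gw)


layer_a = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 1]]
layer_b = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
fused = fuse_attention([layer_a, layer_b])
grid_box = largest_component_bbox(fused)
pixel_box = scale_bbox(grid_box, (4, 4), (224, 224))
```

The resulting pixel bounds would then be used to crop the original image into the local image that is passed back through the backbone network in step S4.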
Embodiment 2
[0136] The BERT model is a pre-trained word vector representation model for natural language processing tasks, which can be used to extract semantic embeddings of words or sentences. The word vector representation model used in this example to extract the contextual semantic embedding of food text labels can be flexibly replaced. If the labels are in English, the bert-base-uncased model is used. If the labels are in Chinese, the Chinese natural language pre-training model MacBERT can be used instead.
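As a hedged illustration of this label embedding step, the sketch below assumes the Hugging Face `transformers` library and PyTorch are installed. The hub identifier `hfl/chinese-macbert-base` for MacBERT, the use of the first ([CLS]) token as the label embedding, and the helper names `choose_model_name` and `embed_labels` are assumptions for illustration; the text only names the models, not the pooling or the checkpoint.

```python
def choose_model_name(label):
    """Pick a pre-trained model by label language (checkpoint names are assumed)."""
    # Treat any character in the CJK Unified Ideographs block as Chinese text.
    has_chinese = any("\u4e00" <= ch <= "\u9fff" for ch in label)
    return "hfl/chinese-macbert-base" if has_chinese else "bert-base-uncased"


def embed_labels(labels, model_name):
    """Return one embedding per label, using the first-token ([CLS]) hidden
    state as the sentence representation (an assumed pooling choice)."""
    # Imported lazily so the language helper above works without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    inputs = tokenizer(labels, padding=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[:, 0]
```

Calling `embed_labels(["apple pie", "ramen"], choose_model_name("apple pie"))` would download the checkpoint on first use and return a tensor with one row per label.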
Embodiment 3
[0138] A food recognition method combining label semantic embedding and attention fusion, including the following steps:
[0139] S1. According to the food dataset, combine label semantic embedding and attention fusion to train the backbone network;
[0140] S2. The original image is input into the trained backbone network, and the first classification result vector is obtained;
[0141] S3. Use the window attention fusion module to extract the window attention weights of each layer of the backbone network and fuse them to generate an attention mask; crop the original image to the extent of the largest connected region on the attention mask to obtain a local image;
[0142] S4. Input the local image to the backbone network to obtain the second classification result vector;
[0143] S5. Add the two classification result vectors to obtain the final classification vector, take the index of the largest value in the final classification vector, and obtain the final class name of the cur...
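The fusion in step S5 can be sketched in a few lines. The function name `final_class`, the example scores, and the class names below are hypothetical; the only logic taken from the text is element-wise addition of the two classification vectors followed by selecting the index of the largest value.

```python
def final_class(c1, c2, class_names):
    """Add two classification vectors and return (index, name) of the top class."""
    if not (len(c1) == len(c2) == len(class_names)):
        raise ValueError("both vectors and the class list must have the same length")
    fused = [a + b for a, b in zip(c1, c2)]
    # Index of the largest fused score; ties resolve to the lowest index.
    index = max(range(len(fused)), key=fused.__getitem__)
    return index, class_names[index]


names = ["apple_pie", "ramen", "sushi"]    # illustrative labels only
first = [0.2, 1.5, 0.3]                    # scores from the original image
second = [0.1, 0.4, 2.0]                   # scores from the local image
result = final_class(first, second, names)
```

In this made-up example the original image alone would favor the second class, while the summed vector favors the third, which shows how the cropped local view can change the final decision.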