
Inventory page recognition method and device, computing device, and medium

A recognition method and marking technology, applied in computing, computer parts, character and pattern recognition, etc., to achieve accurate recognition, improve information recommendation results, and effectively screen

Active Publication Date: 2019-09-24
BEIJING BAIDU NETCOM SCI & TECH CO LTD

AI Technical Summary

Problems solved by technology

Existing technology lacks an effective way to accurately identify inventory pages in massive network information.



Examples


Embodiment 1

[0027] Figure 1 is a flowchart of the inventory page identification method provided in Embodiment 1 of the present invention. This embodiment applies to identifying inventory pages by mining massive network information. The method can be executed by an inventory page identification device, which can be implemented in software and/or hardware and integrated on any computing device, including but not limited to a server.

[0028] As shown in Figure 1, the inventory page identification method provided in this embodiment may include:

[0029] S110. Determine a first title vector of the training text title based on the correlation among the words in the training text title.

[0030] Before a model is trained with deep learning, the training text must be prepared in advance. The training text can be any social media text, such as news or information released on platforms such as Weibo, web pages, and of...
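Step S110 can be illustrated with a minimal sketch. The visible text does not give the exact formula for word-to-word correlation, so the code below assumes scaled dot-product self-attention without learned projections, followed by mean pooling. The function names and the toy word vectors are hypothetical and only serve the illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def first_title_vector(word_vectors):
    """Pool word vectors into one title vector, weighted by word correlation.

    Assumption: correlation between two words is the scaled dot product of
    their vectors. The patent text shown here does not confirm this choice.
    """
    dim = len(word_vectors[0])
    scale = math.sqrt(dim)
    attended = []
    for query in word_vectors:
        # Correlation of this word with every word in the title.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in word_vectors
        ]
        weights = softmax(scores)
        # Re-express the word as a correlation-weighted mix of all words.
        attended.append([
            sum(w * vec[i] for w, vec in zip(weights, word_vectors))
            for i in range(dim)
        ])
    # Mean-pool the attended word vectors into one fixed-size title vector.
    return [sum(vec[i] for vec in attended) / len(attended) for i in range(dim)]


# Toy 2-dimensional word vectors for a three-word title (made-up values).
title = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
vector = first_title_vector(title)
```

The output has the same dimension as a single word vector, so titles of different lengths map to vectors of one fixed size.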

Embodiment 2

[0043] Figure 2 is a flowchart of the inventory page identification method provided by Embodiment 2 of the present invention, which further optimizes the foregoing embodiments. As shown in Figure 2, the method may include:

[0044] S210. Segment the title of the training text into words, and use a word vector analysis model to determine the word vector, position vector and part-of-speech vector of each word obtained through segmentation.

[0045] In this embodiment, the vector representation of each word obtained by segmenting the training text title consists of three parts: a word embedding (Word Embedding), a position embedding (Position Embedding) and a part-of-speech embedding (POS Embedding). The word vector can be obtained from a pre-trained unsupervised model, such as word2vec. The unsupervised model can be obtained based on existing open source word vectors or self-built training corpus t...
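The three-part word representation can be sketched as follows. The text says only that the representation is composed of the three vectors, so concatenation is an assumption here (summation would be another option). The lookup tables, the one-hot position encoding, and all names are hypothetical stand-ins for trained models.

```python
# Hypothetical toy lookup tables; real ones would come from trained models.
WORD_EMB = {"apple": [0.2, 0.7], "releases": [0.5, 0.1], "iphone": [0.3, 0.9]}
POS_EMB = {"NOUN": [1.0, 0.0], "VERB": [0.0, 1.0]}
UNKNOWN_WORD = [0.0, 0.0]


def position_embedding(index, max_len=8):
    """One-hot position vector (an assumption; a learned table also works)."""
    return [1.0 if i == index else 0.0 for i in range(max_len)]


def represent_title(tagged_words):
    """Build one vector per word: word part + position part + POS part."""
    rows = []
    for index, (word, pos) in enumerate(tagged_words):
        word_part = WORD_EMB.get(word, UNKNOWN_WORD)
        # Concatenation is assumed; the source only says "composed of".
        rows.append(word_part + position_embedding(index) + POS_EMB[pos])
    return rows


# A segmented and POS-tagged title (made-up example).
rows = represent_title([("apple", "NOUN"), ("releases", "VERB"), ("iphone", "NOUN")])
```

Each row has 2 + 8 + 2 = 12 entries in this toy setup, and the position part lets the same word receive a different representation at different places in the title.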

Embodiment 3

[0057] Figure 4 is a schematic structural diagram of the inventory page identification device provided in Embodiment 3 of the present invention. This embodiment applies to identifying inventory pages by mining massive network information. The device can be implemented in software and/or hardware and integrated on any computing device, including but not limited to a server.

[0058] As shown in Figure 4, the inventory page identification device provided in this embodiment may include a first vector determination module 310, a second vector determination module 320 and a model training module 330, wherein:

[0059] The first vector determination module 310 is used to determine the first title vector of the training text title based on the correlation between the words in the training text title;

[0060] The second vector determination module 320 is used to determine the second title vector of the training text title by using th...


Abstract

The embodiments of the invention disclose an inventory page recognition method and device, a computing device, and a medium. The method comprises: determining a first title vector of a training text title based on the correlation among the words in the training text title; determining a second title vector of the training text title using a preset language model, where the preset language model assigns different word vectors to the same word at different positions in the training text title; and training an inventory page recognition model with the first title vector and the second title vector as input and the inventory page labeling result of the training text title as output, so that the model can determine whether a target text title is an inventory page title. The embodiments make it possible to accurately identify information inventory pages that contain at least two events or topics in massive network information, and to improve the information recommendation results of downstream services.
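The training step in the abstract can be sketched as below. The abstract fixes only the inputs (the two title vectors) and the output (the inventory page label). The logistic-regression classifier, the feature concatenation, the function names, and the toy data are all assumptions made for illustration, not the model described by the patent.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_recognizer(samples, epochs=500, lr=0.5):
    """Fit a logistic-regression stand-in for the recognition model.

    samples: list of (first_vec, second_vec, label), label 1 = inventory page.
    """
    dim = len(samples[0][0]) + len(samples[0][1])
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        for first_vec, second_vec, label in samples:
            # Assumption: the two title vectors are concatenated as input.
            features = first_vec + second_vec
            pred = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            error = pred - label
            # Stochastic gradient step on the log loss.
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


def is_inventory_title(model, first_vec, second_vec):
    """Classify a target title from its two title vectors."""
    weights, bias = model
    features = first_vec + second_vec
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias) >= 0.5


# Made-up, linearly separable toy vectors; not real title data.
samples = [
    ([1.0, 0.0], [0.9, 0.1], 1),
    ([0.9, 0.2], [1.0, 0.0], 1),
    ([0.0, 1.0], [0.1, 0.9], 0),
    ([0.2, 0.9], [0.0, 1.0], 0),
]
model = train_recognizer(samples)
```

Calling `is_inventory_title(model, first_vec, second_vec)` on the vectors of a new title then returns the predicted label.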

Description

Technical field

[0001] The embodiments of the present invention relate to the technical field of Internet information processing, and in particular to a method, device, computing device and medium for identifying an inventory page.

Background

[0002] With the rapid popularization of the Internet, network information is growing explosively, so netizens must spend a lot of energy filtering the information they need from the massive amount available.

[0003] One type of network information is obtained through secondary processing: content on different topics, historical or current, is processed and screened, then combined and presented in one piece of information. ... asking for compensation; Apple is preparing to release the foldable iPhone; WeChat responded by mistakenly flipping | Lei Feng Morning Post", the information contains 3 events or topics with low correlation, and each event c...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/951, G06F16/33, G06K9/62
CPC: G06F16/951, G06F16/3347, G06F18/214
Inventor: 潘禄, 陈玉光, 彭卫华, 罗雨, 刘远圳, 韩翠云, 施茜
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD