Method and device for extracting keywords in page

A page keyword extraction technology, applied in the field of computer networks. It addresses the inability of existing keyword extraction to work across languages, its low generality, and its low processing efficiency, and it improves the generality of keyword extraction.

Active Publication Date: 2015-06-03
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD

AI Technical Summary

Problems solved by technology

[0005] General-purpose keyword extraction in the prior art does not work for internationalized (multi-language) content, which leaves it with low versatility, insufficient intelligence, and low processing efficiency. To solve this problem, the embodiments of the present invention provide a method and device for extracting keywords in a page.

Embodiment Construction

[0024] Various aspects of the present invention are described in detail below with reference to the drawings and specific embodiments. Well-known modules, units, and their mutual connections, links, communications, or operations are not shown or described in detail. The described features, architectures, or functions may be combined in any manner in one or more implementations. Those skilled in the art should understand that the implementations described below are for illustration only and do not limit the protection scope of the present invention. The modules, units, or processing methods in the embodiments described herein and shown in the accompanying drawings can likewise be combined and designed in various configurations.

[0025] Figure 1 is a flowchart of a method for extracting keywords in a page according to an embodiment of the present invention. Referring to Figure 1, the method includes:

[0026] S...

Abstract

The invention discloses a method and a device for extracting keywords in a page. The method comprises the following steps: performing character string analysis on the title content of the page to obtain candidate words, and constructing a candidate word lookup table from the obtained candidate words; performing page analysis on the page to obtain character combinations, and constructing a short string set from the obtained character combinations; performing character string analysis on the short string set to obtain character strings, and constructing an original weight pool from the obtained character strings; performing weighted voting on the candidate words in the candidate word lookup table, taking the character strings in the original weight pool in order of the number of words each contains, and increasing the weight of a candidate word whenever a character string matches it; sorting the candidate words by weight from largest to smallest, and extracting a preset number of the top-ranked candidate words as keywords. The method and the device make keyword extraction more general, and make the extraction process more intelligent and efficient.
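The five steps in the abstract can be sketched in Python as follows. This is only an illustrative reading, not the patented implementation: the abstract does not say how "character string analysis" or "page analysis" work, so the sketch assumes contiguous word n-grams over whitespace-delimited text, takes the short string set as a ready-made list of page fragments, visits longer strings first, and adds the string's word count times its frequency as the vote. The names `extract_keywords`, `tokenize`, `ngrams`, `top_k`, and `max_n` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens (an assumed tokenizer)."""
    return re.findall(r"\w+", text.lower())


def ngrams(words, max_n):
    """Yield every contiguous run of 1..max_n words as a space-joined string."""
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def extract_keywords(title, page_fragments, top_k=3, max_n=3):
    # Step 1: candidate word lookup table built from the title, all weights 0.
    candidates = {c: 0 for c in ngrams(tokenize(title), max_n)}

    # Step 2: the short string set. Here the caller supplies it directly
    # (assumption: page analysis has already produced these fragments).
    # Step 3: original weight pool, counting how often each string occurs.
    pool = Counter()
    for fragment in page_fragments:
        pool.update(ngrams(tokenize(fragment), max_n))

    # Step 4: weighted voting, visiting strings by word count (longest first).
    # With this simple additive vote the visiting order does not change the
    # totals; it is kept only to mirror the order described in the abstract.
    for s in sorted(pool, key=lambda s: len(s.split()), reverse=True):
        if s in candidates:
            candidates[s] += pool[s] * len(s.split())  # assumed vote size

    # Step 5: sort by weight, largest first, and keep the top_k voted candidates.
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, weight in ranked[:top_k] if weight > 0]


if __name__ == "__main__":
    fragments = [
        "Book cheap flights today",
        "Flights to Paris from London",
        "cheap flights to Paris deals",
    ]
    print(extract_keywords("Cheap flights to Paris", fragments))
```

Because the sketch needs no stop-word list or part-of-speech tagger, it reflects the dictionary-free spirit of the method, although whitespace tokenization would itself have to be replaced for languages that do not delimit words with spaces.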

Description

Technical field

[0001] The invention relates to the field of computer networks, and more specifically, to a method and device for extracting keywords in a page.

Background technique

[0002] With the development of the network, people can handle more and more things through the network. However, users need to use keywords as search content when facing various information queries. If the keywords in a page can be extracted and applied systematically, queries become far more effective for much less effort.

[0003] Keyword analysis and extraction in the existing technology relies on prior knowledge, such as word segmentation, part-of-speech tagging, and stop-word dictionaries. This natural language processing logic can only be carried out with an accumulated thesaurus. Commonly used approaches are statistical methods based on TF-IDF (term frequency-inverse document frequency, a commonly used weighting technique for information retrieval and information mining), some based ...
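For context on the TF-IDF weighting mentioned in the background, here is a minimal Python sketch of the textbook formula (term frequency multiplied by the logarithm of the inverse document frequency). It illustrates the prior-art technique only, not the invention; the function name `tf_idf` and the unsmoothed variant of the formula are assumptions chosen for brevity.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: score} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores


if __name__ == "__main__":
    docs = [["page", "keyword"], ["page", "title"]]
    for doc_scores in tf_idf(docs):
        print(doc_scores)
```

A term that appears in every document scores zero, which is why TF-IDF depends on a document collection and, in practice, on tokenization that is already correct for the language at hand.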

Application Information

IPC(8): G06F17/27
Inventor: 范斌
Owner: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD