
Method and device for mining bad examples of search engine

The technology relates to search engines and confidence determination and is applied in special data processing applications, instruments, and electrical digital data processing. It addresses problems such as low efficiency and the failure to detect badcases in a timely and accurate manner, and it improves the efficiency and accuracy of badcase detection.

Active Publication Date: 2018-07-10
BAIDU ONLINE NETWORK TECH (BEIJIBG) CO LTD

AI Technical Summary

Problems solved by technology

The existing approach to finding badcases is inefficient: it uncovers only the small number of badcases that happen to be encountered and cannot find badcases in a timely and accurate manner, so its results are difficult to use as a decision-making reference for search engine improvement.

Examples

Embodiment 1

[0051] Figure 1 is a flow chart of the search engine badcase mining method provided by Embodiment 1 of the present invention. As shown in Figure 1, the method may include the following steps:

[0052] Step 101: extract a certain number of sessions from the session log as samples, and extract feature vectors describing search quality from each session of the samples.

[0053] A session is the period during which a user interacts with an interactive system. It usually refers to the time elapsed from entering the system to exiting it, with some room for adjustment. In the embodiment of the present invention, a session in the session log contains the behavior information of a user using the search engine.

[0054] The session logs of search engines are massive and may reach the terabyte level (1 TB = 1024 GB) per day, so in this step only a certain number of sessions need to be extracted as samples, for example, 600 sessi...
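Step 101 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the visible text does not say how sessions are sampled or which features are used, so reservoir sampling, the `actions` field, and the action names `query`, `click`, and `next_page` are all hypothetical choices made for the example.

```python
import random
from typing import Dict, Iterable, List


def sample_sessions(sessions: Iterable[Dict], k: int, seed: int = 0) -> List[Dict]:
    """Keep k sessions chosen uniformly at random from a stream (reservoir sampling).

    Assumption: reservoir sampling is used here only because it avoids loading
    a very large log into memory; the source does not name a sampling method.
    """
    rng = random.Random(seed)
    reservoir: List[Dict] = []
    for i, session in enumerate(sessions):
        if i < k:
            reservoir.append(session)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = session
    return reservoir


def extract_features(session: Dict) -> List[float]:
    """Build a feature vector from a session's actions.

    Assumption: the three counts below are hypothetical stand-ins for the
    "features describing search quality" mentioned in the text.
    """
    actions = session["actions"]
    return [
        float(actions.count("query")),      # number of queries issued
        float(actions.count("click")),      # number of result clicks
        float(actions.count("next_page")),  # number of page turns
    ]
```

Sampling from an iterator rather than a list reflects the point made above about log size: the full log never needs to be held in memory at once.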

Embodiment 2

[0086] Figure 2 shows the search engine badcase mining device provided by Embodiment 2 of the present invention, which includes a preprocessing unit 200 and a mining unit 210. As shown in Figure 2, the preprocessing unit 200 includes a sample feature extraction module 201, a sample clustering module 202, and a confidence determination module 203, and the mining unit 210 includes a query feature extraction module 211, a query category determination module 212, and a badcase discrimination module 213.

[0087] The sample feature extraction module 201 extracts a certain number of sessions from the session logs as samples, and extracts feature vectors describing search quality from each session of the samples.

[0088] The sample clustering module 202 uses the feature vectors of each session to cluster the samples.

[0089] The confidence determination module 203 determines the confidence of each category obtained by clustering by the sample clust...
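The role of the sample clustering module 202 can be illustrated with a small sketch. The visible text does not name a clustering algorithm, so k-means is an assumption here, as are the function names and the deterministic initialization from the first `k` points.

```python
from typing import List, Tuple

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> Tuple[List[Vector], List[int]]:
    """Cluster session feature vectors into k categories.

    Assumption: k-means with the first k points as initial centroids; the
    source only says the samples are clustered using their feature vectors.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = mean(members)
    return centroids, labels
```

Each returned centroid can then serve as the feature vector of a category, which is what the query category determination module 212 would compare against.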

Abstract

The invention provides a method and a device for mining badcases of a search engine. The method comprises a preprocessing procedure and a mining procedure. In the preprocessing procedure, a certain number of sessions are extracted from a session log as samples, and a feature vector describing search quality is extracted from each session of the samples; the samples are clustered using the feature vector of each session; and a confidence is determined for each category obtained by the clustering, where the confidence represents how low the search quality is. In the mining procedure, the action sequence belonging to the same query is determined in the session log to be mined, and a feature vector describing search quality is extracted from that action sequence; the category of the query is determined by computing the distance between the feature vector of the query and the feature vector of each category; and if the confidence of the query's category exceeds a preset high threshold, the search engine is determined to have a badcase for the query. The method and device enable automatic mining of search engine badcases, so that badcases can be found in a timely and accurate manner.
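The mining procedure described in the abstract can be sketched in a few lines. This is an illustrative sketch only: Euclidean distance, the function names, and the example confidence values are assumptions, since the abstract says only that a distance is computed and that the confidence is compared with a preset high threshold.

```python
import math
from typing import List, Sequence


def nearest_category(query_vec: Sequence[float],
                     category_vecs: List[Sequence[float]]) -> int:
    """Return the index of the category whose feature vector is closest.

    Assumption: Euclidean distance; the source does not specify the metric.
    """
    distances = [math.dist(query_vec, c) for c in category_vecs]
    return distances.index(min(distances))


def is_badcase(query_vec: Sequence[float],
               category_vecs: List[Sequence[float]],
               confidences: Sequence[float],
               high_threshold: float) -> bool:
    """Flag a query as a badcase when its category's confidence exceeds the threshold.

    confidences[i] is the confidence of category i from the preprocessing
    procedure; a higher value means lower search quality.
    """
    category = nearest_category(query_vec, category_vecs)
    return confidences[category] > high_threshold
```

For example, with two hypothetical categories whose confidences are 0.1 and 0.9 and a threshold of 0.8, a query whose feature vector falls nearest the second category would be flagged, while one nearest the first would not.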

Description

【Technical field】

[0001] The invention relates to the technical field of computer applications, and in particular to a method and device for mining badcases of search engines.

【Background technique】

[0002] With the continuous development of computer technology, the network has become the main channel through which people obtain information. A search engine analyzes a user's query to understand the user's needs and intent, and searches the entire network for the webpages that best match the query. However, because the number of webpages on the Internet is vast, their content varies greatly, and users express their needs in diverse ways, the biggest difficulty for a search engine is returning the most relevant search results regardless of the user's query.

[0003] The interior of the search engine is composed of many complex coupled correlation strategies, the number and complexity of which, as well as the mutu...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
CPC: G06F16/9566
Inventor: 张鑫, 阮星华, 李卓
Owner: BAIDU ONLINE NETWORK TECH (BEIJIBG) CO LTD