Top-K query method and system for incomplete data

A Top-K query technology for incomplete data, applied in the field of data query. It addresses the lack of generality and the limited scope of application of existing methods, and narrows the filling range, ensuring accurate results with high query efficiency.

Pending Publication Date: 2021-10-29
HUAZHONG UNIV OF SCI & TECH +1

AI Technical Summary

Problems solved by technology

[0005] Existing Top-K query methods for incomplete data lack generality, which limits their scope of application. To address these defects, the present invention provides a Top-K query method and system for incomplete data whose purpose is to guarantee correct Top-K query results on incomplete data while effectively improving query efficiency.



Examples


Embodiment

[0065] To quickly retrieve data that may interest users, for example commodities, from massive data, the following embodiments report the k best results in large-scale data according to user preferences, quickly and efficiently.

[0066] The incomplete product rating data set in Table 1 contains the ratings of 5 users on 12 product objects. Some ratings of some products are missing, and missing values are denoted by "-". Assume a result set size of k=2 and a given weight vector. The lowest possible score (LB) and highest possible score (UB) of each object are calculated, as shown in the last two columns of Table 1. The data set is sorted in descending order of LB, and the first k objects are ID=11 and ID=12. The LB of ID=12 (i.e., 52.15) is taken as the threshold and compared with the UB of every other object, giving the first candidate set {ID=11, ID=12, ID=1, ID=2, ID=5, ID=9}. The first candidate set is then filled, for example by mean filling. First, the mean value of each attribute is calculated as 29.5...
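The quantities used in this example, together with the filling step summarized in the Abstract, can be written as below. This is a sketch under assumptions not stated in this excerpt: the score is taken to be a weighted sum with non-negative weights w_j, and the notation O(o) (the observed attributes of object o), m_j, M_j and \mu_j (the minimum, maximum and mean of the observed values of attribute j) is introduced here for illustration. In the Table 1 example, \tau = 52.15, the LB of ID=12.

```latex
\begin{aligned}
\mathrm{LB}(o) &= \sum_{j \in O(o)} w_j\, o_j + \sum_{j \notin O(o)} w_j\, m_j,
\qquad
\mathrm{UB}(o) = \sum_{j \in O(o)} w_j\, o_j + \sum_{j \notin O(o)} w_j\, M_j,\\
\tau &= \text{the $k$-th largest value of } \mathrm{LB}(o) \text{ over all objects},\\
C_1 &= \{\, o : \mathrm{UB}(o) \ge \tau \,\},\\
\hat{o}_j &= \begin{cases} o_j, & j \in O(o),\\ \max(\mu_j,\, m_j), & j \notin O(o), \end{cases}
\qquad
C_2 = \Bigl\{\, o \in C_1 : \sum_{j} w_j\, \hat{o}_j \ge \tau \,\Bigr\}.
\end{aligned}
```

With non-negative weights, LB(o) and UB(o) bracket every score the object could have once its missing values are known, which is why objects with UB(o) < \tau can be discarded before any filling is done.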


Abstract

The invention discloses a Top-K query method and system for incomplete data, and belongs to the field of data query. The method comprises the following steps: counting the minimum and maximum values of each attribute in an incomplete data set, and storing the primary key and non-primary-key attribute values of each object in the data set in a tuple list; traversing the tuple list, calculating the lowest and highest possible scores of each object and storing them in the tuple list, locating the K-th largest value among the lowest possible scores of all objects, and taking that value as the judgment threshold; traversing the tuple list and adding every object whose highest possible score is not less than the judgment threshold to a first candidate set; traversing the first candidate set, estimating each missing attribute value with a filling strategy, taking the larger of the estimated value and the attribute's minimum value as the fill value, calculating the score of the filled object, and adding every object whose score is not less than the judgment threshold to a second candidate set; and finally querying the second candidate set with a Top-K query method for complete data. The invention effectively improves query efficiency while ensuring that Top-K query results on incomplete data are correct.
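The steps of the Abstract can be sketched in Python as follows. This is a minimal illustration, not the patented implementation, and it rests on assumptions the Abstract does not fix: the score is a weighted sum with non-negative weights, missing values are represented as None, the filling strategy is mean filling over each attribute's observed values (one possible strategy), and the final "Top-K query method for complete data" is stood in for by a plain sort. The names top_k_incomplete, attribute_stats and weighted_score, and the sample data, are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation: (primary key, attribute values); None marks a missing value.
Row = Tuple[int, List[Optional[float]]]


def attribute_stats(rows: Sequence[Row], d: int):
    """Minimum, maximum and mean of each attribute over its observed (non-None) values."""
    mins, maxs, means = [], [], []
    for j in range(d):
        observed = [vals[j] for _, vals in rows if vals[j] is not None]
        mins.append(min(observed))
        maxs.append(max(observed))
        means.append(sum(observed) / len(observed))
    return mins, maxs, means


def weighted_score(vals, weights, fill) -> float:
    """Assumed scoring function: weighted sum, using fill[j] wherever vals[j] is missing."""
    return sum(w * (v if v is not None else f) for v, w, f in zip(vals, weights, fill))


def top_k_incomplete(rows: Sequence[Row], weights: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Sketch of the pruning pipeline; assumes non-negative weights and 1 <= k <= len(rows)."""
    mins, maxs, means = attribute_stats(rows, len(weights))

    # Tuple list of (key, values, LB, UB). With non-negative weights, substituting the
    # attribute minimum (maximum) for each missing value gives the lowest (highest) possible score.
    tuples = [(key, vals, weighted_score(vals, weights, mins), weighted_score(vals, weights, maxs))
              for key, vals in rows]

    # Judgment threshold: the k-th largest LB over all objects.
    threshold = sorted((lb for _, _, lb, _ in tuples), reverse=True)[k - 1]

    # First candidate set: objects whose UB is not less than the threshold.
    first = [(key, vals) for key, vals, _, ub in tuples if ub >= threshold]

    # Fill value: the larger of the estimate (here, the attribute mean) and the attribute minimum.
    fill = [max(mu, lo) for mu, lo in zip(means, mins)]

    # Second candidate set: objects whose filled score is not less than the threshold.
    second = []
    for key, vals in first:
        score = weighted_score(vals, weights, fill)
        if score >= threshold:
            second.append((key, score))

    # Stand-in for a complete-data Top-K method: sort by score and keep the best k.
    second.sort(key=lambda t: t[1], reverse=True)
    return second[:k]


if __name__ == "__main__":
    # Hypothetical data: 5 objects with 3 attributes each.
    rows = [(1, [80, None, 60]), (2, [None, 90, 70]), (3, [50, 40, None]),
            (4, [95, 85, 90]), (5, [30, None, 20])]
    print(top_k_incomplete(rows, [0.5, 0.3, 0.2], k=2))
```

On the hypothetical rows above, the threshold is 64 (the LB of object 1), objects 3 and 5 are pruned because their UBs (55 and 46) fall below it, and the result ranks object 4 first and object 1 second. Clamping the fill value at the attribute minimum keeps every filled score at or above its LB, so the k objects that define the threshold always reach the second candidate set; with mean filling the clamp never changes the value, but it matters for estimators that can fall below the observed minimum.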

Description

Technical field

[0001] The invention belongs to the technical field of data query, and more specifically relates to a Top-K query method and system for incomplete data.

Background

[0002] With the continuous spread and rapid development of information technology, every industry is changing rapidly. By analyzing large-scale data, people explore the nature and development laws of their industries in depth and obtain information, knowledge and business opportunities that were previously unattainable, creating huge economic and social benefits.

[0003] While massive data creates value, it also brings challenges such as data quality issues. The main criteria for evaluating data quality are accuracy, completeness, consistency, timeliness, reliability and validity, among which data completeness occupies an important position. Incomplete data can lead to incorrect analysis and processing results. Incomplete dat...


Application Information

Patent type & authority: Applications (China)
IPC (8): G06F16/9535; G06F16/2453
CPC: G06F16/9535; G06F16/2453
Inventors: 李国徽 (Li Guohui), 梁彩梅 (Liang Caimei), 袁凌 (Yuan Ling), 杨泳 (Yang Yong), 熊云飞 (Xiong Yunfei)
Owner HUAZHONG UNIV OF SCI & TECH