
A Method of Quickly Deduplicating Lists Through Memory

An in-memory list deduplication technique, applied in the database field, that addresses problems such as strict file format requirements, limited database I/O speed, and server crashes. It improves deduplication efficiency by removing duplicate list entries in batches.

Active Publication Date: 2016-03-16
北京讯鸟软件有限公司

AI Technical Summary

Problems solved by technology

[0006] The two methods above both rely on traversing the database and comparing entries one by one. They do achieve deduplication, but when the data volume and concurrency are high, their efficiency is quite low and they may even cause the server to crash.
The second method appears faster than the first, but it places stricter requirements on the file format, and its efficiency drops again when the data must be imported into multiple tables.
Because the number of database connections and the hard disk I/O speed are limited, these two problems remain difficult to overcome.



Examples


Embodiment 1

[0047] As shown in Figure 3, the step of deleting duplicate lists in batches through intersection, difference, and union operations between the currently imported list set (TempSet) and the historical list set (AllSet), and of updating the historical list set (AllSet) and the historical list information table, includes:

[0048] The currently imported list set (TempSet) is intersected with the historical list set (AllSet) to find the duplicate list set (TempSet1), where TempSet1 = AllSet ∩ TempSet, and the entries in TempSet1 are deleted from the temporary table in batches. TempSet is then merged with AllSet to obtain a new historical list set (AllSet1), where AllSet1 = AllSet ∪ TempSet, and AllSet1 is written back to memory. Finally, the list information remaining in the temporary table is inserted in batches into the historical list information table in the database by means of an INSERT ... SELECT statement.
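The operations in this embodiment can be sketched in Python with the standard sqlite3 module. This is a minimal illustration, not the patented implementation: the table names (history_list, temp_list), the columns, and the choice of a phone number as the deduplication key are all assumptions, and the sketch further assumes the imported list contains no duplicates within itself.

```python
import sqlite3

def dedupe_embodiment_1(conn, all_set):
    """Remove rows of temp_list already known in all_set, then archive the rest.

    Table and column names are hypothetical; 'phone' stands in for whatever
    data item identifies a list entry.
    """
    cur = conn.cursor()
    # TempSet: dedup keys of the current import, held in memory.
    temp_set = {row[0] for row in cur.execute("SELECT phone FROM temp_list")}
    # TempSet1 = AllSet ∩ TempSet: keys that already exist in the history.
    duplicates = all_set & temp_set
    # Delete the duplicates from the temporary table in one batch.
    cur.executemany("DELETE FROM temp_list WHERE phone = ?",
                    [(phone,) for phone in duplicates])
    # AllSet1 = AllSet ∪ TempSet: the new in-memory historical set.
    new_all_set = all_set | temp_set
    # Move the surviving rows into the history table with INSERT ... SELECT.
    cur.execute("INSERT INTO history_list (phone, name) "
                "SELECT phone, name FROM temp_list")
    conn.commit()
    return new_all_set

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE history_list (phone TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE temp_list (phone TEXT, name TEXT);
    INSERT INTO history_list VALUES ('1001', 'Ann'), ('1002', 'Bob');
    INSERT INTO temp_list VALUES ('1002', 'Bob'), ('1003', 'Cid');
""")
all_set = {row[0] for row in conn.execute("SELECT phone FROM history_list")}
all_set1 = dedupe_embodiment_1(conn, all_set)
```

The database is touched only by one batched delete and one INSERT ... SELECT; the membership checks themselves happen entirely on the in-memory sets.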

Embodiment 2

[0050] The step of deleting duplicate lists in batches through a union operation between the currently imported list set (TempSet) and the historical list set (AllSet), and of updating the historical list set (AllSet) and the historical list information table, includes:

[0051] The currently imported list set (TempSet) is merged with the historical list set (AllSet) to obtain a new historical list set (AllSet1), where AllSet1 = AllSet ∪ TempSet. The new historical list set (AllSet1) is written back to memory. The contents of the historical list information table in the database are then replaced with the new historical list set (AllSet1).
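A minimal Python sketch of the union-only variant follows. It assumes, for illustration only, that each list entry is fully described by a single key column (a hypothetical phone column in a hypothetical history_list table), so that the in-memory set is enough to rebuild the table.

```python
import sqlite3

def dedupe_embodiment_2(conn, all_set, temp_set):
    """Union the two sets and overwrite the history table with the result."""
    # AllSet1 = AllSet ∪ TempSet: the union drops duplicates implicitly.
    new_all_set = all_set | temp_set
    cur = conn.cursor()
    # Replace the table contents with the merged set (assumed single-column table).
    cur.execute("DELETE FROM history_list")
    cur.executemany("INSERT INTO history_list (phone) VALUES (?)",
                    [(phone,) for phone in sorted(new_all_set)])
    conn.commit()
    return new_all_set

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history_list (phone TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO history_list VALUES (?)", [("1001",), ("1002",)])
all_set = {"1001", "1002"}    # historical keys already in memory
temp_set = {"1002", "1003"}   # keys of the current import
all_set1 = dedupe_embodiment_2(conn, all_set, temp_set)
```

Rewriting the whole table is simple but costs more as the history grows, which is presumably why the other embodiments insert only the non-duplicate remainder.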

Embodiment 3

[0053] The step of deleting duplicate lists in batches through intersection and difference operations between the currently imported list set (TempSet) and the historical list set (AllSet), and of updating the historical list set (AllSet) and the historical list information table, includes:

[0054] The currently imported list set (TempSet) is intersected with the historical list set (AllSet) to find the duplicate list set (TempSet1), where TempSet1 = AllSet ∩ TempSet, and the entries in TempSet1 are deleted from the temporary table in batches. The list entries remaining in the temporary table are inserted in batches into the historical list set (AllSet). The same remaining entries are also inserted in batches into the historical list information table in the database.
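The intersection and difference steps can be illustrated in plain Python, with a list of tuples standing in for the temporary table. The (phone, name) row layout and the function name are assumptions made for this sketch.

```python
def dedupe_embodiment_3(all_set, temp_rows):
    """Return the rows of temp_rows whose key is not yet in all_set.

    all_set is updated in place with the newly accepted keys.
    temp_rows is a list of (phone, name) tuples (hypothetical layout).
    """
    temp_set = {phone for phone, _name in temp_rows}
    # TempSet1 = AllSet ∩ TempSet: keys that duplicate the history.
    duplicates = all_set & temp_set
    # Difference: keys that survive after the duplicates are removed.
    remaining_keys = temp_set - duplicates
    remaining_rows = [row for row in temp_rows if row[0] in remaining_keys]
    # Add only the remaining keys to the in-memory historical set.
    all_set.update(remaining_keys)
    return remaining_rows

all_set = {"1001", "1002"}
temp_rows = [("1002", "Bob"), ("1003", "Cid")]
remaining = dedupe_embodiment_3(all_set, temp_rows)
```

The returned rows are the ones that would then be batch-inserted into the historical list information table.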

[0055] To sum up, compared with the prior art, using the method of the present invention to quickly deduplicate the list through memory, the method of deduplication is imported into the list ...


Abstract

The invention discloses a method for rapidly removing duplicate lists through memory. The method includes: step 1, reading the historical list information table from the database, loading it into memory, and storing it in a historical list set; step 2, uploading the list to be imported into a temporary table in the database; step 3, reading from the temporary table the data item on which deduplication is to be performed, loading it into memory, and storing it in a currently imported list set; step 4, removing duplicate lists in batches through a set operation between the currently imported list set and the historical list set, and updating both the historical list set and the historical list information table; and step 5, deleting the temporary table. By importing lists and removing their duplicates in batches through set operations, the method increases the speed of deduplication.
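The five steps can be put together in a short end-to-end sketch using Python's sqlite3 module. All table names, column names, and the function name are hypothetical, the phone column is an assumed deduplication key, and the sketch reloads the historical set on every call for simplicity, whereas the method as described keeps it resident in memory.

```python
import sqlite3

def import_list(conn, rows):
    """Import (phone, name) rows, skipping phones already in history_list.

    Returns the number of new entries. Assumes rows has no internal duplicates.
    """
    cur = conn.cursor()
    # Step 1: load the historical dedup keys into memory.
    all_set = {r[0] for r in cur.execute("SELECT phone FROM history_list")}
    # Step 2: upload the incoming list to a temporary table.
    cur.execute("CREATE TEMP TABLE temp_list (phone TEXT, name TEXT)")
    cur.executemany("INSERT INTO temp_list VALUES (?, ?)", rows)
    # Step 3: load the dedup key of the incoming list into memory.
    temp_set = {r[0] for r in cur.execute("SELECT phone FROM temp_list")}
    # Step 4: set operations in memory, then batched database updates.
    duplicates = temp_set & all_set
    new_keys = temp_set - all_set
    cur.executemany("DELETE FROM temp_list WHERE phone = ?",
                    [(phone,) for phone in duplicates])
    cur.execute("INSERT INTO history_list (phone, name) "
                "SELECT phone, name FROM temp_list")
    # Step 5: delete the temporary table.
    cur.execute("DROP TABLE temp_list")
    conn.commit()
    return len(new_keys)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history_list (phone TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO history_list VALUES ('1001', 'Ann')")
added = import_list(conn, [("1001", "Ann"), ("1002", "Bob")])
```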

Description

Technical field

[0001] The invention relates to a method for removing duplicate lists, in particular to a method for rapidly removing duplicate lists through memory operations, and belongs to the technical field of databases.

Background

[0002] In recent years, the transaction volume of China's e-commerce market has grown steadily, and the effect of e-commerce within enterprises and its role in promoting economic and social development have become increasingly evident. E-commerce generally involves an electronic marketing process. Electronic marketing is fully customer-centric, highly interactive, highly targeted, precise in reaching customers, not limited by space or time, and wide in reach. The data volume of an electronic marketing list is very large. In the process of collecting marketing information by different information collectors, it is inevitab...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
Inventor: 吴为民, 陶敏超
Owner: 北京讯鸟软件有限公司