Method, device, and storage medium for importing deduplicated data into a graph library based on rocksdb

A data import and database technology, applied to still image metadata retrieval, still image data retrieval, still image data indexing, and similar fields. It addresses problems such as difficulty in sorting data and data not being inserted, with the effect of improving efficiency and user experience.

Inactive Publication Date: 2019-04-12
XIAMEN MEIYA PICO INFORMATION

AI Technical Summary

Problems solved by technology

According to the design of the Titan / JanusGraph library, the program automatically computes and generates Long-type ids when data is imported, and the library does not automatically deduplicate the data. If the same piece of data is imported multiple times, a different id is generated each time. Deduplication is therefore required when importing data. Deduplicating a small amount of data is easy, but it becomes difficult once the data volume reaches hundreds of millions or billions of records, and harder still at tens of billions or more.
[0004] In addition, when data is inserted into the graph library, an id is generated first. The data is actually written to the underlying storage database only after the commit. If vertices and edges are imported at the same time and the storage machine fails, ids will already have been generated while the vertex and edge data have not been written to the underlying storage database, which results in an insertion error.
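The duplicate-id problem described above can be illustrated with a minimal sketch. Everything here is hypothetical: `ToyGraph` is an in-memory stand-in for a graph library that hands out a fresh id on every insert (it is not the Titan / JanusGraph API), and a plain `dict` plays the role of the key-to-id deduplication store that the source builds on rocksdb.

```python
import itertools


class ToyGraph:
    """Hypothetical stand-in for a graph library: every insert gets a fresh id."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.vertices = {}  # id -> record

    def add_vertex(self, record):
        vid = next(self._next_id)
        self.vertices[vid] = record
        return vid


def import_naive(graph, records):
    # No deduplication: the same record imported twice yields two ids.
    return [graph.add_vertex(r) for r in records]


def import_deduplicated(graph, records, seen):
    # `seen` maps a record's key to the id it was first stored under.
    # Assumption: a dict stands in for the rocksdb deduplication database.
    ids = []
    for r in records:
        key = r["key"]
        if key not in seen:
            seen[key] = graph.add_vertex(r)
        ids.append(seen[key])
    return ids


records = [{"key": "phone:555-0100"}, {"key": "phone:555-0100"}]

naive_graph = ToyGraph()
naive_ids = import_naive(naive_graph, records)

dedup_graph = ToyGraph()
dedup_ids = import_deduplicated(dedup_graph, records, {})
```

In the naive import the repeated record occupies two vertices with different ids, while the deduplicated import stores it once and reuses the first id.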



Detailed Description of Embodiments

[0027] The application will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain related inventions, rather than to limit the invention. It should also be noted that, for the convenience of description, only the parts related to the related invention are shown in the drawings.

[0028] Where no conflict arises, the embodiments of the present application and the features within them may be combined with one another. The application is described in detail below with reference to the accompanying drawings and embodiments.

[0029] The library (translated elsewhere as "gallery") described in the present invention is a graph library: it stores graph-structured data, that is, vertices and the edges that relate them. It does not store image data.

[0030] Figure 1 shows a method of importing data gallery based on rocksdb deduplication da...


Abstract

The invention provides a method, a device, and a storage medium for importing data into a graph library based on rocksdb. The method comprises: classifying the data to be imported into the graph library to obtain a plurality of data categories, setting a category identifier for each data category, and setting the edge relationships between the data in the data categories; deduplicating the data in each data category against a rocksdb vertex deduplication database and inserting the data into the graph library as vertices; and deduplicating the edge relationships among the data against a rocksdb edge deduplication database and inserting the edge relationships into the graph library as edges. The method first constructs the rocksdb vertex deduplication database and the rocksdb edge deduplication database corresponding to the data categories, and then imports the data into the graph library. Vertices are imported first, and edges are imported only after the vertex import is complete. This solves the problem that, because ids are not deduplicated when importing into the Titan / JanusGraph graph library, the stored data is duplicated and unusable. Because a deduplication database is established for each of the multiple categories, deduplication efficiency and user experience are improved.
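The steps in the abstract can be sketched as a two-phase import. This is a minimal illustration under stated assumptions, not the patented implementation: plain Python dicts and a set stand in for the rocksdb vertex and edge deduplication databases, the category name is used as the category identifier, and `import_graph` and its input shapes are hypothetical.

```python
from collections import defaultdict


def import_graph(records, relations):
    """Import vertices first, then edges, deduplicating both.

    records:   iterable of (category, value) pairs
    relations: iterable of ((category, value), (category, value)) pairs
    """
    # One vertex deduplication store per category: {value: vertex id}.
    # Assumption: in-memory dicts stand in for the rocksdb databases.
    vertex_dedup = defaultdict(dict)
    edge_dedup = set()  # (source id, target id) pairs already inserted
    vertices, edges = {}, []
    next_id = 1

    # Phase 1: insert every vertex that has not been seen in its category.
    for category, value in records:
        store = vertex_dedup[category]
        if value in store:
            continue
        store[value] = next_id
        vertices[next_id] = (category, value)
        next_id += 1

    # Phase 2: insert edges only after all vertices exist.
    for (src_cat, src_val), (dst_cat, dst_val) in relations:
        src = vertex_dedup[src_cat].get(src_val)
        dst = vertex_dedup[dst_cat].get(dst_val)
        if src is None or dst is None:
            continue  # an endpoint was never imported; skip the edge
        if (src, dst) in edge_dedup:
            continue
        edge_dedup.add((src, dst))
        edges.append((src, dst))

    return vertices, edges


records = [("person", "alice"), ("phone", "555-0100"), ("person", "alice")]
relations = [
    (("person", "alice"), ("phone", "555-0100")),
    (("person", "alice"), ("phone", "555-0100")),
]
vertices, edges = import_graph(records, relations)
```

Keeping a separate store per category means each lookup only searches keys of the same kind, which is one plausible reading of why the source builds multiple deduplication databases. Running the edge phase after the vertex phase guarantees that every edge refers to ids that already exist.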

Description

Technical field

[0001] The invention relates to the technical field of computer program processing, in particular to a method, device, and storage medium for importing deduplicated data into a graph library based on rocksdb.

Background

[0002] Graph theory is the method of study that takes graphs as its research object. A graph can be drawn as a set of points together with lines connecting those points, and can also be defined abstractly as G=(V, E, Φ), where V and E are the vertex set and edge set of the graph respectively, and Φ denotes a functional relationship between V and E. In this way, any system involving binary relations can be described by a graph and therefore studied with graph theory. When a problem is studied with graph theory, only whether two vertices are connected by a line matters; the positions of the vertices and the way the lines are drawn are irrelevant. Euler solved the famous Königsberg bri...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F16/51, G06F16/58, G06K9/62
CPC: G06F18/241
Inventor: 林劼, 高爽, 周成祖, 吴鸿伟, 吴文, 王海滨
Owner: XIAMEN MEIYA PICO INFORMATION