Method and system for synchronizing super large text data to search engine

A text data and search engine technology in the field of big data processing. It addresses inconsistent data formats, server downtime when handling very large files, and the difficulty of synchronizing data to the ElasticSearch search engine. It avoids the situation in which a file cannot be viewed or edited, simplifies operation, and improves efficiency.

Active Publication Date: 2019-10-18
山东合天智汇信息技术有限公司

AI Technical Summary

Problems solved by technology

At present, synchronizing super-large text data faces the following main problems. First, the data in a large text file is disordered and complex in format and not uniform in form, which creates a major bottleneck during synchronization. Second, the data is collected from the Internet or obtained from other vendors, and a single file may be hundreds of gigabytes or larger. Such a file cannot be viewed or edited in a visual editor, and attempting to open it may even bring the server down, so the data cannot be normalized that way. Third, in some scenarios hardware limitations mean that disk, memory, and CPU cannot be provisioned at high-performance levels, yet these super-large files must still be processed and analyzed. Together, these problems make synchronizing data to the ElasticSearch search engine more difficult.

Examples

Embodiment 1

[0049] This embodiment provides a method for synchronizing super-large text data to a search engine, as shown in Figure 1, including:

[0050] Step 1: Normalize the super-large text data to be synchronized.

[0051] Step 101: Read and verify the super-large text data to be synchronized line by line, and judge whether each line conforms to the rules. If any line does not conform to the rules, create a temporary file and write the non-conforming lines to it;

[0052] Step 102: Receive the user's edits to the temporary file and obtain line data that conforms to the rules;

[0053] Step 103: Verify the super-large text data line by line again, replacing the non-conforming lines with the edited lines from the temporary file;

[0054] Step 104: Repeat the above steps until all the data conform to the rules;

[0055] The rules are formulated jointly by the data gen...
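
Steps 101 to 104 can be sketched in Python as follows. This is a minimal illustration and not the patent's implementation: the rule used here (a fixed number of tab-separated fields) and the function names `is_valid`, `extract_invalid`, and `merge_fixed` are assumptions made for the example, since the actual rules are agreed between the parties handling the data. Both passes stream the file, so memory use does not depend on file size.

```python
def is_valid(line: str, n_fields: int = 3, sep: str = "\t") -> bool:
    """Hypothetical rule: a line is valid if it has exactly n_fields fields."""
    return len(line.rstrip("\n").split(sep)) == n_fields


def extract_invalid(src_path: str, tmp_path: str) -> int:
    """Step 101: stream the source and copy rule-breaking lines to a temp file.

    Returns the number of non-conforming lines found.
    """
    count = 0
    with open(src_path, encoding="utf-8") as src, \
            open(tmp_path, "w", encoding="utf-8") as tmp:
        for line in src:
            if not is_valid(line):
                # Ensure every flagged line ends with a newline in the temp file.
                tmp.write(line if line.endswith("\n") else line + "\n")
                count += 1
    return count


def merge_fixed(src_path: str, tmp_path: str, out_path: str) -> None:
    """Step 103: re-scan the source and swap each non-conforming line for the
    next line of the user-edited temp file, preserving the original order.

    Assumes the user kept one edited line per flagged line. If the temp file
    runs short, the original line is kept and will be flagged again on the
    next pass (step 104).
    """
    with open(src_path, encoding="utf-8") as src, \
            open(tmp_path, encoding="utf-8") as fixed, \
            open(out_path, "w", encoding="utf-8") as out:
        for line in src:
            out.write(line if is_valid(line) else next(fixed, line))
```

Step 104 then amounts to calling `extract_invalid` on the merged output and repeating the edit-and-merge cycle until it returns 0. Only the small temporary file is ever opened in an editor, which is what keeps the hundreds-of-gigabytes source out of the visual editor.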

Embodiment 2

[0069] Based on the method described in Embodiment 1, this embodiment provides a system for synchronizing super-large text data to a search engine, comprising an ElasticSearch server cluster, computer equipment, and a Hadoop distributed file system cluster.

[0070] The computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, the following steps are implemented: normalizing the super-large text data to be synchronized; cutting the normalized super-large text data by lines to obtain multiple fragment files; and uploading the fragment files to the Hadoop distributed file system cluster in batches;

[0071] The Hadoop distributed file system cluster stores the fragment files in a Hive external table, creates a view table that maps the Hive data to the open-source search engine, synchronizes the data i...
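
The line-cutting and batch-upload step performed by the computer device might look like the following Python sketch. The fragment naming scheme, the `lines_per_file` and batch sizes, and the target HDFS directory are all assumptions for illustration; the source does not specify them. The upload is expressed as an argument list for the standard `hdfs dfs -put` shell command, which accepts several local files followed by one destination, and the sketch only builds the command rather than running it.

```python
import os
from itertools import islice
from typing import Iterator, List


def split_by_lines(src_path: str, out_dir: str, lines_per_file: int) -> List[str]:
    """Cut a text file into fragment files of at most lines_per_file lines."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    with open(src_path, encoding="utf-8") as src:
        index = 0
        while True:
            # Pull the next block of lines; only one block is held in memory.
            chunk = list(islice(src, lines_per_file))
            if not chunk:
                break
            path = os.path.join(out_dir, f"part-{index:05d}.txt")
            with open(path, "w", encoding="utf-8") as out:
                out.writelines(chunk)
            paths.append(path)
            index += 1
    return paths


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Group fragment paths into upload batches of at most `size` files."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def put_command(batch: List[str], hdfs_dir: str) -> List[str]:
    """Build an `hdfs dfs -put` argument list for one batch.

    hdfs_dir is a hypothetical target directory; pass the result to
    subprocess.run(..., check=True) on a machine with an HDFS client.
    """
    return ["hdfs", "dfs", "-put", *batch, hdfs_dir]
```

Cutting on line boundaries means no record is split across two fragments, so each fragment can be loaded independently once it reaches the cluster.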

Embodiment 3

[0085] This embodiment provides a Hadoop distributed file system cluster for synchronizing super-large text data. The cluster is configured to:

[0086] Receive the fragment files of the super-large text data; store the fragment files in a Hive external table; create a view table that maps Hive data to ElasticSearch; synchronize the data in the external table to the view table; and specify, in the view table, the ElasticSearch server node to synchronize to, thereby synchronizing the super-large text data to the search engine.

[0087] The view table also specifies the ElasticSearch server node address, port, and the corresponding index and document, and the primary key field in Hive is mapped to _id in ElasticSearch.
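
One plausible way to realize the table definitions in [0086] and [0087] is with the elasticsearch-hadoop Hive integration, sketched below as a Python helper that generates the HiveQL. The source does not name the connector, so the use of `EsStorageHandler` and the `es.nodes`, `es.port`, `es.resource`, and `es.mapping.id` properties is an assumption, as are the function names and every table name, column name, and path passed in. The "external linked list" of the translated text is read here as a Hive external table over the uploaded fragment files.

```python
from typing import List, Tuple

Columns = List[Tuple[str, str]]  # (column name, Hive type) pairs

# Storage handler class shipped with elasticsearch-hadoop (assumed connector).
ES_HANDLER = "org.elasticsearch.hadoop.hive.EsStorageHandler"


def _column_list(columns: Columns) -> str:
    return ", ".join(f"{name} {hive_type}" for name, hive_type in columns)


def external_table_ddl(table: str, columns: Columns, hdfs_dir: str,
                       sep: str = "\\t") -> str:
    """Hive external table over the fragment files stored in hdfs_dir."""
    return (
        f"CREATE EXTERNAL TABLE {table} ({_column_list(columns)})\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{sep}'\n"
        f"LOCATION '{hdfs_dir}'"
    )


def es_table_ddl(table: str, columns: Columns, nodes: str, port: int,
                 resource: str, id_column: str) -> str:
    """Hive table backed by ElasticSearch: node address, port, target
    resource, and the Hive column that becomes the document _id."""
    props = {
        "es.nodes": nodes,
        "es.port": str(port),
        "es.resource": resource,
        "es.mapping.id": id_column,
    }
    prop_sql = ", ".join(f"'{k}' = '{v}'" for k, v in props.items())
    return (
        f"CREATE EXTERNAL TABLE {table} ({_column_list(columns)})\n"
        f"STORED BY '{ES_HANDLER}'\n"
        f"TBLPROPERTIES ({prop_sql})"
    )


def sync_statement(src_table: str, es_table: str, columns: Columns) -> str:
    """Copy rows from the external table into the ElasticSearch-backed table."""
    names = ", ".join(name for name, _ in columns)
    return f"INSERT OVERWRITE TABLE {es_table} SELECT {names} FROM {src_table}"
```

Mapping the primary key to `_id` makes a repeated synchronization overwrite existing documents instead of creating duplicates, which matters when step 104 of the normalization forces a rerun.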

Abstract

The invention discloses a method and a system for synchronizing super-large text data to a search engine. The method includes: normalizing the super-large text data to be synchronized; cutting the normalized data by lines to obtain multiple fragment files and uploading the fragment files to a Hadoop distributed file system cluster in batches; and, in the Hadoop distributed file system cluster, storing the fragment files in a Hive external table, creating a view table that maps Hive data to ElasticSearch, specifying the ElasticSearch server node to synchronize to, and synchronizing the data in the external table to the view table, thereby synchronizing the super-large text data to ElasticSearch. The invention avoids synchronization interruptions caused by irregular data, improves synchronization efficiency, and simplifies operation.

Description

Technical field

[0001] The present invention relates to the field of big data processing, and specifically to a method and system for synchronizing super-large text data to a search engine.

Background

[0002] With the rapid development of network and information technology, people can obtain more and more digital information, but they also spend more and more time and energy organizing it. The same text data may be used by different vendors and systems, so synchronizing super-large text data to various big data platforms has become a key technology. At present, synchronizing super-large text data faces the following main problems: the data in a large text file is disordered and complex in format and not uniform in form, which creates a major bottleneck during synchronization; the data is collected from the Internet or obtained from other vendors, and the obtained ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/33, G06F16/31, G06F16/182, G06F16/953
Inventor: 田立娜, 高军, 王可鑫, 段文良
Owner: 山东合天智汇信息技术有限公司