Method and device for matching texts

A text matching method, applied in the field of data processing, that addresses the large data processing volume, slow processing speed, and degraded system performance of existing text matching. It aims to improve system performance with a simple matching process that is versatile and broadly applicable.

Status: Inactive
Publication date: 2012-04-11
ALIBABA CLOUD COMPUTING LTD
Cites: 4; Cited by: 41

AI Technical Summary

Problems solved by technology

[0009] The embodiments of the present application provide a text matching method and device that address problems in the prior art: the large volume of data processed during text matching slows processing, degrades system performance, and congests data transmission.

Examples

Embodiment 1

[0038] In the text matching method provided in Embodiment 1 of the present application, the similarity data related to the newly added texts is determined in each cycle: the similarity between each new text and each original text, and the similarity between any two new texts. For example, in a product recommendation process, the new texts are obtained from the product information released in the current cycle, and the newly added texts are then used to determine all products that match that product information. The candidate products include both those released earlier and those released in the current cycle.
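The pairs that Embodiment 1 scores in one cycle can be sketched as follows. This is a minimal illustration, not the patented implementation; the function name `pairs_to_score` and the text IDs are hypothetical.

```python
from itertools import combinations, product


def pairs_to_score(original_ids, new_ids):
    """Return the text-ID pairs whose similarity involves a new text."""
    # Each new text against every previously stored (original) text.
    new_vs_original = list(product(new_ids, original_ids))
    # Any two new texts against each other (unordered, no self-pairs).
    new_vs_new = list(combinations(new_ids, 2))
    return new_vs_original + new_vs_new


pairs = pairs_to_score(original_ids=["a", "b", "c"], new_ids=["x", "y"])
print(len(pairs))  # 2 * 3 + 1 = 7 pairs, versus 10 for all pairs of 5 texts
```

Pairs made up only of original texts, such as ("a", "b"), are skipped because their similarity was already determined in an earlier cycle.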

[0039] The flow of the text matching method provided in Embodiment 1 of the present application is shown in Figure 2. The execution steps are as follows:

[0040] Step S11: Periodically collect content information released by users, and obtain new texts in the ...

Embodiment 2

[0079] In the text matching method provided in Embodiment 2 of the present application, after the newly added texts are input in each cycle, the similarity between any two texts stored in the database is calculated. The flow is shown in Figure 3, and the execution steps are as follows:

[0080] Step S21: Periodically collect content information released by users, and obtain new texts in the current cycle according to the content information released by users.

[0081] It is the same as step S11 and will not be repeated here.

[0082] Step S22: Segment the newly added text to extract keywords.

[0083] It is the same as step S12 and will not be repeated here.
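A minimal sketch of segmentation and keyword extraction, assuming whitespace-delimited English text. The stop-word list and the function name `extract_keywords` are hypothetical; the source does not specify a segmenter, and Chinese text would need a dedicated word segmenter.

```python
import re

# Hypothetical stop-word list; the source does not specify one.
STOP_WORDS = {"a", "an", "the", "of", "for", "and", "in", "with"}


def extract_keywords(text):
    """Split text into lowercase word tokens and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


keywords = extract_keywords("Wireless mouse for the office, with a USB receiver")
print(keywords)  # ['wireless', 'mouse', 'office', 'usb', 'receiver']
```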

[0084] Step S23: Calculate the weight of each keyword extracted from the newly added text in each text currently stored in the database according to the pre-stored word frequency table.

[0085] This is the same as step S13 and will not be repeated here.
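The excerpt does not give the weighting formula. The sketch below assumes a TF-IDF-style weight, with the word frequency table holding the number of stored texts that contain each word. The function names, the smoothed IDF term, and the sample data are all illustrative assumptions.

```python
import math
from collections import Counter


def build_word_frequency_table(database):
    """Count, for each word, how many stored texts contain it."""
    table = Counter()
    for keywords in database.values():
        table.update(set(keywords))
    return table


def keyword_weights(keywords, frequency_table, total_texts):
    """Weight each keyword of one text by term frequency times a smoothed IDF."""
    counts = Counter(keywords)
    weights = {}
    for word, count in counts.items():
        tf = count / len(keywords)
        # Assumed smoothing: words seen in fewer texts get a larger weight.
        idf = math.log((1 + total_texts) / (1 + frequency_table[word])) + 1
        weights[word] = tf * idf
    return weights


# Hypothetical database: text ID -> extracted keywords.
database = {
    "t1": ["wireless", "mouse", "usb"],
    "t2": ["wireless", "keyboard"],
    "t3": ["usb", "cable", "cable"],
}
table = build_word_frequency_table(database)
weights = keyword_weights(database["t3"], table, total_texts=len(database))
```

Because the table is stored ahead of time, weighting a new text only needs a lookup per keyword rather than a pass over every stored text. In this sample, "cable" outweighs "usb" in text t3 because it occurs more often in that text and in fewer texts overall.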

[0086] Step S24: Calculate the similarity between an...

Embodiment 3

[0095] The text matching method provided in Embodiment 3 of the present application improves on the solutions of Embodiment 1 and Embodiment 2 by adding an output filtering process. Specifically:

[0096] An output filtering step is added after the similarity calculation in step S14 of Embodiment 1 and before the related texts are determined in step S15. Likewise, an output filtering step is added after the similarity calculation in step S24 of Embodiment 2 and before the related texts are determined in step S25. The flow is shown in Figure 4, and the execution steps are as follows:

[0097] Step S31: Obtain the calculated similarity between each newly added text and each text currently stored in the database, or the calculated similarity between any two texts in the database.

[0098] When filtering the similarity of two texts, the similarity values can be filtered differently depending on the requirements of the subsequent determination of related texts. Therefore, for the first...
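One way to picture output filtering is a minimum-similarity threshold combined with a per-text cap on related texts. Both criteria, their default values, and the function name `filter_similarities` are assumptions for illustration; the excerpt does not state the actual filtering rules.

```python
def filter_similarities(scores, min_similarity=0.3, top_k=2):
    """Keep, per text, at most top_k related texts scoring >= min_similarity."""
    related = {}
    for (text_a, text_b), score in scores.items():
        if score < min_similarity:
            continue  # Drop weak matches before they are stored or transmitted.
        related.setdefault(text_a, []).append((text_b, score))
        related.setdefault(text_b, []).append((text_a, score))
    return {
        text: sorted(candidates, key=lambda item: item[1], reverse=True)[:top_k]
        for text, candidates in related.items()
    }


# Hypothetical similarity values keyed by text-ID pair.
scores = {("x", "a"): 0.9, ("x", "b"): 0.1, ("x", "c"): 0.5, ("x", "y"): 0.4}
related = filter_similarities(scores)
print(related["x"])  # [('a', 0.9), ('c', 0.5)]
```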

Abstract

The invention discloses a method and a device for matching texts. The method comprises the following steps: acquiring new texts in the current period according to content information collected in the current period, and storing the new texts in a database; performing word segmentation on the input new texts and extracting keywords; calculating the weight of each extracted keyword in each text in the database according to a prestored word frequency list; periodically updating the word frequency list according to the occurrence frequency of each word in each text in the database; calculating the similarity between each new text and each text in the database, or the similarity between any two texts in the database, according to the calculated weight of each keyword in each text in the database; and determining the relevant texts of each text stored in the database according to the calculated similarity. By establishing and updating the word frequency list, the method avoids the prior-art need to recalculate all texts for every match, which reduces the matching workload and improves system performance.
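The abstract does not name the similarity measure. As a hedged sketch, cosine similarity over sparse keyword-weight mappings is one common choice; the function name and the sample weights below are hypothetical.

```python
import math


def cosine_similarity(weights_a, weights_b):
    """Cosine similarity between two sparse keyword-weight mappings."""
    shared = set(weights_a) & set(weights_b)
    dot = sum(weights_a[word] * weights_b[word] for word in shared)
    norm_a = math.sqrt(sum(value * value for value in weights_a.values()))
    norm_b = math.sqrt(sum(value * value for value in weights_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # A text with no weighted keywords matches nothing.
    return dot / (norm_a * norm_b)


# Hypothetical keyword weights for two texts.
a = {"wireless": 0.5, "mouse": 0.8}
b = {"wireless": 0.5, "keyboard": 0.8}
score = cosine_similarity(a, b)
```

Only keywords shared by both texts contribute to the numerator, so two texts with no keyword in common score 0.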

Description

Technical field

[0001] This application relates to the field of data processing, in particular to a text matching method and device for large amounts of data.

Background

[0002] Existing text comparison generally uses full calculation and matching. Whenever the degree of correlation between texts is needed, all acquired texts are processed to obtain the similarity of every pair of texts. Each similarity calculation therefore covers all text data, so the amount of computation is very large and the running time is on the order of O(N^2). As the number of texts N increases, the calculation time becomes very long.

[0003] This large volume of computation and comparison has a great impact on the system performance of the equipment. It puts great pressure on the system's I/O communication, data storage, and network data transmission, resulting in sl...
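As a rough illustration of the O(N^2) cost described above, the number of pairwise comparisons can be counted directly. The symbols n (texts already stored) and m (texts added in the current cycle) are notation introduced here, not taken from the source. The second line counts only the pairs that involve a new text, as in Embodiment 1.

```latex
\begin{align*}
\text{full recomputation:}\quad & \binom{n+m}{2} = \frac{(n+m)(n+m-1)}{2} \\
\text{new-text pairs only:}\quad & nm + \binom{m}{2} = nm + \frac{m(m-1)}{2} \\
n = 10000,\ m = 100:\quad & 50{,}999{,}950 \ \text{versus}\ 1{,}004{,}950
\end{align*}
```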

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F17/3069, G06F16/3347
Inventor 张旭苏宁军顾海杰祁建程
Owner ALIBABA CLOUD COMPUTING LTD