Multilingual news text clustering method, storage medium and terminal device

A multilingual text clustering technology, applied in special data processing applications, instruments, and electrical digital data processing. It addresses the high cost of acquiring multilingual resources and the domain incompatibility of the texts being clustered, both of which restrict the development of multilingual news text clustering technology, and it improves the speed and efficiency of clustering.

Active Publication Date: 2018-12-21
GUANGDONG UNIVERSITY OF FOREIGN STUDIES

AI Technical Summary

Problems solved by technology

[0004] The text clustering methods provided by the prior art mainly include "cluster first, then merge" methods, multilingual text clustering methods based on machine translation systems, methods based on multilingual dictionaries, methods based on multilingual subject heading tables or multilingual ontologies, methods based on parallel corpora, and methods based on homologous named entities. However, resources such as multilingual dictionaries, thesauri and parallel corpora are costly to acquire, and the texts to be clustered suffer from domain incompatibility. Both problems restrict the development of large-scale multilingual news text clustering technology.

Examples

Embodiment Construction

[0044] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of the present invention.

[0045] Referring to Figure 1, which shows a flow chart of a preferred embodiment of the multilingual news text clustering method provided by the present invention, the method includes steps S11 to S13:

[0046] Step S11, obtaining in advance the text features of each monolingual news text in the multilingual news text;

[0047] Step S12, clustering the monolingual news texts based on the coincidence degree of keywords according to the text features of each monolingual news text, and corresp...

Abstract

The invention discloses a multilingual news text clustering method comprising the following steps: obtaining in advance the text features of each monolingual news text in the multilingual news text; clustering the monolingual news texts, according to their text features, based on the degree of keyword coincidence, and correspondingly obtaining a monolingual cluster set for each monolingual news text; and obtaining a cross-language cluster set of the multilingual news text from the monolingual cluster sets. Correspondingly, the invention also discloses a computer-readable storage medium and a terminal device. The technical solution of the invention can cluster multilingual news texts at large scale without relying on multilingual resources, meets the requirements of multilingual online public opinion analysis in Chinese, English, Indonesian and Malay, and improves clustering speed and efficiency.
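
The monolingual clustering step can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration, not the patent's definition: the excerpt does not say how the keyword coincidence degree is computed, so the sketch uses the number of shared keywords divided by the size of the smaller keyword set, a single-pass assignment strategy, and a threshold of 0.5. The function names and the sample keywords are hypothetical. The cross-language merging step is not sketched because the excerpt does not describe how it is performed.

```python
from typing import Dict, List, Set


def coincidence(a: Set[str], b: Set[str]) -> float:
    """Assumed measure: shared keywords divided by the smaller set's size."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def cluster_by_keywords(
    docs: Dict[str, Set[str]], threshold: float = 0.5
) -> List[List[str]]:
    """Single-pass clustering: join the best-matching cluster or open a new one."""
    clusters: List[List[str]] = []          # document ids in each cluster
    cluster_keywords: List[Set[str]] = []   # union of keywords in each cluster
    for doc_id, keywords in docs.items():
        best_index, best_score = -1, 0.0
        for index, existing in enumerate(cluster_keywords):
            score = coincidence(keywords, existing)
            if score > best_score:
                best_index, best_score = index, score
        if best_index >= 0 and best_score >= threshold:
            clusters[best_index].append(doc_id)
            cluster_keywords[best_index] |= keywords
        else:
            clusters.append([doc_id])
            cluster_keywords.append(set(keywords))
    return clusters


def cluster_each_language(
    corpus: Dict[str, Dict[str, Set[str]]], threshold: float = 0.5
) -> Dict[str, List[List[str]]]:
    """Cluster every language separately, giving one cluster set per language."""
    return {
        lang: cluster_by_keywords(docs, threshold)
        for lang, docs in corpus.items()
    }


# Hypothetical keyword sets standing in for pre-extracted text features.
corpus = {
    "en": {
        "en1": {"flood", "jakarta", "rain"},
        "en2": {"jakarta", "flood", "evacuation"},
        "en3": {"election", "vote"},
    },
    "id": {
        "id1": {"banjir", "jakarta"},
        "id2": {"pemilu", "suara"},
    },
}
result = cluster_each_language(corpus)
print(result)
```

Because each language is clustered on its own keywords, this stage needs no dictionary, translation system or parallel corpus. A single pass keeps the cost proportional to the number of documents times the number of clusters, at the price of making the outcome depend on document order.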

Description

Technical field

[0001] The invention relates to the field of natural language processing in information technology, and in particular to a multilingual news text clustering method, a computer-readable storage medium and a terminal device.

Background technique

[0002] As Internet information resources grow more abundant, the number of non-English text resources on the Internet increases day by day, Internet information sources are becoming more multilingual, and Internet public opinion analysis is also becoming multilingual. How to achieve accurate and efficient cross-language clustering has become one of the key issues in multilingual public opinion analysis.

[0003] At the same time, the Chinese government and enterprises are paying increasing attention to the analysis of online public opinion in the countries along the route in order to avoid risks. Among the countries along the route, Indonesia and Malaysia are the founding countries ...

Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F17/30
Inventor: 蒋盛益, 李锦贤, 林楠铠
Owner: GUANGDONG UNIVERSITY OF FOREIGN STUDIES