Method for automatic fragmentation and classification of translated manuscripts based on a large-scale terminology corpus

The corpus-based technique, applied in special data processing applications, instruments, and electronic digital data processing, addresses shortcomings of existing fragmentation classification methods for translated manuscripts; it improves classification efficiency, shortens classification time, and reduces query time.

Status: Inactive. Publication date: 2013-05-15

IOL WUHAN INFORMATION TECH CO LTD

Cites: 2. Cited by: 16.


Problems solved by technology

[0010] The present invention aims to provide a method for automatic fragmentation and classification of translated manuscripts based on a large-scale terminology corpus, in order to address the above-mentioned shortcomings of existing fragmentation classification methods for translated manuscripts.



Embodiment Construction

[0027] The present invention is described in detail below with reference to the accompanying drawings and in combination with embodiments. Referring to Figure 1, the process of the embodiment includes:

[0028] S11: Extract the keywords of each segment of the translated manuscript, and establish a correspondence between each segment and the keywords it contains;

[0029] S12: Match the keywords of the translated manuscript one by one in the term corpus, and take the industry category attribute of the term matched by each keyword as the industry category attribute of that keyword in each segment corresponding to it;

[0030] S13: According to the correspondence, determine for each segment the industry category attribute that occurs most often among its keywords;

[0031] S14: Classify each segment under its most frequent industry category attribute.
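
Steps S11 to S14 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the term corpus is modeled as a plain dictionary from term to industry category, and the names `TERM_CORPUS` and `classify_segments`, as well as the sample terms and categories, are hypothetical. The source does not say how ties between categories are resolved, so the sketch does not define that case.

```python
from collections import Counter

# Hypothetical term corpus: term -> industry category attribute.
TERM_CORPUS = {
    "turbine": "energy",
    "reactor": "energy",
    "grid": "energy",
    "dosage": "medicine",
    "antibody": "medicine",
}


def classify_segments(segment_keywords, term_corpus):
    """Map each segment id to its most frequent industry category.

    segment_keywords: dict of segment id -> list of keywords (the S11 result).
    Segments with no matched keyword map to None.
    """
    result = {}
    for seg_id, keywords in segment_keywords.items():
        # S12: look up each keyword; keywords absent from the corpus are skipped.
        categories = [term_corpus[k] for k in keywords if k in term_corpus]
        # S13: count how often each category occurs in this segment.
        counts = Counter(categories)
        # S14: assign the segment to its most frequent category.
        result[seg_id] = counts.most_common(1)[0][0] if counts else None
    return result


# S11 output for three hypothetical segments.
segments = {
    1: ["turbine", "grid", "dosage"],
    2: ["antibody", "dosage", "reactor"],
    3: ["weather"],
}
labels = classify_segments(segments, TERM_CORPUS)
```

Segment 1 has two energy terms and one medicine term, so it is labeled `energy`; segment 3 has no corpus match and is left unclassified.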

[0032] Since the number of words in the document to be translated is much smaller than the number of words in ...


Abstract

The invention provides a method for automatic fragmentation and classification of a translated manuscript based on a large-scale terminology corpus. The method comprises: performing word segmentation on the translated manuscript, removing stop words, and obtaining a keyword set; extracting the keywords of each paragraph of the manuscript and establishing a correspondence between each paragraph and the keywords it contains; matching the keywords of the manuscript one by one in the terminology corpus, and taking the industry category attribute of the term matched by each keyword as the industry category attribute of that keyword in each paragraph corresponding to it; determining, according to the correspondence, the industry category attribute that occurs most often in each paragraph; and classifying the paragraph under that attribute. Because the translated manuscript contains far fewer words than the terminology corpus, and the terminology corpus can be looked up in alphabetical order, no pattern matching algorithm is needed when matching keywords in the corpus. Lookup time is therefore greatly reduced, the fragmentation time of the manuscript is shortened, and fragmentation efficiency is improved.
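
The preprocessing and alphabetical lookup described in the abstract might look like the following sketch. It rests on several assumptions: whitespace splitting stands in for real word segmentation (a Chinese manuscript would need a proper segmenter), binary search over a sorted list stands in for whatever alphabetical index the corpus actually uses, and the stop-word list, the sample terms, and the names `extract_keywords` and `lookup_category` are all hypothetical.

```python
import bisect

STOP_WORDS = {"the", "a", "of", "is", "and"}  # hypothetical stop-word list

# Hypothetical corpus stored as parallel lists, sorted alphabetically by term.
TERMS = ["antibody", "dosage", "grid", "reactor", "turbine"]
CATEGORIES = ["medicine", "medicine", "energy", "energy", "energy"]


def extract_keywords(paragraph):
    """Split on whitespace, lowercase, strip punctuation, drop stop words."""
    words = (w.strip(".,;:!?").lower() for w in paragraph.split())
    return [w for w in words if w and w not in STOP_WORDS]


def lookup_category(keyword):
    """Binary-search the sorted term list; return None if the term is absent."""
    i = bisect.bisect_left(TERMS, keyword)
    if i < len(TERMS) and TERMS[i] == keyword:
        return CATEGORIES[i]
    return None


keywords = extract_keywords("The dosage of the antibody is stable.")
matched = [(k, lookup_category(k)) for k in keywords]
```

Each lookup takes a number of comparisons logarithmic in the size of the corpus, so the cost is driven by the small number of manuscript keywords rather than by a scan of the whole corpus.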

Description

Technical field

[0001] The invention relates to the field of document division, and in particular to a method for automatic fragmentation and classification of translated manuscripts based on a large-scale terminology corpus.

Background technique

[0002] At present, the production of a corpus in the prior art generally includes the following processes:

[0003] Collection of corpus: corpus material can come from national standards, industry standards and other standard documents, and can also come from officially published dictionaries, encyclopedias, periodicals, teaching materials, newspapers and other reference books, as well as related documents published on authoritative websites; it can also be obtained through other terminology corpus networks, exchanged corpus data, record carriers, etc.

[0004] Standardization processing: According to the established standard format or rules, the corpus obtained from various sources is initially processed. For example, the duplicate checking of corpus, the unified convers...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F17/30

Inventor: 江潮

Owner: IOL WUHAN INFORMATION TECH CO LTD
