
Method for standardizing Chinese and English hybrid texts in Chinese social networks

A technique for standardizing mixed Chinese and English texts in social networks, addressing the problem that English text error-correction methods do not apply to Chinese text.

Inactive Publication Date: 2014-10-15
FUDAN UNIV

AI Technical Summary

Problems solved by technology

[0006] Unlike English, Chinese often shows more variation in form, pronunciation, and word combination. Much of the existing work on English text error correction is not applicable to Chinese, which makes the standardization of Chinese text more challenging.



Examples


Embodiment Construction

[0017] For Chinese-English mixed texts in Chinese social networks, the standardization of English words is carried out in three parts.

[0018] 1. Non-standard word recognition

[0019] Construct an English-Chinese electronic dictionary by manual web crawling. The dictionary contains most English words and their corresponding Chinese translations. With this dictionary, the target texts, that is, user posts in Chinese social networks that contain English words, can be effectively identified.
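
The dictionary-based screening described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the toy dictionary `EN_ZH_DICT`, the helper names, and the rule that a run of ASCII letters counts as one English token are all hypothetical.

```python
import re

# Hypothetical toy dictionary: English word -> candidate Chinese translations.
# The dictionary in the source is built by web crawling and is far larger.
EN_ZH_DICT = {
    "fashion": ["时尚", "流行"],
    "hold": ["控制", "拿住"],
    "happy": ["开心", "高兴"],
}

# Assumption: a run of ASCII letters is treated as one English token.
ENGLISH_TOKEN = re.compile(r"[A-Za-z]+")
# Any character in the basic CJK Unified Ideographs block.
CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")


def find_nonstandard_words(text, dictionary=EN_ZH_DICT):
    """Return the dictionary English words embedded in a Chinese text, in order."""
    if not CJK_CHAR.search(text):
        return []  # no Chinese characters, so this is not a mixed text
    hits = []
    for match in ENGLISH_TOKEN.finditer(text):
        word = match.group().lower()
        if word in dictionary:
            hits.append(word)
    return hits


def is_target_text(text, dictionary=EN_ZH_DICT):
    """A target text is a Chinese post containing at least one dictionary word."""
    return bool(find_nonstandard_words(text, dictionary))
```

For example, `find_nonstandard_words("今天穿得很fashion")` picks out `fashion`, while a purely English or purely Chinese post is not treated as a target text.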

[0020] 2. Generate Chinese translation candidates for the non-standard English words

[0021] Traditional machine translation methods generate a bilingual alignment probability table from a bilingual aligned corpus. The table contains the alignment probability of each source-language word with each target-language word. Here, because a bilingual aligned training corpus is lacking in the social network semantic space, we can generate the alignmen...
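
To make the notion of an alignment probability table concrete, here is a minimal sketch that estimates one by relative frequency from word pairs that are assumed to be already aligned. It illustrates only the traditional table mentioned above, not the topic-based method of the invention; the function names and the toy pairs are hypothetical.

```python
from collections import Counter, defaultdict


def build_alignment_table(aligned_pairs):
    """Estimate P(target | source) by relative frequency.

    aligned_pairs: iterable of (source_word, target_word) tuples,
    assumed to come from an existing word-aligned bilingual corpus.
    """
    counts = defaultdict(Counter)
    for src, tgt in aligned_pairs:
        counts[src][tgt] += 1

    table = {}
    for src, tgt_counts in counts.items():
        total = sum(tgt_counts.values())
        table[src] = {tgt: n / total for tgt, n in tgt_counts.items()}
    return table


def top_candidates(table, src, k=3):
    """Return up to k (target_word, probability) pairs, most probable first."""
    ranked = sorted(table.get(src, {}).items(),
                    key=lambda item: (-item[1], item[0]))
    return ranked[:k]
```

Each row of the table sums to 1 over the target words seen with that source word, and a source word that never appears in the aligned pairs yields no candidates.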



Abstract

The invention belongs to the technical field of machine translation and discloses a method for standardizing mixed Chinese and English texts in Chinese social networks. The method comprises the steps of identifying non-standard words; generating candidate translation words for the English words by means of a hidden topic translation model; and re-ranking the candidate translation words by means of a neural network language model that takes the user's historical information into account, then selecting the standard word corresponding to each non-standard word. The method has the following advantages: it preprocesses network texts so that they become suitable for most natural language processing tasks; and it uses topic mapping to relate bilingual aligned training corpora from non-social-network semantic spaces to the semantic space of social networks, so that the method scales well while translation accuracy is maintained.
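
The re-ranking step can be sketched as a weighted combination of a candidate's translation probability and a score derived from the user's history. The sketch below is an assumption-laden illustration: it swaps the neural network language model named in the abstract for a much simpler add-one-smoothed unigram model over the user's past tokens, and the function name, the `weight` parameter, and the log-linear combination are all hypothetical.

```python
import math
from collections import Counter


def rerank(candidates, user_history, weight=0.5):
    """Re-rank translation candidates using the user's past tokens.

    candidates:   list of (word, translation_probability), probabilities > 0
    user_history: list of tokens from the same user's earlier posts
    weight:       share of the score given to the user-history model

    Stand-in model: add-one-smoothed unigram frequency, NOT the neural
    network language model described in the abstract.
    """
    history = Counter(user_history)
    vocab = set(history) | {word for word, _ in candidates}
    total = sum(history.values()) + len(vocab)

    def score(item):
        word, p_trans = item
        p_user = (history[word] + 1) / total
        return (1 - weight) * math.log(p_trans) + weight * math.log(p_user)

    return sorted(candidates, key=score, reverse=True)
```

With `weight=0` the ranking falls back to the translation probabilities alone; a larger `weight` favours words the user has written before.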

Description

Technical field

[0001] The invention belongs to the technical field of machine translation, and in particular relates to a method for standardizing mixed Chinese and English texts in Chinese social networks.

Background technique

[0002] In recent years, with the advancement of technology, more and more people have started to use the Internet. People browse and publish information on the Internet, and major websites receive a large amount of user-submitted information every day. Much natural language processing work has begun to focus on network texts. By analyzing network texts, information such as user clusters, users' emotional tendencies, and user preferences can be obtained. This massive amount of information is of great value.

[0003] Social networks are among the places where users generate the most information. Over the past two decades, many well-known social networks have emerged at home and abroad. Massive user ...


Application Information

IPC(8): G06F17/28
Inventor: 陈欢, 张奇, 黄萱菁
Owner: FUDAN UNIV