Chinese machine annotation method for English web pages

An English-to-Chinese annotation technology, applied in instruments, electrical digital data processing, computing, etc. It addresses problems such as unusability in official documents, no obvious improvement in reading efficiency, and damage to the logic of the original paragraph, and it achieves accurate translation and good readability.

Pending Publication Date: 2020-01-31
吕海港
2 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

English contains many long sentences, so the efficiency of comparative reading is still not significantly improved. This splitting also destroys the logic of the original paragraph, so it cannot be used in official documents.

Examples

Embodiment 1

[0017] Example 1. Chinese machine annotation of English web pages.

[0018] Download gpl.html from https://www.gnu.org/licenses/gpl.html as the original English web page. Every HTML document declares its own encoding. Because the Chinese annotations use Chinese characters, the encoding of the original web page must be changed to UTF-8 so that the inserted Chinese characters display without garbling. That is, set " ".
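
The encoding change described in [0018] can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the helper name `to_utf8_html`, the `source_encoding` parameter, and the `<meta charset="UTF-8">` declaration are all assumptions, since the exact setting is not shown in the text above.

```python
import re

def to_utf8_html(raw: bytes, source_encoding: str) -> bytes:
    """Decode an HTML page from its original encoding and re-encode it as UTF-8.

    Hypothetical helper: the caller must supply the page's original encoding.
    """
    text = raw.decode(source_encoding)
    # Remove any existing charset declaration so that only one remains.
    text = re.sub(r'<meta[^>]*charset[^>]*>', '', text, flags=re.IGNORECASE)
    # Assumed declaration; the tag used in the patent is not shown in the source.
    meta = '<meta charset="UTF-8">'
    head = re.compile(r'(<head(?:\s[^>]*)?>)', re.IGNORECASE)
    if head.search(text):
        # Insert the declaration directly after the opening <head> tag.
        text = head.sub(lambda m: m.group(1) + meta, text, count=1)
    else:
        # No <head> element found: prepend the declaration.
        text = meta + text
    return text.encode('utf-8')
```

Declaring the encoding inside the page keeps the file self-describing, so the inserted Chinese characters do not depend on server headers to render correctly.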

[0019] We take the following long paragraph as an example for Chinese annotation and explanation.

[0020] The licenses for most software and other practical works are designed to take away your freedom to share and change the works. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change all versions of a program--to make sure it remains free software for all its users. We, the Free Software Foundation, use the GNU General Public License for most of our software; it appli...

Embodiment 2

[0034] Embodiment 2. Determination of the optimal length of English semantic block.

[0035] English words and common phrases are 1-4 words long. Translating word by word or phrase by phrase amounts to point-to-point translation: without the background information of each word or phrase, machine translation performs very poorly and the result is inaccurate. In bilingual annotation, each Chinese segment is also too short to express a relatively complete meaning, so the Chinese reads poorly. The shortest semantic block is therefore 5 words.

[0036] Overly long semantic blocks increase the time readers need to find the Chinese meaning of a word. According to research in cognitive psychology, the capacity of human short-term memory is 7±2 items, with an upper limit of 9. This embodiment uses the paragraph from Embodiment 1, which has 92 words in total. Among them, there are 37 words that are simple or have no actual ...
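
The length constraints discussed in this embodiment can be illustrated with a rough Python sketch. It is an assumption-laden stand-in: the patent segments "according to grammar", whereas this sketch only cuts at punctuation; the function name `split_semantic_blocks` is hypothetical; the 5-word minimum comes from [0035] and the 15-word default maximum from the Abstract. The minimum is best-effort only, so a trailing remainder may be shorter than 5 words.

```python
import re

def split_semantic_blocks(text: str, min_words: int = 5, max_words: int = 15) -> list[list[str]]:
    """Greedy, punctuation-based approximation of semantic-block segmentation."""
    # Cut after clause-level punctuation; a real system would use a parser.
    pieces = re.split(r'(?<=[,;:.!?])\s+', text.strip())
    clauses = [p.split() for p in pieces if p.split()]

    blocks: list[list[str]] = []
    current: list[str] = []
    for clause in clauses:
        # Close the current block once it is long enough and the next
        # clause would push it past the maximum length.
        if len(current) >= min_words and len(current) + len(clause) > max_words:
            blocks.append(current)
            current = []
        current = current + clause
        # Hard-split runs that are still longer than the maximum.
        while len(current) > max_words:
            blocks.append(current[:max_words])
            current = current[max_words:]
    if current:
        # Merge a short tail into the previous block when it fits.
        if blocks and len(current) < min_words and len(blocks[-1]) + len(current) <= max_words:
            blocks[-1] = blocks[-1] + current
        else:
            blocks.append(current)
    return blocks
```

Every word of the input appears exactly once and in order across the returned blocks, and no block exceeds `max_words`.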

Embodiment 3

[0038] Example 3. Web page layout with different proportions of Chinese and English characters.

[0039] Because a Chinese-annotated web page displays two kinds of characters in parallel, the ratio of Chinese to English character size changes how the page is displayed, so the best ratio of Chinese to English character size must be found.

[0040] The English web page is annotated in Chinese using the web page content and annotation method of Embodiment 1, with the Chinese-to-English character size ratio set to 50%, 70%, 100%, and 120% in turn, yielding HTML pages with different bilingual layouts, as shown in Figure 2. From the page rendering in the Chrome browser, it can be seen that when the size ratio is 50% (201), the Chinese characters are too small to read; when the ratio is 120% (204), the Chinese characters are too large, obviously exceeding the English letters and affecting normal English re...
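
One way to produce the page variants compared in this embodiment is to set the annotation font size relative to the English base text. The sketch below assumes the ratio is applied through a CSS rule on the `rt` element; the source does not state how the ratio is set, and the function name `ruby_css` is hypothetical.

```python
def ruby_css(ratio_percent: int) -> str:
    """Return a CSS rule that sizes ruby annotation text relative to its base text."""
    if ratio_percent <= 0:
        raise ValueError("ratio_percent must be positive")
    # A percentage font-size on <rt> is resolved against the enclosing <ruby> text.
    return f"rt {{ font-size: {ratio_percent}%; }}"

# The four ratios tested in the embodiment, one stylesheet rule per variant.
variants = {ratio: ruby_css(ratio) for ratio in (50, 70, 100, 120)}
```

Each rule can be placed in a `<style>` element of the corresponding test page to compare the layouts side by side.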

Abstract

To enable readers to read English web pages quickly and accurately, the invention provides a Chinese machine annotation method for English web pages. The method comprises the following steps: segmenting each English sentence into semantic blocks of 5-15 words according to grammar, machine-translating each semantic block, and distributing the translated text above the English words using Ruby tags to generate a bilingual English web page. The resulting page helps Chinese readers read English web pages smoothly and understand the English meaning more accurately.
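
The final step, placing the translated text above the English words with Ruby tags, might look like the following Python sketch. It takes an already translated block as input, because no specific translation engine is named in the text. Spreading the Chinese characters evenly across the words is an assumption made for illustration, and `annotate_block` is a hypothetical name.

```python
from html import escape

def annotate_block(words: list[str], translation: str) -> str:
    """Wrap each English word in <ruby>, spreading the block's translation over the words."""
    n = len(words)
    if n == 0:
        return ""
    # Split the translation into n nearly equal chunks (assumed distribution rule).
    base, extra = divmod(len(translation), n)
    parts = []
    pos = 0
    for i, word in enumerate(words):
        size = base + (1 if i < extra else 0)
        chunk = translation[pos:pos + size]
        pos += size
        # Escape both texts so that markup characters in the page stay literal.
        parts.append(f"<ruby>{escape(word)}<rt>{escape(chunk)}</rt></ruby>")
    return " ".join(parts)
```

For example, `annotate_block(["free", "software"], "自由软件")` returns `<ruby>free<rt>自由</rt></ruby> <ruby>software<rt>软件</rt></ruby>`. Words that receive no characters get an empty `<rt>`, which keeps the line height uniform.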

Description

Technical field

[0001] The invention relates to the field of natural language processing, in particular to a Chinese machine annotation method for English web pages.

Background art

[0002] With the rapid development of information technology and Internet technology, information of all kinds is mainly stored and disseminated as electronic documents, such as HTML pages, Word documents, text documents, and PDF documents; HTML pages in particular are the main form in which the public browses and reads online. At present, there are more than 1 trillion web pages on the Internet, of which only 12% are in Chinese while 80% are in English, which is a huge treasure trove of information. With the deepening of reform and opening up and China's deep integration into the world, English HTML pages have become a source of information that Chinese readers increasingly need to read directly. However, English is not the mother tongue of Chinese people, and...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/253, G06F40/289, G06F40/284, G06F40/169, G06F40/58
Inventor: name not disclosed (不公告发明人)
Owner: 吕海港