Chinese machine annotation method for English web pages
A technology involving English and Chinese, applied in instruments, electrical digital data processing, computing, etc. It can solve problems such as existing approaches being unusable, bringing no obvious improvement in reading efficiency, and damaging the logic of the original paragraph, and it achieves accurate translation and good readability.
Examples
Embodiment 1
[0017] Embodiment 1. Chinese machine annotation of English web pages.
[0018] Download gpl.html from https://www.gnu.org/licenses/gpl.html as the original English web page. Every HTML document has its own character encoding. Because the Chinese annotations use Chinese characters, the encoding of the original web page must be changed to UTF-8 so that the inserted Chinese characters display correctly instead of as garbled text. That is, set " ".
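The re-encoding step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `to_utf8_html` is hypothetical, the caller is assumed to know the page's original encoding, and the HTML5 form `<meta charset="UTF-8">` is assumed as the declaration, since the exact tag used in the source is not preserved. The regular expressions only handle simple, well-formed `<meta>` and `<head>` tags; a production tool would use an HTML parser.

```python
import re

META_UTF8 = '<meta charset="UTF-8">'  # assumed HTML5-style declaration

# Matches simple charset declarations, e.g. <meta charset="iso-8859-1"> or
# <meta http-equiv="Content-Type" content="text/html; charset=gbk">.
CHARSET_META = re.compile(
    r"""<meta\s[^>]*charset\s*=\s*["']?[\w-]+["']?[^>]*>""", re.IGNORECASE
)


def to_utf8_html(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode an HTML page as UTF-8 and make its charset declaration say so."""
    text = raw.decode(source_encoding)  # decode with the page's original encoding
    # Replace the first existing charset declaration, if any.
    text, count = CHARSET_META.subn(META_UTF8, text, count=1)
    if count == 0:
        # No declaration found: insert one right after the opening <head> tag.
        text, count = re.subn(
            r"<head\b[^>]*>", lambda m: m.group(0) + META_UTF8, text,
            count=1, flags=re.IGNORECASE,
        )
    if count == 0:
        text = META_UTF8 + text  # no <head> either: prepend as a fallback
    return text.encode("utf-8")
```

Once the page is UTF-8, Chinese annotations can be inserted into the decoded text without producing garbled characters.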
[0019] We use the following long paragraph as an example of Chinese annotation and explanation.
[0020] The licenses for most software and other practical works are designed to take away your freedom to share and change the works. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change all versions of a program--to make sure it remains free software for all its users. We, the Free Software Foundation, use the GNU General Public License for most of our software; it appli...
Embodiment 2
[0034] Embodiment 2. Determination of the optimal length of English semantic block.
[0035] English words and common phrases are 1 to 4 words long. Translating word by word or phrase by phrase lacks the surrounding context of each word or phrase and amounts to point-to-point translation, so machine translation performs poorly and the result is inaccurate. In bilingual annotation, each Chinese segment is then too short to express a reasonably complete meaning, and the Chinese reads poorly. The shortest semantic block is therefore 5 words.
[0036] Overly long semantic blocks increase the time readers need to find the Chinese meaning of a word. According to research in cognitive psychology, the span of human short-term memory is 7±2 items, with an upper limit of 9. This embodiment uses the paragraph from Embodiment 1, which has 92 words in total. Among them, there are 37 words that are simple or have no actual ...
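The two bounds stated in this embodiment (at least 5 words, at most 9) can be illustrated with a simple splitter. This is an assumed sketch, not the segmentation method of the patent: the name `split_into_blocks` and the even-split strategy are hypothetical, and the sketch ignores phrase boundaries and the simple words mentioned above, which a real segmenter would take into account.

```python
import math

MIN_BLOCK = 5  # shortest semantic block, per Embodiment 2
MAX_BLOCK = 9  # upper limit of the 7 ± 2 short-term memory span


def split_into_blocks(words: list[str]) -> list[list[str]]:
    """Split words into consecutive blocks of MIN_BLOCK..MAX_BLOCK words.

    Hypothetical even-split strategy: use the fewest blocks that keep every
    block at or below MAX_BLOCK, then spread the words evenly across them.
    For 5 or more words every block lands in [MIN_BLOCK, MAX_BLOCK];
    a text shorter than MIN_BLOCK stays a single block.
    """
    n = len(words)
    if n == 0:
        return []
    k = math.ceil(n / MAX_BLOCK)  # fewest blocks with none longer than MAX_BLOCK
    base, extra = divmod(n, k)    # the first `extra` blocks get one extra word
    blocks, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        blocks.append(words[start:start + size])
        start += size
    return blocks
```

On a 92-word paragraph such as the one in Embodiment 1, this sketch yields 11 blocks: four of 9 words and seven of 8 words. Taking the fewest blocks keeps each block near the upper limit, so fewer annotations are inserted while every block stays within the memory span.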
Embodiment 3
[0038] Embodiment 3. Web page layout with different Chinese-to-English character size ratios.
[0039] Because a Chinese-annotated web page displays two scripts side by side, the size ratio between Chinese and English characters changes how the page looks, so the best ratio must be determined.
[0040] The English web page from Embodiment 1 is annotated in Chinese using the same annotation method, with the Chinese-to-English character size ratio set to 50%, 70%, 100% and 120% respectively, producing the bilingual HTML pages shown in Figure 2. Viewed in the Chrome browser, at a ratio of 50% (201) the Chinese characters are too small to read; at 120% (204) the Chinese characters are too large, clearly exceeding the English letters and affecting normal English re...
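To reproduce this comparison, one annotated page can be generated per tested ratio. The sketch below assumes each English semantic block is paired with its Chinese translation and rendered with HTML `<ruby>`/`<rt>` markup, where a percentage `font-size` on `<rt>` is relative to the surrounding English text. The markup choice and the names `annotate_block` and `annotated_page` are hypothetical; the text here does not specify how the Chinese is positioned on the page.

```python
import html


def annotate_block(english: str, chinese: str, ratio_percent: int) -> str:
    """Render one English semantic block with its Chinese annotation.

    Hypothetical markup: the Chinese goes in <rt>, whose percentage
    font-size is relative to the surrounding English text.
    """
    return (
        f"<ruby>{html.escape(english)}"
        f'<rt style="font-size: {ratio_percent}%">{html.escape(chinese)}</rt>'
        "</ruby>"
    )


def annotated_page(pairs: list[tuple[str, str]], ratio_percent: int) -> str:
    """Build a complete UTF-8 page from (English block, Chinese annotation) pairs."""
    body = " ".join(annotate_block(en, zh, ratio_percent) for en, zh in pairs)
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="UTF-8"></head>'
        f"<body><p>{body}</p></body></html>"
    )
```

Calling `annotated_page(pairs, r)` for `r` in 50, 70, 100 and 120 gives four pages that differ only in the annotation size, so they can be compared side by side in a browser.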