Chinese and English literature author name fusion disambiguation method

A method for fusion disambiguation of Chinese and English author names, classified under instrumentation, computing, and electrical digital data processing. It saves training resources, improves disambiguation accuracy, and avoids a model-training process.

Active Publication Date: 2022-04-12
中科大数据研究院
Cites: 4; Cited by: 0

AI Technical Summary

Problems solved by technology

[0008] To address the defects and problems in distinguishing Chinese and English author names, the present invention provides a method for disambiguating Chinese and English author names.

Examples

Embodiment 1

[0110] Embodiment 1: This embodiment provides a fusion disambiguation method for author names in Chinese and English documents. It comprises three parts: Chinese author name disambiguation, English author name disambiguation, and disambiguation between Chinese author names and the pinyin names used in English documents, as follows.

[0111] 1. Chinese author name disambiguation, as shown in Figure 1, includes the following steps:

[0112] Step 1. Author name cleaning: remove symbols from the author name (including spaces, semicolons, commas, etc.), then use the Hundred Family Surnames list to normalize the name into surname + given name order; for example, "Chonglin" is converted to "Lin Chong".
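The cleaning step above might be sketched as follows. This is a minimal illustration, not the patent's implementation: the `SURNAMES` set is a tiny hypothetical subset of the Hundred Family Surnames, the function name `clean_author_name` is invented here, and the "prefix first, then suffix" rule is an assumed heuristic for deciding when to swap.

```python
import re

# Hypothetical, tiny subset of the Hundred Family Surnames (assumption);
# a real list would contain several hundred single and compound surnames.
SURNAMES = {"林", "张", "王", "李", "欧阳", "司马"}

# Spaces, semicolons, commas, and similar separators (ASCII and full-width).
_SYMBOLS = re.compile(r"[\s;；,，.。·、]+")


def clean_author_name(raw: str) -> str:
    """Strip symbols and return the name in surname + given name order."""
    name = _SYMBOLS.sub("", raw)
    # Already surname-first: try compound (2-char) surnames before 1-char ones.
    for n in (2, 1):
        if len(name) > n and name[:n] in SURNAMES:
            return name
    # Surname found at the end: move it to the front.
    for n in (2, 1):
        if len(name) > n and name[-n:] in SURNAMES:
            return name[-n:] + name[:-n]
    # No known surname: leave the cleaned name unchanged.
    return name


print(clean_author_name("冲 林"))   # given name first, with a stray space
print(clean_author_name("张, 伟;"))  # surname first, with stray punctuation
```

Names whose first and last characters are both surnames stay ambiguous under this heuristic; the sketch simply prefers the existing order in that case.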

[0113] Step 2. Author institution cleaning: normalize each author's institution to the main name of the institution; for example, "xx hospital xx department" becomes "xx hospital", "xx university xx college" becomes "xx university", etc...
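One way to approximate the institution cleaning described above is to cut the string after the first top-level institution keyword. The keyword list `MAIN_SUFFIXES` and the function name `clean_institution` are assumptions made for this sketch; the source does not say how the main name is detected.

```python
import re

# Hypothetical keywords that mark the end of a main institution name
# (university, hospital, research academy, research institute).
MAIN_SUFFIXES = ("大学", "医院", "研究院", "研究所")

# Non-greedy match up to and including the first main-institution keyword.
_MAIN = re.compile("^(.*?(?:" + "|".join(MAIN_SUFFIXES) + "))")


def clean_institution(raw: str) -> str:
    """Reduce 'xx university xx college' style strings to 'xx university'."""
    text = raw.strip()
    match = _MAIN.match(text)
    # Fall back to the stripped input when no keyword is present.
    return match.group(1) if match else text


print(clean_institution("北京大学计算机学院"))  # university + college
print(clean_institution("协和医院心内科"))      # hospital + department
```

A keyword cut like this is deliberately crude; affiliated hospitals of universities, for example, would need extra rules that this excerpt does not describe.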

Abstract

The invention belongs to the technical field of name disambiguation and relates in particular to a method for disambiguating author names in Chinese and English literature. The method disambiguates Chinese author names and English author names separately, based on semantic fingerprints, author collaboration network similarity, author citation network similarity, and similar features, and then uses the Chinese and English disambiguation results to match Chinese authors with the pinyin names that appear in English literature. The method can accurately determine whether the authors of different documents are the same person and can recognize the same author across Chinese and English, so the author being sought can be located quickly and with high accuracy, which makes retrieval more convenient. It also introduces a similarity measure over the authors' research duration, which assists disambiguation between the Chinese and English names of Chinese authors: it determines an author's age range and filters out same-name authors who fall outside that range, improving disambiguation accuracy.
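This excerpt does not give the patent's similarity formulas, so the following is only an assumed illustration of two of the signals named in the abstract: collaboration network similarity, sketched here as Jaccard similarity over co-author sets, and research-duration similarity, sketched as the overlap ratio of two publication-year ranges. Both function names and both formulas are stand-ins chosen for this example.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two co-author sets (assumed stand-in measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def duration_overlap(span_a: tuple, span_b: tuple) -> float:
    """Overlap ratio of two inclusive (first_year, last_year) ranges."""
    (a0, a1), (b0, b1) = span_a, span_b
    inter = min(a1, b1) - max(a0, b0) + 1  # shared years, may be negative
    union = max(a1, b1) - min(a0, b0) + 1  # total years spanned
    return max(inter, 0) / union


# Two same-name records: co-author sets and active publication years.
coauthors_zh = {"A", "B", "C"}
coauthors_en = {"B", "C", "D"}
print(jaccard(coauthors_zh, coauthors_en))
print(duration_overlap((2005, 2014), (2010, 2019)))
print(duration_overlap((1980, 1990), (2010, 2020)))
```

A duration overlap of zero, as in the last call, is the kind of signal that could be used to filter out a same-name author whose active years fall outside the expected range.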

Description

Technical field

[0001] The invention belongs to the technical field of name disambiguation, and in particular relates to a method for disambiguating the names of authors in Chinese and English documents.

Background

[0002] With the rapid development of the Internet, large numbers of scientific documents such as papers and patents continue to emerge. When we retrieve useful information from this mass of documents, we often search by an author's name to find all of the literature that author has published. During the search, however, we find that many authors share the same name, and it is difficult to quickly locate the author we are looking for, which hampers our work.

[0003] Ambiguity in author names has long been a problem in the literature. The main problems are as follows:

[0004] 1. Chinese author names are ambiguous. For example: "Zhang Wei", there...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/215, G06F16/2458
Inventor: 贾士杨, 冯凯, 王元卓, 彭亮
Owner: 中科大数据研究院