Author disambiguation method based on incremental learning

An author disambiguation technique based on incremental learning, applied to citation records. It addresses the high computational overhead of batch updates, the system quickly becoming outdated, and batch updates discarding existing cluster results, and it achieves accurate classification, reduced fragmentation, and a lower amount of calculation.

Active Publication Date: 2019-11-01
CENT SOUTH UNIV

AI Technical Summary

Problems solved by technology

The batch-update solution is computationally expensive. After any one update the system may become outdated again immediately, and a batch update usually discards any existing cluster results.


Embodiment Construction

[0058] The following is a detailed description of the embodiments of the present invention. This embodiment is carried out based on the technical solution of the present invention, and provides detailed implementation methods and specific operation processes to further explain the technical solution of the present invention.

[0059] this invention

[0060] Step 1, obtain historical citation records;

[0061] The existing historical citation records are obtained from a database. A historical citation record is the brief entry that represents a document in the results of a database search, and it includes the document's author information. All historical citation records have known cluster labels; that is, they have already been processed by author disambiguation and divided into clusters corresponding to individual authors, so the correct individual author of each record is known. When dividing th...
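The labelled history described in step 1 can be pictured as a list of records grouped by cluster label. The following is a minimal sketch only: the field names (`title`, `coauthors`, `label`) and the toy records are assumptions made for illustration, since the source states only that each record carries author information and a known cluster label.

```python
from collections import defaultdict

# Toy history invented for illustration; "label" is the known cluster label,
# i.e. the individual author the record was already assigned to.
history = [
    {"title": "graph neural networks", "coauthors": ["wang"], "label": "A"},
    {"title": "graph clustering methods", "coauthors": ["zhao"], "label": "A"},
    {"title": "protein folding dynamics", "coauthors": ["chen"], "label": "B"},
]

def group_by_label(records):
    """Return {cluster label: [records]} so each cluster is one individual author."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[rec["label"]].append(rec)
    return dict(clusters)

clusters = group_by_label(history)
sizes = {label: len(recs) for label, recs in clusters.items()}
print(sizes)
```

The per-cluster record counts computed here are the quantity the method later uses to decide how each cluster is handled.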

Abstract

The invention discloses an author disambiguation method based on incremental learning. The method comprises the following steps: obtaining historical citation records, where the historical citation records have known cluster labels and different cluster labels represent different individual authors; determining, according to the number of historical citation records it contains, whether each cluster is a first-type cluster or a second-type cluster, and, for the first-type clusters (those with a large number of records), training a corresponding naive Bayes classifier using the feature vectors and cluster labels of the historical citation records; and screening out candidate clusters for a new citation record and classifying the new record case by case according to the types of the candidate clusters. The classification uses the naive Bayes classifier to calculate membership probabilities, uses co-author similarity as a supplementary check on the probability-based classification, and calculates semantic similarity with the second-type clusters, for which the naive Bayes classifier cannot be used for probability classification. The method achieves good author disambiguation with low computational overhead.
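The flow in the abstract can be sketched roughly as below. This is an illustration under several assumptions, not the patented procedure: the size threshold `min_records`, the probability and similarity cut-offs `p_min` and `s_min`, the use of title words as features, Jaccard overlap as the co-author similarity, cosine similarity as the semantic similarity, and a single multi-class naive Bayes model in place of per-cluster classifiers are all choices made here because the visible text does not specify them. Candidate screening is omitted, and the records are toy data.

```python
import math
from collections import Counter, defaultdict

def words(rec):
    return rec["title"].lower().split()

def split_clusters(history, min_records=3):
    """Group records by label; clusters with >= min_records are first-type (assumed rule)."""
    clusters = defaultdict(list)
    for rec in history:
        clusters[rec["label"]].append(rec)
    first = {k: v for k, v in clusters.items() if len(v) >= min_records}
    second = {k: v for k, v in clusters.items() if len(v) < min_records}
    return first, second

def nb_posteriors(first, rec):
    """Multinomial naive Bayes with add-one smoothing over title words."""
    vocab = {w for recs in first.values() for r in recs for w in words(r)}
    total = sum(len(recs) for recs in first.values())
    log_scores = {}
    for label, recs in first.items():
        counts = Counter(w for r in recs for w in words(r))
        n = sum(counts.values())
        score = math.log(len(recs) / total)
        for w in words(rec):
            if w in vocab:  # words never seen in training are ignored
                score += math.log((counts[w] + 1) / (n + len(vocab)))
        log_scores[label] = score
    top = max(log_scores.values())
    z = sum(math.exp(s - top) for s in log_scores.values())
    return {k: math.exp(s - top) / z for k, s in log_scores.items()}

def coauthor_sim(rec, recs):
    """Jaccard overlap between the record's co-authors and the cluster's (assumed measure)."""
    a, b = set(rec["coauthors"]), {c for r in recs for c in r["coauthors"]}
    return len(a & b) / len(a | b) if a | b else 0.0

def cosine_sim(rec, recs):
    """Bag-of-words cosine similarity, standing in for the semantic similarity."""
    a, b = Counter(words(rec)), Counter(w for r in recs for w in words(r))
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def assign(rec, history, p_min=0.8, s_min=0.3):
    """Return the label of an existing cluster, or None to start a new cluster."""
    first, second = split_clusters(history)
    if first:
        post = nb_posteriors(first, rec)
        best = max(post, key=post.get)
        # Co-author overlap acts as a supplementary check when the probability is low.
        if post[best] >= p_min or coauthor_sim(rec, first[best]) > 0:
            return best
    if second:
        best = max(second, key=lambda k: cosine_sim(rec, second[k]))
        if cosine_sim(rec, second[best]) >= s_min:
            return best
    return None

# Toy history invented for illustration.
HISTORY = [
    {"title": "graph neural networks", "coauthors": ["wang"], "label": "A"},
    {"title": "graph clustering methods", "coauthors": ["wang", "zhao"], "label": "A"},
    {"title": "citation graph analysis", "coauthors": ["zhao"], "label": "A"},
    {"title": "protein folding dynamics", "coauthors": ["chen"], "label": "B"},
    {"title": "protein structure prediction", "coauthors": ["chen"], "label": "B"},
    {"title": "folding energy landscapes", "coauthors": ["liu"], "label": "B"},
    {"title": "quantum error correction codes", "coauthors": ["kim"], "label": "C"},
]

new = {"title": "graph analysis of citation networks", "coauthors": ["li"]}
print(assign(new, HISTORY))  # A
```

Existing clusters are never rebuilt: a new record is either attached to one of them or left to start a new cluster, which is the point of the incremental approach. For brevity the sketch recomputes the naive Bayes counts on every call; a real system would presumably keep them and update only the cluster that receives the record.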

Description

Technical field

[0001] The invention relates to the technical field of data processing, in particular to an author disambiguation method based on incremental learning.

Background technique

[0002] With the development of computer technology, people are increasingly accustomed to obtaining information on the Internet, and libraries have evolved into digital libraries accordingly. Researchers can comprehensively obtain literature and scholar information through a digital library. However, author name ambiguity has always been an obstacle to effective information retrieval in digital libraries. It leads to author identification errors, and the problem is particularly serious for common names. These problems also make it difficult to evaluate the performance of researchers, because the correct author cannot be reliably confirmed, which will increase additional financial and material re...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35, G06F16/38, G06F16/33, G06K9/62
CPC: G06F16/35, G06F16/382, G06F16/3347, G06F18/22, G06F18/24155, G06F18/214
Inventor: 龙军唐柳黄文体魏志
Owner: CENT SOUTH UNIV