Knowledge base entity normalization method, system, terminal and computer-readable storage medium

A knowledge base entity normalization technology, applied in the field of database construction. It addresses problems such as the complexity and difficulty of knowledge base construction, large differences in the form of entity data, and the inability of a single classification scheme to cover all normalization cases.

Active Publication Date: 2021-07-13
BEIJING BAIDU NETCOM SCI & TECH CO LTD

AI Technical Summary

Problems solved by technology

[0002] Knowledge base construction is a very complex and difficult technical problem, so existing methods generally handle only small-scale, single-vertical knowledge bases (millions to tens of millions of entities).
When faced with a large-scale knowledge base (entities on the order of hundreds of millions), they cannot normalize entities efficiently.
In addition, because entity data varies widely in form, a single classification scheme cannot solve all normalization problems, nor can it uniformly and efficiently support the various attributes, categories, and problem scenarios. Existing methods therefore apply special-case processing to knowledge base entities: entities with sparse attribute information are filtered out and left unprocessed, and further processing is applied according to the quality of the entity information.



Examples


Embodiment 1

[0060] This embodiment of the present invention provides a knowledge base entity normalization method. As shown in Figure 1, the method mainly includes the following steps:

[0061] Step S100: Obtain the entity set in the knowledge base.

[0062] The knowledge base may contain millions, tens of millions, or hundreds of millions of entities. A knowledge base at any of these scales may be a Chinese knowledge graph or a single-category or multi-category hybrid knowledge base.

[0063] Step S200: Pre-partition the entity set by combining multiple partitioning methods.

[0064] Here, "multiple partitioning methods" means two or more partitioning methods. Pre-partitioning divides the entity set into multiple groups (or partitions), each containing several entities suspected of being the same entity. The combination of multiple partitioning methods can be understood as each part...
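
As a rough illustration of the pre-partitioning idea in Step S200, the sketch below groups entities under two partitioning keys and treats any two entities that share a group under either key as a suspected-duplicate pair. The entity fields (`name`, `alias`), the two key functions, and the choice to take the union of the groups are assumptions made for illustration only; the source text is truncated before it explains how the methods are actually combined.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical entities; the field names are illustrative, not from the source.
ENTITIES = [
    {"id": "e1", "name": "Andy Lau", "alias": "Liu Dehua"},
    {"id": "e2", "name": "Lau Andy", "alias": ""},
    {"id": "e3", "name": "Liu Tak-wah", "alias": "Liu Dehua"},
    {"id": "e4", "name": "Jay Chou", "alias": "Zhou Jielun"},
]

def key_by_name_tokens(entity):
    """Partitioning method 1 (assumed): order-insensitive, lowercased name tokens."""
    return " ".join(sorted(entity["name"].lower().split()))

def key_by_alias(entity):
    """Partitioning method 2 (assumed): lowercased alias; empty aliases get no key."""
    return entity["alias"].lower() or None

def pre_partition(entities, key_funcs):
    """Return pairs of ids that share a group under at least one key function."""
    pairs = set()
    for key_func in key_funcs:
        groups = defaultdict(list)
        for entity in entities:
            key = key_func(entity)
            if key is not None:
                groups[key].append(entity["id"])
        # Only entities inside the same group become candidate pairs.
        for ids in groups.values():
            pairs.update(combinations(sorted(ids), 2))
    return pairs

candidate_pairs = pre_partition(ENTITIES, [key_by_name_tokens, key_by_alias])
print(sorted(candidate_pairs))
```

In this toy data each key contributes a pair the other misses, which is one plausible reason to combine partitioning methods. Restricting comparisons to pairs inside a group also avoids comparing every entity with every other entity.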

Embodiment 2

[0119] This embodiment of the present invention provides a knowledge base entity normalization system. As shown in Figure 4, the system includes:

[0120] An obtaining module 10, configured to obtain the entity set in the knowledge base;

[0121] A multi-dimensional partition module 20, configured to pre-partition the entity set by combining multiple partitioning methods;

[0122] A sample construction module 30, configured to perform sample construction according to the pre-partitioning result and extract key samples;

[0123] A feature construction module 40, configured to perform feature construction according to the pre-partitioning result and extract similarity features;

[0124] A normalization determination module 50, configured to combine the key samples and similarity features through at least one normalization model, perform a normalization determination on each entity pair in the pre-partitioning result, and determine whether the entities in each pair are the same entity;

[0125] A set division module 6...
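
The flow from feature construction through pairwise determination to set division might look like the following sketch. Everything concrete in it is an assumption: the Jaccard name-token feature, the fixed threshold standing in for a trained normalization model, and the union-find grouping used for set division (the source is truncated before it describes that module). Key-sample extraction is omitted.

```python
def jaccard(a, b):
    """Similarity feature (assumed): Jaccard overlap of lowercased name tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def same_entity(name_a, name_b, threshold=0.5):
    """Stand-in for a normalization model: a fixed threshold on one feature."""
    return jaccard(name_a, name_b) >= threshold

def divide_into_sets(ids, matched_pairs):
    """Group ids into sets via union-find over pairs judged to be the same entity."""
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for a, b in matched_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), set()).add(i)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical names and candidate pairs (as would come from pre-partitioning).
NAMES = {"e1": "Andy Lau", "e2": "Lau Andy", "e3": "Andy Lau Tak-wah", "e4": "Jay Chou"}
CANDIDATES = [("e1", "e2"), ("e1", "e3"), ("e3", "e4")]

matched = [(a, b) for a, b in CANDIDATES if same_entity(NAMES[a], NAMES[b])]
entity_sets = divide_into_sets(NAMES, matched)
print(entity_sets)
```

Union-find is used here because pairwise "same entity" decisions are naturally transitive: if e1 matches e2 and e1 matches e3, all three land in one set even though e2 and e3 were never compared directly.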

Embodiment 3

[0137] This embodiment of the present invention provides a knowledge base entity normalization terminal. As shown in Figure 5, the terminal includes:

[0138] A memory 400 and a processor 500. The memory 400 stores a computer program that can run on the processor 500. When the processor 500 executes the computer program, the knowledge base entity normalization method of the foregoing embodiments is implemented. There may be one or more memories 400 and one or more processors 500.

[0139] A communication interface 600, used for communication between the memory 400 and processor 500 and the outside.

[0140] The memory 400 may include high-speed RAM and may also include non-volatile memory, such as at least one magnetic disk memory.

[0141] If the memory 400, the processor 500, and the communication interface 600 are implemented independently, the memory 400, the processor 500, and the communication interface 600 may be connected to each other through a ...


Abstract

The present invention proposes a knowledge base entity normalization method, system, terminal and computer-readable storage medium. The method includes: obtaining the entity set in the knowledge base; pre-partitioning the entity set using multiple partitioning methods; performing sample construction according to the pre-partitioning results; performing feature construction according to the pre-partitioning results; performing a normalization judgment on each entity pair through at least one normalization model; and performing set division on the results of the normalization judgment. The system includes: an acquisition module for acquiring the entity set in the knowledge base; a multi-dimensional partition module for pre-partitioning the entity set; a sample construction module for performing sample construction based on the pre-partitioning results; a feature construction module for performing feature construction based on the pre-partitioning results; a normalization judgment module for performing a normalization judgment on each entity pair in the pre-partitioning results; and a set division module for performing set division on the results of the normalization judgment. The invention can solve the entity normalization problem for large-scale knowledge bases.

Description

Technical field

[0001] The present invention relates to the technical field of database construction, in particular to a knowledge-base-based, large-scale, open-domain entity normalization method, system, terminal and computer-readable storage medium.

Background

[0002] Knowledge base construction is a very complex and difficult technical problem, so existing methods generally handle only small-scale, single-vertical knowledge bases (millions to tens of millions of entities). When faced with a large-scale knowledge base (entities on the order of hundreds of millions), they cannot normalize entities efficiently. In addition, because entity data varies widely in form, a single classification scheme cannot solve all normalization problems, nor can it uniformly and efficiently support the various attributes, categories, and problem scenarios. Existing methods therefore apply special-case processing to knowledge base entities, direct...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06K9/62, G06N5/02
CPC: G06N5/022, G06F18/25, G06F18/241, G06N3/08, G06N3/044, G06N3/045, G06F18/24, G06N20/00, G06N5/025
Inventors: 冯知凡, 陆超, 徐也, 方舟, 朱勇, 李莹
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD