BERT embedding-based software programming field entity identification method

An entity recognition technology for the software programming field, applied in character and pattern recognition, computer components, biological neural network models, etc. It addresses problems such as cumbersome manual work, spelling errors, and text content that does not follow language rules, with the effects of solving sequence-modeling problems, reducing the dimension of the vector space, and improving training efficiency.

Pending Publication Date: 2020-12-29
YUNNAN NORMAL UNIV

AI Technical Summary

Problems solved by technology

[0005] (1) Text in the software knowledge community does not follow strict language rules and contains many spelling mistakes and abbreviations.
[0006] (2) Methods based on rules, dictionaries, and knowledge bases rely on manual creation by experts, which is cumbersome, and they cannot be updated automatically.
[0007] (3) Methods based on supervised and semi-supervised learning require a large amount of manually labeled sample data and cannot resolve word ambiguity, resulting in poor entity recognition performance.

Embodiment Construction

[0046] The present invention will be further described below in combination with the accompanying drawings and specific embodiments.

[0047] As shown in Figure 1, this embodiment is a method for entity recognition in the field of software programming based on BERT embedding. The corpus data is question-and-answer text under different tags extracted from the official data dump released by StackOverflow, for example: object-oriented and procedure-oriented languages (Java, C), Web and scripting languages (JavaScript, PHP, Python), markup languages (html), platforms (android) and libraries (jquery), for a total of 4,000 StackOverflow questions and answers. The specific implementation process includes: preprocessing of the software Q&A community dataset (Step1), sample data labeling (Step2), feature extraction and vectorization (Step3), BiGRU-CRF model training and entity labeling (Step4), and effect evaluation (Step5).
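As a rough illustration of the sample data labeling (Step2) and effect evaluation (Step5) stages, the following minimal sketch converts entity spans into per-token tags and scores a prediction at the entity level. The BIO tagging scheme, the entity category names (`Library`, `Language`), the precision/recall/F1 metric, and all function names are assumptions made for illustration; the text above does not specify them.

```python
from typing import List, Tuple

# (start token index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Convert entity spans to BIO tags (assumed scheme, not from the source)."""
    tags = ["O"] * num_tokens
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Recover entity spans from a BIO tag sequence."""
    spans: List[Span] = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 (exact span and type match)."""
    gold_spans = set(bio_to_spans(gold))
    pred_spans = set(bio_to_spans(pred))
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example sentence and annotations.
tokens = ["How", "to", "parse", "JSON", "with", "jquery", "in", "JavaScript"]
gold_tags = spans_to_bio(len(tokens), [(5, 6, "Library"), (7, 8, "Language")])
pred_tags = spans_to_bio(len(tokens), [(5, 6, "Library")])  # misses one entity
precision, recall, f1 = entity_f1(gold_tags, pred_tags)
```

In this hypothetical example the prediction finds one of the two gold entities, so precision is 1.0 and recall is 0.5. Scoring whole spans rather than individual tokens penalizes partially recognized entities, which is a common choice for entity recognition but only one possible reading of "effect evaluation".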

[0048] The specific steps of the entity recognition method in...

Abstract

The invention relates to a BERT embedding-based entity recognition method for the software programming field, and belongs to the technical fields of natural language processing, deep learning and software resource mining. The method comprises the following steps: first, text analysis and preprocessing are carried out on a dataset from the software question-and-answer community StackOverflow using natural language processing techniques, the entity categories of the software programming domain are determined in combination with domain analysis, and sample data is manually annotated with the Brat natural language annotation tool to obtain a training set and a test set; second, a semantic, vectorized representation of the input sequence is obtained through the BERT pre-trained language model, and the model is trained on the input sequence in combination with a BiGRU bidirectional recurrent neural network; finally, the label sequence is modeled through a CRF (conditional random field) to obtain the label sequence with the maximum probability, thereby achieving entity recognition in the field of software programming. Because it is based on a deep learning training method, specific entities in the software programming field can be effectively identified with only a small amount of labeled sample data.
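The final step, selecting the label sequence with the maximum probability, is typically done with Viterbi decoding over per-token scores and label-to-label transition scores. The sketch below is a minimal pure-Python version under stated assumptions: the emission scores stand in for what a BiGRU layer over BERT embeddings would output, the transition scores stand in for learned CRF parameters, and all numbers, label names, and the function name `viterbi_decode` are hypothetical.

```python
from typing import Dict, List


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    labels: List[str],
) -> List[str]:
    """Return the highest-scoring label path (scores are additive, log-space)."""
    # best[t][y]: best score of any path that ends in label y at position t
    best = [{y: emissions[0][y] for y in labels}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions[p][y])
            scores[y] = best[-1][prev] + transitions[prev][y] + emissions[t][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


labels = ["O", "B-Lang", "I-Lang"]
# Hypothetical transition scores: "O" -> "I-Lang" is strongly penalized
# because an entity cannot start with an inside tag.
transitions = {
    "O": {"O": 0.0, "B-Lang": 0.0, "I-Lang": -10.0},
    "B-Lang": {"O": 0.0, "B-Lang": 0.0, "I-Lang": 0.5},
    "I-Lang": {"O": 0.0, "B-Lang": 0.0, "I-Lang": 0.5},
}
# Hypothetical per-token scores for a three-token input.
emissions = [
    {"O": 2.0, "B-Lang": 0.5, "I-Lang": 0.1},
    {"O": 0.2, "B-Lang": 1.0, "I-Lang": 1.2},
    {"O": 1.5, "B-Lang": 0.1, "I-Lang": 0.3},
]
best_path = viterbi_decode(emissions, transitions, labels)
greedy_path = [max(labels, key=lambda y: e[y]) for e in emissions]
```

With these made-up scores, picking the best label per token independently yields `O, I-Lang, O`, which is not a valid tag sequence, while the decoded path is `O, B-Lang, O`. This is the kind of sequence-level consistency a CRF layer adds on top of per-token scores.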

Description

Technical field

[0001] The invention relates to an entity recognition method in the field of software programming based on BERT embedding, and belongs to the technical fields of natural language processing, deep learning and software resource mining.

Background technique

[0002] In an era of widespread software development, more than 50 million software developers have exchanged questions and answers about software programming, such as development technology, configuration management, and project organization, in the StackOverflow software knowledge community. This massive body of social text data contains a wide variety of software engineering questions and answers, and rich knowledge in the field of software programming. The automatic acquisition, sharing and recommendation of software programming knowledge will help software developers quickly solve problems encountered during project development and improve the quality of software development.

[0003] Traditional informa...

Application Information

IPC(8): G06F40/295, G06F40/216, G06F40/284, G06F16/35, G06K9/62, G06N3/04
CPC: G06F40/295, G06F40/216, G06F40/284, G06F16/35, G06N3/045, G06F18/2415, G06F18/214
Inventor: 唐明靖, 王俊, 陈建兵, 邹伟
Owner YUNNAN NORMAL UNIV