
Financial scene-oriented end-to-end natural language processing training framework and method

A natural-language-processing and financial technology, applied to natural language data processing, finance, and data-processing applications. It addresses problems such as model load-test latency exceeding the standard, time-consuming computation, and the large number of tasks in the field, with the effects of reducing configuration requirements and improving semantic understanding ability.

Pending Publication Date: 2022-01-04
北京熵简科技有限公司

AI Technical Summary

Problems solved by technology

1. The open-source BERT model does not perform well on tasks in the financial field, and on some tasks it is even worse than lightweight models. Achieving good results requires labeling a large amount of data, and the cost of labeling data is very high;
[0007] 2. BERT's 12-layer Transformer network has a large number of parameters, its forward computation is time-consuming, the model's load-test latency exceeds the standard, and online deployment is difficult;
[0008] 3. The financial field involves a large number of tasks. Without a good baseline model, a great deal of time is spent on experiments and trial-and-error, sometimes taking detours and wasting considerable time and effort.

Embodiment Construction

[0049] The following describes several preferred embodiments of the present invention with reference to the accompanying drawings, so as to make the technical content clearer and easier to understand. The present invention can be embodied in many different forms, and its protection scope is not limited to the embodiments mentioned herein.

[0050] As shown in Figure 1, which is a system block diagram of a preferred embodiment of the present invention, the framework includes Google's native BERT module, a FinBERT pre-training module, a module for mining new data from external related data based on a self-training-like idea, a module for further pre-training on downstream-task corpora, a module that uses a semi-supervised learning framework to make full use of unlabeled corpora, a knowledge distillation module, and an online deployment module.
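The patent does not publish source code, so the following is only a minimal structural sketch of how the modules listed above could be chained as stages of a single end-to-end pipeline. The `Stage` and `pipeline` helpers and all stage names are illustrative assumptions, not the patent's implementation.

```python
# Hypothetical sketch: the seven modules of the framework chained as pipeline stages.
# Nothing here is taken from the patent (no code is published); it only mirrors the
# module list in paragraph [0050].
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    """One module of the end-to-end training framework."""
    name: str
    run: Callable[[dict], dict]  # consumes and returns a shared state dict


def pipeline(stages: List[Stage], state: dict) -> dict:
    """Run each module in order, threading the accumulated state through."""
    for stage in stages:
        print(f"running: {stage.name}")
        state = stage.run(state)
    return state


if __name__ == "__main__":
    # Placeholder callables stand in for the real training steps.
    stages = [
        Stage("Google native BERT (Chinese) checkpoint", lambda s: {**s, "model": "bert-base-chinese"}),
        Stage("FinBERT pre-training on financial corpora", lambda s: {**s, "model": "FinBERT"}),
        Stage("Self-training-style mining of new data from external sources", lambda s: s),
        Stage("Further pre-training on downstream-task corpora", lambda s: {**s, "model": "TASK FinBERT"}),
        Stage("Semi-supervised learning on unlabeled corpora", lambda s: {**s, "model": "UDA FinBERT"}),
        Stage("Knowledge distillation to a lightweight student", lambda s: {**s, "model": "lightweight student"}),
        Stage("Online deployment", lambda s: s),
    ]
    print("final model:", pipeline(stages, {})["model"])
```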

[0051] Google's native BERT module is the starting point of the entire training framework, including Google's native BERT (Chinese)...
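As an illustration of this starting point, the snippet below loads a Chinese BERT-Base checkpoint and runs a single forward pass. Using the Hugging Face `transformers` package and its `bert-base-chinese` checkpoint is an assumption of this sketch; the patent itself refers to Google's original release.

```python
# Assumed tooling: Hugging Face transformers; the patent only specifies
# "Google's native BERT (Chinese)" as the starting checkpoint.
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")

# Encode a short financial sentence and obtain its contextual representations.
inputs = tokenizer("公司三季度净利润同比增长", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size = 768)
```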

Abstract

The invention discloses a financial scene-oriented end-to-end natural language processing training framework and method, and relates to the field of natural language processing. The training framework comprises a Google native BERT module, a FinBERT pre-training module, a module for mining new data from external related data based on a self-training-like idea, a module for further pre-training on downstream-task corpora, a module for making full use of unlabeled corpora with a semi-supervised learning framework, a knowledge distillation module, and an online deployment module. The training method comprises the following steps: step 1, performing FinBERT pre-training; step 2, mining new data from external related data based on a self-training-like idea; step 3, further pre-training FinBERT on the downstream-task corpus, the resulting model being called TASK FinBERT; step 4, using a semi-supervised learning framework to make full use of the unlabeled corpus on top of TASK FinBERT, the resulting model being called UDA FinBERT; and step 5, distillation learning: distilling the learned knowledge and features into a lightweight model.
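For step 5, a common way to distill a large teacher such as UDA FinBERT into a lightweight student is to combine a soft-target loss on temperature-scaled logits with the ordinary hard-label loss. The PyTorch sketch below shows that combined loss; the temperature, weighting, and class count are illustrative assumptions rather than values fixed by the patent.

```python
# Hedged sketch of a standard distillation objective (Hinton-style), not the
# patent's exact procedure: soft KL term on temperature-softened logits plus
# hard cross-entropy on the gold labels.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Weighted sum of the soft-target KL loss and the hard-label cross-entropy."""
    soft_teacher = F.log_softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between softened distributions, scaled by T^2 as is standard.
    soft_loss = F.kl_div(soft_student, soft_teacher, reduction="batchmean",
                         log_target=True) * temperature ** 2
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss


# Toy usage with random logits for a 3-class financial text classification task.
teacher_logits = torch.randn(8, 3)
student_logits = torch.randn(8, 3, requires_grad=True)
labels = torch.randint(0, 3, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
print(float(loss))
```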

Description

technical field

[0001] The present invention relates to the field of natural language processing, and in particular to an end-to-end natural language processing training framework and method for financial scenarios.

Background technique

[0002] The rapid development of modern information technology has produced explosive growth in the data and information carried on the Internet. A large amount of this data is presented as text, such as the web-page data of major Internet sites, and for big data to realize its value, Natural Language Processing (NLP) technology must be used to mine the core content of the text. In text mining, different fields involve a variety of natural language processing tasks; for example, natural language processing tasks in the financial field include text classification, sentiment analysis, text clustering, entity recognition, text similarity calculation, information extraction, etc. There are...

Claims

Application Information

Patent Type & Authority: Application (China)
IPC (8): G06F40/295; G06F40/30; G06F16/35; G06K9/62; G06N3/04; G06N3/08; G06Q40/02
CPC: G06F40/295; G06F40/30; G06F16/35; G06N3/08; G06Q40/02; G06N3/047; G06N3/045; G06F18/2415; G06F18/241
Inventor: 付志兵, 张梦超, 李渔, 费斌杰
Owner: 北京熵简科技有限公司