A method and system for semi-supervised question-answer induction based on deep generative models

A technique based on deep generative models and question-answer pairs, applied to automatic question-answer pair induction methods and systems. It addresses problems such as labor-intensive manual sorting, low-information utterances, and the difficulty of obtaining high-quality question-answer pairs, and it improves the quality of training data.

Active Publication Date: 2020-12-18
上海乐言科技股份有限公司

AI Technical Summary

Problems solved by technology

High-quality question-answer pairs give good results in tasks such as information retrieval-based question answering, sequence generation, and deep learning-based end-to-end question answering. However, inducing question-answer pairs from dialogue data is very challenging:
[0004] 1. User questions and their corresponding answers in dialogue data can be related one-to-many, many-to-one, or many-to-many, which makes alignment difficult;
[0005] 2. Dialogue data contains many filler utterances with low information content, such as "okay" and "um", which makes it harder to induce high-quality question-answer pairs;
[0006] 3. When inducing domain-specific question-answer pairs, the candidates also include unimportant dialogue that is irrelevant to the domain (for example, chit-chat in e-commerce scenarios), which again makes it harder to induce high-quality question-answer pairs.
[0007] Existing approaches cannot fully solve the challenges of question-answer pair induction and still have many defects, summarized as follows:
[0008] 1. Manual sorting relies on people reading the dialogue data and picking out high-quality question-answer pairs. This requires a huge labor cost and is time-consuming;
[0009] 2. Unsupervised methods based on high-frequency information sort out question-answer pairs by frequency. Such methods cannot handle low-information content such as "okay" and "uh-huh", nor can they handle domain-independent dialogue. They also stay at the literal level, do not use semantic information, and cannot handle utterances that are semantically equivalent but worded differently, so they do not work well.
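
The limitations described in [0009] can be illustrated with a small sketch. The source does not specify how the high-frequency baseline works, so the function below is only one assumed form of it: it keeps, for each literal question string, the most frequent literal answer. The function name `frequency_baseline`, the `min_count` parameter, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict

def frequency_baseline(pairs, min_count=2):
    """Assumed baseline: keep the most frequent literal answer per literal question."""
    counts = defaultdict(Counter)
    for question, answer in pairs:
        counts[question][answer] += 1

    induced = {}
    for question, answers in counts.items():
        answer, n = answers.most_common(1)[0]
        if n >= min_count:  # drop questions whose top answer is too rare
            induced[question] = answer
    return induced

# Hypothetical observations: the filler reply "okay" is frequent, and the
# second question is a paraphrase of the first.
pairs = [
    ("where is my order", "okay"),
    ("where is my order", "okay"),
    ("where is my order", "it ships tomorrow"),
    ("has my package shipped", "it ships tomorrow"),
]
induced = frequency_baseline(pairs)
```

In this toy data the baseline selects the filler "okay" as the answer, and it discards the paraphrased question because literal matching never merges it with the first one. Both effects are the weaknesses the text attributes to frequency-based methods.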



Examples


Embodiment 1

[0093] Figure 1 is the overall flowchart of the semi-supervised question-answer pair induction method based on a deep generative model provided by the present invention. The method includes the following steps: S10, receiving dialogue data and sorting out candidate question-answer pairs from the input dialogue data using the candidate question-answer pair generation method; S20, scoring the candidate question-answer pairs using the question-answer pair evaluation method based on the deep generative model; S30, obtaining high-quality question-answer pairs using the question-answer pair screening method according to the scores of the candidate pairs; S40, pre-training the deep generative model in a semi-supervised manner and then applying it to the question-answer pair evaluation method.
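
As a rough illustration of how steps S10 to S30 fit together, the sketch below chains candidate generation, scoring, and screening. It is not the patented method: the excerpt does not describe the generation heuristic, the model, or the screening rule, so the adjacent-turn pairing, the `toy_scorer` stand-in for the pre-trained deep generative model of S40, the `Turn` type, and the 0.5 threshold are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

QAPair = Tuple[str, str]

@dataclass(frozen=True)
class Turn:
    speaker: str  # "user" or "agent" (hypothetical role labels)
    text: str

def generate_candidates(dialogue: List[Turn]) -> List[QAPair]:
    """S10 (assumed heuristic): pair each user turn with the agent turn that follows it."""
    return [
        (prev.text, nxt.text)
        for prev, nxt in zip(dialogue, dialogue[1:])
        if prev.speaker == "user" and nxt.speaker == "agent"
    ]

def score_candidates(pairs: List[QAPair], scorer: Callable[[str, str], float]):
    """S20: attach a score to every candidate pair."""
    return [(q, a, scorer(q, a)) for q, a in pairs]

def screen(scored, threshold: float) -> List[QAPair]:
    """S30 (assumed rule): keep pairs whose score reaches the threshold."""
    return [(q, a) for q, a, s in scored if s >= threshold]

def toy_scorer(question: str, answer: str) -> float:
    """Stand-in for the pre-trained model of S40: share of non-filler answer tokens."""
    filler = {"okay", "um", "uh-huh"}
    tokens = answer.lower().split()
    if not tokens:
        return 0.0
    return sum(t not in filler for t in tokens) / len(tokens)

def induce_qa_pairs(dialogue, scorer, threshold=0.5):
    return screen(score_candidates(generate_candidates(dialogue), scorer), threshold)

dialogue = [
    Turn("user", "When will my order ship?"),
    Turn("agent", "It ships within two days"),
    Turn("user", "Thanks"),
    Turn("agent", "okay"),
]
result = induce_qa_pairs(dialogue, toy_scorer)
```

Passing the scorer in as a callable keeps the pipeline independent of how the model is trained, so a real pre-trained model could replace `toy_scorer` without changing the other steps.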

[0094] In the step S10 of candidate question-answer pair generation, the question-answer pair generation method will be used in the can...

Embodiment 2

[0141] Figure 6 is an example block diagram of the semi-supervised question-answer pair induction system based on a deep generative model according to the present invention. The system includes:

[0142] Input module 10: used to receive dialogue data;

[0143] Candidate question-answer pair generation module 20: used to sort out candidate question-answer pairs from the received dialogue data through the candidate question-answer pair generation model;

[0144] Question-answer pair evaluation module 30: used to score the candidate question-answer pairs with a deep generative model, wherein the model is pre-trained by the training module;

[0145] Question-answer pair screening module 40: used to obtain high-quality question-answer pairs through question-answer pair screening according to the scores of the candidate question-answer pairs;

[0146] Output module 50: used to provide high-quality question-answer pairs in the dialogue data according to the results of the question-answer ...
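
One way to picture the module layout of Embodiment 2 is a small class whose methods and plug-in callables mirror modules 10 to 50. This is a minimal sketch under stated assumptions, not the system in the patent: the class name `QAInductionSystem`, the plug-ins `adjacent_pairs` and `length_score`, and the threshold are hypothetical, and the scorer is assumed to have been pre-trained elsewhere by the training module.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

QAPair = Tuple[str, str]

@dataclass
class QAInductionSystem:
    generate: Callable[[List[str]], List[QAPair]]  # module 20 (plug-in)
    score: Callable[[QAPair], float]               # module 30 (assumed pre-trained)
    threshold: float = 0.5                         # screening cut-off used by module 40

    def receive(self, dialogue: List[str]) -> List[str]:
        """Module 10: accept dialogue data, dropping empty utterances."""
        return [u.strip() for u in dialogue if u.strip()]

    def screen(self, scored: List[Tuple[QAPair, float]]) -> List[QAPair]:
        """Module 40: keep pairs whose score reaches the threshold."""
        return [pair for pair, s in scored if s >= self.threshold]

    def run(self, dialogue: List[str]) -> List[QAPair]:
        """Module 50: return the screened pairs for the given dialogue."""
        candidates = self.generate(self.receive(dialogue))
        return self.screen([(pair, self.score(pair)) for pair in candidates])

def adjacent_pairs(utterances: List[str]) -> List[QAPair]:
    """Hypothetical generator: treat every consecutive utterance pair as a candidate."""
    return list(zip(utterances, utterances[1:]))

def length_score(pair: QAPair) -> float:
    """Hypothetical scorer: longer answers score higher, capped at 1.0."""
    return min(len(pair[1].split()) / 5, 1.0)

system = QAInductionSystem(generate=adjacent_pairs, score=length_score)
output = system.run(
    ["How do I return an item?", "  ", "Use the returns page within 30 days", "um"]
)
```

Keeping generation and scoring as injected callables reflects the separation between modules 20 and 30 in the block diagram, so either one could be swapped without touching the rest.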


Abstract

The invention discloses a semi-supervised question-answer pair induction method and system based on a deep generative model. The method comprises the following steps: sorting out candidate question-answer pairs from input dialogue data using a candidate question-answer pair generation method; scoring the candidate question-answer pairs using a question-answer pair evaluation method based on a deep generative model; obtaining high-quality question-answer pairs using a question-answer pair screening method according to the scores of the candidate pairs; and pre-training the deep generative model in a semi-supervised manner, then applying the pre-trained model to the question-answer pair evaluation method. In this way, question-answer pairs can be induced automatically, manual involvement is greatly reduced, and high-quality question-answer pairs are obtained.

Description

Technical field

[0001] The invention relates to natural language processing technology, in particular to an automatic question-answer pair induction method and system.

Background art

[0002] The chatbot is one of the most active research directions in artificial intelligence in recent years and has received continuous attention from academia and industry. Information Retrieval Based Question Answering (IRQA) is the question answering method most commonly used by chatbots. As an important form of data, question-answer pairs are the most commonly used retrieval objects and reply sources in IRQA. Question-answer pairs are also important supervised data in tasks such as sequence generation and deep learning-based end-to-end question answering.

[0003] Question-answer pairs need to be generated by induction, and the most commonly used source for question-answer pair induction is session data. The present invention defin...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F16/332
Inventor: 褚善博, 沈李斌
Owner: 上海乐言科技股份有限公司