Corpus generation method and device and man-machine interaction processing method and device

A human-computer interaction and corpus technology in the computer field that addresses the poor performance of question answering systems, achieving high retrieval efficiency, good answer accuracy, and faster response.

Pending Publication Date: 2020-01-24
ALIBABA (CHINA) CO LTD
Cites: 6; Cited by: 2

AI Technical Summary

Problems solved by technology

[0006] In view of this, embodiments of the present invention provide a method and device for generating a corpus, and a method and device for processing human-computer interaction, so as to solve the problem that question answering systems built on existing corpora perform poorly.

Examples

Embodiment 1

[0026] Referring to Figure 1, a flowchart of the steps of a corpus generation method according to Embodiment 1 of the present invention is shown.

[0027] The corpus generation method of the present embodiment comprises the following steps:

[0028] Step S102: Generate an initial corpus vector according to the acquired initial corpus.

[0029] The initial corpus can be one or a combination of text data, image data, voice data and other data expressed in natural language. The initial corpus vector may be a vector corresponding to the initial corpus.

[0030] Those skilled in the art may generate the initial corpus vector from the acquired initial corpus in any appropriate manner according to actual needs. For example, the initial corpus vector may be generated from the initial corpus with the Word2vec algorithm, or with the BOW (bag-of-words) model; or the initial corpus vector ...
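As an illustration of the BOW option only, the minimal sketch below turns already-tokenized corpora into count vectors over a shared vocabulary. Tokenization, the function names and the sample sentences are hypothetical; the patent excerpt does not specify how the BOW model is configured.

```python
from collections import Counter


def build_vocabulary(corpora: list[list[str]]) -> dict[str, int]:
    """Assign each distinct token (in first-seen order) a column index."""
    vocab: dict[str, int] = {}
    for tokens in corpora:
        for token in tokens:
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def bow_vector(tokens: list[str], vocab: dict[str, int]) -> list[int]:
    """Bag-of-words vector: how often each vocabulary term occurs; word order is ignored."""
    vector = [0] * len(vocab)
    for token, count in Counter(tokens).items():
        if token in vocab:  # tokens outside the vocabulary are dropped
            vector[vocab[token]] = count
    return vector


if __name__ == "__main__":
    corpora = [["how", "to", "reset", "password"], ["reset", "the", "router"]]
    vocab = build_vocabulary(corpora)
    print([bow_vector(tokens, vocab) for tokens in corpora])
```

Here the two sample corpora map to `[1, 1, 1, 1, 0, 0]` and `[0, 0, 1, 0, 1, 1]`, which makes the shared term "reset" visible as an overlapping column.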

Embodiment 2

[0040] Referring to Figure 2, a flowchart of the steps of a corpus generation method according to Embodiment 2 of the present invention is shown.

[0041] The corpus generation method of the present embodiment comprises the following steps:

[0042] Step S202: Generate an initial corpus vector according to the acquired initial corpus.

[0043] The initial corpus may be text data, image data and/or voice data expressed in natural language. The initial corpus vectors may be vectors corresponding to these initial corpora.

[0044] Those skilled in the art may generate the initial corpus vector from the acquired initial corpus in any appropriate manner according to actual needs. For example, the initial corpus vector may be generated from the initial corpus with the Word2vec algorithm, or with the BOW (bag-of-words) model.
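Word2vec yields one vector per word, so a corpus-level vector still has to be derived from them. One common choice, assumed here and not stated in the excerpt, is to average the vectors of the words that have an embedding. The embedding table below is a hypothetical toy stand-in for a trained Word2vec model.

```python
def average_word_vectors(
    tokens: list[str], embeddings: dict[str, list[float]], dim: int
) -> list[float]:
    """Corpus vector as the element-wise mean of known word vectors (assumed pooling rule)."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim  # no known word: fall back to a zero vector
    # zip(*known) walks the vectors column by column, i.e. per embedding dimension
    return [sum(column) / len(known) for column in zip(*known)]


if __name__ == "__main__":
    toy_embeddings = {"reset": [1.0, 0.0], "password": [0.0, 1.0]}  # hypothetical values
    print(average_word_vectors(["reset", "password", "now"], toy_embeddings, dim=2))
```

Out-of-vocabulary words such as "now" are skipped, so the example pools only "reset" and "password" into `[0.5, 0.5]`.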

[0045] In order to ensure the accuracy of the question and answ...

Embodiment 3

[0093] Referring to Figure 3, a structural block diagram of a corpus generation device according to Embodiment 3 of the present invention is shown.

[0094] The corpus generation device of this embodiment includes: a vector type determining module 301, configured to generate initial corpus vectors according to the acquired initial corpus and to determine the vector type of each initial corpus vector; and an initial corpus generating module 302, configured to generate an initial corpus with an inverted chain index according to the vector types and the initial corpus vectors.
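The excerpt does not say how a vector type is determined. A common approach, borrowed here as an assumption from inverted-file (IVF) vector indexes, assigns each vector the type of its nearest cluster centroid and keeps one inverted list of corpus identifiers per type. The sketch below mirrors modules 301 and 302 under that assumption; the centroids, function names and corpus identifiers are hypothetical.

```python
from collections import defaultdict


def squared_distance(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vector_type(vec: list[float], centroids: list[list[float]]) -> int:
    """Assumed rule (module 301): the vector type is the index of the nearest centroid."""
    return min(range(len(centroids)), key=lambda i: squared_distance(vec, centroids[i]))


def build_inverted_index(
    corpus_vectors: dict[str, list[float]], centroids: list[list[float]]
) -> dict[int, list[str]]:
    """Module 302 sketch: group corpus ids by vector type, one inverted list per type."""
    index: dict[int, list[str]] = defaultdict(list)
    for corpus_id, vec in corpus_vectors.items():
        index[vector_type(vec, centroids)].append(corpus_id)
    return dict(index)


if __name__ == "__main__":
    centroids = [[0.0, 0.0], [10.0, 10.0]]
    corpus_vectors = {"q1": [1.0, 0.0], "q2": [9.0, 9.0], "q3": [0.0, 2.0]}
    print(build_inverted_index(corpus_vectors, centroids))
```

With these toy values, `q1` and `q3` share type 0 and `q2` falls under type 1, so vectors of the same type end up stored together.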

[0095] The corpus generation device generates an initial corpus with an inverted chain index structure, clustering and storing together the initial corpus vectors that share a vector type, so that the initial corpus generated by the corpus generation method occupies less storage space and is retrieved more efficiently,...
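The retrieval-efficiency claim can be illustrated as follows: a query is first assigned a vector type, and only that type's inverted list is scanned instead of the whole corpus. This minimal sketch assumes the nearest-centroid notion of vector type and a prebuilt index; all names and values are hypothetical.

```python
from typing import Optional


def squared_distance(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(vec: list[float], centroids: list[list[float]]) -> int:
    return min(range(len(centroids)), key=lambda i: squared_distance(vec, centroids[i]))


def search(
    query: list[float],
    corpus_vectors: dict[str, list[float]],
    index: dict[int, list[str]],
    centroids: list[list[float]],
) -> tuple[Optional[str], int]:
    """Return (closest corpus id, number of vectors compared), scanning one inverted list only."""
    candidates = index.get(nearest_centroid(query, centroids), [])
    if not candidates:
        return None, 0
    best = min(candidates, key=lambda cid: squared_distance(query, corpus_vectors[cid]))
    return best, len(candidates)


if __name__ == "__main__":
    centroids = [[0.0, 0.0], [10.0, 10.0]]
    corpus_vectors = {"q1": [1.0, 0.0], "q2": [9.0, 9.0], "q3": [0.0, 2.0]}
    index = {0: ["q1", "q3"], 1: ["q2"]}  # e.g. as produced by a type-based grouping step
    print(search([0.5, 1.5], corpus_vectors, index, centroids))
```

The query is compared with 2 of the 3 stored vectors here. The saving grows with corpus size, at the cost that a true nearest neighbour stored under a different type can be missed.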

Abstract

The embodiments of the invention provide a corpus generation method and device and a man-machine interaction processing method and device. The corpus generation method comprises: generating initial corpus vectors according to the obtained initial corpora and determining the vector types of the initial corpus vectors; and generating an initial corpus with an inverted chain index according to the vector types and the initial corpus vectors. A question answering system that applies the corpus generated by the embodiments of the invention performs better.

Description

Technical field

[0001] The embodiments of the present invention relate to the field of computer technology, and in particular to a method and device for generating a corpus, and a method and device for processing human-computer interaction.

Background

[0002] A question answering system takes natural language understanding as its core, enabling a computer to understand and respond to a user's question (query) and thereby carry out a question-and-answer dialogue between the computer and the user.

[0003] The industry divides question answering systems along different dimensions. By content, they can be divided into structured-data question answering, unstructured-data question answering, and question answering based on question-answer pairs. From a technical point of view, question answering systems are generally divided into retrieval-based and generation-based question answering systems. The retriev...
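To make the idea of retrieval-based question answering over question-answer pairs concrete, the minimal sketch below matches a user query against stored questions by cosine similarity of term counts and returns the answer of the best match. It is a generic illustration and not the patent's processing method. The whitespace tokenizer, the 0.3 threshold and the sample pairs are all hypothetical.

```python
import math
from collections import Counter
from typing import Optional


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def answer(query: str, qa_pairs: dict[str, str], threshold: float = 0.3) -> Optional[str]:
    """Return the answer of the most similar stored question, or None below the threshold."""
    query_vec = Counter(query.lower().split())
    best_answer, best_score = None, 0.0
    for question, stored_answer in qa_pairs.items():
        score = cosine(query_vec, Counter(question.lower().split()))
        if score > best_score:  # strict '>' keeps the first question on ties
            best_answer, best_score = stored_answer, score
    return best_answer if best_score >= threshold else None


if __name__ == "__main__":
    qa_pairs = {
        "how do i reset my password": "Use the 'Forgot password' link.",
        "what are your opening hours": "9:00 to 18:00.",
    }
    print(answer("Reset password", qa_pairs))
    print(answer("weather today", qa_pairs))
```

Returning `None` below the threshold lets a caller fall back, for example to a generation-based answer, instead of returning an unrelated stored answer.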

Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F16/332, G06F16/33, G06F40/30, G06F40/289
CPC: G06F40/20, G06F16/90332, G06F40/44, G06F40/289
Inventor: 王晓军 (Wang Xiaojun)
Owner: ALIBABA (CHINA) CO LTD