
51 results about "Perplexity" patented technology

In information theory, perplexity is a measurement of how well a probability distribution or probability model predicts a sample. It may be used to compare probability models. A low perplexity indicates the probability distribution is good at predicting the sample.
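The definition above can be sketched in a few lines: perplexity is the exponential of the average negative log-likelihood a model assigns to the observed sample, so a model that is "as confused as" a uniform k-way choice has perplexity k.

```python
import math

def perplexity(probs):
    """Perplexity of a model that assigned probability probs[i] to the
    i-th observed token: exp of the average negative log-likelihood."""
    nll = -sum(math.log(p) for p in probs) / len(probs)
    return math.exp(nll)

# A model that predicts each of 4 observed tokens with probability 0.25
# is as confused as a uniform 4-way choice, so its perplexity is 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # -> 4.0
```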

Chinese image semantic description method combined with multilayer GRU based on residual error connection Inception network

The invention discloses a Chinese image semantic description method that combines a multilayer GRU with a residual-connection Inception network, and belongs to the fields of computer vision and natural language processing. The method comprises the steps of: preprocessing the AI Challenger Chinese image description training and validation sets with open-source TensorFlow to generate files in the TFRecord format for training; pre-training an Inception_ResNet_v2 network on the ImageNet data set to obtain a pre-trained convolutional network model; loading the pre-trained parameters into the Inception_ResNet_v2 network and extracting image feature descriptors from the AI Challenger image set; building a single-hidden-layer neural network model that maps the image feature descriptors into the word embedding space; taking the word embedding feature matrix and the twice-mapped image feature descriptors as the input of a two-layer GRU network; inputting an original image into the description model to generate a Chinese description sentence; and evaluating the trained model on the evaluation data set with the perplexity index as the evaluation standard. The method solves the technical problem of describing an image in Chinese and improves the fluency and readability of the generated sentences.
Owner:HARBIN UNIV OF SCI & TECH

Information interactive network-based criminal individual recognition method

The invention belongs to the field of data mining and relates to an information interaction network-based criminal individual recognition method. The method comprises the following steps: (1) obtaining a data set containing criminal activity contents and preprocessing it; (2) extracting keyword descriptions of criminal topics; (3) determining the number of topics for the LDA topic model on the basis of perplexity; (4) extracting interaction topics between individuals from the preprocessed data set on the basis of LDA, namely an association probability matrix between interaction topics and keywords and an association probability matrix between interaction topics and interaction edges; (5) calculating the weights of the interaction edges; (6) calculating the local criminal suspicion of each individual on the basis of the structure of the weighted information interaction network; and (7) calculating the global criminal suspicion of each individual with a combined fuzzy K-means and distance-density clustering method, and recognizing the criminal individuals. The method does not depend on prior information and can identify the most likely suspects from communication contents, improving case-handling efficiency.
Owner:NAT UNIV OF DEFENSE TECH
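Step (3) above, choosing the LDA topic number by perplexity, is typically a grid search over candidate K values. A minimal sketch, where `perplexity_of` stands in for training an LDA model with K topics and scoring it on held-out text (the candidate values and scores below are made up for illustration):

```python
def select_topic_count(candidates, perplexity_of):
    """Pick the topic count K whose trained model yields the lowest
    held-out perplexity. `perplexity_of` is assumed to train a topic
    model with K topics and return its validation perplexity."""
    return min(candidates, key=perplexity_of)

# Toy stand-in: pretend held-out perplexity bottoms out at K = 20.
scores = {10: 310.5, 20: 270.2, 30: 295.8, 40: 330.1}
best_k = select_topic_count(list(scores), scores.__getitem__)
print(best_k)  # -> 20
```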

Method, system and storage medium for classifying text set

Inactive · CN108846120A · Preserve semantics · Preserve word order information · Character and pattern recognition · Special data processing applications · Feature vector · Text categorization
The invention provides a method, system and storage medium for classifying a text set, and belongs to the technical field of text classification algorithms. The method comprises the following steps: reading the text set to be classified and preprocessing it; determining the perplexity of the text set; determining the topic number of the text set at which the perplexity takes its minimum value; generating the topic vector of the text set with a BTM model according to the topic number; generating a feature vector from the text set with the Doc2vec model; combining the topic vector and the feature vector to generate the feature space vector of the text set; and inputting the feature space vector into an SVM classifier as its original input space vector to perform the classification. The disclosed method, system and storage medium can improve the efficiency of the text classification algorithm.
Owner:HEFEI UNIV OF TECH
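The combination step is presumably a simple concatenation of the BTM topic vector and the Doc2vec vector, though the abstract does not spell out the operator; a minimal sketch under that assumption:

```python
def build_feature_space(topic_vec, doc_vec):
    """Concatenate a BTM topic vector with a Doc2vec document vector
    to form the SVM input. Plain concatenation is an assumption; the
    abstract only says the two vectors are 'combined'."""
    return list(topic_vec) + list(doc_vec)

# A 3-topic vector joined with a 2-dimensional document vector
# yields a 5-dimensional feature space vector.
combined = build_feature_space([0.7, 0.2, 0.1], [0.03, -0.41])
print(len(combined))  # -> 5
```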

Perplexity calculation device

A perplexity calculation device 500 includes: a weight coefficient calculating part 501 that, for each of the words constituting a text, calculates a weight coefficient based on word importance (the degree of importance of that word); the coefficient corrects the word's ease-of-appearance score, whose value grows as the word's probability of appearance under a statistical language model (which gives the probabilities of appearance of words) rises; and a perplexity calculating part 502 that calculates the perplexity of the statistical language model on the text from the calculated weight coefficients and the ease-of-appearance scores.
Owner:NEC CORP
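A hedged sketch of the weighting idea: each word's log-probability is scaled by an importance weight (e.g. TF-IDF) before the perplexity is taken, so important words dominate the score. With uniform weights the formula reduces to ordinary perplexity; the patent's exact correction formula may differ.

```python
import math

def weighted_perplexity(probs, weights):
    """Perplexity in which each token's log-probability is scaled by
    a word-importance weight, then normalized by the total weight.
    This is an illustrative reconstruction, not the patented formula."""
    total_w = sum(weights)
    nll = -sum(w * math.log(p) for p, w in zip(probs, weights)) / total_w
    return math.exp(nll)

# With uniform weights this reduces to ordinary perplexity.
print(weighted_perplexity([0.5, 0.25], [1.0, 1.0]))  # -> 2.828... (= sqrt(8))
```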

Error correction method, device and equipment and storage medium

The invention relates to the technical field of artificial intelligence and discloses an error correction method. The method comprises the following steps: detecting that a to-be-corrected object exists in a text; extracting the context content of the to-be-corrected object based on its position; inputting the to-be-corrected object's similar objects, together with the context content, into an error correction model to obtain the alternative probability of each similar object; and, based on the alternative probabilities, selecting the corresponding object with the largest probability as the replacement object and replacing the to-be-corrected object with it. The invention further provides an error correction device, equipment and a storage medium. Because information about the to-be-corrected object is predicted from the object and its context content at the same time, the perplexity of the language model during semantic recognition can be reduced, so that more accurate similar objects are extracted; the alternative probability of each similar object is then calculated by the error correction model in combination with the context content and the object with the largest probability is selected, which improves the probability assigned to each character or word and the final error correction accuracy.
Owner:PING AN TECH (SHENZHEN) CO LTD
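The candidate-selection step can be illustrated as scoring each similar object in context and keeping the highest-probability one; `sentence_score` below is a toy stand-in for the patent's error correction model, and the example strings are invented:

```python
def pick_correction(context_left, context_right, candidates, sentence_score):
    """Choose the replacement candidate that makes the full sentence
    most probable under `sentence_score` (a stand-in for the patent's
    error correction model)."""
    def prob(cand):
        return sentence_score(context_left + cand + context_right)
    return max(candidates, key=prob)

# Toy scorer: prefers the sentence containing "their".
scorer = lambda s: 1.0 if "their" in s else 0.1
best = pick_correction("they lost ", " keys", ["there", "their"], scorer)
print(best)  # -> their
```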

Label classification method and device of corpora, computer equipment and storage medium

The embodiment of the invention relates to the field of artificial intelligence and provides a label classification method for corpora, which comprises the following steps: carrying out word segmentation on the multiple sections of text data of multiple sections of corpus data to obtain the corresponding word segmentation results; inputting the word segmentation results into a probability model and analyzing them through the probability model's modeling to obtain a plurality of K values; calculating the perplexity for each of the K values and taking the K value with the minimum perplexity to obtain the corresponding first-level label; and inputting the word segmentation results into the transformer-based bidirectional encoder representation model corresponding to the first-level label to obtain the sub-labels under that first-level label through the model. In addition, the invention further relates to blockchain technology, and the multiple sections of text data can be stored in a blockchain. The invention further provides a label classification device for corpora, computer equipment and a storage medium. The label classification accuracy for corpora is improved.
Owner:CHINA PING AN PROPERTY INSURANCE CO LTD

Text generation model training method, target corpus expansion method and related device

The invention discloses a text generation model training method, a target corpus expansion method and a related device. The training method of the text generation model comprises the following steps: acquiring a sample corpus; performing word segmentation on the sample corpus and generating a statistical language model from the word segmentation result; generating a target text with the generator of the text generation model; using the discriminator of the text generation model to discriminate the target text against the sample corpus, outputting a discrimination result and obtaining an adversarial loss function from it; acquiring the perplexity of the target text with the statistical language model and determining a penalty term according to the perplexity; and superposing the adversarial loss function and the penalty term to obtain the target loss function of the text generation model, then training the text generation model with the target loss function to obtain a trained model. According to the scheme, the existing corpus can guide the training of the text generation model, and the performance of the text generation model is improved.
Owner:ZHEJIANG DAHUA TECH CO LTD
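The superposition of the adversarial loss and the perplexity penalty might look as follows; the weighting coefficient is an assumption, since the abstract does not say how the two terms are scaled:

```python
def target_loss(adv_loss, ppl, penalty_weight=0.1):
    """Superpose the adversarial loss and a perplexity-based penalty
    term, per the abstract. `penalty_weight` is an illustrative
    assumption; the abstract does not specify the scaling."""
    return adv_loss + penalty_weight * ppl

# Adversarial loss 0.8 plus a penalty for perplexity 12.0.
print(target_loss(0.8, 12.0))  # -> 2.0
```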

Oil suspended agent

The invention relates to an oil suspended agent prepared from an agricultural chemical active ingredient and solvent oil, wherein the agricultural chemical active ingredient is a combination of one or more of sulfonylurea compounds, triketone compounds and triazine compounds; the solvent oil contains compounds as shown in a general formula (I) R-(CH2)n-R minute (I); in the formula (I), R is alkyl with 5 to 10 carbon atoms, n is an integer of 6 to 20, and R minute is phenyl, benzyl or naphthyl. The oil suspended agent is good in medicinal effect and has no biosecurity perplexity; in addition, after the oil suspended agent is placed for a long time, less oil is separated out, and no bottom precipitation phenomenon is caused; the product is stable in quality, the problem of poor stability in the storage process of the product is solved, and the use of farmers is facilitated.
Owner:FMC CHINA INVESTMENT

Unsupervised online public opinion junk long text recognition method

The invention discloses an unsupervised online public opinion junk long-text recognition method. The recognition method comprises the following steps: obtaining labeled public opinion junk texts and normal texts from an existing internal system; constructing two models, a language model trained on online public opinion text and a BERT next-sentence prediction model based on the online public opinion text, and inputting the to-be-predicted online public opinion long text into both; evaluating whether the interior of a sentence is junk text using the language model's perplexity index; evaluating the contextual coherence between the text's sentences using the BERT next-sentence prediction model; and combining the two evaluations to complete the junk-text recognition task for the long text. Junk text can thus be recognized automatically, the cost of obtaining supervised data is greatly reduced, and a system without supervised data can recognize junk text from the start.
Owner:南京擎盾信息科技有限公司

Pre-training language model-oriented privacy disclosure risk assessment method and system

The invention relates to the field of privacy security and aims to provide a privacy disclosure risk assessment method and system oriented to pre-trained language models. The method comprises the following steps: adding forged data to a pre-training data set; inputting the pre-training data set into an initialized neural network model and calculating the loss according to a set pre-training task and loss function; continuously updating the model parameters during training, which increases the model's privacy leakage risk; inputting a fine-tuning data set into the pre-trained neural network model to fine-tune its feature extraction capability; inputting privacy prefix content into the model and outputting text information as the prediction result; and calculating, counting and sorting the perplexity of the output information, and assessing the risk of privacy data leakage by comparing the proportion of generated privacy information. The method can effectively improve the accuracy of assessing the privacy data leakage risk, exposes the privacy leakage risk present in pre-trained language models, and provides a starting point for developing related defense methods.
Owner:ZHEJIANG UNIV
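The sorting step rests on the observation that memorized training data tends to come out with unusually low perplexity; a sketch of ranking generations accordingly (the example strings and scores are invented):

```python
def rank_by_perplexity(generations):
    """Sort (text, perplexity) pairs by ascending perplexity: the
    low-perplexity generations are the ones most likely to be
    memorized (possibly private) training data."""
    return sorted(generations, key=lambda g: g[1])

gens = [("random filler text", 310.0), ("Alice's phone is 555-0100", 12.5)]
print(rank_by_perplexity(gens)[0][0])  # -> Alice's phone is 555-0100
```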

Speech sound identification method

One of the objectives of the inventive method is to reduce the burden of searching the set of possible candidates for a speech phrase to be recognized. The objective is achieved by a method in which regions of high complexity in a recognition grammar are paired, via additional constraints, with regions of lower perplexity. The search then proceeds by evaluating the low-perplexity region of the grammar first, finding word candidates that are used to limit the search effort that needs to be expended when recognizing the parts of speech corresponding to the higher-perplexity part of the recognition grammar.
Owner:SONY INT (EURO) GMBH
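The two-pass search can be sketched as filtering the high-perplexity lexicon by constraints derived from the low-perplexity candidates; `compatible` and the toy prefix constraint below are illustrative assumptions, not the patent's actual constraint:

```python
def two_pass_recognize(low_ppl_candidates, high_ppl_lexicon, compatible):
    """First pass yields word candidates from the low-perplexity grammar
    region; second pass keeps only high-perplexity-region words that are
    compatible with some candidate, pruning the search space."""
    survivors = []
    for word in high_ppl_lexicon:
        if any(compatible(cand, word) for cand in low_ppl_candidates):
            survivors.append(word)
    return survivors

# Toy constraint: a lexicon word must share a 3-letter prefix with a candidate.
cands = ["call", "dial"]
lexicon = ["calling", "dialing", "open"]
compat = lambda c, w: w.startswith(c[:3])
print(two_pass_recognize(cands, lexicon, compat))  # -> ['calling', 'dialing']
```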

Cross-domain language model training method and device, electronic equipment and storage medium

The invention provides a cross-domain language model training method and device, electronic equipment and a storage medium. The method comprises the following steps: obtaining corpus training sets for multiple domains; training a plurality of language models on the corpus training sets of the plurality of domains to obtain their respective outputs, the language models being in one-to-one correspondence with the corpus training sets; and interpolating the plurality of language models according to their respective outputs and preset interpolation coefficients to obtain a cross-domain language model. According to the cross-domain language model training method provided by the invention, language models from multiple domains are mixed into one model through a linear-interpolation-based mixing method, so that the cross-domain weakness of the language model is effectively addressed, the performance indices of the language model are improved, and its perplexity is reduced.
Owner:北京明朝万达科技股份有限公司