365 results about "Part-of-speech tagging" patented technology

In corpus linguistics, part-of-speech tagging (POS tagging or PoS tagging or POST), also called grammatical tagging or word-category disambiguation, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context—i.e., its relationship with adjacent and related words in a phrase, sentence, or paragraph. A simplified form of this is commonly taught to school-age children, in the identification of words as nouns, verbs, adjectives, adverbs, etc.
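
As a minimal illustration of tagging by both definition and context, the sketch below looks each word up in a toy lexicon and, when a word has more than one possible tag, uses the previous tag to choose. The lexicon, the tag names and the two context rules are assumptions made up for this example; real taggers learn such preferences from an annotated corpus.

```python
# Toy lexicon (hypothetical): each word maps to the tags it can take.
LEXICON = {
    "the": ["DET"],
    "a": ["DET"],
    "to": ["PART"],
    "i": ["PRON"],
    "want": ["VERB"],
    "read": ["VERB"],
    "book": ["NOUN", "VERB"],   # ambiguous: needs context
    "flight": ["NOUN"],
}


def tag(tokens):
    """Tag tokens using the lexicon (definition) and the previous tag (context)."""
    tags = []
    for token in tokens:
        # Unknown words default to NOUN (an assumption of this sketch).
        candidates = LEXICON.get(token.lower(), ["NOUN"])
        prev = tags[-1] if tags else None
        if len(candidates) == 1:
            choice = candidates[0]
        elif prev == "DET" and "NOUN" in candidates:
            choice = "NOUN"   # "the book" -> noun
        elif prev == "PART" and "VERB" in candidates:
            choice = "VERB"   # "to book" -> verb
        else:
            choice = candidates[0]
        tags.append(choice)
    return list(zip(tokens, tags))
```

With this lexicon, `tag("I want to book a flight".split())` tags "book" as a verb, while `tag("I read the book".split())` tags it as a noun.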

Part-of-speech tagging using latent analogy

Methods and apparatuses to assign part-of-speech tags to words are described. An input sequence of words is received. A global fabric of a corpus having training sequences of words may be analyzed in a vector space. A global semantic information associated with the input sequence of words may be extracted based on the analyzing. A part-of-speech tag may be assigned to a word of the input sequence based on POS tags from pertinent words in relevant training sequences identified using the global semantic information. The input sequence may be mapped into a vector space. A neighborhood associated with the input sequence may be formed in the vector space wherein the neighborhood represents one or more training sequences that are globally relevant to the input sequence.
Owner:APPLE INC

Acronym extraction system and method of identifying acronyms and extracting corresponding expansions from text

An acronym expansion system of the present invention receives electronic documents and extracts acronyms and their corresponding expansions. A part-of-speech tagger decomposes text into string tokens or words and tags them with their part-of-speech, while an acronym identifier determines whether a word is a potential acronym based on various conditions. An expansion identifier retrieves lists of words preceding and following a potential acronym to search for the expansion. The resulting word lists are examined sequentially to identify and retrieve an expansion for the potential acronym. An expansion extractor receives the potential acronym and a processed word list to retrieve the expansion of the potential acronym from that list. The extractor may utilize information from prior search iterations, and verifies an extracted expansion against a set of rules to remove spurious expansions.
Owner:PERATON INC
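
The search for an expansion in the words before and after a potential acronym, as described in the entry above, might be sketched roughly as follows. This is not the patented system: the acronym condition (2 to 10 uppercase letters), the initial-letter matching rule and all function names are assumptions chosen for illustration.

```python
def is_potential_acronym(word):
    """One simple condition (assumed): 2-10 characters, all uppercase letters."""
    return 2 <= len(word) <= 10 and word.isalpha() and word.isupper()


def match_initials(acronym, words):
    """True if there is one word per acronym letter and the initials spell it."""
    return len(words) == len(acronym) and all(
        w[:1].upper() == c for w, c in zip(words, acronym)
    )


def find_expansion(tokens, i):
    """Examine the word lists preceding and following tokens[i], in that order."""
    acronym = tokens[i]
    n = len(acronym)
    preceding = tokens[max(0, i - n):i]
    following = tokens[i + 1:i + 1 + n]
    for window in (preceding, following):
        if match_initials(acronym, window):
            return " ".join(window)
    return None


def extract_acronyms(text):
    """Return {acronym: expansion} for every acronym whose expansion is found."""
    tokens = [t.strip("()[],.;:") for t in text.split()]
    tokens = [t for t in tokens if t]
    found = {}
    for i, token in enumerate(tokens):
        if is_potential_acronym(token):
            expansion = find_expansion(tokens, i)
            if expansion is not None:
                found[token] = expansion
    return found
```

For example, `extract_acronyms("We use part of speech (POS) tagging.")` yields `{"POS": "part of speech"}`. A fuller system would also need the rule-based verification step the abstract mentions to reject spurious matches.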

Method and computer system for part-of-speech tagging of incomplete sentences

The invention relates to a method and a computer system for enhanced part-of-speech (POS-) tagging as well as grammatically disambiguating a phrase. A phrase is usually a short multiword expression that may be ambiguous. By introducing grammatical constraints the invention supports POS-tagging as well as grammatically disambiguating the phrase. According to an identifier for the phrase, the phrase is supplemented with artificial context information. The supplemented phrase is then POS-tagged or grammatically disambiguated. Important applications are POS-tagging, Automatic Term Encoding, Headword Detection and Information Retrieval.
Owner:XEROX CORP

Enquiry statement analytical method and system for information retrieval

The invention discloses a query sentence analyzing method based on understanding of natural languages and a system thereof, and belongs to the technical field of information retrieval. The query sentence analyzing method comprises the following steps: (1) automatic segmenting, named entity identification and part-of-speech tagging of an input Chinese query sentence are implemented; (2) syntax structure of the segmented sentence is analyzed so as to obtain a syntax structural tree, and meaning of each word is determined according to the sentence after the part-of-speech tagging; (3) according to the syntax structure and the meaning of each word, semantic roles of predicates in the sentence are tagged; and (4) according to the analyzed result of the sentence from the levels of syntactics, syntax and semantics, keywords are expanded and the keywords that can reflect user information retrieval requirements are extracted. The query sentence analyzing system of the invention comprises a syntactic analyzing module, a syntax analyzing module, a semantic analyzing module and a keyword extracting module. The query sentence analyzing method and system can greatly improve the accuracy of query results and provide desired query results for users.
Owner:PEKING UNIV

Chinese natural language interrogative sentence semantization knowledge base automatic question-answering method

The invention discloses a Chinese natural language interrogative sentence semantization knowledge base automatic question-answering method. The method includes the following steps that Chinese natural language processing is performed on a fact type question input by a user, word segmentation, part-of-speech tagging and identification and expanding of a named entity are achieved, and a semantic dependency tree is generated; a generalization template and a semantic analysis technology are used for acquiring time, space, a fact entity, a fact object and the like in an interrogative sentence, then semantic processing is performed, composition element attributes relevant to all events in the interrogative sentence and values of the attributes are extracted, a plurality of 'attribute-value' pairs are generated, to-be-answered elements are substituted by interrogatives, and a complex fact triple set is formed; after a triple where a to-be-answered part is located is combined with other relevant fact triple sets to form knowledge base query with conditional constraints, and query matching based on similarity calculation is performed in a knowledge base, a result is extracted from the knowledge base, and a final answer is obtained. Fast and accurate query response to the knowledge base is achieved.
Owner:NANJING UNIV

Chinese entity relation extraction method based on keyword and verb dependency

The invention discloses a Chinese entity relation extraction method based on keyword and verb dependency. Taking large-scale unstructured free text as target text, firstly, the text is segmented and keywords are extracted to form a text keyword thesaurus. Then the text is subjected to sentence segmentation, word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, and entity corpus is constructed by combining named entity thesaurus and keyword thesaurus. According to the characteristics of Chinese sentence structure, syntactic structure and the dependency between words, the entity-relation syntactic rules are constructed from verbs, and then each sentence in the text is matched with the relation syntactic rules. Finally, the relation triple is output and the set of text relation triple is obtained. The invention can make the entity relation extraction of the large-scale Chinese text more effective and more accurate.
Owner:SHANGHAI DATATOM INFORMATION TECH CO LTD

Method for extracting text-oriented field term and term relationship

The invention discloses a method for extracting a text-oriented field term and term relationship. The method is characterized by comprising the following steps of: firstly, preprocessing original linguistic data to obtain a candidate word set including clauses, participles and part of speech tagging, and filtering noise words; secondly, extracting term characteristics from the original linguistic data and the Internet, and separating terms from candidate words by combining with a dual-model structure algorithm; thirdly, constructing a term dictionary by adopting an inverted index method, and tagging the terms in a text to be identified by using a longest match algorithm; and finally, carrying out multilevel sign sequence tagging through a conditional random field model according to a multi-dimensional node signing rule to obtain a relationship among the terms in the text to be identified.
Owner:XI AN JIAOTONG UNIV
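
The longest match step in the entry above can be illustrated with a greedy forward maximum matching sketch. The term dictionary here is a plain Python set rather than the inverted index the abstract describes, and the function name and `max_len` parameter are assumptions for this example.

```python
def longest_match_tag(tokens, term_dict, max_len=4):
    """Greedy forward maximum matching.

    At each position, try the longest candidate first and accept the first
    one found in the term dictionary. Returns (start, end, term) spans with
    an exclusive end index.
    """
    spans = []
    i = 0
    while i < len(tokens):
        matched = False
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in term_dict:
                spans.append((i, i + n, candidate))
                i += n          # skip past the matched term
                matched = True
                break
        if not matched:
            i += 1              # no term starts here; move on
    return spans
```

Given the terms `{"field", "random field", "conditional random field"}`, the sentence "a conditional random field model" produces the single span `(1, 4, "conditional random field")` rather than the shorter nested terms.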

Dependency semantic-based Chinese unsupervised open entity relationship extraction method

The invention relates to a dependency semantic-based Chinese unsupervised open entity relationship extraction method. The method comprises the following steps of preprocessing an input text: performing Chinese word segmentation, part-of-speech tagging and dependency grammar analysis on the input text; performing named entity identification on the input text; arbitrarily selecting two entities from identified entities to form candidate entity pairs; searching for a dependency path between two entities in the candidate entity pairs; and analyzing whether a syntactic structure mapped by the dependency path is matched with a normal form of a dependency semantic normal form set or not, if yes, extracting words or phrases from the residual part of the input text according to the matched normal form to serve as relational words, forming a relational triple by the extracted relational words and the candidate entity pairs, and if not, performing normal form matching of a next group of the candidate entity pairs; and outputting the relational triple. Compared with the prior art, the method has the advantages that the calculation complexity is low; the extraction efficiency is high; distance position limitation is overcome; a simple sentence also can be extracted and the like.
Owner:TONGJI UNIV
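
Two of the steps in the entry above, forming candidate entity pairs and finding the dependency path between the two entities, might look like this in outline. The head-index representation of the dependency tree and the function names are assumptions for illustration; matching the path against a normal form set is not shown.

```python
from collections import deque
from itertools import combinations


def candidate_pairs(entity_indices):
    """All unordered pairs of recognized entity positions."""
    return list(combinations(entity_indices, 2))


def dependency_path(heads, a, b):
    """Shortest path between tokens a and b in the dependency tree.

    heads[i] is the index of token i's head, or -1 for the root.
    Edges are treated as undirected; the search is breadth-first.
    """
    graph = {i: set() for i in range(len(heads))}
    for child, head in enumerate(heads):
        if head >= 0:
            graph[child].add(head)
            graph[head].add(child)
    queue = deque([[a]])
    seen = {a}
    while queue:
        path = queue.popleft()
        if path[-1] == b:
            return path
        for nxt in sorted(graph[path[-1]]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

For the toy sentence "Alice founded Acme in Paris" with heads `[1, -1, 1, 4, 1]`, the path between "Alice" (0) and "Acme" (2) is `[0, 1, 2]`, passing through the verb "founded", which is the kind of word such a method would consider as a relation word.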

Hierarchical multi-label categorization method suitable for legal identification

The invention discloses a hierarchical multi-label categorization method suitable for legal recognition. The method comprises the following steps: step 1, extracting facts of a case and legal provisions thereof from a pre-processed judgment document; step 2, based on a hierarchical structure of a label space, expanding the legal provisions corresponding to the facts of the case, so that the categorization labels of the sample of the case are a subset of the label space; step 3, performing word segmentation and part-of-speech tagging on the texts of the facts of the case, selecting features of word segmentation results, selecting features that fully represent the facts of the case, establishing a feature vector; step 4, establishing a prediction model: finding out the set N(x) of k neighbor samples in the expanded multi-label training set of a new instance x, setting a weight for each neighbor sample, calculating confidence of the new instance to each category according to categorization weight of k neighbor samples to each category, finally, predicting the category label set of the new instance.
Owner:NANJING UNIV
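
Step 4 of the entry above, weighted voting over the k nearest neighbors, could be sketched as follows. The inverse-distance weighting, the Euclidean metric and the confidence threshold are all assumptions; the abstract does not say how the weights or the final label set are computed.

```python
import math


def predict_labels(x, training_set, k=3, threshold=0.5):
    """Predict a label set for x from its k nearest neighbors.

    training_set is a list of (feature_vector, set_of_labels) pairs.
    Returns (predicted_labels, confidence_per_label).
    """
    # N(x): the k training samples closest to x.
    neighbors = sorted(training_set, key=lambda s: math.dist(x, s[0]))[:k]
    # Assumed weighting: closer neighbors count more.
    weights = [1.0 / (1.0 + math.dist(x, vec)) for vec, _ in neighbors]
    total = sum(weights)
    confidence = {}
    for w, (_, labels) in zip(weights, neighbors):
        for label in labels:
            confidence[label] = confidence.get(label, 0.0) + w / total
    predicted = {label for label, c in confidence.items() if c >= threshold}
    return predicted, confidence
```

Because the training labels are assumed to be already expanded along the hierarchy (step 2), a parent label such as "A" accumulates weight from every neighbor carrying "A" or any of its children, so it tends to receive at least as much confidence as a child label such as "A.1".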

System and method for accurate grammar analysis using a part-of-speech tagged (POST) parser and learners' model

An accurate grammar analyzer that works effectively even with error-ridden sentences input by learners, based on a context-free probabilistic statistical POST (part-of-speech tagged) parser, for a template-automation-based computer-assisted language learning system. For any keyed-in sentence, the parser finds a closest correct sentence to the keyed-in sentence from among the embedded template paths exploiting a highest similarity value, and generates a grammar tree for the correct sentence where some ambiguous words are preassigned by expert language teachers. The system marks the errors under the leaves of the grammar tree by identifying the differences between the keyed-in sentence and the grammar tree of the correct sentence as errors committed by learners. By identifying most frequently recurring grammatical errors of each student, the system sets up a learner's model, providing a unique level of contingent remediation most appropriate to each learner involved.
Owner:SUNFLARE CO LTD

Multi-granularity semantic chunk based entity attribute and attribute value extracting method

The invention relates to a multi-granularity semantic chunk based entity attribute and attribute value extracting method, and belongs to the technical field of Web mining and information extraction. The method comprises the following steps that a corpus set is constructed and free text extraction is performed; a corpus is subjected to word segmentation, part-of-speech tagging and phrase recognition; the corpus is subjected to semantic role labeling; the corpus is subjected to dependency grammar analysis; the corpus is subjected to semantic dependency analysis; candidate entities, attributes and attribute value triads based on three granularities of words, phrases and semantic roles are extracted; the candidate entities, attributes and attribute value triads are corrected and subjected to error classification by means of a trained classifier. Compared with the prior art, the entities, attributes and attribute value triads based on three granularities of words, phrases and semantic roles are automatically extracted from a free text, the entity attribute and attribute value extraction accuracy and efficiency are improved, and the wide application prospect is achieved in the fields of theme detection, information retrieval, automatic abstracting, question and answer systems and the like.
Owner:BEIJING INSTITUTE OF TECHNOLOGY

Automatic legal knowledge graph construction method

The invention provides an automatic legal knowledge graph construction method, and aims at automatically constructing legal knowledge graphs according to trial documents. The method comprises the following steps of carrying out stop word removal and word segmentation on obtained trial documents; respectively extracting subject words of three types of trial documents, carrying out part-of-speech tagging and filtration on the extracted subject words, and extracting noun or noun phrase subject word to serve as entity concepts of a legal knowledge graph according to the filtration result; obtaining words similar with each extracted noun or noun phrase subject word, carrying out part-of-speech tagging and filtration on the obtained similar words, and extracting noun or noun phrase subject word similar words as entity concepts of the legal knowledge graph according to the filtration result; and constructing the legal knowledge graph according to the extracted subject word entity concepts, the similar word entity concepts and triple structures such as subject word-subject relationship-subject word and subject word-similar relationship-similar word formed by a relationship between the subject word entity concepts and the similar word entity concepts. The invention relates to the technical field of knowledge engineering.
Owner:UNIV OF SCI & TECH BEIJING

Text sentiment analysis method and system, and computer readable storage medium

The invention relates to the technical field of artificial intelligence, and discloses a text sentiment analysis method, a text sentiment analysis system and a computer readable storage medium, for improving the accuracy of text sentiment analysis. The method comprises the steps of inputting a word vector corresponding to any sentence into a preset LSTM network model, thus acquiring a hidden layer vector of each word; tagging the part-of-speech of an acquired text word set, training the text word set carrying part-of-speech tagging information, and splitting a part-of-speech vector matrix generated by training by using words as units, thus acquiring the part-of-speech vector corresponding to each word; using sentences as units, performing word embedding weighted summation attention analysis on the hidden layer vector and the part-of-speech vector corresponding to each word in the sentence to acquire a sentence vector carrying attention information of each sentence, and using the sentence vector carrying the attention information to serve as the input of a sentiment classification model, thus acquiring a sentiment classification result of each sentence and / or a classification result of the original text.
Owner:CENT SOUTH UNIV
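
The weighted summation attention step in the entry above is described only in outline. One common way to realize such a step is dot-product attention against a query vector, sketched below. Concatenating each word's hidden layer vector with its part-of-speech vector, the query vector and the function names are assumptions of this sketch, not details taken from the patent.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_sentence_vector(hidden_vecs, pos_vecs, query):
    """Combine per-word vectors into one sentence vector.

    Each word's feature is its hidden layer vector concatenated with its
    part-of-speech vector (an assumed combination). Words are scored by a
    dot product with `query`, normalized with softmax, then summed.
    """
    features = [h + p for h, p in zip(hidden_vecs, pos_vecs)]  # list concatenation
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    sentence = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return sentence, weights
```

In a trained model the query vector would be learned together with the LSTM and the classifier; here it is just an input so that the weighting can be inspected.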

System and method for forecasting fluctuations in future data and particularly for forecasting security prices by news analysis

Inactive · US20090024504A1 · Affect structure · Finance · News analytics · Financial transaction
A system and method for predicting price fluctuations in financial markets. Our approach utilizes both market history and public news articles, published before the beginning of trading each day, to produce a set of recommended investment actions. We empirically show that these markets are surprisingly predictable, even by purely market-historical techniques. Furthermore, analyzing relevant news articles captures information features independent of the market's history, and combining the two methods significantly improves results. Capturing usable features from news articles requires some linguistic sophistication: the standard naïve bag-of-words approach does not yield predictive features. Instead, we use part-of-speech tagging, dependency parsing and semantic role labeling to generate features that improve system accuracy. We evaluate our system on eight political prediction markets from 2004 and show that we can make effective investment decisions based on our system's predictions, whose profits greatly exceed those generated by a baseline system.
Owner:LERMAN KEVIN +1

Chinese electronic medical record named entity recognition method

Inactive · CN109871538A · Rich grammatical features · Reduce labeling errors · Special data processing applications · Medical record · Part of speech
The invention discloses a Chinese electronic medical record named entity identification method. The method comprises the following steps: 1) constructing a common vocabulary dictionary; 2) simple part-of-speech tagging; 3) constructing a text and part-of-speech vector mapping table; 4) training a prediction model of the named entity; and 5) predicting the label of the named entity. According to the method, the part-of-speech characteristics are added to improve the boundary distinguishability of the named entity and the common vocabularies, so that the boundary accuracy of the named entity is improved. At the same time, a self-attention mechanism is introduced into the bidirectional LSTM-CRF model, and the relevancy between the input at each moment and other components in the sentence is calculated, so that the long dependency problem is relieved, and the named entity recognition accuracy is improved.
Owner:SOUTH CHINA UNIV OF TECH

System and method of disambiguating and selecting dictionary definitions for one or more target words

Systems and methods for automatically selecting dictionary definitions for one or more target words include receiving electronic signals from an input device indicating one or more target words for which a dictionary definition is desired. The target word(s) and selected surrounding words defining an observation sequence are subjected to a part of speech tagging algorithm to electronically determine one or more most likely part of speech tags for the target word(s). Potential relations are examined between the target word(s) and selected surrounding keywords. The target word(s), the part of speech tag(s) and the discovered keyword relations are then used to map the target word(s) to one or more specific dictionary definitions. The dictionary definitions are then provided as electronic output, such as by audio and / or visual display, to a user.
Owner:DYNAVOX SYST

Parsing method

A method of parsing natural language comprising the steps of: a) receiving a tokenised and part-of-speech tagged utterance comprising n tokens; b) for the first token: i) calculating a partial parse consisting of one dependency relation by assigning a role and a head for the first token; ii) calculating the probability of the partial parse from step (i); iii) repeating steps (b)(i) and (b)(ii) for all possible heads and roles of the token and storing the A most likely resulting partial parses; c) advancing to the next successive token and, for each of the A partial parses from the previous step: iv) calculating a possible next extension to the partial parse by one dependency relation; v) calculating the probability of the extended partial parse from (c)(iv); vi) repeating steps (c)(iv) and (c)(v) for all possible heads and roles of the token and storing the A most likely resulting partial parses; d) repeating step (c) for each successive token until all n tokens have been parsed.
Owner:KK TOSHIBA
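
The procedure in the entry above is a beam search over partial dependency parses, with A as the beam width. A minimal sketch follows, assuming a caller-supplied `score` function that returns the probability of one dependency relation; that function, the role inventory and the use of -1 for an artificial root are hypothetical, and the sketch does not check that the result is a well-formed tree.

```python
import heapq


def beam_parse(tokens, roles, score, beam_width=3):
    """Beam search over dependency parses, one token at a time.

    tokens: list of (word, pos_tag) pairs.
    score(tokens, dep, head, role): probability of attaching token `dep`
        to `head` (-1 = root) with `role`; supplied by the caller.
    Returns up to `beam_width` (probability, parse) pairs, best first,
    where a parse is a list of (dep, head, role) triples.
    """
    n = len(tokens)
    beam = [(1.0, [])]
    for dep in range(n):
        candidates = []
        for prob, parse in beam:
            for head in range(-1, n):
                if head == dep:
                    continue  # a token cannot be its own head
                for role in roles:
                    p = prob * score(tokens, dep, head, role)
                    candidates.append((p, parse + [(dep, head, role)]))
        # Keep only the A most likely partial parses.
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return beam


def toy_score(tokens, dep, head, role):
    """Hand-written toy probabilities, for demonstration only."""
    dep_tag = tokens[dep][1]
    if role == "root" and head == -1 and dep_tag == "VERB":
        return 0.9
    if role == "nsubj" and head != -1 and dep_tag == "NOUN" and tokens[head][1] == "VERB":
        return 0.8
    return 0.05
```

Calling `beam_parse([("dogs", "NOUN"), ("bark", "VERB")], ["root", "nsubj"], toy_score)` ranks first the parse that attaches "dogs" to "bark" as subject and "bark" to the root.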

Construction and utilization method for context-aware dynamic word or character vector on the basis of deep learning

The invention belongs to the technical field of the natural language processing of computers, in particular to a construction and utilization method for a context-aware dynamic word or character vector on the basis of deep learning. The dynamic construction method for the context-aware dynamic word or character vector on the basis of the deep learning comprises the following steps of: in massive texts, through an unsupervised learning method, simultaneously learning a global feature vector of a word or character and the feature vector representation of the global feature vector when a specific context appears, and combining the global feature vector with the context feature vector, and dynamically generating word or character vector representation. By use of the method, the word or character vector dynamically constructed on the basis of the context can be applied to a natural language processing system. The method is mainly used for solving a problem that the word or character vector expresses different meanings in different contexts, i.e. the problem that one word or one character has multiple meanings can be solved. The dynamic word or character vector can be used for obviously improving the performance of various natural language processing tasks of different languages, wherein the tasks comprise Chinese word segmentation, part-of-speech tagging, naming recognition, grammatical analysis, semantic role tagging, sentiment analysis, text classification, machine translation and the like.
Owner:FUDAN UNIV

Text analysis method and text analyzer

The invention discloses a text analysis method and a text analyzer. The method comprises the following steps of: performing splitting processing on an acquired text by utilizing characters as a unit, and performing characteristic tagging on characters obtained by splitting according to preset character characteristics so as to form tagged word strings; performing word segmentation processing on the tagged word strings according to pre-constructed word segmentation models so as to obtain word segmentation results containing word orders; performing merging processing on the word orders contained in the word segmentation results, and performing characteristic tagging on words obtained by merging according to the preset character characteristics so as to obtain tagged word strings; performing part-of-speech tagging on the tagged word strings according to pre-constructed part-of-speech tagging models so as to obtain part-of-speech tagging results; and if confirming that the part-of-speech tagging results contain part-of-speech tags of entity words, merging the entity words containing the part-of-speech tags in the part-of-speech tagging results according to same adjacent rules, so as to obtain a text analysis result. By applying the text analysis method and the text analyzer, the entity word text analysis accuracy rate can be improved.
Owner:新浪技术(中国)有限公司
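
The final merging step in the entry above, joining adjacent entity words that carry the same tag, can be sketched with `itertools.groupby`. The set of entity tags and the joiner are assumptions for this example; the abstract does not list the actual tags or merging rules.

```python
from itertools import groupby

# Hypothetical entity tags; a real tagset would define its own.
ENTITY_TAGS = {"PER", "LOC", "ORG"}


def merge_adjacent_entities(tagged, joiner=""):
    """Merge runs of adjacent words that share the same entity tag.

    tagged: list of (word, tag) pairs in sentence order.
    joiner: string placed between merged words ("" suits Chinese text,
            " " suits space-delimited languages).
    Non-entity words are passed through unchanged.
    """
    merged = []
    for tag, group in groupby(tagged, key=lambda pair: pair[1]):
        words = [word for word, _ in group]
        if tag in ENTITY_TAGS:
            merged.append((joiner.join(words), tag))
        else:
            merged.extend((word, tag) for word in words)
    return merged
```

For instance, `[("New", "LOC"), ("York", "LOC"), ("is", "VERB")]` with `joiner=" "` becomes `[("New York", "LOC"), ("is", "VERB")]`.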

Text classification method combining dynamic word embedding with part-of-speech tagging

The invention discloses a text classification method combining dynamic word embedding with part-of-speech tagging, and provides the text classification method based on a deep neural network through combining dynamic word embedding with part-of-speech tagging. The method can fully utilize the advantages that a large-scale corpus can provide more accurate grammar and semantic information, and can also adjust word embedding by combining with the features of the corpus during a model training process, and thus the features of the corpus can be better learned. Meanwhile, classification accuracy can be further improved by combining with part of speech information of words in sentences. The invention also comprehensively utilizes the advantages of LSTM in the aspect of learning context information of words and part of speech in the sentences, and the advantages of CNN in the aspect of learning text local features. The classification model provided by the invention has the advantages of high accuracy and strong universality, and achieves good effect in some famous public corpuses including IMDB corpus, Movie Review and TREC.
Owner:SOUTH CHINA UNIV OF TECH

Training a natural language processing model with information retrieval model annotations

Systems and techniques are provided for training a natural language processing model with information retrieval model annotations. A natural language processing model may be trained, through machine learning, using training examples that include part-of-speech tagging and annotations added by an information retrieval model. The natural language processing model may generate part-of-speech, parse-tree, beginning, inside, and outside label, mention chunking, and named-entity recognition predictions with confidence scores for text in the training examples. The information retrieval model annotations and part-of-speech tagging in the training example may be used to determine the accuracy of the predictions, and the natural language processing model may be adjusted. After training, the natural language processing model may be used to make predictions for novel input, such as search queries and potential search results. The search queries and potential search results may have information retrieval model annotations.
Owner:GOOGLE LLC

An information retrieval-based question and answer system and method for knowledge graph energization

The invention discloses an information retrieval-based question and answer system and method for knowledge graph energization, which integrally improve the question and answer effect of the system, expand the user consultation range and improve the question feedback accuracy. According to the technical scheme, the system comprises a knowledge map database for storing domain knowledge map information; a word segmentation and part-of-speech tagging module which segments the user questions and tags the part-of-speech of the user questions; an entity identification and link module which identifies entities in the user questions and links the entities to nodes in the knowledge graph database; an intention understanding module which obtains an intention understanding result of the user problem based on the entity link result and the distributed representation vector; a retrieval module which retrieves a plurality of corresponding question and answer pairs as roughing results according to the information in the user questions based on the retrieval data source; a sorting module which is used for resorting the roughing results by utilizing the distributed representation vectors of the entities; and a semantic matching module which scores the reordering result by using the distributed representation vector of the entity and finally outputs an answer.
Owner:上海乐言科技股份有限公司

Method and system for identifying named entity

Inactive · CN106557462A · Make full use of "prior knowledge" · Solve predictive power · Biological neural network models · Natural language data processing · Hidden layer · Feature vector
The technical scheme of the invention discloses a method and a system for identifying a named entity. The method for identifying the named entity comprises the following steps: merging feature vectors, wherein the feature vectors comprise pre-trained word vectors, self-training word vectors and part-of-speech tagging vectors, and a neural network is a convolutional neural network or a deep belief neural network; using the merged and obtained feature vectors as input, and obtaining a classified output result through treatment of a hidden layer, a reducing layer and an output classification layer of the neural network; and adopting a multi-mode matching algorithm to identify the classified output result so as to obtain the target entity. Through the technical scheme, the feature vectors are merged as input features of the neural network, so that the method can be well applied to specific classification scenes through treatment of the neural network and multi-mode matching.
Owner:数库(上海)科技有限公司

Text emotion analysis system based on deep learning

The invention discloses a text emotion analysis system based on deep learning. The system comprises an information collection module, an information pre-processing module, an emotion analysis module and an information display module, wherein the information collection module is used to collect comment information in each Internet resource website; the information pre-processing module is used to conduct classification, word segmentation, part-of-speech tagging, emotion information tagging processing and storage of the collected comment information; the emotion analysis module transforms the processed comment information into a phrase vector by a word representation model, a sentence module and a section and chapter model, and also inputs the phrase vector into the emotion classification model for emotion analysis; and the information display module is used to present emotion analysis results in a visualized manner. The system has the advantages that emotion orientation analysis can be conducted on the comment information; the analysis results can be presented to users in a visualized manner; and further public opinion analysis results or early warning can be provided to related departments such as enterprises or governments.
Owner:ZHEJIANG GONGSHANG UNIVERSITY

Text semantic analysis method

A text semantic analysis method and system can realize semantic analysis of text data based on the lexical level and the sentence level. For semantic analysis at the lexical level, the invention first adopts an improved word segmentation algorithm to solve the problem that English words are segmented only by spaces. Second, based on the word segmentation, TF-IDF modeling is performed to obtain weight values. The text is then vectorized by weighting and summing the weight values and the word vectors trained by Word2Vec, and finally the document similarity is computed. Because the invention considers both the contribution of the vocabulary to the document content and its semantic status when calculating document similarity, the result has higher accuracy and provides a good foundation for subsequent text clustering. For sentence-level semantic analysis, the invention extracts the subject-predicate-object structure based on text segmentation, part-of-speech tagging, syntactic analysis and dependency relations. The invention realizes the extraction of subject-predicate-object structures of various sentence types, and realizes a noun expansion function, which is more consistent with manual extraction results.
Owner:BEIJING UNIV OF TECH
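
The lexical-level pipeline in the entry above (TF-IDF weights, a weighted sum of word vectors, then document similarity) might be sketched as follows. The tiny hand-written vectors stand in for Word2Vec embeddings, and the smoothed IDF formula and cosine similarity are assumptions; the abstract does not specify either.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Per-document {word: tf-idf} using a smoothed idf (assumed formula)."""
    n = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            w: (c / len(doc)) * (math.log((1 + n) / (1 + df[w])) + 1.0)
            for w, c in tf.items()
        })
    return weights


def doc_vector(weights, word_vectors, dim):
    """Weighted sum of word vectors; words without a vector are skipped."""
    vec = [0.0] * dim
    for word, weight in weights.items():
        if word in word_vectors:
            for d in range(dim):
                vec[d] += weight * word_vectors[word][d]
    return vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

With toy vectors in which "cat" and "kitten" point one way and "stock" and "market" another, two documents about cats score a much higher cosine similarity with each other than either does with a document about markets.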

A multi-strategy fusion knowledge question and answer method and system

The invention discloses a multi-strategy fusion knowledge question and answer method and system. The method comprises an offline part and an online part: the offline part is mainly used for data preparation and model training, and the online part is mainly used for system service. The method comprises the steps of: receiving a statement input by a user, and correcting spelling errors; performing word segmentation and part-of-speech tagging on the statement input by the user; extracting entity information in the user statement, and linking the entity to a knowledge graph node; obtaining an executable query statement through a multi-strategy fusion semantic analysis step according to the result of the entity recognition and linking process; and executing the executable query statement on a knowledge graph to obtain an answer, and then generating a corresponding natural language reply to the user according to the answer through a reply generation mode that combines multiple methods. The question and answer system is thus suitable for question queries over general and domain knowledge graphs, system robustness is improved, and good interpretability and controllability are achieved.
Owner:上海乐言科技股份有限公司

System and method for decoding speech

The system and method for speech decoding in speech recognition systems provides decoding for speech variants common to such languages. These variants include within-word and cross-word variants. For decoding of within-word variants, a data-driven approach is used, in which phonetic variants are identified, and a pronunciation dictionary and language model of a dynamic programming speech recognition system are updated based upon these identifications. Cross-word variants are handled with a knowledge-based approach, applying phonological rules, part-of-speech tagging or tagging of small words to a speech transcription corpus and updating the pronunciation dictionary and language model of the dynamic programming speech recognition system based upon identified cross-word variants.
Owner:KING FAHD UNIVERSITY OF PETROLEUM AND MINERALS +1