2248 results about "Text categorization" patented technology

Text categorization (a.k.a. text classification) is the task of assigning predefined categories to free-text documents. It can provide conceptual views of document collections and has important applications in the real world.

Robust information extraction from utterances

The performance of traditional speech recognition systems (as applied to information extraction or translation) decreases significantly with larger domain size, scarce training data, and noisy environmental conditions. This invention mitigates these problems by introducing a novel predictive feature extraction method that combines linguistic and statistical information to represent information embedded in a noisy source language. The predictive features are combined with text classifiers to map the noisy text to one of the semantically or functionally similar groups. The features used by the classifier can be syntactic, semantic, and statistical.
Owner:NANT HLDG IP LLC

Category based, extensible and interactive system for document retrieval

In information retrieval (IR) systems with high-speed access, especially search engines applied to the Internet and/or corporate intranet domains for retrieving accessible documents, automatic text categorization techniques are used to support the presentation of search query results within high-speed network environments. An integrated, automatic and open information retrieval system (100) comprises a hybrid method based on linguistic and mathematical approaches for automatic text categorization. It solves the problems of conventional systems by combining an automatic content recognition technique with a self-learning hierarchical scheme of indexed categories. In response to a word submitted by a requester, said system (100) retrieves documents containing that word, analyzes the documents to determine their word-pair patterns, matches the document patterns to database patterns that are related to topics, and thereby assigns topics to each document. If the retrieved documents are assigned to more than one topic, a list of the document topics is presented to the requester, and the requester designates the relevant topics. The requester is then granted access only to documents assigned to relevant topics. A knowledge database (1408) linking search terms to documents and documents to topics is established and maintained to speed future searches. Additionally, new strategies are presented to deal with different update frequencies of changed Web sites.
Owner:COGISUM INTERMEDIA

System and method for sentiment-based text classification and relevancy ranking

The sentimental significance of a group of historical documents related to a topic is assessed with respect to change in an extrinsic metric for the topic. A unique sentiment binding label is added to the content of action documents that are determined to have sentimental significance, and the group of documents is inserted into a historical document sentiment vector space for the topic. Action areas in the vector space are defined from the locations of action documents, and a singular sentiment vector may be created that describes the cumulative action area. Newly published documents are sentiment-scored by semantically comparing them to documents in the space and/or to the singular sentiment vector. The sentiment scores for the newly published documents are supplemented by human sentiment assessment of the documents, and a sentiment time decay factor is applied to the supplemented sentiment score of each newly published document. User queries are received, and a set of sentiment-ranked documents with the highest age-adjusted sentiment scores is returned.
Owner:MARKETCHORUS

Method and system for extracting and classifying geolocation information utilizing electronic social media

Methods, systems and processor-readable media for extracting and classifying location information utilizing social media messages and / or data thereof. The social media messages can be sampled from a social media database and the messages filtered based on a heuristic rule. A geolocation entity from the unstructured social media messages can be extracted utilizing a geolocation entity extracting module. The messages with the geoentities can be uploaded onto a crowd sourcing platform to manually annotate the messages with a label. A text classification model can be built and learned from the label utilizing a machine learning algorithm and the messages can be classified by a location classifier in order to extract the user location. The user location can then be transformed into a geocode so that a spatial search can be enabled and the distance between the locations can be easily calculated.
Owner:XEROX CORP
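
The final step of the abstract above, computing the distance between geocoded locations, can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not name a distance formula, so the haversine (great-circle) formula is used here, and the function name `haversine_km` and the Earth radius constant are hypothetical choices rather than anything taken from the patent.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant for illustration


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) geocodes in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

A spatial search could then keep only the users whose geocode lies within a chosen radius of a query point.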

Document categorisation system

A document categorization system, including a clusterer for generating clusters of related electronic documents based on features extracted from the documents, and a filter module for generating a filter on the basis of the clusters to categorize further documents received by the system. The system may include an editor for manually browsing and modifying the clusters. The categorization of the documents is based on n-grams, which are used to determine significant features of the documents. The system includes a trend analyzer for determining trends of changing document categories over time, and for identifying novel clusters. The system may be implemented as a plug-in module for a spreadsheet application for permitting one-off or ongoing analysis of text entries in a worksheet.
Owner:TELSTRA CORPORATION LIMITD
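
The n-gram features mentioned in the abstract above can be illustrated with a short sketch. The assumptions are loud ones: the abstract does not say whether the n-grams are over characters or words, nor how "significant" is measured, so this example uses character n-grams and raw frequency, and both function names are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return a Counter of overlapping character n-grams in lowercased text."""
    s = " ".join(text.lower().split())  # normalise case and whitespace
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def top_features(docs, n=3, k=5):
    """Pool n-gram counts over a cluster of documents and return the k most frequent."""
    pooled = Counter()
    for doc in docs:
        pooled.update(char_ngrams(doc, n))
    return [gram for gram, _ in pooled.most_common(k)]
```

The pooled n-grams of a cluster could serve as a simple filter: a new document is compared against each cluster's top features.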

Method for improving accuracy of decision tree based text categorization

A text categorization method automatically classifies electronic documents by developing a single pooled dictionary of words for a sample set of documents, and then generating a decision tree model, based on the pooled dictionary, for classifying new documents. Adaptive resampling techniques are applied to improve the accuracy of the decision tree model.
Owner:NUANCE COMM INC

System and method for document categorization

The present invention provides methods and systems for automatic categorization of documents. More specifically, the present invention provides for the automatic assignment of a set of pre-defined topics to a set of documents.
Owner:STEICHEN TERRIL JOHN

Text categorization toolkit

A modular information extraction system capable of extracting information from natural language documents. The system includes a plurality of interchangeable modules including a data preparation module for preparing a first set of raw data having class labels to be tested, the data preparation module being selected from a first type of the interchangeable modules. The system further includes a feature extraction module for extracting features from the raw data received from the data preparation module and storing the features in a vector format, the feature extraction module being selected from a second type of the interchangeable modules. A core classification module is also provided for applying a learning algorithm to the stored vector format and producing therefrom a resulting classifier, the core classification module being selected from a third type of the interchangeable modules. A testing module compares the resulting classifier to a set of preassigned classes, where the testing module is selected from a fourth type of the interchangeable modules, and where the testing module tests a second set of raw data having class labels received by the data preparation module to determine the degree to which the class labels of the second set of raw data approximately correspond to the resulting classifier.
Owner:IBM CORP

Automated topic discovery in documents and content categorization

Active | US9047283B1 | Easy to find | Efficient and accurate and scalable | Web data indexing | Semantic analysis | Semantic property | Part of speech
A computer-assisted method for discovering topics and categorizing contents in a document includes the steps of calculating an importance score for a term based on grammatical roles, parts of speech, and semantic attributes, selecting terms based on the importance score values of the respective terms, and outputting terms comprising the selected term to represent topics in the document, and building a category structure based on the selected terms.
Owner:LINFO IP LLC

Method and system for filtering sensitive web page based on multiple classifier amalgamation

The invention discloses a system and a method for filtering sensitive webpages based on multi-classifier fusion. The processing object is a webpage, and the processing result is whether the webpage contains sensitive content, which may be pornography, reactionary material, violence and other unhealthy Internet content harmful to society. The system comprises a data stream obtaining and preprocessing unit, an image and text stream filtering unit, and an information fusion unit for the image filter and text filter. Through the cooperation of multiple classifiers, the system acquires the source code of a webpage by using the URL of the webpage; text and images are separated at the preprocessing stage to obtain text information and effective image information; the input webpage is assigned to one of three modes by a decision tree algorithm; the webpage is recognized by using a consecutive text classifier, a discrete sensitive text classifier and an image classifier; the outputs of the classifiers are fused, a judgment factor is computed, and the final result is returned to a browser.
Owner:INST OF AUTOMATION CHINESE ACAD OF SCI

System and method for automatically classifying text

A method is provided for automatically classifying text into categories. In operation, a plurality of tokens or features are manually or automatically associated with each category. A weight is then coupled to each feature, wherein the weight indicates a degree of association between the feature and the category. Next, a document is parsed into a plurality of unique tokens with associated counts, wherein the counts are indicative of the number of times the feature appears in the document. A category score representative of a sum of products of each feature count in the document times the corresponding feature weight in the category for each document is then computed. Next, the category scores are sorted by perspective, and a document is classified into a particular category, provided the category score exceeds a predetermined threshold.
Owner:AVOLIN LLC
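
The scoring rule described in the abstract above (a category score equal to the sum of feature counts times feature weights, followed by a threshold test) can be sketched as follows. This is a minimal illustration, not the patented implementation: whitespace tokenisation, the function names, and the weight table layout are all assumptions made for the example.

```python
from collections import Counter


def category_scores(document, category_weights):
    """Score each category as the sum of (token count x token weight)."""
    counts = Counter(document.lower().split())  # naive whitespace tokeniser (assumed)
    return {
        category: sum(counts[tok] * w for tok, w in weights.items())
        for category, weights in category_weights.items()
    }


def classify_by_threshold(document, category_weights, threshold):
    """Return categories whose score exceeds the threshold, highest score first."""
    scores = category_scores(document, category_weights)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [cat for cat, score in ranked if score > threshold]
```

For instance, with weights `{"sports": {"goal": 2.0, "match": 1.0}, "finance": {"stock": 2.0}}` and the document `"goal goal match stock"`, the sports score is 2×2.0 + 1×1.0 = 5.0 and the finance score is 2.0, so a threshold of 3.0 keeps only `sports`.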

Method and apparatus for adjusting the model threshold of a support vector machine for text classification and filtering

An information need can be modeled by a binary classifier such as support vector machine (SVM). SVMs can exhibit very conservative precision oriented behavior when modeling information needs. This conservative behavior can be overcome by adjusting the position of the hyperplane, the geometric representation of a SVM. The present invention describes a couple of automatic techniques for adjusting the position of an SVM model based upon a beta-gamma thresholding procedure, cross fold validation and retrofitting. This adjustment technique can also be applied to other types of learning strategies.
Owner:JUSTSYST EVANS RES

Academic resource recommendation service system and method

The invention provides an academic resource recommendation service system and method. The method comprises the following steps: crawling academic resources on the internet by using an LDA (Latent Dirichlet Allocation)-based focused crawler; classifying the academic resources into A preset types by using an LDA-based text classification model; and storing the academic resources in a local academic resource database. The system further comprises an academic resource model, a resource quality value calculation module and a user interest module. A tracking software module is implanted at the user terminal, and the user's subjects of interest and historical browsing behavior data are combined. The academic resource model and the user interest module are each modeled along four dimensions: academic resource type, subject theme distribution, keyword distribution and LDA latent theme distribution. The similarity between the academic resource model and the user interest preference module is calculated and combined with the resource quality value to obtain the recommendation degree, and finally Top-N academic resource recommendation is performed for the user according to the recommendation degree. According to the method disclosed by the invention, personalized, accurate recommendation of academic resources is performed according to the identity, interests and browsing behaviors of users, and the working efficiency of scientific research personnel is improved.
Owner:NINGBO UNIV

Feature selection for two-class classification systems

A two-class analysis system for summarizing features and determining features appropriate to use in training a classifier related to a data mining operation. Exemplary embodiments describe how to select features which will be suited to training a classifier used for a two-class text classification problem. Bi-Normal Separation methods are defined wherein there is a measure of inverse cumulative distribution function of a standard probability distribution and representative of a difference between occurrences of the feature between said each class. In addition to training a classifier, the system provides a means of summarizing differences between classes.
Owner:MICRO FOCUS LLC
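
The Bi-Normal Separation measure referred to in the abstract above is commonly written as the absolute difference between the inverse standard normal CDF of a feature's true positive rate and that of its false positive rate. The sketch below follows that common formulation; the function name and the clipping value `eps` (needed because the inverse CDF is undefined at 0 and 1) are assumptions for illustration, not details given in the abstract.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution: mean 0, sigma 1


def bns_score(tp, fp, pos, neg, eps=0.0005):
    """Bi-Normal Separation for one feature.

    tp / fp: number of positive / negative documents containing the feature.
    pos / neg: total number of positive / negative documents.
    eps: assumed clipping bound that keeps the rates strictly inside (0, 1).
    """
    tpr = min(max(tp / pos, eps), 1 - eps)
    fpr = min(max(fp / neg, eps), 1 - eps)
    return abs(_STD_NORMAL.inv_cdf(tpr) - _STD_NORMAL.inv_cdf(fpr))
```

A feature that occurs equally often in both classes scores 0, while a feature concentrated in one class scores high; ranking features by this score and keeping the top ones is one way to select inputs for a two-class classifier.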

Programming guide content collection and recommendation system for viewing on a portable device

An EPG contents collection and recommendation system includes an EPG database of identifications of available programs. A program information acquisition module applies text classification to detailed descriptions of the available programs. An EPG recommendation module recommends an available program to a user based on the text classification. Preferably, EPG contents are collected from publicly available TV websites and parsed into a uniform format. For example, contents are vectorized, and a Maximum Entropy technique is applied. Also, user interaction with the EPG database is used to form a user profile database. Further, classifiers are trained based on contents of the user profile database, and these classifiers are used to recommend EPG contents to the user.
Owner:SOVEREIGN PEAK VENTURES LLC

Domain-knowledge-based short text classification method and text classification system

The invention discloses a domain-knowledge-based short text classification method and a domain-knowledge-based short text classification system used in the technical field of information. The method overcomes the defect that traditional text classification methods cannot classify short texts well. Given that the concept signals described by short texts are relatively weak and their text features are seriously insufficient, the invention provides a short text data classification method and a text classification system suitable for commodity web page data. According to the embodiment, a commodity classifier with excellent classification effect is obtained by reforming the traditional classifier, introducing new elements, and focusing on matching the algorithm to the data. The introduction of the new elements comprises the following steps: introducing a concept of domain words into the classifier so as to effectively increase the information quantity of the short texts; and performing semantic analysis based on different lexical item sets on the short text data, particularly the web page commodity data, and introducing the semantic analysis result into the classifier so as to add new information to the commodity data and improve the accuracy of text classification.
Owner:SHANGHAI BIJIA DATA

Text categorization feature selection and weight computation method based on field knowledge

The invention relates to the artificial intelligence technical field, in particular to a text classification feature selection and weight calculation method based on field knowledge. The method combines sample statistics and field glossaries to construct a field classification feature space, utilizes internal knowledge relations in the field to calculate the similarity between glossary terms, and then adjusts the corresponding feature weights of the classification feature vectors. Moreover, the method adopts a support vector machine learning algorithm to construct a field text classification model and thereby realize field text classification. As shown by text classification experiments on the Yunnan tourism field and the non-tourism field, the classification accuracy of the method is improved by 4 percent compared with the text classification effect of the improved TFIDF feature weight method.
Owner:KUNMING UNIV OF SCI & TECH

Chinese text classification method based on super-deep convolution neural network structure model

The invention provides a Chinese text classification method based on a super-deep convolution neural network structure model. The method comprises the steps of collecting a training corpus of a word vector from the internet, combining a Chinese word segmentation algorithm to conduct word segmentation on the training corpus, and obtaining a word vector model; collecting news of multiple Chinese news websites from the internet, and marking the category of the news as a corpus set for text classification, wherein the corpus set is divided into a training set corpus and a test set corpus; conducting word segmentation on the training set corpus and the test set corpus respectively, and then obtaining the word vectors corresponding to the training set corpus and the test set corpus respectively by utilizing the word vector model; establishing the super-deep convolution neural network structure model; inputting the word vector corresponding to the training set corpus into the super-deep convolution neural network structure model, and conducting training and obtaining a text classification model; inputting the Chinese text which needs to be sorted into the word vector model, obtaining the word vector of the Chinese text which needs to be classified, and then inputting the word vector into the text classification model to complete the Chinese text classification.
Owner:HEBEI UNIV OF TECH

Abnormal information text classification method based on knowledge graph

The invention provides an abnormal information text classification method based on a knowledge graph. According to the method, first, a domain knowledge graph is constructed, and an entity identifier and an entity link based on the domain knowledge graph are constructed; second, text feature representation vectors v<text> and entity feature representation vectors v<ent> are constructed; and last, the text feature representation vectors and the entity feature representation vectors are merged to obtain new text representation vectors v<merge> fusing knowledge features, classified training is performed on the new text representation vectors, and a final classification result is obtained.
Owner:BEIHANG UNIV +1

Text label extracting method and device

The invention relates to a text label extracting method. The text label extracting method comprises the following steps: category prediction is performed on a to-be-extracted text through a text categorization model, and a target category of the text is obtained; topic prediction is performed on the to-be-extracted text through a topic clustering model, and a predicted topic is obtained; if the predicted topic is in a default topic set, a target topic corresponding to the predicted topic is acquired, keyword extraction is performed on the to-be-extracted text, target keywords of the text are obtained, and the target category, the target topic and the target keywords are taken as labels of the text. The text labels have different levels to meet multi-granularity retrieval requirements, and multi-granularity recommended articles can be provided according to different labels. Besides, the invention provides a text label extracting device.
Owner:SHENZHEN TENCENT COMP SYST CO LTD

Video classification method and device and server

Active | CN109359636A | Fully consider the characteristics of different dimensions | Improve accuracy | Semantic analysis | Video data clustering/classification | Text categorization | Classification methods
The invention discloses a video classification method and device and a server. The method comprises the following steps: obtaining a target video; classifying the image frames in the target video by a first classification model to obtain an image classification result, the first classification model classifying based on the image features of the image frames; classifying the audio in the target video by a second classification model to obtain an audio classification result, the second classification model classifying based on the audio features; classifying the text description information corresponding to the target video by a third classification model to obtain a text classification result, the third classification model classifying based on the text features of the text description information; and determining the target classification result of the target video according to the image classification result, the audio classification result and the text classification result. In the present application, image features, audio features and text features are integrated for classification, and features of different dimensions of the video are fully considered, thereby improving the accuracy of the video classification.
Owner:TENCENT TECH (SHENZHEN) CO LTD
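
The last step of the abstract above, determining a target class from the image, audio and text classification results, can be sketched as a simple late fusion. The abstract does not say how the three results are combined, so the weighted average used here is an assumption, as are the function name and the representation of each result as a dictionary of per-class probabilities.

```python
def fuse_predictions(image_probs, audio_probs, text_probs, weights=(1.0, 1.0, 1.0)):
    """Combine three per-class probability dicts by weighted average.

    Returns (best_label, fused_probs). A class missing from one modality
    is treated as having probability 0.0 for that modality.
    """
    labels = set(image_probs) | set(audio_probs) | set(text_probs)
    total = sum(weights)
    fused = {
        label: (
            weights[0] * image_probs.get(label, 0.0)
            + weights[1] * audio_probs.get(label, 0.0)
            + weights[2] * text_probs.get(label, 0.0)
        ) / total
        for label in labels
    }
    best = max(sorted(fused), key=fused.get)  # sorted() makes ties deterministic
    return best, fused
```

Raising the weight of one modality lets it dominate the decision, which is one way to reflect that some classifiers are more reliable than others for a given kind of video.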

Multi-feature-fusion Chinese-text classification method based on Attention neural network

A solution of the invention discloses a multi-feature-fusion Chinese-text classification method based on an Attention neural network, and belongs to the field of natural language processing. In order to further improve the accuracy of Chinese-text classification, the method fully exploits features of text data at three different convolution kernel sizes by fusing three CNN paths; interconnections among the text data are captured by fusing an LSTM path; and, in particular, relatively important data features are enabled to play a greater role in the Chinese-text class recognition process by merging in a provided Attention algorithm model, and thus the recognition ability of the model on Chinese text classes is improved. Experimental results show that, compared with a CNN model, an LSTM model and a combined model of the two under the same experimental conditions, the model provided by the invention is significantly improved in Chinese-text classification accuracy, and can be better applied to Chinese-text classification tasks with high requirements on classification accuracy.
Owner:HAINAN NORMAL UNIV

Category based, extensible and interactive system for document retrieval

An integrated, automatic and open information retrieval system comprises a hybrid method based on linguistic and mathematical approaches for automatic text categorization. It solves the problems of conventional systems by combining an automatic content recognition technique with a self-learning hierarchical scheme of indexed categories. In response to a word submitted by a requestor, said system retrieves documents containing that word, analyzes the documents to determine their word-pair patterns, matches the document patterns to database patterns that are related to topics, and thereby assigns topics to each document. If the retrieved documents are assigned to more than one topic, a list of the document topics is presented to the requestor, and the requestor designates the relevant topics. The requestor is then granted access only to documents assigned to relevant topics. A knowledge database linking search terms to documents and documents to topics is established and maintained to speed future searches. Additionally, new strategies are presented to deal with different update frequencies of changed Web sites.
Owner:COGISUM INTERMEDIA

Method and device for text classification

An embodiment of the invention discloses a method and a device for text classification. The method comprises: acquiring an affective characteristic word from an input text; acquiring an affective aptitude degree of the affective characteristic word according to a synonym storehouse constructed in advance; and classifying the text according to the affective aptitude degree of the affective characteristic word. The embodiment of the invention acquires the affective aptitude degree of the affective characteristic words in the text from the synonym storehouse constructed in advance, uses it for text classification, and improves the accuracy of judging the affective aptitude degree of the words.
Owner:HUAWEI TECH CO LTD

Natural language processing-based multi-language analysis method and device

The invention discloses a natural language processing-based multi-language analysis method and device. The method comprises the following steps: determining the language category of input natural language text information through a trained language detection model; obtaining, through a trained word vector model, word embedding representations of the corresponding words that can be recognized by a computer, and extracting keywords from the obtained word embedding representations by means of TF-IDF; calculating an article vector and a category vector for each preset category according to the keywords and keyword weights, and calculating the similarity between an article of the natural language text information and each preset category so as to determine a text classification result of the natural language text information; and inputting the word embedding representations of the natural language text information into a trained text sentiment analysis model that runs a convolutional neural network in parallel with a bidirectional gated recurrent unit, and obtaining a final sentiment tendency value through calculation. The method and device solve the problem that traditional multi-language analysis methods require domain knowledge of the related linguistics and considerable manual effort.
Owner:北京百分点科技集团股份有限公司
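
The classification step in the abstract above (building an article vector and category vectors from weighted keywords, then comparing them by similarity) can be sketched as follows. This is a minimal illustration under assumptions: the keyword weights are taken as already computed (for example by TF-IDF), the vectors are formed as weighted sums of word embeddings, cosine similarity is used as the similarity measure, and all function names and the toy embeddings are hypothetical.

```python
import math


def weighted_vector(keyword_weights, embeddings):
    """Weighted sum of keyword embeddings; keywords without an embedding are skipped."""
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    for word, weight in keyword_weights.items():
        if word in embeddings:
            for i, x in enumerate(embeddings[word]):
                vec[i] += weight * x
    return vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_category(article_keywords, category_keywords, embeddings):
    """Return (best_category, similarities) for one article."""
    article_vec = weighted_vector(article_keywords, embeddings)
    sims = {
        cat: cosine(article_vec, weighted_vector(kws, embeddings))
        for cat, kws in category_keywords.items()
    }
    return max(sorted(sims), key=sims.get), sims
```

Because the comparison happens in embedding space, an article can match a category even when it shares no exact keyword with it, provided their keywords have similar embeddings.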

Text information analysis apparatus and method

Text information analysis apparatus arranges a plurality of texts according to the content of each text. In the text information analysis apparatus, a category decision unit classifies text to one of a plurality of predetermined categories. A cluster generation unit clusters texts having similar contents from the plurality of texts. A control unit controls the category decision unit and the cluster generation unit to simultaneously execute a category decision and clustering for the plurality of texts.
Owner:KK TOSHIBA