323 results about "Vector space model" patented technology

The vector space model, or term vector model, is an algebraic model for representing text documents (and objects in general) as vectors of identifiers such as index terms. It is used in information filtering, information retrieval, indexing and relevance ranking. Its first use was in the SMART Information Retrieval System.
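
As an illustration of the definition above, the following minimal sketch represents tokenized documents as sparse term vectors and compares them by cosine similarity. The TF-IDF weighting (raw term frequency times log(N/df)), the function names and the sample documents are assumptions chosen for the example, not taken from any patent listed here.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Map each tokenized document to a sparse {term: weight} vector."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed weighting: raw term frequency times log(N / df).
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical sample documents, already split into index terms.
docs = [
    "vector space model for text retrieval".split(),
    "text retrieval with term vectors".split(),
    "image color and texture features".split(),
]
vecs = tf_idf_vectors(docs)
```

Documents that share index terms (the first two) get a positive cosine score, while documents with no terms in common score zero.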

Three-folded webpage text content recognition and filtering method based on the Chinese punctuation

A three-stage method for recognizing and filtering web page text content, based on Chinese punctuation. Existing web information filtering methods based on URLs or on keywords suffer from problems such as over-filtering and a low overall filtering rate. To address this, the invention proposes a composite method that combines URL-based filtering, keyword-based filtering, and content filtering of web page text represented with the vector space model. URL filtering based on black and white lists, together with the statistical characteristics of Chinese punctuation, is applied to remove navigation information, related links, advertising links, copyright information and other noise from the page and to extract the body text. The text is then represented with the vector space model; the cosine of the angle between the text's feature vector and the feature vector of an undesirable-information template is calculated and compared with a set threshold to determine the class of the text. The invention can be widely used for filtering undesirable information on networks and for personalized website information services.
Owner:DALIAN UNIV OF TECH
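
The punctuation-based noise removal described in this entry might look roughly like the sketch below. The abstract does not give the exact statistic, so the rule used here (keep a line only if it contains at least `min_marks` Chinese punctuation marks) is an assumption, as are the punctuation set, the function names and the sample page.

```python
# Assumed set of Chinese punctuation marks used as the body-text signal.
CHINESE_PUNCTUATION = set("，。；：！？、")

def punctuation_count(line):
    """Count the Chinese punctuation marks in one line of page text."""
    return sum(1 for ch in line if ch in CHINESE_PUNCTUATION)

def extract_body(lines, min_marks=2):
    """Keep lines with enough punctuation; navigation bars, link lists and
    copyright footers usually contain little or no sentence punctuation."""
    return [line for line in lines if punctuation_count(line) >= min_marks]

# Hypothetical page: a navigation bar, one body paragraph, a footer.
page = [
    "首页 新闻 体育 财经",
    "今天，研究人员发布了一种新的过滤方法。该方法基于标点统计，效果良好。",
    "版权所有 联系我们",
]
body = extract_body(page)
```

The surviving body text could then be vectorized and compared against a template vector by cosine similarity, as the abstract describes.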

Method for mining data in construction regulation field based on associative regulation mining technology

The invention discloses a method for mining data in the construction regulation field based on associative regulation mining technology. The method comprises four steps: (1) a vector space model of the construction regulation text is generated; (2) a vector space model of the construction regulation data is generated; (3) the construction regulation data vector space model is transposed to generate a construction regulation data feature vector space model, that is, a frequent feature set; and (4) the degree of association among the construction regulation data is calculated and an association rule is output. The method can mine data in the construction regulation field, provides a higher recall ratio for users querying the data, recommends associated query contents, and solves the technical problem that existing association analysis technologies cannot carry out association analysis on outlier data.
Owner:XI'AN UNIVERSITY OF ARCHITECTURE AND TECHNOLOGY

Network hot event detection method based on text classification and clustering analysis

The invention discloses a network hot event detection method based on text classification and clustering analysis. The method solves the problem that the efficiency and accuracy rate of the existing network hot event detection method based on clustering analysis need to be improved. The method comprises the steps that feature words are respectively selected for various classes of files through feature extraction and feature selection by utilizing a training corpus; each training text and test text are represented as vectors in all of the feature spaces by utilizing a vector space model method, and the weight of each dimension of the vectors is determined by utilizing a TF-IDF (term frequency-inverse document frequency) method, and then each test text is classified; the classified test texts in different classes are respectively subjected to clustering analysis, so the hot cluster of each class is obtained, the feature word representing the hot event is obtained through further analysis, and then the word property and other aspects of each feature word are analyzed; the description of each hot event is generated by utilizing relevant language knowledge and necessary linguistic organization. With the network hot event detection method based on text classification and clustering analysis, the detection efficiency and accuracy rate of hot events can be effectively improved.
Owner:NANJING UNIV OF POSTS & TELECOMM

Multi-model fused short text classification method

The invention discloses a multi-model fused short text classification method. The multi-model fused short text classification method comprises a learning method and a classification method. The learning method comprises the following steps: carrying out word segmentation and filtration on short text training data to obtain a word set; calculating the IDF value of each word; calculating the TFIDF values of all the words and constructing a text vector VSM; and carrying out text learning on the basis of a vector space model, and constructing an ontology tree model, a keyword overlapping model, a naive Bayesian model and a support vector machine model. The classification method comprises the following steps: carrying out word segmentation and filtration on a to-be-classified short text; generating a text vector on the basis of the support vector machine model; respectively classifying by using the ontology tree model, the keyword overlapping model, the naive Bayesian model and the support vector machine model to obtain single model classification results; and fusing the single model classification results to obtain a final classification result. According to the method disclosed in the invention, multiple classification modes are fused and the short text classification correctness is improved.
Owner:XI AN JIAOTONG UNIV
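
The abstract above does not say how the single-model results are fused. One simple possibility, shown purely as an assumption, is a (optionally weighted) vote over the labels returned by the four models; the function name `fuse`, the model keys and the weights are hypothetical.

```python
from collections import Counter

def fuse(predictions, weights=None):
    """Fuse per-model labels by weighted voting.

    predictions: {model_name: predicted_label}
    weights:     optional {model_name: vote_weight}; missing models count as 1.0
    Ties are broken by choosing the alphabetically first label.
    """
    weights = weights or {}
    tally = Counter()
    for model, label in predictions.items():
        tally[label] += weights.get(model, 1.0)
    best = max(tally.values())
    return min(label for label, score in tally.items() if score == best)

# Hypothetical outputs of the four single models for one short text.
single_results = {
    "ontology_tree": "sports",
    "keyword_overlap": "sports",
    "naive_bayes": "finance",
    "svm": "sports",
}
final_label = fuse(single_results)
```

Giving one model a larger weight, for example `fuse(single_results, {"naive_bayes": 5.0})`, lets it override the majority.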

Quick multi-keyword semantic sorting search method for protecting data privacy in cloud computing

The invention relates to a quick multi-keyword semantic sorting search method for protecting data privacy in cloud computing. A domain weighted scoring concept is introduced in document scoring, and keywords in different domains such as a title, an abstract and the like are endowed with different weights to be distinguished; a retrieval keyword is subjected to semantic expansion, semantic similarity is calculated, a three-factor sorting method is designed by combining the semantic similarity, the domain weighted scoring and a correlation score, and a cloud server can perform accurate sorting on search results and return a sorting result to a search user; and for the defect of low query efficiency of a searchable encryption scheme, a vector block segmentation marking matching algorithm is designed, and a document vector created by a vector space model is subjected to block segmentation to generate a marking vector with a relatively small dimension number. According to the method, the query efficiency can be improved, the index creation time can be shortened, and semantic ciphertext keyword search is realized.
Owner:FUZHOU UNIV
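
The block segmentation idea in this entry can be sketched in plaintext, leaving the encryption out entirely. The assumption here is that each block of the document vector is marked 1 if it contains any non-zero weight, and that a document is a candidate for a query only if the two marking vectors share a marked block; the block size, function names and sample vectors are hypothetical.

```python
def marking_vector(vector, block_size):
    """Split a dense vector into fixed-size blocks; mark a block 1 if any
    of its entries is non-zero, otherwise 0."""
    return [
        1 if any(x != 0 for x in vector[i:i + block_size]) else 0
        for i in range(0, len(vector), block_size)
    ]

def may_match(doc_marks, query_marks):
    """Cheap pre-filter: the document can only score above zero if it shares
    at least one marked block with the query."""
    return any(d and q for d, q in zip(doc_marks, query_marks))

# Hypothetical 12-dimensional document and query vectors, 4 entries per block.
doc = [0, 0, 0.5, 0,  0, 0, 0, 0,  0.2, 0, 0, 0.1]
query_hit = [0, 0, 1, 0,  0, 0, 0, 0,  0, 0, 0, 0]
query_miss = [0, 0, 0, 0,  0, 1, 0, 0,  0, 0, 0, 0]

doc_marks = marking_vector(doc, 4)
```

Comparing the short marking vectors first means the full-length vectors only need to be scored for documents that pass the pre-filter.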

Text feature extraction method based on categorical distribution probability

The invention discloses a text feature extraction method based on categorical distribution probability. The text feature extraction method based on the categorical distribution probability extracts text feature words by means of the manner according to which categorical distribution difference estimation is carried out on words of a text to be categorized. Mean square error values of probability distribution of each word at different categories are worked out by means of category word frequency probability of the words. A certain number of words with high mean square error values are extracted to form a final feature set. The obtained feature set is used as feature words of a text categorizing task to build a vector space model in practical application. A designated categorizer is used for training and obtaining a final category model to categorize the text to be categorized. According to the text feature extraction method based on the categorical distribution probability, category distribution of the words is accurately measured in a probability statistics manner. Category values of the words are estimated in a mean square error manner so as to accurately select features of the text. As far as the text categorizing task is concerned, a text categorizing effect of balanced linguistic data and non-balanced linguistic data is obviously improved.
Owner:EAST CHINA NORMAL UNIV
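
A rough sketch of the selection step in this entry follows. It assumes the per-category probability of a word is its count in the category divided by the category's total word count, and that the "mean square error" is the mean squared deviation of those probabilities from their average across categories; the function name and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict

def select_features(labeled_docs, k):
    """Return the k words whose per-category probabilities vary the most."""
    class_counts = defaultdict(Counter)
    for label, tokens in labeled_docs:
        class_counts[label].update(tokens)
    labels = sorted(class_counts)
    totals = {c: sum(class_counts[c].values()) for c in labels}
    vocab = set().union(*class_counts.values())
    scores = {}
    for word in vocab:
        # Assumed estimate of P(word | category): relative word frequency.
        probs = [class_counts[c][word] / totals[c] for c in labels]
        mean = sum(probs) / len(probs)
        scores[word] = sum((p - mean) ** 2 for p in probs) / len(probs)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(vocab, key=lambda w: (-scores[w], w))[:k]

# Hypothetical labeled training texts, already tokenized.
labeled_docs = [
    ("sports", ["match", "goal", "team", "the"]),
    ("sports", ["goal", "team", "the"]),
    ("finance", ["stock", "market", "the"]),
    ("finance", ["stock", "bank", "the"]),
]
features = select_features(labeled_docs, 3)
```

Words spread evenly across categories, such as "the" in the toy corpus, receive a score near zero and are dropped, while category-specific words rank highest.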

LDA (latent dirichlet allocation) and VSM (vector space model) based similar Chinese herb literature recommendation method

Active | CN103823848A | Fast and efficient similar recommendation | Robust | Special data processing applications | Lexical item | Vector space model
The invention discloses an LDA (latent dirichlet allocation) and VSM (vector space model) based similar Chinese herb literature recommendation method. The method includes: adopting an IKAnalyzer to perform word segmentation on topics and summary information of literature on the basis of a terminological dictionary for Chinese herbs, constructing a vector space, performing dimensionality reduction on the vector space, constructing a semantic dictionary, numbering all lexical items in the dictionary in sequence, performing vectorization through each document on the basis of the semantic dictionary, constructing term vectors of each document, utilizing LDA and a Gibbs sampling algorithm to perform training to obtain probability distribution of each document on themes, then computing a value of similarity between every two documents by the aid of KL divergence, computing cosine similarity of the term vectors of each document on the basis of term frequency, performing joint weighting on the two kinds of similarities prior to performing similarity sorting, and then making recommendation. By the method, the literature, similar both in content and theme, in the Chinese herb literature can be recommended to users, and recommendation results are closer to user requirements.
Owner:ZHEJIANG UNIV
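
The joint weighting of the two similarities in this entry could be sketched as follows. The abstract does not say how KL divergence is turned into a similarity or how the two scores are weighted, so the symmetrized KL, the 1/(1+d) mapping and the mixing weight `alpha` are all assumptions; the topic distributions are assumed strictly positive, as smoothed LDA output normally is.

```python
import math

def kl(p, q):
    """KL divergence D(p || q) for discrete distributions with q > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def topic_similarity(p, q):
    """Assumed mapping: symmetrized KL divergence d converted to 1 / (1 + d)."""
    sym = 0.5 * (kl(p, q) + kl(q, p))
    return 1.0 / (1.0 + sym)

def cosine(u, v):
    """Cosine similarity of two dense term-frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def combined_similarity(topics_a, topics_b, terms_a, terms_b, alpha=0.5):
    """Weighted sum of topic-level and term-level similarity (alpha is assumed)."""
    return (alpha * topic_similarity(topics_a, topics_b)
            + (1 - alpha) * cosine(terms_a, terms_b))

# Hypothetical documents: (topic distribution, term-frequency vector).
doc_a = ([0.7, 0.2, 0.1], [2, 1, 0, 0])
doc_b = ([0.6, 0.3, 0.1], [1, 1, 0, 1])
doc_c = ([0.1, 0.2, 0.7], [0, 0, 3, 1])

score_ab = combined_similarity(doc_a[0], doc_b[0], doc_a[1], doc_b[1])
score_ac = combined_similarity(doc_a[0], doc_c[0], doc_a[1], doc_c[1])
```

Sorting candidate documents by this combined score in descending order gives the recommendation order.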

Topic information acquisition method based on network topology

The invention relates to a topic information acquisition method based on network topology. An initial web page set is obtained from a search engine and, after purification, word segmentation and stop word removal, is expressed as a set of vectors; a vector space model is used to calculate text similarity. The network structure is first used to perform link analysis on the extracted URLs: the links are filtered by the directory hierarchy of the URLs, and the URL weights are then modified according to the scale-free property of the network to perform the prior absorption selection. At the same time, unrelated topic areas are fed back, and the lengths of the buffer areas for unrelated URLs are set according to the distance between the URLs and the seed set. The heat of the acquired topics is calculated to select one topic to obtain a new reply.
Owner:BEIJING JIAOTONG UNIV

Natural language intention understanding method in man-machine interaction

The invention discloses a natural language intention understanding method in man-machine interaction. The method comprises the following steps: intention labeling is conducted on text natural language instruction data, so that each sentence of text is labeled with an intention; the text is vectorized and, on the basis of the traditional text vector space model, part-of-speech information of the text instruction is fused in, defining a new text representation model, namely a part-of-speech vector space model; a stacked denoising autoencoder is applied to natural language instruction intention understanding to extract high-order features of the instruction; finally, training and prediction are conducted with a support vector machine to achieve intention understanding of the natural language instruction. With this method, more semantic information can be mined from the natural language instruction and the recognition rate of intention understanding is increased. Because a stacked denoising autoencoder is adopted and random noise is added during training, the method better approximates real application scenarios, and the trained model has higher generalization capacity.
Owner:SHANGHAI JIAO TONG UNIV

Information recommendation method and system combining image content and keywords

The invention discloses an information recommendation method and system combining image content and keywords. The method comprises the following steps: keyword information of the images in an image library, and image content information including color features and texture features, are extracted; the keyword information and the image content information are expressed as a vector space model, and a corresponding keyword information matrix and image content information matrix are obtained; the two matrices are processed with a linear sparse model, and a similarity graph is obtained by calculating the similarity among the images; images similar to the target image searched for by a user are retrieved from the similarity graph to form an original recommendation list; the original recommendation list is arranged to obtain a final recommendation list, and the final recommendation list is displayed.
Owner:TCL CORPORATION

Job recommending method

The invention discloses a job recommending method, and belongs to the technical field of recommending systems. The job recommending method has the advantages that the Matthew effect is avoided, the cold start problem is solved, and the populations are well utilized to realize personalized recommending. The job recommending method comprises the following steps: obtaining user data and job data; establishing a user preference vector space model and a job vector space model; according to the user preference vector space model and the job vector space model, calculating content-based multi-domain scoring values, obtaining first scoring values of the jobs, and sorting them to obtain a job set; when a job is submitted and belongs to the job set, calculating the scoring value of that job based on the similarity of user background information, according to the user preference vector space model and the job data, to obtain a second scoring value of the job; and, according to the first scoring values and the second scoring values, obtaining the mixed scoring values of the job and sorting them to obtain a recommending list.
Owner:COMMUNICATION UNIVERSITY OF CHINA