981 results about "Chinese word" patented technology

Deep learning-based blog text abstract generation method

Active | CN106980683A | Intuitive and effective understanding | Skip the cumbersome process | Special data processing applications | Neural learning methods | Encoder decoder | Deep level
The invention discloses a deep learning-based blog text abstract generation method. The method comprises the following steps of: crawling blog data; preprocessing the crawled blog data and selecting blog text data; converting the selected blog text data into vector matrix data according to a Chinese word vector dictionary; constructing a deep learning encoder-decoder model, separately training the encoder and the decoder of the model, and connecting the encoder and the decoder for use after training is completed; and repeating the steps S01 to S03 to obtain generated data, and generating a predicted abstract from the generated data through the trained model. The method automatically generates text abstracts of blogs on the basis of a deep learning encoder-decoder framework while capturing deeper semantic relations within the blogs. The generated abstracts present the main content of the current blog at a glance, so the method has broad application prospects.
Owner:SUZHOU INST FOR ADVANCED STUDY USTC

Chinese text classification method based on super-deep convolution neural network structure model

The invention provides a Chinese text classification method based on a super-deep convolution neural network structure model. The method comprises the steps of collecting a training corpus of a word vector from the internet, combining a Chinese word segmentation algorithm to conduct word segmentation on the training corpus, and obtaining a word vector model; collecting news of multiple Chinese news websites from the internet, and marking the category of the news as a corpus set for text classification, wherein the corpus set is divided into a training set corpus and a test set corpus; conducting word segmentation on the training set corpus and the test set corpus respectively, and then obtaining the word vectors corresponding to the training set corpus and the test set corpus respectively by utilizing the word vector model; establishing the super-deep convolution neural network structure model; inputting the word vector corresponding to the training set corpus into the super-deep convolution neural network structure model, and conducting training and obtaining a text classification model; inputting the Chinese text which needs to be sorted into the word vector model, obtaining the word vector of the Chinese text which needs to be classified, and then inputting the word vector into the text classification model to complete the Chinese text classification.
Owner:HEBEI UNIV OF TECH

Text similarity computing method

The invention discloses a text similarity computing method. The method comprises two steps: text representation and text similarity computation. Text representation converts the text document of a product description into a vector. Natural language processing techniques such as Chinese word segmentation, stop-word removal and word-frequency statistics are used to convert all product description texts into vectors; text similarity is then computed with a Hamming-distance-based method, which has the further advantage of being very fast to compute. Because the method uses statistical machine learning, it is more stable and effective than rule-based methods.
Owner:DALIAN LINGDONG TECH DEV
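
The vector-plus-Hamming-distance idea in this abstract can be illustrated with a minimal sketch. The abstract does not say how word frequencies are turned into vector components, so the binary presence vectors, the sample tokens and the function names below are assumptions made for illustration only.

```python
def binary_vector(tokens, vocabulary):
    """Mark which vocabulary terms occur in a segmented, stop-word-filtered text."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocabulary]


def hamming_distance(a, b):
    """Count the positions at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical product descriptions, already segmented into words.
doc_a = ["红色", "棉", "衬衫"]
doc_b = ["蓝色", "棉", "衬衫"]
vocabulary = sorted(set(doc_a) | set(doc_b))

vec_a = binary_vector(doc_a, vocabulary)
vec_b = binary_vector(doc_b, vocabulary)
distance = hamming_distance(vec_a, vec_b)

# Assumed normalisation: share of vocabulary positions on which the texts agree.
similarity = 1 - distance / len(vocabulary)
```

Because the distance is a simple count of differing positions, it can be computed in a single pass, which is consistent with the speed advantage the abstract claims.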

Text-based query expansion and sort method in image retrieval

Inactive | CN101901249A | Guaranteed a high degree of commonality | Improve accuracy | Special data processing applications | Data set | Image retrieval
The invention belongs to the field of multimedia information retrieval and relates to a method for realizing thesaurus-based query expansion and sort in image retrieval. The method comprises a WordNet-based English word semantic similarity metric algorithm, a HowNet-based Chinese word semantic similarity metric algorithm, an expansion rule-based query expansion word selection and optimization algorithm and a retrieval result evaluation and optimization algorithm. In the method, an image search engine is improved by the relevant text processing method and the relevant semantic network dictionary; and the retrieval result is sorted through semantic expansion, user interaction and improved similarity measurement. Compared with the traditional method, the method has the advantages of high accuracy rate, high integrality and low space-time cost. The method has very important significance for performing high-efficiency image retrieval according to image high-layer semantic information and on the basis of a large-scale image data set, and has wide application value in the field of cross-linguistic and cross-media retrieval.
Owner:FUDAN UNIV

Dependency semantic-based Chinese unsupervised open entity relationship extraction method

The invention relates to a dependency semantic-based Chinese unsupervised open entity relationship extraction method. The method comprises the following steps of preprocessing an input text: performing Chinese word segmentation, part-of-speech tagging and dependency grammar analysis on the input text; performing named entity identification on the input text; arbitrarily selecting two entities from identified entities to form candidate entity pairs; searching for a dependency path between two entities in the candidate entity pairs; and analyzing whether a syntactic structure mapped by the dependency path is matched with a normal form of a dependency semantic normal form set or not, if yes, extracting words or phrases from the residual part of the input text according to the matched normal form to serve as relational words, forming a relational triple by the extracted relational words and the candidate entity pairs, and if not, performing normal form matching of a next group of the candidate entity pairs; and outputting the relational triple. Compared with the prior art, the method has the advantages that the calculation complexity is low; the extraction efficiency is high; distance position limitation is overcome; a simple sentence also can be extracted and the like.
Owner:TONGJI UNIV

Industry comment data fine grain sentiment analysis method

The invention relates to an industry comment data fine grain sentiment analysis method. The industry comment data fine grain sentiment analysis method is applied to Internet data analysis and comprises obtaining comment data of e-commerce industry goods and preprocessing the comment data; establishing initial industry sentiment word libraries and computing distribution of words under different sentiment polarities through 1-gram and 2-gram; performing Chinese word segmentation on the comment data; based on the sentiment word libraries established through the 1-gram and the 2-gram, utilizing combined sentiment models to perform word modeling to obtain the probability distribution of the words which belong to different topics under different sentiment distributions; utilizing context information to re-determine the sentiment alignment of sentiment words in sentences; performing named entity identification and extracting comment characteristics through conditional random fields to compute the sentiment alignment of comment words of the comment characteristics. The industry comment data fine grain sentiment analysis method computes the sentiment of the comment words through the two dimensions of topic and sentiment to achieve fine grain sentiment analysis on the industry comment data, thereby achieving high precision and interpretability of analysis results.
Owner:中科嘉速(北京)信息技术有限公司

Chinese network review emotion classification method based on integrated study frame

The invention discloses a Chinese network review emotion classification method based on an integrated study frame. According to the method, a part-of-speech combination mode, an order-preserving sub-matrix mode and a frequent word sequence mode are adopted as input characteristics, in the level of characteristics, factors of the influence of Chinese word order information, interval phrase characteristics and the sentence length are considered, and the characteristic vector sparsity problem is solved through semantic similarities; the problem that many review text characteristics exist is solved, the inter-base-classifier independence is guaranteed, and the classification performance of base classifiers is improved as much as possible; a base classifier algorithm constructed based on product attributes is adopted to comprehensively review emotion information of each attribute in a text, and then the sentence-level emotional tendency of reviews is judged, so that a final classification result is more accurate. The Chinese network review emotion classification method based on the integrated study frame is applicable to e-commerce network review emotion classification in various fields, can make a potential consumer know evaluation information of a commodity before purchase and can also make a merchant better sufficiently know the consumer's opinion, and therefore the service quality is improved.
Owner:NANJING SILICON INTELLIGENCE TECH CO LTD

Method and system for extracting Chinese event

The invention provides a method and a system for extracting a Chinese event. The method comprises the following steps of: performing phrasing, word-splitting, entity identification and analysis for syntax and dependence relationship on a text with a to-be-extracted event in turn; marking the words meeting an extracting condition as candidate triggering words, according to internal structures of the words; filtering the triggering words meeting a filtering condition according to the probability, the word class and the internal structures of the words; extracting the triggering words by utilizing the maximum entropy identifying model and obtaining the reliability of each of the triggering words; dividing the triggering words into a consistency processing training set and a consistency processing testing set according to the reliability of each of the triggering words; utilizing a maximum entropy classifier to extract the triggering words from the consistency processing testing set; and utilizing a maximum entropy classifying model to classify the triggering words, thereby obtaining an event set. According to the method and the system provided by the invention, started from the characteristics of Chinese, the internal structures of Chinese words and the semantic consistency of the Chinese words in sections and chapters are comprehensively considered and analyzed, so that the property of extracting the Chinese event is increased.
Owner:平江县鑫晟信息科技有限公司

Short text clustering and hotspot theme extraction method based on TF-IDF characteristics

The invention discloses a short text clustering and hotspot theme extraction method based on TF-IDF characteristics. The method includes the following steps: firstly, conducting Chinese word segmentation on short text samples and screening out high-frequency vocabulary; secondly, automatically extracting and generating TF-IDF characteristics for each short text sample on the basis of the screened-out high-frequency vocabulary, and establishing a characteristic vector space model for the whole sample set; thirdly, reducing the dimensionality of the sample space through singular value decomposition (SVD); fourthly, clustering the short text samples by combining cosine similarity with the k-means method, and finding potential hotspot themes in each cluster by means of visual analysis. The method addresses the characteristic selection problem, the sample space dimension reduction problem and the clustering problem of short texts; the clustering result can also be analyzed visually, and hotspot themes are finally extracted and analyzed.
Owner:TIANJIN UNIV
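
As a rough illustration of the TF-IDF and cosine steps in this abstract, the sketch below builds TF-IDF vectors for already segmented short texts and compares two of them. The weighting formula (term frequency times log of inverse document frequency), the function names and the sample data are assumptions; the SVD and k-means steps are left out.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return the vocabulary and one TF-IDF vector per non-empty segmented document."""
    vocabulary = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([
            (counts[term] / len(doc)) * math.log(n_docs / doc_freq[term])
            for term in vocabulary
        ])
    return vocabulary, vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical segmented short texts.
docs = [["地铁", "延误"], ["地铁", "延误"], ["电影", "票价"]]
vocabulary, vectors = tfidf_vectors(docs)
same_topic = cosine_similarity(vectors[0], vectors[1])
different_topic = cosine_similarity(vectors[0], vectors[2])
```

In a fuller pipeline the vectors would first be reduced with SVD, and the cosine measure would serve as the similarity inside k-means.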

Construction and utilization method for context-aware dynamic word or character vector on the basis of deep learning

The invention belongs to the technical field of the natural language processing of computers, in particular to a construction and utilization method for a context-aware dynamic word or character vector on the basis of deep learning. The dynamic construction method for the context-aware dynamic word or character vector on the basis of the deep learning comprises the following steps of: in massive texts, through an unsupervised learning method, simultaneously learning a global feature vector of a word or character and the feature vector representation of the global feature vector when a specific context appears, and combining the global feature vector with the context feature vector, and dynamically generating word or character vector representation. By use of the method, the word or character vector dynamically constructed on the basis of the context can be applied to a natural language processing system. The method is mainly used for solving a problem that the word or character vector expresses different meanings in different contexts, i.e. the problem that one word or one character has multiple meanings can be solved. The dynamic word or character vector can be used for obviously improving the performance of various natural language processing tasks of different languages, wherein the tasks comprise Chinese word segmentation, part-of-speech tagging, naming recognition, grammatical analysis, semantic role tagging, sentiment analysis, text classification, machine translation and the like.
Owner:FUDAN UNIV

Chinese word segmentation method based on two-way LSTM, CNN and CRF

The invention discloses a Chinese word segmentation method based on two-way LSTM, CNN and CRF, which improves and optimizes traditional Chinese word segmentation on the basis of deep learning algorithms. The method comprises the following specific steps: preprocessing the initial corpus and extracting the character feature information of the corpus and the pinyin feature information corresponding to the characters; using a convolutional neural network to obtain the pinyin feature vectors of the characters; using the word2vec model to obtain the character feature vectors of the text; splicing the pinyin feature vectors and the character feature vectors to obtain context information vectors and feeding them into a bidirectional LSTM neural network; decoding the output of the bidirectional LSTM with a linear-chain conditional random field to obtain the word segmentation label sequence; and decoding the word segmentation label sequence to obtain the word segmentation results. The invention uses a deep neural network to extract text character features and pinyin features, combines them with conditional random field decoding, can effectively extract Chinese text features, and achieves good results on Chinese word segmentation tasks.
Owner:NANJING UNIV OF POSTS & TELECOMM
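
The last step of this abstract, turning a label sequence into words, can be sketched as follows. The abstract does not name its tag set; the B/M/E/S scheme (begin, middle, end, single) used here is a common convention and is an assumption, as is the function name.

```python
def decode_bmes(chars, tags):
    """Group characters into words from B/M/E/S tags (begin, middle, end, single)."""
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word closes on an end or single tag
            words.append(current)
            current = ""
    if current:  # tolerate a sequence that stops without an E or S tag
        words.append(current)
    return words


# Hypothetical tagger output for a four-character sentence.
segmented = decode_bmes("我爱北京", ["S", "S", "B", "E"])
```

The neural network and CRF layers that would produce the tags are outside the scope of this sketch.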

Naive Bayesian classification based mobile phone spam short message filtering method and system

The invention provides a Naive Bayesian classification based mobile phone spam short message filtering method and system. The system comprises a message intercepting module, a cache, a blacklist filtering module, a keyword filtering module and an intelligent Naive Bayesian classification filtering module. The message intercepting module is used for intercepting newly received short messages; the blacklist filtering module is used for filtering the new short messages according to a preset blacklist; the keyword filtering module is used for filtering the new short messages on the basis of preset keyword pairs; the intelligent Naive Bayesian classification filtering module is used for calculating probability that whether the new short messages are spam short messages or not by adopting a Naive Bayesian algorithm on the basis of a pre-trained feature word bank, and judging the new short messages as the spam short messages if the probability ratio exceeds a preset threshold, and as normal short messages otherwise. By the Naive Bayesian classification based mobile phone spam short message filtering method and system, through combination of the blacklist, the keywords, Naive Bayesian classification technology and Chinese word segmentation technology, the short messages are judged whether to be the spam short messages or not intelligently, so that the spam short messages are filtered.
Owner:青岛腾信汽车网络科技服务有限公司
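
The final filtering stage described here, a Naive Bayes probability ratio compared against a threshold, might look like the sketch below. The class name, the add-one smoothing, the log-ratio form and the training messages are all assumptions; the blacklist and keyword stages are omitted, and messages are assumed to be segmented already.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes over segmented messages, with add-one smoothing."""

    def __init__(self, spam_docs, ham_docs):
        self.spam_counts = Counter(w for doc in spam_docs for w in doc)
        self.ham_counts = Counter(w for doc in ham_docs for w in doc)
        self.vocab = set(self.spam_counts) | set(self.ham_counts)
        self.spam_total = sum(self.spam_counts.values())
        self.ham_total = sum(self.ham_counts.values())
        self.log_prior_ratio = math.log(len(spam_docs) / len(ham_docs))

    def log_ratio(self, words):
        """Log of P(spam | words) / P(normal | words); unseen words are ignored."""
        size = len(self.vocab)
        score = self.log_prior_ratio
        for w in words:
            if w not in self.vocab:
                continue
            p_spam = (self.spam_counts[w] + 1) / (self.spam_total + size)
            p_ham = (self.ham_counts[w] + 1) / (self.ham_total + size)
            score += math.log(p_spam / p_ham)
        return score

    def is_spam(self, words, threshold=0.0):
        """Flag the message when the log ratio exceeds the preset threshold."""
        return self.log_ratio(words) > threshold


# Hypothetical training messages, already segmented.
spam = [["免费", "中奖", "点击"], ["中奖", "领取"]]
ham = [["明天", "开会"], ["晚上", "吃饭"]]
spam_filter = NaiveBayesSpamFilter(spam, ham)
```

Working in log space avoids numeric underflow when many word probabilities are multiplied together.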

Chinese word segmentation based text similarity identifying method and device

An embodiment of the invention discloses a Chinese word segmentation based text similarity identifying method. The method is characterized by including: filtering unidentified and meaningless characters in texts in given coded format and obtaining preprocessed texts; segmenting words of the preprocessed texts according to a preset word segmenting mode; selecting characteristic words in words obtained from word segmentation according to preset policy; sequencing the selected characteristics words to obtain a special character string, and calculating characteristic values of the texts according to the special character string; and determining similarity of the texts by comparing the characteristic values of the texts. The embodiment of the invention further discloses a Chinese word segmentation based text similarity identifying device. By the Chinese word segmentation based text similarity identifying method and device, identifying complexity can be reduced, identifying efficiency can be improved, and higher identifying correct rate can be achieved.
Owner:SHENZHEN TENCENT COMP SYST CO LTD
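
A minimal sketch of the fingerprint idea in this abstract: pick characteristic words, sort them into a string, and derive a value that can be compared across texts. The selection policy (most frequent words), the MD5 digest and the function name are assumptions, since the abstract only refers to a "preset policy" and a "characteristic value".

```python
import hashlib
from collections import Counter


def text_fingerprint(words, top_k=3):
    """Hash the sorted top_k most frequent words of a segmented text."""
    counts = Counter(words)
    # Assumed policy: highest frequency first, ties broken by the word itself.
    selected = sorted(counts, key=lambda w: (-counts[w], w))[:top_k]
    feature_string = "|".join(sorted(selected))
    return hashlib.md5(feature_string.encode("utf-8")).hexdigest()


# Hypothetical segmented texts: the second reorders the words of the first.
text_a = ["手机", "优惠", "手机", "促销", "优惠"]
text_b = ["优惠", "手机", "促销", "手机", "优惠"]
text_c = ["天气", "预报", "晴"]

fp_a = text_fingerprint(text_a)
fp_b = text_fingerprint(text_b)
fp_c = text_fingerprint(text_c)
```

Sorting the selected words before hashing makes the value independent of word order, so reordered copies of a text compare as equal.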

Self-adaptive Chinese word segmentation method based on embedded representation

The embodiment of the invention discloses a self-adaptive Chinese word segmentation method based on embedded representation and belongs to the field of information processing. The method is characterized in that an embedded representation layer of a character is shared by a word segmentation network and a character language model. As for embedded representation of the character, on the one hand, hidden multi-granularity local features of a to-be-segmented text is obtained by means of the word segmentation network based on convolutional neural network; then label probability of the character is obtained through a forward network layer; finally, label inference is used to obtain the optimum segmentation result in the sentence level; on the other hand, an unlabelled text is randomly extracted, a character next to the character is predicted by means of a character language model based on a long- and short-term memory unit (LSTM) recurrent neural network and the word segmentation network is constrained. By modeling a character co-representing relationship in texts in different fields by means of the character language model and transferring information to the word segmentation network by means of embedded representation, the field transfer ability of word segmentation is enhanced, and the method has very huge practical value.
Owner:BEIJING UNIV OF POSTS & TELECOMM

Method for matching Chinese similarity

The invention provides a method for matching Chinese similarity. An edit distance formula and a keyboard fingering rule are used to obtain the edition similarity of the corresponding pinyin of Chinese, namely, whether the Chinese and the pinyin are easily mixed up during edition is reflected; the pronunciation rules of the initial consonant and the final sound of Chinese characters are used for obtaining the initial consonant similarity and the final sound similarity of character strings; and common fuzzy tones in dialects or common pronunciation are combined to calculate the pronunciation similarity among character strings. Because the Chinese character pattern is one of the most important characteristics of Chinese, character pattern coding namely the Five-stroke Method coding is used for calculating the character pattern similarity among character strings; information is collected and calculated at the same time for updating data; and the above similarities are combined to obtain the whole similarity of Chinese word, various factors, such as Chinese spelling custom, user input custom, keyboard layout, mandarin pronunciation rules, dialects, common wrong pronunciation, Chinese character patterns and the like are fully considered, the statistical regularity is combined, and the similarity among Chinese words is comprehensively evaluated.
Owner:TSINGHUA UNIV
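
The edit-distance component of this abstract can be sketched with the standard Levenshtein distance over pinyin strings. This version uses unit costs; the keyboard fingering rule mentioned in the abstract would presumably adjust substitution costs and is not modelled. The function names and the normalisation are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit insert, delete and substitute costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    """Assumed normalisation: 1 minus distance divided by the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


# Pinyin of two syllables that are often confused when typed or pronounced.
d = edit_distance("zhong", "zong")
s = edit_similarity("zhong", "zong")
```

The pronunciation and character pattern similarities in the abstract would be computed separately and then combined with this score.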

Self-service intelligent vertical search method

Inactive | CN101114294A | In line with "taste" | Modify the search process | Special data processing applications | Habit | Network data
The present invention relates to a self-service intelligent vertical search method comprising the following steps: users' cookie files, registration information, search history and subscribed attention modules are used to learn user preferences, and these preferences are set as statistical user models that are dynamically stored in the search engine's database in real time. A final keyword collection file is obtained through high-speed Chinese word segmentation and by learning users' search habits. The search engine searches a network database for all information relevant to those key sentences or words. The search results are then matched against the statistical user models, and the results that fit the users' preferences are returned. The beneficial effects of the invention are that users can find the information they need in a huge information collection, and that the method actively learns users' preferences and habits, so the search results better cater to users' "tastes" while users judge the value of the search results entirely by themselves.
Owner:HANGZHOU JOINVC HLDG

Dictionary-based lucene Chinese word segmentation method

The invention discloses a dictionary-based Chinese word segmentation method. The method comprises the steps of collecting linguistic data; establishing a terminological dictionary, wherein the establishing method comprises the steps of removing stop words firstly, dividing the linguistic data into text fragments, exacting candidate words from the text fragments, obtaining the appearance probability of the candidate words and each individual character in all the text fragments through statistics, calculating the mutual information of two Chinese characters in each candidate word, keeping the candidate words if mutual information is larger than a preset mutual information threshold value, deleting the candidate words otherwise, combining the candidate words obtained after screening, matching and filtering the combined candidate words by means of a general dictionary, and adding the candidate words obtained after filtration into the terminological dictionary; conducting word segmentation on a text with words to be segmented by means of the terminological dictionary firstly, and then conducting word segmentation on the rest of texts by means of the general dictionary. The terminological dictionary is established by extracting terminologies from the linguistic data through statistics, universality is high, and requirements of the professional field can be effectively met by conducting word segmentation with the terminological dictionary.
Owner:成都天府云数信息技术有限公司
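
The mutual-information filter in this abstract can be illustrated with pointwise mutual information over two-character candidates. The exact formula the patent uses is not given; the base-2 logarithm, the function names and the probabilities below are assumptions for illustration.

```python
import math


def pointwise_mutual_information(p_word, p_first, p_second):
    """log2 of how much more often two characters co-occur than chance predicts."""
    return math.log2(p_word / (p_first * p_second))


def filter_candidates(word_probs, char_probs, threshold):
    """Keep two-character candidates whose mutual information exceeds the threshold."""
    kept = []
    for word, p_word in word_probs.items():
        if len(word) != 2:
            continue  # this sketch only scores two-character candidates
        pmi = pointwise_mutual_information(
            p_word, char_probs[word[0]], char_probs[word[1]])
        if pmi > threshold:
            kept.append(word)
    return kept


# Hypothetical probabilities estimated from text fragments.
char_probs = {"云": 0.02, "数": 0.02, "的": 0.1}
word_probs = {"云数": 0.01, "的数": 0.001}
terms = filter_candidates(word_probs, char_probs, threshold=3.0)
```

A high score means the two characters appear together far more often than their individual frequencies would suggest, which is the signal used to keep a candidate as a term.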

Chinese word stock automatic generation method based on writing style modeling

The invention discloses a Chinese word stock automatic generation method based on writing style modeling. The method comprises the steps of: automatically extracting stroke tracks from the input handwritten Chinese character images and screening out wrong results to obtain training data; learning and modeling the stroke shape style and the inter-stroke structure style of the handwritten Chinese characters through a neural network, obtaining handwriting stroke connection characteristics and stroke end contour features through statistical analysis, and then generating high-quality Chinese character forms consistent with the user's handwriting style. The method needs only a small number of commonly used Chinese characters as input and no manual intervention, can automatically generate a vector Chinese word stock comprising a large number of Chinese character patterns, can quickly and automatically generate a handwriting Chinese word stock in another handwriting style for a user, can remarkably improve the production efficiency of Chinese word stocks, and greatly reduces production cost.
Owner:PEKING UNIV

Chinese word segmentation

The present invention relates to a corpus for use in training a language model. The corpus includes a plurality of characters and a plurality of morphological tags associated with a plurality of sequences of characters. The plurality of morphological tags indicate a morphological type of an associated sequence of characters and a combination of parts forming a morphological subtype.
Owner:MICROSOFT TECH LICENSING LLC

Microblog-oriented dynamic topic detection and evolution tracking method

The invention provides a microblog-oriented dynamic topic detecting and evolution tracking method and belongs to the technical field of intelligent information processing. The method includes the steps of 1, establishing a distributed crawler to acquire microblog data; 2, pre-processing the microblog data; 3, performing Chinese word segmentation to remove stop words, and acquiring a word set VOC; 4, subjecting the microblog data to LDA (latent Dirichlet allocation) clustering in each time interval so as to extract latent topics; 5, screening out microblog hot topics in each time interval; 6, subjecting the hot topics of a global time to hierarchical clustering to acquire inter-topic aggregation and differentiation relations; 7, visualizing a topic evolution process according to the inter-topic aggregation and differentiation relations. The method has the advantages such that topic word distribution of an event in different times and a fine-grained topic of a same topic in different times are mined under low time complexity, efficiency is high, and robustness is high; the method has greater practical value.
Owner:中科明远(北京)并行软件有限公司

Chinese word segmentation method and system

The invention discloses a Chinese word segmentation method, which comprises the following steps of: performing word segmentation on a Chinese text according to word semantics, segmenting ambiguous fields and outputting a first text string taking words as units; and identifying and combining Chinese names in the first text string to generate a second text string taking words as units. The ambiguous fields are segmented by combining a dictionary rule method with a statistical method; and the ambiguous fields are segmented and the names are identified by word standard a maximum entropy model in the statistical method. The invention also discloses a Chinese word segmentation system, which comprises a word segmentation module, a name identification module and the like. The method and the system improve word segmentation efficiency and accuracy.
Owner:BEIJING FEINNO COMM TECH

Microblog-based neologism emotional tendency judgment method

The invention relates to a microblog-based neologism emotional tendency judgment method, belonging to the field of natural language processing. The method comprises the following steps: segmenting microblog corpora with a Chinese word segmentation tool; dividing the segmented corpora into blocks, using the stop words in the segmentation result as division points; combining adjacent word strings in each block pairwise, calculating the frequency of each combined word string, and taking the word strings whose frequencies are higher than a threshold value as neologism candidate strings; filtering the neologism candidate strings according to a word formation rule of Chinese linguistics and an adjacent change number rule to obtain neologisms; calculating the similarity between co-occurrence words and HowNet emotional words by utilizing the HowNet emotional dictionary; calculating the relevancy between the neologisms and the co-occurrence words; constructing a graph model; and obtaining the emotional polarity distribution of the neologisms by utilizing a label propagation algorithm, and obtaining the emotional tendency of the neologisms by constructing a linear classifier. By judging the emotional tendency of neologisms, a blogger can express views better, and users can accurately understand the emotional tendency of the blogger.
Owner:KUNMING UNIV OF SCI & TECH
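
The candidate-generation step of this abstract (blocks split at stop words, adjacent word strings combined pairwise, frequency threshold) can be sketched as below. The function name and the sample posts are hypothetical, and the later filtering, graph construction and label propagation steps are not shown.

```python
from collections import Counter


def neologism_candidates(segmented_posts, stop_words, min_freq):
    """Join adjacent words inside stop-word-delimited blocks and keep frequent joins."""
    counts = Counter()
    for words in segmented_posts:
        block = []
        # A trailing None acts as a sentinel that flushes the last block.
        for w in list(words) + [None]:
            if w is None or w in stop_words:
                counts.update(a + b for a, b in zip(block, block[1:]))
                block = []
            else:
                block.append(w)
    return sorted(s for s, c in counts.items() if c > min_freq)


# Hypothetical microblog posts after word segmentation.
posts = [
    ["给", "力", "的", "表现"],
    ["真", "给", "力"],
    ["给", "力", "了"],
]
candidates = neologism_candidates(posts, stop_words={"的", "了"}, min_freq=1)
```

Here the segmenter has split an unknown word into single characters, and the repeated adjacency of those characters is what surfaces it as a candidate.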

Methods and systems for splitting a chinese character sequence into word segments

Systems, methods and machine readable medium including machine readable code for splitting a Chinese character sequence into word segments are disclosed. A synchronization list including a plurality of Chinese words is provided. An input data string including a Chinese character sequence is received and one of the plurality of Chinese words from the synchronization list is identified in the Chinese character sequence. The identified Chinese word is defined as a word segment in the Chinese character sequence. An undefined character sequence is identified in the Chinese character sequence. The undefined character sequence is segmented into at least one word segment.
Owner:MICRO FOCUS LLC
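
One way to picture the flow in this abstract is a longest-match scan against a word list, with a fallback for whatever is left undefined. The abstract does not say how matches are chosen or how undefined sequences are segmented, so the longest-match rule, the one-character fallback and the function name are assumptions.

```python
def split_with_word_list(text, word_list):
    """Emit listed words where they match; split undefined runs into characters."""
    max_len = max((len(w) for w in word_list), default=1)
    segments, undefined = [], ""
    i = 0
    while i < len(text):
        match = None
        # Try the longest possible listed word starting at position i.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in word_list:
                match = candidate
                break
        if match:
            segments.extend(undefined)  # assumed fallback: one character per segment
            undefined = ""
            segments.append(match)
            i += len(match)
        else:
            undefined += text[i]
            i += 1
    segments.extend(undefined)
    return segments


# Hypothetical word list standing in for the synchronization list.
word_list = {"北京", "天安门"}
segments = split_with_word_list("我爱北京天安门", word_list)
```

A production system would likely hand the undefined runs to a statistical segmenter instead of splitting them character by character.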

Address matching method based on semantic recognition

The invention discloses an address matching method based on semantic recognition. The method uses an address matching engine and a log analysis engine. The address matching engine comprises an administrative division semantic module, a place name class semantic module, a standard address module, a semantic rule module, a Chinese word segmentation module, a semantic recognition module, and a search module. According to the method, matching addresses are found quickly and accurately through semantic recognition of the addresses that users enter for searching, and the search results are returned to users in the form of online services; the log analysis engine records and analyzes the search logs, and the address matching engine is optimized according to the log analysis results.
Owner:吉奥时空信息技术股份有限公司

Chinese text verification system and method based on Chinese vague pronunciation and voice recognition

The invention discloses a Chinese text verification system and method based on Chinese vague pronunciation and voice recognition. The system comprises a voice collecting and processing module, a voice recognition module and a text verifying and sharing module, wherein the voice collecting and processing module is used for collecting audio and compressing and denoising it, the voice recognition module is used for recognizing voices into text, and the text verifying and sharing module is used for text verification and also supports text editing and sharing. The method comprises the following steps: Chinese error judgment rules based on parts of speech are defined; word segmentation is carried out on the Chinese text obtained after voice recognition; the segmented words are scanned according to the Chinese error judgment rules to find wrong Chinese words; a vague pronunciation table is defined based on Chinese vague pronunciation rules; all vague pinyin of the wrong words is found through a Cartesian product; a dictionary table is queried to obtain a candidate word set for all the vague pinyin; and a candidate error correction set is selected from the words of the candidate set according to word frequency rank. The system and method eliminate the Chinese errors in voice recognition that are caused by vague pronunciation and effectively increase the error correction accuracy of the verification algorithm.
Owner:HOHAI UNIV
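
The candidate-generation steps (vague pronunciation table, Cartesian product, dictionary lookup, frequency ranking) can be sketched as follows. This is a minimal illustration under stated assumptions: the confusion pairs, the dictionary entries and the frequencies are invented for the example, and the patent's actual tables are not given in the abstract.

```python
from itertools import product

# Assumed confusion pairs (longer form first); not taken from the patent.
FUZZY_INITIALS = [("zh", "z"), ("ch", "c"), ("sh", "s"), ("n", "l")]
FUZZY_FINALS = [("ang", "an"), ("eng", "en"), ("ing", "in")]


def swap_initial(s):
    """Return the syllable plus its variant with the confusable initial, if any."""
    for a, b in FUZZY_INITIALS:
        if s.startswith(a):
            return {s, b + s[len(a):]}
        if s.startswith(b):
            return {s, a + s[len(b):]}
    return {s}


def swap_final(s):
    """Return the syllable plus its variant with the confusable final, if any."""
    for a, b in FUZZY_FINALS:
        if s.endswith(a):
            return {s, s[:-len(a)] + b}
        if s.endswith(b):
            return {s, s[:-len(b)] + a}
    return {s}


def syllable_variants(s):
    """All syllables reachable by swapping the initial and/or the final."""
    return {f for i in swap_initial(s) for f in swap_final(i)}


def candidate_corrections(syllables, pinyin_dict, top_k=3):
    """Cartesian product over per-syllable variants, then dictionary lookup
    and ranking by descending word frequency."""
    found = {}
    for combo in product(*(sorted(syllable_variants(s)) for s in syllables)):
        for word, freq in pinyin_dict.get(combo, []):
            found[word] = freq
    return sorted(found, key=lambda w: (-found[w], w))[:top_k]


# Hypothetical dictionary table: pinyin tuple -> [(word, frequency)].
PINYIN_DICT = {
    ("zhi", "dao"): [("知道", 5000), ("指导", 1200)],
    ("zi", "dao"): [("自导", 30)],
}
print(candidate_corrections(["zi", "dao"], PINYIN_DICT))
```

Some generated syllables are not valid pinyin; the dictionary lookup simply finds nothing for them, so they drop out without a separate validity check.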

Internet forum-oriented opinion leader mining method

The invention discloses an Internet forum-oriented opinion leader mining method. The method involves an opinion leader mining system comprising a computing center and a database server that communicates with the computing center. The method comprises the following steps: capturing forum data with a crawler, using message-oriented middleware (MOM) to improve real-time data processing; extracting web page information, performing word segmentation with a Chinese word segmentation system, and filtering spam comments by a spectral clustering method; analyzing text tendency with an emotional corpus; setting a selection threshold for opinion leaders and determining the opinion leaders; and visualizing the result. By the method, the opinion leaders in a forum can be accurately mined, and technical support is provided for Internet public opinion supervision departments to find hot issues in time and guide the healthy development of Internet public opinion.
Owner:NAT UNIV OF DEFENSE TECH

Rubbish article classification method based on distributed feature representation of text

The present invention discloses a rubbish article classification method based on distributed feature representation of text. The method comprises: performing word segmentation on the article text using a Chinese word segmentation algorithm based on a dictionary and a statistical strategy; using the Skip-Gram model with the Negative-Sampling algorithm in word2vec to obtain text vectors of the articles; selecting a support vector machine (SVM) with a linear kernel; and training on the text vectors to obtain an SVM article classification model. The accuracy of article category discrimination is significantly improved.
Owner:CHONGQING UNIV OF POSTS & TELECOMM
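
To make the Skip-Gram with negative sampling step concrete, the sketch below builds the training examples such a model consumes: each (center, context) pair from a sliding window, plus a few randomly drawn negative words. It is a simplification, with negatives drawn uniformly here whereas word2vec samples from a smoothed unigram distribution, and the function names and sample tokens are hypothetical.

```python
import random


def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every context word that lies
    within `window` positions of the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def with_negatives(pairs, vocab, k=2, seed=0):
    """Attach k negative words to each positive pair. Uniform sampling is
    an assumption made for brevity; it requires at least k other words."""
    rng = random.Random(seed)
    examples = []
    for center, context in pairs:
        pool = [w for w in vocab if w not in (center, context)]
        examples.append((center, context, rng.sample(pool, k)))
    return examples


# Hypothetical segmented article text.
tokens = ["免费", "领取", "优惠", "礼品"]
examples = with_negatives(skipgram_pairs(tokens, window=1), sorted(set(tokens)))
print(len(examples))
```

The model is then trained to score each positive pair above its negatives. How the resulting word vectors are pooled into an article vector for the SVM is not described in the abstract.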

Cryptogram-based safe full-text indexing and retrieval system

The invention discloses a cryptogram-based safe full-text indexing and retrieval system. In the system, a cryptogram index library comprises a cryptogram entry reverse index and an internal document object set; a cryptogram document library is responsible for storing and managing encrypted XML documents; a word segmentation encryption server carries out Chinese word segmentation on a plaintext document and encrypts the resulting entries item by item; a cryptogram full-text indexing server standardizes an original plaintext document into an XML document, encrypts it and stores it in the cryptogram document library, creates a corresponding internal document object in the cryptogram index library by combining document metadata, and creates a cryptogram reverse index for the XML document through the cryptogram entries; and a cryptogram full-text retrieval server searches the cryptogram index library using user authority information and the cryptogram entry to obtain the internal document object set, obtains the corresponding encrypted XML document result set from the cryptogram document library according to a pointer, decrypts the result set, and returns the decrypted result set to the user. Based on the special requirements of cryptogram full-text indexing, the Chinese word segmentation method, the safe and efficient indexing structure and the retrieval mechanism of the invention realize cryptogram full-text indexing integrated with an access control strategy. The system has the advantages of a safe and efficient indexing process, no decrypted documents in the indexing process, and a high recall ratio and a high precision ratio in a cryptogram environment.
Owner:HUAZHONG UNIV OF SCI & TECH
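
The idea of an index keyed by encrypted entries and filtered by user authority can be sketched as below. This is an illustration under assumptions, not the patented design: a keyed hash (HMAC-SHA256) stands in for whatever per-entry cipher the system uses, document encryption is omitted, and the class and method names are hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict


def term_token(key: bytes, term: str) -> str:
    """Deterministic keyed token for a term, so that the index never
    stores the plaintext term itself."""
    return hmac.new(key, term.encode("utf-8"), hashlib.sha256).hexdigest()


class TokenIndex:
    """Reverse index from term tokens to document ids, with a per-document
    access list checked at query time."""

    def __init__(self, key: bytes):
        self._key = key
        self._postings = defaultdict(set)  # token -> set of document ids
        self._acl = {}                     # document id -> allowed users

    def add(self, doc_id, terms, allowed_users):
        self._acl[doc_id] = set(allowed_users)
        for term in terms:
            self._postings[term_token(self._key, term)].add(doc_id)

    def search(self, term, user):
        token = term_token(self._key, term)
        hits = self._postings.get(token, set())
        # Only return documents the user is authorized to read.
        return sorted(d for d in hits if user in self._acl[d])


index = TokenIndex(b"demo-key-not-for-production")
index.add("doc1", ["加密", "索引"], {"alice"})
index.add("doc2", ["加密"], {"bob"})
print(index.search("加密", "alice"))
```

Because the same term always maps to the same token, lookups need no decryption, at the cost of revealing which documents share a term. Whether the patented system accepts that trade-off is not stated in the abstract.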

Abstract service logic-based interactive semantic Web service dynamic combination method

The invention discloses an abstract service logic-based interactive semantic Web service dynamic combination method. The method comprises the following steps: separating the abstract service description from specific semantic Web service instances by establishing a resource service mapping model, performing unified semantic description of Web services using the ontology web language for services (OWL-S), realizing static binding with lower-layer services, and providing a unified interface for service combination; describing the combined process abstractly through combined service process modeling and providing a combination template; performing intelligent search and matching of services by combining Chinese word segmentation technology with an ontology-based concept matching algorithm, supported by a domain ontology and a professional thesaurus; and analyzing the timing and control relationships of nodes in the combined process to realize dynamic binding and execution of the services, and constructing a service combination engine.
Owner:BEIHANG UNIV

A Neural Network Mongolian-Chinese Machine Translation Method Based on Encoder-Decoder

A neural network Mongolian-Chinese machine translation method based on an encoder-decoder is provided. The method uses an encoder E and two-layer decoders D1 and D2. The encoder E encodes the Mongolian source language into a vector list; then, at the hidden layer of the encoder, a retrospective step with an attention mechanism is adopted. In the decoding process, the decoder D1 obtains the hidden state before softmax and a draft sentence, and the hidden states of the encoder E and the decoder D1 are then taken as the input of the decoder D2 to obtain the second-pass sequence, i.e. the final translation. In the preprocessing stage, the Chinese corpus is first segmented into words, the Mongolian-Chinese bilingual corpus is segmented into stems, affixes and case markers and then into subword units by byte pair encoding (BPE), which effectively refines the translation granularity and reduces the number of unknown words, and Mongolian-Chinese word vectors are then constructed with Word2vec. For unknown words, a Mongolian-Chinese dictionary of proprietary vocabulary is also constructed, which can effectively improve the quality of translation.
Owner:INNER MONGOLIA UNIV OF TECH
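
The BPE segmentation step mentioned in the preprocessing can be illustrated with the standard merge-learning procedure: repeatedly merge the most frequent adjacent symbol pair. The sketch uses a toy Latin-script vocabulary rather than Mongolian or Chinese data, and the function names, the `</w>` end-of-word marker and the tie-breaking rule are choices made for this example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` with the merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs, num_merges):
    """Learn an ordered list of merges from a {word: frequency} table."""
    vocab = {tuple(w) + ("</w>",): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        # Most frequent pair; ties are broken alphabetically for determinism.
        best = min(pair_counts, key=lambda p: (-pair_counts[p], p))
        merges.append(best)
        vocab = {merge_pair(s, best): f for s, f in vocab.items()}
    return merges


def apply_bpe(word, merges):
    """Segment a word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy corpus statistics, for illustration only.
merges = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 3)
print(apply_bpe("lowest", merges))
```

A word never seen in training, such as "lowest" here, still decomposes into known subword units, which is how BPE reduces the number of unknown words.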