33 results for patented technology on how to "Improve word segmentation efficiency"

Chinese word segmentation method based on two-way LSTM, CNN and CRF

The invention discloses a Chinese word segmentation method based on a bidirectional LSTM, a CNN and a CRF, which improves and optimizes traditional Chinese word segmentation using a deep learning algorithm. The method comprises the following steps: preprocessing the initial corpus and extracting the character feature information of the corpus and the pinyin feature information corresponding to each character; using a convolutional neural network to obtain the pinyin feature vector of each character; using the word2vec model to obtain the character feature vector of the text; splicing the pinyin feature vectors and the character feature vectors to obtain context information vectors and feeding them into a bidirectional LSTM neural network; decoding the output of the bidirectional LSTM with a linear-chain conditional random field to obtain the word segmentation label sequence; and decoding the word segmentation label sequence to obtain the word segmentation results. The invention uses a deep neural network to extract text character features and pinyin features and combines them with conditional random field decoding, so it can effectively extract Chinese text features and achieves good results on Chinese word segmentation tasks.
Owner:NANJING UNIV OF POSTS & TELECOMM
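
The last step of the abstract above, turning a word segmentation label sequence into words, can be illustrated with a small sketch. The abstract does not name its label scheme; the B/M/E/S scheme (begin, middle, end, single) used here is an assumption, and `tags_to_words` is a hypothetical helper name. The labels are taken as given, so no neural network or CRF is involved.

```python
def tags_to_words(chars, tags):
    """Group characters into words from per-character B/M/E/S labels.

    Assumed label scheme (not stated in the abstract):
    B = word begins, M = word continues, E = word ends, S = single-character word.
    """
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        # A new word starts at B or S; flush anything left open by a malformed sequence.
        if tag in ("B", "S") and current:
            words.append(current)
            current = ""
        current += ch
        # E and S close the current word.
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # sequence ended in the middle of a word
        words.append(current)
    return words
```

For example, the characters of "我爱北京" with labels S, S, B, E are grouped into three words, the last one two characters long.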

Chinese word segmentation method and system

The invention discloses a Chinese word segmentation method, which comprises the following steps: performing word segmentation on a Chinese text according to word semantics, segmenting ambiguous fields and outputting a first text string taking words as units; and identifying and combining Chinese names in the first text string to generate a second text string taking words as units. The ambiguous fields are segmented by combining a dictionary rule method with a statistical method; within the statistical method, a word-based maximum entropy model is used to segment the ambiguous fields and identify the names. The invention also discloses a Chinese word segmentation system, which comprises a word segmentation module, a name identification module and the like. The method and the system improve word segmentation efficiency and accuracy.
Owner:BEIJING FEINNO COMM TECH

Multi-language based word segmentation method and apparatus

The invention provides a multi-language based word segmentation method and apparatus. The method comprises: receiving a to-be-segmented text transmitted by a user, wherein the to-be-segmented text carries a statement separator; according to the statement separator, identifying the language type of each statement in the to-be-segmented text; according to the language type, searching for a corresponding word segmentation method in a pre-stored corresponding relationship between the language type and the word segmentation method; by adopting the word segmentation method corresponding to the language type, performing word segmentation on a statement of the corresponding language type; and outputting a word segmentation result of the to-be-segmented text to the user. According to the multi-language based word segmentation method, uniform word segmentation for applications or texts involving multiple languages can be performed, thereby improving the word segmentation efficiency.
Owner:ORA
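
The lookup described in the abstract above, from language type to word segmentation method, can be sketched as a dispatch table. Everything concrete here is an assumption made for illustration: the `|` separator, the Unicode-range language check, and the two stand-in segmenters are not taken from the patent.

```python
import re

def segment_chars(sentence):
    """Stand-in Chinese segmenter: one token per non-space character."""
    return [ch for ch in sentence if not ch.isspace()]

def segment_whitespace(sentence):
    """Stand-in segmenter for space-delimited languages."""
    return sentence.split()

# Pre-stored correspondence between language type and segmentation method.
SEGMENTERS = {"zh": segment_chars, "en": segment_whitespace}

def detect_language(sentence):
    """Crude check (assumption): any CJK ideograph means Chinese."""
    return "zh" if re.search(r"[\u4e00-\u9fff]", sentence) else "en"

def segment_text(text, separator="|"):
    """Split on the statement separator, then segment each statement by language."""
    result = []
    for sentence in text.split(separator):
        sentence = sentence.strip()
        if not sentence:
            continue
        lang = detect_language(sentence)
        result.append((lang, SEGMENTERS[lang](sentence)))
    return result
```

Adding a language then only requires registering one more entry in `SEGMENTERS`; the dispatch loop itself stays unchanged.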

A query method for indefinite words and sentences of evaluation documents based on inverted index

The invention relates to a query method for indefinite-length words and phrases in evaluation documents based on an inverted index. It involves an indexing method from the field of data science and a word segmentation method from the field of NLP, and solves the problem of querying indefinite-length words and phrases in evaluation documents. The invention comprises the following steps: 1, data preprocessing is carried out on the document to be queried, word segmentation is carried out using the jieba word segmentation method, and the word dictionary and word frequency information are obtained; 2, an adaptive inverted table is established based on the inverted index principle of the complete reconstruction strategy; 3, using the information of the indefinite-length words and phrases to be searched, the position information of each word in those words and phrases is looked up in the adaptive inverted table index, the position of the indefinite-length words and phrases is identified, and the paragraphs where they are located are indexed, which completes the query function for indefinite-length words and phrases in the evaluation documents. The basic idea of the invention is to divide the text data into words and establish an inverted index so as to realize fast searching for indefinite-length words and phrases, thereby realizing the query function for evaluation documents. The application scenario is wide, so it has high socio-economic value.
Owner:HARBIN INST OF TECH
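
The three steps in the abstract above can be illustrated with a positional inverted index. This is only a sketch under stated assumptions: paragraphs are passed in already segmented (the abstract uses jieba for that step), the "adaptive" and "complete reconstruction" aspects are not modelled, and `build_index` and `find_phrase` are hypothetical names.

```python
from collections import defaultdict

def build_index(paragraphs):
    """Map each token to {paragraph id: [positions]}.

    `paragraphs` is a list of token lists, i.e. text that has already
    been segmented into words.
    """
    index = defaultdict(lambda: defaultdict(list))
    for pid, tokens in enumerate(paragraphs):
        for pos, tok in enumerate(tokens):
            index[tok][pid].append(pos)
    return index

def find_phrase(index, phrase):
    """Return the sorted ids of paragraphs where the phrase tokens occur consecutively."""
    if not phrase or phrase[0] not in index:
        return []
    hits = []
    for pid, starts in index[phrase[0]].items():
        for start in starts:
            # Token k of the phrase must sit at position start + k in the same paragraph.
            if all(start + k in index.get(tok, {}).get(pid, ())
                   for k, tok in enumerate(phrase)):
                hits.append(pid)
                break
    return sorted(hits)
```

Because the phrase is matched through per-token position lists, a query of any length costs a handful of dictionary lookups rather than a scan of every paragraph.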

Chinese word segmentation method and apparatus

Embodiments of the invention disclose a Chinese word segmentation method and apparatus. The method comprises the steps of dividing a text set into a plurality of short sentences and numbering the short sentences; for each Chinese character in the text set, obtaining a first short sentence number list corresponding to a current Chinese character, obtaining a second short sentence number list corresponding to an adjacent Chinese character adjacent to the right side of the current Chinese character, and calculating a degree of co-occurrence according to the first short sentence number list and the second short sentence number list; obtaining an adjacent character set corresponding to the current Chinese character, and calculating a relevant degree of adjacency according to the adjacent character set; determining whether a word consisting of the current Chinese character and the adjacent Chinese character is added into a candidate word set or not according to the degree of co-occurrence and the relevant degree of adjacency; and performing word segmentation on the text set according to the candidate word set. The method is small in calculation amount and high in accuracy when calculating the candidate word set, can effectively improve the accuracy of a word segmentation result and improve the efficiency of word segmentation, does not depend on a corpus dictionary, and can realize unsupervised candidate vocabulary extraction.
Owner:RUN TECH CO LTD BEIJING
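
The abstract above computes a degree of co-occurrence from the short sentence number lists of two neighbouring characters but does not give the formula. The sketch below assumes a Jaccard ratio (shared sentence numbers over all sentence numbers), which is an illustrative choice and not the patent's definition; the degree of adjacency is left out.

```python
import re
from collections import defaultdict

def sentence_lists(text):
    """Split text into short sentences and map each character to the set of
    sentence numbers that contain it."""
    sentences = [s for s in re.split(r"[，。！？,.!?\s]+", text) if s]
    lists = defaultdict(set)
    for number, sentence in enumerate(sentences):
        for ch in sentence:
            lists[ch].add(number)
    return sentences, lists

def cooccurrence(lists, left, right):
    """Assumed measure: Jaccard ratio of the two sentence-number sets."""
    a, b = lists.get(left, set()), lists.get(right, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

On the text "北京很大，北京很美，天很蓝", the pair 北/京 always appears in the same short sentences and scores 1.0, while 很/大 scores lower because 很 also appears where 大 does not.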

Method and system for intelligently understanding user query intention

The invention discloses a method and system for intelligently understanding a user query intention, and the method comprises the steps: inputting a query statement, and carrying out the word segmentation processing through combining with a dictionary; performing part-of-speech tagging on the word segmentation result; performing named entity recognition on the words after the part-of-speech tagging; and carrying out grammatical analysis through a named entity identification result and a set grammatical rule to obtain a user query intention. According to the method, the input query statements are analyzed layer by layer according to the wording characteristics in the loan auditing industry, the query intention of the user is deeply understood, and the query efficiency is improved on the premise that the accuracy is ensured.
Owner:鼎复数据科技(北京)有限公司

Method and system for constructing emergency knowledge graph based on Chinese word segmentation technology

The invention discloses a method for constructing an emergency knowledge graph based on a Chinese word segmentation technology. The method specifically comprises the following steps: S1, inputting an emergency information text; S2, analyzing elements in the emergency information text in the step S1, extracting key data, and constructing an emergency knowledge base by utilizing the extracted key data; S3, performing word segmentation and judgment on the emergency information text input in the step S1 by adopting a multi-strategy combined Chinese word segmentation algorithm, and outputting a word segmentation result; S4, searching and matching the word segmentation result obtained in the step S3 in the emergency knowledge base by utilizing a retrieval engine, and outputting result data after the matching is successful; and S5, constructing an emergency knowledge graph according to the emergency service system in combination with the result data, and outputting graph result data. A scientific and comprehensive emergency knowledge graph is constructed according to an emergency business system, the data matching speed and the word segmentation precision are improved, the problems of low retrieval efficiency and the like are solved, and the shared application service of emergency knowledge is realized.
Owner:SPEED SPACE TIME INFORMATION TECH CO LTD

Chinese place name phonetic spelling standardization method

The invention relates to a Chinese place name phonetic spelling standardization method. According to the obtained place name type, a corresponding Chinese keyword dictionary is constructed, and the best matching and automatic word segmentation of the keywords in the place name are completed by combining the character string label. Then the place names are transformed into pinyin and the spelling is standardized, and finally the Chinese place names are transformed into the corresponding pinyin elements of the spelling standard. The word segmentation method of the invention can avoid word segmentation ambiguity, improve word segmentation efficiency, and solve the problem of automatic word segmentation of Chinese place names in a Chinese place name database that holds a large amount of data with rich semantics and complex place name types. The method realizes the phonetic transformation of Chinese place names and the standardization of their phonetic spelling in a Chinese place name database with a large amount of data.
Owner:江苏省基础地理信息中心

Word segmentation method supporting large number of word banks, and computer readable storage medium and system

The invention provides a word segmentation method supporting a large number of word banks, and a computer readable storage medium and a system. The method comprises the following steps: constructing a domain dictionary; constructing an offline word segmentation model based on a domain dictionary; for the original text to be subjected to word segmentation, performing word segmentation through an offline word segmentation model to obtain a first word segmentation result; carrying out to-be-searched word extraction on the original text to be subjected to word segmentation, then carrying out first-level index search and second-level index search in the domain dictionary based on the to-be-searched words, and finally screening second-level index results to extract candidate words; and recombining the candidate words and the first word segmentation result, constructing a directed graph of the original text based on a recombining result, and calculating an optimal word segmentation result based on a shortest path method. According to the method, the word segmentation result in the single field is combined with the big word search result, the directed graph is constructed based on the combination result, the problem of solving the optimal word segmentation scheme is converted into the problem of the optimal path to be quickly solved, and the method is very suitable for segmenting the big words.
Owner:启业云大数据(南京)有限公司
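
The final step of the abstract above, building a directed graph over the text and solving for the shortest path, can be sketched as follows. The edge weights are not given in the abstract; this sketch assumes every word costs 1, so the shortest path is the segmentation with the fewest words. Single characters are always allowed as fallback edges, and `shortest_path_segment` is a hypothetical name.

```python
def shortest_path_segment(text, candidates):
    """Segment `text` along the shortest path of its word graph.

    Nodes are the positions 0..len(text). There is an edge i -> j when
    text[i:j] is a candidate word or a single character. Each edge costs 1
    (assumption), so the best path uses the fewest words.
    """
    n = len(text)
    max_len = max((len(w) for w in candidates), default=1)
    cost = [float("inf")] * (n + 1)
    back = [0] * (n + 1)
    cost[0] = 0
    # Positions are already in topological order, so one left-to-right pass suffices.
    for i in range(n):
        for j in range(i + 1, min(n, i + max_len) + 1):
            if j == i + 1 or text[i:j] in candidates:
                if cost[i] + 1 < cost[j]:
                    cost[j] = cost[i] + 1
                    back[j] = i
    # Walk the back-pointers from the end of the text to recover the words.
    words, j = [], n
    while j > 0:
        i = back[j]
        words.append(text[i:j])
        j = i
    return words[::-1]
```

With the candidates 南京, 南京市, 市长, 长江, 大桥, 长江大桥 and 江大桥, the text "南京市长江大桥" resolves to the two-word path 南京市 / 长江大桥 rather than the three-word path through 市长.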

Intelligent matching system

The invention provides an intelligent matching system which comprises: a data acquisition module used for acquiring user registration input data and user behavior logs published on a network and a system platform; a recommended object modeling module which is used for extracting keywords from each piece of announcement information in the user registration input data, obtaining, for each user, all keywords the user is interested in from all the announcement information the user follows in that user's behavior log, and obtaining the user's interest degree in each of those keywords from the user's attention behavior toward each followed piece of announcement information in the behavior log; and a recommendation algorithm module which is used for calculating the interest degree of each user in a piece of announcement information from the keywords extracted from that announcement information and each user's interest degree in those keywords, and recommending the announcement information to the users with the highest interest degree. The intelligent matching system is high in information recommendation precision and efficiency.
Owner:安徽省优质采科技发展有限责任公司

Adaptive task scheduling method and system and retrieval method comprising adaptive task scheduling method

The invention relates to the field of big data processing, in particular to an adaptive task scheduling method and system and a retrieval method comprising the adaptive task scheduling method. The adaptive task scheduling method comprises the steps that according to the resource utilization condition and the load change condition of each working node, the working nodes adjust the weights of the nodes every certain period; and a task scheduling node reads the weight condition of each working node, ranks the weights, and performs task scheduling according to the rank of the weights. According to the method, the cluster heterogeneity and the current resource remaining situation of the working nodes are considered, the problem of relative imbalance of calculation task allocation of the working nodes is solved, the overall calculation capability of the system is improved, the overall task completion time is shortened, and the performance is improved.
Owner:BEIJING XUEZHITU NETWORK TECH

Chinese word segmentation method based on Hash algorithm

The invention discloses a Chinese word segmentation method based on a Hash algorithm, and relates to the field of natural language processing. The method comprises the following steps: S1, configuring a word segmentation device on a search engine and establishing a dictionary structure; S2, monitoring the return operation of the user, and obtaining the first character in an input box; S3, inputting the first character into a dictionary for primary searching and screening; S4, forming a tree by all words with the same first characters in the dictionary; S5, placing a second word in the word on a second layer of the tree, and creating a Hash index table; S6, carrying out Hash searching on the remaining characters; S7, after an IK reads the new lexicon, notifying the search engine to update; and S8, updating the dictionary information in the memory by the search engine. According to the invention, the Hash search is carried out on the first character by creating a dictionary storage mechanism, the dictionary structure and the algorithm of carrying out Hash search on the remaining characters via the tree result are established, and the search engine is updated by using IK word segmentation, so that the Chinese word segmentation efficiency is improved, the system complexity is reduced, and the index redundancy degree is reduced.
Owner:合肥天毅网络传媒有限公司
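
The dictionary structure in steps S3 to S6 above, a hash lookup on the first character followed by a tree walk with hash lookups on the remaining characters, resembles a trie whose nodes are hash tables. The sketch below uses nested Python dicts for that and segments by forward longest match; the matching strategy is an assumption, and the search engine and IK update steps are not modelled.

```python
END = "$end"  # marker key for "a word ends here"; never collides with a single character

def build_dictionary(words):
    """Nested dicts: the top level hashes first characters, deeper levels hash the rest."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def longest_match(root, text, start):
    """Return the end index of the longest dictionary word starting at `start`.

    Falls back to a single character when nothing in the dictionary matches.
    """
    node, end = root, start + 1
    for i in range(start, len(text)):
        node = node.get(text[i])  # one hash lookup per character
        if node is None:
            break
        if END in node:
            end = i + 1
    return end

def segment(root, text):
    """Forward longest-match segmentation (assumed strategy)."""
    words, i = [], 0
    while i < len(text):
        j = longest_match(root, text, i)
        words.append(text[i:j])
        i = j
    return words
```

Note that longest match is greedy: with the words 中国, 中国人 and 人民 in the dictionary, "中国人民" comes out as 中国人 / 民, which shows why dictionary lookup alone does not resolve ambiguity.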

Chinese word segmentation method and device and search lexicon reading method

In order to overcome the defects in the prior art, the invention provides a Chinese word segmentation method and device and a search lexicon reading method. The method comprises the steps of performing word segmentation on sentences to be subjected to word segmentation according to the input maximum word length, and obtaining a first-time word segmentation result; gradually reducing the length of the maximum word length, and performing word segmentation on the sentence to be subjected to word segmentation when the maximum word length changes each time to obtain an Nth word segmentation result; and comparing the first word segmentation result to the Nth word segmentation result with a word bank to obtain an output list. According to the invention, the to-be-segmented sentences can be accurately segmented, and particularly, the recognition rate of fixed words in the middle of the to-be-segmented sentences can be improved. The method has the advantages of being high in word segmentation efficiency and accurate in word segmentation result.
Owner:深圳市华南城数字科技有限公司
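
One possible reading of the abstract above is sketched here: for each word length from the input maximum down to 1, cut the sentence into every substring of that length and keep the substrings found in the word bank. Whether the patent cuts overlapping windows in this way is an assumption, and `lexicon_hits` is a hypothetical name.

```python
def lexicon_hits(sentence, lexicon, max_len):
    """Collect word-bank entries found in `sentence`, longest lengths first.

    Round k uses word length max_len - k + 1 and compares every substring
    of that length against the word bank.
    """
    hits = []
    for length in range(max_len, 0, -1):
        for start in range(len(sentence) - length + 1):
            piece = sentence[start:start + length]
            if piece in lexicon and piece not in hits:
                hits.append(piece)
    return hits
```

Because every start position is tried at every length, a fixed word in the middle of the sentence is found even when the text before it is not in the word bank.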

Text word segmentation method and text word segmentation device

The invention relates to the technical field of Chinese text processing, in particular to a text word segmentation method and a text word segmentation device. The word segmentation method comprises the steps of obtaining a to-be-processed Chinese text and segmenting the Chinese text into a plurality of Chinese short texts, wherein each Chinese short text comprises a plurality of continuous Chinese characters representing a semantic meaning; and outputting the word-segmented Chinese text on the basis of the plurality of Chinese short texts and a pre-trained Chinese word segmentation model. The method reduces the length of the Chinese text to be processed and filters out interference from non-Chinese characters, so the word segmentation efficiency of the Chinese text can be improved.
Owner:BEIJING DIDI INFINITY TECH & DEV

A guiding system and method for electric power civil engineering foundation acceptance based on the Internet of Things

The invention belongs to the technical field of Internet of Things and discloses a guiding system and method for electric power civil engineering foundation acceptance based on the Internet of Things. The geographic and meteorological information is collected by using a data layer, and a disaster analysis model and historical disaster information are inquired and sent to a basic service layer, so that the meteorological monitoring and disaster analysis are completed and the information is automatically pushed. A business system layer monitors weather in real time, analyzes the disaster and gives an early warning, so that a user can check the meteorological disaster information at any time. The system is maintained by professionals through the Internet in real time, and the stability of the system is ensured. According to the present invention, the structure is reasonable, and the meteorological disasters can be effectively prevented.
Owner:HUBEI POLYTECHNIC UNIV

Word segmentation method and device

The invention provides a word segmentation method and device. The method comprises the following steps: dividing a to-be-processed corpus into a plurality of corpus segments according to a preset granularity; inserting mask segments among the plurality of corpus segments, and inputting a to-be-predicted corpus containing the plurality of corpus segments and the mask segments into a pre-training language model; corpus information in the mask fragments adjacent to the corpus fragments is predicted through a pre-training language model; and performing word segmentation processing on the to-be-processed corpus based on the plurality of corpus segments and the corpus information to obtain a target word segmentation result. According to the method, the corpus information of the mask fragment can be predicted through the pre-training language model, so that word segmentation processing is completed through the corpus information obtained through prediction, word segmentation can be completed without the help of a dictionary or a word segmentation text, efficiency reduction caused by manual construction of the dictionary or the word segmentation text is avoided, and word segmentation efficiency is improved.
Owner:CLOUDMINDS SHANGHAI ROBOTICS CO LTD

A Chinese word segmentation method based on bidirectional lstm, cnn and crf

The invention discloses a Chinese word segmentation method based on bidirectional LSTM, CNN and CRF, which is an improvement and optimization of traditional Chinese word segmentation based on a deep learning algorithm. The specific steps of the method are as follows: preprocessing the initial corpus, extracting the character feature information of the corpus and the corresponding pinyin feature information of the character; using the convolutional neural network to obtain the pinyin feature information vector of the character; using the word2vec model to obtain the character feature information vector of the text; splicing the pinyin feature vector and the character feature vector to get the context information vector and putting it into the bidirectional LSTM neural network; using the linear chain conditional random field to decode the output of the bidirectional LSTM to get the word segmentation tag sequence; and decoding the word segmentation tag sequence to get the word segmentation results. The present invention uses a deep neural network to extract text character features and pinyin features and combines conditional random fields for decoding, which can effectively extract Chinese text features and achieve good results in Chinese word segmentation tasks.
Owner:NANJING UNIV OF POSTS & TELECOMM

Word segmentation method, computer-readable storage medium and system supporting a large number of lexicons

The present invention proposes a word segmentation method, computer-readable storage medium and system supporting a large number of lexicons. The method includes the following steps: constructing a domain dictionary; constructing an offline word segmentation model based on the domain dictionary; for the original text to be segmented, performing word segmentation with the offline word segmentation model to obtain the first word segmentation result; extracting the words to be searched from the original text to be segmented, then performing the first-level index search and the second-level index search in the domain dictionary based on the words to be searched, and finally screening the second-level index results to extract the candidate words; and reorganizing the candidate words and the first word segmentation result, constructing the directed graph of the original text based on the reorganization result, and calculating the optimal word segmentation result based on the shortest path method. The present invention combines word segmentation results in a single field with big word search results, constructs a directed graph based on the combined results, and converts the problem of solving the optimal word segmentation scheme into an optimal path problem that can be solved quickly, which is very suitable for segmenting big words.
Owner:启业云大数据(南京)有限公司

A Query Method of Indefinite Length Words and Sentences Based on Inverted Index

A query method for variable-length words and sentences in evaluation documents based on an inverted index, which involves an indexing method in the field of data science and a word segmentation method in the field of NLP, and solves the query problem of variable-length words and sentences in evaluation documents. The steps of the present invention are: 1. Carry out data preprocessing on the document to be queried, and use the jieba word segmentation method to carry out word segmentation processing to obtain word dictionary and word frequency information; 2. Establish an adaptive inverted table based on the inverted index principle of the complete reconstruction strategy; 3. Combining the information of the variable-length words and sentences to be searched, index the position information of each word in the words and sentences through the adaptive inverted table, identify the position information of the variable-length words and sentences and index the paragraphs where they are located, to complete the query function of variable-length words and sentences in evaluation documents. The basic idea of the present invention is to segment the text data into words, establish an inverted index, and then realize fast searching for words and sentences of indefinite length, so as to realize the query function of evaluation documents. It has a wide range of application scenarios, so it has high socio-economic value.
Owner:HARBIN INST OF TECH

Address data correction method and device

The invention discloses an address data correction method and device. The method comprises the steps: obtaining address data, wherein the address data comprises matching address data and N comparison address data, and N is a positive integer greater than or equal to 0; respectively matching the matching address data with the N comparison address data to obtain N address accurate values; screening out a target address accurate value from the N address accurate values and using the comparison address data corresponding to the target address accurate value as target address data, wherein the target address accurate value is the largest of the N address accurate values and is larger than a preset deviation correction value; extracting and displaying associated address coordinates of the target address data, and replacing preset address coordinates with the associated address coordinates. The address data can be automatically matched and compared, whether updating and correction are needed or not is determined, the workload of a user is reduced, the correction time is shortened, and the correction efficiency is improved.
Owner:广东精一信息技术有限公司
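
The screening rule in the abstract above, taking the largest address accurate value provided it exceeds a preset deviation correction value, can be sketched as follows. The abstract does not say how the accurate value is computed; the standard-library `difflib.SequenceMatcher` ratio is used here as a stand-in, and the default threshold of 0.8 is an arbitrary illustrative choice.

```python
from difflib import SequenceMatcher

def pick_target(address, candidates, threshold=0.8):
    """Return the comparison address with the highest similarity above `threshold`.

    Returns None when no candidate clears the threshold, meaning no
    correction should be applied.
    """
    best, best_score = None, threshold
    for candidate in candidates:
        # Stand-in for the patent's "address accurate value" (assumption).
        score = SequenceMatcher(None, address, candidate).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best
```

Returning `None` rather than the best-scoring candidate keeps a weak match from overwriting coordinates that may already be correct.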

Word segmentation method and word segmentation device

The invention relates to word segmentation technology and provides a word segmentation method and a word segmentation device aiming at the defects of large amount of computation and low identification accuracy of ambiguous words in the existing word segmentation methods. The word segmentation method comprises the steps of receiving the input word sequence and extracting at least a word segmentation package contained in the word sequence; and extracting and outputting the word string obtained in each extracted word segmentation package. The invention also provides the word segmentation device. In the technical scheme provided by the invention, the word sequence segmentation can be automatically finished in the input process; therefore, the technical scheme can greatly reduce the amount of computation of word segmentation operation and improve the word segmentation efficiency. In addition, the real segmentation intention of the user can be accurately reflected according to the segmentation on the word sequence conducted by the word segmentation packages, thus greatly improving the word segmentation accuracy of the word sequence.
Owner:卓望数码技术(深圳)有限公司

Coding method for clinical examination medical text

The invention provides a coding method for clinical laboratory medicine text, and relates to the field of clinical laboratory medicine. The method comprises the following steps: analyzing and processing a clinical examination medical text to obtain an internal content structure of the clinical examination medical text, and carrying out structured coding on each structure of the clinical examination medical text; and calculating the similarity with each clinical examination medical term in a clinical examination medical term library before carrying out structured coding. Therefore, repeated and similar clinical examination medical terms can be effectively reduced; when the clinical examination medical terms are stored, a source structure based on segmented words is adopted, the segmented words serve as basic units of coding, and then different segmented words in a segmented word library are coded in a combined mode, so that the corresponding clinical examination medical terms are formed; a great storage space can be saved; word segmentation is performed by combining a word segmentation dictionary and a machine learning word segmentation device, so that the workload of manual auditing is reduced, and the word segmentation efficiency is improved; and three mapping modes of full mapping, basic mapping and main segmented word mapping are added, so that the universality is better.
Owner:THE AFFILIATED HOSPITAL OF SOUTHWEST MEDICAL UNIV

A Chinese word segmentation method and device

Embodiments of the invention disclose a Chinese word segmentation method and apparatus. The method comprises the steps of dividing a text set into a plurality of short sentences and numbering the short sentences; for each Chinese character in the text set, obtaining a first short sentence number list corresponding to a current Chinese character, obtaining a second short sentence number list corresponding to an adjacent Chinese character adjacent to the right side of the current Chinese character, and calculating a degree of co-occurrence according to the first short sentence number list and the second short sentence number list; obtaining an adjacent character set corresponding to the current Chinese character, and calculating a relevant degree of adjacency according to the adjacent character set; determining whether a word consisting of the current Chinese character and the adjacent Chinese character is added into a candidate word set or not according to the degree of co-occurrence and the relevant degree of adjacency; and performing word segmentation on the text set according to the candidate word set. The method is small in calculation amount and high in accuracy when calculating the candidate word set, can effectively improve the accuracy of a word segmentation result and improve the efficiency of word segmentation, does not depend on a corpus dictionary, and can realize unsupervised candidate vocabulary extraction.
Owner:RUN TECH CO LTD BEIJING

A Personalized Parallel Word Segmentation Processing System and Processing Method

The invention relates to a personalized concurrent word segmentation processing system and a processing method for that system. The system comprises a word segmentation requesting module, a word segmentation module based on a personalized word segmentation dictionary, a word segmentation module based on a general word segmentation dictionary, a control module and a high speed word segmentation processing module. Word segmentation requests of a user are sent simultaneously to the word segmentation module based on the personalized word segmentation dictionary and the word segmentation module based on the general word segmentation dictionary. When the module based on the personalized dictionary gets a hit, the word segmentation result is sent back to the word segmentation requesting module through the control module, and the request to the module based on the general dictionary is interrupted; otherwise, the control module dynamically updates the personalized word segmentation dictionary according to an earliest-and-least-used principle and the word segmentation result of the module based on the general dictionary. The system and method satisfy the accuracy requirements of word segmentation while greatly improving the word segmentation efficiency of the system and meeting the efficient lookup requirements of mobile users.
Owner:XIAN UNIV OF POSTS & TELECOMM
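
The control flow in the abstract above, answering from the personalized dictionary on a hit and otherwise falling back to the general module and updating the personalized dictionary, can be sketched as a least-recently-used cache. This is a simplification under stated assumptions: the two modules are called one after the other rather than concurrently, whole request strings are cached instead of dictionary words, and `PersonalCache` is a hypothetical name.

```python
from collections import OrderedDict

class PersonalCache:
    """Least-recently-used store in front of a general segmenter."""

    def __init__(self, capacity, general_segmenter):
        self.capacity = capacity
        self.general = general_segmenter
        self.entries = OrderedDict()  # oldest entry first

    def segment(self, text):
        """Return (words, source), where source is 'personal' or 'general'."""
        if text in self.entries:
            self.entries.move_to_end(text)  # mark as most recently used
            return self.entries[text], "personal"
        words = self.general(text)
        self.entries[text] = words
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return words, "general"
```

Any callable that maps a string to a list of words can serve as `general_segmenter`, for example `str.split` for space-delimited text.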

Text word segmentation processing method and device, equipment and medium

The invention discloses a text word segmentation processing method and device, equipment and a medium. The method comprises the following steps: collecting a text to be subjected to word segmentation, wherein the text comprises a plurality of suspected words which are connected in series, and the suspected words are composed of pronunciation characters; sequentially traversing all characters in the text to be subjected to word segmentation, ignoring redundant characters formed by continuous repetition in the suspected words during the traversal, converting the characters into words in a dictionary tree diagram, and sequentially adding the words into a result list, wherein the dictionary tree diagram comprises a plurality of paths starting from a root node of the dictionary tree diagram and respectively reaching different tail end nodes, and the nodes through which each path passes store the characters of a single word in sequence; and outputting the words in the result list in sequence as word segmentation results. According to the invention, word segmentation processing is carried out according to the tree diagram, abnormal repeated characters can be processed in the word segmentation process, redundant characters in the text to be subjected to word segmentation are ignored, and the words contained in the text are extracted accurately.
Owner:GUANGZHOU HUADUO NETWORK TECH
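
The traversal in the abstract above, walking a dictionary tree while ignoring characters that are merely repeated, can be sketched as follows. The rule used here is an assumption: a character is treated as redundant only when the tree cannot continue with it and it equals the previous character, so legitimate double letters inside a word still match. `segment_noisy` is a hypothetical name.

```python
END = "$end"  # marker key for "a word ends here"; never collides with a single character

def build_trie(words):
    """Each path from the root stores the characters of one word in sequence."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def segment_noisy(trie, text):
    """Extract dictionary words from run-together text, skipping repeated characters."""
    words, node, current, prev = [], trie, "", None
    for ch in text:
        if ch in node:
            node = node[ch]          # the tree continues with this character
            current += ch
        elif ch == prev:
            continue                 # redundant repeat of the previous character
        else:
            if END in node:
                words.append(current)  # the path so far is a complete word
            # Restart from the root, using this character if a word can begin with it.
            node, current = (trie[ch], ch) if ch in trie else (trie, "")
        prev = ch
    if END in node:
        words.append(current)
    return words
```

With the words "hello" and "world" in the tree, the input "heeellooworld" yields both words, since the extra "e" and "o" characters are skipped while the double "l" in "hello" is matched by the tree itself.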

Word segmentation method and device, electronic equipment and storage medium

The invention discloses a word segmentation method and device, electronic equipment and a storage medium. The word segmentation method comprises the steps: inputting a word segmentation word stock into a pre-stored baseline word segmentation model, and determining a preliminary word segmentation result of the word segmentation word stock based on the baseline word segmentation model; inputting the preliminary word segmentation result into a pre-trained word segmentation model, and outputting a segmentation result of the preliminary word segmentation result based on the word segmentation model, the segmentation result comprising a segmentation unit, and the segmentation unit comprising a segmentation character and/or a segmentation character set; and combining the segmentation units according to a preset combination rule, and determining a final word segmentation result of the segmented word stock. According to the word segmentation method, the existing baseline word segmentation model is not changed, and the convergence rate of the word segmentation model is ensured, and the word segmentation efficiency is improved, and the word segmentation result of the baseline word segmentation model is corrected, so that the accuracy of the word segmentation result is improved.
Owner:CHINA MOBILE SUZHOU SOFTWARE TECH CO LTD +1