69 results about "Content word" patented technology

In linguistics, content words are words that name objects of reality and their qualities. They signify actual living things (dog, cat, etc.), family members (mother, father, sister, etc.), natural phenomena (snow, Sun, etc.), common actions (do, make, come, eat, etc.), characteristics (young, cold, dark, etc.), etc. They consist mostly of nouns, lexical verbs and adjectives, but certain adverbs can also be content words. They contrast with function words, which carry very little substantive meaning and primarily denote grammatical relationships between content words: prepositions (in, out, under, etc.), pronouns (I, you, he, who, etc.), conjunctions (and, but, till, as, etc.), etc.
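
As a rough illustration of the distinction, the content words of a part-of-speech-tagged sentence can be found by keeping only the open-class tags. This is a minimal sketch with three assumptions:

- The tags follow the Universal Dependencies convention.
- The tags come from some external tagger; the tagged input below is written by hand.
- Every adverb counts as a content word, although the definition above says only certain adverbs qualify.

```python
# Open-class Universal Dependencies tags treated as content words (assumption).
# AUX (auxiliaries such as "is", "will") is excluded, so only lexical verbs
# tagged VERB are kept.
CONTENT_POS = {"NOUN", "PROPN", "VERB", "ADJ", "ADV"}


def content_words(tagged):
    """Return the words of a (word, tag) sequence whose tag marks a content word."""
    return [word for word, tag in tagged if tag in CONTENT_POS]


tagged = [("the", "DET"), ("young", "ADJ"), ("dog", "NOUN"), ("ate", "VERB"),
          ("in", "ADP"), ("the", "DET"), ("snow", "NOUN")]
print(content_words(tagged))  # ['young', 'dog', 'ate', 'snow']
```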

Video searching system based on content analysis

This invention provides a content-based video search system comprising video, speech, caption and face analysis servers, a merging analysis server, a video search server cluster and a search dispatch server. The analysis servers extract video content information from the video, generate the corresponding phonetic (spelling) information by analyzing the audio signal, and build a video index by analyzing and describing the video content information. The merging analysis server merges the data obtained from video, speech and caption analysis to generate index information. The video search server cluster stores the video content and structure produced by the analysis system, answers queries for various kinds of information, and returns the results to the search dispatch server. The search dispatch server receives search requests from clients, parses them, forwards them to the search cluster and generates ordered query results.
Owner:北京新岸线网络技术有限公司

Method and system for inputting contact information

A method and a system for inputting contact information are provided. The method includes:

1. Acquiring the content attribute of the current edit box.
2. Starting the camera device and entering its shooting preview interface.
3. Positioning the text of the contact information to be input in the preview interface and capturing it.
4. Recognizing, through optical character recognition, the text located near the positioning identifier in the preview image, and extracting a contact information string that conforms to the content attribute of the current edit box.
5. Inputting the recognized string into the current edit box.

With this method and system, the user does not need to type the text word by word with a keyboard input method or a touch screen, which saves input time. Moreover, because the system knows during recognition what type of string the current edit box requires, the recognized string is highly accurate.
Owner:SHANGHAI HEHE INFORMATION TECH DEV
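
The key idea is to use the edit box's content attribute to decide which recognized string to extract. It can be sketched with regular expressions applied to OCR output. The attribute names and patterns here are hypothetical and deliberately simple. The abstract does not say how the matching is done, and the OCR step itself is not shown.

```python
import re

# Hypothetical patterns keyed by the content attribute of the current edit box.
FIELD_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d[\d\s-]{6,}\d"),
}


def extract_for_field(ocr_text, attribute):
    """Return the first substring of the OCR text that fits the field's attribute, or None."""
    match = FIELD_PATTERNS[attribute].search(ocr_text)
    return match.group(0) if match else None


ocr_text = "John Smith  Tel: +1 555-010-7788  Mail: john.smith@example.com"
print(extract_for_field(ocr_text, "phone"))  # +1 555-010-7788
print(extract_for_field(ocr_text, "email"))  # john.smith@example.com
```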

Document summarizer for word processors

An author-oriented document summarizer for a word processor is described. The document summarizer performs a statistical analysis to generate a list of ranked sentences for consideration in the summary. The summarizer counts how frequently content words appear in a document and produces a table correlating the content words with their corresponding frequency counts. Phrase compression techniques are used to produce more accurate counts of repeatedly used phrases. A sentence score for each sentence is derived by summing the frequency counts of the content words in a sentence and dividing that tally by the number of the content words in the sentence. The sentences are then ranked in order of their sentence scores. Concurrent with the statistical analysis, during the same pass through the document the summarizer performs a cue-phrase analysis to weed out sentences with words or phrases that have been pre-identified as potential problem phrases. The cue-phrase analysis compares sentence phrases with a pre-compiled list of words and phrases and sets conditions on whether the sentences containing them can be used in the summary. Following the cue-phrase analysis, the summarizer creates a summary containing the higher ranked sentences. The summary may also include a conditioned sentence if the conditions established for inclusion of the sentence have been satisfied. The summarizer then inserts the summary at the beginning of the document before the start of the text.
Owner:MICROSOFT TECH LICENSING LLC
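
The scoring rule in this abstract can be expressed directly: sum the frequency counts of a sentence's content words and divide by the number of content words. This is a minimal sketch, assuming a small hand-written stop list stands in for content-word detection. The phrase compression and cue-phrase steps are omitted.

```python
import re
from collections import Counter

# Tiny illustrative stop list standing in for real content-word detection (assumption).
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "it"}


def content_tokens(sentence):
    """Lower-case word tokens of a sentence, minus stop words."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def rank_sentences(sentences):
    """Rank sentences by the mean document frequency of their content words."""
    freq = Counter(w for s in sentences for w in content_tokens(s))

    def score(sentence):
        words = content_tokens(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    return sorted(sentences, key=score, reverse=True)


doc = ["The cat sat.", "The cat ate the fish.", "Dogs bark."]
print(rank_sentences(doc))  # ['The cat sat.', 'The cat ate the fish.', 'Dogs bark.']
```

Here "cat" occurs twice, so "The cat sat." scores (2 + 1) / 2 = 1.5. That beats "The cat ate the fish." at (2 + 1 + 1) / 3 ≈ 1.33, and "Dogs bark." scores 1.0.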

Machine translation using learned word associations without referring to a multi-lingual human authored dictionary of content words

A method and computer-readable medium are provided that perform a series of steps associated with machine translation. These steps include using a first text in a first language and a second text in a second language, to produce an association list where words in the first language are associated with words in the second language. A first syntactic structure for a sentence from the first text is aligned with a second syntactic structure for a sentence in the second text based on the association list without referring to a bilingual dictionary of content words. The association list is also used during translations. Specifically, a word in the first language is translated into a word in the second language based on an entry in the association list without referring to a bilingual dictionary that contains content words. Thus, training and translation are performed without the need for a bilingual dictionary of content words.
Owner:MICROSOFT TECH LICENSING LLC
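
The abstract does not say how the association list is learned. The sketch below illustrates one way to learn word associations from sentence-aligned text without a dictionary. It scores co-occurring word pairs with the Dice coefficient, a standard co-occurrence measure swapped in here. It is not the patented method and ignores the syntactic-structure alignment step. The tiny parallel corpus is made up for illustration.

```python
from collections import Counter
from itertools import product


def dice_associations(parallel_pairs):
    """Score each (source word, target word) pair by the Dice coefficient of co-occurrence."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src_sentence, tgt_sentence in parallel_pairs:
        src_words, tgt_words = set(src_sentence.split()), set(tgt_sentence.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        joint.update(product(src_words, tgt_words))
    return {(s, t): 2 * c / (src_count[s] + tgt_count[t]) for (s, t), c in joint.items()}


def best_translation(associations, word):
    """Return the target word most strongly associated with a source word."""
    candidates = {t: score for (s, t), score in associations.items() if s == word}
    return max(candidates, key=candidates.get)


pairs = [("the house", "la maison"), ("the car", "la voiture"), ("a house", "une maison")]
assoc = dice_associations(pairs)
print(best_translation(assoc, "house"), best_translation(assoc, "car"))  # maison voiture
```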

Automated system and method for generating reasons that a court case is cited

A computer-automated system and method identify text in a first “citing” court case, near a “citing instance” (in which a second “cited” court case is cited), that indicates the reason(s) for citing (RFC). The automated method of designating text, taken from a set of citing documents, as reasons for citing (RFC) that are associated with respective citing instances of a cited document, has steps including: obtaining contexts of the citing instances in the respective citing documents (each context including text that includes the citing instance and text that is near the citing instance), analyzing the content of the contexts, and selecting (from the citing instances' context) text that constitutes the RFC, based on the analyzed content of the contexts. A related computer-automated system and method selects content words that are highly related to the reasons a particular document is cited, and gives them weights that indicate their relative relevance. Another related computer-automated system and method forms lists of morphological forms of words. Still another related computer-automated system and method scores sentences to show their relevance to the reasons a document is cited. Also, another related computer-automated system and method generates lists of content words. In a preferred embodiment, the systems and methods are applied to legal (especially case law) documents and legal (especially case law) citations.
Owner:RELX INC

Automatic generation of statistical language models for interactive voice response applications

A Statistical Language Model (SLM) can be used in an ASR for Interactive Voice Response (IVR) systems in general, and Natural Language Speech Applications (NLSAs) in particular. It is created by first manually producing a brief text description of each task that can be performed in an NLSA. In one embodiment, these brief descriptions are then analyzed to generate pre-filler patterns based on spontaneous speech utterances, together with a skeletal set of content words. The pre-filler patterns are in turn used with Part-of-Speech (POS) tagged conversations from a spontaneous speech corpus to generate a set of pre-filler phrases. The skeletal set of content words is used with an electronic lexico-semantic database and a thesaurus-based content word extraction process to generate a more extensive list of content words. The pre-filler phrases and content words thus generated are combined into utterances by a process based on a lexico-semantic resource. In one embodiment, a lexico-semantic statistical validation process corrects the automatically generated utterances and/or adds them to the database of expected utterances. The system requires a minimum amount of human intervention and no prior knowledge regarding the expected user utterances in response to a particular prompt. The WWW is used to validate the word models.
Owner:LYMBA CORP

Method for applying phrase index technology into internet search engine

The invention applies phrase index technology to an Internet search engine.

- Indexing: sentences in web page documents are decomposed into words and expressions. Key words are taken as head words, and other words before and after them are added to form a set of index phrases. Index documents of the web content are thus generated with phrases as the unit.
- Querying: the content words in the user's query are extracted by a word segmentation procedure. They are then combined in all reasonable ways to form a set of search phrases.
- Matching: the search phrases are matched exactly, in turn, against the phrases in the index documents to obtain the search results.

Because phrases express semantics better than single words, the search results can reflect the likely intent of the query more precisely.
Owner:新百丽鞋业(深圳)有限公司 +1
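
A toy version of phrase-unit indexing with exact phrase matching is sketched below under these assumptions:

- Index phrases are simply pairs of adjacent words.
- Query phrases are order-preserving pairs of the query's content words.
- The input arrives already tokenized.

The patent's head-word expansion and word segmentation procedure are not reproduced.

```python
from collections import defaultdict
from itertools import combinations


def build_phrase_index(docs):
    """Map every pair of adjacent words to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, words in docs.items():
        for pair in zip(words, words[1:]):
            index[pair].add(doc_id)
    return index


def search_phrases(index, query_words, stop_words=frozenset()):
    """Rank documents by how many query phrases they match exactly."""
    content = [w for w in query_words if w not in stop_words]
    hits = defaultdict(int)
    for phrase in combinations(content, 2):  # order-preserving word pairs
        for doc_id in index.get(phrase, ()):
            hits[doc_id] += 1
    return sorted(hits, key=lambda d: (-hits[d], d))


docs = {1: ["cheap", "flight", "tickets", "online"],
        2: ["flight", "delay", "compensation"],
        3: ["cheap", "hotel", "booking"]}
index = build_phrase_index(docs)
print(search_phrases(index, ["cheap", "flight", "tickets"]))  # [1]
print(search_phrases(index, ["the", "flight", "delay"], {"the"}))  # [2]
```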

Knowledge network-based text indexing system and method

The invention discloses a knowledge network-based text indexing system and method. The text indexing system comprises a single-text feature extraction unit, a multi-text word relation extraction unit, a knowledge tree generation unit, a knowledge tree application unit and a knowledge base storage unit. The text indexing method comprises the following steps:

1. Segmenting the words of a text input to the system and acquiring the text's feature words.
2. Deducing a class word TAG for the text from the node positions of those feature words in the knowledge tree.
3. Judging the validity of the TAG with a discriminative model, extracting a reliable TAG word set, and using it to refine the text feature word set into a reliable text feature word set.

Because content word extraction, class labeling and phrase extraction are integrated, the extraction steps reinforce one another. Because word semantics are expressed through the nodes of the knowledge network, ambiguity is reduced.
Owner:HYLANDA INFORMATION TECH

Method and device thereof for improving search efficiency of search engine

The invention discloses a method and a device for improving the search efficiency of a search engine. The device comprises a search result preprocessing module, a webpage URL analysis module, a webpage crawler module, a webpage structure analysis module, a webpage content analysis module, a classified search result bank and a classified display module. The method comprises the following steps:

1. Obtaining webpage URLs and matched keywords by preprocessing the results returned by the search engine.
2. Analyzing each URL and saving the webpage into the classified search result bank. This step filters out webpages that are website home pages and downloads the remaining webpages. It judges from a character-to-link ratio whether those webpages are list-type pages. For webpages that are neither home pages nor list pages, it extracts the content, counts the words in it and judges whether it contains the keywords.
3. Displaying the analysis results saved in the classified search result bank by category.

The invention can greatly improve users' search efficiency and reduce their workload.
Owner:XIAMEN MEIYA PICO INFORMATION
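
One plausible reading of the list-page test is the share of a page's visible text that sits inside links. Navigation and list pages are mostly anchor text, while article pages mostly are not. The sketch below computes that share with Python's standard `html.parser` module. Both the exact ratio the patent uses and the 0.5 threshold are assumptions.

```python
from html.parser import HTMLParser


class LinkTextCounter(HTMLParser):
    """Counts visible text characters, and how many of them sit inside <a> elements."""

    def __init__(self):
        super().__init__()
        self.link_depth = 0
        self.total_chars = 0
        self.link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth:
            self.link_depth -= 1

    def handle_data(self, data):
        # Text inside <script>/<style> is counted too; a real filter would skip it.
        n = len(data.strip())
        self.total_chars += n
        if self.link_depth:
            self.link_chars += n


def link_text_ratio(html):
    parser = LinkTextCounter()
    parser.feed(html)
    parser.close()
    return parser.link_chars / parser.total_chars if parser.total_chars else 0.0


def is_list_page(html, threshold=0.5):
    """Hypothetical rule: a page whose text is mostly anchor text is treated as a list page."""
    return link_text_ratio(html) >= threshold


list_page = '<ul><li><a href="/1">First story</a></li><li><a href="/2">Second story</a></li></ul>'
article = '<p>This is a long article body with many words.</p><p>See <a href="/x">more</a></p>'
print(is_list_page(list_page), is_list_page(article))  # True False
```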

Statistical machine translation apparatus and method

A statistical machine translation apparatus and method reflecting linguistic information are provided. The translation model is generated from statistical information on source language sentences and target language sentences. During that process, the word alignment results are amended based on a bilingual dictionary, and the amended alignments are used to generate the model. Further, the source and target language sentences (i.e., the bilingual corpus) are not used as-is as material for the translation model. Instead, the apparatus determines whether the morphemes in the source and target language sentences are meaningful content words. Based on that determination, the source and target language sentences are preprocessed.
Owner:SAMSUNG ELECTRONICS CO LTD

Web Bookmark Manager

A web bookmark manager processes a collection of web bookmarks to produce a richly structured presentation of the bookmark collection. The bookmark collection includes representations of resources, topics, and notice events. A notice event includes a reference to a web resource and a natural language description provided by a user. The notice description is processed by a classifier to determine topics to which the referenced web resource shall be associated. The processing of the notice description includes parsing to obtain sequences of content words, to which topics are associated. Generalizations of a topic are determined by subsequences of the associated word sequence. The presentation of a collection of bookmarks includes a chronology of notices, a ranking of topics, a taxonomy of topics, and an index of content words from topics. The presentation further includes per-topic and per-resource presentations.
Owner:ROJER ALAN S
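
The abstract says a topic's generalizations are subsequences of its word sequence. That can be made concrete as follows. This is a minimal sketch, assuming a topic is a tuple of content words and that any proper, non-empty, order-preserving subsequence counts as a generalization. The parsing and classifier steps are not shown.

```python
from itertools import combinations


def generalizations(topic):
    """All proper, non-empty, order-preserving subsequences of a topic's word tuple."""
    return {sub for k in range(1, len(topic)) for sub in combinations(topic, k)}


def generalizes(general, specific):
    """True if `general` is a proper subsequence of `specific`."""
    return general in generalizations(specific)


topic = ("python", "web", "framework")
print(sorted(generalizations(topic)))
print(generalizes(("python", "framework"), topic))  # True
```

Because subsequences keep word order, ("web", "python") is not treated as a generalization of this topic. A taxonomy of topics could be built by linking each topic to the generalizations that also occur as topics in the collection.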

Illegal online commodity detection method

The invention relates to an illegal online commodity detection method comprising the following steps:

1. Crawling, with a web crawler, the pages on which the commodities to be checked are listed.
2. Analyzing the DOM (Document Object Model) tree of the e-commerce website hosting those commodities. The shallowest node containing several structurally similar information blocks is taken as the critical node. The associated information points to extract are identified and a template is built. Commodity attribute data are then extracted from the crawled pages.
3. Building a semantic dictionary and segmenting the extracted commodity attribute information into words with a character-matching method.
4. Manually building an illegal semantic library. A function Illegal List recognizes words of the illegal semantic library among the content-word fields of the segmented commodity attribute information. The illegal category of the commodity is determined from the function's return value.

The method is computationally simple and timely, and it suits frequently changing page layouts.
Owner:ZHEJIANG PANSHI INFORMATION TECH
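
Step 3's character-matching segmentation is commonly implemented as forward maximum matching against a dictionary. Step 4 then amounts to looking up the resulting words in the illegal lexicon. The sketch below assumes that approach. The dictionary and lexicon are illustrative. The `flag_illegal` helper is a hypothetical stand-in for the patent's Illegal List function. ASCII text without spaces stands in for Chinese.

```python
def forward_max_match(text, dictionary):
    """Greedy left-to-right segmentation: always take the longest dictionary word."""
    max_len = max(map(len, dictionary), default=1)
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:  # unknown single characters pass through
                words.append(piece)
                i += size
                break
    return words


def flag_illegal(words, illegal_lexicon):
    """Hypothetical stand-in for the Illegal List check: return words found in the lexicon."""
    return [w for w in words if w in illegal_lexicon]


dictionary = {"replica", "watch", "sale"}
words = forward_max_match("replicawatchsale", dictionary)
print(words)  # ['replica', 'watch', 'sale']
print(flag_illegal(words, {"replica"}))  # ['replica']
```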

English sentence simplification algorithm based on pre-trained Transformer language model

The invention discloses an English sentence simplification algorithm based on a pre-trained Transformer language model. The algorithm comprises the following steps:

1. Computing word frequency statistics from the public Wikipedia corpus.
2. Obtaining vector representations of words with a public pre-trained word embedding model.
3. Preprocessing the sentence to be simplified to obtain its content words.
4. For each content word in the sentence, obtaining a set of candidate substitutes with the public pre-trained Transformer language model BERT.
5. Ranking each content word's candidate set using several features.
6. Comparing the word frequency of the top-ranked candidate with that of the original content word to determine the final substitute.
7. Processing the remaining content words of the sentence in turn according to steps 4 to 6 to obtain the final simplified sentence.

The method makes full use of the pre-trained Transformer language model without any labeled parallel corpus, and thereby improves the accuracy of English sentence simplification.
Owner:YANGZHOU UNIV
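
Steps 6 and 7 reduce to a simple decision rule once candidates have been generated and ranked. The sketch below assumes the ranked candidate lists, which BERT and the multi-feature ranking of steps 4 and 5 would produce, are given as input. The frequency numbers are made up for illustration.

```python
def choose_substitute(word, ranked_candidates, freq):
    """Step 6: keep the top-ranked candidate only if it is more frequent than the original."""
    if not ranked_candidates:
        return word
    best = ranked_candidates[0]
    return best if freq.get(best, 0) > freq.get(word, 0) else word


def simplify(tokens, content_words, candidates, freq):
    """Step 7: apply the substitution rule to each content word in turn."""
    return [choose_substitute(t, candidates.get(t, []), freq) if t in content_words else t
            for t in tokens]


tokens = ["he", "purchased", "a", "vehicle"]
candidates = {"purchased": ["bought", "acquired"], "vehicle": ["car"]}
freq = {"purchased": 50, "bought": 900, "acquired": 120, "vehicle": 300, "car": 2000}
print(simplify(tokens, {"purchased", "vehicle"}, candidates, freq))
# ['he', 'bought', 'a', 'car']
```

The frequency comparison acts as a safeguard: a candidate that is rarer than the original word is assumed not to be simpler, so the original is kept.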

Theme word extraction method, and method and device for obtaining related digital resource by using same

The invention provides a theme word extraction method, and a method and a device for obtaining related digital resources by using it. The theme word extraction method comprises the following steps:

1. Performing word segmentation on the text of a digital resource and obtaining content words from the segmentation result.
2. For each theme, obtaining a probability distribution over the content words, which comprises the content words and their corresponding weights.
3. Obtaining every meaning of the content words, merging content words that share a meaning, and combining their weights.
4. Determining the theme words from the merged content words and their weights.

The scheme works at the level of word meaning and merges words that share a meaning. This prevents polysemous words and synonyms from interfering with theme word extraction as they do in the prior art, and improves extraction accuracy. The method also removes the prior art's dependence on feature word selection and named entity recognition. It enables user-oriented, customized organization and generation of special-topic collections.
Owner:NEW FOUNDER HLDG DEV LLC +2
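
The merging step can be sketched as grouping content words by a sense identifier and summing their weights. This minimal sketch has three simplifications:

- The word-to-sense mapping is hypothetical; a real system would take senses from a lexical resource.
- Each word is assumed to have a single sense, whereas the abstract considers every meaning of a word.
- The highest-weighted word in each group is chosen as that group's representative.

```python
from collections import defaultdict


def merge_by_sense(weights, sense_of):
    """Sum the weights of words sharing a sense, keyed by each group's heaviest word."""
    groups = defaultdict(dict)
    for word, weight in weights.items():
        groups[sense_of.get(word, word)][word] = weight  # unmapped words form their own group
    merged = {}
    for members in groups.values():
        head = max(members, key=members.get)
        merged[head] = sum(members.values())
    return merged


def theme_words(weights, sense_of, k):
    """Return the k highest-weighted words after sense merging."""
    merged = merge_by_sense(weights, sense_of)
    return sorted(merged, key=lambda w: (-merged[w], w))[:k]


weights = {"car": 0.30, "automobile": 0.25, "engine": 0.40, "road": 0.05}
sense_of = {"car": "VEHICLE", "automobile": "VEHICLE"}
print(theme_words(weights, sense_of, 2))  # ['car', 'engine']
```

Without merging, "engine" (0.40) would outrank both synonyms. Merged, "car" and "automobile" together carry 0.55 and become the top theme word.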

Electronic homework copying detection method

The invention relates to an electronic homework copying detection method comprising the following steps:

1. Adapting to the document type and extracting the content of the homework to be processed.
2. Performing Chinese word segmentation and part-of-speech tagging on the resulting plain text file, and calculating the similarity between homework submissions from the frequencies of high-frequency words and the semantic similarity of content words.
3. Fusing the word-frequency similarity with the content-word semantic similarity, and judging against a threshold whether two documents are copies.

The method is characterized by performing copy detection with the frequencies of the high-frequency words 'De', 'Yi', 'Shi', 'Le' and 'Wo' together with the semantic similarity of content words. It solves the problem of copy detection on batches of homework in an e-learning system. It also detects copying that students disguise through synonym substitution, sentence restructuring and similar tactics. The method can be used for copy detection on homework of various document types in a computer-aided teaching system.
Owner:NORTH CHINA UNIVERSITY OF TECHNOLOGY
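
The fusion step can be sketched as a weighted average of two similarities compared against a threshold:

- Style similarity is the cosine similarity of the five function-word frequency profiles.
- Content similarity uses Jaccard overlap of the content-word sets, a crude stand-in swapped in for the patent's semantic similarity.

The function words appear romanized, matching the abstract. The weight `alpha`, the 0.8 threshold and the sample tokens are hypothetical.

```python
import math
from collections import Counter

# The five high-frequency words named in the abstract, romanized here;
# in practice these would be the segmented Chinese tokens.
FUNCTION_WORDS = ("de", "yi", "shi", "le", "wo")


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def function_word_profile(tokens):
    counts = Counter(tokens)
    return [counts[w] for w in FUNCTION_WORDS]


def jaccard(a, b):
    """Set overlap, used here as a crude stand-in for content-word semantic similarity."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def is_copy(tokens_a, tokens_b, content_a, content_b, alpha=0.5, threshold=0.8):
    """Fuse function-word and content similarity; alpha and threshold are hypothetical."""
    style = cosine(function_word_profile(tokens_a), function_word_profile(tokens_b))
    content = jaccard(content_a, content_b)
    return alpha * style + (1 - alpha) * content >= threshold


doc_a = ["wo", "de", "zuoye", "shi", "wancheng", "le"]
doc_b = ["tianqi", "hen", "hao"]
print(is_copy(doc_a, list(doc_a), ["zuoye", "wancheng"], ["zuoye", "wancheng"]))  # True
print(is_copy(doc_a, doc_b, ["zuoye", "wancheng"], ["tianqi", "hao"]))  # False
```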

Data summarization method and apparatus

A method of generating a caption abstract, including:

- generating a target text from a predetermined caption, analyzing the morphemes of the words in the target text, and analyzing the grammatical structure of the target text by referring to those morphemes;
- extracting and removing low-content words from the target text by using the morpheme or grammatical-structure information, and determining a main predicate;
- extracting the major sentence components of the main predicate, by referring to the grammatical-structure information, as candidate abstract words;
- substituting a related word for any complex noun phrase or predicate phrase among the candidate abstract words by referring to a predetermined database; and
- generating an abstract by rearranging the candidate abstract words according to a predetermined rule.
Owner:SAMSUNG ELECTRONICS CO LTD

Method and device for displaying words

Publication: CN101963954A (inactive)
The embodiment of the invention provides a method for displaying words, which comprises:

1. Acquiring text content.
2. Converting the alphabetic characters of the text content into their corresponding character display internal codes, according to a pre-built character display internal code library.
3. Rearranging the converted internal codes into display order.
4. For the rearranged internal codes, looking up the associated glyph lattice (dot-matrix) data in the internal code library, then outputting and displaying that data.

The embodiment of the invention also discloses a word display device. The method and device make it simple and convenient to display the words of text content.
Owner:KONKA GROUP

Original text and translated text alignment method and apparatus

The invention discloses a method for aligning an original text with its translated text. The method comprises:

1. Segmenting every source sentence into words and removing stop words to obtain its content words.
2. Obtaining all candidate translations of those content words.
3. Matching the candidate translations of each source sentence's content words against every target sentence, to obtain the similarity between the source sentence's content words and each target sentence.
4. Using these content-word similarities to compute the similarity between each source sentence and each target sentence.
5. Aligning each source sentence with the target sentence most similar to it.

The invention also discloses a corresponding alignment apparatus. The method and apparatus solve the problem of aligning sentences between an original text and its translation.
Owner:IOL WUHAN INFORMATION TECH CO LTD
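
The matching steps can be sketched with a small dictionary of candidate translations. The similarity measure here is an assumption: the fraction of a source sentence's content words that have at least one candidate translation in the target sentence. The lexicon, stop list and tokenized sentences are hypothetical.

```python
def sentence_similarity(src_content, tgt_tokens, lexicon):
    """Fraction of source content words with at least one translation in the target sentence."""
    if not src_content:
        return 0.0
    tgt = set(tgt_tokens)
    matched = sum(1 for w in src_content if tgt & set(lexicon.get(w, ())))
    return matched / len(src_content)


def align(src_sentences, tgt_sentences, lexicon, stop_words):
    """Pair each source sentence with its most similar target sentence."""
    pairs = []
    for i, src in enumerate(src_sentences):
        content = [w for w in src if w not in stop_words]
        scores = [sentence_similarity(content, tgt, lexicon) for tgt in tgt_sentences]
        j = max(range(len(tgt_sentences)), key=scores.__getitem__)
        pairs.append((i, j, scores[j]))
    return pairs


src = [["the", "cat", "sleeps"], ["the", "dog", "barks"]]
tgt = [["le", "chien", "aboie"], ["le", "chat", "dort"]]
lexicon = {"cat": ["chat"], "sleeps": ["dort"], "dog": ["chien"], "barks": ["aboie"]}
print(align(src, tgt, lexicon, {"the"}))  # [(0, 1, 1.0), (1, 0, 1.0)]
```

This greedy version lets two source sentences pick the same target sentence. Enforcing one-to-one or order-preserving alignment would need an extra step.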

A method and system for copying and pasting the formatted content of a Word file

The invention discloses a method and system for copying and pasting Word file content together with its formatting, belonging to the technical field of printing and typesetting. The method works as follows:

1. Create a temporary Word file in docx format and paste the content of the Word file to be copied into it.
2. Obtain the XML source file of the content to be copied from the temporary Word file.
3. Convert the XML source file into an XML target file that the target software can recognize.
4. Import the XML target file data into the target software.

The invention can copy all the content of a Word file directly into professional typesetting software. Text formatting, graphics, images, tables and other objects need not be re-arranged there. This greatly simplifies copying Word file content into professional typesetting software and improves typesetting efficiency.
Owner:PEKING UNIV FOUNDER GRP CO LTD +1

Method and system for converting PowerPoint file into Word file

The invention relates to the field of computer file processing techniques, and particularly relates to a method and a system for converting a PowerPoint file into a Word file. The method comprises the following steps:

1. Using the file name of the PowerPoint file as the primary title of the Word file.
2. For the first shape of each page: if its text is the same as the text of the first shape on the previous page, omitting that text; otherwise, using the text in the first shape as a secondary heading of the Word file.
3. For each page in the PPT (PowerPoint) file, first reading the shape of each area of the current page.
4. Judging the attribute type of each area, to distinguish whether it contains text, a table, a picture, an embedded object and the like.
5. Converting each area according to its content type.

The invention also provides a system for converting a PowerPoint file into a Word file. The system comprises a content reading module, a content recognition module, a content classification and processing module, and a module that writes the classified and processed content into the Word file.
Owner:TIANJIN CHENGJIAN UNIV +2

A word sense disambiguation method and system based on graph model

The invention discloses a word sense disambiguation method and system based on a graph model, belonging to the field of natural language processing technology. The technical problem it addresses is how to combine multiple Chinese and English resources so that their advantages complement one another. The aim is to fully exploit the disambiguation knowledge in those resources and improve word sense disambiguation performance. The technical scheme adopted is as follows:

1. A word sense disambiguation method based on a graph model, comprising the following steps:
   - S1, extracting contextual knowledge: performing part-of-speech tagging on the ambiguous sentence and extracting substantive words as contextual knowledge, where substantive words are nouns, verbs, adjectives and adverbs;
   - S2, similarity calculation: computing English-based similarity, word-vector-based similarity and HowNet-based similarity;
   - S3, constructing a disambiguation graph;
   - S4, selecting the correct word sense.
2. A word sense disambiguation system based on a graph model, comprising a context knowledge extraction unit, a similarity calculation unit, a disambiguation graph construction unit and a correct word sense selection unit.
Owner:ZAOZHUANG UNIV

Method for extracting key words of single text

The invention discloses a method for extracting the key words of a single text, comprising the following steps:

1. Opening the single text in the domain collection.
2. Preprocessing the text content.
3. Extracting the meaningful notional words.
4. Counting the term frequency of each notional word.
5. Opening all texts in the domain collection.
6. Counting the frequency of each notional word across the domain collection.
7. Counting the number of pages a search engine returns when queried for each notional word.
8. Calculating the weights of all notional words in the single text with an improved TF-IDF term-weighting formula, and extracting a certain percentage of them as key words.

The method compensates for the shortcomings of the TF-IDF algorithm and prevents irrelevant domain collections from affecting key word extraction. It thereby improves extraction precision and preserves the domain characteristics of the extracted key words.
Owner:SHANGHAI UNIV
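
Steps 4, 6 and 8 rest on TF-IDF weighting. The sketch below uses plain smoothed TF-IDF and keeps the top fraction of words. It does not reproduce the patent's improved formula, which also draws on search engine page counts from step 7. The tokenized collection is made up for illustration.

```python
import math
from collections import Counter


def tfidf_keywords(doc, collection, top_fraction=0.2):
    """Return the top fraction of a document's words ranked by smoothed TF-IDF."""
    tf = Counter(doc)
    n_docs = len(collection)

    def idf(word):
        df = sum(1 for d in collection if word in d)
        return math.log((1 + n_docs) / (1 + df)) + 1  # smoothed inverse document frequency

    scores = {w: (c / len(doc)) * idf(w) for w, c in tf.items()}
    k = max(1, round(len(scores) * top_fraction))
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


doc = ["sense", "disambiguation", "graph", "sense"]
collection = [doc, ["graph", "database"], ["graph", "model"]]
print(tfidf_keywords(doc, collection, top_fraction=0.67))  # ['sense', 'disambiguation']
```

"graph" appears in every document of the collection, so its IDF falls to the minimum. That pushes it below the rarer words even though its term frequency ties with "disambiguation".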

Context similarity calculation-based word sense disambiguation method

The invention relates to a context similarity calculation-based word sense disambiguation method. The method comprises the following steps:

1. Processing the training corpora, using a part-of-speech-tagged version of ukWaC to train the model.
2. Filtering by part of speech, keeping only notional words (nouns, adjectives, adverbs and verbs).
3. Training a bidirectional LSTM model on the part-of-speech-filtered corpora.
4. Feeding example sentences of the words to be disambiguated into the bidirectional LSTM model to obtain context vectors.
5. Feeding the contexts of the words to be disambiguated into the bidirectional LSTM model to obtain their context vectors.
6. Computing the cosine similarity between the context vectors of the words to be disambiguated and those of the example sentences.
7. Selecting the sense of each word to be disambiguated from the similarity results with a k-nearest-neighbor method.

The method models word senses better. Words and parts of speech are combined directly by appending the part of speech to the word with an underscore. The resulting word vectors distinguish well between different parts of speech of the same word. In experiments, disambiguation accuracy improves by 0.5% over the baselines.
Owner:SHENYANG AEROSPACE UNIVERSITY
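
The last step, cosine similarity followed by a k-nearest-neighbor vote, can be sketched independently of the network. This sketch assumes the context vectors are already available. In the method they would come from the bidirectional LSTM. Here they are tiny hand-made 2-D vectors with hypothetical sense labels.

```python
import math
from collections import Counter


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_sense(context_vec, examples, k=3):
    """Pick the majority sense among the k example vectors most similar to the context."""
    ranked = sorted(examples, key=lambda ex: cosine(context_vec, ex[0]), reverse=True)
    votes = Counter(sense for _, sense in ranked[:k])
    return votes.most_common(1)[0][0]


examples = [([1.0, 0.1], "bank/finance"), ([0.9, 0.2], "bank/finance"),
            ([0.1, 1.0], "bank/river"), ([0.2, 0.8], "bank/river")]
print(knn_sense([0.95, 0.15], examples))  # bank/finance
print(knn_sense([0.1, 0.9], examples))  # bank/river
```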

Method of building a sorting model, and application method and apparatus based on the model

The present disclosure provides a method of building a sorting model, and an application method and apparatus based on the model. The method of building a sorting model comprises the following steps:

1. Obtaining, from a search log, a query that includes a relationship triple and the clicked title of a search result for that query. The relationship triple includes a content word pair and a relationship word for that pair.
2. Obtaining training data from the query, the clicked title corresponding to the query, and the number of clicks on that title.
3. Using the training data to train a neural network-based sorting model. The model sorts sentences according to how well they describe the relationship of the content word pair.

The sorting model may be used for the following applications:

- sorting the search results of a query that includes a relationship triple;
- determining the sentence that describes the relationship of the content word pair;
- when the search engine displays a related content word for a query that includes a content word, also displaying the sentence that describes the relationship between the related content word and the searched content word.
Owner:BAIDU ONLINE NETWORK TECH (BEIJIBG) CO LTD
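
The abstract does not say how click counts become training data. One common choice for ranking models, assumed here, is pairwise preferences: within a query, a title with more clicks is preferred over a title with fewer. The log rows and the query "Alice spouse Bob" are hypothetical. The query stands for a relationship triple with content word pair (Alice, Bob) and relationship word "spouse". The neural sorting model itself is not shown.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_training_data(click_log):
    """Turn (query, clicked title, click count) rows into (query, preferred, other) pairs."""
    by_query = defaultdict(list)
    for query, title, clicks in click_log:
        by_query[query].append((title, clicks))
    pairs = []
    for query, items in by_query.items():
        for (t1, c1), (t2, c2) in combinations(items, 2):
            if c1 != c2:  # equal click counts give no preference signal
                better, worse = (t1, t2) if c1 > c2 else (t2, t1)
                pairs.append((query, better, worse))
    return pairs


log = [("Alice spouse Bob", "Alice married Bob in 2010", 50),
       ("Alice spouse Bob", "Alice and Bob attend a gala", 5),
       ("Alice spouse Bob", "Photos of Alice", 5)]
for row in pairwise_training_data(log):
    print(row)
```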

Smart contract classification method based on keyword feature extraction and attention

The invention provides a smart contract classification method based on keyword feature extraction and attention. The method processes smart contract code with a long short-term memory (LSTM) network, extracts features of the corresponding keywords, and combines them with an attention mechanism to classify the smart contracts. The pipeline is:

1. Training the smart contract into content word vectors with the word embedding model Word2Vec, and converting the keywords into serialized vectors with the text vectorization tool Tokenizer.
2. Feeding the serialized vectors into an LSTM network and concatenating the final hidden state vector with each word vector of the smart contract.
3. Passing the concatenated vectors through one convolution layer and one pooling layer, feeding the result into an LSTM network, and multiplying the final hidden state vector by a vector generated through attention.
4. Feeding the resulting sentence representation into an LSTM network and finally classifying the smart contracts with a softmax classifier.

The model is evaluated on a dataset from the Ethereum website combined with decentralized applications (DApps), and the experimental results demonstrate its effectiveness. The training accuracy reaches 89.1%.
Owner:SHANDONG UNIV OF SCI & TECH
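The attention and softmax stages of this pipeline can be illustrated with a small pure-Python sketch. It assumes the LSTM hidden states are already available (the example values are made up) and scores them by dot product against a hypothetical context vector; the abstract does not say how attention scores are computed, so that choice is an assumption. The Word2Vec, convolution, pooling, and LSTM stages themselves are omitted, and all function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, context):
    """Dot-product attention: score each hidden state against a context vector,
    normalize the scores with softmax, and return the weighted sum and weights."""
    scores = [sum(h_i * c_i for h_i, c_i in zip(h, context)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return pooled, weights


def classify(sentence_vec, weight_matrix, bias):
    """Linear layer followed by softmax; weight_matrix has one row per class."""
    logits = [sum(w * x for w, x in zip(row, sentence_vec)) + b
              for row, b in zip(weight_matrix, bias)]
    return softmax(logits)


if __name__ == "__main__":
    # Hypothetical hidden states for a three-token contract, 2-dimensional.
    hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    pooled, weights = attention_pool(hidden, context=[1.0, 0.0])
    probs = classify(pooled, [[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0])
    print(weights, probs)
```

In a trained model the context vector, weight matrix, and bias would be learned parameters rather than the fixed values used here.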

Automated system and method for generating reasons that a court case is cited

A computer-automated system and method identify text in a first “citing” court case, near a “citing instance” (in which a second “cited” court case is cited), that indicates the reason(s) for citing (RFC). The automated method of designating text, taken from a set of citing documents, as reasons for citing (RFC) associated with respective citing instances of a cited document, includes the steps of: obtaining the contexts of the citing instances in the respective citing documents (each context including text that contains the citing instance and text near it); analyzing the content of the contexts; and selecting, from the contexts of the citing instances, text that constitutes the RFC, based on the analyzed content. A related computer-automated system and method selects content words that are highly related to the reasons a particular document is cited, and assigns them weights that indicate their relative relevance. Another related computer-automated system and method forms lists of morphological forms of words. Still another scores sentences to show their relevance to the reasons a document is cited. Yet another generates lists of content words. In a preferred embodiment, the systems and methods are applied to legal (especially case law) documents and legal (especially case law) citations.
Owner:RELX INC
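The context-and-scoring idea can be sketched in Python as follows. This is an illustrative assumption, not the RELX method: a context is taken to be the sentence containing the citation plus a fixed window of neighboring sentences, content-word weights are supplied by hand rather than learned, and the highest-scoring sentence is returned as the candidate RFC. All function names are hypothetical.

```python
import re


def split_sentences(text):
    """Naive splitter on '.', '?' or '!' followed by whitespace.

    Note: this would also split on abbreviations such as "v." in case names.
    """
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]


def citing_context(text, citation, window=1):
    """Sentence containing the first occurrence of `citation`, plus
    `window` sentences on each side; empty list if not found."""
    sentences = split_sentences(text)
    for i, sentence in enumerate(sentences):
        if citation in sentence:
            return sentences[max(0, i - window): i + window + 1]
    return []


def score_sentence(sentence, weights):
    """Sum the weights of weighted content words appearing in the sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return sum(weights.get(tok, 0.0) for tok in tokens)


def select_rfc(text, citation, weights, window=1):
    """Return the highest-scoring context sentence as the candidate RFC, or None."""
    context = citing_context(text, citation, window)
    if not context:
        return None
    return max(context, key=lambda s: score_sentence(s, weights))
```

For example, with hand-set weights for `duty`, `care`, and `landlord`, the sentence in a citing opinion that states the duty a cited case established would outscore the neighboring procedural sentences.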

Device, system, and method for determining information relevant to a clinician

A system, method, and device for determining information relevant to a clinician and notifying the clinician of it. The method, performed by the device or system, includes: identifying at least one keyword in a user profile of the clinician; identifying at least one content word in a new information item; determining a relevance score between the new information item and the clinician based on the at least one keyword and the at least one content word; and, when the relevance score is above a predetermined threshold value, generating a notification for the clinician indicating the new information item.
Owner:KONINKLIJKE PHILIPS NV
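The notification logic reduces to a threshold test on a relevance score. The Python sketch below assumes a hypothetical stop-word list for extracting content words and scores relevance as the fraction of profile keywords found among them; the abstract does not specify a scoring function, so this rule, the single-word keywords, and the default threshold of 0.5 are illustrative only.

```python
import re

# Hypothetical, deliberately small stop-word list used to keep only content words.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "or", "to", "with", "is", "are"}


def content_words(text):
    """Lower-case word tokens with stop words removed."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS}


def relevance_score(profile_keywords, item_text):
    """Fraction of profile keywords that appear among the item's content words
    (an assumed rule; the source does not define the score)."""
    keywords = {k.lower() for k in profile_keywords}
    if not keywords:
        return 0.0
    return len(keywords & content_words(item_text)) / len(keywords)


def maybe_notify(clinician, profile_keywords, item_title, item_text, threshold=0.5):
    """Return a notification message if the score is above the threshold, else None."""
    score = relevance_score(profile_keywords, item_text)
    if score > threshold:
        return f"Notify {clinician}: '{item_title}' (relevance {score:.2f})"
    return None
```

For example, with profile keywords `sepsis`, `antibiotics`, and `pediatrics`, an item reading "New guidance on early antibiotics for sepsis in adults" matches two of the three keywords (a score of about 0.67) and triggers a notification at the default threshold.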