245 results about "Encyclopedia" patented technology

An encyclopedia or encyclopaedia is a reference work or compendium providing summaries of knowledge either from all branches or from a particular field or discipline. Encyclopedias are divided into articles or entries that are often arranged alphabetically by article name and sometimes by thematic categories. Encyclopedia entries are longer and more detailed than those in most dictionaries. Generally speaking, unlike dictionary entries—which focus on linguistic information about words, such as their etymology, meaning, pronunciation, use, and grammatical forms—encyclopedia articles focus on factual information concerning the subject named in the article's title.

Systems and methods for natural language processing

Methods, systems and computer programs for automatic, highly accurate machine comprehension of a plurality of segments of free form unstructured text in a natural language. The system answers a plurality of complex, free-form questions asked in a natural language, based on the totality of input text. The system further uses a multi-dimensional data model to measure the total effects of actions / verbs acting on various unique nouns present in the input text. The system may convert the questions into another multi-dimensional data model and may then compare the two data models in program memory to derive the answers to the posed questions. The system may then automatically detect unknown words and optionally look them up in digital information sources, such as online dictionaries and encyclopedias, to fill in the gaps in knowledge to answer the questions with expert-like reliability.
Owner:INSTAKNOW COM

Semi-automatic construction method for knowledge base of encyclopedia question answering system

The present invention relates to a semi-automatic construction method for the knowledge base of an encyclopedia question answering system, in which concept-oriented systematic templates are designed and important fact information related to entries is automatically extracted from the summary information and body of the encyclopedia to semi-automatically construct the knowledge base. The method comprises the steps of: (a) designing the structure of the knowledge base with a plurality of templates for each entry and a plurality of attributes related to each of the templates; (b) extracting structured information, including the entry, an attribute name and attribute values, from summary information of the encyclopedia; (c) extracting unstructured information, including an attribute name and attribute values of the entry, from a body of the encyclopedia; and (d) storing the structured information and the unstructured information in the corresponding template and attribute of the knowledge base according to the entry.
Owner:ELECTRONICS & TELECOMM RES INST

Question and answer method based on knowledge graph, and agricultural encyclopedia question and answer system

The invention provides a question and answer method based on a knowledge graph, and an agricultural encyclopedia question and answer system. A natural language question raised by a user can be automatically analyzed; a topological structure based on a syntax tree is formed; retrieval and comparison are carried out through the topological structure and a question template in a grammar library; according to a mapping relation between the topological structure and a predicate nominatum, and a mapping relation between a synonym set and a relation or an attribute in the knowledge graph, a question-mapped predicate is obtained; in combination with an entity identified in the question, a final structured knowledge graph query statement is generated; retrieval is carried out in the knowledge graph according to the query statement; and a final result is returned. When the relevant topological structure cannot be retrieved in the question template library, the question is answered by calling common question-answer pairs of an FAQ question library. The question and answer system can give accurate answer retrieval for the question posed by the user, so that the user's satisfaction with agricultural encyclopedia question retrieval is improved.
Owner:南京柯基数据科技有限公司

Systems and methods for employing an orthogonal corpus for document indexing

The invention provides for indexing and cataloging of content on the Internet, as well as from other stores of information, by applying a process that employs an orthogonal corpus, or corpora, of information, such as an encyclopedia. To this end, the processes described herein identify the topics discussed within the corpus. The process also identifies within the corpus a set of keywords that are relevant to the topics presented in the corpus. The keywords associated with a topic may be employed to identify documents stored in another database that are related to the topic. A graphical representation of the index of topics found in the corpus may then be generated, with individual topics operating as links to these related documents. Thus, a user interested in reviewing content in the corpus related to a certain topic may also activate a link in the graphical representation of the index to access other documents that have been identified as related to the topic of interest to the user.
Owner:LINKAPEDIA INC

Video manager and organizer

An online video search system, including a tag discoverer including a web encyclopedia crawler for (i) accessing a web encyclopedia to find web pages related to at least one designated reference topic, and (ii) retrieving a plurality of web pages by performing an n-level depth recursive traversal of the web pages found, and web pages that are hyper-linked thereto, a concept extractor for extracting important concepts found in the retrieved plurality of web pages, and a user interface for providing at least one of the important concepts extracted by the web page processor to an online video search engine. A method and a computer-readable storage medium are also described and claimed.
Owner:GULA CONSULTING LLC

Method for constructing knowledge graph based on entity extraction and relationship mining of rule model

The invention relates to a method for constructing a knowledge graph based on entity extraction and relationship mining of a rule model. The method comprises the following steps: step 1: crawling data of an encyclopedia knowledge base of a target region, and defining dictionaries of foods, pesticides, nutrition and plant diseases and insect pests, so as to be convenient for rule mining; step 2: carrying out HTML (Hypertext Markup Language) label removal on encyclopedia type data to obtain Chinese texts and obtaining a URL (Uniform Resource Locator) link, so as to be convenient for subsequent processing; step 3: obtaining more complete entity attribute information by adding manually annotated relation attribute information; and step 4: obtaining an event and establishing a graph relation. According to the method provided by the invention, text information is converted into word vector mathematical information; vector similarity comparison is carried out and a relation between entities is labeled according to a relation between numbers, so as to represent a core knowledge base for the field and improve and optimize search quality; and a process from a simple character string to entity comprehending is realized.
Owner:湖南中科优信科技有限公司

Virtual learning environment for children

Inactive | US6517351B2 | Enhance his or her motor skills | Enhanced interaction | Reading | Electrical appliances | Encyclopedia | Body movement
A virtual learning system environment which provides for progressive education of children, at their own pace, through enhancement in both language arts (e.g. spelling, reading comprehension) and physical skills (interactive prompts). The system of this invention includes a microphone for sensing an audible word or command, a video camera for sensing bodily movement, and means for effecting a computer generated response to said audible word or command, or said bodily movement, wherein said response includes both graphical depiction of the letters of said audible word or command, an object image corresponding to bodily movement or said audible word or command, and an action or object related to said bodily movement or said audible word or command, or any combination thereof, so as to effect a progressive learning or teaching experience. The system also provides for direction to a pathway alternative to said system based upon a series of links, similar to the Encarta Encyclopedia, to web pages and the like, where it directs the child to additional sources of information concerning the one or more aspects of the learning exercise. In addition, the system allows for a live or computer mediator to monitor the progress of the learning experience.
Owner:HANGER SOLUTIONS LLC

Method and apparatus for organizing data by overlaying a searchable database with a directory tree structure

Method and apparatus for organizing data by overlaying a searchable database with a directory tree structure. The method includes generating the directory tree structure that includes nodes comprising a designated category for each node and branches comprising links between the nodes, and generating one or more pointers. Each pointer corresponds to a specific node and the pointer links the specific node to an item of data within the searchable database. All pointers associated with the specific node link related items of data corresponding to the designated category. Each node within the directory tree structure can include a corresponding html address. Items of data can be web-based multimedia including audio, video, images, and appropriately formatted text, displayed in an encyclopedia-like format. Nodes, branches, and pointers within the directory tree structure can continually be added, edited, or deleted.
Owner:BYTEWEAVR LLC
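
The overlay described in this abstract can be pictured as a tree of category nodes whose pointers hold keys into a separate searchable store. The following is a minimal sketch under stated assumptions, not the patented implementation: the `Node` class, the `items_under` helper, the dictionary standing in for the searchable database, and the choice to collect items recursively from descendant nodes are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    category: str                                   # designated category for this node
    children: dict = field(default_factory=dict)    # branches: category name -> child Node
    pointers: list = field(default_factory=list)    # keys of items in the searchable database

    def add_child(self, category):
        # Create the child on first use, otherwise return the existing one.
        return self.children.setdefault(category, Node(category))


def items_under(node, database):
    """Resolve the pointers of a node and all its descendants (an assumed convenience)."""
    found = [database[key] for key in node.pointers if key in database]
    for child in node.children.values():
        found.extend(items_under(child, database))
    return found


# Hypothetical stand-in for the searchable database: key -> multimedia item.
database = {1: "lion.jpg", 2: "tiger.mp4", 3: "oak.txt"}

root = Node("Nature")
big_cats = root.add_child("Animals").add_child("Big cats")
big_cats.pointers += [1, 2]
root.add_child("Plants").pointers.append(3)

cat_items = items_under(big_cats, database)
all_items = items_under(root, database)
```

Keeping only keys in the tree means nodes, branches, and pointers can be added or removed without touching the stored items themselves.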

Building method of knowledge map based on vertical field

The invention provides a building method of a knowledge map based on a vertical field. The method comprises the following steps of (1) extracting the word realization of classes of an on-line encyclopedia and the hyponymy between classes; (2) merging the field knowledge information, defining the data attribute and the relationship attribute of the field, and further setting the statute on the definition domain and the value domain of the attributes; (3) studying an entity layer, i.e., extracting an entity and filling the attribute value of the entity; performing mass processing on structurized and semi-structurized data by D2R or data collecting tools; and for non-structurized text data, defining the classes and the attributes of the upper layer body and the relationship between the classes and the attributes, and recognizing examples according to the relationship between the classes and the attributes. The method has the advantages that by using the method, the built knowledge classification of the vertical field knowledge map is clear; the self study and the automatic expansion of the knowledge map are realized; and the key effects are achieved on the information retrieval and semantic analysis of the vertical field.
Owner:QINGDAO PENGHAI SOFT CO LTD

Method of Peer Review of a Web-Based Encyclopedia

The invention concerns a method of creation, maintenance, and peer-review of web-based collectively written encyclopedia. The invention combines the mechanism of Wiki-style collaborative environment, which allows users to modify articles, with the principles of peer-reviewed encyclopedias, in which articles are approved (i.e., endorsed) by experts. In the preferred embodiment, each article has a curator or curators who are responsible for the article content. Each article can be modified by users, but the modification is hidden from the general public until it is evaluated and approved by the curators. The encyclopedia stores the history of all revisions and evaluations. If the curators fail to evaluate the modification within a certain predefined period of time, the curatorship of the article is offered to the person who made useful modifications to the article (according to the history of evaluations). This method ensures that each article has a curator who maintains its content in a timely manner.
Owner:IZHIKEVICH EUGENE M
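
The curation workflow in this abstract (hidden modifications, curator approval, and reassignment of curatorship after a missed deadline) can be illustrated with a small state model. This is a rough sketch only: the class names, the 30-day default deadline, and the rule of offering curatorship to the contributor with the most approved revisions are assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Revision:
    author: str
    text: str
    submitted_day: int
    status: str = "pending"   # "pending", "approved", or "rejected"


@dataclass
class Article:
    curator: str
    public_text: str
    history: list = field(default_factory=list)   # all revisions and their evaluations

    def submit(self, author, text, day):
        # A new modification is stored but stays hidden from the public.
        revision = Revision(author, text, day)
        self.history.append(revision)
        return revision

    def evaluate(self, revision, approve):
        # Only an approved modification replaces the public text.
        revision.status = "approved" if approve else "rejected"
        if approve:
            self.public_text = revision.text

    def reassign_if_stale(self, today, deadline_days=30):
        # Assumed rule: if any pending revision is overdue, offer curatorship
        # to the contributor with the most approved revisions so far.
        overdue = any(
            r.status == "pending" and today - r.submitted_day > deadline_days
            for r in self.history
        )
        approved = Counter(r.author for r in self.history if r.status == "approved")
        if overdue and approved:
            self.curator = approved.most_common(1)[0][0]
            return True
        return False


article = Article(curator="alice", public_text="v1")
first = article.submit("bob", "v2", day=0)
article.evaluate(first, approve=True)
second = article.submit("bob", "v3", day=10)      # never evaluated
reassigned = article.reassign_if_stale(today=50)
```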

Webpage information extraction method

The invention discloses a webpage information extraction method, in particular a method for extracting concept attributes from a network encyclopedia data source and processing the concept attributes. The method comprises the following steps of: constructing an example list, and extracting candidate attributes of examples in the list from a multi-source heterogeneous data source; performing synonymic induction on the extracted attributes, and putting synonymic attributes in the same set; sub-classifying the induced attributes; analyzing the corresponding attribute value types of the classified attributes; and recommending the attributes and corresponding attribute value type information to a user, or storing the attributes and the corresponding attribute value type information into a structured database. By adoption of the scheme of the invention, high-quality concept attribute information can be extracted from a webpage, a knowledge base can be better constructed, and other natural language processing tasks such as extraction of attribute values, text classification and classification of query logs in a search engine can be better performed.
Owner:PEKING UNIV

Method for answering with natural language

Inactive | CN102637192A | Timely and effective answer | Special data processing applications | Question analysis | User input
The invention discloses a method for answering with natural language, which is used for instantly and effectively answering the questions of users. The method comprises the following steps of: (1) performing question analysis on a question input by a user; (2) answering the question by use of the question analysis result and the corpus of community questions and answers; (3) answering the question by use of the question analysis result and the encyclopedia corpus; and (4) verifying and selecting the answers returned by the steps (2) and (3), and finally returning the best answer to the user.
Owner:TSINGHUA UNIV

Knowledge map-based Chinese tourism domain knowledge service platform construction method

The invention discloses a knowledge map-based Chinese tourism domain knowledge service platform construction method. The method comprises the steps of obtaining structured tourism knowledge from an existing Chinese encyclopedia knowledge base, fusing the knowledge and crawling tourism website page data; performing knowledge completion on the Infobox attributes of the entity through a self-defined attribute matching rule; using the Stanford ontology modeling tool Protege to construct a tourism domain ontology, and combining D2RQ with the constructed tourism domain ontology to convert data into an RDF triple format to obtain a tourism domain knowledge graph and a Neo4j graph database storage task of a tourism knowledge base; wherein the knowledge fusion task comprises the steps of calculating semantic similarity between entities by using an improved deep learning knowledge representation model BERT to complete entity alignment, performing attribute fusion based on a principle and a statistical method, and performing a triple fusion subtask by adopting a majority voting algorithm. According to the invention, tourists can obtain one-stop comprehensive services conveniently.
Owner:SHAANXI NORMAL UNIV

Text similarity detection device

The invention discloses a text similarity detection device. The text similarity detection device comprises the following steps: constructing a thesaurus according to classification labels of Baidu Encyclopedia entries; inputting two Chinese documents needing to be compared, and pre-processing the two Chinese documents respectively; filtering words in the two Chinese documents and removing repeated words to generate a word item set; dividing word items in the word item set into a specialized word set and a common word set; aligning specialized words in two sentences in the two Chinese documents and aligning common words in the two sentences; calculating the similarity, relative to the word with the corresponding property, of each word respectively; and calculating the similarity of each sentence in the two Chinese documents. According to the method, manpower resources are saved to the greatest extent, and the judgment accuracy and the judgment speed of a computer network system to Chinese are improved.
Owner:CHINA AGRI UNIV

Natural Language Relatedness Tool using Mined Semantic Analysis

Mined semantic analysis techniques (MSA) include generating a first subset of concepts, from a NL corpus, that are latently associated with an NL candidate term based on (i) a second subset of concepts from the corpus that are explicitly or implicitly associated with the candidate term and (ii) a set of concept association rules. The concept association rules are mined from a transaction dictionary constructed from the corpus and defining discovered latent associations between corpus concepts. A concept space of the candidate term includes at least portions of both the first and second subset of concepts, and includes indications of relationships between latently-associated concepts and the explicitly / implicitly-associated concepts from which the latently-associated concepts were derived. Measures of relatedness between candidate terms are deterministically determined based on their respective concept spaces. Example corpora include digital corpora such as encyclopedias, journals, intellectual property datasets, health-care related datasets / records, financial-sector related datasets / records, etc.
Owner:THE UNIV OF NORTH CAROLINA AT CHAPEL HILL

Text categorization using external knowledge

A method of providing weighted concepts related to a sequence of one or more words, including: providing on a computer an encyclopedia with concepts and a document explaining each concept, forming a vector, which contains the frequency of the word for each concept, for each word in the encyclopedia, arranging the vector according to the frequency of appearance of the word for each concept, selecting the concepts with the highest frequencies for each word from the vector, truncating the rest of the vector, inducing a feature generator using the truncated vectors; wherein the feature generator is adapted to receive as input one or more words and provide a list of weighted concepts, which are most related to the one or more words provided as input.
Owner:TECHNION RES & DEV FOUND LTD
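
The vector construction and truncation steps in this abstract can be sketched in a few lines. The sketch below makes several assumptions that are not in the source: whitespace tokenization, raw term counts as the frequency measure, a fixed `top_k` cutoff for truncation, and a feature generator that simply sums the truncated vectors of the input words. The function name `build_feature_generator` and the toy encyclopedia are hypothetical.

```python
from collections import Counter, defaultdict


def build_feature_generator(encyclopedia, top_k=2):
    """encyclopedia: dict mapping a concept name to the document explaining it."""
    # For each word, count how often it appears in each concept's document.
    word_vectors = defaultdict(Counter)
    for concept, text in encyclopedia.items():
        for word in text.lower().split():
            word_vectors[word][concept] += 1

    # Arrange each vector by frequency and keep only the top_k concepts.
    truncated = {
        word: dict(sorted(vec.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k])
        for word, vec in word_vectors.items()
    }

    def generate(words):
        # Assumed combination rule: sum the truncated vectors of the input words.
        scores = Counter()
        for word in words:
            for concept, freq in truncated.get(word.lower(), {}).items():
                scores[concept] += freq
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

    return generate


toy_encyclopedia = {
    "Cat": "cat feline pet cat whiskers",
    "Dog": "dog canine pet bark",
    "Car": "car engine wheel pet",
}
generate = build_feature_generator(toy_encyclopedia, top_k=2)
result = generate(["cat", "pet"])
```

With this toy data, "pet" appears once in all three documents, so truncation to two concepts drops one of them (ties are broken alphabetically here, another assumption), and the word "cat" pushes the concept "Cat" to the top of the weighted list.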

Method and equipment for building indexes and matching inquiry input information of user

The invention provides a method and equipment for building indexes and matching inquiry input information of a user. According to text information, structural information is determined and subject words are extracted; according to a subject corresponding to the subject words, label words corresponding to the subject are determined; and the indexes are built from the subject words and the label words. Moreover, the subject words and the label words are obtained through analysis of the inquiry input information input by the user, matching inquiry is carried out in the built indexes, and candidate text information is obtained; according to the semantic matching degree of the candidate text information and the inquiry input information, target text information matched with the inquiry input information is determined. Compared with the prior art, subjects and titles are extracted on the basis of encyclopedia or other network resource knowledge, and an effective description of the resource knowledge content is formed; accordingly, semantic searching of the resource knowledge is more efficient, the searching requirements of the user for complicated descriptions which the user cannot accurately express by means of key words are met, and the use experience of the user is improved.
Owner:BEIJING BAIDU NETCOM SCI & TECH CO LTD

Blind person Internet system based on voice technology

The invention relates to a system suitable for a blind person to surf the Internet, comprising an automatic server news downloading system and a client application system. The server system can download news in real time, store the news on a server and update it in real time; the client system can respond to voice input of the blind person by virtue of voice recognition and synthesis and outputs a voice. For the special case of the blind person, the system realizes three core functions of the Internet, namely information acquisition, knowledge learning and interaction. By applying the system provided by the invention, the blind person can effectively listen to news by virtue of the Internet, listen to electronic books, inquire encyclopedia nouns, and report and listen to a post, wherein the news listening system can support the blind person in sequentially listening to news in programs, and the blind person can follow news of interest by keyword search and related news search.
Owner:BEIHANG UNIV +1

A field entity attribute relation extraction method based on distance supervision

The invention relates to a field entity attribute relation extraction method based on distance supervision, belonging to the technical field of natural language processing and deep learning. The method includes constructing a domain knowledge base of Chinese tourist attractions; obtaining a large number of tourism domain text collections through Chinese encyclopedia websites and tourism websites; using the entity pairs of the constructed tourism domain knowledge base to obtain the relational instance text collections from the tourism domain text collection; and using topic model keyword similarity calculation and keyword pattern matching to denoise. Finally, using the training corpus composed of positive and negative data under each relationship, the part-of-speech feature, dependency feature and short syntax tree feature of the training corpus are extracted, the three features are fused into a larger feature with richer semantic information, and the relationship extraction model is then trained. Experiments show that the F value obtained by fusing the three features extracted from the denoised training corpus is the highest and the extraction performance is the best.
Owner:KUNMING UNIV OF SCI & TECH

Method for establishing mapping knowledge domain based on book catalogue

The invention discloses a method for establishing a mapping knowledge domain based on a book catalogue. The method comprises the steps that a catalogue page in a digitized book is extracted, the lengths of items in the catalogue are differentiated, and part-of-speech tagging is conducted on the long items through a natural language processing tool, so that part-of-speech arrays are obtained, and candidate nodes are extracted according to rules of conjunctions, punctuations and parts of speech; the long items and the short items are authenticated in the Baidu encyclopedia and the Hudong encyclopedia, a leader-member relation and parallel relations are formed through a catalogue structure and serve as a framework of the mapping knowledge domain, the strong and weak parallel relations are differentiated and serve as increments respectively, and the leader-member relation is supplemented with the strong and weak parallel relations; according to a noisy data excavating algorithm with suffixes serving as a base, nodes are selected from the items which do not pass the authentication of the encyclopedias and the mapping knowledge domain is supplemented with the selected nodes; finally, the weights of relations in the supplemented mapping knowledge domain are calculated and ranked, so that noise is removed through screening. Compared with an existing mapping knowledge domain, the mapping knowledge domain established through the method is richer in node, better in expandability and higher in accuracy.
Owner:ZHEJIANG UNIV

Field encyclopedia establishment system based on general encyclopedia websites

The invention belongs to the technical field of open knowledge extraction and specifically relates to a field encyclopedia establishment system based on general encyclopedia websites. The system is divided into a plurality of modules, namely an encyclopedia data crawling module, an encyclopedia data preprocessing module, a related entity searching and ranking module and an entity clustering module. The field encyclopedia establishment system based on the general encyclopedia websites has the following beneficial effects: the field encyclopedia is mostly established manually at present, which takes time and labor, and as all related entities cannot be found out manually, the coverage rate is low; instead, the field encyclopedia is established on the basis of the field related entities found out by the field encyclopedia establishment system, and in this way, the labor of establishing the field encyclopedia can be greatly reduced and the coverage rate can be greatly increased; meanwhile, the field encyclopedia established by the field encyclopedia establishment system is greatly convenient for users to obtain the knowledge in specified fields; complex searching and screening processes are omitted, and the pattern that a user passively searches for information is changed into the pattern that the system initiatively provides information.
Owner:FUDAN UNIV

Method and system based on encyclopedia data for classifying entities

The invention relates to a method and a system based on encyclopedia data for classifying entities. The method comprises the following steps of analyzing items of the entities in the encyclopedia data to obtain a descriptive attribute set corresponding to the entities; extracting categories set in the encyclopedia data and an attribute template set corresponding to the categories; determining the categories of the entities and classifying the entities based on the similarity of the descriptive attribute set and the attribute template set. According to the method, the descriptive attribute set in the encyclopedia data is compared with the preset attribute template set, the entities are classified, the data of which the similarity is lower than a threshold value are clustered, and thus the purpose of classifying the data is realized.
Owner:BEIJING QIHOO TECH CO LTD +1
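
The comparison of an entity's descriptive attribute set against per-category attribute templates might look like the sketch below. It is an illustration under assumptions: the abstract does not name a similarity measure, so Jaccard similarity is swapped in; the 0.5 threshold, the `classify_entity` function, and the toy templates are hypothetical; and entities that fall below the threshold are merely returned unlabeled rather than clustered.

```python
def jaccard(a, b):
    """Jaccard similarity of two attribute sets (an assumed choice of measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_entity(attributes, templates, threshold=0.5):
    """Return (category, score); category is None when no template is similar enough."""
    best_category, best_score = None, 0.0
    for category, template in templates.items():
        score = jaccard(attributes, template)
        if score > best_score:
            best_category, best_score = category, score
    if best_score < threshold:
        # Below the threshold: left unlabeled, to be handled by clustering.
        return None, best_score
    return best_category, best_score


# Hypothetical attribute templates extracted per category.
templates = {
    "person": {"name", "birth date", "nationality", "occupation"},
    "city": {"name", "population", "area", "country"},
}

label, score = classify_entity({"name", "population", "country", "mayor"}, templates)
unknown = classify_entity({"color"}, templates)
```

Here the entity shares three of five distinct attributes with the "city" template, giving a similarity of 0.6, which clears the assumed threshold.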

User-context-based search engine

A method and apparatus for determining contexts of information analyzed. Contexts may be determined for words, expressions, and other combinations of words in bodies of knowledge such as encyclopedias. Analysis of use provides a division of the universe of communication or information into domains, and selects words or expressions unique to those domains of subject matter as an aid in classifying information. A vocabulary list is created with a macro-context (context vector) for each, dependent upon the number of occurrences of unique terms from a domain, over each of the domains. This system may be used to find information or classify information by subsequent inputs of text, in calculation of macro-contexts, with ultimate determination of lists of micro-contexts including terms closely aligned with the subject matter.
Owner:GOOGLE LLC

System and methods for growth, peer-review, and maintenance of network collaborative resources

System and methods for managing collaborative content resources, such as blogs, collaborative portals, and encyclopedias. In one embodiment, the collaborative resources comprise so-called “wikis” managed within an encyclopedia environment comprising a group of curators. The curators sponsor, peer-review, and accept or reject articles written by experts. When an article is accepted, the senior author joins the group of curators. Each accepted article has a curator and a group of assistant curators. When a registered user modifies the article, the modification is not shown to the public until it is approved by the curator or at least one assistant curator of the article. Upon approval, the user joins the group of assistant curators of the article. Each user has a rank, which in one variant reflects the number of times the approval or rejection decision by the user coincided with the approval or rejection decision by the curator.
Owner:SCHOLARPEDIA

A multi-data source oriented network data collection and presentation method

The invention discloses a multi-data source oriented network data collection and presentation method. Based on research into the data collection strategies of six media platforms, namely Sina Weibo, People's Daily, Baidu Encyclopedia, Baidu Tieba, WeChat Public Homepage and Dongfang Wealth Stock Bar, the method adopts Servlet background scheduling technology to integrate web crawlers oriented to multiple data sources, and solves the data collection problem of different media platforms. In this implementation, first, manual operations such as simulated login are realized by means of the Web application test kit Selenium. Second, the XPath element query technology is used to analyze the source code of the web page, and the data information is extracted and stored in the database. Finally, the crawled data is read out from the database and displayed on the front-end page. Experiments show that the crawler achieves the maximum collection efficiency on the premise of ensuring data integrity.
Owner:BEIJING INFORMATION SCI & TECH UNIV

Chinese integrated entity linking method based on graph model

The present invention discloses a Chinese integrated entity linking method based on a graph model. An ambiguous entity in a text can be mapped into a specific entity in a real world, in order to provide aid for knowledge base expansion, information extraction and search engines. The method mainly comprises three parts of generating a candidate entity, constructing an entity indicator diagram, and disambiguating an integrated entity. For a given text, an entity referent item therein is recognized to obtain the candidate entity. The entity referent item and the candidate entity thereof are regarded as graph nodes to construct an entity referent graph. An in-degree and out-degree algorithm is applied to the entity indicator diagram for implementing disambiguation of multiple ambiguous entities in the text. The present invention does not depend on the knowledge base completely in the establishment of the entity indicator diagram, and also can implement incremental evidence mining to find evidence on an encyclopedia webpage. Dependence path analysis is employed to find the possibly related entity referent item. When the dependence path sizes of two entity referent items are within a set range, the two entity referent items are regarded as the possibly related entity referent items. Further, whether their candidate entities have relations in the real world is determined, so that the efficiency of disambiguation is greatly improved.
Owner:UNIV OF ELECTRONICS SCI & TECH OF CHINA

Chinese word similarity calculation method based on fusion strategy

Inactive | CN109960786A | Improve the similarity | Spearman's correlation coefficient is high | Natural language data processing | Special data processing applications | Spearman's rank correlation coefficient | Pattern recognition
The invention relates to a Chinese word similarity calculation method based on a fusion strategy. The method calculates word similarity based on the combination of four resources: HowNet, the synonym forest, a Word2Vec model trained on the Chinese Wikipedia encyclopedia corpus, and the Baidu dictionary. For two input words, the method first judges whether the words exist in HowNet or the synonym forest; if yes, HowNet or the synonym forest is used for calculating the similarity; if not, the method judges whether the words exist in the Wikipedia corpus or the Baidu dictionary, and if yes, Word2Vec or the Baidu dictionary is used for calculating the similarity of the words. According to the fusion strategy, HowNet, the synonym forest, Word2Vec and the Baidu dictionary are comprehensively considered, so that the strategies complement one another; the calculated Spearman correlation coefficient and Pearson correlation coefficient are higher than those of other methods, the accuracy of the word similarity calculation result is improved, and the requirements of practical application can be well met.
Owner:BEIJING INFORMATION SCI & TECH UNIV
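The fallback order in the abstract above can be sketched as a simple cascade over resources. This is a minimal illustration, not the patented implementation: the function name `fused_similarity`, the vocabularies, and the constant scorers are hypothetical stand-ins for real HowNet, synonym forest, Word2Vec and Baidu dictionary lookups, and English placeholder words are used instead of Chinese ones.

```python
def fused_similarity(w1, w2, resources):
    """Return (resource_name, similarity) from the first resource
    whose vocabulary contains both words, or (None, None)."""
    for name, vocab, scorer in resources:
        if w1 in vocab and w2 in vocab:
            return name, scorer(w1, w2)
    return None, None

# Hypothetical resources in priority order: lexicons first, then corpus-based.
resources = [
    ("hownet", {"cat", "dog"}, lambda a, b: 0.8),
    ("synonym_forest", {"cat", "kitten"}, lambda a, b: 0.9),
    ("word2vec", {"cat", "dog", "kitten", "phone"}, lambda a, b: 0.3),
    ("baidu_dictionary", {"phone", "tablet"}, lambda a, b: 0.5),
]

lexicon_hit = fused_similarity("cat", "dog", resources)
corpus_hit = fused_similarity("cat", "phone", resources)
miss = fused_similarity("cat", "unicorn", resources)
```

Keeping the resources in an ordered list makes the priority explicit and lets a new resource be added without changing the lookup logic.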

Word2vec-based remotely supervised non-taxonomic relation extraction method and system

Inactive | CN107145503A | Guarantee the effect of removing label noise | Improve accuracy | Special data processing applications | Text database clustering/classification | Encyclopedia | Taxonomic relation
The invention discloses a word2vec-based remotely supervised non-taxonomic relation extraction method and system that can extract non-taxonomic relations in the vegetable domain. The method comprises the steps of: crawling unstructured vegetable-domain text from an online encyclopedia and large vegetable websites to serve as corpora, and preprocessing the corpora to obtain primary training corpora; training a word2vec model on the primary training corpora and using it to obtain a vector for each sentence; grouping the primary training corpora by non-taxonomic relation type and, for the data of each relation, extracting a common sentence pattern and an uncommon sentence pattern; selecting two sentence vectors that match the two different patterns as the initial centers for k-means clustering, clustering all sentence vectors, and keeping the cluster that matches the common sentence pattern to obtain higher-quality training corpora; and training a convolutional neural network model on the higher-quality training corpora and extracting the non-taxonomic relations through a fully connected softmax layer.
Owner:CHINA AGRI UNIV
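The noise-removal step in the abstract above seeds k-means with one sentence vector per pattern and keeps the cluster seeded by the common pattern. The following is a minimal pure-Python sketch of that idea; the two-dimensional toy vectors, the function names, and the iteration cap are assumptions made for illustration, and real sentence vectors would come from a trained word2vec model.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def seeded_kmeans(points, centers, iters=20):
    """k-means with caller-supplied initial centers; returns one label per point."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for i in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old center if a cluster is empty
                centers[i] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

# Hypothetical sentence vectors: the first three follow the common pattern.
vectors = [(0.9, 0.1), (1.0, 0.0), (0.8, 0.2), (0.1, 0.9), (0.0, 1.0)]
common_seed, uncommon_seed = vectors[0], vectors[3]
labels = seeded_kmeans(vectors, [common_seed, uncommon_seed])

# Cluster 0 was seeded by the common pattern, so its members are kept.
clean = [v for v, lab in zip(vectors, labels) if lab == 0]
```

Seeding the centers this way fixes which cluster index corresponds to the common pattern, so no post-hoc cluster matching is needed.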

Method for automatically establishing back-of-book indexes of book based on book contents

The invention discloses a method for automatically establishing back-of-book indexes for a book based on its contents. The method comprises the following steps: first, parsing the text of a digital book; taking each chapter as a unit, performing part-of-speech tagging on the text with a natural language processing tool to obtain a part-of-speech sequence; matching against high-frequency part-of-speech rules to extract candidate phrases; then classifying the phrases with a support vector machine using semantic and grammatical features to obtain candidate index terms; calculating the similarity between each candidate index term and the field the book belongs to, which serves as its termhood; calculating information content, term frequency, pointwise mutual information and encyclopedia key values to obtain an indexing degree; calculating title offset distance, candidate index term proportion and interestingness to obtain a context weight; and finally combining the termhood, the indexing degree and the context weight into an index score and obtaining the book's index terms by ranking the candidates and keeping a limited number. The method can add an index to a book that has no back-of-book index, improving the book's readability and searchability.
Owner:ZHEJIANG UNIV
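The final scoring step in the abstract above combines three per-term values and keeps the top-ranked terms. The abstract does not state the combination rule, so the weighted sum below is an assumption, as are the equal weights, the function name `rank_index_terms`, and the toy feature values.

```python
def rank_index_terms(features, weights=(1 / 3, 1 / 3, 1 / 3), top_k=2):
    """features: term -> (termhood, indexing_degree, context_weight).

    Assumption: the index score is a weighted sum of the three values.
    Returns the top_k terms by descending score.
    """
    scores = {
        term: sum(w * v for w, v in zip(weights, values))
        for term, values in features.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical candidate index terms with made-up feature values.
features = {
    "support vector machine": (0.9, 0.8, 0.7),
    "the method": (0.1, 0.2, 0.1),
    "part-of-speech tagging": (0.8, 0.6, 0.9),
}
index_terms = rank_index_terms(features)
```

The `top_k` cutoff plays the role of the limit on the ranked list; a score threshold would be an equally plausible reading.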