Specific function-related gene information searching system and method for building database of searching words thereof

An information retrieval technology for specific function-related genes, applied in special data processing applications, electronic digital data processing, instruments, etc. It addresses retrieval that is difficult, cannot be analyzed automatically, and wastes manpower and material resources. The result saves manpower and material resources, is easy to learn and master, and is easy to promote commercially.

Inactive Publication Date: 2006-03-08
SOUTHERN MEDICAL UNIVERSITY
Cites: 0; Cited by: 28

AI Technical Summary

Problems solved by technology

[0009] 1) The search was not comprehensive: gene names and aliases recorded only in the Genome Database (http://www.gdb.org) and GENATLAS (http://www.dsi.univ-paris5.fr/genatlas/) databases were omitted.
2) Gene names and aliases were searched only in titles. The scope could not be extended to abstracts, so many relevant documents that mention a gene name only in the abstract could not be retrieved.
3) Relevant literature for each gene was obtained purely by hand, which is time-consuming and laborious.
4) Word frequency analysis with the commercial software Provalis Research is hard to master, tedious, and error-prone. The software was developed mainly for analyzing word frequency in newspapers and periodicals, and its functions were made large and complex for the sake of generality. The method also requires converting the file format before the analysis, which generates a new file; the original file, new file, and result file must then be matched one by one, so items are easily missed. In particular, the method cannot analyze the relevant literature of all genes automatically; genes can only be analyzed one at a time, which is mechanical and cumbersome.
5) Keywords obtained automatically from frequency values sometimes carry no biological function meaning, so false positives occur, and sometimes omissions as well.
6) The obtained keywords differ in how closely they relate to the biological function under study. After direct clustering, high- and low-frequency keywords often become entangled, and genes are clustered under keywords that are weakly related (or even unrelated) to that function, which makes it difficult to find the genes relevant to it.
7) A keyword that characterizes some element of a specific function-related gene often has multiple synonyms or variants that can be regarded as the same entity. This easily scatters the clusters and makes browsing inconvenient because there are too many keywords.
8) With manual searching, it is difficult to retrieve relevant documents that contain multiple keywords and their synonyms.
9) When searching directly by the name of the gene, every searcher must repeat the complicated and slow process of extracting the gene's information from public gene name databases, calculating word frequencies, finding base values, and determining the character strings and auxiliary search terms. This wastes considerable manpower and material resources.

Examples

example 1

[0078] Example 1 (method for building the human literature search term database):

[0079] The literature search term database of the present invention is made up of a gene name database, a word frequency base value database, a character string database, and an auxiliary search term database. The character string database and the auxiliary search term database are each built from the gene records in the gene name database, using different processing techniques. The human literature search term database has the same composition and is built in the same way, as described in detail below:
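The four-part composition described in [0079] can be pictured as a simple container. The following is a minimal sketch only: the class name, field names, and the `terms_for` helper are hypothetical, and the contents of the character string and auxiliary search term databases are assumptions, since this text does not specify them. The gene symbol and aliases are illustrative sample data.

```python
from dataclasses import dataclass, field


@dataclass
class SearchTermDatabase:
    """Hypothetical container for the four component databases named in [0079]."""
    # official gene symbol -> full names, aliases, product names
    gene_names: dict[str, set[str]] = field(default_factory=dict)
    # word -> base frequency value (how this is computed is an assumption)
    word_freq_base: dict[str, float] = field(default_factory=dict)
    # gene symbol -> character strings used for matching (assumed layout)
    strings: dict[str, set[str]] = field(default_factory=dict)
    # gene symbol -> auxiliary search terms (assumed layout)
    auxiliary_terms: dict[str, set[str]] = field(default_factory=dict)

    def add_gene(self, symbol: str, aliases: list[str]) -> None:
        """Register a gene and its aliases in the gene name database."""
        self.gene_names.setdefault(symbol, set()).update(aliases)

    def terms_for(self, symbol: str) -> set[str]:
        """Collect every term recorded for a gene across the name,
        string, and auxiliary databases."""
        terms = {symbol}
        terms |= self.gene_names.get(symbol, set())
        terms |= self.strings.get(symbol, set())
        terms |= self.auxiliary_terms.get(symbol, set())
        return terms


db = SearchTermDatabase()
db.add_gene("TP53", ["p53", "LFS1"])
print(sorted(db.terms_for("TP53")))
```

Keeping the four databases as separate mappings mirrors the text's statement that they are built separately, while a single lookup method gives a searcher one place to ask for all terms of a gene.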

[0080] 1. Construction of the human gene name database (see Figure 9)

[0081] In order to make gene names and aliases more comprehensive, this embodiment collects and integrates four public gene information databases: HUGO Nomenclature Committee (http://www.gene.ucl.ac.uk/nomenclature/), Entrez Gene (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=gene), the...
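Integrating names and aliases from several sources, as [0081] describes, amounts to taking the union of alias sets per gene. The sketch below assumes each source has already been parsed into a mapping keyed by the same official symbol; the function name and sample records are hypothetical, and real sources would need identifier reconciliation that is not shown here.

```python
def merge_gene_names(*sources: dict[str, set[str]]) -> dict[str, set[str]]:
    """Union the alias sets of several gene name sources, keyed by symbol."""
    merged: dict[str, set[str]] = {}
    for source in sources:
        for symbol, aliases in source.items():
            # setdefault creates the entry the first time a symbol is seen
            merged.setdefault(symbol, set()).update(aliases)
    return merged


# Illustrative sample records, not taken from the patent.
source_a = {"TP53": {"p53"}}
source_b = {"TP53": {"LFS1"}, "BRCA1": {"RNF53"}}
merged = merge_gene_names(source_a, source_b)
print(sorted(merged["TP53"]))
```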

example 2

[0103] Example 2 (method for building the animal and plant literature search term database):

[0104] Since the Entrez Gene database contains the gene name information of almost all sequenced species, we can extract the gene name information of animals and plants of interest to construct a gene name database.

[0105] Animals: taking mice as an example, first obtain the official abbreviations, full names, aliases, and product names of 48,039 genes from Entrez Gene and build a gene name database for mice. Then randomly select 250 genes, retrieve their relevant literature, analyze the word frequency, and establish a word frequency base value database. The gene name database is then processed to generate a character string database and an auxiliary search term database for mice. The data processing method and process in the process of building the database are exactly the same as those for establishing the base value database of word frequency, character string database and auxi...
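The sample-then-count procedure in [0105] can be sketched as follows. This is an illustration under stated assumptions: the text does not define how a "base value" is calculated, so it is taken here to be the relative frequency of each word across the literature of the randomly sampled genes. The function names, the tokenizer, and the toy literature are all hypothetical.

```python
import random
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase a text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_frequencies(documents: list[str]) -> dict[str, float]:
    """Relative frequency of each word across a list of texts."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def base_values(literature_by_gene: dict[str, list[str]],
                sample_size: int = 250, seed: int = 0) -> dict[str, float]:
    """Sample genes at random and compute word frequencies over their literature.

    Assumption: the base value of a word is its relative frequency in the
    pooled literature of the sampled genes.
    """
    rng = random.Random(seed)
    genes = sorted(literature_by_gene)
    sample = rng.sample(genes, min(sample_size, len(genes)))
    docs = [doc for gene in sample for doc in literature_by_gene[gene]]
    return word_frequencies(docs)


# Toy literature standing in for retrieved titles and abstracts.
literature = {"gene1": ["cell growth"], "gene2": ["cell death"]}
print(base_values(literature))
```

A fixed seed keeps the sample reproducible, which matters if the base value database is meant to be built once and reused by every searcher.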

example 3

[0107] Example 3 (method for building the microbiological literature search term database):

[0108] The method for building the microbial literature search term database takes Epstein-Barr virus as an example. First, the official abbreviations, full names, aliases, and product names of 90 genes are obtained from the Taxonomy database and the Swiss-Prot protein database, and the gene name database of Epstein-Barr virus is established. The character string database and auxiliary search term database of Epstein-Barr virus are then set up manually. The relevant literature of these 90 genes is retrieved, the word frequency is analyzed, and a word frequency base value database is established. The data processing method and procedure are exactly the same as those used to build the word frequency base value database for human genes, and can be carried out with reference to Example 1.

Abstract

The system retrieves documents relevant to the gene being searched using the following components: a computer with input functions and a display terminal; a literature search term database composed of a gene name database, a word frequency base value database, a character string database, and an auxiliary search term database; and a public biomedical literature database accessed through a network server. The method comprises the following steps: performing word frequency analysis and selecting the keywords of each gene; building a word frequency list through specialized processing; and finally retrieving the gene information related to a specific function through cluster analysis. Its features are accurate targeting, fast searching, and avoidance of repeated processing, which greatly saves manpower and material resources and makes it suitable for commercial development.
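The keyword selection and clustering steps named in the abstract can be illustrated with a small sketch. The abstract does not give the selection rule or the clustering algorithm, so both are assumptions here: a word counts as a keyword for a gene when its frequency in that gene's literature exceeds a multiple of its base value, and "clustering" is reduced to grouping genes under each shared keyword. All names, the `ratio` threshold, and the sample frequencies are hypothetical.

```python
from collections import defaultdict


def pick_keywords(gene_freq: dict[str, float],
                  base_freq: dict[str, float],
                  ratio: float = 2.0) -> set[str]:
    """Select words whose frequency exceeds `ratio` times the base value.

    Words absent from the base table have a base value of 0.0, so any
    occurrence makes them keywords (an assumption of this sketch).
    """
    return {w for w, f in gene_freq.items()
            if f > ratio * base_freq.get(w, 0.0)}


def cluster_by_keyword(keywords_by_gene: dict[str, set[str]]) -> dict[str, set[str]]:
    """Group genes under each keyword they share (a stand-in for cluster analysis)."""
    clusters: dict[str, set[str]] = defaultdict(set)
    for gene, keywords in keywords_by_gene.items():
        for keyword in keywords:
            clusters[keyword].add(gene)
    return dict(clusters)


# Made-up frequencies for two placeholder genes.
base = {"cell": 0.10, "apoptosis": 0.01}
freqs = {
    "geneA": {"cell": 0.10, "apoptosis": 0.05},
    "geneB": {"cell": 0.12, "apoptosis": 0.04},
}
keywords = {gene: pick_keywords(f, base) for gene, f in freqs.items()}
print(cluster_by_keyword(keywords))
```

Comparing against a base value rather than using raw counts is what keeps common words such as "cell" from dominating the keyword list in this sketch.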

Description

Technical field:

[0001] The invention relates to a system for retrieving relevant gene information from existing gene information databases, in particular to a system for retrieving specific function-related gene information from public gene information databases.

Technical background:

[0002] With the deepening of life science research, it is now known that abnormalities in specific biological functions are caused by abnormal expression of certain genes, or abnormal modification of their expression products, in organisms. These genes are called the genes involved in the biological function. For this reason, genome projects have been under way since the 1990s, and many genomes (e.g., yeast, human, rice, chicken, mouse, etc.) have been sequenced. Sequencing results show that the genome of microorganisms is composed of genes ranging from a few to several thousand; the genome of humans has more than 25,000 genes, which is equivalent to that of humans, and even up to hundre...

Application Information

IPC(8): G06F17/30, G06F19/00, G06F19/28
Inventors: 黄仲曦, 姚开泰
Owner: SOUTHERN MEDICAL UNIVERSITY