Systems and methods for leveraging relatedness in genomic data analysis

A technology concerning kinship and alleles, applied in genomics, biochemical equipment and methods, biological systems, and related fields. It addresses the problem that the utility of kinship and family structure in identifying and characterizing variants has not been fully understood or exploited.

Pending Publication Date: 2020-06-12
REGENERON PHARM INC

AI Technical Summary

Problems solved by technology

The utility of kinship and family structure in these large datasets, and the extent to which they can be exploited in the identification and characterization of variants, is not well understood.

Examples

Example 1.1

[0281] Relationship estimation and kinship clarification in a cohort with 61K human exomes

[0282] A cohort of 61K human exomes was analyzed. This cohort stems from a study initiated in 2014 and conducted through the Regeneron Genetics Center (RGC) and Geisinger Health System (GHS) (Dewey et al. (2016), Science 354, aaf6814-aaf6814). This DiscovEHR study densely sampled patients in a single healthcare system serving populations with low mobility. The 61K human exome cohort is referred to herein as the DiscovEHR dataset. A large number of family structures were identified in the DiscovEHR dataset, and the simulations disclosed herein predict that when the study reaches its target of 250K people, 70%-80% of the individuals in the dataset will have a first- or second-degree relative in the dataset.

[0283] Different types of family relationships within the dataset were identified using identity-by-descent (IBD) estimation, and PRIMUS (Staples et al. (2014), Am. J. Hum. Genet. 95, 553-564) was used to c...
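To illustrate how pairwise IBD estimates can be turned into relationship classes, here is a minimal Python sketch. It is not the procedure used in the patent or in PRIMUS. The class and function names are hypothetical, the kinship cutoffs are the power-of-two thresholds commonly used with KING-style kinship coefficients, and the IBD2 cutoff of 0.1 used to separate parent-child pairs from full siblings is an assumption chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IbdEstimate:
    """Genome-wide proportions of sites sharing 0, 1, or 2 alleles IBD."""
    ibd0: float
    ibd1: float
    ibd2: float

    def kinship(self) -> float:
        # Kinship coefficient: phi = IBD1 / 4 + IBD2 / 2.
        return self.ibd1 / 4 + self.ibd2 / 2


def classify(est: IbdEstimate) -> str:
    """Assign a coarse relationship class from IBD proportions.

    Thresholds are illustrative assumptions, not values from the patent.
    """
    phi = est.kinship()
    if phi > 0.354:
        return "duplicate or MZ twin"
    if phi > 0.177:
        # Both first-degree classes have phi near 0.25; parent-child pairs
        # share essentially no IBD2, full siblings share about 25% IBD2.
        return "parent-child" if est.ibd2 < 0.1 else "full sibling"
    if phi > 0.0884:
        return "second-degree"
    if phi > 0.0442:
        return "third-degree"
    return "unrelated or distant"


if __name__ == "__main__":
    pairs = {
        "parent-child": IbdEstimate(0.0, 1.0, 0.0),
        "full siblings": IbdEstimate(0.25, 0.5, 0.25),
        "half siblings": IbdEstimate(0.5, 0.5, 0.0),
        "first cousins": IbdEstimate(0.75, 0.25, 0.0),
    }
    for label, est in pairs.items():
        print(label, "->", classify(est))
```

Real estimates are noisy, so a production pipeline would typically work with confidence ranges and reconstruct pedigrees jointly rather than classify each pair in isolation.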

Example 1.2

[0294] Relationship estimation and phylogenetic interpretation in a cohort with 92K human exomes

[0295] A larger clinical cohort of 92,455 human exomes was analyzed. This cohort was derived from an ongoing study initiated in 2014 through the Regeneron Genetics Center (RGC) and Geisinger Health System (GHS) (Staples et al. (2018), Am. J. Hum. Genet. 102(5): 874-889). This expanded DiscovEHR cohort was also a dense sample of participants from a single health care system serving a mostly rural population with low migration in central Pennsylvania.

[0296] The set (Example 1.1) comprising the first 61K samples prepared and sequenced was called the "VCRome set". The remaining 31K samples were prepared by the same method, except that instead of NimbleGen probed capture, a slightly modified version of IDT's xGen probes was used, in which complementary probes were used to capture genomic regions covered by the NimbleGen VCRome capture reagent but, by the standard xGen probes, wi...

Example 2

[0306] Simulation and kinship projections using SimProgeny

[0307] To model, understand, and predict the growth of relational networks in the DiscovEHR and expanded DiscovEHR datasets, the Suppression Simulation Framework (hereafter referred to as "SimProgeny") was developed, which is capable of simulating the genealogy of millions of people. From these simulated populations, various sampling methods can be modeled, and the amount of kinship a researcher should expect to find for a given set of population and sampling parameters can be estimated (see Example 17).

[0308] The DiscovEHR and expanded DiscovEHR populations were simulated using SimProgeny, and the first 61K and first 92K participants were identified from them, respectively. Simulations indicated that DiscovEHR and expanded DiscovEHR participants were not randomly sampled from the population, but that the datasets were enriched for close kinship. As shown in Figure 14A and Figure 14B, the real data a...
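The idea of estimating expected kinship from a simulated population and a sampling scheme can be sketched in a few lines of Python. This is a toy model, not SimProgeny: it assumes a population made only of independent nuclear families, samples individuals uniformly at random, and counts first-degree relatives only. All function names and parameters are hypothetical.

```python
import random


def build_population(n_families, max_children, rng):
    """Return {person_id: set of first-degree relative ids}.

    Toy assumption: each family has two parents and 1..max_children children.
    """
    relatives = {}
    next_id = 0
    for _ in range(n_families):
        n_children = rng.randint(1, max_children)
        parents = [next_id, next_id + 1]
        children = list(range(next_id + 2, next_id + 2 + n_children))
        next_id += 2 + n_children
        for parent in parents:
            relatives[parent] = set(children)
        for child in children:
            # Parents and full siblings are first-degree relatives.
            relatives[child] = set(parents) | (set(children) - {child})
    return relatives


def fraction_with_relative(sample, relatives):
    """Fraction of sampled people with a first-degree relative in the sample."""
    sampled = set(sample)
    if not sampled:
        return 0.0
    hits = sum(1 for person in sampled if relatives[person] & sampled)
    return hits / len(sampled)


def simulate(n_families, max_children, sample_sizes, seed=0):
    """Grow a random sample and report relatedness at each sample size."""
    rng = random.Random(seed)
    relatives = build_population(n_families, max_children, rng)
    order = list(relatives)
    rng.shuffle(order)  # uniform random ascertainment order
    return {n: fraction_with_relative(order[:n], relatives) for n in sample_sizes}


if __name__ == "__main__":
    for n, frac in simulate(10_000, 3, [1_000, 5_000, 20_000]).items():
        print(f"sample of {n}: {frac:.1%} have a first-degree relative sampled")
```

Replacing the uniform shuffle with an ordering that favors members of already-sampled families would be one way to model the clustered ascertainment that the text says enriches the real datasets for close kinship.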

Abstract

Methods, non-transitory computer-implemented methods, and systems for identifying compound heterozygous mutations (CHMs) and de novo mutations (DNMs) in populations are provided. Also provided are methods for phasing genetic variants in a population by leveraging the population's relatedness. Further provided is a prediction model of relatedness in a human population.
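As a rough illustration of how relatedness helps with DNM detection and CHM phasing, the following Python sketch applies simple Mendelian-inheritance rules to a parent-offspring trio. It is a simplified assumption-laden example, not the claimed method: genotypes are coded as alternate-allele counts (0, 1, 2), genotype calls are taken as error-free, and the function names are hypothetical.

```python
from typing import Optional, Tuple


def is_candidate_dnm(child: int, mother: int, father: int) -> bool:
    """A heterozygous child call with both parents homozygous reference."""
    return child == 1 and mother == 0 and father == 0


def parental_origin(child: int, mother: int, father: int) -> Optional[str]:
    """Infer which parent transmitted the alternate allele, if unambiguous."""
    if child != 1:
        return None
    if mother >= 1 and father == 0:
        return "maternal"
    if father >= 1 and mother == 0:
        return "paternal"
    return None  # both or neither parent carries the allele


def phase_pair(child: Tuple[int, int],
               mother: Tuple[int, int],
               father: Tuple[int, int]) -> str:
    """Phase two heterozygous variants in the same gene using the parents.

    Returns "trans" (a potential CHM), "cis", or "unknown".
    """
    origin_a = parental_origin(child[0], mother[0], father[0])
    origin_b = parental_origin(child[1], mother[1], father[1])
    if origin_a is None or origin_b is None:
        return "unknown"
    return "trans" if origin_a != origin_b else "cis"


if __name__ == "__main__":
    # One variant inherited from each parent: the pair is in trans.
    print(phase_pair(child=(1, 1), mother=(1, 0), father=(0, 1)))
    # Both variants inherited from the mother: the pair is in cis.
    print(phase_pair(child=(1, 1), mother=(1, 1), father=(0, 0)))
    print(is_candidate_dnm(child=1, mother=0, father=0))
```

When parents are not both sequenced, other relatives or population-based phasing would be needed to resolve the "unknown" cases.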

Description

[0001] Cross References to Related Applications

[0002] This application claims the benefit of U.S. Provisional Patent Application No. 62/555,597, filed September 7, 2017, which is hereby incorporated by reference in its entirety. In addition, the entire contents of the co-pending application titled "System and Method for Predicting Relatedness in a Human Population", filed on September 7, 2018, are also incorporated by reference.

Technical Field

[0003] The present disclosure generally relates to methods and systems for analyzing genomic data and linking rare genetic variation to disease and disease susceptibility using kinship across large population cohorts. More specifically, the present disclosure relates to systems and methods for establishing identity by descent, and for phasing genetic variants to identify compound heterozygous mutations or de novo mutations.

Background

[0004] Human disease conditions are not only caused by and influenced by enviro...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16B20/20, G16B20/40, G16B5/00, G16H70/60, G16H50/50, G16B30/00, G16B40/00, G16H10/60, G06F16/28, C12Q1/68
CPC: G16B40/00, G16B20/20, G16B5/00, G16B30/00, G16H10/60, G16H50/50, G16H70/60, G06F16/288, G16B20/00
Inventor: J. Staples, L. Habegger, J. Reid
Owner: REGENERON PHARM INC