Unassociated Genomic Regions in the Human Reference Genome
Introduction to Non-Reference Sequences (NRS)
Non-reference sequences (NRS) are genomic regions that are not represented in the current human reference genome. These sequences can include structural variations such as insertions, deletions, and alternate alleles, which may have significant functional implications. Recent studies have highlighted the importance of identifying and characterizing these NRS to enhance our understanding of human genetic diversity and disease mechanisms.
Identification and Characterization of NRS
Structural Variations and Alternate Alleles
A comprehensive study compared 31 human de novo assemblies with the current reference genome and identified 6113 NRS totaling 12.8 Mb. These included 1571 insertions and 3041 alternate alleles, where an alternate allele was defined as a sequence with less than 90% identity to the corresponding reference allele. The alternate alleles overlapped 1143 protein-coding genes, included a novel MHC haplotype, and had high tandem repeat content, suggesting that they originated from tandem repeats.
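The 90% identity cutoff can be illustrated with a short Python sketch. The summary does not say how the study computed identity, so this version makes an assumption: identity is the share of matching columns in a pairwise alignment, with gap columns counted as mismatches. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns where both sequences carry the same base.

    Assumption: '-' marks a gap, and a column with a gap in one sequence
    counts as a mismatch. Columns that are gaps in both are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def is_alternate_allele(aligned_ref: str, aligned_alt: str,
                        threshold: float = 90.0) -> bool:
    """True when identity to the reference allele is below the threshold."""
    return percent_identity(aligned_ref, aligned_alt) < threshold


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    alt = "AC--ACGAAC"  # toy alignment, not real data
    print(percent_identity(ref, alt), is_alternate_allele(ref, alt))
```

Under this definition a sequence at exactly 90% identity is not called an alternate allele, because the text specifies "less than 90%".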
Global Genetic Diversity
Another study analyzed 338 high-quality human assemblies from diverse populations to identify sequences missing from the reference genome. It found 127,727 recurrent non-reference unique insertions spanning 18,048,877 bp (about 18 Mb), some of which disrupted exons and regulatory elements. This led to the construction of a Human Diversity Reference, which enabled the recovery of previously unmapped reads and revealed transcription evidence in 4781 gene loci, underscoring the functional importance of these NRS.
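A minimal sketch of what "recurrent" could mean in practice: an insertion counts as recurrent when it appears in at least some minimum number of assemblies. The exact-match comparison, the threshold of two assemblies, and the `(chromosome, position, sequence)` record layout are all assumptions made for illustration; a real pipeline would more likely cluster near-identical insertions. The second function only restates the arithmetic implied by the reported totals.

```python
from collections import Counter


def recurrent_insertions(calls_by_assembly, min_assemblies=2):
    """Return insertions seen in at least `min_assemblies` assemblies.

    `calls_by_assembly` maps an assembly name to an iterable of
    (chromosome, position, inserted_sequence) tuples (hypothetical layout).
    """
    counts = Counter()
    for calls in calls_by_assembly.values():
        counts.update(set(calls))  # count each insertion once per assembly
    return {ins: n for ins, n in counts.items() if n >= min_assemblies}


def mean_insertion_length(total_bp, n_insertions):
    """Average insertion length implied by a total span and a count."""
    return total_bp / n_insertions


if __name__ == "__main__":
    toy_calls = {  # toy data, not from the study
        "asm1": [("chr1", 1000, "ACGTT"), ("chr2", 500, "GGA")],
        "asm2": [("chr1", 1000, "ACGTT")],
        "asm3": [("chr1", 1000, "ACGTT"), ("chr3", 42, "TTTT")],
    }
    print(recurrent_insertions(toy_calls))
    # Totals reported in the text: 18,048,877 bp across 127,727 insertions.
    print(round(mean_insertion_length(18_048_877, 127_727), 1))
```

Applied to the reported totals, the mean works out to roughly 141 bp per insertion, which suggests that most of these insertions are short.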
Epigenomic Insights and Regulatory Elements
The NIH Roadmap Epigenomics Consortium generated 111 reference human epigenomes, profiling histone modifications, DNA accessibility, DNA methylation, and RNA expression. This integrative analysis revealed that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, highlighting the role of epigenomic information in understanding gene regulation and human disease.
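The idea of variants being "enriched" in epigenomic marks can be sketched as a fold-enrichment ratio: the fraction of trait-associated variants that fall inside marked regions, divided by the same fraction for a background set of variants. This is a simplified illustration, not the Consortium's actual statistical procedure; the function names, the half-open interval convention, and the single-chromosome coordinates are assumptions.

```python
def fraction_in_intervals(positions, intervals):
    """Fraction of positions inside any half-open [start, end) interval."""
    if not positions:
        raise ValueError("positions must not be empty")
    hits = sum(1 for p in positions
               if any(start <= p < end for start, end in intervals))
    return hits / len(positions)


def fold_enrichment(trait_positions, background_positions, intervals):
    """Ratio of the trait-variant overlap rate to the background rate."""
    background_rate = fraction_in_intervals(background_positions, intervals)
    if background_rate == 0:
        raise ValueError("no background variant overlaps the marked regions")
    return fraction_in_intervals(trait_positions, intervals) / background_rate


if __name__ == "__main__":
    # Toy coordinates on one chromosome, not real data.
    marked_regions = [(100, 200), (500, 600)]
    trait_variants = [150, 550, 700, 120]
    background_variants = [10, 150, 300, 400, 800, 900, 950, 1000]
    print(fold_enrichment(trait_variants, background_variants, marked_regions))
```

A ratio above 1 means trait variants overlap the marks more often than background variants do. A real analysis would also need a significance test and a background matched for properties such as allele frequency.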
Challenges in Heterochromatic Regions
Heterochromatic regions, composed primarily of tandem repeats like Human Satellites 2 and 3 (HSat2,3), present significant challenges for standard mapping and assembly algorithms. A novel alignment-free method classified HSat2,3 sequences into fourteen subfamilies and predicted their chromosomal distributions. This approach identified 1.3 Mb of non-repetitive sequence interspersed with HSat2,3, including eight annotated gene predictions, providing a framework for studying these complex regions.
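To show what "alignment-free" classification can look like, the sketch below assigns a read to whichever reference sequence shares the most k-mers with it, scored by Jaccard similarity. This is a generic k-mer approach chosen for illustration, not the method used in the study, which the summary does not describe. The subfamily names and sequences are toy placeholders.

```python
def kmers(seq, k):
    """Set of all length-k substrings of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(read, references, k=5):
    """Return (best_label, scores) by k-mer Jaccard similarity.

    `references` maps a label to a representative sequence; no alignment
    between the read and any reference is computed.
    """
    read_kmers = kmers(read, k)
    scores = {label: jaccard(read_kmers, kmers(ref, k))
              for label, ref in references.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    toy_references = {  # hypothetical subfamily representatives
        "subfamA": "CATTCCATTCCATTCCATTC",
        "subfamB": "CATTCGGGTTCATTCGGGTT",
    }
    label, scores = classify("ATTCCATTCCATT", toy_references)
    print(label, scores)
```

Because the comparison uses only k-mer content, it is unaffected by the ambiguous read placement that makes alignment unreliable in long tandem-repeat arrays.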
Advances in Complete Chromosome Assembly
The first gapless, telomere-to-telomere assembly of a human chromosome was achieved using high-coverage, ultra-long-read nanopore sequencing. The effort closed all 29 remaining gaps in the human X chromosome and added new sequence from pseudoautosomal regions and cancer-testis ampliconic gene families. With the complete assembly, methylation patterns could be mapped across complex tandem repeats and satellite arrays, demonstrating the feasibility of finishing the human genome.
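Unresolved gaps in an assembly are conventionally written as runs of the letter N in the sequence. The sketch below locates such runs, so a gapless sequence returns an empty list. It works on a plain sequence string and does not parse FASTA files; the function name and the `min_len` parameter are assumptions.

```python
import re


def find_gaps(sequence, min_len=1):
    """Return (start, end) half-open coordinates of runs of N/n.

    Runs shorter than `min_len` are skipped, since isolated Ns may be
    ambiguous base calls rather than assembly gaps.
    """
    return [(m.start(), m.end())
            for m in re.finditer(r"[Nn]+", sequence)
            if m.end() - m.start() >= min_len]


if __name__ == "__main__":
    toy_sequence = "ACGTNNNNACGTNNAC"  # toy sequence, not real data
    print(find_gaps(toy_sequence))             # both N runs
    print(find_gaps(toy_sequence, min_len=3))  # only the longer run
    print(find_gaps("ACGTACGT"))               # gapless: empty list
```

Counting the returned intervals for one chromosome gives a gap tally of the kind cited above.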
Missing Genes and Transcriptome Analysis
RNA-Seq analysis of human brain tissues and mixed cell lines revealed that many transcribed regions are absent from the current reference genome. This study identified 104 RefSeq genes with significant expression levels that were unalignable to the reference genome, suggesting the presence of functional genes not yet represented. Additionally, hundreds of novel transcript contigs were discovered, some conserved among humans, chimpanzees, and macaques, underscoring the need for further refinement of the reference genome.
Conclusion
The identification and characterization of non-reference sequences are crucial for a comprehensive understanding of human genetic diversity and disease mechanisms. Advances in sequencing technologies and integrative analyses have revealed significant gaps and variations in the current reference genome, highlighting the need for continuous refinement and the inclusion of diverse genetic backgrounds. These efforts will enhance the accuracy of genomic studies and improve our understanding of human biology and disease.
Most relevant research papers on this topic
Recovery of non-reference sequences missing from the human reference genome
Integrative analysis of 111 reference human epigenomes
Towards a reference genome that captures global genetic diversity
Genomic Characterization of Large Heterochromatic Gaps in the Human Genome Assembly
Telomere-to-telomere assembly of a complete human X chromosome
Revealing the missing expressed genes beyond the human reference genome by RNA-Seq
A novel canine reference genome resolves genomic architecture and uncovers transcript complexity
Characterization of Missing Human Genome Sequences and Copy-number Polymorphic Insertions
Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype
Improved maize reference genome with single-molecule technologies