Unassociated Genomic Regions in the Human Reference Genome
Introduction to Non-Reference Sequences (NRS)
Non-reference sequences (NRS) are genomic regions present in some individuals but not represented in the current human reference genome. They arise mainly from structural variation, such as insertions relative to the reference and alternate alleles that are highly divergent from the reference allele, and they may have significant functional implications. Recent studies have highlighted the importance of identifying and characterizing NRS to improve our understanding of human genetic diversity and disease mechanisms.
Identification and Characterization of NRS
Structural Variations and Alternate Alleles
A comprehensive study compared 31 human de novo assemblies with the current reference genome and identified 6113 NRS totaling 12.8 Mb. These included 1571 insertions and 3041 alternate alleles, the latter defined as having less than 90% identity with the corresponding reference alleles. The alternate alleles overlapped 1143 protein-coding genes, including a novel MHC haplotype, and had high tandem repeat content, suggesting that they originated from these repeats [1].
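The 90% identity criterion for alternate alleles can be made concrete with a short calculation. The sketch below is a minimal illustration, assuming the sample and reference alleles have already been aligned (gaps written as `-`) and defining identity as matching columns divided by all non-empty alignment columns; the cited study's exact identity definition may differ, and the function names are hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity between two rows of the same pairwise alignment.

    '-' marks a gap. Columns where both rows are gaps are ignored; every
    other column counts toward the denominator, so gaps lower identity.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned rows must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aln_a.upper(), aln_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    # Gap-gap columns were removed, so a == b here always means a base match.
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def is_alternate_allele(sample_row: str, ref_row: str, threshold: float = 90.0) -> bool:
    """Hypothetical rule: call the sample allele 'alternate' if identity < threshold."""
    return percent_identity(sample_row, ref_row) < threshold


print(percent_identity("ACGT-CGTAC", "ACGTTCGTAA"))     # prints 80.0
print(is_alternate_allele("ACGT-CGTAC", "ACGTTCGTAA"))  # prints True
```

Note that an allele at exactly 90% identity is not called alternate under a strict "less than 90%" rule; whether gaps count toward the denominator is a design choice that can shift borderline calls.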
Global Genetic Diversity
Another study analyzed 338 high-quality human assemblies from diverse populations to identify sequences missing from the reference genome. The authors discovered 127,727 recurrent non-reference unique insertions spanning 18,048,877 bp, some of which disrupted exons and regulatory elements. These insertions were used to construct a Human Diversity Reference, which enabled the recovery of previously unmapped reads and the identification of transcription evidence at 4781 gene loci, underscoring the functional importance of these NRS [3].
Epigenomic Insights and Regulatory Elements
The NIH Roadmap Epigenomics Consortium generated 111 reference human epigenomes, profiling histone modifications, DNA accessibility, DNA methylation, and RNA expression. This integrative analysis revealed that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, highlighting the role of epigenomic information in understanding gene regulation and human disease [2].
Challenges in Heterochromatic Regions
Heterochromatic regions, composed primarily of tandem repeats such as Human Satellites 2 and 3 (HSat2,3), present significant challenges for standard mapping and assembly algorithms. A novel alignment-free method classified HSat2,3 sequences into fourteen subfamilies and predicted their chromosomal distributions. This approach identified 1.3 Mb of non-repetitive sequence interspersed with HSat2,3, including eight annotated gene predictions, providing a framework for studying these complex regions [4].
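To illustrate what "alignment-free" classification means in general, the toy sketch below assigns a read to whichever subfamily shares the largest fraction of its k-mers, with no alignment step. It is not the method of the cited study: the subfamily names and repeat sequences are hypothetical placeholders (not real HSat2,3 sequences), and the use of canonical k-mers (a k-mer or its reverse complement, whichever sorts first) is an assumption so that reads from either strand score the same.

```python
from __future__ import annotations

# Translation table for DNA complement (uppercase A/C/G/T only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    revcomp = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, revcomp)


def kmer_set(seq: str, k: int) -> set[str]:
    """Canonical k-mers of `seq`, skipping any k-mer with non-ACGT characters."""
    seq = seq.upper()
    return {
        canonical(seq[i:i + k])
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    }


def classify_read(read: str, subfamilies: dict[str, set[str]], k: int) -> str | None:
    """Assign `read` to the subfamily sharing the largest fraction of its k-mers.

    Returns None if the read is shorter than k or shares no k-mer with any subfamily.
    """
    read_kmers = kmer_set(read, k)
    if not read_kmers:
        return None
    best_name, best_score = None, 0.0
    for name, kmers in subfamilies.items():
        score = len(read_kmers & kmers) / len(read_kmers)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical toy "subfamilies" built from short repeat units.
SUBFAMILIES = {
    "subfam_A": kmer_set("ATTCCATTCCATTCCATTCC", 5),
    "subfam_B": kmer_set("AACCGAACCGAACCGAACCG", 5),
}

# A read from the opposite strand of subfam_A still classifies correctly.
print(classify_read("GGAATGGAATGGAAT", SUBFAMILIES, 5))  # prints subfam_A
```

Scoring by the shared fraction of distinct k-mers is chosen here only for clarity; a practical classifier would likely weigh k-mer frequencies and rely on k-mers that are specific to each subfamily.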
Advances in Complete Chromosome Assembly
The first gapless, telomere-to-telomere assembly of a human chromosome was achieved using high-coverage, ultra-long-read nanopore sequencing. This effort closed all 29 remaining gaps in the human X chromosome, including new sequences from pseudoautosomal regions and cancer-testis ampliconic gene families. The complete assembly allowed methylation patterns to be mapped across complex tandem repeats and satellite arrays, demonstrating the feasibility of finishing the human genome [5].
Missing Genes and Transcriptome Analysis
RNA-Seq analysis of human brain tissues and mixed cell lines revealed that many transcribed regions are absent from the current reference genome. This study identified 104 RefSeq genes with significant expression levels that were unalignable to the reference genome, suggesting the presence of functional genes not yet represented. Additionally, hundreds of novel transcript contigs were discovered, some conserved among humans, chimpanzees, and macaques, underscoring the need for further refinement of the reference genome [6].
Conclusion
The identification and characterization of non-reference sequences are crucial for a comprehensive understanding of human genetic diversity and disease mechanisms. Advances in sequencing technologies and integrative analyses have revealed significant gaps and variations in the current reference genome, highlighting the need for continuous refinement and the inclusion of diverse genetic backgrounds. These efforts will enhance the accuracy of genomic studies and improve our understanding of human biology and disease.
References
[1] Recovery of non-reference sequences missing from the human reference genome
This study identified 6113 non-reference sequences and 3041 alternate alleles in the human genome, enriching the known spectrum of genetic variation, and suggested that their origin is associated with tandem repeats.
[2] Integrative analysis of 111 reference human epigenomes
The 111 reference human epigenomes provide a valuable resource for understanding gene regulation, cellular differentiation, and human disease.
[3] Towards a reference genome that captures global genetic diversity
The Human Diversity Reference, a comprehensive reference genome that captures global human genetic diversity, can recover 402,573 previously unmapped reads and improve genome annotations.
[4] Genomic Characterization of Large Heterochromatic Gaps in the Human Genome Assembly
This study presents a novel method for generating reference databases for unassembled genomic regions enriched in complex satellite DNA, enabling genomic studies of heterochromatic regions and of patterns of sequence variation within human populations.
[5] Telomere-to-telomere assembly of a complete human X chromosome
The study presents a de novo human genome assembly that surpasses the continuity of GRCh38 and completes the human X chromosome, demonstrating that finishing the human genome is now within reach.
[6] Revealing the missing expressed genes beyond the human reference genome by RNA-Seq
A significant number of functional human genes are still absent from the incomplete human reference genome, highlighting the need to further refine and curate these genes.
[7] A novel canine reference genome resolves genomic architecture and uncovers transcript complexity
The new canine reference genome, GSD_1.0, improves the resolution of genomic architecture and reveals previously hidden functional elements that may influence phenotype.
[8] Characterization of Missing Human Genome Sequences and Copy-number Polymorphic Insertions
This study identified 2,363 new insertion sequences in the human genome, revealing new exons and conserved noncoding sequences not yet represented in the reference genome, and developed a method to accurately genotype these insertions using next-generation sequencing datasets.
[9] Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype
HISAT2 provides more detailed and accurate variant analyses than other methods, enabling HLA typing and DNA fingerprinting with low memory requirements.
[10] Improved maize reference genome with single-molecule technologies
This study improved the maize reference genome using single-molecule technologies, achieving a 52-fold increase in contig length and improved assembly of intergenic spaces and centromeres.