Research Progress in Area 3

Developing a Novel Genomic Scaffolding Toolkit with the Nanopore DNA Modification Information and its Applications on Deep-sea Invertebrate Genomes
Prof. Runsheng Li, City University of Hong Kong

Abstract
The current practice for assembling the genome of a non-model animal combines long reads with chromosomal linkage information from high-throughput chromosome conformation capture (Hi-C). However, Hi-C experiments are costly, involve multiple complex experimental steps, and require high-quality chromatin samples. Recent studies have shown that Hi-C linkage closely resembles DNA modification linkage, which makes it possible to scaffold genomes using whole-genome DNA modification information. In this project, the team proposes to develop a DNA modification-based scaffolding toolkit for Oxford Nanopore sequencing data and to test it on three marine invertebrate genomes. The resulting well-annotated chromosome-level genomes, together with their DNA modification information, would shed new light on the evolutionary analysis of chromosomal synteny in marine invertebrates.

Research Activities and Progress
• Collected genome-wide 5mC modification data from three mussel species using an in-house Megalodon model;
• Developed the modscaf (modification-based scaffolding) pipeline, which uses a binned modification linkage matrix to scaffold contigs;
• Published a pipeline called “trackcluster” and used it to improve genome annotation with full-length Nanopore long-read RNA-seq data.

Key Findings
• Genome annotation can be significantly improved with long RNA-seq reads using the “trackcluster” pipeline;
• Chromosome-level assemblies can be achieved with the “modscaf” pipeline using Nanopore-only data: in the scaly-foot snail genome assembly project, modscaf scaffolded over 97% of the contigs into chromosomes.

Research Output
Publications: 0
Trained personnel: 2

Fig. 1 The scaffolding pipeline using methylation information to resolve highly repetitive genomic regions.
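The report describes modscaf only at a high level ("a binned modification linkage matrix to scaffold contigs"). As a rough illustration, the sketch below assumes the pipeline works like greedy Hi-C-style scaffolding. A bin-by-bin modification linkage matrix (here hypothetically computed as the Pearson correlation of per-bin 5mC feature vectors in `bin_linkage`) is averaged into a contig-by-contig matrix, and contigs are then joined strongest link first into linear scaffolds. The function names, the `min_link` threshold, and the toy data are all hypothetical and do not describe modscaf's actual implementation.

```python
import numpy as np


def bin_linkage(features: np.ndarray) -> np.ndarray:
    """Hypothetical bin-by-bin linkage: Pearson correlation of per-bin 5mC features.

    `features` has shape (n_bins, n_features), one row per genomic bin.
    Rows with constant values yield NaN correlations, which are set to 0.
    """
    return np.nan_to_num(np.corrcoef(features))


def contig_linkage(bin_matrix: np.ndarray, bin_to_contig: np.ndarray,
                   n_contigs: int) -> np.ndarray:
    """Average the bin-by-bin matrix into a symmetric contig-by-contig matrix."""
    link = np.zeros((n_contigs, n_contigs))
    for a in range(n_contigs):
        rows = bin_to_contig == a
        for b in range(a + 1, n_contigs):
            cols = bin_to_contig == b
            link[a, b] = link[b, a] = bin_matrix[np.ix_(rows, cols)].mean()
    return link


def greedy_scaffold(link: np.ndarray, min_link: float) -> list[list[int]]:
    """Join contigs into ordered linear scaffolds, strongest links first."""
    n = link.shape[0]
    parent = list(range(n))  # union-find forest: one set per partial scaffold

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    degree = [0] * n
    neighbours: list[list[int]] = [[] for _ in range(n)]
    edges = sorted(((link[a, b], a, b) for a in range(n) for b in range(a + 1, n)),
                   reverse=True)
    for weight, a, b in edges:
        if weight < min_link:  # remaining links are too weak to join
            break
        # each contig gets at most two neighbours, and union-find rejects cycles
        if degree[a] < 2 and degree[b] < 2 and find(a) != find(b):
            parent[find(a)] = find(b)
            degree[a] += 1
            degree[b] += 1
            neighbours[a].append(b)
            neighbours[b].append(a)

    # walk every path from one of its ends to list contigs in scaffold order
    scaffolds, seen = [], set()
    for start in range(n):
        if start in seen or degree[start] == 2:
            continue
        path, prev, node = [], None, start
        while node is not None:
            path.append(node)
            seen.add(node)
            nxt = [x for x in neighbours[node] if x != prev]
            prev, node = node, (nxt[0] if nxt else None)
        scaffolds.append(path)
    return scaffolds


def toy_bin_matrix() -> tuple[np.ndarray, np.ndarray]:
    """Toy data: chrX = contigs 0, 2, 4 and chrY = contigs 1, 3, two bins each.

    Linkage decays with bin distance within a chromosome and is zero between
    chromosomes, mimicking the distance decay of Hi-C maps (an assumption).
    """
    layout = {0: ("X", 0), 2: ("X", 1), 4: ("X", 2), 1: ("Y", 0), 3: ("Y", 1)}
    bins = [(contig, chrom, 2 * rank + i)
            for contig, (chrom, rank) in layout.items() for i in range(2)]
    m = np.zeros((len(bins), len(bins)))
    for i, (_, chrom_i, pos_i) in enumerate(bins):
        for j, (_, chrom_j, pos_j) in enumerate(bins):
            if chrom_i == chrom_j:
                m[i, j] = 1.0 / (1 + abs(pos_i - pos_j))
    return m, np.array([contig for contig, _, _ in bins])


if __name__ == "__main__":
    m, bin_to_contig = toy_bin_matrix()
    link = contig_linkage(m, bin_to_contig, n_contigs=5)
    print(greedy_scaffold(link, min_link=0.05))  # [[0, 2, 4], [1, 3]]
```

On the toy input, the greedy join recovers the contig order within each chromosome and keeps the two chromosomes apart, because cross-chromosome linkage falls below `min_link`. It does not infer contig orientation or handle repeats; a real scaffolder would need to compare linkage at each contig end to do so.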