Our reference dog: Dauphine!
The first and most obvious step in creating a reference genome is to identify a dog who can serve as a quality reference animal! We were very fortunate to be able to work with members of the Irish Wolfhound community in our area to identify Dauphine, a very sweet purebred female Irish Wolfhound. Using a female for this work is very helpful because she carries two copies of the X chromosome, which means we can do a good job creating a reference for all of the chromosomes, including the X. This obviously leaves out the Y chromosome, which is notoriously hard to sequence and assemble because it is small and highly repetitive, and which is only just starting to be assembled in dogs. In the future we may consider finding a male dog to assemble a Y chromosome map, but a lot of kinks still need to be worked out regarding the canine Y chromosome in general.
The other good news about Dauphine is that we were able to take a small skin biopsy from her last year and culture the cells. This is important because it gives us a long-term source of her DNA if needed, even after she passes away.
With some extra funding, the first thing we did was obtain more Pacific Biosciences long-read sequencing of Dauphine’s DNA. Even just a few years ago, long-read sequencing had a rather unfortunate downside: although it could read long stretches of DNA code, it wasn’t always very accurate. Newer technologies have since arrived, and long-read sequencing is now far more accurate and reliable.
What’s the plan for making the assembly?
One of the things that makes a genome very high quality is what is referred to as “read depth.” In short, read depth is the number of times a given part of the DNA has been sequenced. The overlapping reads are then compared to determine a consensus for each base pair. So having more sequencing data really improves the accuracy of the genome assembly and helps to make sure that the genome is being put together in the correct order. To be even more sure that we have every part of Dauphine’s DNA in the right place, we are also going to use another technology referred to as optical mapping. The optical mapping for this project is being undertaken by our collaborator, Dr. Brian Davis at Texas A&M University. He’s very interested in reference genomes and has a particular interest in osteosarcoma, so it’s great to have him on our team!
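To make read depth and consensus a little more concrete, here is a toy Python sketch. It assumes the reads have already been placed at their correct positions. Real assemblers such as hifiasm have to work that out themselves and use far more sophisticated error correction. The sketch simply counts how many reads cover each position (the depth) and takes a majority vote at each position (the consensus). The function names are hypothetical and purely illustrative.

```python
from collections import Counter


def pileup(reads, length):
    """Count the bases observed at each position.

    `reads` is a list of (start, sequence) pairs that are ASSUMED to be
    already placed on a shared coordinate system of size `length`.
    """
    columns = [Counter() for _ in range(length)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < length:
                columns[pos][base] += 1
    return columns


def depth_and_consensus(reads, length):
    """Return (per-position read depth, majority-vote consensus string)."""
    columns = pileup(reads, length)
    # Read depth: how many reads cover each position.
    depth = [sum(col.values()) for col in columns]
    # Consensus: the most common base at each position, or 'N' if uncovered.
    consensus = "".join(
        col.most_common(1)[0][0] if col else "N" for col in columns
    )
    return depth, consensus


# Toy example: the second read carries a sequencing error (T instead of A
# at position 4), but the other reads covering that position outvote it.
reads = [(0, "ACGTAC"), (2, "GTTCGA"), (1, "CGTACG"), (4, "ACGA")]
depth, consensus = depth_and_consensus(reads, 10)
print(depth)      # [1, 2, 3, 3, 4, 4, 3, 2, 0, 0]
print(consensus)  # ACGTACGANN
```

The single erroneous base is corrected because position 4 has a depth of 4. Positions covered by only one read have no such safety net, and positions with no reads at all cannot be called. This is why more sequencing data improves accuracy.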
Where are we at this point?
We have added more sequencing data from Dauphine’s genome to further optimize the final genome assembly, and we are now awaiting Bionano Saphyr optical mapping data. Currently, we have a very high-quality scaffolded draft reference assembly. Several measures are used to evaluate the completeness and quality of a genome assembly. The N50 is the length such that sequences of that length or longer make up at least half of the assembly, and the N75 is defined the same way at 75%. Larger values mean the genome is held in fewer, longer pieces. The N50 of the current draft scaffolded assembly is 64.5 Mb, the N75 is 50.0 Mb, and the assembly consists of only 140 contigs. The BUSCO score for gene completeness, which measures how many genes expected to be present as single copies in mammals are found complete in the assembly, is 89.2% for the mammalia gene set. These values indicate an excellent assembly at this stage. After optical mapping is complete, we still have to perform gap filling and polishing to finish the assembly process.
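For readers curious about how N50 and N75 are calculated, here is a minimal Python sketch of the standard definition. First sort the sequence lengths from longest to shortest. Then add them up until the running total reaches the chosen percentage of the whole assembly; the length at which that happens is the Nx value. The function name `nx` and the example lengths below are hypothetical toy values, not Dauphine’s actual contigs.

```python
def nx(lengths, percent):
    """Return the Nx statistic (e.g. percent=50 gives N50).

    Nx is the largest length L such that sequences of length >= L together
    make up at least `percent`% of the total assembly length.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding issues.
        if running * 100 >= percent * total:
            return length
    # Not reached: with percent <= 100 the loop always returns.
    raise AssertionError("unreachable")


# Hypothetical toy assembly of seven sequences (lengths in Mb, total 300 Mb).
toy = [80, 70, 50, 40, 30, 20, 10]
print(nx(toy, 50))  # 70  (80 + 70 = 150 Mb, half of 300)
print(nx(toy, 75))  # 40  (80 + 70 + 50 + 40 = 240 Mb, at least 225)
```

Because these statistics depend only on the sorted lengths, closing gaps between pieces during scaffolding directly raises the N50 and N75.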
To be blunt, Dauphine’s current genome assembly, even in draft form, is potentially one of the best dog genome assemblies produced to date. We were really amazed at the quality and quantity of long-read sequencing data that the UW-Madison Biotechnology Center was able to provide for Dauphine. The additional funds provided by the Irish Wolfhound Foundation and the Irish Wolfhound Club of America have a lot to do with this. Optical mapping is going to make the final product even better, but at this point we can say with confidence that this work will result in a state-of-the-art reference genome. It will take some time to construct the final reference genome. These are gigantic data sets that require very powerful, specialized computers and a lot of time to process. However, once the assembly is complete and we have been able to use it for the osteosarcoma project, we will release it on public databases so that anyone in the world can access it. This may be a couple of years down the road, but this reference genome will rival even the most widely used dog reference genomes in quality. More importantly, the entire hound clade, and Irish Wolfhounds in particular, will have a reference genome that can be used for future studies.
If interested, here is some technical jargon:
This assembly is being built using PacBio Sequel II, Oxford Nanopore PromethION and Illumina NovaSeq sequencing. We used PacBio high-fidelity (HiFi) circular consensus sequencing reads to generate the draft assembly with the hifiasm assembler. We have also carried out chromosomal scaffolding using a program called SALSA. At this point, we are ready to undertake a final round of scaffolding using optical mapping. This will be followed by gap filling and genome polishing to generate a reference-quality genome assembly for the IW breed.
This work was supported by: The Irish Wolfhound Foundation, the Irish Wolfhound Club of America, a grant from the American Kennel Club Canine Health Foundation (02782), and a Companion Animal Grant (University of Wisconsin – Madison School of Veterinary Medicine).