This is part of a two-part blog; Part 1 can be found here. Okay, you’ve created your Hi-C library and the analytical QC checks discussed in Part 1 look good, so now you are ready to sequence. What do you need to know?
Again, we will focus on the requirements of the Dovetail® Omni-C® and Micro-C Kits and this time delve into the computational QC process enabling you to build confidence in the quality of the data generated.
With the assay completed and your library in hand, you are now ready to assess its quality. The following four steps walk you through that assessment.
Sequence the library using paired-end sequencing (2 x 75 bp, 2 x 100 bp, or 2 x 150 bp). Given the Dovetail® computational QC process requirements, you’ll need to sequence between 1 and 2 million total read pairs. Whether you perform the sequencing yourself or submit to a service provider, your Omni-C or Micro-C libraries can be treated like any other high-diversity next-generation sequencing library. No special sequencing considerations are required.
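Before running the QC analysis, it can help to confirm that the run actually delivered the 1 to 2 million read pairs mentioned above. The following is a minimal sketch, not part of the Dovetail workflow: the function names are hypothetical, and it assumes standard four-line FASTQ records (no wrapped sequence lines) in separate R1 and R2 files.

```python
import gzip


def count_fastq_records(lines):
    """Count FASTQ records in an iterable of lines (assumes four lines per record)."""
    n_lines = sum(1 for _ in lines)
    if n_lines % 4 != 0:
        raise ValueError(f"truncated FASTQ: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def count_read_pairs(r1_path, r2_path):
    """Count read pairs from R1/R2 FASTQ files (plain text or gzip-compressed)."""
    counts = []
    for path in (r1_path, r2_path):
        opener = gzip.open if str(path).endswith(".gz") else open
        with opener(path, "rt") as handle:
            counts.append(count_fastq_records(handle))
    if counts[0] != counts[1]:
        raise ValueError(f"R1 has {counts[0]} reads but R2 has {counts[1]}")
    return counts[0]


def within_qc_depth(n_pairs, low=1_000_000, high=2_000_000):
    """True if the pair count falls in the 1-2 million range used for QC."""
    return low <= n_pairs <= high
```

For example, `within_qc_depth(count_read_pairs("lib_R1.fastq.gz", "lib_R2.fastq.gz"))` would report whether a library (with placeholder file names) falls in the expected range.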
TIP: Check the quality of your sequencing run and, if needed, reach out to your sequencing service provider.
Dovetail’s readthedocs page has easy-to-use guidelines for a QC analysis that takes a few hours to run, depending on the sample genome and the sequencing depth. The readthedocs pages for Dovetail® Omni-C® and Micro-C Kits contain a transparent description of the workflow and the tools used to process the read pairs. You can rest easy knowing that our analysis approach is aligned with the 4D Nucleome Consortium best practices. To access these pages, click on the links below:
Here’s a quick run-through of the pipeline:
Once the workflow is completed, a simple Python script counts and summarizes the key QC metrics of a proximity-ligation library and displays them in a table. We have created a guide that walks you through how each QC metric is computed and what it means.
The summary table reports key QC metrics: library complexity and percentages of PCR duplicates, valid interaction read pairs, and long-range interactions captured in the library.
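To make the idea of such a summary concrete, here is a rough sketch of how percentages like these could be derived from raw pair counts. This is not the Dovetail script: the `PairCounts` fields, the metric names, and in particular the choice of denominator for each percentage are assumptions made for illustration. Consult the readthedocs guide for the actual definitions.

```python
from dataclasses import dataclass


@dataclass
class PairCounts:
    """Hypothetical raw counts taken from an alignment/deduplication step."""
    total: int           # all sequenced read pairs
    mapped: int          # pairs with both reads mapped
    duplicates: int      # mapped pairs flagged as PCR duplicates
    cis_long_range: int  # non-duplicate same-chromosome pairs beyond a chosen distance


def pct(part, whole):
    """Percentage of `part` in `whole`; returns 0.0 when `whole` is zero."""
    return 100.0 * part / whole if whole else 0.0


def summarize(counts):
    """Derive percentage metrics (denominators are assumptions, see lead-in)."""
    nodup = counts.mapped - counts.duplicates
    return {
        "pct_pcr_duplicates": pct(counts.duplicates, counts.mapped),
        "pct_valid_pairs": pct(nodup, counts.total),
        "pct_long_range": pct(counts.cis_long_range, nodup),
    }


def format_table(summary):
    """Render the summary as a plain two-column text table."""
    width = max(len(name) for name in summary)
    return "\n".join(f"{name:<{width}}  {value:6.2f}%" for name, value in summary.items())


if __name__ == "__main__":
    # Toy numbers chosen only to make the arithmetic easy to follow.
    toy = PairCounts(total=1000, mapped=800, duplicates=200, cis_long_range=300)
    print(format_table(summarize(toy)))
```

Keeping the denominators explicit in one place matters because the same count (for example, long-range pairs) gives very different percentages depending on whether it is divided by all pairs, mapped pairs, or non-duplicate pairs.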
It is now time to put the cut-off values we determined for the QC metrics to work and assess the quality of your library before you move forward with deep sequencing. The cut-off values for the QC metrics are included in the readthedocs pages for each product. Based on the expected Omni-C and Micro-C library complexity, we do not recommend sequencing a single library beyond 300 M read pairs. If you require deeper sequencing, multiple libraries can be generated from a single proximity ligation assay. If your library QC metrics meet all cut-offs, you are ready to move on to deep sequencing and data interpretation. That will have to be a topic of a future blog.
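The go/no-go decision and the depth planning described above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the cut-off values are deliberately left as inputs that you fill in from the readthedocs page for your kit. The only number taken from this post is the 300 M read-pair ceiling per library.

```python
def failed_metrics(metrics, minimums, maximums):
    """Return the names of metrics that miss their cut-off.

    `minimums` maps metric name -> lowest acceptable value,
    `maximums` maps metric name -> highest acceptable value.
    Both must be filled in from the product documentation.
    """
    failures = []
    for name, floor in minimums.items():
        if metrics[name] < floor:
            failures.append(name)
    for name, ceiling in maximums.items():
        if metrics[name] > ceiling:
            failures.append(name)
    return failures


def libraries_needed(target_pairs, max_pairs_per_library=300_000_000):
    """Number of libraries required to reach a target depth without
    sequencing any single library beyond the per-library ceiling."""
    if target_pairs <= 0:
        raise ValueError("target_pairs must be positive")
    # Ceiling division using integers only.
    return -(-target_pairs // max_pairs_per_library)
```

A library is ready for deep sequencing when `failed_metrics(...)` returns an empty list. If a project calls for, say, 800 M read pairs, `libraries_needed(800_000_000)` gives 3, meaning three libraries would be prepared from the same proximity ligation assay.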