SEQdata-BEACON: a comprehensive database of sequencing performance and statistical tools for performance evaluation and yield simulation in BGISEQ-500
By Yanqiu Zhou, Chen Liu, Rongfang Zhou, Anzhi Lu, Biao Huang, Liling Liu, Ling Chen, Bei Luo, Jin Huang, Zhijian Tian
Posted 30 May 2019
bioRxiv DOI: 10.1101/652347
(published DOI: 10.1186/s13040-019-0209-9)
Background BGISEQ-500 is based on DNBSEQ™ technology and offers high output at low cost. The sequencer is widely used in scientific and clinical research. A better understanding of the sequencing process and of sequencer performance is essential for stabilizing the sequencing process, interpreting sequencing results accurately and troubleshooting sequencing problems efficiently. To address these needs, a comprehensive database, SEQdata-BEACON, was constructed to accumulate sequencing performance data from BGISEQ-500.
Methods A total of 60 BGISEQ-500 sequencers in the BGI-Wuhan lab were used to collect sequencing performance data. Lanes from paired-end 100 sequencing with 10 bp barcodes were selected, and each lane, described by 66 metrics, was assigned a unique entry number as its ID. The database was built on MySQL Server 8.0 and the website on Apache (2.4.33 win64 VC15 server). Statistical analysis and linear regression modeling were performed in R on data collected from November 2018 to April 2019.
Results A total of 2236 entries were recorded in the database, covering sample ID, yield, quality, machine state and supplies information. Based on the correlation matrix, the 52 numerical metrics clustered into three groups representing yield-quality, machine state and sequencing calibration. The metric distributions also showed patterns that offer clues for further explanation or analysis of the sequencing process. Using data from all 200 cycles, the linear regression model simulated the final output well. Moreover, a prediction of the final yield could be provided as early as the 15th cycle of sequencing; the coefficients of determination (R²) of the 200th-cycle and 15th-cycle models were 0.97 and 0.81, respectively. The data source, statistical findings and application tools are all available on our website <http://seqBEACON.genomics.cn:443/home.html>.
Conclusions These resources can serve as a continuously updated reference that helps BGISEQ-500 users understand DNBSEQ™ technology comprehensively, solve sequencing problems and optimize the sequencing process.
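The yield-prediction idea in the Results can be illustrated with a minimal sketch. It is not the authors' model: they fitted their regressions in R, whereas this example uses Python, a single hypothetical predictor (an early-cycle metric) and made-up numbers, purely to show how a least-squares fit and its coefficient of determination R² are computed. The function names `fit_simple_linear` and `r_squared` and all data values are assumptions introduced for illustration.

```python
from statistics import mean


def fit_simple_linear(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_bar = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical values, NOT taken from SEQdata-BEACON:
# one early-cycle metric per lane and the final yield of that lane.
early_metric = [1.0, 2.0, 3.0, 4.0, 5.0]
final_yield = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_simple_linear(early_metric, final_yield)
r2 = r_squared(early_metric, final_yield, intercept, slope)

# Predict the final yield of a new lane from its early-cycle metric.
predicted_yield = intercept + slope * 6.0
print(f"intercept={intercept:.3f} slope={slope:.3f} R2={r2:.3f}")
print(f"predicted final yield: {predicted_yield:.2f}")
```

An R² close to 1 means the fitted line explains most of the variance in final yield. The reported drop from 0.97 (200th-cycle model) to 0.81 (15th-cycle model) is what one would expect from a model that sees less of the run before predicting.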