1. Introduction
Liquid biopsy has become a rapidly developing in vitro diagnostic (IVD) technique owing to its low invasiveness and accessibility [1]. Most body fluids, including serum, urine, and saliva, can carry disease biomarkers [2] [3]. In the 1940s, extracellular nucleic acids (exDNA and exRNA) were found in the blood [4]. These nucleic acids are released passively or actively into the blood from various tissues and cells [5]. Circulating nucleic acids in serum therefore provide a potential window into an individual's health status [5] [6]. For example, testing of exDNA in serum has greatly promoted the development of non-invasive testing and diagnosis [7], including prenatal testing and fetal genome sequencing for fetal aneuploidy, as well as cancer diagnosis and monitoring [8].
Since the discovery of extracellular RNAs (exRNAs) in the circulation and other body fluids, many studies have sought to classify exRNAs and evaluate whether they can serve as disease biomarkers [5] [6] [7] [8]. Many exRNA species have been identified, including mRNA, long non-coding RNA (lncRNA), Piwi-interacting RNA (piRNA), small nuclear RNA (snRNA), small nucleolar RNA (snoRNA), ribosomal RNA (rRNA), transfer RNA (tRNA), Y RNA, microRNA (miRNA) and circular RNA (circRNA) [9] [10]. Because serum exRNAs are exposed to ribonucleases (RNases), they, and mRNA in particular, are unstable and usually fragmented, which makes it challenging to characterize RNA expression profiles in serum [10] [11].
In this study, we prepared RNA-seq libraries from different volumes of serum, made global measurements of RNA species, including linear RNA and circRNA, and evaluated the exRNA sequencing results obtained from each serum volume.
2. Materials and Methods
2.1. Study Population and Serum Sample Collection
We collected peripheral blood (about 2 ml) from each of 16 individuals, and the blood samples were centrifuged at 3000 rpm for 10 min at 4˚C. The serum was collected, 500 μl of serum from each individual was pooled, and the pooled serum was divided into aliquots of 500 μl, 1.5 ml, 2.5 ml, and 3.5 ml, which were stored at −80˚C until use [12]. All subjects gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Committee of Jiangsu Province Hospital.
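The pooling scheme can be checked with simple arithmetic: 16 donors contributing 500 μl each give 8.0 ml of pooled serum, which is exactly the sum of the four aliquots. A minimal sketch, working in microlitres so that integer arithmetic avoids rounding; the variable names are illustrative only.

```python
# Volumes in microlitres (integers avoid floating-point rounding).
N_DONORS = 16
SERUM_PER_DONOR_UL = 500
ALIQUOTS_UL = [500, 1500, 2500, 3500]  # the four test volumes

pooled_ul = N_DONORS * SERUM_PER_DONOR_UL
allocated_ul = sum(ALIQUOTS_UL)

print(f"pooled serum: {pooled_ul} ul, allocated to aliquots: {allocated_ul} ul")
# The pool is used up exactly by the four aliquots.
assert pooled_ul == allocated_ul == 8000
```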
2.2. Serum exRNA Extraction
Total RNA in serum (extracellular RNA) was extracted using the miRNeasy Serum/Plasma Kit according to the manufacturer's instructions, with modifications. Briefly, 1 ml of TRIzol reagent was added to every 500 μl of serum, and the mixture was vortexed and incubated at room temperature for 5 min; 250 μl of chloroform was then added to each tube, and the tubes were vortexed and incubated at room temperature for 15 min [13]. After centrifugation at 12,000 × g for 15 min at 4˚C, the aqueous phase was transferred to a new tube, an equal volume of 100% isopropanol was added, and the mixture was incubated at room temperature for 40 min. The mixture was then loaded onto a single spin column according to the kit instructions. Finally, the exRNA was eluted in 25 μl of RNase-free water, and its quality was assessed with a NanoDrop ND-1000.
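Because the protocol is specified per 500 μl of serum, the reagent volumes for the larger aliquots scale with the number of 500 μl portions. The sketch below assumes (this is not stated explicitly in the protocol) that each 500 μl portion is processed in its own tube with 1 ml TRIzol and 250 μl chloroform; the helper `reagent_plan` is hypothetical.

```python
import math


def reagent_plan(serum_ul: int, portion_ul: int = 500,
                 trizol_per_portion_ul: int = 1000,
                 chloroform_per_portion_ul: int = 250) -> dict:
    """Hypothetical helper: reagents needed to process `serum_ul` of serum
    in portions of `portion_ul`, assuming one tube per portion."""
    n_tubes = math.ceil(serum_ul / portion_ul)
    return {
        "tubes": n_tubes,
        "trizol_ul": n_tubes * trizol_per_portion_ul,
        "chloroform_ul": n_tubes * chloroform_per_portion_ul,
    }


for volume in (500, 1500, 2500, 3500):
    print(volume, reagent_plan(volume))
```

Under these assumptions the 3.5 ml aliquot needs 7 tubes, 7 ml of TRIzol and 1.75 ml of chloroform.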
2.3. Total exRNA Sequencing Library Preparation
Contaminating DNA was removed from the extracted total RNA with recombinant DNase I according to the manufacturer's instructions. After incubation at 37˚C for 25 min, the RNA was purified with VAHTS RNA Clean Beads following the protocol. Because the concentration of the extracted RNA was too low, the exRNA libraries were prepared without rRNA depletion. The exRNA-seq libraries were constructed with the NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina® following the manufacturer's instructions and amplified by PCR with Illumina-compatible index primers. The amplified libraries were resolved on a 2% agarose gel, and the 250-500 bp cDNA fragments were recovered with the TIANgel Midi Purification Kit. Finally, the 4 libraries were pooled into a single sequencing lane and sequenced on an Illumina HiSeq platform with 150 bp paired-end reads (PE150).
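The text does not state how the four libraries were balanced in the shared lane. A common approach, assumed here purely for illustration, is equimolar pooling: each library contributes the same number of moles, so the volume taken from library i is V_i = (C_pool / n) × V_pool / C_i. The helper name and the concentrations below are hypothetical placeholders, not measured values.

```python
def equimolar_volumes(conc_nM: list[float], pool_nM: float,
                      pool_ul: float) -> list[float]:
    """Volume (ul) to take from each library so that all libraries contribute
    equal moles to a pool of `pool_ul` microlitres at `pool_nM` in total."""
    share_nM = pool_nM / len(conc_nM)  # each library's share of the final molarity
    volumes = [share_nM * pool_ul / c for c in conc_nM]  # C1*V1 = C2*V2
    if sum(volumes) > pool_ul:
        raise ValueError("libraries are too dilute for the requested pool")
    return volumes


# Hypothetical library concentrations (nM); not measured values from this study.
libraries_nM = [1.6, 2.0, 1.8, 2.0]
volumes = equimolar_volumes(libraries_nM, pool_nM=1.0, pool_ul=20.0)
buffer_ul = 20.0 - sum(volumes)  # remainder made up with buffer or water
for c, v in zip(libraries_nM, volumes):
    print(f"{c} nM library: {v:.2f} ul")
print(f"buffer: {buffer_ul:.2f} ul")
```

The product of volume and concentration is identical for every library, which is what makes the read output per library roughly comparable in a shared lane.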
2.4. Analysis of Sequencing Data
Raw data in FASTQ format were first processed with in-house Perl scripts, which removed reads containing poly-N, reads containing adapters, and low-quality reads. The Q20, Q30 and GC content of the clean data were then calculated, and all downstream analyses were based on these high-quality clean data. HISAT2 (version 2.1.0) was used to align the paired-end exRNA-seq clean reads to the reference genome (hg38). Read counts were obtained with featureCounts [14] (version 1.6.3), using the aligned reads and the gene annotation file (gencode.v30.annotation.gtf) as input. Finally, CIRI (CircRNA Identifier) [15] was used to detect back-spliced "junction reads" and identify circRNAs from them, and circBase was used to annotate and count the circRNAs.
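The in-house Perl scripts are not described further, so the following is only a minimal Python sketch of the same kind of filtering and QC statistics. Its thresholds are hypothetical defaults (a read is dropped if more than 10% of its bases are N, if it contains the adapter prefix, or if more than half of its bases have Q ≤ 5), and `AGATCGGAAGAGC` is assumed as the adapter prefix; quality strings are assumed to use Phred+33 encoding.

```python
from dataclasses import dataclass

ADAPTER = "AGATCGGAAGAGC"  # assumed Illumina adapter prefix; not given in the text


@dataclass
class Read:
    seq: str
    qual: str  # Phred+33 quality string, same length as seq


def phred(qual: str) -> list[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in qual]


def keep_read(read: Read, max_n_frac: float = 0.10,
              low_q: int = 5, max_low_frac: float = 0.50) -> bool:
    """Return False for poly-N, adapter-containing, or low-quality reads.
    All thresholds are hypothetical defaults."""
    length = len(read.seq)
    if length == 0:
        return False
    seq = read.seq.upper()
    if seq.count("N") / length > max_n_frac:
        return False
    if ADAPTER in seq:
        return False
    low = sum(q <= low_q for q in phred(read.qual))
    return low / length <= max_low_frac


def qc_stats(reads: list[Read]) -> dict[str, float]:
    """Fraction of bases with Q>=20, Q>=30, and the G/C base fraction."""
    bases = q20 = q30 = gc = 0
    for r in reads:
        scores = phred(r.qual)
        bases += len(scores)
        q20 += sum(q >= 20 for q in scores)
        q30 += sum(q >= 30 for q in scores)
        gc += sum(b in "GC" for b in r.seq.upper())
    if bases == 0:
        raise ValueError("no bases to summarise")
    return {"Q20": q20 / bases, "Q30": q30 / bases, "GC": gc / bases}


raw = [
    Read("ACGTACGTAC", "I" * 10),  # Q40 everywhere: kept
    Read("NNNNNACGTA", "I" * 10),  # 50% N: removed
    Read("GGGGCCCCAA", "#" * 10),  # Q2 everywhere: removed
]
clean = [r for r in raw if keep_read(r)]
print(len(clean), qc_stats(clean))
```

Real pipelines usually trim adapters rather than discard every read that contains one; discarding is used here only because the text says such reads were filtered.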
3. Results
3.1. The Results of Serum exRNA Library Preparation
We prepared serum samples of 4 different volumes (500 μl, 1.5 ml, 2.5 ml, 3.5 ml), extracted the exRNAs, and prepared RNA-seq libraries. The concentration of the extracted RNA, measured with a Qubit 2.0, was too low to be detected. Agarose gel electrophoresis (Figure 1(A)) and the Agilent 2100 (Figure 1(B)) were used to assess the quality of the cDNA libraries: the library fragment size was 250-500 bp, and the average cDNA library concentration was 1.84 nmol/l.
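A library molarity such as the reported 1.84 nmol/l is commonly derived from a mass concentration and the mean fragment size, using roughly 660 g/mol per base pair of double-stranded DNA. The text does not say how the value was obtained, so this is only an illustrative sketch with hypothetical inputs; `library_nM` is not a function from any instrument software.

```python
BP_MASS_G_PER_MOL = 660.0  # approximate molar mass of one double-stranded base pair


def library_nM(conc_ng_per_ul: float, mean_size_bp: float) -> float:
    """Convert a dsDNA concentration (ng/ul) to molarity (nmol/l)."""
    # 1 ng/ul = 1e-3 g/l; dividing by the fragment molar mass gives mol/l,
    # and multiplying by 1e9 converts to nmol/l, hence the net factor 1e6.
    return conc_ng_per_ul * 1e6 / (BP_MASS_G_PER_MOL * mean_size_bp)


# Hypothetical example: 0.5 ng/ul with a mean size of 375 bp
# (the midpoint of the 250-500 bp size-selection window).
print(f"{library_nM(0.5, 375):.2f} nM")
```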
3.2. The Quality of Sequencing Data
We used FastQC to assess the quality of the sequencing data. In the exRNA sequencing data, most reads reached the Q30 standard (a per-base error rate of 0.1%) (Figure 2(A)). The average GC content per read was also close to the expected level (Figure 2(B)).
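The Phred quality score Q is defined from the per-base error probability P as follows; substituting Q = 30 gives the 0.1% error rate quoted above, and Q = 20 corresponds to 1%.

```latex
Q = -10\log_{10} P \quad\Longleftrightarrow\quad P = 10^{-Q/10},
\qquad P_{Q30} = 10^{-3} = 0.1\%, \qquad P_{Q20} = 10^{-2} = 1\%.
```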
Figure 1. Preparation of four sample libraries. (A) Agarose gel electrophoresis of the amplification products of the 4 libraries; (B) Agilent 2100 analysis of the pooled library of the 4 samples.
Figure 2. FastQC report of sequencing data. (A) Per base sequence quality; (B) Per sequence GC content.
3.3. Summary of the Sequencing Alignments
We evaluated the serum exRNA sequencing results and the circRNA predictions for the four serum volumes (500 μl, 1.5 ml, 2.5 ml, 3.5 ml). The number of detected genes, the number of reads assigned to genes, the number of circRNAs, and the circRNA read counts did not change with serum volume (Table 1).
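Per-sample summaries of the kind reported in Table 1 can be derived from a gene-by-sample count matrix such as the featureCounts output. The sketch below assumes that a "detected" gene is one with at least one read; that threshold, the gene IDs, and the counts are hypothetical.

```python
# Hypothetical count matrix: gene -> reads in the 0.5, 1.5, 2.5 and 3.5 ml samples.
counts = {
    "GENE_A": [120, 98, 110, 105],
    "GENE_B": [0, 3, 0, 2],
    "GENE_C": [15, 0, 22, 18],
}
volumes_ml = [0.5, 1.5, 2.5, 3.5]


def per_sample_summary(counts: dict[str, list[int]], n_samples: int):
    """Number of detected genes (>= 1 read) and total gene reads per sample."""
    detected = [0] * n_samples
    total = [0] * n_samples
    for row in counts.values():
        for i, c in enumerate(row):
            total[i] += c
            if c > 0:
                detected[i] += 1
    return detected, total


detected, total = per_sample_summary(counts, len(volumes_ml))
for v, d, t in zip(volumes_ml, detected, total):
    print(f"{v} ml: {d} genes detected, {t} gene reads")
```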
3.4. ExRNA Biotypes Distribution Analysis
To better understand how exRNA biotypes are distributed in the different serum volumes, we analyzed the total aligned reads of each biotype across volumes (Figure 3(A)) and visualized the results as a donut chart in which each layer represents the reads and numbers of each RNA biotype for one volume (Figure 3(B)). The reads and numbers of the main biotypes did not change with serum volume.
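The biotype breakdown amounts to grouping aligned reads by annotation biotype and dividing by the sample's total aligned reads. A minimal sketch for one sample, with hypothetical records; `Mt_rRNA` is the GENCODE label for mitochondrial rRNA.

```python
from collections import defaultdict

# Hypothetical per-transcript records for one sample: (biotype, aligned reads).
records = [
    ("lncRNA", 400), ("lncRNA", 150),
    ("protein_coding", 300), ("protein_coding", 50),
    ("Mt_rRNA", 100),
]


def biotype_summary(records):
    """Return {biotype: (percent of aligned reads, number of transcripts)}."""
    reads = defaultdict(int)
    n_transcripts = defaultdict(int)
    for biotype, count in records:
        reads[biotype] += count
        n_transcripts[biotype] += 1
    total = sum(reads.values())
    return {b: (100.0 * reads[b] / total, n_transcripts[b]) for b in reads}


for biotype, (pct, n) in biotype_summary(records).items():
    print(f"{biotype}: {pct:.1f}% of reads, {n} transcripts")
```

Reporting both the read share and the transcript count separates biotypes that are abundant because of many transcripts from those dominated by a few highly expressed ones.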
Table 1. Overview of exRNA sequencing of different volumes of serum.
Figure 3. The changes of transcript biotype in different volumes of serum. (A) Percentage of total aligned reads for RNA from different volumes of serum by ENSEMBL biotype; (B) The ratio of transcripts reads and numbers of RNA biotype for each volume.
3.5. Distribution of Highly Expressed Transcripts
We selected the transcripts with more than 20 reads in every sample and analyzed their expression levels in the different serum volumes. There was no obvious relationship between gene expression and serum volume, and most genes were evenly distributed among the 4 samples (Figure 4).
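The selection rule and the "evenly distributed" check can be sketched as follows. The counts are hypothetical, and expressing each transcript's reads as per-sample fractions (about 0.25 each when evenly spread over 4 samples) is an illustrative choice, not necessarily the normalization behind the heat map.

```python
# Hypothetical transcript counts in the 0.5, 1.5, 2.5 and 3.5 ml samples.
counts = {
    "TX1": [50, 48, 52, 50],
    "TX2": [25, 19, 40, 30],  # only 19 reads in one sample: excluded
    "TX3": [21, 22, 23, 24],
}


def highly_expressed(counts: dict[str, list[int]], min_reads: int = 20):
    """Keep transcripts with more than `min_reads` reads in every sample."""
    return {t: row for t, row in counts.items() if all(c > min_reads for c in row)}


def row_fractions(row: list[int]) -> list[float]:
    """Share of a transcript's reads contributed by each sample."""
    s = sum(row)
    return [c / s for c in row]


for tx, row in highly_expressed(counts).items():
    print(tx, [round(f, 3) for f in row_fractions(row)])
```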
3.6. Distribution of circRNA in Serum
We used CIRI to predict circRNAs from the RNA sequencing data of the 4 samples. In total, 48 circRNAs in serum had a total read count greater than 3 (Figure 5(A)). Most circRNAs had very low abundance, and only a few were detected with higher counts (Figure 5(B)). However, the highly expressed circRNAs had similar counts in all 4 samples.
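A minimal sketch of the abundance filter, assuming that "total" means back-spliced junction reads summed over the four samples; the circRNA IDs and counts are hypothetical.

```python
# Hypothetical back-spliced junction read counts per circRNA in the 4 samples.
circ_counts = {
    "circ_1": [10, 12, 9, 11],
    "circ_2": [1, 0, 1, 0],
    "circ_3": [0, 2, 1, 1],
    "circ_4": [3, 1, 0, 0],
}


def abundant_circs(circ_counts: dict[str, list[int]], min_total: int = 3) -> list[str]:
    """circRNAs whose junction reads summed across samples exceed `min_total`."""
    return sorted(c for c, row in circ_counts.items() if sum(row) > min_total)


print(abundant_circs(circ_counts))
```

With many circRNAs supported by only one or two junction reads, a small total-count threshold like this removes most of the low-abundance tail while keeping the few well-supported circRNAs.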
Figure 4. Heat map of the distribution of highly expressed transcripts. Green indicates transcripts expressed at similar levels across the 4 samples.
Figure 5. CircRNA in serum. (A) Analysis of the reads of circRNA in serum by sequencing; (B) Distribution of circRNA in serum.
4. Discussion
RNA expression profiles can reveal an individual's health status. Peripheral blood serum is an ideal material for studying gene expression because it is easy to access and its collection is minimally invasive. However, exRNA in serum is scarce and fragmented: 5-7 µl of serum contains only about 10 pg of exRNA, comparable to the amount of RNA in a single cell, so 500 µl of serum contains no more than about 1 ng of exRNA [6]. This makes exRNA library construction difficult.
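The 1 ng figure follows from scaling the reported yield linearly to 500 µl. This sketch assumes linear scaling and complete recovery, which real extractions will not achieve.

```python
# Reported figure: about 10 pg of exRNA in 5-7 ul of serum.
EXRNA_PG = 10.0
SERUM_UL_RANGE = (5.0, 7.0)


def exrna_in_volume_pg(volume_ul: float) -> tuple[float, float]:
    """Scale the reported yield linearly to `volume_ul`; returns (low, high) in pg."""
    low = EXRNA_PG * volume_ul / SERUM_UL_RANGE[1]   # 10 pg per 7 ul
    high = EXRNA_PG * volume_ul / SERUM_UL_RANGE[0]  # 10 pg per 5 ul
    return low, high


low, high = exrna_in_volume_pg(500)
print(f"{low:.0f}-{high:.0f} pg in 500 ul")  # prints "714-1000 pg in 500 ul"
```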
In this study, we explored total exRNA sequencing of low volumes of serum, and we discuss the types and distribution of exRNA in serum as well as the effect of serum volume on exRNA sequencing results. About 20 million clean reads were obtained per sample by RNA-seq, and 50%-80% of them mapped to hg38. In short, serum exRNA-seq yielded 4000-6000 transcripts and several hundred thousand counts per sample.
In addition, the sequencing results showed that most exRNA in serum was lncRNA, protein_coding RNA, or MT-rRNA. lncRNA and protein_coding transcripts accounted for relatively high proportions in terms of both expression level and number of transcripts, whereas MT-rRNA comprised only 2 transcripts but was very highly expressed. Overall, for serum volumes between 0.5 and 3.5 ml, the exRNA sequencing results, including the amount of circRNA, did not change appreciably. Sequencing depth may be the key factor, which requires further experiments.
Our study has limitations, the main one being the small number of samples. Nevertheless, serum exRNA sequencing remains an important technology for exploring disease biomarkers. This experiment indicates that, at the same sequencing depth, the starting volume of serum does not affect the sequencing results, so the serum volume can be reduced in subsequent experiments.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 81827901).