Cost-Effective Method of Gene Synthesis by Sequencing from Microchip-Derived Oligos for Droplet Cloning
1. Introduction
Demand for gene synthesis has been increasing over the past few decades, especially for applications in engineering biology [1]-[3]. Most manufactured genes are used for research, particularly research on gene therapies for cancer treatment. Prostate cancer is the leading cancer diagnosis in men and the second leading cause of cancer-related deaths among American males [4]. Gene therapies have been identified as a frontier in cancer treatment, but their potential has yet to be fully realized [4]. Efficient and cost-effective gene synthesis is a critical component in developing novel gene therapies because it allows the desired genes to be manufactured with key alterations [4]-[6].
Currently, gene synthesis involves assembling multiple oligonucleotide (oligo) fragments into a gene construct. Because error rates are high in longer sequences, current approaches assemble shorter DNA strands and correct errors between assembly steps. These procedures commonly use polymerase cycling assembly (PCA) and the polymerase chain reaction (PCR) to manufacture the genes of interest [6]. Applications of synthetic genes include engineering genes with desired characteristics and producing therapeutic antibodies. Research into alternative methods for gene synthesis could contribute to the development of personalized medicine and introduce new methods for gene therapy research.
Gene synthesis allows for mutagenesis and molecular cloning without an original template, which has great potential to facilitate high-throughput screening of genetic variants. However, current gene synthesis methods, especially for longer sequences, are expensive and time-consuming [3] [5] [6]. Scientists and industry stakeholders have worked to lower the cost of gene synthesis for decades, with the potential to reach $0.01 per base pair (bp) for large-scale applications. Current prices range from $0.07 to $0.30 per bp, depending on the sequencing and assembly methods applied [3]. New methods are therefore needed to move toward low-cost, high-accuracy gene construction for widespread use [3] [5] [6].
One approach with the potential to minimize cost and maximize production is the use of microchip-derived oligos. Microchips are among the cheapest sources of synthetic oligo fragments currently on the market [7] [8], and a single chip provides a large number of oligo fragments for assembly and gene synthesis in one pool. Microchip-derived oligo pools are often amplified by PCR before further use [9] [10]. Microchips allow sequences to be produced at scale while reducing reagent use and overall cost [7]-[10]. However, microarray oligos suffer from persistently high error rates and low concentrations of individual oligo fragments [7]-[10]. Methods that manage these errors are therefore essential for making effective use of microarray oligos [3].
Existing gene synthesis processes rely on gene cloning as a crucial step that provides numerous DNA copies for mass production and use. The first in vitro substitute for in vivo DNA cloning was single-molecule PCR (smPCR), which has since contributed to methods for error-prone synthesis. Since its introduction, smPCR has been pivotal for gene synthesis because it offers a quicker and cheaper alternative to in vivo cloning [3] [11].
Sanger sequencing is an efficient method for validating synthesized gene sequences. It is a cost-effective way to read sequence regions quickly and accurately and produces unbiased reads [12] [13]. Sanger sequencing remains a common technology for sequence verification and can read relatively long stretches of DNA for mutation discovery [14]. Because of its cost-effectiveness, it is a promising way to reduce sequencing costs in gene synthesis.
Assembling genes into a larger construct requires precision and accuracy for construct delivery and gene use. Golden Gate assembly (GGA) is a technique for the simultaneous and direct assembly of multiple DNA fragments using Type IIS restriction enzymes and T4 DNA ligase. GGA’s popularity in the engineering field stems from its ability to work with both linear and circular DNA and from its cost-effectiveness as a quicker and more concise alternative to more complex assembly methods [15]-[19]. Because Type IIS enzymes cut outside their recognition sites, the final GGA product no longer contains a recognition site and therefore cannot be cut again by the restriction enzyme, which makes the assembly irreversible. This irreversibility, together with high-fidelity overhangs, makes GGA well suited to assembly, with accuracy approaching 100% [20] [21].
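To make the irreversibility argument concrete, the Python sketch below scans a sequence for a Type IIS recognition site on both strands, releases a part from a donor, and joins parts at matching 4-nt overhangs. It assumes BsaI (recognition site GGTCTC, cutting 1 nt downstream on one strand and 5 nt on the other) as the example enzyme, because the text does not name one; the function names and toy sequences are hypothetical and this is not the authors' workflow.

```python
# Illustrative sketch only: BsaI is assumed as the example Type IIS enzyme.
BSAI_SITE = "GGTCTC"  # recognition site; the cut lies outside it (N1 / N5)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_type_iis_sites(seq: str, site: str = BSAI_SITE) -> list[tuple[int, str]]:
    """List (0-based position, strand) of every recognition site on either strand."""
    seq = seq.upper()
    rc_site = reverse_complement(site)
    hits = []
    for i in range(len(seq) - len(site) + 1):
        window = seq[i:i + len(site)]
        if window == site:
            hits.append((i, "+"))
        elif window == rc_site:
            hits.append((i, "-"))
    return hits


def excise_part(donor: str) -> str:
    """Return the region released by digestion, including both 4-nt overhangs.

    Assumes one forward site upstream of the part and one reverse-complement
    site downstream, as in a typical Golden Gate donor.
    """
    donor = donor.upper()
    start = donor.index(BSAI_SITE) + len(BSAI_SITE) + 1   # skip site + N1 spacer
    end = donor.rindex(reverse_complement(BSAI_SITE)) - 1  # stop before N1 spacer
    return donor[start:end]


def golden_gate_join(parts: list[str], overhang: int = 4) -> str:
    """Ligate excised parts whose adjacent 4-nt overhangs match."""
    product = parts[0]
    for part in parts[1:]:
        if product[-overhang:] != part[:overhang]:
            raise ValueError("adjacent overhangs do not match")
        product += part[overhang:]
    return product


# Toy donors: site + spacer + overhang + insert + overhang + spacer + reverse site
donor1 = "GGTCTC" + "A" + "AATG" + "CCCCCC" + "GCTT" + "A" + "GAGACC"
donor2 = "GGTCTC" + "A" + "GCTT" + "TTTTTT" + "CGCT" + "A" + "GAGACC"
product = golden_gate_join([excise_part(donor1), excise_part(donor2)])
print(find_type_iis_sites(donor1))   # [(0, '+'), (22, '-')]
print(find_type_iis_sites(product))  # []
```

The donors carry recognition sites, but the ligated product does not, so the enzyme has nothing left to recut.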
The preparation of oligos and genes has also been time-consuming and labor-intensive, but advancing technologies and instruments have been explored to reduce the time and labor required [3]. Escherichia coli (E. coli) colonies provide an efficient way to clone small pieces of DNA with higher accuracy and at lower cost. This “droplet method” clones small DNA fragments as individual bacterial colonies after plating, instead of cloning the entire gene (Figure 2). Focusing on smaller sections of the gene, which are sequenced before the gene is assembled, allows greater precision. Cloning these droplets also keeps the method cost-effective and takes advantage of the speed of PCR for faster construct delivery [22] [23].
This paper aims to perform the first step of the droplet method to reduce reagent costs for gene synthesis while leveraging the speed of PCR. Using microchip-derived oligos allows gene sequences to be produced in bulk from a single oligo pool. The study also introduces “synthesis by sequencing” with Sanger sequencing to evaluate the efficacy of oligo synthesis and of assembly across different oligo lengths. This new manual protocol for gene synthesis may open new avenues for bioengineering applications and bring gene production closer to the target price.
2. Materials and Methods
2.1. Materials
A 15,000-oligo microchip was purchased from Agilent (0.02 c/bp). Phanta Max and FlashTaq master mixes were purchased from Vazyme and Empirical Bioscience, respectively. T7bsamut and T3bsamut primers were used in the preliminary experiment. The ClonExpress (CE) Kit, including the CE buffer and T7A vector, was purchased from Vazyme. Agarose gel electrophoresis with ethidium bromide (EtBr) staining was used to visualize PCR products. DH5α cells were used for bacterial transformation. Shrimp Alkaline Phosphatase (SAP) was stored at −20˚C and used in the TSAP program. Sanger sequencing was performed on Hitachi Applied Biosystems instruments to analyze the final cloned constructs. All trials were performed at Quintara BioSciences in Cambridge, MA.
2.2. Oligo Assembly
A preliminary experiment to determine the best fragment length for the microchip tested the assembly of constructs made from one to four oligo fragments. The number of oligos for each length was 1, 67, 243, and 3584, respectively. Oligos were assembled with overlaps of 16 - 20 bases (Figure 1). PCA and PCR were performed with the universal primers T7bsamut and T3bsamut. The products were cloned into the destination vector by Gibson Assembly and sequenced in 8 × 96-well plates.
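The overlap-based joining behind this assembly can be checked in silico. The Python sketch below assumes that all oligos are written on the same strand in assembly order and that neighbors share a 16 - 20 nt overlap; real PCA oligos usually alternate strands, so this is a simplified, hypothetical model (function names are illustrative), not the authors' design pipeline.

```python
import random


def find_overlap(left: str, right: str, min_len: int = 16, max_len: int = 20) -> int:
    """Return the longest k in [min_len, max_len] such that left ends with right[:k], else 0."""
    for k in range(max_len, min_len - 1, -1):
        if k <= min(len(left), len(right)) and left.endswith(right[:k]):
            return k
    return 0


def assemble_overlapping_oligos(oligos: list[str], min_len: int = 16, max_len: int = 20) -> str:
    """Join same-strand oligos in order at their shared overlaps (a simplified PCA model)."""
    construct = oligos[0]
    for nxt in oligos[1:]:
        k = find_overlap(construct, nxt, min_len, max_len)
        if k == 0:
            raise ValueError("consecutive oligos share no 16-20 nt overlap")
        construct += nxt[k:]
    return construct


# Hypothetical 2-fragment design: a random 120-nt target split with an 18-nt overlap.
rng = random.Random(0)
target = "".join(rng.choice("ACGT") for _ in range(120))
oligo_a, oligo_b = target[:70], target[52:]
assert assemble_overlapping_oligos([oligo_a, oligo_b]) == target
```

Searching from the longest allowed overlap downward keeps the join unambiguous for non-repetitive sequences; highly repetitive designs would need an explicit overlap length instead.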
Figure 1. Oligo assembly map. Different oligo lengths (shown from top to bottom starting at one oligo) are displayed with the appropriate overlaps and primer lengths.
2.3. Polymerase Cycling Assembly
For trial one, the microchip oligo pool was diluted with 25 μL of double-distilled water (dd H2O), and 5 μL of this mixture was combined with 5 μL of 2× Phanta Max master mix and thermocycled for 20 cycles (98˚C 10 sec, 68˚C 1:30 min) for PCA.
For trial two, 4.5 μL of the original oligo mix was combined with 4.5 μL of FlashTaq hot start master mix, on the assumption that this polymerase lacks 3’ exonuclease activity. PCA thermocycling was performed for 20 cycles (96˚C 10 sec, 68˚C 1:00 min).
2.4. Amplification and Cleanup of DNA
For trial one, 10 μL of PCA product was mixed with 50 μL of 2× Phanta Max master mix, 49 μL of dd H2O, and 1 μL of T7T3 primer. The mixture was divided between two PCR tubes (55 μL each) and thermocycled for 25 cycles (96˚C 1:30 min, 98˚C 15 sec, 55˚C 20 sec, 72˚C 3:00 min). Gel electrophoresis was performed with 2.5 μL of PCR product on a 1.5% agarose gel for 20 minutes, and the bands were visualized with EtBr on a FluorChem 5 imager.
For trial two, 10 μL of PCA product was combined with 45 μL of master mix and 45 μL of dd H2O for a 100 μL PCR and thermocycled for 30 cycles with a 60˚C annealing step. The remaining 90 μL of the PCA product was used for cleanup, alongside an uncleaned control. Magnetic beads were added to the experimental group so that small molecules would not precipitate on the beads. The beads were washed with 300 μL of 75% EtOH, and the DNA concentration was measured (60 μg/μL).
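As a rough check on run times, the thermocycling programs in Sections 2.3 and 2.4 can be converted into minutes. The Python sketch below sums hold times only and ignores temperature ramping, so real runs take longer; it also assumes that the 96˚C 1:30 min step in the trial one PCR is a one-time initial denaturation rather than part of every cycle. The function name is hypothetical.

```python
def program_minutes(cycles: int, step_seconds: list[int], initial_seconds: int = 0) -> float:
    """Approximate hold time in minutes: one-time initial step plus cycles x per-cycle steps.

    Ramp times between temperatures are ignored.
    """
    return (initial_seconds + cycles * sum(step_seconds)) / 60


# PCA, trial one: 20 cycles of 98C 10 s and 68C 90 s
pca_trial_one = program_minutes(20, [10, 90])
# PCA, trial two: 20 cycles of 96C 10 s and 68C 60 s
pca_trial_two = program_minutes(20, [10, 60])
# PCR, trial one: assumed 96C 90 s initial step, then 25 cycles of 15 s, 20 s, and 180 s
pcr_trial_one = program_minutes(25, [15, 20, 180], initial_seconds=90)

for name, minutes in [("PCA trial one", pca_trial_one),
                      ("PCA trial two", pca_trial_two),
                      ("PCR trial one", pcr_trial_one)]:
    print(f"{name}: about {minutes:.0f} min of hold time")
```

Under these assumptions, the shorter extension step in the trial two PCA saves about 10 minutes of hold time per run compared with trial one.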
2.5. Amplified DNA Insertion into Vector Site
For trial one, the CE Kit was used with 150 μL of CE buffer, 50 μL of PCR product, and 100 μL of digested T7A vector, divided between two PCR tubes and incubated at 50˚C for 30 minutes.
For trial two, 40 μL of CE buffer, 20 μL of PCR product, and 20 μL of digested T7A vector were mixed and incubated at 50˚C for 30 minutes.
2.6. Bacterial Transformation of DH5α Cells
For trial one, 100 μL of CE product and 1 mL of DH5α cells were mixed in a tube and distributed evenly among 10 small PCR tubes (100 μL each). The tubes were kept on ice (0˚C) for 20 minutes and then heat-shocked at 42˚C for 45 seconds. The 100 μL from each tube was added to super optimal broth with catabolite repression (SOC) medium and incubated at 37˚C for 10 minutes. Thirty plates were each seeded with 350 μL of the final mixture and incubated at 37˚C.
For trial two, 800 μL of DH5α cells and 80 μL of CE product were mixed and kept on ice (0˚C) for 20 minutes. The mixture was split into four tubes for a faster reaction and heat-shocked at 42˚C for 45 seconds. A 2× volume of SOC was then added to each tube, and the tubes were incubated at 37˚C for 10 minutes. Twenty plates were each seeded with 110 μL and incubated at 37˚C.
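The trial two recovery volumes can be checked with a small calculation. The Python sketch below assumes that “2× volume of SOC” means SOC equal to twice the volume of the cell and CE-product mixture, which is an interpretation rather than a stated protocol detail; the function name is hypothetical.

```python
def plating_capacity(cells_ul: float, ce_ul: float, soc_fold: float,
                     plate_ul: float) -> tuple[float, int]:
    """Return the total recovery volume (uL) and how many plates it can seed at plate_ul each."""
    transformation_ul = cells_ul + ce_ul
    # mixture plus SOC added at soc_fold times the mixture volume
    total_ul = transformation_ul * (1 + soc_fold)
    return total_ul, int(total_ul // plate_ul)


# Trial two: 800 uL cells + 80 uL CE product, 2x SOC, 110 uL per plate
total_ul, max_plates = plating_capacity(800, 80, 2, 110)
print(f"{total_ul:.0f} uL total, enough for up to {max_plates} plates")
```

Under this reading, the 20 plates seeded at 110 μL use 2200 μL of the roughly 2640 μL recovered.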
2.7. Sequence Preparation
A total of 384 colonies were picked into four 96-well plates, with each well containing 30 μL of dd H2O (Figure 2). PCR was performed with a master mix (high-fidelity for trial one and non-high-fidelity for trial two) for 35 cycles (96˚C 1 min, 98˚C 10 sec, 60˚C 20 sec, 72˚C 2:30 min). Then 4 μL of SAP with exonuclease I (stored at −20˚C) was added and mixed to run the TSAP program (37˚C 15 min, 80˚C 5 min). The TSAP product was diluted in 150 μL of dd H2O, and 2 μL of the mixture was transferred to a ready-to-use plate stored at −80˚C and then run under the XZ3 program.
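Tracking which colony sits in which well matters later, when correct clones must be located on the plates. A minimal Python sketch for 384 colonies in four 96-well plates follows; it assumes column-wise filling (A1, B1, ..., H1, A2, ...), a common but not universal convention, and the function names and example flags are hypothetical.

```python
ROWS = "ABCDEFGH"
COLUMNS = 12
WELLS_PER_PLATE = len(ROWS) * COLUMNS  # 96
PLATES = 4


def colony_to_well(index: int) -> tuple[int, str]:
    """Map a 0-based colony index to (plate number from 1, well name), filling column-wise."""
    if not 0 <= index < PLATES * WELLS_PER_PLATE:
        raise ValueError("index is outside four 96-well plates")
    plate, offset = divmod(index, WELLS_PER_PLATE)
    column, row = divmod(offset, len(ROWS))
    return plate + 1, f"{ROWS[row]}{column + 1}"


def wells_of_correct_clones(is_correct: list[bool]) -> list[tuple[int, str]]:
    """Return the (plate, well) locations of colonies whose sequence was correct."""
    return [colony_to_well(i) for i, ok in enumerate(is_correct) if ok]


# Example: colonies 0, 100, and 383 are hypothetical correct clones.
flags = [i in {0, 100, 383} for i in range(PLATES * WELLS_PER_PLATE)]
print(wells_of_correct_clones(flags))
```

A list like this makes it easy to cherry-pick scattered correct clones into a new plate before any downstream assembly.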
Figure 2. Potential method of gene synthesis through the droplet method. Overview of synthesis by sequencing through the droplet method, proposed to reduce overall costs.
2.8. Washing and Sequencing
The XZ3 product (4 plates per trial) was mixed with 70 μL of beads and 85% alcohol and vortexed for 30 seconds. The plates were placed on a magnetic plate, centrifuged (Beckman Coulter Allegra 6), washed with 100 μL of 85% alcohol, and centrifuged again. The product was diluted in 70 μL of dd H2O, vortexed, and briefly centrifuged (Beckman Coulter Allegra X-22R). The plates were transferred to a white-lid cassette and sequenced by Sanger sequencing (Hitachi Applied Biosystems) (Figure 3).
2.9. Data Analysis and Cost Calculation
All sequences from both trials were processed with Biopython and organized by cassette plate. Perfect matches between the designed microchip oligos and the sequenced oligos were identified and highlighted to calculate the percentage of correctly sequenced oligos.
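The matching step can be illustrated with a short sketch. The authors used Biopython; to stay self-contained, the Python below uses only the standard library and assumes that a read counts as correct when a designed oligo appears intact, on either strand, within the read (vector flanks allowed). The function names, design IDs, and toy sequences are hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence (N maps to N)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_perfect_match(read: str, designs: dict[str, str]) -> Optional[str]:
    """Return the ID of the first designed oligo found intact in the read on either strand."""
    read = read.upper()
    rc_read = reverse_complement(read)
    for design_id, seq in designs.items():
        seq = seq.upper()
        if seq in read or seq in rc_read:
            return design_id
    return None


def percent_correct(reads: list[str], designs: dict[str, str]) -> float:
    """Percentage of reads that contain a perfect copy of some designed oligo."""
    if not reads:
        return 0.0
    hits = sum(find_perfect_match(r, designs) is not None for r in reads)
    return 100 * hits / len(reads)


def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


designs = {"frag1": "ATGGCTAGCTAGGATCC", "frag2": "TTGACCGGTACGATCGA"}
reads = [
    "GGGG" + designs["frag1"] + "CCCC",                            # forward strand, correct
    "AAAA" + reverse_complement(designs["frag2"]) + "TTTT",        # reverse strand, correct
    "GGGG" + designs["frag1"][:5] + "A" + designs["frag1"][6:] + "CCCC",  # one substitution
]
print(f"{percent_correct(reads, designs):.1f}% correct")
```

Exact substring matching is deliberately strict: any single substitution, insertion, or deletion in the designed region marks the read as incorrect, which mirrors the “perfect match” criterion described above.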
The cost was calculated from the prices of the microchip and sequencing. Colony costs were based on the company’s usual rate of $0.20 per colony.
3. Results
3.1. Determination of Efficiency in Varying Length Oligo Fragment Assembly
Sequencing showed that one 2-fragment oligo was sequenced correctly. The 1-fragment oligo served as a control, and none of the 3-fragment or 4-fragment oligos were sequenced correctly (Figure 4).
3.2. Potential of Cost-Effective Gene Synthesis through Sequencing
The first step of “synthesis by sequencing” was tested in two trials (Figure 3).
Figure 3. Gene synthesis protocol for trials. The processes for trials one and two are shown; the trial two alterations, highlighted in blue, replace the high-fidelity polymerase in the PCR step with a non-high-fidelity polymerase, increase the annealing temperature, and add a cleanup step.
In trial one, which used a high-fidelity polymerase, 3% of the sequenced oligos were correct. Trial two, which used a lower-fidelity polymerase, reached a higher accuracy of 8% (Figure 5). Within trial two, the first set of cassettes reached 10% accuracy. Both trials had a guanine-cytosine (GC) content of 50%.
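The relationship between the 10% first-batch figure and the 8% overall figure can be made explicit. Assuming the two sets of four cassettes in trial two contributed equal numbers of reads (an assumption, since per-batch counts are not reported), the overall accuracy is the mean of the two batches, so the second batch's accuracy can be solved for. The Python helper below is a hypothetical illustration.

```python
def second_batch_accuracy(overall_pct: float, first_pct: float,
                          first_share: float = 0.5) -> float:
    """Solve overall = share * first + (1 - share) * second for the second batch's accuracy."""
    if not 0 <= first_share < 1:
        raise ValueError("first_share must be in [0, 1)")
    return (overall_pct - first_share * first_pct) / (1 - first_share)


# Trial two: 8% overall, 10% in the first four cassettes, equal read counts assumed
print(f"Implied second-batch accuracy: {second_batch_accuracy(8, 10):.0f}%")
```

With equal batch sizes, the second four cassettes would have been about 6% accurate, which is why the overall figure falls below the first batch's 10%.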
Figure 4. Viability of oligo assembly. The percent correctness of the different oligo lengths is presented. *p < 0.05.
Figure 5. Representative sequencing results for two overlapping oligos in trial 1. (a) A correct sequence. (b) A sequence with errors.
3.3. Cost Calculation for Cost-Effective Gene Synthesis Method
To calculate the overall price of the synthetic genes: the microchip contains 2.25 million bases and costs $3800. If every sequence on the microchip were correct, the cost would be about $0.0017 per base pair. However, the accuracy after oligo cloning and assembly was low relative to typical sequencing outcomes. To recover the 7500 fragments on the microchip at an 8% accuracy rate, about 7500 / 0.08 ≈ 94,000 colonies must be sequenced. At about $0.20 per colony, sequencing costs around $19,000. Dividing the total sequencing cost by the total number of bases (2.25 million) gives about 0.8 c/bp for sequencing; adding the microchip cost (about 0.17 c/bp) brings the combined cost to about 1.0 c/bp.
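The cost arithmetic above can be reproduced with a short Python sketch. The inputs are the figures stated in this section (a $3800 chip with 2.25 million bases, 7500 fragments, 8% accuracy, and $0.20 per colony); the function name is hypothetical, and the model follows the text in taking the number of colonies as fragments divided by accuracy.

```python
def cost_per_base(chip_usd: float, chip_bases: int, fragments: int,
                  accuracy: float, usd_per_colony: float) -> dict[str, float]:
    """Return colonies needed, sequencing cost (USD), and per-base costs (US cents)."""
    # colonies whose expected number of correct clones equals the fragment count
    colonies = fragments / accuracy
    sequencing_usd = colonies * usd_per_colony
    chip_cents = 100 * chip_usd / chip_bases
    sequencing_cents = 100 * sequencing_usd / chip_bases
    return {
        "colonies": colonies,
        "sequencing_usd": sequencing_usd,
        "chip_cents_per_base": chip_cents,
        "sequencing_cents_per_base": sequencing_cents,
        "combined_cents_per_base": chip_cents + sequencing_cents,
    }


costs = cost_per_base(chip_usd=3800, chip_bases=2_250_000, fragments=7500,
                      accuracy=0.08, usd_per_colony=0.20)
for key, value in costs.items():
    print(f"{key}: {value:,.2f}")
```

This gives about 0.83 cents per base for sequencing and 0.17 cents per base for the chip, or about 1.00 cent per base combined. Because colonies are picked at random from a pool, recovering every individual fragment would likely require somewhat more colonies than this expected-value estimate.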
4. Discussion
Oligos of different fragment lengths were assembled to test assembly efficiency. Out of all 3895 fragments, only 2-fragment oligos were assembled correctly; the three- and four-oligo constructs had high error rates. These results identify two overlapping oligo fragments as the optimal length for the droplet method.
In the first trial, a high-fidelity polymerase (Phanta Max) was used, and the sequences showed mismatches and mispriming at the 3’ end. Because the first trial yielded only 3% accuracy, the second trial used a non-high-fidelity polymerase (FlashTaq Hot Start), which may lack 3’ exonuclease activity. The annealing temperature was also increased for more specific amplification. Finally, the second trial added an experimental magnetic-bead cleanup after PCR to remove primers, enzymes, and nucleotides. On the gel, the cleaned-up sample gave clearer bands than the control without cleanup.
An accuracy of about 20% was desired before initiating Golden Gate assembly; neither trial reached it, but both supported the potential of sequencing between gene synthesis steps. Both trials had a GC content of 50%, indicating that the sequences are moderately stable. Of the eight cassette plates sequenced in the second trial, the first set of four reached 10% accuracy, so the lower accuracy of the second set pulled the overall rate down to 8%. Because correct sequences were rare, the correctly synthesized oligos were scattered across the plates, making it difficult to progress to Golden Gate assembly. If accuracy improves, Golden Gate assembly could join the fragments into constructs, and this protocol could combine synthesis, sequencing, and assembly for rapid gene delivery.
This study aims to bring the cost of gene synthesis under the goal of $0.01/bp. Compared with current costs of 7 - 30 cents/base, this method, at about 0.8 cents/base for sequencing (about 1 cent/base including the microchip), is substantially cheaper. This study has two main limitations. First, because of the low accuracy rate, the correctly synthesized oligos were widely scattered across the plates, which prevented subsequent steps such as Golden Gate assembly. Second, because the work was performed by hand, future implementations could be automated or use better instrumentation to improve the speed and accuracy of gene synthesis. This sequencing protocol has high potential for future applications because of its cost, speed, and feasibility.
This new approach to gene synthesis reduces the required reagent cost and introduces the droplet method, allowing manufacturers to clone and sequence more precisely. In the preliminary experiment, only 2-fragment oligos were sequenced accurately, indicating that 2-fragment oligos should be used with the microchip. The proposed protocol enabled effective sequencing despite the low success rate. The final calculated cost is around 0.8 cents per base for sequencing (about 1 cent per base including the microchip), and it could fall further with appropriate machinery and execution. Moreover, at roughly 30 seconds per cycle, PCR can complete 40 cycles in the 20 minutes that E. coli needs to double once. New cost-effective protocols for gene synthesis could benefit future research, make gene therapies more affordable and widespread, and create new opportunities for gene construction [22] [24] [25].
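The speed comparison can be illustrated by counting doublings. The Python sketch below assumes ideal PCR efficiency (exact doubling every cycle) and uses the approximate figures from the text: about 30 seconds per PCR cycle and a 20-minute E. coli doubling time. Real PCR efficiency is usually below 100%, and the function name is hypothetical.

```python
PCR_MINUTES_PER_CYCLE = 0.5       # ~30 s per cycle, as stated in the text
ECOLI_MINUTES_PER_DOUBLING = 20   # E. coli doubling time, as stated in the text


def minutes_for_doublings(doublings: int, minutes_per_doubling: float) -> float:
    """Time needed for a given number of ideal doublings."""
    return doublings * minutes_per_doubling


doublings = 40
pcr = minutes_for_doublings(doublings, PCR_MINUTES_PER_CYCLE)
ecoli = minutes_for_doublings(doublings, ECOLI_MINUTES_PER_DOUBLING)
print(f"{doublings} doublings ({2 ** doublings:.2e}-fold in theory): "
      f"PCR ~{pcr:.0f} min, E. coli ~{ecoli:.0f} min ({ecoli / pcr:.0f}x slower)")
```

Under these assumptions, 40 PCR cycles take about 20 minutes, the time of a single E. coli doubling, so each doubling is about 40 times faster by PCR.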
Acknowledgements
The author would like to thank Quintara Biosciences for providing the resources and laboratory settings to conduct this study.