MiYA, An Efficient Machine-Learning Workflow in Conjunction With The YeastFab Assembly Strategy For Combinatorial Optimization of Heterologous Metabolic Pathways in Saccharomyces Cerevisiae

Metabolic Engineering 47 (2018) 294–302
Yikang Zhou a,1, Gang Li a,1,2, Junkai Dong b,1, Xin-hui Xing a, Junbiao Dai b,c,⁎, Chong Zhang a,⁎⁎
a Key Laboratory for Industrial Biocatalysis, Ministry of Education, Department of Chemical Engineering, Center for Synthetic & Systems Biology, Tsinghua University, Beijing, China
b Center for Synthetic & Systems Biology, School of Life Sciences, Tsinghua University, Beijing, China
c Center for Synthetic Genomics, Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
ARTICLE INFO

Keywords: Microbial cell factory; Combinatorial optimization; Machine learning; YeastFab

ABSTRACT

As the ability to construct combinatorial metabolic pathways has grown, searching for the metabolic sweet spot has become the rate-limiting step. Here we report an efficient machine-learning workflow used in conjunction with the YeastFab assembly strategy (MiYA) for combinatorial optimization of the large biosynthetic genotypic space of heterologous metabolic pathways in Saccharomyces cerevisiae. Using the β-carotene biosynthetic pathway as an example, we first demonstrated that MiYA needs to search only a small fraction (2–5%) of the combinatorial space to precisely tune the expression level of each gene; it relies on an ensemble of artificial neural networks (ANNs) to avoid over-fitting when only a small number of training samples is available. We then applied MiYA to improve the biosynthesis of violacein. Fed with initial data from a colorimetric plate-based, pre-screened pool of 24 violacein-producing strains, MiYA successfully predicted the existence of a strain, subsequently verified experimentally, that showed a 2.42-fold titer improvement in violacein production among 3125 possible designs. Furthermore, MiYA was able to largely avoid the branch pathway of violacein biosynthesis that makes deoxyviolacein, yielding very pure violacein. Together, MiYA combines the advantages of standardized building blocks and machine learning to accelerate the Design-Build-Test-Learn (DBTL) cycle for combinatorial optimization of metabolic pathways, which could significantly accelerate the development of microbial cell factories.
1. Introduction

Advances in metabolic engineering have enabled the modification of biosynthetic pathways in microorganisms to produce a wide variety of valuable compounds, including pharmaceuticals, nutraceuticals, biofuels, and bulk compounds (Chen et al., 2017). One major step in metabolic engineering is to overexpress one or more heterologous enzymes to boost the yield of an intermediate or of the desired product directly. However, an unbalanced heterologous pathway can impose a metabolic burden or cause intermediates to accumulate (Xu et al., 2016). Therefore, a major challenge is to find the optimal expression levels of the constituent enzymes. Experimental optimization, which attempts to survey all possible combinations of expression levels of enzymes, has generally been used to determine the precise balance of enzymatic activity in heterologous biosynthetic pathways (Yokobayashi et al., 2002).

In past decades, rapid development of synthetic biology techniques has improved the ability to construct synthetic combinatorial metabolic pathways. Chip-based oligo synthesis (Hughes and Ellington, 2017) and DNA assembly technologies (Casini et al., 2013; Engler et al., 2009, 2008; Gibson et al., 2009; Quan and Tian, 2014; Shao et al., 2009) have substantially reduced the cost associated with the synthesis of large DNA molecules. Given the availability of these technologies, many combinatorial methods have been developed to construct synthetic pathways with various expression levels of constituent enzymes, such as the YeastFab assembly (Guo et al., 2015; Yuan et al., 2017) and Oligo-linker Mediated Assembly (Fang et al., 2016; Zhang et al., 2015). These protocols enable the design and construction of standardized biological
⁎ Corresponding author at: Center for Synthetic Genomics, Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.
⁎⁎ Corresponding author.
E-mail addresses: junbiao.dai@siat.ac.cn (J. Dai), chongzhang@tsinghua.edu.cn (C. Zhang).
1 These authors contributed equally to this work.
2 Current address: Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96 Göteborg, Sweden.
https://doi.org/10.1016/j.ymben.2018.03.020
Received 17 January 2018; Received in revised form 20 March 2018; Accepted 31 March 2018
Available online 05 April 2018
1096-7176/ © 2018 International Metabolic Engineering Society. Published by Elsevier Inc. All rights reserved.
Fig. 1. The schematic workflow of MiYA. 1. Standardization: The promoters (PROs), open-reading frames (ORFs), and terminators (TERs) were standardized to allow assembly of the sequences into PRO-ORF-TER constructs (POTs). 2. Construction of the initial pools of strains: Two initial libraries were constructed, one composed of strains containing randomly chosen POTs and one composed of strains screened for their product titers. 3. Phenotype testing: Strains were individually cultured in flasks to determine their product titers. 4. ANN machine learning, prediction, and evaluation: The capability of the designed strains to produce the desired product was predicted by the machine-learning algorithm using the training dataset from the initial library. The results determine the strains to be constructed in the next iteration.
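The iterative loop summarized in Fig. 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the program code used in this work: the names `run_dbtl`, `propose`, and `measure` are hypothetical, `propose` stands in for the ANN-ensemble ranking step, and `measure` stands in for strain construction and flask phenotyping.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical encoding: one promoter index per gene in the pathway.
Design = Tuple[int, ...]


def run_dbtl(
    initial: Dict[Design, float],
    propose: Callable[[Dict[Design, float]], List[Design]],
    measure: Callable[[Design], float],
    max_rounds: int = 10,
) -> Dict[Design, float]:
    """Repeat learn -> design -> build/test until no untested design is proposed."""
    data = dict(initial)  # copy so the caller's training set is not mutated
    for _ in range(max_rounds):
        # Learn and design: ask the model for promising designs, skip tested ones.
        candidates = [d for d in propose(data) if d not in data]
        if not candidates:
            break  # nothing new is predicted to be better, so stop iterating
        # Build and test: measure each candidate and add it to the training set.
        for design in candidates:
            data[design] = measure(design)
    return data
```

The stopping rule mirrors the one described in the text: iteration ends when the learning step no longer proposes designs that have not already been measured.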
parts that can then be assembled into heterologous pathways under the control of various promoters or ribosome binding sites.

Given the large genotypic space, searching it effectively is the rate-limiting step in the optimization of a heterologous pathway. Exhaustive searching is one way to find the optimal combination of enzyme activities in a pathway. However, because only a limited number of combinations can be screened within a reasonable amount of time, it is nearly impossible to undertake exhaustive screening using conventional phenotyping methods. Potential methods to reduce the size of the combinatorial space include splitting the targeted network into distinct subnetworks (Ajikumar et al., 2010; Biggs et al., 2014; Chen et al., 2016; Juminaga et al., 2012; Wu et al., 2014, 2013a, 2013b; Xu et al., 2013) or constructing small libraries (Jeschek et al., 2016) that cover a broad range of expression levels while maintaining a practical throughput of tests. Nonetheless, even such protocols require testing nearly all constructed genotypes to identify the optimal combination. Consequently, statistical model-based design-of-experiments procedures, which can develop a stochastic model based on a limited amount of experimental data, have been used to optimize multivariate combinatorial metabolic pathways (Bordbar et al., 2014; Long and Antoniewicz, 2014). The principal component analysis method (Alonso-Gutierrez et al., 2015), the Plackett–Burman and Box–Behnken design procedures (Xu et al., 2016), and linear regression models (Lee et al., 2013; Zhou et al., 2015) have been frequently used. However, given the complexities of biological systems and the need to precisely balance enzymatic activities for pathway optimization, more reliable and standardized data-driven learning strategies are needed. Here, we propose a machine-learning workflow, termed MiYA, to assist the development of optimized heterologous biosynthetic pathways with the YeastFab assembly procedure. It starts with a small library of representative strains. Genotypic and phenotypic information on these strains is then compiled and used as the training dataset for prediction of new genotypic designs with an artificial neural network (ANN) ensemble. The predicted “best” designs are measured experimentally and fed back into the ANN until no new optimized designs are found.

To test the workflow, we first used β-carotene biosynthesis as a model. To reduce the workload, we examined how the initial library generated by different procedures, i.e., a randomly generated library and a pre-screened library, affected the prediction power of the ANN
procedure and the minimal number of iterations needed to achieve better predictions. After that, we assessed the ability of our workflow to improve the production of violacein, as it is a more complicated pathway containing five enzymes. Based on the data derived from a colorimetric plate-based, pre-screened initial pool of 24 strains producing violacein, we successfully predicted and verified the existence of a strain that showed a 2.42-fold titer improvement in violacein production among 3125 designs. Furthermore, we were able to maintain the purity of violacein while keeping a good titer.

2. Materials and methods

2.1. Compounds and cultivation of strains

The β-carotene standard was purchased from Sigma–Aldrich. Escherichia coli DH5α was used for the propagation of the recombinant plasmids, and the yeast strain JDY52 (MATa his3Δ200 leu2Δ0 lys2Δ0 trp1Δ63 ura3Δ0 met15Δ0) was used as the host for the β-carotene and violacein heterologous pathways (Guo et al., 2015). The E. coli strains were cultured at 37 °C in lysogeny broth (OXOID) containing 100 mg/L ampicillin (Ameresco) or 50 mg/L kanamycin (Ameresco) for plasmid maintenance. For construction of the heterologous pathways, yeast strains were cultured at 30 °C on synthetic complete medium (SC) lacking the appropriate amino acids for selection. For fermentation, yeast strains picked from the selective plate were incubated in 5 mL of yeast extract peptone dextrose medium (all components from BD Diagnostics) with 300 mg/L tryptophan (SIGMA) in 14-mL Round Bottom High Clarity PP Test Tubes (BD Falcon) at 30 °C and 220 rpm for 24 h. Then 1% of each inoculum was individually added into 20 mL of fresh yeast extract peptone dextrose medium containing 700 mg/L tryptophan for violacein synthesis or 300 mg/L tryptophan for β-carotene synthesis in a 100-mL flask, and each yeast strain was subsequently cultured at 30 °C and 220 rpm for 48 h.

2.2. ANN and YeastFab workflow

The workflow (Fig. 1) is composed of the following four steps.

2.2.1. Standardization

The standardization of gene parts and pathway assembly were performed according to the YeastFab method (Guo et al., 2015). The HCKan_P, HCKan_O, and HCKan_T plasmids were designed to host the promoters (PROs), open-reading frames (ORFs), and terminators (TERs), respectively, so that each part could be cloned and released using two different type II restriction enzymes, namely BsaI and BsmBI, but with the same sticky ends. For each PRO-ORF-TER (POT) construct, a set of the targeted standardized PRO, ORF, and TER constructs and the appropriate vector were mixed together with BsmBI and buffers to assemble the POT in a one-pot reaction. The assembled POTs were then co-transformed with an auxotrophic marker and homologous recombination arms into the yeast host strain for integration. The genotype of each strain was confirmed by selective marker, diagnostic PCR, and sequencing. The promoters and terminators can be found at https://yeastfab.cailab.org/wizard/parts/home/. The plasmids and primers used in this work are listed in Supplementary File 1. For β-carotene biosynthesis, crtE, crtI, and crtYB from Xanthophyllomyces dendrorhous (NCBI: http://www.ncbi.nlm.nih.gov/, NC_020903.1) were used (Verwaal et al., 2007), and for violacein biosynthesis, vioA, vioB, vioC, vioD, and vioE from Chromobacterium violaceum (NC_005085.1) were used (Lee et al., 2013). The POTs constructed for this work are listed in Table S1.

2.2.2. Construction of initial pool

We used two different pools of strains to construct initial libraries for β-carotene biosynthesis: a randomly constructed pool (denoted random pool) and a pre-screened pool. For the random pool, 24 combinations were randomly designed in silico and then physically constructed. For the pre-screened pool, because the color of each cultured colony could be used as a qualitative indicator of the carotenoid titer, the extracted POTs were mixed with an equal amount of an auxotrophic marker and homologous recombination arms, and co-transformed into the yeast host strain. After culturing the colonies, we picked colonies colored orange (for β-carotene biosynthesis) or purple (for violacein biosynthesis), and then patched them on a selective plate to confirm the presence of the marker. The presence of each combination was determined by diagnostic PCR and sequencing as described above. Finally, strains with different genotypes were included in the initial pool, tested for their phenotype, and then used for ANN assessment.

2.2.3. Phenotype testing

To reduce scale effects on the apparent phenotypes, we cultivated each strain in a 100-mL flask for 48 h. The titer and purity of the targeted product produced by each strain were measured according to the appropriate analytical method, which usually included extraction and HPLC analysis.

2.2.4. Machine learning, prediction, and evaluation

Given a set of training samples of the form {(x_1, y_1), …, (x_m, y_m)}, a learning algorithm outputs a hypothesis about the true unknown function, y = f(x) (Xie et al., 2014). We used a linlog-scaled promoter activity to represent each gene in the vector x_i. Each associated phenotype, i.e., the purity and/or the titer of the target metabolite, was set as a vector y_i. We developed a modified ANN ensemble to determine the priority of all possible designs in the combinatorial space according to the predicted results. Then ~20–30 of the designs with the highest priorities were selected as candidates for the next iteration. Because the standardized POTs were already available, we were able to quickly construct and test these candidates by culturing them individually in flasks to determine the accuracy of our predictions. In addition, we added the data from each round of phenotype characterization to the next set of training samples to improve the predictive capacity of the ANN.

2.3. ANN ensembles

The final ANN ensemble was generated from a common back-propagation ANN (Fig. 2). Our ANN structure is composed of three layers, with a neuron for each input variable in the input layer, two neurons in the hidden layer, and one neuron (for the relative titer value) or two neurons (for the relative titer and purity values) in the output layer. Each input variable refers to the relative activity of a gene promoter, which was linlog transformed into [−1, 1]. Owing to the finite number of samples in the training set, we set the number of nodes in the hidden layer to two. Each output titer value was normalized to the largest titer in the training set and then scaled by a factor of 0.7. We used the ANN model implemented in MATLAB (2015). The training function was the Levenberg-Marquardt back-propagation function, repeated at most 100 times, at a learning rate of 0.01. In this ANN protocol, data flow forward from the input layer to the hidden layer via the log-sigmoid activation function and then to the output layer via the linear activation function. The training-prediction procedure was repeated at least 1000 times. The Top1, Top5, and Top10 files record how often each strain appeared as the top producer, as one of the top five producers, and as one of the top 10 producers, respectively. For further study, we selected strains that might potentially be the best producers according to a threshold defined as half the number of times that the most frequently selected strain appeared in the ensemble (0.5 f_max).

2.4. Analytical methods

The β-carotene assay was modified from Xie et al. (2014). 1 mL of each culture was centrifuged for 5 min at 14,680 rpm, and then the cell
Fig. 2. ANN ensemble. The ANN comprises three layers, with one neuron for each input variable in the input layer, two neurons in the hidden layer, and one neuron (relative titer values) or two neurons (relative titer and purity values) in the output layer. Owing to the limited data in the training set, the training-prediction procedure was repeated 1000 times with randomly assigned initial weights. Each run predicted the strain with the highest titer (the top 1) and the strains ranking among the top 5 or top 10 in terms of titer. Three ranking lists, Top1, Top5, and Top10, were produced by counting the number of times each strain was found as a “top” producer. The strains with the greatest predicted potential to produce product were selected using a threshold frequency > 0.5 f_max, tested, and then added into the next ANN iteration.
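The counting and thresholding step described in this caption can be illustrated as follows. The sketch assumes that each trained network has already produced one predicted titer per design; the network training itself (carried out in MATLAB in this work) is not reproduced, and the function names `topk_frequencies` and `select_candidates` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def topk_frequencies(predictions: Sequence[Sequence[float]], k: int) -> Counter:
    """Count how often each design index ranks among the k highest predictions.

    predictions[m][i] is the titer that ensemble member m predicts for design i.
    """
    counts: Counter = Counter()
    for model_pred in predictions:
        ranked = sorted(
            range(len(model_pred)), key=lambda i: model_pred[i], reverse=True
        )
        counts.update(ranked[:k])
    return counts


def select_candidates(counts: Counter, fraction: float = 0.5) -> List[int]:
    """Keep designs whose selection frequency exceeds fraction * f_max."""
    if not counts:
        return []
    f_max = max(counts.values())
    return sorted(i for i, f in counts.items() if f > fraction * f_max)
```

Calling `topk_frequencies` with k = 1, 5, and 10 yields counts analogous to the Top1, Top5, and Top10 lists, and `select_candidates` applies the > 0.5 f_max cutoff to any one of them.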
pellet was suspended in 1 mL of 3 N HCl, heated in a boiling water bath for 3 min, and then cooled in an ice-bath for 3 min. Each pellet was washed once with Milli-Q-purified water (Millipore, Milli-QBioal) and centrifuged. After removing the supernatants, pellets were suspended in 1 mL acetone and then sonicated for 20 min (Scientz, KQ250B). The extracts were centrifuged for 5 min at 14,680 rpm and then 700 μL of each supernatant was removed and centrifuged again. Finally, 500 μL of each supernatant was analyzed for its β-carotene content (see below).

The violacein assay was modified from Jones et al. (2015). 500 μL of each culture was centrifuged for 5 min at 14,680 rpm. After removal of the supernatants, each cell pellet was suspended in 1 mL of ethanol and then sonicated for 10 min. The ethanol extracts were heated in a water bath at 94 °C for 15 min. Then, an additional 1000 μL of ethanol was added into each tube, and the solutions were pipetted several times to mix the contents. The ethanol extracts were centrifuged for 5 min at 14,680 rpm, after which 700 μL of each supernatant was removed and centrifuged again. For the assay, 500 μL of each supernatant was used.

The cell preparations were chromatographed through an Agilent C18 column (4.6 × 150 mm) with elution controlled by a Shimadzu LC-20 AT system to determine the β-carotene and violacein concentrations, which were detected at 450 and 568 nm, respectively. The mobile phase for the β-carotene assay was acetonitrile-methanol-isopropanol (50:30:20 v/v); the flow rate was 1 mL/min; and the column temperature was 40 °C. The mobile phase for the violacein assay was methanol-water (75:25 v/v); the flow rate was 1 mL/min; and the column temperature was 40 °C. The violacein content was expressed as the concentration relative to that of strain 5–5.

2.5. Data availability

In addition to the data presented in this article, additional data are available in the Supplementary Data files, which also contain the program code.

3. Results

3.1. Developing the machine-learning algorithm with a small number of training samples

We first demonstrated our workflow using the biosynthetic pathway for β-carotene by introducing three genes, crtE, crtI, and crtYB, from X. dendrorhous as reported previously (Guo et al., 2015; Verwaal et al., 2007). Ten promoters with various relative activities (between 0.09 and 54.89; Table S2) were chosen to drive expression of the three genes, which created a combinatorial space of 10^3 possible designs, named BS1 to BS1000 (all designs are listed in Supplementary File 2). Each PRO, ORF, and TER was designed following the YeastFab standard and kept in its corresponding host vector. Because scaling the volume of a culture might affect the absolute titer and purity of the target metabolite, we generally preferred to cultivate the yeast in flasks for the phenotype tests rather than as single cells or in the wells of 96-well plates, which led to a throughput of < 10^2. Because the combinatorial space was much larger than the testing throughput, we adopted the machine-learning algorithm to improve the predictive value of the small dataset(s), which allowed us to make data-driven predictions about the optimal solution and thereby search the combinatorial space more efficiently.

As noted above, we used an ANN model for the machine-learning algorithm because ANNs have been applied to model non-linear systems with great accuracy and have been used for many different systems (Desai et al., 2005; Silva et al., 2012; Teresa Caldeira et al., 2011). However, ANNs normally require a large number of training samples to establish an accurate model (Peng et al., 2014). When the training dataset is too small compared with the size of the hypothesis space, overfitting may occur, with serious statistical consequences. We therefore used an ensemble of ANN models instead of a single model to avoid overfitting. We trained more than 1000 ANN models with random initial weights. Each of these models returned a list of predictions concerning the strains that might be the best (top 1), or in the top 5 or top 10, in terms of their titers. The three final ranked lists of strains, denoted Top1, Top5, and Top10, were produced by counting the number of times each strain appeared in an ensemble (Fig. 2).

The quality of a training dataset affects the accuracy of the prediction made by the learning procedure. We therefore designed two initial libraries, one formed from randomly chosen POTs (random pool) and the other from pre-screened POTs (pre-screened pool), to test their impact on our workflow. For the random pool, after an in silico design with the promoter composition distributed uniformly based on the χ² test (Table S3), 24 strains were constructed containing the assembled POTs. Over 50% of the colonies in the random pool were not colored, indicating that they did not produce substantial amounts of carotenoids (Fig. S1a). This observation was confirmed by phenotype testing, which showed that most combinations of the POTs in the random pool did not efficiently drive β-carotene production (Fig. S1c). Of the strains in the random pool, C14 (BS998) had the greatest β-carotene titer (0.57 ± 0.06 mg/L). Because the colors of the colonies could serve as rough indicators of the carotenoid titer, we evenly mixed the POTs in the library to assemble strains each containing the β-carotene pathway, then selected 19 of the strains that were orange when cultured; these 19 strains formed the pre-screened pool (Fig. S1b). The β-carotene titers of the strains from the initial pre-screened pool ranged from 0.04 mg/L to 1.39 mg/L, and strain D12 (BS797) had the largest titer (1.39 ± 0.25 mg/L). In addition, six other strains from the pre-screened pool had a comparable or greater titer than C14 from the random pool.

When the training dataset was composed of data from the random pool, the numbers of strains that were predicted to be better producers than C14 in the resulting Top1, Top5, and Top10 lists were one, four, and nine, respectively. When experimentally tested, the strain in the Top1 file did not have a better titer than C14, and only one strain in the Top5 and three strains in the Top10 lists had
Fig. 3. Optimization of the β-carotene biosynthetic pathway in S. cerevisiae. (a) The β-carotene titers of the strains (C1–24) that each contained a randomly generated POT sequence used in the initial ANN input library (left). The titers of the best producers predicted by the ANN workflow based on the data obtained from the C1–24 titers (right). (b) The β-carotene titers of the pre-screened strains used as the initial ANN input library and the titer of the C14 strain (left). The titers of the best producers predicted by the ANN workflow based on the data obtained from the titers of the pre-screened strains. (c) The titers of the best producers predicted by the ANN workflow based on the input data of the pre-screened pool that excluded the D10 (right). (d) The titers of the best producers predicted by the ANN workflow based on the input data of the pre-screened pool that excluded strains containing the weak promoters for crtE and crtI. The promoter strengths for the genes are listed at the bottom of the panel and are color-coded according to their linlog activities from −1 to 1. The values are the average ± standard deviation and were calculated from duplicate experiments unless otherwise stated.
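To make the size of the design space and the [−1, 1] promoter scale concrete, the sketch below enumerates promoter-to-gene assignments and rescales promoter activities. The exact linlog transform is not spelled out in the text, so `log_scale` assumes a plain min-max scaling of log-transformed activities; this assumption, the function names, and the tuple encoding of a design are all hypothetical.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple


def log_scale(activities: Sequence[float]) -> List[float]:
    """Map positive activities onto [-1, 1] via min-max scaling of their logs.

    Assumed form only; the transform used in the paper may differ.
    """
    if any(a <= 0 for a in activities):
        raise ValueError("activities must be positive")
    logs = [math.log(a) for a in activities]
    lo, hi = min(logs), max(logs)
    if hi == lo:
        raise ValueError("at least two distinct activities are required")
    return [2.0 * (x - lo) / (hi - lo) - 1.0 for x in logs]


def enumerate_designs(n_promoters: int, n_genes: int) -> List[Tuple[int, ...]]:
    """List every assignment of one promoter index to each gene."""
    return list(product(range(n_promoters), repeat=n_genes))
```

With 10 promoters and 3 genes, `enumerate_designs` returns 10^3 = 1000 tuples, matching the size of the β-carotene design space; with 5 promoters and 5 genes it returns 5^5 = 3125, matching the violacein design space.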
comparable or greater titers than C14 (Fig. 3a). The strain BS999 had a 54.2% increase in the β-carotene titer compared with that of C14, which was the greatest improvement found in the ANN ensemble. When we added the information gained from the first round of prediction into the training dataset, the prediction accuracy was substantially improved. After this round of machine learning, the strain in the Top1 list, two of five strains in the Top5 list, and four of seven strains in the Top10 list had comparable or better titers than did C14 (Fig. 3a). After a second round of prediction and evaluation, two additional strains (BS798 and BS997) were found to have greater β-carotene titers than C14 (64.4% and 22.6% increased titers, respectively; Fig. 3a). When the additional information obtained from the second iteration was added into the training set, the output contained no new strains predicted to have greater titers than those already found. When we excluded C14 from the random-pool training dataset to test the robustness of the initial library, the predicted strains between BS1 and BS100 differed substantially (Table S4). For strains BS1–100, the predicted POTs containing crtE included the weakest promoter pYLL067C, which probably led to a small GGPP supply. We constructed several of these strains, including BS95, BS96, and BS97, whose titers were expected to be rather small, and in all cases the predictions were accurate (data not shown).

By pre-screening the pool of POT-mixing strains, the quality of the initial training dataset was substantially improved. Six of the 15 strains in the Top5 and four of the ten strains in the Top10 had comparable or better titers than C14 when the pre-screened pool was used as the training dataset (Fig. 3b). Among them, D12 (BS797) and D10 (BS697), the two best strains in the pre-screened pool, were present. However, no strain with a titer greater than that of D12 was found in this iteration, which may have been because D12 was the strain with the greatest titer to be found in the solution space. When we excluded the data for D12 from the pre-screened pool to test the robustness of the training set, the predicted strains remained nearly the same, except for the addition of BS799, with a β-carotene titer of 1.85 ± 0.18 mg/L, which was not significantly different from the titer of D12 (1.63 ± 0.30 mg/L, determined from eight biological replicates; Fig. 3c). In addition, the information obtained from the first pre-screened iteration was of use in the optimization steps that followed. For example, we noticed a markedly non-uniform distribution of the promoters in the pre-screened pool (Fig. S1f). Specifically, the promoters YLL067C and YLL044W for the crtE POTs and the promoters YLL067C, YLL044W, YGR233C, and YGR271C-A for the crtI POTs were absent, meaning that POTs containing promoters that induced crtE and crtI expression too weakly would hardly produce any β-carotene. With these results in hand, we then excluded all designs incorporating these weak promoters for crtE and crtI from the combinatorial space, which improved the prediction accuracy (Fig. 3d).

The aforementioned results showed the usefulness and accuracy of our workflow in finding an improved solution in the combinatorial space through iterations of standardized building and learning. We also found that the quality of the training dataset largely affects the prediction results. By pre-screening the pool of POT-mixing strains, we could not only improve the quality of the training dataset, but also exclude some inappropriate promoters for constituent enzymes.

3.2. Optimization of violacein biosynthesis

To provide additional proof concerning the validity and accuracy of our workflow, its ability to optimize the violacein biosynthetic pathway was assessed. Violacein, an indolocarbazole produced in bacteria, has a
Fig. 4. Optimization of the violacein biosynthesis pathway in S. cerevisiae. (a) Five genes, vioA, B, C, D, and E, from C. violaceum were integrated into S. cerevisiae to construct a violacein biosynthetic pathway. (b) Production and purity of violacein by the pre-screened strains in the initial library. Violacein production by each of the initial strains was normalized to the amount produced by strain 5–5, which was the best producer. Violacein purity was defined as p = A_vio/(A_vio + A_dvio). (c) The ANN structure containing two outputs. (d) The strains in the output ANN ensemble predicted to have the highest levels of violacein production based on the data obtained for the strains in the initial library. (e) The strains in the output ANN ensemble predicted to have the highest levels of violacein production and purity based on the data obtained for the strains in the initial library. The strengths of the promoters, pCUP1, pADH1, pCYC1, pTEF2, and pTDH3, are ordered from weakest to strongest as −1, −0.5, 0, 0.5, 1 according to their linlog activities. The values are the average ± standard deviation and were calculated from duplicate experiments unless otherwise stated.
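The purity definition in this caption translates directly into code. The sketch below is illustrative only: `purity` and `relative_titer` are hypothetical names, and the optional `weight` argument reproduces the weighted variant p′ = A_vio/(A_vio + 10 A_dvio) that the text applies to the purity output of the two-output model.

```python
def purity(a_vio: float, a_dvio: float, weight: float = 1.0) -> float:
    """Return A_vio / (A_vio + weight * A_dvio) from chromatographic peak areas.

    weight=1 gives the purity p; weight=10 gives the rescaled purity p'.
    """
    denominator = a_vio + weight * a_dvio
    if denominator <= 0:
        raise ValueError("peak areas must give a positive denominator")
    return a_vio / denominator


def relative_titer(titer: float, reference_titer: float) -> float:
    """Normalize a titer to that of a reference strain (strain 5-5 in this work)."""
    if reference_titer <= 0:
        raise ValueError("reference titer must be positive")
    return titer / reference_titer
```

The weighting stretches the high-purity region: solving A_vio/(A_vio + 10 A_dvio) > 0.9 gives A_vio > 90 A_dvio, so a cutoff of p′ > 0.9 corresponds to p > 90/91 ≈ 0.989.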
wide range of pharmaceutically important properties (Bromberg and Durán, 2001; Durán et al., 2007, 1994; Durán and Menck, 2001; Jiang et al., 2010; Rettori and Durán, 1998; Ryan and Drennan, 2009). The violacein biosynthetic pathway contains five enzymes, VioA, B, C, D, and E (Fig. 4a), that transform tryptophan into violacein and deoxyviolacein as the primary and secondary (byproduct) products, respectively. The promiscuous activity of VioC results in production of deoxyviolacein, which is difficult to separate from violacein. We considered this pathway a good test bed to demonstrate the power of MiYA in dealing with dual-objective optimization.

Five known promoters, pTDH3, pTEF2, pCYC1, pADH1, and pCUP1, which cover a large range of promoter strengths, were chosen for creation of the violacein biosynthetic pathway in S. cerevisiae (Table S5), which created a combinatorial space of 3125 possible designs. The relative activities of the five promoters were normalized as 1, 0.5, 0, −0.5, and −1 according to the known promoter strength rank. Because we found that the initial library obtained by the colorimetric plate-based β-carotene screen was more informative than the one randomly generated, 27 purple colonies were chosen for the pre-screened pool (Fig. S2a). The POT promoter for each strain was validated by DNA sequencing. In most strains, vioA and vioB expression was driven by pTDH3 or pTEF2, which indicated that substantial expression of vioA and vioB might benefit production of violacein (Fig. S2b). Therefore, only those two strong promoters were used for vioA and vioB expression, which reduced the combinatorial space to 500 designs (Fig. S2c; the 500 designs are listed in Supplementary File 2), with 24 strains remaining in the initial library. The violacein and deoxyviolacein titers of these 24 strains were determined after culturing the strains individually in 100-mL flasks. The violacein titers were normalized to the titer of strain 5–5, which was the best violacein producer in the initial library (Fig. 4b). We also defined violacein purity as p = A_vio/(A_vio + A_dvio) (A: chromatographic peak area).

At first, we tried to improve the production of violacein. We analyzed the distribution of the selection frequency, as before, i.e., by generating Top1, Top5, and Top10 lists. Notably, 80% of the Top1 predicted strains had a greater titer than strain 5–5 (Fig. 4d). For strains in the Top5 and Top10, the percentages were 63.6% and 53.8%, respectively. Among the strains with comparable or higher titers than 5–5 were V1, V6, V7, V251, V256, V257, and V376, with V6 having the largest titer (2.42-fold greater than that of 5–5). These results indicated that expression of all five genes must be substantially increased to maximize production of violacein, though the V6 strain still produced 10.6% deoxyviolacein.

Subsequently, we tried to fine-tune VioA, B, C, D, and E expression to obtain a strain that would produce a large amount of violacein and no deoxyviolacein. Strain 5–12 was chosen as the initial strain because it had the highest purity in the initial library. We modified the ANN model to have two output nodes, violacein titer and purity (Fig. 4c). To
obtain the best resolution in the maximum purity region, the output data (purity value) was normalized according to p' = Avio/(Avio + 10 Advio) (Fig. S2d). We trained the ANN dataset using the aforementioned promoter data and the violacein titers and purity of the first ANN ensemble to provide a new list of ranked strains according to their selection frequencies that had predicted values of p' > 0.9 and predicted violacein titers in the top 1, top 5, or top 10. Given the threshold of 0.5 fmax, a strain in the Top1 list and six strains in the Top5 list were selected for further testing, and the strain in the Top1 list was included in the Top5 list (Fig. S2e). We noted that five of the six assayed strains had a purity value > 0.989 (Fig. 4e), with V426 showing the best purity of 0.9958 (Fig. S2f). Strain V401 had the largest violacein titer among the six strains, which was 92% greater than that of the initial strain 5–12, although the purity of its violacein (0.9825) was slightly less than that produced by V426. The composition of the promoters in the six strains showed that the flux optimization around the branching point, the intermediate protodeoxyviolaceinic acid, in the violacein biosynthesis pathway required a weaker upstream expression and that the promoter of vioC should be weaker than that of vioD (Fig. 4e).

Table 1
Methods used for optimization of violacein biosynthesis (entries listed by column).
Category: Model based; Screen
Method: Statistical Model-Based Multivariate Regulatory Metabolic Engineering; Expression-level optimization; Reduced Libraries (RedLibs); ePathOptimize; MiYA
Culture: 48-well plates; 96-deep-well block; 250 mL flask; 100 mL flask; 96-deepwell plates
Model: Regression modeling; (1) Plackett-Burman design, (2) Box-Behnken design; ANN Ensemble
Host: S. cerevisiae; S. cerevisiae; E. coli; E. coli; E. coli
Pool: (1) 372/13824, (2) 279/64; 107/3125; (1) 12/32, (2) 13/27; 96/3125; 24/3125
Titer: 525.4 mg/L in shake flask, 1.31 g/L in a controlled benchtop bioreactor; 141.0 ± 14.5 relative product titer; 1829 ± 46 mg/L
Fold change: 1.35 folds improved; 2.42-fold improved
Purity: > 99%; 91%
Reference: (Lee et al., 2013); (Xu et al., 2016); (Jeschek et al., 2016); (Jones et al., 2015); This study

4. Discussion
In this study, we designed a workflow of MiYA for combinatorial optimization of heterologous biosynthetic pathways in yeast. This workflow combines the advantages of standardized building blocks and machine learning to accelerate a DBTL cycle (Nielsen and Keasling, 2016) for optimization of metabolic pathways. Theoretically, a brute-force search can find the optimal solution, but to do so may require excessive time and resources (Dietrich et al., 2010). Instead, our workflow enables us to search a small fraction (2–5%) of combinatorial space to precisely tune the expression level of each gene in a pathway within several weeks and with no need of any substantial knowledge about the pathway.
The ANN model was adopted because it had the ability to represent the non-linear interactions expected among the expressed genes. Given only a small training dataset, we found that generation of a reproducible ANN model for the prediction of yeast strains that produce a large amount of β-carotene or violacein was not possible, although we did find that the frequency of being selected as the top 1, 5 and 10 strains was stable. Therefore, we took advantage of the ensemble method to construct the Top1, Top5, and Top10 lists that included strains with a greater than threshold frequency of being selected as the top 1, 5, and 10 strains, i.e., with a frequency > 0.5 fmax when considered individually. This procedure reduced the noise in the training data and allowed us to obtain a strain that produced a relatively large amount of product. We also tested the performance of the multivariate linear regression method, which is the simplest way to represent gene interactions (Tominaga et al., 2016), and the support vector regression method, which is good at handling finite samples (Smola et al., 2004), using the data of the two initial pools for β-carotene biosynthesis. However, the performance was not stable, especially for the library with the pre-screened pool as the initial training dataset (Fig. S1g, h).
Another advantage of the ANN ensemble protocol compared with more commonly used algorithms is that it can produce several different outputs simultaneously. This allowed us to optimize the violacein pathway on the basis of the titers and purities at the same time, which is extremely useful when attempting to eliminate a branch pathway(s). In fact, many, if not most, enzymes are promiscuous as they can catalyze different reactions and act on various substrates (Tawfik and Khersonsky, 2010). Promiscuous enzymes are generally found to operate at branch points in biosynthetic pathways, therefore making it essential to prevent pathway intermediates from entering the undesired branch (Lo et al., 2013; Solomon and Prather, 2011; Thodey et al., 2014). Generally, the activities of promiscuous enzymes produce byproducts with structures that are very similar to the principal ones, and, except for structurally related differences including substrate
positioning, their mechanism of formation is largely the same (Khersonsky et al., 2006). Given that the inherent catalytic mechanism of a promiscuous enzyme determines the type of products produced, it is difficult to eliminate byproduct production by protein engineering. For example, for the violacein biosynthetic pathway, by modifying the ANN model to report titers and purities, we improved the titer level and found the best combination of larger titer and a negligible amount of deoxyviolacein (Fig. S2f). Compared with previous methods that attempted to optimize this pathway (Table 1), our workflow identified the desired strains with the least amount of testing (24/3125), minimized the experimental effort, and assured a high resolution, covering more expression levels for each enzyme. Specifically, we identified the strain that produced pure violacein (> 99% purity) without sacrificing the maximum amount of violacein that could be produced. By exhaustively screening a library, it is always possible to find the best result, although the library size must be relatively limited to achieve an exhaustive screen (Jeschek et al., 2016). Other model-based methods can reduce the amount of time required to some extent (Lee et al., 2013; Xu et al., 2016), but could not find the optimum pathway when two targets, the titer and purity of violacein, were specified owing to characteristics of the model.
The components of the initial library largely affect the machine-learning algorithm predictions (Sarawagi and Bhamidipaty, 2002; Syed et al., 1999), as shown by the different predictions resulting from the random and pre-screened pools used for the β-carotene pathway. Given that many possible combinations of enzyme expression levels in solution space produce a less-than-robust result, the risk that the initial library consists of many poorly producing strains is great and accounts for the poor robustness of the random pool when strain C14 was excluded. With a pre-screening process, we ensured that several strains with high titer were included in the initial pool. Nevertheless, our workflow allows for the addition of information gained from prior iterations into the next training dataset, which improves the quality of the input library. However, we cannot guarantee that the solution found by our workflow will be the global optimal solution. After several DBTL cycles, the ANN protocol did not return additional new data. Expanding the database so that failures and successes are documented might improve the robustness of a DBTL cycle (Burk and Dien, 2016; Poust et al., 2014; Smanski et al., 2014). Alternatively, as we show herein, a colorimetric plate-based pre-screening method would improve the quality of the strains in the initial library. Moreover, using the information concerning the strains in the prescreened pool also helped to reduce the size of the combinatorial space, so that we could avoid selecting several weak designs, yielding stronger predictions of good designs.
Conversely, the sensitivity of colorimetric plate-based screens can vary according to the assays used to determine the product levels, i.e., from low sensitivity when colonies are assayed on plates containing solid medium to high sensitivity when products are extracted and their concentrations are measured (Dietrich et al., 2010). In our study, strains with higher violacein titers than those in the initial library were found via the machine-learning step. Screening technologies with greater sensitivity include those that couple a high-throughput readout of the titer (i.e., using an absorbance or fluorescence cell-sorting protocol) with the underlying genotype to sample a larger combinatorial space (Boock et al., 2015; Eggeling et al., 2015; Libis et al., 2016; Liu et al., 2015; Qian and Cirino, 2016; Williams et al., 2016). In the future, incorporation of well-developed high-throughput methods into our workflow would help generate high-quality genotype and phenotype association data, thereby accelerating the associated DBTL cycles.
In summary, we envision that our workflow, which incorporates standardized building blocks and an ensemble machine-learning strategy, will generally aid efforts that improve metabolite production in S. cerevisiae and that reduce or eliminate accumulation of byproducts and/or intermediates. Standardization of the building blocks reduces the effort expended in the Build step, and machine learning contributes to the Learn step used to inform future designs.

Acknowledgments
This work was supported by the National Natural Science Foundation of China (NSFC 21627812, 31725002 and 21676156) and partially by Bureau of International Cooperation, Chinese Academy of Sciences (172644KYSB20170042).

Appendix A. Supporting information
Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.ymben.2018.03.020.

References
Ajikumar, P.K., Xiao, W.-H., Tyo, K.E.J., Wang, Y., Simeon, F., Leonard, E., Mucha, O., Phon, T.H., Pfeifer, B., Stephanopoulos, G., 2010. Isoprenoid pathway optimization for taxol precursor overproduction in Escherichia coli. Science 330, 70–74. http://dx.doi.org/10.1126/science.1191652.
Alonso-Gutierrez, J., Kim, E.M., Batth, T.S., Cho, N., Hu, Q., Chan, L.J.G., Petzold, C.J., Hillson, N.J., Adams, P.D., Keasling, J.D., Garcia Martin, H., Lee, T.S., 2015. Principal component analysis of proteomics (PCAP) as a tool to direct metabolic engineering. Metab. Eng. 28, 123–133. http://dx.doi.org/10.1016/j.ymben.2014.11.011.
Biggs, B.W., De Paepe, B., Santos, C.N.S., De Mey, M., Kumaran Ajikumar, P., 2014. Multivariate modular metabolic engineering for pathway and strain optimization. Curr. Opin. Biotechnol. 29, 156–162. http://dx.doi.org/10.1016/j.copbio.2014.05.005.
Boock, J.T., Gupta, A., Prather, K.L.J., 2015. Screening and modular design for metabolic pathway optimization. Curr. Opin. Biotechnol. 36, 189–198. http://dx.doi.org/10.1016/j.copbio.2015.08.013.
Bordbar, A., Monk, J.M., King, Z.A., Palsson, B.O., 2014. Constraint-based models predict metabolic and associated cellular functions. Nat. Rev. Genet. 15, 107–120. http://dx.doi.org/10.1038/nrg3643.
Bromberg, N., Durán, N., 2001. Violacein transformation by peroxidases and oxidases: implications on its biological properties. J. Mol. Catal. - B Enzym. 11, 463–467. http://dx.doi.org/10.1016/S1381-1177(00)00171-5.
Burk, M.J., Dien, S. Van, 2016. Biotechnology for chemical production: challenges and opportunities. Trends Biotechnol. 34, 187–190. http://dx.doi.org/10.1016/j.tibtech.2015.10.007.
Casini, A., MacDonald, J.T., Jonghe, J. De, Christodoulou, G., Freemont, P.S., Baldwin, G.S., Ellis, T., 2013. One-pot DNA construction for synthetic biology: the modular overlap-directed assembly with linkers (MODAL) strategy. Nucleic Acids Res. 42, 1–13. http://dx.doi.org/10.1093/nar/gkt915.
Chen, X., Gao, C., Guo, L., Hu, G., Luo, Q., Liu, J., Nielsen, J., Chen, J., Liu, L., 2017. DCEO biotechnology: tools to design, construct, evaluate, and optimize the metabolic pathway for biosynthesis of chemicals. Chem. Rev. http://dx.doi.org/10.1021/acs.chemrev.6b00804.
Chen, X., Zhu, P., Liu, L., 2016. Modular optimization of multi-gene pathways for fumarate production. Metab. Eng. 33, 76–85. http://dx.doi.org/10.1016/j.ymben.2015.07.007.
Desai, K.M., Vaidya, B.K., Singhal, R.S., Bhagwat, S.S., 2005. Use of an artificial neural network in modeling yeast biomass and yield of β-glucan. Process Biochem. 40, 1617–1626. http://dx.doi.org/10.1016/j.procbio.2004.06.015.
Dietrich, J.A., McKee, A.E., Keasling, J.D., 2010. High-throughput metabolic engineering: advances in small-molecule screening and selection. Annu. Rev. Biochem. http://dx.doi.org/10.1146/annurev-biochem-062608-095938.
Durán, N., Antonio, R.V., Haun, M., Pilli, R.A., 1994. Biosynthesis of a trypanocide by Chromobacterium violaceum. World J. Microbiol. Biotechnol. 10, 686–690. http://dx.doi.org/10.1007/BF00327960.
Durán, N., Justo, G.Z., Ferreira, C.V., Melo, P.S., Cordi, L., Martins, D., 2007. Violacein: properties and biological activities. Biotechnol. Appl. Biochem. 48, 127. http://dx.doi.org/10.1042/BA20070115.
Durán, N., Menck, C.F., 2001. Chromobacterium violaceum: a review of pharmacological and industrial perspectives. Crit. Rev. Microbiol. 27, 201–222. http://dx.doi.org/10.1080/20014091096747.
Eggeling, L., Bott, M., Marienhagen, J., 2015. Novel screening methods-biosensors. Curr. Opin. Biotechnol. 35, 30–36. http://dx.doi.org/10.1016/j.copbio.2014.12.021.
Engler, C., Gruetzner, R., Kandzia, R., Marillonnet, S., 2009. Golden gate shuffling: a one-pot DNA shuffling method based on type IIs restriction enzymes. PLoS One 4. http://dx.doi.org/10.1371/journal.pone.0005553.
Engler, C., Kandzia, R., Marillonnet, S., 2008. A one pot, one step, precision cloning method with high throughput capability. PLoS One 3. http://dx.doi.org/10.1371/journal.pone.0003647.
Fang, M., Wang, T., Zhang, C., Bai, J., Zheng, X., Zhao, X., Lou, C., Xing, X.H., 2016. Intermediate-sensor assisted push-pull strategy and its application in heterologous deoxyviolacein production in Escherichia coli. Metab. Eng. 33, 41–51. http://dx.doi.org/10.1016/j.ymben.2015.10.006.
Gibson, D.G., Young, L., Chuang, R.-Y., Venter, J.C., Hutchison, C.A., Smith, H.O., 2009. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343–345. http://dx.doi.org/10.1038/nmeth.1318.
Guo, Y., Dong, J., Zhou, T., Auxillos, J., Li, T., Zhang, W., Wang, L., Shen, Y., Luo, Y., Zheng, Y., Lin, J., Chen, G.Q., Wu, Q., Cai, Y., Dai, J., 2015. YeastFab: the design and construction of standard biological parts for metabolic engineering in Saccharomyces cerevisiae. Nucleic Acids Res. 43, e88. http://dx.doi.org/10.1093/nar/gkv464.
Hughes, R.A., Ellington, A.D., 2017. Synthetic DNA synthesis and assembly: putting the synthetic in synthetic biology. Cold Spring Harb. Perspect. Biol. 9. http://dx.doi.org/10.1101/cshperspect.a023812.
Jeschek, M., Gerngross, D., Panke, S., 2016. Rationally reduced libraries for combinatorial pathway optimization minimizing experimental effort. Nat. Commun. 7, 11163. http://dx.doi.org/10.1038/ncomms11163.
Jiang, P.X., Wang, H.S., Zhang, C., Lou, K., Xing, X.H., 2010. Reconstruction of the violacein biosynthetic pathway from Duganella sp. B2 in different heterologous hosts. Appl. Microbiol. Biotechnol. 86, 1077–1088. http://dx.doi.org/10.1007/s00253-009-2375-z.
Jones, J.A., Vernacchio, V.R., Lachance, D.M., Lebovich, M., Fu, L., Shirke, A.N., Schultz, V.L., Cress, B., Linhardt, R.J., Koffas, M.A.G., 2015. ePathOptimize: a combinatorial approach for transcriptional balancing of metabolic pathways. Sci. Rep. 5, 11301. http://dx.doi.org/10.1038/srep11301.
Juminaga, D., Baidoo, E.E.K., Redding-Johanson, A.M., Batth, T.S., Burd, H., Mukhopadhyay, A., Petzold, C.J., Keasling, J.D., 2012. Modular engineering of L-tyrosine production in Escherichia coli. Appl. Environ. Microbiol. 78, 89–98. http://dx.doi.org/10.1128/AEM.06017-11.
Khersonsky, O., Roodveldt, C., Tawfik, D.S., 2006. Enzyme promiscuity: evolutionary and mechanistic aspects. Curr. Opin. Chem. Biol. 10, 498–508. http://dx.doi.org/10.1016/j.cbpa.2006.08.011.
Lee, M.E., Aswani, A., Han, A.S., Tomlin, C.J., Dueber, J.E., 2013. Expression-level optimization of a multi-enzyme pathway in the absence of a high-throughput assay. Nucleic Acids Res. 41, 10668–10678. http://dx.doi.org/10.1093/nar/gkt809.
Libis, V., Delépine, B., Faulon, J.L., 2016. Sensing new chemicals with bacterial transcription factors. Curr. Opin. Microbiol. 33, 105–112. http://dx.doi.org/10.1016/j.mib.2016.07.006.
Liu, D., Evans, T., Zhang, F., 2015. Applications and advances of metabolite biosensors for metabolic engineering. Metab. Eng. 31, 15–22. http://dx.doi.org/10.1016/j.ymben.2015.06.008.
Lo, T.M., Teo, W.S., Ling, H., Chen, B., Kang, A., Chang, M.W., 2013. Microbial engineering strategies to improve cell viability for biochemical production. Biotechnol. Adv. 31, 903–914. http://dx.doi.org/10.1016/j.biotechadv.2013.02.001.
Long, C.P., Antoniewicz, M.R., 2014. Metabolic flux analysis of Escherichia coli knockouts: lessons from the Keio collection and future outlook. Curr. Opin. Biotechnol. 28, 127–133. http://dx.doi.org/10.1016/j.copbio.2014.02.006.
MATLAB, 2015. MATLAB and Statistics Toolbox Release R2015b. Natick, Massachusetts, The MathWorks Inc.
Nielsen, J., Keasling, J.D., 2016. Engineering cellular metabolism. Cell. http://dx.doi.org/10.1016/j.cell.2016.02.004.
Peng, W., Zhong, J., Yang, J., Ren, Y., Xu, T., Xiao, S., Zhou, J., Tan, H., 2014. The artificial neural network approach based on uniform design to optimize the fed-batch fermentation condition: application to the production of iturin A. Microb. Cell Fact. 13, 54. http://dx.doi.org/10.1186/1475-2859-13-54.
Poust, S., Hagen, A., Katz, L., Keasling, J.D., 2014. Narrowing the gap between the promise and reality of polyketide synthases as a synthetic biology platform. Curr. Opin. Biotechnol. 30, 32–39. http://dx.doi.org/10.1016/j.copbio.2014.04.011.
Qian, S., Cirino, P.C., 2016. Using metabolite-responsive gene regulators to improve microbial biosynthesis. Curr. Opin. Chem. Eng. 14, 93–102. http://dx.doi.org/10.1016/j.coche.2016.08.020.
Quan, J., Tian, J., 2014. Circular polymerase extension cloning. Methods Mol. Biol. 1116, 103–117. http://dx.doi.org/10.1007/978-1-62703-764-8_8.
Rettori, D., Durán, N., 1998. Production, extraction and purification of violacein: an antibiotic pigment produced by Chromobacterium violaceum. World J. Microbiol. Biotechnol. 14, 685–688. http://dx.doi.org/10.1023/A:1008809504504.
Ryan, K.S., Drennan, C.L., 2009. Divergent pathways in the biosynthesis of bisindole natural products. Chem. Biol. 16, 351–364. http://dx.doi.org/10.1016/j.chembiol.2009.01.017.
Sarawagi, S., Bhamidipaty, A., 2002. Interactive deduplication using active learning. In: Proceedings of the Eighth ACM SIGKDD Int. Conf. Knowl. Discov. data Min. - KDD ’02, 269. http://dx.doi.org/10.1145/775047.775087.
Shao, Z., Zhao, H., Zhao, H., 2009. DNA assembler, an in vivo genetic method for rapid construction of biochemical pathways. Nucleic Acids Res. 37, 1–10. http://dx.doi.org/10.1093/nar/gkn991.
Silva, R., Ferreira, S., Bonifácio, M.J., Dias, J.M.L., Queiroz, J.A., Passarinha, L.A., 2012. Optimization of fermentation conditions for the production of human soluble catechol-O-methyltransferase by Escherichia coli using artificial neural network. J. Biotechnol. 160, 161–168. http://dx.doi.org/10.1016/j.jbiotec.2012.03.025.
Smanski, M.J., Bhatia, S., Zhao, D., Park, Y., Woodruff, L.B.A., Giannoukos, G., Ciulla, D., Busby, M., Calderon, J., Nicol, R., Gordon, D.B., Densmore, D., Voigt, C.A., 2014. Functional optimization of gene clusters by combinatorial design and assembly. Nat. Biotechnol. 32, 1241–1249. http://dx.doi.org/10.1038/nbt.3063.
Smola, A.J., Schölkopf, B., 2004. A tutorial on support vector regression. Stat. Comput. 14, 199–222. http://dx.doi.org/10.1023/B:STCO.0000035301.49549.88.
Solomon, K.V., Prather, K.L.J., 2011. The zero-sum game of pathway optimization: emerging paradigms for tuning gene expression. Biotechnol. J. 6, 1064–1070. http://dx.doi.org/10.1002/biot.201100086.
Syed, N.A., Liu, H., Sung, K.K., 1999. Handling concept drifts in incremental learning with support vector machines. In: Proceedings of the Fifth ACM SIGKDD Int. Conf. Knowl. Discov. data Min. - KDD ’99, pp. 317–321. http://dx.doi.org/10.1145/312129.312267.
Tawfik, S.D., Khersonsky, O., 2010. Enzyme promiscuity: a mechanistic and evolutionary perspective. Annu. Rev. Biochem. 79, 471–505. http://dx.doi.org/10.1146/annurev-biochem-030409-143718.
Teresa Caldeira, A., Arteiro, J.M., Roseiro, J.C., Neves, J., Vicente, H., 2011. An artificial intelligence approach to Bacillus amyloliquefaciens CCMI 1051 cultures: application to the production of anti-fungal compounds. Bioresour. Technol. 102, 1496–1502. http://dx.doi.org/10.1016/j.biortech.2010.07.080.
Thodey, K., Galanie, S., Smolke, C.D., 2014. A microbial biomanufacturing platform for natural and semisynthetic opioids. Nat. Chem. Biol. 10, 1–10. http://dx.doi.org/10.1038/nchembio.1613.
Tominaga, D., Mori, K., Aburatani, S., 2016. Linear and nonlinear regression for combinatorial optimization problem of multiple transgenesis. IPSJ Trans. Bioinform. 9, 7–11. http://dx.doi.org/10.2197/ipsjtbio.9.7.
Verwaal, R., Wang, J., Meijnen, J.P., Visser, H., Sandmann, G., Van Den Berg, J.A., Van Ooyen, A.J.J., 2007. High-level production of beta-carotene in Saccharomyces cerevisiae by successive transformation with carotenogenic genes from Xanthophyllomyces dendrorhous. Appl. Environ. Microbiol. 73, 4342–4350. http://dx.doi.org/10.1128/AEM.02759-06.
Williams, T.C., Pretorius, I.S., Paulsen, I.T., 2016. Synthetic evolution of metabolic productivity using biosensors. Trends Biotechnol. 34, 371–381. http://dx.doi.org/10.1016/j.tibtech.2016.02.002.
Wu, J., Du, G., Zhou, J., Chen, J., 2013a. Metabolic engineering of Escherichia coli for (2S)-pinocembrin production from glucose by a modular metabolic strategy. Metab. Eng. 16, 48–55. http://dx.doi.org/10.1016/j.ymben.2012.11.009.
Wu, J., Liu, P., Fan, Y., Bao, H., Du, G., Zhou, J., Chen, J., 2013b. Multivariate modular metabolic engineering of Escherichia coli to produce resveratrol from L-tyrosine. J. Biotechnol. 167, 404–411. http://dx.doi.org/10.1016/j.jbiotec.2013.07.030.
Wu, J., Zhou, T., Du, G., Zhou, J., Chen, J., 2014. Modular optimization of heterologous pathways for de Novo synthesis of (2S)-Naringenin in Escherichia coli. PLoS One 9, 1–10. http://dx.doi.org/10.1371/journal.pone.0101492.
Xie, W., Liu, M., Lv, X., Lu, W., Gu, J., Yu, H., 2014. Construction of a controllable β-carotene biosynthetic pathway by decentralized assembly strategy in Saccharomyces cerevisiae. Biotechnol. Bioeng. 111, 125–133. http://dx.doi.org/10.1002/bit.25002.
Xu, P., Gu, Q., Wang, W., Wong, L., Bower, A.G.W., Collins, C.H., Koffas, M.A.G., 2013. Modular optimization of multi-gene pathways for fatty acids production in E. coli. Nat. Commun. 4, 1408–1409. http://dx.doi.org/10.1038/ncomms2425.
Xu, P., Rizzoni, E.A., Sul, S.-Y., Stephanopoulos, G., 2016. Improving metabolic pathway efficiency by statistical model-based multivariate regulatory metabolic engineering. ACS Synth. Biol. http://dx.doi.org/10.1021/acssynbio.6b00187.
Yokobayashi, Y., Weiss, R., Arnold, F.H., 2002. Directed evolution of a genetic circuit. Proc. Natl. Acad. Sci. USA 99, 16587–16591.
Yuan, T., Guo, Y., Dong, J., Li, T., Zhou, T., Sun, K., Zhang, M., Wu, Q., Xie, Z., Cai, Y., Cao, L., Dai, J., 2017. Construction, characterization and application of a genome-wide promoter library in Saccharomyces cerevisiae. Front. Chem. Sci. Eng. 11, 107–116. http://dx.doi.org/10.1007/s11705-017-1621-7.
Zhang, S., Zhao, X., Tao, Y., Lou, C., 2015. A novel approach for metabolic pathway optimization: oligo-linker mediated assembly (OLMA) method. J. Biol. Eng. 9, 23. http://dx.doi.org/10.1186/s13036-015-0021-0.
Zhou, H., Vonk, B., Roubos, J.A., Bovenberg, R.A.L., Voigt, C.A., 2015. Algorithmic co-optimization of genetic constructs and growth conditions: application to 6-ACA, a potential nylon-6 precursor. Nucleic Acids Res. 43, 10560–10570. http://dx.doi.org/10.1093/nar/gkv1071.
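
As a worked check of the violacein design space and purity definitions reported in the results, the following Python sketch enumerates the promoter combinations and evaluates p and p'. It is a minimal illustration, not the authors' code: the names GENES, LEVELS, STRONG, design_space and purity are hypothetical, and it assumes that the two strong promoters (pTDH3 and pTEF2) carry the normalized codes 1 and 0.5.

```python
from itertools import product

GENES = ["vioA", "vioB", "vioC", "vioD", "vioE"]
# Normalized promoter activities quoted in the text.
LEVELS = [1.0, 0.5, 0.0, -0.5, -1.0]
# Assumption: pTDH3 and pTEF2 correspond to the two highest codes.
STRONG = [1.0, 0.5]


def design_space(allowed):
    """Enumerate every promoter-level combination.

    `allowed` maps each gene name to the list of levels permitted for it.
    """
    return list(product(*(allowed[g] for g in GENES)))


def purity(a_vio, a_dvio, weight=1.0):
    """Return A_vio / (A_vio + weight * A_dvio).

    weight=1 gives the purity p; weight=10 gives the rescaled value p'.
    """
    return a_vio / (a_vio + weight * a_dvio)


# Full space: five levels for each of the five genes.
full = design_space({g: LEVELS for g in GENES})
# Reduced space: vioA and vioB restricted to the two strong promoters.
reduced = design_space(
    {g: (STRONG if g in ("vioA", "vioB") else LEVELS) for g in GENES}
)

print(len(full), len(reduced))  # 3125 500
```

With these definitions, the requirement p' > 0.9 is equivalent to Avio/Advio > 90, i.e., p > 90/91 ≈ 0.989, which matches the purity level quoted for five of the six assayed strains.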

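The selection-frequency rule used to build the Top1, Top5 and Top10 lists (keep designs whose frequency exceeds 0.5 fmax) can be sketched as follows. This is an illustration under stated assumptions, not the published implementation: frequency is taken to be the fraction of ensemble members that rank a design within their top k, the function names selection_frequency and shortlist are hypothetical, and the toy predictions for designs "A", "B" and "C" are invented for the example.

```python
from collections import Counter


def selection_frequency(predictions, k):
    """Fraction of ensemble members that rank each design within their top k.

    `predictions` is a list with one dict per ensemble member, mapping a
    design identifier to that member's predicted titer.
    """
    counts = Counter()
    for member in predictions:
        top_k = sorted(member, key=member.get, reverse=True)[:k]
        counts.update(top_k)
    n_members = len(predictions)
    return {design: c / n_members for design, c in counts.items()}


def shortlist(freq, ratio=0.5):
    """Keep designs whose frequency exceeds ratio * f_max."""
    f_max = max(freq.values())
    return sorted(d for d, f in freq.items() if f > ratio * f_max)


# Toy ensemble of four members (hypothetical values, not data from the study).
members = [
    {"A": 3.0, "B": 2.0, "C": 1.0},
    {"A": 3.0, "B": 1.0, "C": 2.0},
    {"A": 1.0, "B": 3.0, "C": 2.0},
    {"A": 3.0, "B": 2.0, "C": 1.0},
]

top1 = shortlist(selection_frequency(members, k=1))
top2 = shortlist(selection_frequency(members, k=2))
print(top1, top2)  # ['A'] ['A', 'B', 'C']
```

Because the shortlist depends only on how often a design is ranked highly, not on any single member's predicted value, it stays usable even when individual models trained on a small dataset disagree on the absolute titers.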