Simon X. Han², Marshall J. Levesque², Kohei Ichikawa³, Susumu Date¹, Jason H. Haga²
¹ Cybermedia Center, Osaka University, Osaka, Japan
² Department of Bioengineering, University of California, San Diego, La Jolla, CA
³ Research Institute of Socionetwork Strategies, Kansai University, Osaka, Japan
xhan@ucsd.edu, mlevesqu@ucsd.edu, ichikawa@ycss.kansai-u.ac.jp, date@ais.cmc.osakau.ac.jp, jhaga@bioeng.ucsd.edu
growth and death, it has implications in the progression of different diseases such as Alzheimer's disease, diabetes, and cancer [6]; however, the complexity of SHP-2 function makes it difficult to elucidate the signaling pathways that are regulated by SHP-2. The objective of this study was to identify several potential inhibitors of SHP-2 function by performing grid-enabled virtual screening experiments with the crystal structure of SHP-2. Although the methods employed in this study were similar to those reported in a companion paper [20], the details of the results obtained and their biological significance are different. Complications of performing multiple, routine virtual screenings on the grid are also described, and potential solutions are discussed. The results of this study will provide important pharmacologic tools that will help to better understand SHP-2 function and provide promising leads to clinical treatments for various diseases.
Abstract
SHP-2 is a protein tyrosine phosphatase (PTP) that plays an important role in many cellular functions such as development, growth, and death; thus SHP-2 has been hypothesized to play an important role in various diseases such as diabetes, neurodegeneration, and cancer. The individual roles of different PTPs are not well understood, and this is complicated by the lack of specific inhibitors. In this study, we utilized the multi-institutional PRAGMA Grid computation resources to virtually screen the ZINC 7 database using the virtual docking software DOCK 6.2. Preliminary results suggest several SHP-2-specific inhibitors that can be further tested and validated under laboratory conditions. Complications during these multiple virtual screenings on the grid, as well as potential improvements, are also discussed. These findings have future clinical significance in the creation of new drug therapies for the treatment of different diseases.
2. Methods
This study employed virtual screening experiments to identify potential inhibitor compounds for a specific enzymatic target using molecular docking software. This method has proven successful in drug discovery [12]. DOCK 6.2 was used to screen a database of small compound structures and simulate their molecular interactions with the target protein structure [15]. Several of the scoring algorithms included in DOCK, such as the grid energy and AMBER scoring methods, were used, since it has been shown that the most successful docking results are those that consult different scoring algorithms. The docking algorithms orient compound structures in the binding pocket of the protein molecule, and energy scores are calculated and assigned to the paired complex. These scores are used to rank the database of compounds, creating a list of potential inhibitor compounds ordered from best to worst. An idealized experiment would screen an extensive chemical library of compounds with the most accurate docking and scoring methods available. However, more accurate scoring and larger databases
1. Introduction
SHP-2 is a ubiquitously expressed cytoplasmic PTP that contains two Src-homology-2 (SH2) domains, a catalytic PTP domain, and a C-terminal domain [6, 10, 11, 13]. Under non-stimulated conditions, the protein is in an inactive state, where the N-terminal SH2 domain blocks access to the catalytic site [11]. When SHP-2 becomes active, the protein structure changes, exposing the catalytic site and allowing it to dephosphorylate other substrates [11]. Dephosphorylation of specific proteins can modulate cellular functions. SHP-2 has been found to be primarily a positive regulator in many different cellular functions including growth, death, and development [6, 8]. Some examples include enhancing the process of programmed cell death by dephosphorylating the STAT5 protein and promoting neural cell growth [6]. There is also evidence that SHP-2 plays a negative role in cellular functions [6]. Because of the important role of SHP-2 in cellular
require increasing amounts of computational resources. Deploying DOCK over grid resources makes this type of experiment a viable strategy for laboratories of any size.
grid-enabled DOCK services tied together and automated with Perl scripts, an entire virtual screening experiment can be distributed from a central (master) cluster to be executed across the remote clusters making up the PRAGMA Grid [3,18]. The simple and standardized software tools offer a highly flexible and customizable docking platform where the tremendous power and cost-efficiency of the grid can be utilized. However, the sheer number of compounds in the ZINC database and advanced docking methods can still take considerable resources. With this in mind, the screening was split into two phases to screen the database exhaustively and efficiently.

Table 1. Resources used

Cluster     Processors   Location
Rocks-52    28           SDSC, US
Tea01       80           Osaka U, JP
Cafe01      64           Osaka U, JP
Ocikbpra    32           U of Zurich, CH
Lzu         22           LanZhou U, CN
The parameters for each screen were also adjusted with respect to the availability of grid resources and experiment deadlines.
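As an illustration of how a compound database might be divided among the clusters in Table 1, the following minimal sketch splits a compound count in proportion to each cluster's processor count. The proportional scheme and the function name `allocate_compounds` are assumptions made for this example; the text does not state how the Perl scripts actually sized each cluster's share.

```python
# Processor counts are taken from Table 1. The proportional split is an
# assumption for illustration, not the documented behavior of the platform.
CLUSTERS = {"Rocks-52": 28, "Tea01": 80, "Cafe01": 64, "Ocikbpra": 32, "Lzu": 22}


def allocate_compounds(n_compounds, processors):
    """Split n_compounds across clusters in proportion to processor count."""
    total = sum(processors.values())
    shares = {name: n_compounds * p // total for name, p in processors.items()}
    # Integer division leaves a small remainder; give one extra compound
    # to each of the largest clusters until every compound is assigned.
    leftover = n_compounds - sum(shares.values())
    for name in sorted(processors, key=processors.get, reverse=True)[:leftover]:
        shares[name] += 1
    return shares


# Example: the 972,608 compounds of the lead-like database.
shares = allocate_compounds(972_608, CLUSTERS)
for cluster, count in shares.items():
    print(f"{cluster}: {count} compounds")
```

A static split like this ignores queue times and cluster failures, so in practice a scheme that hands out smaller slices on demand may cope better with busy or unavailable clusters.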
3. Results
In this experiment, the catalytic site of protein tyrosine phosphatase SHP-2 was successfully screened against the drug-like and lead-like databases, using up to 137 processors across 5 clusters.
Figure 2. Visualization of the fifth ranked compound (ZINC 4025466) from the drug-like screening.

high score for this interaction. The next compound that reasonably interacted with the SHP-2 catalytic pocket is shown in Figure 2. This compound was ranked fifth and fit well in the binding pocket of SHP-2. Intensive interaction was demonstrated by the numerous hydrogen bonds (green lines) connecting oxygen atoms (red) of the binding compound to amino acid residues (orange sticks) within the catalytic site of SHP-2. Similar to the drug-like database screening, 22,938 of the 972,608 lead-like compounds were ranked and the best 20 binding compounds are presented in Table 6. It is interesting to note that two of these compounds (ZINC3097907 and ZINC1532056) appear in both of these lists with the same energy scores, but have different rankings. This is due to the fact that these compounds have both drug-like and lead-like properties.

Table 6. Top 20 lead-like compounds
                  RANKINGS                SCORES
Rank   ZINC ID    Total   Dock   Amber    Dock     Amber
1      5518020    20      16     4        -95.9    -4x10^4
2      3097907    25      23     2        -90.1    -5x10^5
3      0405809    30      27     3        -82.8    -2x10^5
4      1532056    169     169    n/a      -66.2    n/a
5      5478334    249     238    11       -64.8    -868
6      5413470    418     401    17       -62.8    -214
7      5413467    450     437    13       -62.5    -231
8      3953252    496     495    1        -62.1    -2x10^6
9      3115745    587     400    187      -62.8
10     2116249    702     153    549      -66.4
11     8030102    719     667    52       -60.9
12     2431301    799     86     713      -68.7
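The Total column in Table 6 equals the sum of each compound's DOCK rank and AMBER rank (for example, 16 + 4 = 20 for ZINC 5518020). A minimal sketch of this rank-sum ordering is given below. The function name `consensus_rank` and the dictionary layout are assumptions for illustration, and treating a missing AMBER rank as contributing nothing is inferred from the ZINC 1532056 row rather than stated in the text.

```python
def consensus_rank(compounds):
    """Order compounds by the sum of their DOCK and AMBER ranks (lower is better)."""
    def total(compound):
        # A missing AMBER rank (n/a in Table 6) adds nothing to the total;
        # this mirrors the ZINC 1532056 row and is an inferred convention.
        return compound["dock_rank"] + (compound["amber_rank"] or 0)
    return sorted(compounds, key=total)


# Four rows from Table 6, deliberately listed out of order.
rows = [
    {"zinc_id": "1532056", "dock_rank": 169, "amber_rank": None},
    {"zinc_id": "3097907", "dock_rank": 23, "amber_rank": 2},
    {"zinc_id": "5518020", "dock_rank": 16, "amber_rank": 4},
    {"zinc_id": "0405809", "dock_rank": 27, "amber_rank": 3},
]

ranked = consensus_rank(rows)
for position, compound in enumerate(ranked, start=1):
    print(position, compound["zinc_id"])
```

Because a rank sum rewards any compound that scores extremely well under one method, visual inspection of the top entries remains necessary, as the results for the highest ranked compounds show.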
Figure 1 is an image of the first ranked compound from the drug-like screening. It appears to interact very well with the catalytic pocket (denoted by the purple box) of SHP-2; however, careful inspection revealed a subtle irregularity where one atom intersected with the SHP-2 surface, as indicated by the red circle. This may have contributed to the relatively
Figure 1. Visualization of the first ranked compound (ZINC 1717339) from the drug-like screening.
Visual inspection of the top five ranked compounds again showed erroneous or no interaction with the SHP-2 catalytic pocket. When ZINC3097907 was visualized, as shown in Figures 3A and 3B, it was apparent that the compound was partially embedded in the protein, again causing the extraordinarily high AMBER score (-5x10^5). The sixth compound had reasonable binding to SHP-2, as illustrated in Figure 4. Again, good interaction was evident with the presence of numerous
Figure 4. Visualization of the sixth ranked compound (ZINC 5413470) from the lead-like screening.

hydrogen bonds (green lines) connecting oxygen atoms (red) of the compound to various amino acids in the catalytic site (orange sticks). This provides a good alternative compound with a different structure and chemical properties that may inhibit SHP-2 effectively. The diverse interaction locations of ZINC5413470 suggest it may have a different level of specificity with SHP-2, compared to the compounds identified in the drug-like screen. Our results showed that, of the top 20 compounds from each database, sulfonic acid motifs (Fig 5A) stand out in the rankings. There are 9 sulfonic acids compared to 5 carboxylic acids (Fig 5C), the next most frequent motif. It is interesting to note that phosphinic acids, shown in Fig 5F, are ranked the highest but have a lower frequency (4 excluding duplicates). Other compounds include 4 propanoic acids (Fig 5D) and 4 phosphonic acids (Fig 5E). The highest ranked visually confirmed compounds in the drug-like and lead-like databases are phosphonic acids, suggesting that phosphinic and sulfonic acids are more prone to generate false positives when docked with phosphatases.
Figure 3A and 3B. Visualization of the second ranked compound (ZINC 3097907) from the lead-like screening.
Figure 5. Chemical motifs of the top ranked drug-like and/or lead-like compounds.
increase in AMBER parameters was not implemented. It was also found that emphasizing the initial score minimization (before MD) resulted in better score optimization.

DOCK issues: Segmentation faults in DOCK were observed in this experiment; however, contrary to the previous study by Levesque et al. [3], the compounds causing segmentation faults in this experiment did not share a common trait, and the faults occurred only during the AMBER screen. This error was found to be independent of grid resources because the same fault occurred when the job was re-screened on different clusters. Removing the problematic compounds eliminated the fault, and the other compounds completed without errors. Currently, input file preparation cannot resume automatically after the faulty ligand is removed and must be restarted manually. These compounds (30/3,039,514) represent less than 0.001% of the three databases screened and should therefore not be considered a deterrent to using virtual screening to identify potential inhibitors.

Disk storage issues: The screening of SHP-2 against two databases totaling 3,039,514 compounds produced a great deal of data and results. The total amount of disk space used is summarized in Table 10. AMBER screening requires many input files, and these files can take as long to generate as the AMBER energy score takes to compute.

Table 10. Summary of disk space used
Cluster     Space Used
Rocks-52    38GB
Tea01       94GB
Cafe-01     111GB
Ocikbpra    30GB+ (compressed to 11GB)
Lzu         52GB
Total       325GB+
Energy and AMBER scores calculated using the rescreen parameters in Tables 3 and 4 produced satisfactory results without consuming large amounts of grid resources. Test sets have shown that energy scores produced by the second phase energy parameters were very similar to those produced by more stringent parameters (a four-fold increase in the first screen parameters) while completing docking in 2/5 of the time. Comparison of scores produced by the AMBER parameters to scores produced by a four-fold increase in the same parameters showed that not only did Table 4's parameters produce much more minimized scores, but also only took the time. Thus, the four-fold
In a slice of 573 compounds, the input files amounted to 1.5 GB. For three AMBER screens totaling 56,980 compounds, at least 150GB of data can be expected. This can interfere with the logistics of a run, especially on older clusters where disk limitations may prevent data gathering. Additionally, since all users share the same disk allocation on each cluster, a single user with unrestricted disk usage can inconvenience other users. Thus, in some cases, the data was compressed using standard zip commands. For instance, the data for a slice of 243 compounds required 620 MB of uncompressed space, but after compression the data only required 115 MB. Although the compression reduced the disk space usage, it added a layer of complexity to the collecting, ranking and overall organization of the data.
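The storage figures quoted in this discussion can be checked with a short back-of-the-envelope calculation. The sketch below assumes that input-file size scales linearly with the number of compounds and uses decimal units (1 TB = 1000 GB); the function name `estimate_input_gb` is hypothetical. Under those assumptions, the 573-compound slice extrapolates to roughly 150 GB for 56,980 compounds and to about 13 TB for a 5-million-compound database.

```python
def estimate_input_gb(n_compounds, sample_compounds=573, sample_gb=1.5):
    """Extrapolate AMBER input-file size linearly from one measured slice."""
    return n_compounds * sample_gb / sample_compounds


# Three AMBER screens totaling 56,980 compounds.
amber_gb = estimate_input_gb(56_980)

# A database of 5 million compounds, converted to TB (decimal units assumed).
zinc_tb = estimate_input_gb(5_000_000) / 1000

# Compression example from the text: 620 MB reduced to 115 MB.
compression_ratio = 115 / 620

print(f"AMBER screens: about {amber_gb:.0f} GB")
print(f"5 million compounds: about {zinc_tb:.1f} TB")
print(f"Compressed size: about {compression_ratio:.0%} of the original")
```

The linear model is only a rough guide, since input-file size presumably varies with ligand size, but it is enough to show that disk capacity, rather than processor count, can become the limiting resource for larger databases.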
4. Conclusions
Virtual docking on the grid is an effective and efficient method to screen compound databases for biomedical purposes such as drug discovery. Our experiment has produced a list of potential SHP-2-specific inhibitors that are in the process of being validated in wet-bench experiments and will be used to study SHP-2 signaling pathways in cells. These compounds have potential clinical applications. However, further testing must be performed to verify that these compounds are membrane permeable (i.e., able to enter the cell) and are effective inhibitors of SHP-2 inside the cell. Although DOCK is an established program capable of delivering results, Figures 1 and 3 show that DOCK is not foolproof. It is ultimately a tool aimed at aiding scientific discovery, and its results require further experimental verification.

In the AMBER run, a small portion of the input ligands failed to prepare and caused a segmentation fault. The preparation generally fails because of missing force field parameters, charge issues, bond issues, or atom issues. Missing parameters can be recovered with the auxiliary program Antechamber. In the remaining three cases, the problem may lie in an improper structure of the ligand file. Because of the very small number of problem molecules, this was not considered a significant issue during the screening process. Moreover, a complete diagnosis of the problem would have required substantial knowledge of the workings of the AMBER auxiliary programs [19] and is beyond the scope of this project.

From Tables 7 and 8 we can see that all clusters contributed greatly to the virtual screening experiment, and the absence of even one of these clusters can significantly increase the virtual screening time. This highlights the importance of the collaborative nature of grid computing. Consideration of other grid users becomes an issue during routine virtual screening experiments, especially with regard to disk space requirements.
During the virtual screening experiment, the data cannot exist in compressed form since the AMBER input files must be read by DOCK. Additionally, the current scripts designed for the collection and ranking of the screening results are not compatible with compressed data, hence requiring the data to be uncompressed manually in order to locate the files of interest. During the course of these virtual screens, an updated version of the ZINC drug-like database was released with over 5 million compounds [1]. The input files alone would require an estimated 13 TB of total disk space. With the increasing size of chemical compound databases, other solutions should
be considered, such as the addition of more disk space to the grid environment. If multiple users are performing virtual screenings using the same chemical database, placing one set of DOCK input files on a cluster system that each user can access is another option, but security and access privileges would have to be addressed in this scenario.

Deployment of DOCK on a grid environment remains feasible, but still presents some challenges. The drug-like AMBER screen, for example, was expected to take 4 days to complete (2 days of docking and 2 days of input file generation). However, the screening required 11 days, with much of the time lost to restarting and continuing the screening due to cluster-specific grid resource errors. A Java error on the Rocks-52 cluster related to zombie processes caused the managing Perl script on the master cluster to stop and required the screening to be restarted manually. This is discussed in a companion paper [20]. There were also grid resource situations beyond the user's control. Cluster maintenance, for example, disconnected a cluster and affected the jobs running there.

Another notable aspect of grid resource availability was the high number of users. It was observed that some DOCK jobs were in the queue for many days before finally running or being cancelled. On these busy clusters, such as Rocks-52 and Cafe01, the minimum number of processors to be used in the virtual screen was lowered to take advantage of any processors that might become available. Since DOCK is an inherently processor-heavy application, busy clusters may not have enough free processors to allow screenings, as freed processors can be taken immediately by the more versatile single-processor jobs. This suggests the need for more advanced schedulers that take the relative queuing times of different jobs into consideration [21].
It can be expected that improvements will continue to be built on the current platform to further advance the role of the Grid and computer technology in the field of biomedical sciences.
5. Acknowledgments
The authors would like to acknowledge support from the UCSD Pacific Rim Experiences for Undergraduates program (PRIME NSF INT 0407508 and NSF OISE 0710726), the California Institute for Telecommunication and Information Technology (Calit2), and Osaka University's Fostering of Globally-leading Researchers in Integrated Sciences program funded by MEXT. We appreciate PRAGMA for the use of the Grid testbed and technical support. Molecular graphics images were produced using
the UCSF Chimera package from the Resource for Biocomputing, Visualization, and Informatics at the UC San Francisco (supported by NIH P41 RR-01081).
6. References
[1] J.J. Irwin and B.K. Shoichet, ZINC - A Free Database for Virtual Screening, 2006, http://zinc.docking.org/.
[2] PRAGMA Grid Resources, http://pragma-goc.rocksclusters.org/pragma-doc/resources.html.
[3] M.J. Levesque, K. Ichikawa, S. Date, and J.H. Haga, Bringing Flexibility to Virtual Screening for Enzymatic Inhibitors on the Grid, Grid 2008, In press.
[4] Protein Data Bank, http://www.rcsb.org/pdb/explore.do?structureId=3B7O.
[5] P.T. Lang and S. Brozell, Preparing Molecules for DOCKing, 2007, http://dock.compbio.ucsf.edu/DOCK_6/tutorials/struct_prep/prepping_molecules.htm.
[6] Z.Z. Chong and K. Maiese, The Src homology 2 domain tyrosine phosphatase SHP-1 and SHP-2: diversified control of cell growth, inflammation, and injury, Histol. Histopathol., Cellular and Molecular Biology, pp. 1-3.
[7] UCSF Chimera, http://www.cgl.ucsf.edu/chimera/.
[8] D. Barford and B.G. Neel, Revealing mechanisms for SH2 domain mediated regulation of the protein tyrosine phosphatase SHP-2, Structure, Current Biology Ltd, 1998, pp. 1.
[9] S. Makino and I.D. Kuntz, Automated flexible ligand docking method and its application for database search, J. Comp. Chem., 1997, pp. 1812-1825.
[10] H. Wheadon, N.R.D. Paling, and M.J. Welham, Molecular interactions of SHP1 and SHP2 in IL-3 signalling, Cell. Signalling, Elsevier, 2002, pp. 1.
[11] N.K. Tonks, Protein tyrosine phosphatase: from genes, to function, to disease, Mol. Cell Biol., Nature Publishing Group, 2006, pp. 1-11.
[12] W.L. Jorgensen et al., The Many Roles of Computation in Drug Discovery, Science, AAAS, Washington DC, 2004, pp. 3.
[13] M. Stein-Gerlach, C. Wallasch, and A. Ullrich, SHP-2, SH2-containing protein tyrosine phosphatase-2, The International Journal of Biochemistry & Cell Biology, Pergamon, 1998, pp. 1.
[14] P.T. Lang, D. Moustakas et al., DOCK 6.1 Users Manual, 2007, http://dock.compbio.ucsf.edu/DOCK_6/dock6_manual.htm.
[15] UCSF DOCK, http://dock.compbio.ucsd.edu/.
[16] P.T. Lang, D. Moustakas et al., DOCK 6.1 Users Manual, 2007, http://dock.compbio.ucsf.edu/DOCK_6/tutorials/ligand_sampling_dock/ligand_sampling_dock.html.
[17] P.T. Lang, D. Moustakas et al., DOCK 6.1 Users Manual, 2007, http://dock.compbio.ucsf.edu/DOCK_6/tutorials/grid_generation/generating_grid.html.
[18] M.J. Levesque, K. Ichikawa, S. Date, and J.H. Haga, Design of a Grid Service-based Platform for In Silico Protein-Ligand Screenings, Comp. Meth. Prog. Biomed., 2008, doi:10.1016/j.cmpb.2008.07.005.
[19] Personal communication, Scott Brozell, Scripps Research Institute, 2008.
[20] P.D. Pham, M.J. Levesque, K. Ichikawa, S. Date, and J.H. Haga, Identification of a Specific Inhibitor for the Dual-Specificity Enzyme SSH-2 via Docking Experiments on the Grid, 4th IEEE International Conference on eScience, 2008, In press.
[21] Personal communication, Blair Bethwaite, 2008.