/ Accelrys
PP-Fun_Lec - 1
Outline
onlyTheBest
[Figure: outline diagram. Labels: Clean, Suspect Data, Real-Time Calculations, Multiple Data Sources, Data Reduction, Write Output]
Sequence object
What is a Component?
A component is the building block used to create workflows
Each component performs a task like reading, writing or manipulating data
Components can have one input and up to two output ports
Highlighting a component displays its parameter panel, which controls its behavior
What is a Pipeline?
A series of components connected by pipes through which data flows.
Each component acts on the data and passes it on to subsequent components.
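The idea of records flowing through connected components can be sketched in Python with chained generators. This is only an illustration of the concept, not Pipeline Pilot code; the function names (read_records, add_length, collect_rows) and the Length property are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, object]

def read_records(sequences: Iterable[str]) -> Iterator[Record]:
    """Reader stand-in: turn each input string into one data record."""
    for seq in sequences:
        yield {"Sequence": seq}

def add_length(records: Iterable[Record]) -> Iterator[Record]:
    """Calculator stand-in: add a Length property and pass the record on."""
    for rec in records:
        rec["Length"] = len(str(rec["Sequence"]))
        yield rec

def collect_rows(records: Iterable[Record]) -> List[Record]:
    """Viewer stand-in: gather the records as table rows."""
    return list(records)

# Connect the components: each record travels through all of them in turn.
rows = collect_rows(add_length(read_records(["TGGTTACA", "CAAG"])))
```

Because generators are lazy, each record moves through the whole chain before the next one is read, which mirrors the record-at-a-time flow described above.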
[Figure: example pipeline. Seq File Reader, then Molecular Weight (calculates MW and adds it to the record), then Notepad Viewer (displays records as rows in Notepad)]
What is a Protocol?
A protocol consists of one or multiple pipelines that are run sequentially.
Components are executed left to right, top to bottom.
Pipeline 2 will not start processing data until pipeline 1 has completely finished processing all the records.
[Figure: nucleotide sequence records (headers such as >gi|2695850|emb, >gi|2695846|emb, >gi|2695852|emb) passing through a MolW component]
[Figure: client window. Labeled areas: Favorites, Pipeline, Explorer Window, Component, Protocol, Task bar, Status bar, Parameter Panel, Help Text]
Finding Information in Protocols
Scale drop-down
Zoom by Navigating
Google-like searching
Quick access to glossary
Quick access to reference help (e.g. PilotScript)
Explorer Window
Consists of four tabs
User, e.g. Joe
    Contains all protocols and components for an individual user
Protocols
    > 900 realistic examples provided by SciTegic
    To be used as-is or as the basis for more complex protocols
    Shared by all users
Components
    > 1600 different components provided by SciTegic
    Grouped in distinct categories
    Shared by all users
All
    Everything from the User, Protocols and Components tabs
Client-Server Architecture
Server runs protocols
Server accesses databases and files
Visualization apps open temporary files on server
Limited transfer of data between client and server
Files that are not on the server need a UNC path and must be made sharable
[Figure: client-server diagram. Labels: Excel, Client, UNC, File server, ODBC, Database server, Recommendation]
Executing a Protocol
Run
    Start execution (F5)
Stop
    Interrupt execution
Errors
    Flag indicates where execution stopped
    Last error message will be available until the protocol is edited
Handling Components
Components can be dragged into a pipeline
A component will try to connect when placed to the right of an existing component
Handling Connections
Connections can be made by dragging from a fail or pass port of component 1 to the input port of component 2
Connections can be deleted by selecting them and pressing Delete
Double-clicking a connection toggles between pass and fail
Multiple inputs/outputs are allowed (branching)
If a Pipe is selected
Anatomy of a Component
Parameter Grouping
A Group is a way to organize parameters into categories
Parameters in a group behave as standard parameters
The parameter that is the group heading can either accept a value or not, depending on the parameter type (GroupType accepts no value)
Documenting a Protocol
1. Editable captions for each component (highly recommended)
2. Sticky Notes
3. Documentation for individual components, accessed by right-clicking on the component and choosing Edit
    Purpose
    Description
Sequential Execution
Non-connected pipelines are executed one after another
Last record in pipeline 1 will be written to the output file before the first record is read in pipeline 2
When records are read in, they move as far to the right as possible
Data can be shared between pipelines using either files or global variables
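A minimal Python sketch of sequential execution with a shared global value, assuming a plain dictionary as a stand-in for global variables. The names global_vars, pipeline_1, pipeline_2 and run_protocol are hypothetical and do not correspond to any Pipeline Pilot API.

```python
from typing import Dict, List

global_vars: Dict[str, int] = {}  # stand-in for protocol-level global variables
log: List[str] = []               # records the order in which records are processed

def pipeline_1(records: List[int]) -> None:
    """First pipeline: process every record, then publish a global value."""
    total = 0
    for value in records:
        log.append(f"p1:{value}")
        total += value
    global_vars["Total"] = total

def pipeline_2(records: List[int]) -> List[float]:
    """Second pipeline: starts only after pipeline_1 has returned."""
    total = global_vars["Total"]
    fractions = []
    for value in records:
        log.append(f"p2:{value}")
        fractions.append(value / total)
    return fractions

def run_protocol(records: List[int]) -> List[float]:
    """Run the two non-connected pipelines one after another."""
    pipeline_1(records)
    return pipeline_2(records)

fractions = run_protocol([1, 3])
```

The log shows that every record of the first pipeline is handled before the second pipeline reads its first record, which is what makes the shared total safe to use.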
File Browse
Server Defined Shortcuts (additional shortcuts can be defined by the administrator)
Component Disabling
Right-click on the component and choose Disable from the menu
Options
    Pass Data
    Fail Data
    Halt Pipeline
Use the F10 key to toggle between options
Exercise 1: Introduction
A. Find the example Aligning Sequences and run it
B. Use a FASTA reader to read in O43291.fa. Calculate the sequence molecular weight and display in an HTML table viewer. Place a checkpoint on one of the components.
C. Read the first 100 records of the NRDB_nucleotide_10K.fa file and display the results in an HTML table viewer.
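In the exercise, limiting the read to the first 100 records is done with the reader component itself. For readers who want to see the underlying idea, here is a hedged Python sketch of a small FASTA parser that stops after a given number of records. The function names and the in-memory sample data are hypothetical.

```python
import io
from itertools import islice
from typing import Dict, Iterator, List, TextIO

def read_fasta(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one record per FASTA entry: header line plus joined sequence lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield {"Name": header, "Sequence": "".join(chunks)}
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield {"Name": header, "Sequence": "".join(chunks)}

def first_records(handle: TextIO, limit: int) -> List[Dict[str, str]]:
    """Read at most `limit` records; the rest of the file is never parsed."""
    return list(islice(read_fasta(handle), limit))

# Made-up sample data standing in for a real .fa file.
example = io.StringIO(">seq1\nTGGT\nTACA\n>seq2\nCAAG\n>seq3\nTTTC\n")
records = first_records(example, 2)
```

For a file on disk, the same call would be `first_records(open(path), 100)`, with `path` pointing at the FASTA file.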
Component Collections
Components are organized in folders based on their component collection.
Component Types
Within each collection, components are organized in folders based on functionality:
Reading
Writing
Viewing
Filtering
Manipulating
Calculating
Etc.
Data Reading
Generic file readers include the Delimited Text Reader, Excel Reader, XML Reader and HTML Reader
File readers are available for most popular molecular and sequence formats: SD, MOL2, SMILES, PDB, FASTA, etc.
Database readers can read from any database format via ODBC
File readers support zip files, multiple files (use Ctrl-click to select) and wildcards (users\myname\*.txt).
Data Writing
Generic file writers include the Delimited Text Writer, XML Writer and HTML Writer
File writers are available for most popular molecular and sequence formats: SD, MOL2, SMILES, PDB, FASTA
Database writers can write to any database type that supports ODBC connections
Data Viewing
Viewers run on the client, and third-party applications need to be installed on the client machine: Excel, Internet Explorer, Spotfire
Charting viewers are available using Excel and/or the Reporting Collection
Dialogs can be used to ask for user input at run time
Reading
    EMBL
    FASTA
    GCG
    GenBank
    Swiss-Prot
    Online Fetching (from NCBI)
Similarity search
    Local BLAST search
    Online BLAST search (at NCBI)
    Prepare BLAST database (formatdb)
    Smith-Waterman
    HMMER
Pattern matching
    PROSITE
    Regular expression
    GC rich regions
    Open reading frames (getorf)
    Masking (seg, xnu)
    Signal peptide sites
    Restriction enzyme sites
    Proteolytic sites (digest)
Alignment
    Multiple Sequence Alignment (ClustalW)
    Pairwise local alignment (water)
Utilities
    GC Content
    Molecular Weight
    Isoelectric Point
Protein structure
    Secondary structure prediction (garnier)
    Predict transmembrane proteins (transmem)
Manipulating
    Reverse complement
    Transcribe
    Simple translate
    Six-frame translate
    Back translate
    Generate subsequence fragments
Viewers
    Sequence
    Alignments: JalView, plain text, or custom report
    BLAST results
Oligos
    siRNA duplexes
    Find & match primers
Applications
Note: All the above listed third-party tools are included as part of the Sequence Analysis component collection. You do not need to install any additional software. For EMBOSS, BioPerl, and BioJava, only parts are exposed in Pipeline Pilot, but the entire suites are included.
Copyright 2008, Accelrys Software Inc. All rights reserved.
Data Records
A data record is the smallest unit of data flowing through a pipeline.
It is a collection of data properties and may include a chemical structure (molecule).
For simplicity, it can be thought of as a row in a table.
In general, Pipeline Pilot components process 1 data record at a time.
Data Properties
A data property is an attribute of a data record.
It consists of a property name and a property value.
It is preferred that property names contain only alphanumeric characters and underscores.
Data property values can be numbers, strings, Booleans, molecular fingerprints and arrays.
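One way to picture a data record is as a dictionary of named properties. The sketch below is an assumption made for illustration, not the product's internal representation; it also checks names against the preferred convention of alphanumeric characters and underscores. The record contents and the helper name is_preferred_name are hypothetical.

```python
import re
from typing import Any, Dict

# Preferred names: only letters, digits and underscores.
PREFERRED_NAME = re.compile(r"[A-Za-z0-9_]+")

def is_preferred_name(name: str) -> bool:
    """Return True if the whole property name follows the preferred convention."""
    return PREFERRED_NAME.fullmatch(name) is not None

# A made-up record holding a string, a number, a Boolean and an array.
record: Dict[str, Any] = {
    "Name": "seq1",
    "Length": 8,
    "Is_Nucleotide": True,
    "Window_Scores": [0.5, 0.75],
}

non_preferred = [name for name in record if not is_preferred_name(name)]
```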
Sorting Records
Sort Data
Sorts the data records based on the value found in the data property specified in the parameter panel
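As a rough Python analogy (not the component's actual implementation), sorting records by one property looks like this. The function name sort_records and the descending flag are hypothetical.

```python
from typing import Any, Dict, List

def sort_records(records: List[Dict[str, Any]], prop: str,
                 descending: bool = False) -> List[Dict[str, Any]]:
    """Return the records ordered by the value of one property."""
    return sorted(records, key=lambda rec: rec[prop], reverse=descending)

records = [
    {"Name": "a", "Length": 12},
    {"Name": "b", "Length": 4},
    {"Name": "c", "Length": 9},
]
by_length = sort_records(records, "Length")
```

Sorting needs every record before it can emit the first one, so unlike most components it cannot work one record at a time.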
Tag Data
Tag Data
Creates a new data property for each incoming record based on the value entered into the parameter panel
Value for each new data property is set to true
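The tagging behavior can be sketched in Python as follows. This is a simplified illustration; the function name tag_records and the tag name used in the example are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator

def tag_records(records: Iterable[Dict[str, Any]],
                tag_name: str) -> Iterator[Dict[str, Any]]:
    """Add a property called tag_name, set to True, to each incoming record."""
    for rec in records:
        tagged = dict(rec)      # copy so the input record is left untouched
        tagged[tag_name] = True
        yield tagged

tagged = list(tag_records([{"Name": "a"}, {"Name": "b"}], "From_Rat_Set"))
```

A tag like this is handy for remembering which branch or source a record came from after streams are merged.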
Basic Math
Perform single or multiple property math
Property2 only required if the operation requires two properties
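A hedged Python sketch of single-property versus two-property math. The operation names and the function basic_math are hypothetical and are not the component's actual parameter values.

```python
import math
import operator
from typing import Any, Dict, Optional

ONE_PROPERTY_OPS = {"negate": operator.neg, "abs": abs, "sqrt": math.sqrt}
TWO_PROPERTY_OPS = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
    "divide": operator.truediv,
}

def basic_math(record: Dict[str, Any], result: str, op: str,
               property1: str, property2: Optional[str] = None) -> Dict[str, Any]:
    """Store op(property1[, property2]) on a copy of the record under `result`."""
    out = dict(record)
    if op in ONE_PROPERTY_OPS:
        out[result] = ONE_PROPERTY_OPS[op](record[property1])
    elif op in TWO_PROPERTY_OPS:
        if property2 is None:
            raise ValueError(f"operation '{op}' requires a second property")
        out[result] = TWO_PROPERTY_OPS[op](record[property1], record[property2])
    else:
        raise ValueError(f"unknown operation '{op}'")
    return out

ratio = basic_math({"A": 9.0, "B": 3.0}, "Ratio", "divide", "A", "B")
root = basic_math({"A": 9.0}, "Root", "sqrt", "A")
```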
Statistic Components
Replace MultiValue Stats and Moving Average.
Output results as a summary or on the original data
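The two output styles can be contrasted in a short Python sketch: one summary record, or the statistic attached to every original record. The function names and the "_Mean" suffix are hypothetical choices made for this illustration.

```python
from statistics import mean
from typing import Any, Dict, List

def summarize(records: List[Dict[str, Any]], prop: str) -> Dict[str, Any]:
    """Summary output: a single record describing the whole data set."""
    values = [rec[prop] for rec in records]
    return {"Property": prop, "Count": len(values), "Mean": mean(values)}

def annotate_with_mean(records: List[Dict[str, Any]], prop: str) -> List[Dict[str, Any]]:
    """Original-data output: every record keeps its data and gains the mean."""
    avg = mean(rec[prop] for rec in records)
    return [{**rec, prop + "_Mean": avg} for rec in records]

data = [{"Score": 1.0}, {"Score": 2.0}, {"Score": 6.0}]
summary = summarize(data, "Score")
annotated = annotate_with_mean(data, "Score")
```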
Property Filters
Property Value Threshold Filter
This component allows a user to specify a property, a threshold value and a condition to filter each data record
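A minimal Python sketch of a threshold filter with pass and fail outputs, assuming the condition is given as a comparison symbol. The function name threshold_filter and the CONDITIONS table are hypothetical.

```python
import operator
from typing import Any, Dict, List, Tuple

CONDITIONS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}

Records = List[Dict[str, Any]]

def threshold_filter(records: Records, prop: str, condition: str,
                     threshold: float) -> Tuple[Records, Records]:
    """Split records into (passed, failed) by comparing one property to a threshold."""
    test = CONDITIONS[condition]
    passed, failed = [], []
    for rec in records:
        (passed if test(rec[prop], threshold) else failed).append(rec)
    return passed, failed

passed, failed = threshold_filter(
    [{"MW": 250.0}, {"MW": 610.5}, {"MW": 500.0}], "MW", "<=", 500.0
)
```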
Why SubProtocols?
Encapsulate complex logic (one or more components)
Implemented with many components
Represented as a single component
Creating a subprotocol
[Figure: six numbered steps for creating a subprotocol]
Parameter Promotion
Promoted parameter becomes a parameter of the subprotocol
Parameter can be renamed if necessary
From the Promote tab of the Edit dialog:
1. Navigate to the component of interest (Prev / Next buttons)
2. Highlight the parameter of interest and click Promote
3. Parameter is now exposed in the component parameter list of the subprotocol using a string token
[Figure: subprotocol splitting 2500 input records into Pass and Fail streams of 1518 and 982 records]
Subprotocol Utilities
Component: Description
Use this component in a subprotocol to direct records coming out of a Pass port of a component to the Fail port of the subprotocol.
Use this component in a subprotocol to direct records coming out of a Fail port of a component to the Pass port of the subprotocol.
Use this component in a subprotocol to keep records coming out of an internal component from being passed out of the subprotocol. (You can also turn off output ports to achieve this effect.)
No-Op: Use this component to pass all incoming records to the Pass port. This is useful inside subprotocols to capture the point of input. For example, to run an initialization pipeline before accepting input, use the No-Op component as the first component on the second pipeline.
Subprotocol: Use this component to define a subprotocol in a pipeline. When you are creating a pipeline that requires a subprotocol as a component, you can drag this subprotocol component into the pipeline, open it, and add components into it.
Example
Complex filter (HTS Filter)
Records from pass/fail ports exit the subprotocol through its pass/fail ports
Pass streams can be turned into fail streams and vice versa
Output ports can be terminated
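The encapsulation idea can be sketched in Python as one function that hides several internal steps and exposes a single pair of pass/fail outputs. This is an analogy only; length_filter, swap_ports and short_sequence_subprotocol are hypothetical names, and the filter criterion is made up.

```python
from typing import Any, Dict, List, Tuple

Records = List[Dict[str, Any]]

def length_filter(records: Records, min_length: int) -> Tuple[Records, Records]:
    """Internal component: pass records whose sequence is long enough."""
    passed = [r for r in records if len(r["Sequence"]) >= min_length]
    failed = [r for r in records if len(r["Sequence"]) < min_length]
    return passed, failed

def swap_ports(passed: Records, failed: Records) -> Tuple[Records, Records]:
    """Turn the pass stream into the fail stream and vice versa."""
    return failed, passed

def short_sequence_subprotocol(records: Records, min_length: int) -> Tuple[Records, Records]:
    """Looks like one component from outside; internally filters, then swaps ports."""
    passed, failed = length_filter(records, min_length)
    return swap_ports(passed, failed)

short, long_enough = short_sequence_subprotocol(
    [{"Sequence": "TGGTTACA"}, {"Sequence": "CAAG"}], min_length=6
)
```

Terminating an output port would correspond to simply discarding one of the two returned lists.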
Webport
Running Webport Protocols
Webport
Single sign-on
Log-in once
http://localhost:9944/webport/main.htm?protocol=Protocols/Collect data
Jobs window
Performance improvements
Library Tab
The Library tab is displayed when you log in.
Select the protocol you would like to run. The protocols are now in a tree view to make it easy to see all available protocols at once.
Protocol Tab
The Protocol tab is where you set the parameters for the protocol and where you can see the displayed result files (each file will have a tab on the right side of the screen).
File browsing is greatly improved. The dialog is similar to the dialog in the client.
Jobs Tab
The Jobs list is now sortable by clicking on any column header. Click the column header again to sort in the opposite direction.
Webport
Creating Webport Protocols
Use Writers
Writers (no Viewers)
Write to the $(runDirectory), $(jobDir) or $(userDir)
No dialogs or pop-ups
Promote Parameters
Promote parameters that you want visible in Webport
Component Parameters
Accelrys Community
accelrys.org
MSC Support
msc-support@molsci.com.tw
02-27132977
Outline
Sequence Analysis Collection Components and Protocols
Data Record Structure
SAC Example Protocols
PP-SAC_Lec - 1
SAC Readers
Reader components are available for sequences, alignments and profiles.
Many popular formats are supported.
There are also Generic readers for sequences and alignments, which infer the format from the file extension.
Users can specify the number of records to read.
Online sequence Fetchers allow access to databases even if local copies do not exist.
For sequence formats that include features, the user can choose not to read features.
SAC Writers
Writers are available for sequences, alignments and profiles.
Many popular formats are supported.
There are also Generic writers for sequences and alignments.
Users can specify the number of records to write.
Reading/Writing Example
Converting a GenBank sequence to FASTA format.
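Outside Pipeline Pilot, the same conversion can be approximated with a few lines of Python. The sketch below is deliberately minimal and rests on assumptions: it reads only the LOCUS name, a single-line DEFINITION and the ORIGIN block of a GenBank flat file, and ignores features and multi-line fields. The function name genbank_to_fasta and the sample record are made up for illustration.

```python
import io
from typing import List, TextIO

def genbank_to_fasta(handle: TextIO, width: int = 60) -> str:
    """Convert GenBank flat-file records to FASTA text (name, definition, sequence only)."""
    name, definition, seq_parts, in_origin = "", "", [], False
    out: List[str] = []
    for line in handle:
        if line.startswith("LOCUS"):
            name = line.split()[1]
        elif line.startswith("DEFINITION"):
            definition = line[len("DEFINITION"):].strip()
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            # End of record: emit the header and the wrapped sequence.
            sequence = "".join(seq_parts).upper()
            out.append(f">{name} {definition}".rstrip())
            out.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
            name, definition, seq_parts, in_origin = "", "", [], False
        elif in_origin:
            # Sequence lines carry position numbers and spaces; keep letters only.
            seq_parts.append("".join(ch for ch in line if ch.isalpha()))
    return "".join(text + "\n" for text in out)

# Made-up GenBank record used only to exercise the function.
sample = io.StringIO(
    "LOCUS       TEST0001  20 bp  DNA\n"
    "DEFINITION  Example record.\n"
    "ORIGIN\n"
    "        1 tggttacaac actttcttct\n"
    "//\n"
)
fasta_text = genbank_to_fasta(sample, width=10)
```

For real data a maintained parser is a safer choice than this sketch, since GenBank records contain many fields that are skipped here.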
SAC Viewers
Several options for viewing sequences, alignments and HMM (text, Java, HTML, PDF)
Similarity Search Viewer output is a PDF
FASTA Entry Prompt
Similarity Search Table Viewer
Reporting plotting tools are included in the Viewers
Viewer Examples
Artemis Viewer
Sequence Viewer
JalView
Aligning Example
Aligning sequences and viewing the alignment using JalView or JalView 2
SAC Manipulators
Manipulator component functions include:
Extracting sequence features
Creating sequence fragments
Producing open reading frames
Translate/transcribe
And more!
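A few of these manipulations are simple enough to sketch in plain Python. The code below illustrates the operations themselves, not the collection's components; the function names are hypothetical, and only unambiguous DNA letters (A, C, G, T) are handled.

```python
from typing import List

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(dna: str) -> str:
    """Complement every base, then reverse the strand."""
    return dna.translate(COMPLEMENT)[::-1]

def transcribe(dna: str) -> str:
    """Transcribe a DNA coding strand to RNA by replacing T with U."""
    return dna.replace("T", "U").replace("t", "u")

def fragments(sequence: str, size: int, step: int) -> List[str]:
    """Generate fixed-size subsequence fragments using a sliding window."""
    return [sequence[i:i + size] for i in range(0, len(sequence) - size + 1, step)]

rc = reverse_complement("TGGTTACA")
rna = transcribe("TGGTTACA")
windows = fragments("TGGTTACA", size=4, step=2)
```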
SAC Annotators
Annotator components allow sequence annotation and pattern searching
Matches are added as features to the sequence data record
Act as filters: sequences without the feature of interest are passed out the fail port
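The annotate-and-filter behavior can be sketched in Python with a regular expression search. This is an illustration under assumptions: the record layout (a Features list of dictionaries with 1-based Start and End) and the function name annotate_pattern are hypothetical. The example pattern GAATTC is the EcoRI recognition site.

```python
import re
from typing import Any, Dict, Optional, Tuple

Record = Dict[str, Any]

def annotate_pattern(record: Record, pattern: str,
                     feature_name: str) -> Tuple[Optional[Record], Optional[Record]]:
    """Return (pass_record, fail_record); exactly one of the two is not None."""
    matches = [
        {"Name": feature_name, "Start": m.start() + 1, "End": m.end()}
        for m in re.finditer(pattern, record["Sequence"])
    ]
    if not matches:
        return None, record  # no feature of interest: goes out the fail port
    annotated = dict(record)
    annotated["Features"] = list(record.get("Features", [])) + matches
    return annotated, None

hit, _ = annotate_pattern({"Sequence": "TGGTTACAGAATTCAA"}, "GAATTC", "EcoRI_site")
_, miss = annotate_pattern({"Sequence": "TGGT"}, "GAATTC", "EcoRI_site")
```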
Annotation Example
Annotation of a protein sequence using a variety of components. This protocol uses BioPerl, BioJava, PROSITE, and EMBOSS.
[Figure: annotation protocol with components labeled BioPerl, BioJava, BioPerl, BioPerl, PROSITE, EMBOSS]
SAC Calculators
Property calculators add a sequence's physical properties to the data stream.
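GC content is one of the simplest calculated properties and makes a compact Python example. The sketch assumes GC content is the fraction of G and C letters in the sequence; the function names and the GC_Content property name are hypothetical.

```python
from typing import Any, Dict

def gc_content(sequence: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def add_gc_content(record: Dict[str, Any]) -> Dict[str, Any]:
    """Calculator stand-in: return a copy of the record with GC_Content added."""
    return {**record, "GC_Content": gc_content(record["Sequence"])}

with_gc = add_gc_content({"Name": "seq1", "Sequence": "TGGTTACA"})
```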
Plots
Charge
Hydrophobic Moment
GC Content
Plots
These plots require the Reporting Collection.
These plots can be embedded in a report similar to other reporting components.
You cannot link reporting plots sequentially in a pipeline; however, you can combine them using other Reporting Elements (e.g., Tile Horizontal).
AnnotationGroup (no properties)
[Figure: data record structure. Annotation Group with Annotation[1,2,3n] (alignment start and end); FeatureGroup with Feature[1,2,3n] (alignment gaps)]
ClustalW
HMMER
BLAST
Smith-Waterman
Muscle
Sim4
2. Use BLASTp to search O43291.fa against the non-EGF-related BLAST database (created in #1). Display the results in the Similarity Search Viewer.
BLAST Search
Data Structure
MAQLCGLRRSRAFLALLGSLLLSGVLAADRERSIHDFCLVSKVVGRCRASMPRWWYNVTDGSC
QLFVYGGCDGNSNNYLTKEECLKKCATVTENATGDLATSRNAADSSVPSAPRRQDSEDHSSDMF
NYEEYCTANAVTGPCRASFPRWYFDVERNSCNNFIYGGCRGNKNSYRSEEACMLRCFRQQENPP
LPLGSKVVVLAGLFVMVLILFLGASMVYLIRVARRNQERALRTVWSSGDDKEQLVKNTYVL
[Figure: sequence data record structure. Labels: Position, SEQUENCE, ANNOTATION GROUP, ANNOTATION, FEATURE GROUP, FEATURE, SEQUENCE GROUP, SEQUENCE(n), SEQUENCE/HMM GROUP, SEQUENCE/HMM(n), HIGH SCORING PAIR GROUP, HIGH SCORING PAIR. Operations shown: READ SwissProt/FASTA SEQUENCE, STRUCTURE PREDICTION, SIMILARITY SEARCH, ALIGN SEQUENCES]
Data Structure
[Figure: the same example sequence and record structure (SEQUENCE, SEQUENCE GROUP, SEQUENCE(n), SEQUENCE/HMM GROUP, SEQUENCE/HMM(n), HIGH SCORING PAIR). Operations shown: EXTRACT, DELETE and REASSEMBLE for ALIGNMENTS, SIMILARITY SEARCH HITS and SIMILARITY SEARCH RESULTS]
[Figure: rat sequences and human sequences are compared using BLAST, then filtered and scored into High Match, Medium Match and Low Match]
[Figure: siRNA workflow. Labels: Target cDNA, target gene region, Generate siRNA predictions, siRNA regions, Search sites against genomic sequence, Genomic DB, Identify and filter, siRNA site locations, Off-target siRNA regions, On Target sites]
DAB Subprotocol:
http://www.genome.jp/kegg/
GENE
COMPOUND
tynA
Histamine Oxidase