
Copyright 2008, Accelrys Software Inc. All rights reserved.

Biological Sequence Analysis Using Accelrys Pipeline Pilot
Pei-Li Li

/ Accelrys

Pipeline Pilot Training Course


Fundamentals of Pipeline Pilot

Outline

Introduction and Overview of Pipeline Pilot


Components and Protocols
General Data Manipulation and Filtering
Introduction to Subprotocol
Introduction to Web Port Interface
Introduction to Sequence Analysis Collection
Q&A


Fundamentals of Pipeline Pilot


Overview

The Power of Pipeline Pilot

Clean Suspect Data
Real-Time Calculations
Multiple Data Sources
Data Reduction
Write Output


What is a data record?


A data record is the smallest data unit in Pipeline Pilot.
It consists of a hierarchical structure of property name-value pairs and may include a molecule object.
[Figure: a data record, with callouts for its data properties and its sequence object]
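As a rough mental model only (not Pipeline Pilot's internal representation), a data record can be pictured as a nested mapping of property names to values. The property names and values below are invented for illustration.

```python
# Hypothetical sketch: a data record as hierarchical name-value pairs.
record = {
    "Name": "example_seq",
    "Length": 6,
    "Sequence": {              # stand-in for the attached sequence object
        "Data": "TGGTTA",
        "Type": "nucleotide",
    },
}


def flatten(rec, prefix=""):
    """Yield (dotted_name, value) pairs for every leaf property."""
    for name, value in rec.items():
        path = f"{prefix}{name}"
        if isinstance(value, dict):
            yield from flatten(value, prefix=path + ".")
        else:
            yield path, value


properties = dict(flatten(record))
```

Flattening the hierarchy this way gives the "row in a table" view of the same record.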

What is a Component?
A component is the building block used to create workflows.
Each component performs a task such as reading, writing, or manipulating data.
Components can have one input port and up to two output ports.
Highlighting a component displays its parameter panel, which controls its behavior.


What is a Pipeline?
A series of components connected through pipes
through which data flows.
Each component acts on the data and passes it on to
subsequent components.


Data Flow in a Pipeline


File Reader: reads data records from a flat file
Molecular Weight: calculates MW and adds it to each record
Notepad Viewer: displays records as rows in Notepad
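The reader, calculator, and viewer chain can be sketched with Python generators, where each stage handles one record at a time and hands it downstream. This is an analogy, not how Pipeline Pilot is implemented; the function names are hypothetical, and sequence length stands in for the molecular-weight calculation.

```python
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, object]


def file_reader(lines: Iterable[str]) -> Iterator[Record]:
    """Reader stand-in: turn 'ID SEQUENCE' text lines into records."""
    for line in lines:
        name, seq = line.split()
        yield {"Name": name, "Sequence": seq}


def add_length(records: Iterable[Record]) -> Iterator[Record]:
    """Calculator stand-in: add one property to each record and pass it on."""
    for rec in records:
        rec["Length"] = len(rec["Sequence"])
        yield rec


def text_viewer(records: Iterable[Record]) -> List[str]:
    """Viewer stand-in: render each record as one tab-separated row."""
    return ["\t".join(str(v) for v in rec.values()) for rec in records]


# Shortened placeholder input lines, one record per line.
rows = text_viewer(add_length(file_reader(["Y13255 TGGTTA", "Y13260 TCTGCT"])))
```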


What is a Protocol?
A protocol consists of one or
multiple pipelines that are
run sequentially.
Components are executed
left to right, top to bottom.
Pipeline 2 will not start
processing data until
pipeline 1 has completely
finished processing all the
records.

[Figure: FASTA entries (>gi|2695850|emb ...) shown as records with Seq and MolW columns, e.g. Y13255, TGGTTA..., 187216.8]

Pipeline Pilot Main Window


Additional tools
Search Bar

Favorites
Pipeline

Explorer Window

Component

Protocol
Task bar

Status bar
Parameter Panel

Help Text

Menus and Toolbars

Finding Information in
Protocols

Scale drop-down

Zoom by Navigating

Protocol Task Bar


Quick access to open protocols

Quick access to subprotocols


SciTegic Help Center


Single point of access for users, administrators, and developers, with customized pages for each type of user
Google-like searching
Quick access to the glossary
Quick access to reference help (e.g. PilotScript)


Explorer Window
Consists of four tabs
User, e.g. Joe
Contains all protocols and components
for an individual user

Protocols
> 900 realistic examples provided by SciTegic
To be used as-is or as the basis for more complex
protocols
Shared by all users

Components
> 1600 different components provided by SciTegic
Grouped in distinct categories
Shared by all users

All
Everything from User, Protocols and Components
tabs

Client-Server Architecture
Server runs protocols
Server accesses databases and files
Visualization apps open temporary files on server
Limited transfer of data between client and server
Files that are not on the server need:
UNC path
To be made sharable
Recommendation: create a sharable folder on the client
[Figure: client (Excel), file server (UNC), database server (ODBC), and Pipeline Pilot server]


Fundamentals of Pipeline Pilot


Developing a Protocol

Running an Existing Protocol


Examples are available in the Protocols tab
To run an example protocol:
Double-click a protocol in the Explorer window
Execute by clicking the green Run button


Executing a Protocol
Run
Start execution (F5)

Stop
Interrupt execution

Errors
Flag indicates where
execution stopped
Last error message will be
available until the protocol
is edited


Creating a New Protocol


1. File/New (Ctrl-N)
2. Add component(s)
3. Connect components to form one or multiple pipelines
4. Run


Handling Components
Components can be dragged into a pipeline
A dragged component will try to connect when placed to the right of an existing component
Red components have one or more required parameters that need to be set
Components can be inserted into or appended to a pipeline
Double-clicking a component in the hierarchy window will automatically position and connect it
Right-mouse drag replaces components


Handling Connections
Connections can be made by dragging from the pass or fail port of component 1 to the input port of component 2
Connections can be deleted by selecting them and pressing Delete
Double-clicking a connection toggles between pass and fail
Multiple inputs/outputs are allowed (branching)


Reusing Component Information in Protocols


Using Ctrl + V:
If a Component is selected
If a Pipe is selected
If multiple Pipes are selected

Anatomy of a Component

Highlighting a component displays its parameter panel


Required parameters shown in red
Optional parameters shown in black
Parameter Groups can be expanded/contracted by
clicking on the + or - icon


Parameter Grouping
A Group is a way to organize parameters into categories
Parameters in a group behave as standard parameters
The parameter that is the group heading may or may not accept a value, depending on the parameter type (GroupType accepts no value)


Documenting a Protocol
1. Editable captions for each component (highly recommended)
2. Sticky Notes
3. Documentation for individual components, accessed by right-clicking on the component and choosing Edit
Purpose
Description

Component Help Text


Purpose
One-line component description displayed as fly-over help
Description
Paragraph describing usage of the component and its data streams (input, pass, and fail)

Help text is accessed by right-clicking on a component and choosing the Edit option
Help text can be changed and saved. Native components must be renamed when saved


Sequential Execution
Non-connected pipelines are executed one after another
Last record in pipeline 1 will be written to the output file before
the first record is read in pipeline 2
When records are read in, they move as far to the right as possible
Data can be shared between pipelines using either files or global
variables
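The ordering rule can be illustrated with a small Python sketch: the first pipeline runs to completion and leaves a value behind, and only then does the second pipeline start and read it. The function names and the `shared` dictionary (standing in for a global variable) are hypothetical.

```python
events = []   # log of what happened, in order
shared = {}   # stand-in for a global variable shared between pipelines


def pipeline_1(records):
    total = 0
    for rec in records:
        events.append(f"p1 processed {rec}")
        total += rec
    shared["Total"] = total            # available once pipeline 1 is done


def pipeline_2(records):
    for rec in records:
        events.append(f"p2 processed {rec}")
        yield rec / shared["Total"]    # uses the value left by pipeline 1


def run_protocol():
    pipeline_1([1, 3])                 # runs to completion first
    return list(pipeline_2([1, 3]))    # only then does pipeline 2 start


fractions = run_protocol()
```

A value such as a total cannot be known until every record has been seen, which is why it has to be computed in an earlier pipeline.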


File Browse

User Defined Shortcuts (these can be renamed by user)
Server Defined Shortcuts (additional shortcuts can be defined by the administrator)
Collection Example Data


Component Disabling
Right-click on a component and choose Disable from the menu
Options:
Pass Data
Fail Data
Halt Pipeline
Use the F10 key to toggle between options


Exercise 1: Introduction
A. Find the example Aligning Sequences and run it
B. Use a FASTA reader to read in O43291.fa. Calculate
the sequence molecular weight and display in an HTML
table viewer. Place a checkpoint on one of the
components.
C. Read the first 100 records of the
NRDB_nucleotide_10K.fa file and display the results in
an HTML table viewer.


Fundamentals of Pipeline Pilot


Components

Component Collections
Components are organized in folders based on their
component collection:


Component Types
Within each collection,
components are organized in
folders based on functionality:

Reading
Writing
Viewing
Filtering
Manipulating
Calculating
Etc.


Data Reading
Generic file readers include the Delimited Text Reader, Excel Reader, XML Reader, and HTML Reader
File readers are available for most popular molecular and sequence formats: SD, MOL2, SMILES, PDB, FASTA, etc.
Database readers can read from any database format via ODBC
File readers support zip files, multiple files (use Ctrl-click to select), and wildcards (users\myname\*.txt)



Data Writing
Generic file writers include the Delimited Text Writer,
XML Writer, HTML Writer
File writers are available for most popular molecular
and sequence formats: SD, MOL2, SMILES, PDB, FASTA
Database writers can write to any database type that
supports ODBC connections


Data Viewing
Viewers run on the client, and third party applications need to be installed on the client machine: Excel, Internet Explorer, Spotfire
Charting viewers are available using Excel and/or the Reporting Collection
Dialogs can be used to ask for user input at run time


Sequence Analysis Components


File Readers/Writers
EMBL
FASTA
GCG
GenBank
Swiss-Prot
Online Fetching (from NCBI)

Similarity search
Local BLAST search
Online BLAST search (at NCBI)
Prepare BLAST database (formatdb)
Smith-Waterman
HMMER

Pattern matching
PROSITE
Regular expression
GC rich regions
Open reading frames (getorf)
Masking (seg, xnu)
Signal peptide sites
Restriction enzyme sites
Proteolytic sites (digest)

Alignment
Multiple Sequence Alignment (ClustalW)
Pairwise local alignment (water)

The best way to find components is to use the search functionality (Ctrl-F)

Sequence Analysis Components


Calculators
GC Content
Molecular Weight
Isoelectric Point

Protein structure
Secondary structure prediction (garnier)
Predict transmembrane proteins (transmem)

Utilities
Reverse complement
Transcribe
Simple translate
Six-frame translate
Back translate
Generate subsequence fragments

Oligos
siRNA duplexes
Find & match primers

Viewers
Sequence: Artemis, plain text, or custom report
Alignments: JalView, plain text, or custom report
BLAST results

The best way to find components is to use the search functionality (Ctrl-F)
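To make a few of the calculator and utility operations concrete, here is a minimal Python sketch of GC content, reverse complement, and transcription. These are textbook definitions written for illustration, not the collection's own implementations; for instance, the real GC Content component may report a percentage rather than a fraction.

```python
# Complement table for unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence (0.0 if empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the strand."""
    return seq.translate(_COMPLEMENT)[::-1]


def transcribe(seq: str) -> str:
    """DNA coding strand to RNA: replace T with U."""
    return seq.replace("T", "U").replace("t", "u")
```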

3rd Party Tools


Integration flexibility illustrated through inclusion of:
Languages
BioPerl (e.g., file readers, writers, program execution)
Perl (e.g., PROSITE)
BioJava (e.g., isoelectric point)

Applications

EMBOSS (e.g., garnier, getorf)


BLAST (NCBI)
ClustalW
GCG (e.g., seg, xnu, & transmem)

Note: All of the third-party tools listed above are included as part of the
Sequence Analysis component collection. You do not need to install
any additional software. For EMBOSS, BioPerl, and BioJava, only parts are
exposed in Pipeline Pilot, but the entire suites are included.

Exercise 2: Using Components


A. Read the first 100 sequences from NRDB_nucleotide_10K.fa. Compute the sequence molecular weight and the GC content, and view results in Excel.
B. Select a random 5% of sequences from NRDB_protein_10K.fa. Calculate the isoelectric point for the sequences, and sort the sequences from higher to lower using the isoelectricPoint property. View results in Excel and display the first ten sequences in the HTML Table Viewer.


Fundamentals of Pipeline Pilot


General Data Manipulation and Filtering

Data Records
A data record is the smallest unit of data flowing through a pipeline.
It is a collection of data properties and may include a chemical structure (molecule).
For simplicity, it can be thought of as a row in a table.
In general, Pipeline Pilot components process one data record at a time.


Data Properties
A data property is an attribute of a data record.
It consists of a property name and a property value.
It is preferred that property names contain only
alphanumeric characters and underscores.
Data property values can be numbers, strings, Booleans,
molecular fingerprints and arrays.
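The naming preference can be checked mechanically. The sketch below, with hypothetical helper names, tests whether a name uses only alphanumeric characters and underscores and shows one possible way to normalize a name that does not.

```python
import re

_SAFE_NAME = re.compile(r"[A-Za-z0-9_]+")


def is_preferred_name(name: str) -> bool:
    """True if the name contains only letters, digits, and underscores."""
    return _SAFE_NAME.fullmatch(name) is not None


def to_preferred_name(name: str) -> str:
    """Replace each run of other characters with a single underscore."""
    return re.sub(r"[^A-Za-z0-9_]+", "_", name).strip("_")
```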


General Data Manipulation and Filtering
General Data Manipulation


Property Manipulation Components


Copy Property, Rename Property, Keep Property, Remove Property
Performs the named operation on the property or properties listed in the parameter panel of each component


Define Your Own Property


Create New Property
Creates a new property and assigns it the value specified in the 'DefaultValue' parameter


Counting and Indexing Data


Count and Index Data
Sets an index number (starting at 1 by default) on a data record,
then increments the number so that consecutive records get
sequentially numbered
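In Python terms, the behavior described here resembles `enumerate` with a start value. The function and the `Index` property name below are hypothetical stand-ins.

```python
def count_and_index(records, property_name="Index", start=1):
    """Attach a running index to each record, starting at `start`."""
    for i, rec in enumerate(records, start=start):
        yield {**rec, property_name: i}


indexed = list(count_and_index([{"Name": "A"}, {"Name": "B"}, {"Name": "C"}]))
```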


Sorting Records
Sort Data
Sorts the data records based on the value found in the data
property specified in the parameter panel
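A sort keyed on one property might look like the sketch below. Unlike most per-record operations, sorting has to collect the whole stream before emitting anything. The function name and the property values are invented for illustration.

```python
def sort_data(records, property_name, descending=False):
    """Return all records ordered by the value of one property."""
    # Sorting needs every record, so the whole stream is collected first.
    return sorted(records, key=lambda rec: rec[property_name], reverse=descending)


records = [
    {"Name": "A", "isoelectricPoint": 5.2},
    {"Name": "B", "isoelectricPoint": 9.8},
    {"Name": "C", "isoelectricPoint": 6.7},
]
ranked = sort_data(records, "isoelectricPoint", descending=True)
```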


Tagging Data Records


Purpose
Determine source of a data record
Identify reference data records downstream
Substructure Search from Tag
Similarity Search from Tag

A tag is a data property and can be added by:
Tag component
SourceTag parameter in any reader
Using PilotScript in a Custom Manipulator or Custom Filter


Tag Data
Tag Data
Creates a new data property for each incoming record based on
the value entered into the parameter panel
Value for each new data property is set to true
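A tag is simply a Boolean property, so its effect can be sketched in a few lines of Python. The tag names and records below are hypothetical; the point is that records from two sources remain distinguishable after their streams are merged.

```python
def tag_data(records, tag_name):
    """Add a property named `tag_name`, set to True, to every record."""
    for rec in records:
        yield {**rec, tag_name: True}


assay1 = tag_data([{"Name": "A"}], "FromAssay1")
assay2 = tag_data([{"Name": "B"}], "FromAssay2")
merged = list(assay1) + list(assay2)

# Downstream, the tag identifies where each record came from.
from_assay1 = [rec["Name"] for rec in merged if rec.get("FromAssay1")]
```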


Reader Components: SourceTag


Each reader has a SourceTag parameter
This creates a new property called SourceTag which can be assigned one of the listed values
This allows the direct identification of the source


Basic Math
Performs math on a single property or on multiple properties
Property2 is only required if the operation requires two properties


Statistic Components
Replace MultiValue Stats and Moving Average.
Output results as a summary or on the original data


General Data Manipulation and Filtering
General Data Filtering


Property Filters
Property Value Threshold Filter
This component allows a user to specify a property, a threshold value, and a condition to filter each data record
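The property, condition, and threshold triple maps naturally onto a small function that routes each record to a pass or a fail list. This is a sketch under assumptions: the function name is hypothetical, and sending records that lack the property to the fail side is a choice made here, not documented component behavior.

```python
import operator

_CONDITIONS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


def threshold_filter(records, property_name, condition, threshold):
    """Split records into (passed, failed) by comparing one property."""
    test = _CONDITIONS[condition]
    passed, failed = [], []
    for rec in records:
        value = rec.get(property_name)
        # Assumption: a record without the property goes to the fail side.
        if value is not None and test(value, threshold):
            passed.append(rec)
        else:
            failed.append(rec)
    return passed, failed


records = [{"Well": "A1", "Value": 7200}, {"Well": "A2", "Value": 4100}, {"Well": "A3"}]
passed, failed = threshold_filter(records, "Value", ">", 5000)
```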


Exercise 3: Data Manipulation


A. From Generic data\Tables read in Assay1.txt and Assay2.txt:
1. Tag each record based on the file it came from. View results in an HTML Table Viewer.
2. Filter for only records with the Name A.

B. Filter Generic data\Tables\MicroBeta.txt. Each filter should generate a separate worksheet in a single Excel document:
1. Value greater than 5000
2. Compound equal to Cmpd-2 and conc greater than 2.000e-11
3. Compound equal to Cmpd-2 and conc less than or equal to 2.000e-11

C. Filter the hts_rawdata1.csv file (Generic data\Tables) to display only records with a Well value greater than P20.


Fundamentals of Pipeline Pilot


SubProtocol Creation and Handling

Why SubProtocols?
Encapsulate complex logic (one or more components)
Implemented with many components
Represented as a single component

Portable, easy to distribute

Treated as a single component
Input, pass, fail ports
Parameters exposed to the outside

Created using the Collapse To Subprotocol menu option


Creating a subprotocol
1. Highlight components to be part of subprotocol
2. Right-click selected components and select Collapse To Subprotocol
3. Create the interface for your component on Promote tab
4. Choose appropriate icon and ports on Ports tab
5. Provide new component description on Help Text tab
6. Change caption for new SubProtocol


Which parameters are useful to users of this subprotocol?


Parameter Promotion
A promoted parameter becomes a parameter of the subprotocol
The parameter can be renamed if necessary
From the Promote tab of the Edit dialog:
1. Navigate to the component of interest (Prev / Next buttons)
2. Highlight the parameter of interest and click Promote
3. The parameter is now exposed in the parameter list of the subprotocol

The parameter Source can be referenced inside the subprotocol (and its components) using the $(Source) string token
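The idea behind a $(Name) string token is plain text substitution, which the Python sketch below imitates. It does not reproduce Pipeline Pilot's actual resolution rules; the function name, the choice to leave unknown tokens untouched, and the sample values are all assumptions.

```python
import re

_TOKEN = re.compile(r"\$\((\w+)\)")


def expand_tokens(text: str, parameters: dict) -> str:
    """Replace each $(Name) with parameters[Name]; leave unknown tokens as-is."""
    def substitute(match):
        name = match.group(1)
        return str(parameters.get(name, match.group(0)))
    return _TOKEN.sub(substitute, text)


expanded = expand_tokens(
    "Reading $(Source), max $(Maximum)",
    {"Source": "O43291.fa", "Maximum": 100},
)
```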


Subprotocol Data Flow


Data enters a subprotocol through the first component with an input port. Only one such component is allowed
Subprotocol output streams are the sum of:
Pass: output from any component's pass port
Fail: output from any component's fail port

[Figure: a subprotocol with Pass and Fail output ports; record counts shown: 2500, 1518, and 982]
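The way output streams add up can be sketched with two chained filters collapsed into one function. This assumes that only ports left unconnected inside the subprotocol contribute to its outputs; the filters, names, and records are invented for illustration.

```python
def partition(records, predicate):
    """Split records into (passed, failed) according to a predicate."""
    passed, failed = [], []
    for rec in records:
        (passed if predicate(rec) else failed).append(rec)
    return passed, failed


def subprotocol(records):
    """Two chained filters exposed as one unit with Pass and Fail outputs."""
    long_enough, too_short = partition(records, lambda r: len(r["Sequence"]) >= 4)
    has_gc, no_gc = partition(long_enough, lambda r: set(r["Sequence"]) & {"G", "C"})
    pass_out = has_gc                 # the only unconnected pass port
    fail_out = too_short + no_gc      # sum of both unconnected fail ports
    return pass_out, fail_out


records = [{"Sequence": s} for s in ["TGGTTA", "AT", "ATTA", "CAAG"]]
pass_out, fail_out = subprotocol(records)
```

Every input record leaves through exactly one of the two outputs, so the pass and fail counts add up to the input count.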

Subprotocol Utilities
Component: Description

Data to Fail Port: Use this component in a subprotocol to direct records coming out of a Pass port of a component to the Fail port of the subprotocol.

Data to Pass Port: Use this component in a subprotocol to direct records coming out of a Fail port of a component to the Pass port of the subprotocol.

Dont Pass Data: Use this component in a subprotocol to keep records coming out of an internal component from being passed out of the subprotocol. (You can also turn off output ports to achieve this effect.)

No-Op: Use this component to pass all incoming records to the Pass port. This is useful inside subprotocols to capture the point of input. For example, to run an initialization pipeline before accepting input, use the No-Op component as the first component on the second pipeline.

Subprotocol: Use this component to define a subprotocol in a pipeline. When you are creating a pipeline that requires a subprotocol as a component, you can drag this Subprotocol component into the pipeline, open it, and add components to it.


Example
Complex filter (HTS Filter)
Records from pass/fail ports
exit subprotocol through its
pass/fail ports
Pass streams can be turned
into fail streams and vice
versa
Output ports can be
terminated


Exercise 4: Creating Subprotocol


A. Create a subprotocol-based component that reads a
FASTA file and calculates sequence molecular weight
and isoelectric point
B. Extend the component from above to expose the
Delimited Text file reader parameters Source and
Maximum on the parent component


Fundamentals of Pipeline Pilot


Webport
http://servername:9944/webport/main.htm

Webport
Running Webport Protocols


Webport
Single sign-on
Log in once
Store credentials locally
Validity of credentials configurable

Auto-launch protocols using a protocol link
http://localhost:9944/webport/main.htm?protocol=Protocols/Collect data

Jobs window
Multiple jobs deletion

Performance improvements

Java sketcher (to run out-of-the-box)
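A protocol link of the form shown above can be assembled programmatically. In this sketch the helper name and the protocol path are hypothetical, and percent-encoding the spaces is an assumption about what the server accepts rather than documented Webport behavior.

```python
from urllib.parse import quote


def protocol_link(server: str, protocol_path: str, port: int = 9944) -> str:
    """Build a Webport auto-launch URL for a protocol path."""
    # Assumption: spaces are percent-encoded and "/" separators are kept.
    return f"http://{server}:{port}/webport/main.htm?protocol={quote(protocol_path)}"


# Hypothetical protocol saved under Protocols/Web Services.
link = protocol_link("localhost", "Protocols/Web Services/My Protocol")
```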



Library Tab
The Library tab is displayed when you log in.
Select the protocol you would like to run. The protocols are now in a tree view to make it easy to see all available protocols at once.


Protocol Tab
The Protocol tab is where you set the parameters for the protocol and where the result files are displayed (each file has a tab on the right side of the screen).
File browsing is greatly improved. The dialog is similar to the dialog in the client.


Jobs Tab
The Jobs list is now sortable by clicking on any column header. Click the column header again to sort in the opposite direction.


Webport
Creating Webport Protocols


Use Writers
Writers (no Viewers)
Write to the $(runDirectory), $(jobDir) or $(userDir)
No dialogs or pop-ups


Promote Parameters
Promote parameters that you want visible in Webport

Component Parameters


Protocol Level Parameters


Save to Web Services


Save to Protocols\Web Services


Run via Webport


Questions & Comments


Accelrys Web Site
www.accelrys.com

Accelrys Community
accelrys.org

Accelrys Advantage Knowledge Base and FAQ


customer.accelrys.com

Molecule Scientific Co., Ltd.


www.molsci.com.tw

MSC Support
msc-support@molsci.com.tw
02-27132977

Pipeline Pilot Training Course


Sequence Analysis Collection

Outline
Sequence Analysis Collection Components and Protocols
Data Record Structure
SAC Example Protocols


Sequence Analysis Collection


Sequences, Annotations and Features

SAC Readers
Reader components are available for sequences, alignments, and profiles.
Many popular formats are supported.
There are also Generic readers for sequences and alignments that infer the format from the file extension.
Users can specify the number of records to read.
Online sequence Fetchers allow access to databases even if local copies do not exist.
For sequence formats that include features, the user can choose not to read features.
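For readers unfamiliar with the FASTA format, the sketch below shows a minimal parser with a cap on the number of records read. It is a simplified illustration, not the SAC reader; the function names and the ID/Description/Sequence property names are assumptions.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional


def make_record(header: str, chunks: list) -> dict:
    """Split the header into ID and description, and join sequence lines."""
    identifier, _, description = header.partition(" ")
    return {"ID": identifier, "Description": description, "Sequence": "".join(chunks)}


def read_fasta(lines: Iterable[str], maximum: Optional[int] = None) -> Iterator[dict]:
    """Yield one record per '>' entry, stopping after `maximum` records."""
    def records():
        header, chunks = None, []
        for line in lines:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield make_record(header, chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
        if header is not None:
            yield make_record(header, chunks)
    return islice(records(), maximum)


text = [">seq1 first example", "TGGT", "TA", ">seq2", "TCTGCT", ">seq3 third", "CAAG"]
first_two = list(read_fasta(text, maximum=2))
```

Because the parser is lazy, capping the record count means the rest of a large file is never read.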

SAC Writers
Writers are available for sequences, alignments, and profiles.
Many popular formats are supported.
There are also Generic writers for sequences and alignments.
Users can specify the number of records to write.


Reading/Writing Example
Converting a GenBank sequence to FASTA format.


SAC Viewers
Several options for viewing sequences, alignments, and HMM (text, Java, HTML, PDF)
Similarity Search Viewer output is a PDF
FASTA Entry Prompt
Similarity Search Table Viewer
Reporting plotting tools are included in the Viewers


Viewer Examples

Artemis Viewer

Sequence Viewer
JalView

Aligning Example
Aligning sequences and viewing the alignment using
JalView or JalView 2


SAC Manipulators
Manipulator component
functions include:
Extracting sequence features
Creating sequence fragments
Producing open reading frames
Translate/transcribe
And more!


SAC Annotators
Annotator components allow sequence annotation and pattern searching
Matches are added as features to the sequence data record
Annotators act as filters: sequences without the feature of interest are passed out the fail port
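The annotate-and-filter behavior can be sketched with a regular-expression search: each match becomes a feature on the record, and records with no match go to the fail side. The function name, the feature fields, the 1-based inclusive coordinates, and the simplified toy motif are all assumptions made for illustration.

```python
import re


def annotate_pattern(records, pattern, feature_type):
    """Add one feature per regex match; route unmatched records to fail."""
    regex = re.compile(pattern)
    passed, failed = [], []
    for rec in records:
        hits = [
            # 1-based, inclusive coordinates (an assumption of this sketch)
            {"Type": feature_type, "Start": m.start() + 1, "End": m.end()}
            for m in regex.finditer(rec["Sequence"])
        ]
        if hits:
            passed.append({**rec, "Features": rec.get("Features", []) + hits})
        else:
            failed.append(rec)
    return passed, failed


passed, failed = annotate_pattern(
    [{"ID": "p1", "Sequence": "MKNASAANGT"}, {"ID": "p2", "Sequence": "MKKLLP"}],
    r"N[^P][ST]",          # toy motif: N, then any residue except P, then S or T
    "toy_motif",
)
```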


Annotation Example
Annotation of a protein sequence using a variety of components. This protocol uses BioPerl, BioJava, PROSITE, and EMBOSS.
[Figure: annotation protocol whose components are labeled BioPerl, BioJava, BioPerl, BioPerl, PROSITE, and EMBOSS]

SAC Calculators
Property calculators add a sequence's physical properties to the data stream.
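
As one example of such a property, the sketch below computes GC content for a nucleotide sequence, attaches it to a record, and produces the sliding-window values a GC plot would draw. The property name GC_Content and the dictionary record layout are assumptions chosen for illustration.

```python
def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence (0.0 if empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_profile(seq, window, step=1):
    """GC content of each full-length window along the sequence."""
    return [
        gc_content(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


def add_gc_property(record):
    """Return a copy of the record with a GC_Content property added."""
    out = dict(record)
    out["GC_Content"] = gc_content(record["sequence"])
    return out


if __name__ == "__main__":
    rec = add_gc_property({"id": "demo", "sequence": "GGCCAATT"})
    print(rec["GC_Content"], gc_profile(rec["sequence"], window=4, step=4))
```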


Plots

Charge

Hydrophobic Moment

GC Content

Plots
These plots require the Reporting Collection.
These plots can be embedded in a report similar to
other reporting components.
You cannot link reporting plots sequentially in a pipeline; however, you can combine them using other
Reporting Elements (e.g., Tile Horizontal).


Exercise 1: Basic Pipeline Pilot and SAC

A. Find all of the sequences with "shock" in the description of the sequences in the
NRDB_protein_10K.fa file
(in Sequence Analysis Data\Sequence folder)
1. Calculate the molecular weight of these sequences
2. Filter for only protein sequences with molecular weight greater than 70,000
3. Perform both of the following:
Align these sequences and view the results with JalView or the Alignment Viewer
Predict their secondary structure and view the results in the Sequence Viewer
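
Steps A, 1 and 2 can be approximated in plain Python to check the logic before building the protocol. This is a sketch under stated assumptions: the function names are invented for the example, the masses are average residue masses plus one water per chain, and residues outside the 20 standard amino acids are skipped, so the SAC calculator may report slightly different values. The alignment and secondary structure steps are not covered.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528


def read_fasta(text):
    """Parse FASTA text into dicts with id, description and sequence."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            ident, _, description = line[1:].partition(" ")
            records.append({"id": ident, "description": description, "sequence": ""})
        elif records:
            records[-1]["sequence"] += line
    return records


def molecular_weight(protein):
    """Approximate average molecular weight; unknown residues are skipped."""
    total = sum(RESIDUE_MASS.get(residue, 0.0) for residue in protein.upper())
    return total + WATER if protein else 0.0


def select(records, keyword="shock", min_mw=70000.0):
    """Keep records whose description contains keyword and whose MW exceeds min_mw."""
    kept = []
    for record in records:
        if keyword.lower() not in record["description"].lower():
            continue
        mw = molecular_weight(record["sequence"])
        if mw > min_mw:
            kept.append(dict(record, Molecular_Weight=mw))
    return kept
```

Typical use would be select(read_fasta(open(path).read())), where path points at the exercise file.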


Sequence Data Structure

Sequence (with properties for sequence data, identifiers, description, length)
  AnnotationGroup (no properties)
    Annotation (with properties for annotation type and named values)
  FeatureGroup (no properties)
    Feature (with properties for feature type, location, and qualifiers)
NOTE: All sequence readers (except FASTA) have a parameter to include/exclude features and annotations.

Sequence Annotation Properties

AnnotationGroup (no properties)
  Annotation[1,2,3...n] (itemizes information from the sequence file)
    Keywords
    Comments
    References
NOTE: FASTA sequence files will not have annotations.


Sequence Data Structure

Sequence data records have the following hierarchy:
Sequence (with properties for sequence data, identifiers, description, length)
  AnnotationGroup (no properties)
    Annotation[1,2,3...n] (with properties for annotation type and named values)
  FeatureGroup (no properties)
    Feature[1,2,3...n] (with properties for feature type, location, and qualifiers)
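
For readers who think in code, the hierarchy above maps naturally onto nested data classes. The class and field names below are assumptions that mirror the slide wording; Pipeline Pilot stores these as hierarchical property nodes, not Python objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Annotation:
    type: str                                   # annotation type, e.g. "Keywords"
    values: Dict[str, str] = field(default_factory=dict)   # named values


@dataclass
class Feature:
    type: str                                   # feature type, e.g. "DOMAIN"
    location: Tuple[int, int]                   # (start, end), 1-based inclusive
    qualifiers: Dict[str, str] = field(default_factory=dict)


@dataclass
class SequenceRecord:
    identifiers: List[str]
    description: str
    sequence: str
    annotations: List[Annotation] = field(default_factory=list)  # AnnotationGroup
    features: List[Feature] = field(default_factory=list)        # FeatureGroup

    @property
    def length(self) -> int:
        return len(self.sequence)

    def features_of_type(self, feature_type: str) -> List[Feature]:
        return [f for f in self.features if f.type == feature_type]


if __name__ == "__main__":
    record = SequenceRecord(
        identifiers=["DEMO_1"],
        description="hypothetical example protein",
        sequence="MAQLCGLRRS",
        features=[Feature("SIGNAL", (1, 5)), Feature("DOMAIN", (6, 10))],
    )
    print(record.length, record.features_of_type("DOMAIN"))
```

The two group nodes carry no properties of their own, which is why they appear here as plain lists.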


Sequence Feature Properties

FeatureGroup (no properties)
  Feature[1,2,3...n] (with properties for feature type, location, and qualifiers)


Alignment Data Structure

Alignment data records have the following hierarchy:
Sequence (consensus sequence)
  SequenceGroup (no properties)
    Sequence[1,2,3...n] (ungapped sequence data)
      AnnotationGroup
        Annotation[1,2,3...n] (with properties for alignment start and end)
      FeatureGroup (no properties)
        Feature[1,2,3...n] (with properties for gaps)
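
Storing ungapped sequence data plus gap features means an aligned row has to be rebuilt when it is displayed. The sketch below shows one way that round trip could work. The gap encoding used here, a list of (position, length) pairs where position is the 0-based index in the ungapped sequence before which the gap is inserted, is an assumption for illustration; the slides do not specify how SAC encodes gap features.

```python
def apply_gaps(ungapped, gaps, gap_char="-"):
    """Rebuild an aligned row from ungapped residues and gap features."""
    out = []
    previous = 0
    for position, length in sorted(gaps):
        out.append(ungapped[previous:position])
        out.append(gap_char * length)
        previous = position
    out.append(ungapped[previous:])
    return "".join(out)


def extract_gaps(gapped, gap_char="-"):
    """Split an aligned row into (ungapped sequence, gap features)."""
    residues = []
    gaps = []
    run = 0
    for ch in gapped:
        if ch == gap_char:
            run += 1
        else:
            if run:
                gaps.append((len(residues), run))
                run = 0
            residues.append(ch)
    if run:  # trailing gap
        gaps.append((len(residues), run))
    return "".join(residues), gaps


if __name__ == "__main__":
    sequence, gaps = extract_gaps("AC--GT-")
    print(sequence, gaps, apply_gaps(sequence, gaps))
```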


Alignment Data Structure

Sequence (consensus sequence)
SequenceGroup (no properties)
Sequence[1,2,3...n] (ungapped sequence data)
AnnotationGroup
Annotation[1,2,3...n] (alignment start and end)
FeatureGroup
Feature[1,2,3...n] (alignment gaps)


Sequence Analysis Collection


Search and Similarity Tools

Search and Similarity Tools


Includes components that wrap the familiar programs:
ClustalW
HMMER
BLAST
Smith-Waterman
Muscle
Sim4

Similarity Search results are added to the data record's hierarchy.
Includes components to write databases from flat files
BLAST DB Sequence Fetcher brings back a full-length sequence


Writing BLAST Database Examples

For sequences already in a FASTA file, use Create BLAST Protein Database (formatdb) (or its nucleotide counterpart)

For other sequence file formats, use BLAST Protein Database Writer (or its nucleotide counterpart)
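
For orientation, the command line that a formatdb wrapper would typically issue can be assembled as below. The flags shown (-i, -p, -n for the legacy formatdb tool, and -in, -dbtype, -out for its BLAST+ successor makeblastdb) belong to the NCBI tools themselves; how the Pipeline Pilot component actually invokes formatdb is not stated in the slides, and the function names and file names are hypothetical.

```python
def formatdb_command(fasta_path, db_name, protein=True):
    """Argument list for legacy NCBI formatdb (not executed here)."""
    return [
        "formatdb",
        "-i", fasta_path,                 # input FASTA file
        "-p", "T" if protein else "F",    # T = protein, F = nucleotide
        "-n", db_name,                    # base name of the database files
    ]


def makeblastdb_command(fasta_path, db_name, protein=True):
    """Equivalent argument list for BLAST+ makeblastdb (not executed here)."""
    return [
        "makeblastdb",
        "-in", fasta_path,
        "-dbtype", "prot" if protein else "nucl",
        "-out", db_name,
    ]


if __name__ == "__main__":
    # Pass either list to subprocess.run(...) on a machine with BLAST installed.
    print(" ".join(formatdb_command("proteins.fa", "proteins_db")))
    print(" ".join(makeblastdb_command("proteins.fa", "proteins_db")))
```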


Using BLAST Output


Extract the resulting database hits, fetch the sequence data using the BLAST DB Sequence Fetcher, and write them out in FASTA format
Display the BLAST results using the Similarity Search Viewer


Exercise 2: Similarity Search

1. Read NRDB_protein_10K.fa:
Create a FASTA file that contains sequences matched by the PROSITE Calcium-binding EGF-like domain signature (PS01187).
Create a BLAST database containing the sequences that do not match the PROSITE Calcium-binding EGF-like domain signature (PS01187).

2. Use BLASTp to search O43291.fa against the non-EGF-related BLAST database (created in #1). Display the results in the Similarity Search Viewer.


Similarity Search Data Structure

Similarity Search (e.g., BLAST, HMMER) data records have the following hierarchy:
Sequence (query sequence)
  SearchResultGroup (no properties)
    SequenceGroup/HMMGroup (with properties for algorithm, algorithm version, database name, number of hits)
      Sequence/HMM[1,2,3...n] (the hit, with properties for description, e-value, score)
        HighScoringPairGroup (no properties)
          HighScoringPair[1,2,3...n] (with properties for e-value, query and subject sequence alignment information)
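
The same nesting can be written down as data classes, which makes it easier to see where a hit-level property (e-value, score) lives compared with an HSP-level one. All class and field names here are assumptions modeled on the slide text, and hits_below is a hypothetical helper added to show traversal.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HighScoringPair:
    e_value: float
    query_aligned: str      # query alignment string for this HSP
    subject_aligned: str    # subject alignment string for this HSP


@dataclass
class Hit:                  # Sequence/HMM[n]
    description: str
    e_value: float
    score: float
    hsps: List[HighScoringPair] = field(default_factory=list)  # HighScoringPairGroup


@dataclass
class SearchResult:         # SequenceGroup/HMMGroup
    algorithm: str
    algorithm_version: str
    database_name: str
    hits: List[Hit] = field(default_factory=list)

    @property
    def number_of_hits(self) -> int:
        return len(self.hits)


@dataclass
class QueryRecord:          # top-level Sequence (query sequence)
    query_id: str
    sequence: str
    search_results: List[SearchResult] = field(default_factory=list)  # SearchResultGroup


def hits_below(record: QueryRecord, max_e_value: float) -> List[Hit]:
    """Collect hits from every search result whose e-value is at most max_e_value."""
    return [
        hit
        for result in record.search_results
        for hit in result.hits
        if hit.e_value <= max_e_value
    ]
```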

Similarity Search Properties


BLAST Search
HMM Search


Similarity Search HSP Properties

HMM Search
BLAST Search

HMM Data Structure


An HMM file and an HMM search (e.g., Search HMM File with Sequence) have the following data structure:
HMM (with properties for identifiers, description, length, null model)
  PositionGroup (no properties)
    Position[1,2,3...n] (with properties for begin and end transitions, match and insert emissions)


HMM Data Structure


Properties
Position

Data Structure

[Figure: data record hierarchy built up for an example protein sequence. Nodes shown: SEQUENCE, ANNOTATION GROUP, ANNOTATION, FEATURE GROUP, FEATURE, SEQUENCE GROUP, SEQUENCE(n), SEARCH RESULT GROUP, SEQUENCE/HMM GROUP, SEQUENCE/HMM (n), HIGH SCORING PAIR GROUP, HIGH SCORING PAIR. Operations shown: Read SwissProt Sequence, Read FASTA Sequence, Structure Prediction, Align Sequences, Similarity Search.]


Generic Utilities for Hierarchies

Extract
Delete
Reassemble

Versions of these exist for similarity search hits, sequence features and annotations, and aligned sequences
Several manipulators (e.g., Keep Features by Type) use this pattern internally.
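
One plausible reading of the Extract, Delete, Reassemble pattern is sketched below on dictionary records: child nodes are pulled out as flat records tagged with their parent, the parent is stripped, the children are filtered, and the survivors are attached again. The exact semantics of the SAC components are not given in the slides, so the function names, the parent_id tag, and this keep_features_by_type helper are assumptions for illustration.

```python
def extract(record, group):
    """Return one flat record per child node, tagged with the parent id."""
    return [dict(child, parent_id=record["id"]) for child in record.get(group, [])]


def delete(record, group):
    """Return a copy of the record with the child group emptied."""
    stripped = dict(record)
    stripped[group] = []
    return stripped


def reassemble(record, group, children):
    """Attach children that belong to this record back under the group."""
    rebuilt = dict(record)
    rebuilt[group] = [
        {key: value for key, value in child.items() if key != "parent_id"}
        for child in children
        if child.get("parent_id") == record["id"]
    ]
    return rebuilt


def keep_features_by_type(record, wanted_types):
    """Extract features, keep the wanted types, and reassemble the record."""
    children = [c for c in extract(record, "features") if c["type"] in wanted_types]
    return reassemble(delete(record, "features"), "features", children)


if __name__ == "__main__":
    rec = {
        "id": "P1",
        "features": [
            {"type": "DOMAIN", "start": 1, "end": 5},
            {"type": "SIGNAL", "start": 1, "end": 20},
        ],
    }
    print(keep_features_by_type(rec, {"DOMAIN"}))
```

Working on flat child records in the middle step is what lets ordinary filters and calculators operate on hits or features without knowing about the hierarchy.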

Data Structure

[Figure: the same data record hierarchy (SEQUENCE, SEQUENCE GROUP, SEQUENCE(n), SEARCH RESULT GROUP, SEQUENCE/HMM GROUP, SEQUENCE/HMM (n), HIGH SCORING PAIR GROUP, HIGH SCORING PAIR) annotated with the generic utilities: Extract Alignments, Delete Alignments, Reassemble Alignments, Extract Similarity Search Hits, Delete Similarity Search Hits, Reassemble Similarity Search Results.]


Exercise 4: More Similarity Search

1. Use BLASTp to find sequences similar to the tyrosineKinase.fa sequence.
a. Filter out the HSPs with < 60% sequence identity
b. Obtain the full length sequences for the HSPs.
c. Align the query and the full length hits using ClustalW, and view the alignment using JalView or the Alignment Viewer.
d. Reassemble the similarity search results and view them in table format.
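
Step a can be prototyped in a few lines. The sketch assumes each HSP carries its aligned query and subject strings and computes identity as identical columns divided by alignment length (gaps included), which is the usual BLAST convention; the dictionary layout and function names are invented for the example.

```python
def percent_identity(query_aligned, subject_aligned):
    """Percent of alignment columns with identical residues."""
    if len(query_aligned) != len(subject_aligned):
        raise ValueError("aligned strings must have equal length")
    if not query_aligned:
        return 0.0
    matches = sum(
        1 for q, s in zip(query_aligned, subject_aligned) if q == s and q != "-"
    )
    return 100.0 * matches / len(query_aligned)


def filter_hits(hits, min_identity=60.0):
    """Drop HSPs below min_identity; drop hits left with no HSPs."""
    kept = []
    for hit in hits:
        hsps = [
            hsp for hsp in hit["hsps"]
            if percent_identity(hsp["query"], hsp["subject"]) >= min_identity
        ]
        if hsps:
            kept.append(dict(hit, hsps=hsps))
    return kept


if __name__ == "__main__":
    hits = [
        {"id": "h1", "hsps": [{"query": "ACDE", "subject": "ACDF"}]},
        {"id": "h2", "hsps": [{"query": "ACDE", "subject": "WCYF"}]},
    ]
    print([h["id"] for h in filter_hits(hits)])
```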


Sequence Analysis Collection


Examples

Translating and Transcribing Sequences


Iterate BLAST Calls


Checking for Novel GPCRs


Performing Ortholog Comparisons

[Figure: Rat Sequences and Human Sequences are compared using BLAST, then filtered and scored into High Match, Medium Match, and Low Match.]

Performing Ortholog Comparisons

Identify ortholog pairs across genomes
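
The slides do not say how the protocol scores matches. A common way to call ortholog pairs from BLAST output is the reciprocal best hit criterion, which is the technique sketched here in place of the protocol's own filter and score step. The input format (a dictionary from (query, subject) to bit score) and the function names are assumptions.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: dict mapping (query_id, subject_id) -> bit score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


if __name__ == "__main__":
    rat_vs_human = {("r1", "h1"): 200, ("r1", "h2"): 50, ("r2", "h2"): 80}
    human_vs_rat = {("h1", "r1"): 190, ("h2", "r1"): 60, ("h2", "r2"): 40}
    print(reciprocal_best_hits(rat_vs_human, human_vs_rat))
```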



Finding siRNA Off-Target Sites

[Figure: a genomic sequence containing a target gene region and another gene region. siRNA regions in the target gene region lead to a correctly silenced gene; off-target siRNA regions in the other gene region lead to a wrongly silenced gene.]


Finding siRNA Off-Target Sites

[Figure: workflow. From the target cDNA, generate siRNA predictions; search the predicted sites against the genomic sequence (Genomic DB); identify and filter the siRNA site locations into Off Target sites and On Target sites.]
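
The off-target workflow can be mimicked on toy data as follows. This sketch makes strong simplifications that are assumptions, not part of the protocol: every window of the cDNA counts as a candidate site, the genome search is an exact forward-strand string match instead of a BLAST search, and the target gene is given as a (start, end) interval on the genome.

```python
def candidate_sites(cdna, length=21):
    """All windows of the given length along the cDNA."""
    return [cdna[i:i + length] for i in range(len(cdna) - length + 1)]


def find_all(text, pattern):
    """Start positions of every (possibly overlapping) exact occurrence."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def classify_sites(cdna, genome, target_region, length=21):
    """Split genomic matches into on-target and off-target lists.

    target_region: (start, end) on the genome, 0-based, end exclusive.
    Returns two lists of (site, genome_position) tuples.
    """
    target_start, target_end = target_region
    on_target, off_target = [], []
    for site in candidate_sites(cdna, length):
        for position in find_all(genome, site):
            inside = target_start <= position and position + length <= target_end
            (on_target if inside else off_target).append((site, position))
    return on_target, off_target


if __name__ == "__main__":
    on, off = classify_sites("ACGTA", "TTACGTATTTTACGTGG", (2, 7), length=4)
    print("on target:", on)
    print("off target:", off)
```

Real off-target prediction tolerates mismatches and checks both strands, which is why the protocol searches a genomic database with BLAST.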

Finding siRNA Off-Target Sites


Divide and BLAST (DAB)


Divide and BLAST (DAB)

DAB Subprotocol:


KEGG Pathway Database

Comprises current knowledge on molecular interaction networks
metabolic pathways
regulatory pathways
molecular complexes

Can be used by Pipeline Pilot to connect genes with relevant compounds (or vice versa)
Integrated as a web service using SOAP

http://www.genome.jp/kegg/


KEGG Pathway Database

GENE


Integrating KEGG with SOAP


COMPOUND


Find all endogenous compounds related to a gene target of interest

tynA
Histamine Oxidase


Find the pathway most related to each compound of interest


Questions & Comments


Accelrys Web Site
www.accelrys.com

Accelrys Community
accelrys.org

Accelrys Advantage Knowledge Base and FAQ
customer.accelrys.com

Molecule Scientific Co., Ltd.
www.molsci.com.tw

MSC Support
msc-support@molsci.com.tw
02-27132977