
Reliable Distributed Systems
Web Services

Today

Web Services Introduction


Remote Procedure Call in WS

Binding, Marshalling

Using TCP as the transport for RPCs

Connectivity Issues: NAT, Firewall

What are Web Services?

Today, we normally use Web browsers to talk to Web sites

Browser names a document via URL (lots of fun and games can happen here)
Reply encoded in HTML, using HTTP to issue the request to the site

Web Services generalize this model so that computers can talk to computers

What are Web Services?

[Figure: a client system talks to a SOAP router, which fronts the Web Service and its backend processes]
Web Services are software components described via WSDL which are capable of being accessed via standard network protocols such as SOAP over HTTP.

Today, SOAP is the primary standard.
SOAP provides rules for encoding the request and its arguments.

Similarly, the architecture doesn't assume that all access will employ HTTP over TCP.
In fact, .NET uses Web Services internally even on a single machine. But in that case, communication is over COM.

WSDL documents are used to drive object assembly, code generation, and other development tools.

Web Services are often Front Ends

[Figure. Client platform: COM app, C# app, CORBA app, Web Service invoker. SOAP messaging. Server platform: WSDL-described Web Service, Web server (e.g., IBM WebSphere, BEA WebLogic), Web app server, SAP, DB2 server.]

The Web Services stack

Business Processes: BPEL4WS (IBM only, for now)
Quality of Service: Transactions, Reliable Messaging, Security, Coordination
Description: WSDL, UDDI, Inspection
Messaging: SOAP; XML, Encoding; Other Protocols
Transport: TCP/IP or other network transport protocols

What are Web Services?

Amazon would hand out "serverlets" for 3rd party developers to use
This connects their applications directly to Amazon's system

[Figure: serverlet, SOAP router, Web Service, backend processes]

Advantages of web services?*

Web services provide interoperability between various software applications running on various platforms.

Web services leverage open standards and protocols.
Protocols and data formats are text based where possible

Vendor, platform, and language agnostic

Easy for developers to understand what is going on.

By piggybacking on HTTP, web services can work through many common firewall security measures without requiring changes to their filtering rules.

*: From Wikipedia

How Web Services work

First the client discovers the service.

More in next lecture!

Typically, client then binds to the server.

By setting up a TCP connection to the discovered address.
But binding is not always needed.
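
As a rough illustration of the binding step, the sketch below opens a TCP connection to an address that stands in for the discovered one. The loopback server, the port chosen by the OS, and the request text are all assumptions made for the example; a real client would get the address from the discovery step.

```python
import socket
import threading


def serve_once(listener):
    """Accept one connection, read one request, send one reply."""
    conn, _ = listener.accept()
    with conn:
        request = conn.recv(1024)
        conn.sendall(b"reply to: " + request)


# Stand-in server on the loopback interface; port 0 asks the OS for a free port.
listener = socket.create_server(("127.0.0.1", 0))
address = listener.getsockname()  # plays the role of the discovered address
worker = threading.Thread(target=serve_once, args=(listener,))
worker.start()

# Binding: the client opens a TCP connection to the discovered address.
with socket.create_connection(address, timeout=5) as sock:
    sock.sendall(b"GetTemperature 14853")  # hypothetical request text
    reply = sock.recv(1024)

worker.join()
listener.close()
print(reply.decode())
```

The connection is set up once and could carry several requests, which is the main reason to bind at all; a one-shot request over a connectionless transport would skip this step.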

How it works

Next build the SOAP request: (Marshaling)

Fill in what service is needed, and the arguments. Send it to server side.
XML is the standard for encoding the data (but is very verbose and results in HUGE overheads)

SOAP router routes the request to the appropriate server (assuming more than one available server)

Can do load balancing here.
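
To make the marshaling step concrete, here is a minimal sketch that packs an operation name and its arguments into a SOAP 1.1 envelope using Python's standard XML library. The service namespace, the operation name GetTemperature, and the zipcode argument are hypothetical; a real client would take them from the service's WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:temperature"  # hypothetical service namespace


def build_request(operation, arguments):
    """Marshal an operation name and its arguments into a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in arguments.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


request = build_request("GetTemperature", {"zipcode": "14853"})
print(request)
print(len(request), "characters of XML to carry one five-character argument")
```

The last line gives a feel for the overhead the slide complains about: the envelope is many times larger than the data it carries.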

How it works

Server unpacks the request (Demarshaling), handles it, computes result.
Result sent back in the reverse direction: from the server to the SOAP router back to the client.
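
The server side can be sketched the same way: parse the envelope, look up the requested operation, call it, and wrap the result in a reply envelope. The handler table, the GetTemperature operation, and the canned temperature value are assumptions for illustration, not part of any real service.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def get_temperature(zipcode):
    """Stand-in backend; a real service would query a sensor or database."""
    return {"14853": "21"}.get(zipcode, "unknown")


HANDLERS = {"GetTemperature": get_temperature}  # hypothetical operation table


def handle(request_xml):
    """Demarshal the request, run the handler, marshal the reply."""
    body = ET.fromstring(request_xml).find(f"{{{SOAP_ENV}}}Body")
    call = body[0]
    operation = call.tag.split("}")[-1]  # drop any namespace part of the tag
    args = {child.tag: child.text for child in call}
    result = HANDLERS[operation](**args)

    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    reply_body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    reply = ET.SubElement(reply_body, operation + "Response")
    ET.SubElement(reply, "result").text = str(result)
    return ET.tostring(envelope, encoding="unicode")


request_xml = (
    f'<e:Envelope xmlns:e="{SOAP_ENV}"><e:Body>'
    "<GetTemperature><zipcode>14853</zipcode></GetTemperature>"
    "</e:Body></e:Envelope>"
)
reply_xml = handle(request_xml)
print(reply_xml)
```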

Marshalling Issues

Data exchanged between client and server needs to be in a platform independent format.

Endianness differs between machines.
Data alignment issue (16/32/64 bits)
Multiple floating point representations.
Pointers
(Have to support legacy systems too)
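
The endianness and alignment issues can be seen directly with Python's struct module. This is only a small sketch of the problem and of one common remedy (agreeing on a fixed wire format); the values packed here are arbitrary.

```python
import struct

value = 0x12345678
little = struct.pack("<I", value)  # little-endian byte layout
big = struct.pack(">I", value)     # big-endian, i.e. network byte order

print(little.hex())  # 78563412
print(big.hex())     # 12345678

# Reading big-endian bytes as if they were little-endian gives a different number.
misread = struct.unpack("<I", big)[0]
print(hex(misread))  # 0x78563412

# A fixed wire format avoids this: '!' means network byte order, standard
# sizes, and no alignment padding (4-byte int followed by 8-byte IEEE 754 double).
wire = struct.pack("!id", -7, 98.6)
print(len(wire))                   # 12
print(struct.unpack("!id", wire))  # (-7, 98.6)
```

Pointers have no such fix: an address is meaningless on another machine, so the data it refers to has to be copied into the message instead.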

Discovery

This is the problem of finding the right service

In our example, we saw one way to do it with a URL
Web Services community favors what they call a URN: Uniform Resource Name

But the more general approach is to use an intermediary: a discovery service

Example of a repository

Name                                         Type            Publisher     Toolkit       Language      OS
Web Services Performance and Load Tester     Application     LisaWu                      N/A           Cross-Platform
Temperature Service Client                   Application     vinuk         Glue          Java          Cross-Platform
Weather Buddy                                Application     rdmgh724890   MS .NET       C#            Windows
DreamFactory Client                          Application     billappleton  DreamFactory  Javascript    Cross-Platform
Temperature Perl Client                      Example Source  gfinke13                    Perl          Cross-Platform
Apache SOAP sample source                    Example Source  xmethods.net  Apache SOAP   Java          Cross-Platform
ASS 4                                        Example Source  TVG           SOAPLite      N/A           Cross-Platform
PocketSOAP demo                              Example Source  simonfell     PocketSOAP    C++           Windows
easysoap temperature                         Example Source  a00           EasySoap++    C++           Windows
Weather Service Client with MS Visual Basic  Example Source  oglimmer      MS SOAP       Visual Basic  Windows
TemperatureClient                            Example Source  jgalyan       MS .NET       C#            Windows

Repository summary

A database listing servers


Each is described using the UDDI
language, which is defined over XML

Hence can be searched with XML queries

An extensible standard

Defines some required information about interfaces available and argument types, etc
But services can provide extra information too.
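
The idea of a searchable repository with required fields plus optional extras can be sketched with a small in-memory class. This is not the UDDI API or its XML query language; the Entry and Repository classes are hypothetical stand-ins, populated with two rows from the example listing.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    # Required fields, mirroring columns of the repository listing.
    name: str
    kind: str
    publisher: str
    # Optional extras a publisher may attach (the "extensible" part).
    extra: dict = field(default_factory=dict)


class Repository:
    """Toy in-memory directory; a real one would store XML descriptions."""

    def __init__(self):
        self._entries = []

    def publish(self, entry):
        self._entries.append(entry)

    def search(self, **criteria):
        """Return entries whose required or extra fields match every criterion."""
        def matches(entry):
            fields = {"name": entry.name, "kind": entry.kind,
                      "publisher": entry.publisher, **entry.extra}
            return all(fields.get(key) == value for key, value in criteria.items())
        return [entry for entry in self._entries if matches(entry)]


repo = Repository()
repo.publish(Entry("Temperature Service Client", "Application", "vinuk",
                   {"toolkit": "Glue", "language": "Java"}))
repo.publish(Entry("TemperatureClient", "Example Source", "jgalyan",
                   {"toolkit": "MS .NET", "language": "C#"}))

for hit in repo.search(language="Java"):
    print(hit.name, "published by", hit.publisher)
```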

Roles?

UDDI is used to write down the information that became a row in the repository ("I have a temperature service")
WSDL documents the interfaces and data types used by the service
But this isn't the whole story

Discovery and naming

The topic raises some tough questions

Many settings, like the big data centers run by large corporations, have rather standard structure. Can we automate discovery?
How to debug if applications might
sometimes bind to the wrong service?
Delegation and migration are very tricky
Should a system automatically launch
services on demand?

Example: Why discovery is tricky

Client has opinions

I want current map data for Disneyland showing line-lengths for the rides right now

Service has opinions

Amazon.com would like requests from Ithaca to go to the NJ-3 datacenter, and if possible, to the same server instance within each clustered service

DNS has opinions

Many systems play with name -> IP bindings

Internet has opinions (routing)

So, what's tricky?

Web Services doesn't standardize these four steps, it just assumes that people will hack solutions
Hence some are hard to implement, we lack standards, and in some cases, solutions are poor ones
UDDI and WSDL are just a corner of the overall picture!

Network address translation

Another issue: Often, the internal address is not addressable from outside!

A tiny bit of security.

But if RPC server is behind a NAT, trouble!

NAT needs the host behind it to start the connection process.
Need to configure NAT to let specified traffic through.
Generally: (WS traffic) HTTP is let through.

Tough to have a connection in between two hosts behind NATs.

There are some tricks to bypass this though.
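
A toy model of a port-translating NAT shows why the host behind it has to start the connection. The Nat class, the addresses, and the port numbers are invented for the sketch; real NATs also track protocol, remote endpoint, and timeouts.

```python
class Nat:
    """Toy port-translating NAT with one public address."""

    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.next_port = 40000
        self.outbound = {}  # (private_ip, private_port) -> public_port
        self.inbound = {}   # public_port -> (private_ip, private_port)

    def send_out(self, private_ip, private_port):
        """A host behind the NAT starts a connection; create a mapping."""
        key = (private_ip, private_port)
        if key not in self.outbound:
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.outbound[key]

    def receive(self, public_port):
        """Packet from outside: deliver only if a mapping already exists."""
        return self.inbound.get(public_port)  # None means the packet is dropped


nat = Nat("203.0.113.5")
print(nat.receive(8080))               # None
print(nat.send_out("10.0.0.7", 5555))  # ('203.0.113.5', 40000)
print(nat.receive(40000))              # ('10.0.0.7', 5555)

# Manual configuration ("port forwarding") lets chosen inbound traffic through.
nat.inbound[80] = ("10.0.0.9", 8080)
print(nat.receive(80))                 # ('10.0.0.9', 8080)
```

With two such boxes facing each other, neither side has a mapping until its own host sends first, which is why connecting two hosts that are both behind NATs is awkward.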

Firewalls

These allow/disallow traffic, depending on source, destination, protocol used, etc.

Stateful: remember active flows, and disallow unexpected packets (NAT)

Often only allow connection from the inside to the outside!

Again, need to configure to ensure server traffic gets through. (General RPC)
Again, (WS) HTTP does not face as much of a restriction.

Get traffic statistics.
Spam/virus checking, etc.
NAT and firewall typically in the same box.
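
The stateful behaviour described above can be sketched as a set of remembered flows plus a short list of configured inbound ports. The StatefulFirewall class and every address below are hypothetical; real firewalls match on many more fields and expire idle flows.

```python
class StatefulFirewall:
    """Toy stateful firewall; endpoints are (ip, port) tuples."""

    def __init__(self, open_ports=()):
        self.open_ports = set(open_ports)  # inbound ports opened by configuration
        self.flows = set()                 # (inside, outside) pairs started from inside

    def outbound(self, inside, outside):
        """Connections from the inside are allowed and remembered."""
        self.flows.add((inside, outside))
        return True

    def inbound(self, outside, inside):
        """Allow replies on known flows, or traffic to a configured port."""
        if (inside, outside) in self.flows:
            return True
        return inside[1] in self.open_ports


client = ("10.0.0.7", 5555)
web_site = ("198.51.100.20", 80)
stranger = ("198.51.100.99", 4444)
rpc_server = ("10.0.0.9", 9000)
web_service = ("10.0.0.9", 80)

fw = StatefulFirewall(open_ports=[80])
print(fw.inbound(web_site, client))       # False: no flow yet
print(fw.outbound(client, web_site))      # True
print(fw.inbound(web_site, client))       # True: reply on an active flow
print(fw.inbound(stranger, rpc_server))   # False: general RPC port not opened
print(fw.inbound(stranger, web_service))  # True: HTTP port was configured open
```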

Demilitarized Zone (DMZ)

DMZ: used to host publicly accessible services like company webpages, FTP, DNS.
Good place to host the Web Service!
DMZ situated outside the private network.
No outgoing connections from DMZ.
If DMZ attacked, damage limited to DMZ.

Client talks to eStuff.com

Moving on, let's oversimplify and just assume the client manages to find the data center
We think of remote method invocation and Web Services as a simple chain:

Client system -> SOAP RPC -> SOAP router -> Web Services

So suppose we get in

Assuming we can connect to the data center (to its Web Services router), then what?
If you just use Visual Studio out of the box, you end up with a single-machine Web Server
But massive datacenters are common!

A glimpse inside eStuff.com

[Figure: front-end applications reach six load balancers (LB), each in front of a service, using pub-sub combined with point-to-point communication technologies like TCP]

Clusters and load balancing

Idea here is that some form of load balancer spreads work over a cluster
And cluster replicates data for availability and load management
How it does this is a topic we need to discuss in more detail (not today)
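
As a placeholder until that discussion, the simplest possible policy is round-robin: hand each request to the next server in a fixed rotation. The slides do not say which policy eStuff.com uses, so this is only one assumed option, and the server names are made up.

```python
import itertools
from collections import Counter


class RoundRobinBalancer:
    """Hands each request to the next server in a fixed rotation."""

    def __init__(self, servers):
        self._rotation = itertools.cycle(servers)

    def route(self, request):
        # The request content is ignored; only arrival order matters here.
        return next(self._rotation)


balancer = RoundRobinBalancer(["service-1", "service-2", "service-3"])
assignments = Counter(balancer.route(f"request-{i}") for i in range(9))
print(assignments)
```

Round-robin ignores both current load and which server may have cached data for a client, which are exactly the concerns raised later about affinity.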

What about legacy applications?

Some of these Web services are really just front-ends to older legacy applications

So to talk to an old IBM database, we might

Run the database on some sort of machine, or virtual machine
Build one of these translator front-ends
And then register it with the Web Services router

This may sound expensive (it is) but it works!

Obviously, our fancy clustering and load-balancing won't apply to a legacy application, so those fancy tricks are only for new code

Discovery in eStuff.com

Data centers are increasingly common

And they raise hard questions!

How can a data center in California control decisions a client is making in Ithaca?
Services are clustered. How should a client request be routed to the right member?
Once you start talking to a server it may cache data for you. How can you be sure to get the right one next time?

These are modern challenges

Web Services can be seen as evolving from prior work
Most often cited: CORBA, which also was used in many big data centers
But CORBA didn't assume that clients came in over the public Internet

More often, CORBA was used between a hand-built client and the service it talks to

CORBA approach

CORBA had what are called

Ways to export specialized client stubs

The client stub could include server-provided decision logic, like which data center to connect with
Gives data center a form of remote control

Factory services: manufacture certain kinds of objects as needed

Effect was that discovery can also be a service creation activity

CORBA is object oriented

Seems obvious and it is. CORBA is centered around the notion of an object

Objects can be passive (data), active (programs), persistent (data that gets saved), or volatile (state only while running)

In CORBA the application that manages the object is inseparable from the object

And the stub on the client side is part of the application
The request per se is an action by the object on itself and could even exploit various special protocols
We can't do this in Web Services

Web Services are document-centric

That is, communication is by sending documents (like pages) from client to server and back
And most guarantees or properties are associated with the document itself, not the service

For example, WS_RELIABILITY isn't about making services reliable, it defines rules for writing reliability requests down and attaching them to documents
In contrast, the CORBA fault-tolerance standard tells how to make a CORBA service into a highly available clustered service

Will Web Services help with naming and discovery?

Web Services tells us how

One client can find one server and bind to that server and send a request that will make sense and make sense of the response

So sure, WS will help

But Web Services won't

Allow the data center to control decisions the client makes
Assist us in implementing naming and discovery in scalable cluster-style services

How to load balance? How to replicate data? What precisely happens if a node crashes or one is launched while the service is up?
Help with dynamics. For example, best server for a given client can be a function of load but also affinity, recent tasks, etc

How we do it now

Client queries directory to find the service


Server has several options:

Web pages with dynamically created URLs

Server can point to different places, by changing host names
Content hosting companies remap URLs on the fly. E.g. http://www.akamai.com/www.cs.cornell.edu (reroutes requests for www.cs.cornell.edu to Akamai)

Server can control mapping from host to IP addr.

Must use short-lived DNS records; overheads are very high!


Can also intercept incoming requests and redirect on the fly
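
The cost of short-lived DNS records can be estimated with a small simulation: a caching resolver only asks the data center's name server again when its cached record has expired, so a shorter time-to-live means more queries. The Authority and CachingResolver classes, the host name, the addresses, and the one-lookup-per-second workload are all assumptions for the sketch.

```python
class Authority:
    """Stand-in for the data center's name server; counts queries it receives."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.queries = 0

    def resolve(self, name):
        self.queries += 1
        # Alternate between two documentation addresses to mimic remapping.
        return "192.0.2.%d" % (1 + self.queries % 2), self.ttl


class CachingResolver:
    """Client-side resolver that reuses an answer until its TTL runs out."""

    def __init__(self, authority):
        self.authority = authority
        self.cache = {}  # name -> (address, expiry time)

    def lookup(self, name, now):
        hit = self.cache.get(name)
        if hit and now < hit[1]:
            return hit[0]
        address, ttl = self.authority.resolve(name)
        self.cache[name] = (address, now + ttl)
        return address


def queries_for(ttl, seconds=600):
    authority = Authority(ttl)
    resolver = CachingResolver(authority)
    for now in range(seconds):  # one client lookup per simulated second
        resolver.lookup("www.example.com", now)
    return authority.queries


print(queries_for(ttl=300))  # 2
print(queries_for(ttl=5))    # 120
```

In this toy workload, cutting the TTL from 300 seconds to 5 multiplies the traffic reaching the name server by 60, for a single client.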

Why this isn't good enough

The mechanisms aren't standard and are hard to implement

And they are costly

Akamai, for example, does content hosting using all sorts of proprietary tricks
The DNS control mechanisms force DNS cache misses and hence many requests do RPC to the data center

We lack a standard, well supported, solution!

Coming up?

How content is managed in even larger systems, that have multiple data centers
The main example is Akamai
