Intel Corporation
www.intel.com/labs
Outline
Section I
  Overview of P2P
  P2P Framework
    Overview of distributed computing frameworks
    Additional P2P framework requirements
    P2P Middleware
Section II
  Taxonomy of P2P applications
  Research Issues
Section III
  Preliminary Performance Modeling
Conclusion
P2P Beginnings
Interest kindled by distributed file-sharing applications
Napster: Mediated digital music swapping.
(http://www.napster.com)
Diagram: (1) Peer A asks the mediator "Where is X?"; (2) the mediator replies "Peer B has it"; (3) Peer A copies X directly from Peer B.
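The mediated lookup can be sketched as a central index that only answers "who has X?", while the file itself moves directly between peers. This is a minimal illustration, not Napster's actual protocol. The class and method names (Mediator, register, where_is, fetch) are hypothetical.

```python
class Mediator:
    """Hypothetical central index: maps a file name to the peers that hold it."""

    def __init__(self):
        self.index = {}

    def register(self, peer, files):
        for name in files:
            self.index.setdefault(name, set()).add(peer)

    def where_is(self, name):
        # Steps 1-2: the mediator names the holders but never serves the file.
        return sorted(self.index.get(name, set()))


class Peer:
    def __init__(self, name, files):
        self.name = name
        self.files = dict(files)  # file name -> contents

    def fetch(self, mediator, peers, name):
        holders = mediator.where_is(name)
        if not holders:
            return None
        # Step 3: copy directly from a holder, bypassing the mediator.
        source = peers[holders[0]]
        self.files[name] = source.files[name]
        return source.name
```

The mediator stays off the data path. Only the small "where is" exchange goes through it, which is what made the scheme cheap to run centrally.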
P2P Beginnings
Gnutella: Fully distributed file sharing. (http://gnutella.wego.com)
Freenet: Distributed file sharing with anonymity and key-based search. (http://freenet.sourceforge.net)
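Diagram: Peer A issues the query "Where is File X?" (in Freenet, "Where is File (Key) X?"), which is forwarded from peer to peer (Peers B, C, D) in steps 1 to 3 until Peer C answers "C: I have it."; the answer is relayed back toward Peer A.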
Idle disk space and main memory on workstations were exploited in a number of network-of-workstations (NOW) projects.
Diagram: a master node hands out data-crunching work to Peers 1 to 4.
Newer Applications
P2P streaming media distribution
CenterSpan (C-Star Multisource Peer Streaming)
Mediated, secure P2P platform for distributing digital content.
Partition content and encrypt each segment.
Distribute segments amongst peers.
Redundant distribution for reliability.
Download segments from the local cache, peers or seed servers.
http://www.centerspan.com
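The segment-and-replicate idea can be roughly sketched as follows. Content is split into fixed-size segments, each segment is placed on r distinct peers, and a download tries the local cache first, then the peers, then a seed server. Encryption is omitted. The segment size, the replication factor r, the round-robin placement and all function names are assumptions for illustration. This is not CenterSpan's actual design.

```python
def segment(content: bytes, size: int):
    """Split content into fixed-size segments (the last one may be shorter)."""
    return [content[i:i + size] for i in range(0, len(content), size)]

def place(num_segments, peers, r):
    """Assign each segment to r distinct peers, round-robin (assumes r <= len(peers))."""
    return {s: [peers[(s + j) % len(peers)] for j in range(r)]
            for s in range(num_segments)}

def fetch_segment(s, local_cache, peer_store, placement, online, seed):
    """Return (segment bytes, where it came from): cache, then peers, then seed."""
    if s in local_cache:
        return local_cache[s], "cache"
    for p in placement[s]:
        # Redundant copies let the download survive offline peers.
        if p in online and s in peer_store.get(p, {}):
            return peer_store[p][s], p
    return seed[s], "seed"
```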
vTrails
vtCaster: Runs at the stream source. Builds a network topology tree from the end users (running the vtPass client software). Dynamically optimizes the tree. Content is distributed in a tiered manner.
http://www.vtrails.com
Newer Applications
P2P Collaboration
Groove (http://www.groove.net)
Real-time, small-group interaction and collaboration, built around the fundamental notion of a shared space.
Each member of the group owns a copy of the shared space.
Changes made to the shared space by one member are propagated to every other member of the group (store and forward if a member is offline).
The platform is secure: PKI for user authentication, end-to-end encryption, and digitally signed Groove components.
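The propagation rule, including store and forward for offline members, can be sketched like this. Security (PKI, encryption, signing) is left out. The class and method names are hypothetical and are not Groove's API.

```python
class SharedSpace:
    """Each member holds its own copy; a change is pushed to every other member."""

    def __init__(self, members):
        self.copies = {m: [] for m in members}   # member -> changes applied locally
        self.pending = {m: [] for m in members}  # changes queued while offline
        self.online = set(members)

    def change(self, author, item):
        self.copies[author].append(item)
        for m in self.copies:
            if m == author:
                continue
            if m in self.online:
                self.copies[m].append(item)
            else:
                self.pending[m].append(item)     # store ...

    def go_offline(self, m):
        self.online.discard(m)

    def come_online(self, m):
        self.online.add(m)
        self.copies[m].extend(self.pending[m])   # ... and forward on reconnect
        self.pending[m].clear()
```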
Skeptic's view: Nothing new, just distributed computing rediscovered or made fashionable.
Reality: Distributed computing on a large scale.
No longer limited to a single LAN or a single domain.
Autonomous nodes, with no controlling/managing authority.
Heterogeneous nodes, intermittently connected via links of varying speed and reliability.
A tentative definition:
A dynamic network (peers can come and go as they please).
No central controlling or managing authority.
A node can act both as a client and as a server.
P2P Platforms
Legion, University of Virginia; now owned by Avaki Corp.
Globe, Vrije Universiteit, Netherlands.
Globus, developed by a consortium including Argonne National Lab and USC's Information Sciences Institute.
JXTA, an open-source P2P effort started by Sun Microsystems.
.NET, by Microsoft Corp.
WebOS, University of Washington.
Avaki (Legion)
Objective: Wide-area O/S functionality via distributed objects.
Middleware infrastructure for distributed resource sharing in a mutually distrustful environment.
Global O/S services built on top of the local O/S.
Avaki (Legion)
Naming: LOID (location-independent object ID), current object address, and object name.
Persistent object space: a generalization of the file system (manages files, classes, hosts, etc.).
Communication: RPC-like, except that results can be forwarded directly to the real consumer.
Security: RSA keys are part of LOIDs; encryption, authentication, and digesting are provided.
Local autonomy: objects call local O/S services for all management, protection, and scheduling.
Active objects: objects represent both processes and methods.
Overall: Comprehensive WAN O/S, but not targeted as a general P2P enabler.
Globe
Objective: Another model for a WAN O/S.
Distributed passive object model: processes are separate entities that bind to objects.
Each object consists of 4 subobjects:
Semantics subobject for functionality.
Communication subobject for inter-object communication.
Replication subobject for replica handling, including consistency maintenance.
Control subobject for control flow within the object.
Overall: Similar to Legion, except that processes and objects are not tightly integrated.
Globus
Objective: Grid computing, integration of existing services.
Defines a collection of services, e.g.,
Service discovery protocol
Resource location & availability protocol
Resource replication service
Performance monitoring service
Any service can be defined and become part of the system.
Higher-level services can be built on top of basic ones.
Preserves site autonomy: existing legacy services can be offered unaltered.
Overall: Excellent reusability. Unconstrained toolbox approach => very difficult to join two islands.
JXTA
Objective: A low-level framework to support P2P applications:
Avoids any reference to specific policies or usage models.
Not targeted at any specific language, O/S, runtime environment, or networking model.
All exchanges are XML based.
Base concepts: Identifiers, Advertisements, Peers, Peer Groups, Pipes.
At the highest abstraction, JXTA defines a set of protocols using the base concepts:
Peer Discovery Protocol: discovery of peers, resources, peer groups, etc.
Peer Resolver Protocol
Peer Information Protocol
Peer Membership Protocol
Pipe Binding Protocol
Peer Endpoint Protocol
.NET
Emphasizes global user authentication via the Passport service (the user is distinct from the device being used).
HailStorm supports personal services that can be accessed via SOAP from any entity.
MAGI
Enabler for collaborative business applications.
Magi
Magi: Micro-Apache Generic Interface, an extension of the Apache project. A superset of HTTP using:
WebDAV (Web Distributed Authoring and Versioning protocol), which provides locking services, discovery and assignment services, etc., for web documents.
SWAP (Simple Workflow Access Protocol), which supports interaction between running services (e.g., notification, monitoring, remote stop/synchronization).
WebOS
Objective: A WAN O/S that can dynamically push functionality to various nodes depending on load. Outgrowth of the Berkeley NOW (network of workstations) project.
Project no longer active; parts of it are being used elsewhere. Overall: Dynamic configurability is useful for a P2P environment.
P2P Services
Basic.
Network services.
Naming.
Event and exception management services.
Storage services.
Metadata services.
Security services.
Search and discovery.
Administration and auditing.
File services akin to a virtual file system.
User and group management services.
Resource management services.
Digital rights management.
Replication and migration services.
Advanced.
Availability from unreliable components: replication, striping, failover, guaranteed message queuing.
Authorization, integrity, privacy.
Certification, DRM.
Web of trust.
Transport and data protocols for interoperability: common protocols (IP, IPv6, sockets, HTTP, XML, SOAP, ...), NAT and firewall solutions, roaming and intermittent connectivity.
Self-administration: a reliable whole from unreliable parts.
Resource monitoring.
Payment tracking.
Capability discovery.
User/group identity: authentication; persistence beyond a session and across multiple devices.
Metadata management.
Discovery and location of peers, services, resources, and users.
Standards. Policies. Administration and monitoring.
Questions ???
P2P Taxonomy
Consider two types of properties:
Application characteristics
Environmental characteristics
Application Characteristics:
Resource (or data) storage: organized or scattered.
Resource control: organized or scattered.
Resource usage: isolated or collaborative.
Consistency constraints: loose or tight.
QoS constraints: loose (e.g., non-real-time), moderate (e.g., online transaction processing, query/response), or tight (e.g., streaming media).
Research Issues
Intelligent caching of search results.
Intelligent object retrieval:
Retrieval by properties rather than by URL.
Need for distributed indexing mechanisms.
Directing searches to more promising and less loaded nodes.
Multiparty synchronization and communication that scales to thousands of nodes.
For home computers: utilize idle computing resources without significant communication requirements.
Unobtrusive use: if the owner wants to use the resources, get out of the way quickly.
Low-latency service handoff protocols.
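As a baseline for caching search results, a node could keep recent query results in a bounded cache with a time-to-live. Repeated queries are then answered locally, and stale entries expire as peers come and go. This is a plain LRU-plus-TTL sketch, not an "intelligent" policy. The capacity, the TTL and the class name are illustrative assumptions.

```python
import time
from collections import OrderedDict

class SearchCache:
    """Bounded LRU cache of query -> result list, with per-entry expiry."""

    def __init__(self, capacity=128, ttl=60.0, clock=time.monotonic):
        self.capacity, self.ttl, self.clock = capacity, ttl, clock
        self.entries = OrderedDict()  # query -> (expiry time, result list)

    def get(self, query):
        item = self.entries.get(query)
        if item is None:
            return None
        expiry, result = item
        if self.clock() >= expiry:        # stale: the responders may have left
            del self.entries[query]
            return None
        self.entries.move_to_end(query)   # mark as most recently used
        return result

    def put(self, query, result):
        self.entries[query] = (self.clock() + self.ttl, result)
        self.entries.move_to_end(query)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
```

A smarter policy would replace the fixed TTL and LRU eviction with estimates of result popularity and peer churn.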
Other Issues
Dynamic changes to the network
Direct modeling of network changes is not required if the rate of change << the request rate. Metadata consistency issues still need to be considered.
The new node connects to K other nodes, where K is a constant or an integer-valued random variable in the range 1..Kmax.
Each connection targets an undistinguished node with probability qu (this may not be possible for the first Kmax nodes), and a distinguished node otherwise.
Distinguished target: uniform distribution over all distinguished nodes.
Undistinguished target: Zipf(a) distribution over the existing undistinguished nodes.
At most one connection is allowed between any pair of nodes.
a = 0 => uniform distribution => very slow decay. Used here for simplicity.
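The growth rule can be sketched as a small generator. The following are assumptions not fixed by the model description: the distinguished nodes start fully connected (one possible regular graph), K is uniform on 1..Kmax, Zipf ranks the existing undistinguished nodes by age (oldest = rank 1), and a bounded retry loop handles targets that are already neighbours. The function names are hypothetical.

```python
import random

def zipf_pick(candidates, a, rng):
    """Pick one candidate with weight 1/rank**a (rank 1 = oldest); a = 0 is uniform."""
    weights = [1.0 / rank ** a for rank in range(1, len(candidates) + 1)]
    return rng.choices(candidates, weights=weights, k=1)[0]

def build_graph(n_dist, n_undist, k_max, q_u, a, seed=0):
    """Nodes 0..n_dist-1 are distinguished (n_dist >= 1); later nodes are undistinguished."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n_dist)}
    # Assumption: distinguished nodes start fully connected (a regular graph).
    for i in range(n_dist):
        for j in range(i + 1, n_dist):
            adj[i].add(j)
            adj[j].add(i)
    dist, undist = list(range(n_dist)), []
    for new in range(n_dist, n_dist + n_undist):
        adj[new] = set()
        k = rng.randint(1, k_max)  # assumption: K uniform on 1..Kmax
        for _ in range(100 * k):   # bounded retries when a target is already linked
            if len(adj[new]) >= k:
                break
            if undist and rng.random() < q_u:
                target = zipf_pick(undist, a, rng)  # undistinguished, Zipf(a)
            else:
                target = rng.choice(dist)           # distinguished, uniform
            if target not in adj[new]:              # at most one edge per pair
                adj[new].add(target)
                adj[target].add(new)
        undist.append(new)
    return adj
```

While no undistinguished node exists yet, every connection falls back to a distinguished node. This mirrors the caveat about the first Kmax nodes.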
Topological properties
Some network properties can be derived analytically.
Outline of Analysis (see http://kkant.ccwebhost.com/download.htm)
Degree distribution:
Distinguished nodes are at level 0; each new node defines a new level.
Pn(l2,l): Prob(a level-l node has degree n when the current level is l2).
Get recurrence equations for Pn(l2,l) and hence its PGF f(z | l2,l).
Get the average degree Dat(l2,l) at level l when the current level is l2.
Can be adapted to compute the undistinguished degree of a node.
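For the simplest case (a = 0, constant K), the recurrence can be solved in closed form. The following is a sketch under these assumptions: each arriving node makes on average q_u K undistinguished connections, spread uniformly over the existing undistinguished nodes, and boundary effects (the first Kmax nodes) are ignored. The symbol p(m) is introduced here for illustration. It is the probability that the node arriving at level m links to a given existing undistinguished node.

```latex
% Assumptions: a = 0, constant K, m - 1 >= q_u K so that p(m) <= 1.
% At level m there are m - 1 existing undistinguished nodes (levels 1..m-1).
\begin{align*}
p(m) &= \frac{q_u K}{m-1}, \qquad m > l \\
P_n(m, l) &= \bigl(1 - p(m)\bigr) P_n(m-1, l) + p(m)\, P_{n-1}(m-1, l) \\
f(z \mid m, l) &= \bigl(1 - p(m) + p(m)\, z\bigr)\, f(z \mid m-1, l), \qquad f(z \mid l, l) = z^{K} \\
f(z \mid l_2, l) &= z^{K} \prod_{m=l+1}^{l_2} \bigl(1 - p(m) + p(m)\, z\bigr) \\
D_{at}(l_2, l) &= f'(1 \mid l_2, l) = K + q_u K \sum_{j=l}^{l_2 - 1} \frac{1}{j}
  \approx K \left(1 + q_u \ln \frac{l_2}{l}\right)
\end{align*}
```

Under these assumptions, the average degree falls off only logarithmically in the node's level l. This is consistent with the "very slow decay" noted for a = 0.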
0.05   1      2      3      4      5
       5.9    55.2   99.1   100    100
       3.3    44.5   85.8   90.0   90.0
       4.9    103.6  235.2  238.8  238.8
       6.1    146.5  320.5  328.8  328.8

0.50   1      2      3      4      5
       5.9    34.3   91.0   99.9   100
       4.3    23.8   73.9   89.4   89.6
       4.9    61.7   231.7  267.5  267.7
       8.4    82.3   304.0  356.9  357.3

0.95   1      2      3      4      5
       5.9    28.6   76.7   98.5   99.7
       5.3    22.6   63.8   87.4   89.3
       4.9    50.3   194.6  281.8  287.8
       10.6   73.6   258.4  369.2  377.2
0.05   1       2       3       4
       6.0     243.7   499.7   500.0
       3.6     232.7   488.6   490.0
       5.0     480.5   1248.4  1249.6
       6.2     711.5   1737.0  1739.6

0.50   1       2       3       4
       6.0     95.7    483.5   500.0
       4.7     84.2    465.1   490.0
       5.0     184.3   1347.8  1413.9
       8.5     264.6   1812.4  1903.9

0.95   1       2       3       4
       6.0     35.1    163.5   405.7
       5.8     29.1    137.1   367.7
       5.0     63.2    448.3   1417.2
       10.7    91.7    582.4   1782.7
Graph construction
Start with the regular graph of distinguished nodes (as usual). For adding undistinguished nodes, work only with the average connectivities Kd and Ku of an incoming node. Always connect to the existing node with minimum connectivity. ⌊Kd⌋ and ⌈Kd⌉ can be used successively to handle non-integer Kd values (similarly for Ku).
Characteristics/issues
Simple: only one graph to deal with in the simulation. Gives correct average reachability and nodal utilizations. All queuing metrics (including average response time) are underestimated.
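The construction can be sketched as below. Each incoming node makes about Kd links to distinguished nodes and Ku links to earlier undistinguished nodes, always choosing the least-connected candidate (ties go to the lowest id). Non-integer averages are handled by taking the floor and ceiling in turn, via floor((i+1)·Kd) - floor(i·Kd). The ring used as the starting regular graph, the tie-breaking rule and the function name are assumptions.

```python
import math

def build_min_connectivity_graph(n_dist, n_undist, kd, ku):
    adj = {i: set() for i in range(n_dist)}
    # Assumption: distinguished nodes start as a ring (a 2-regular graph).
    for i in range(n_dist):
        j = (i + 1) % n_dist
        if i != j:
            adj[i].add(j)
            adj[j].add(i)
    dist, undist = list(range(n_dist)), []
    for idx in range(n_undist):
        new = n_dist + idx
        adj[new] = set()
        # Floor/ceiling alternate so that the running average equals kd (resp. ku).
        n_d = math.floor((idx + 1) * kd) - math.floor(idx * kd)
        n_u = math.floor((idx + 1) * ku) - math.floor(idx * ku)
        for pool, count in ((dist, n_d), (undist, n_u)):
            for _ in range(min(count, len(pool))):
                # Existing node with the fewest edges that is not yet linked to new.
                candidates = [v for v in pool if v not in adj[new]]
                target = min(candidates, key=lambda v: (len(adj[v]), v))
                adj[new].add(target)
                adj[target].add(new)
        undist.append(new)
    return adj
```

Because every link goes to the least-loaded node, degrees stay nearly equal. This gives the right average reachability, but it also explains why queuing metrics come out underestimated.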
Constrained Connectivity
Intended environment
To capture the most likely connectivity scenarios. Accommodates both static and slowly changing topologies.
Characteristics/issues
Avoids highly asymmetric topologies => queuing properties are underestimated. All generated instances are given equal weight. Relative weights can be estimated, but doing so is very expensive.
Method:
For each node i, estimate the relative probability qij of having an edge to node j ≠ i. A query coming from node k to node i is sent to node j ≠ k with probability qij/(1 - qik). This virtual topology for the query is also used to return the responses.
Characteristics/Issues
The method depends on analytic calculation of the edge probabilities to neighbors. A single simulation automatically visits the various instances in the correct proportion. No explicit control over which instances are visited => reliable results may take a very long time.
Very expensive, and difficult to handle complex operations (e.g., file migration).
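One forwarding step of this method, and a single forwarding chain, can be sketched as follows. The next node is drawn with probability q[i][j] / (1 - q[i][k]), which excludes the node the query came from, and the chain is reused in reverse for responses. The assumptions are: rows of q sum to 1 with q[i][i] = 0, each node forwards to one neighbour per hop, and the function names are hypothetical.

```python
import random

def next_hop(q, i, k, rng):
    """Forward a query that reached node i from node k (k is None at the source).
    Node j != i, k is chosen with probability q[i][j] / (1 - q[i][k]).
    Assumes node i has some possible neighbour other than k."""
    q_back = q[i][k] if k is not None else 0.0
    others = [j for j in range(len(q)) if j != i and j != k]
    weights = [q[i][j] / (1.0 - q_back) for j in others]  # renormalized, sums to 1
    return rng.choices(others, weights=weights, k=1)[0]

def route_query(q, source, hops, seed=0):
    """Follow one forwarding chain for `hops` hops; responses retrace it backwards."""
    rng = random.Random(seed)
    path, prev = [source], None
    for _ in range(hops):
        cur = path[-1]
        path.append(next_hop(q, cur, prev, rng))
        prev = cur
    return path, path[::-1]
```

Each run samples a different virtual topology in proportion to the edge probabilities. This is why a single long simulation covers the instances in the correct proportion, and also why it may converge slowly.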
Adopted distribution:
Uniform distribution in the small-size range 400 bytes to 4 KB. Pareto distribution with a minimum value of 4 KB and a mean of 40 KB => a = 40/(40 - 4) = 1.11 (from mean = a·min/(a - 1)). A 40 KB mean is typical for web pages, but too small for MP3 files.
File category provides a link between file size and popularity; it is needed to model the higher access rate of small files. Chose 9 categories, equally spaced in the log domain, with boundaries:
400B, 1.265KB, 4KB, 12.65KB, 40KB, 126.5KB, 400KB, 1.265MB, 4MB, 12.65MB
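The size model and the category lookup can be sketched with the standard library's random.paretovariate. The following are assumptions: 1 KB is taken as 1000 bytes (consistent with 400 B x 10 = 4 KB in the boundary list), sizes beyond the outermost boundaries are clamped into the first or last category, and the mixing weight p_small between the uniform and Pareto parts is hypothetical, since the model description does not give it.

```python
import bisect
import random

# Category boundaries 400 B .. 12.65 MB, a factor of sqrt(10) apart (9 categories).
BOUNDS = [400 * 10 ** (i / 2) for i in range(10)]

def pareto_shape(x_min, mean):
    """Shape a of a Pareto(x_min, a) with the given mean: mean = a*x_min/(a-1)."""
    return mean / (mean - x_min)

def sample_size(rng, p_small=0.5, x_min=4000.0, mean=40000.0):
    """p_small (share of uniform 400 B..4 KB files) is a hypothetical parameter."""
    if rng.random() < p_small:
        return rng.uniform(400.0, 4000.0)
    # paretovariate(a) has minimum 1, so scale it by x_min.
    return x_min * rng.paretovariate(pareto_shape(x_min, mean))

def category(size):
    """Index 0..8 of the log-spaced category containing size (clamped at the ends)."""
    return min(max(bisect.bisect_right(BOUNDS, size) - 1, 0), 8)
```

With a = 1.11, the Pareto part has infinite variance. Sample means therefore converge slowly, which is worth keeping in mind when checking simulated byte counts against the 40 KB target.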
Separate distributions are allowed for files allocated to distinguished and undistinguished nodes.
Assuming a triangular distribution with Cmax = 20 and mode Cmode = 5 for all nodes (and a minimum of 1 copy) => mean number of copies = (1 + 5 + 20)/3 = 8.667.
Query Characteristics
Assumptions:
No queries (searches) are started from distinguished nodes, since these nodes are essentially servers. Identical query arrival process at each undistinguished node.
Query properties:
Each query specifies a file (category, file_no) with given access characteristics. The results shown do not specify copy_no => multiple hits are possible for each query. A query percolates for h hops (h = 3 can cover 95% of the nodes for the chosen graph). If a query arrives at a node more than once, it is not propagated again.
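The percolation rule (h hops, no re-propagation of duplicates) behaves like a hop-limited breadth-first flood. The sketch below returns the hop at which each node first saw the query, plus the number of query messages sent. It assumes equal per-hop delays, so first arrivals follow shortest paths, and that each node forwards to all neighbours except the one it received the query from. The function name is hypothetical.

```python
from collections import deque

def flood(adj, source, h):
    """Flood a query for at most h hops with duplicate suppression.
    Returns ({node: hop of first arrival}, number of query messages sent)."""
    hop = {source: 0}
    queue = deque([source])
    messages = 0
    while queue:
        node = queue.popleft()
        if hop[node] == h:
            continue  # hop limit reached: do not forward
        # Send to every neighbour except the one the query came from.
        messages += len(adj[node]) - (0 if node == source else 1)
        for nbr in adj[node]:
            if nbr not in hop:  # duplicate arrivals are received but not propagated
                hop[nbr] = hop[node] + 1
                queue.append(nbr)
    return hop, messages
```

Here, len(hop) / len(adj) gives the fraction of nodes reached, which is the quantity behind the h = 3 coverage figure for a given graph.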
File Retrieval
Query Response:
A query reaching a node generates a found/not-found response, which travels backwards along the search path. The querying node runs a timer Tu; all responses arriving after the timeout are ignored.
File retrieval:
Randomly choose one of the positively responding nodes for file retrieval.
Requested file(s) are obtained directly (i.e., they do not follow the response path).
The retrieved file may optionally be cached at the requesting node.
A cache flush represents a tier-3 user disconnecting and being replaced by another statistically identical tier-3 node.
Number of cycles before cache flushing: Zipf with min = 30, max = 120, and a = 1.0.
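The retrieval rules can be sketched as two helpers. The first picks a source uniformly among positive responses that arrived within Tu. The second draws the number of cycles before a cache flush. The assumptions are: responses are given as (arrival time, node, found) tuples, and the Zipf law is read as P(n) proportional to n^(-a) over n = 30..120. The function names are hypothetical.

```python
import random

def pick_source(responses, t_u, rng):
    """responses: (arrival_time, node, found) tuples. Positive responses that
    arrive within the timeout t_u are kept; one is chosen uniformly at random."""
    hits = [node for t, node, found in responses if found and t <= t_u]
    return rng.choice(hits) if hits else None

def cycles_before_flush(rng, lo=30, hi=120, a=1.0):
    """Truncated Zipf: P(n) proportional to n**-a for n in lo..hi (assumed reading)."""
    values = list(range(lo, hi + 1))
    weights = [v ** -a for v in values]
    return rng.choices(values, weights=weights, k=1)[0]
```

Returning None when no positive response arrives in time lets the caller count the query as a miss.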
Simulation Results
Major Observations
Backup
Goals
Define Peer-to-Peer.
Give a general idea of Peer-to-Peer applications and frameworks.
Identify the requirements of Peer-to-Peer applications.
JXTA
At the highest abstraction, JXTA defines a set of protocols:
Peers & peer groups: an arbitrary grouping of peers; group members share resources & services.
Services: a basic set is defined (e.g., discovery, membership, access control, resolver, communication).
Pipes: unidirectional, asynchronous communication channels. A peer can dynamically connect to or disconnect from any existing pipe within the peer group.
Messages: arbitrarily sized, with source and destination addresses in URI form.
Advertisements: a properties record needed for name resolution, availability, etc., specified as an XML document.
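The notion of an advertisement as an XML properties record can be illustrated with Python's xml.etree.ElementTree. The element and attribute names below (Advertisement, Id, Name, Property) are hypothetical and do not follow the actual JXTA advertisement schema. The sketch only shows the publish/parse round trip.

```python
import xml.etree.ElementTree as ET

def make_advertisement(kind, name, ident, props):
    """Serialize a hypothetical advertisement record to an XML string."""
    root = ET.Element("Advertisement", {"type": kind})
    ET.SubElement(root, "Id").text = ident
    ET.SubElement(root, "Name").text = name
    for key, value in props.items():
        ET.SubElement(root, "Property", {"name": key}).text = value
    return ET.tostring(root, encoding="unicode")

def parse_advertisement(xml_text):
    """Recover the record from its XML form (e.g., after discovery)."""
    root = ET.fromstring(xml_text)
    return {
        "type": root.get("type"),
        "id": root.findtext("Id"),
        "name": root.findtext("Name"),
        "props": {p.get("name"): p.text for p in root.findall("Property")},
    }
```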
P2P Services
Basic.
Network Services.
Core communication functionality. Enable communication over various network topologies (e.g., direct connections and connections through firewalls). Enable communication in the face of intermittent connectivity.
Storage Services
Low-level file services.
Metadata services
Generic mechanism for publishing and obtaining metadata for devices and resources (files, CPU, memory, etc.).
P2P Services
Security Services
Identification, authentication, access control, integrity, confidentiality, audit trail.
User and group management services.
Resource management and placement services.
Advanced.
Naming. Search. Discovery. Administrative. Auditing. File services.
Legion (http://legion.virginia.edu)
Globe (http://www.cs.vu.nl/~steen/globe)
Globus (http://www.globus.org)
CenterSpan (http://www.centerspan.com)
vTrails (http://www.vtrails.com)
SETI@Home (http://setiathome.ssl.berkeley.edu)