
Pepper: An Elastic Web Server Farm for Cloud based on Hadoop

Subramaniam Krishnan, Jean Christophe Counio Yahoo! Inc. subru@yahoo-inc.com, couniojc@yahoo-inc.com

Abstract
Web application based processing is traditionally used to handle high throughput traffic, with web applications hosted on server farms. However, providing application level scalability and isolation on such server farms has been a challenge. Using cloud-serving infrastructures instead could provide advantages such as scalability, centralized deployment and capacity planning, along with attractive qualities such as self-healing and ease of isolation and monitoring. The problem with this approach lies in the complexity and operational overhead of bootstrapping and operating cloud virtualization infrastructure. We present Pepper, a novel, simple, low cost and elastic web serving cloud platform built on Hadoop and ZooKeeper. The design of Pepper demonstrates its ability to run different web applications in isolation and to scale dynamically on a cluster of machines. Pepper is successfully used at Yahoo! to run web applications that acquire and pre-process high frequency web feeds such as breaking news and finance quotes. Pepper processes feeds with low latency and scales to millions of feeds every day, enabling us to retain content freshness.

1. Introduction
Cloud serving infrastructures have radically altered the perception of many organizations on how data should be processed or shared by reducing the operability cost inherent to large server farms [1]. Rather than setting up the hardware, one can choose to adopt infrastructure offered by other organizations such as Google and Amazon. However, such hosted cloud services have their own challenges such as security [4] and protecting intellectual property. Another emerging trend in many organizations is to build their own cloud infrastructure based on open source technologies. Hadoop [2] is one of the more popular frameworks used in cloud infrastructure. At Yahoo!, we have been using Hadoop MapReduce [3] in crunching large data sets typically in the order of terabytes per day. Most practical data processing involves multiple MapReduce jobs. We created a distributed workflow management system called PacMan [16]

(precursor to Oozie [5]), in which the business logic is modeled as a workflow in the form of a directed acyclic graph. The nodes depict control flow and the actions are MapReduce jobs and Pig scripts [15]. While this platform was quite successful with low frequency large feeds having millions of entries and Service Level Agreements (SLAs) in hours, it was inefficient for high frequency small feeds like news feeds. Typical news feed sizes are in kilobytes, the SLA requirements are in sub-seconds, and around 1500 feeds must be processed per minute. In such scenarios, a web-processing API seemed a more natural choice than MapReduce. To this end we developed Pepper, a simple platform using Hadoop and ZooKeeper [7], which can run different web applications on the same Hadoop grid where regular MapReduce jobs are processed.

One of the main challenges in web server farms is scalability. Since demand on the web server can fluctuate, it is important to scale in an elastic fashion. News feeds fluctuate over time; sports feeds have high load during events like the FIFA World Cup and then taper off once the event concludes, while finance feeds are very similar to news in terms of size and SLA but have varying load factors. Another concern while hosting multi-tenant web applications is isolation, i.e. a sudden increase in resource usage such as memory should be limited to that particular web application without affecting others. The problem becomes more severe if user application code is also executed in the platform context. It is a non-trivial task to address these issues with traditional web server farms. Hadoop tackles them by dynamically allocating tasks based on data size and by providing process isolation per task. Our new platform builds on these Hadoop roots to extend them to web applications: the number of server instances is decided based on load, which we refer to as elasticity, and Hadoop provides isolation since each web application runs in the context of a single process. Consolidating all the hardware into a single Hadoop grid also yields significant operational efficiencies.

This paper is organized as follows: the design of Pepper is described in Section 2, and its features are discussed in Section 3. In Section 4, we compare our approach with conventional server farms and cloud serving systems. We illustrate some of the applications developed at Yahoo! based on Pepper in Section 5,

followed by its performance evaluation in Section 6. We conclude in Section 7 and discuss possible future enhancements in Section 8.


2. Design
Pepper uses the computing nodes of Hadoop to run web servers on a grid. We choose Java as the runtime because Hadoop and ZooKeeper are built on it, and also for its well known features such as platform independence and sandboxing. Consequently, we pick the Servlet API as our system interface and the Web ARchive (WAR) format for packaging web applications. Web applications are copied onto the Hadoop Distributed File System (HDFS) and a web server is started per instance in the form of a Hadoop job. To keep things simple and guarantee isolation, we define that:

1 Hadoop job = 1 Map task = 1 Web server = 1 Web application

Figure 1 shows the components of Pepper. Applications are deployed onto the grid using the Job Manager. The Map Web Engine starts web servers on demand, which can come up on any of the Hadoop nodes. On successful bootstrapping, web applications register themselves in a central registry maintained in ZooKeeper. The Proxy Router is the entry point: it looks up the registry and redirects each request to the appropriate web server, which serves the request synchronously. The components are described in more detail below.
Figure 1. Important components and system flow in Pepper: (1) register webapp (admin/user to Job Manager), (2) copy webapp to HDFS, (3) add webapp node in ZooKeeper, (4) send job to the Hadoop JobTracker, (5) create host entry in ZooKeeper, (6) send request (synchronous) to the Proxy Router, (7) read available entries from ZooKeeper, (8) send request (synchronous) to the Map Web Engine, (9) roll over logs to HDFS.
2.1 ZooKeeper
ZooKeeper is a high performance coordination service for distributed applications. We use it as a central registry; the information about the tasks that run the web applications is maintained here. We choose ZooKeeper because it is distributed, consistent and fast. ZooKeeper can scale to hundreds of thousands of transactions per second on a majority-read system like ours because of its local reads and lock-free properties.
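As an illustration, the following minimal sketch (not the production code; the registry path and connection string are assumptions) shows how a Pepper component could read the live server entries for a web application from ZooKeeper while registering a watch for changes:

    import java.util.List;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class RegistryReader {
        public static void main(String[] args) throws Exception {
            // Connect to the ZooKeeper ensemble (connection string is illustrative).
            ZooKeeper zk = new ZooKeeper("zkhost:2181", 30000, new Watcher() {
                public void process(WatchedEvent event) {
                    // Watch notifications arrive here, e.g. when a server joins or leaves.
                    System.out.println("Registry event: " + event);
                }
            });
            // Read the ephemeral host:port children for a deployed application,
            // leaving a watch so we are notified when the set changes.
            List<String> servers = zk.getChildren("/pepper/webapps/webapp1/hostnames", true);
            System.out.println("Available servers: " + servers);
            zk.close();
        }
    }

Because reads are served locally by any ZooKeeper server and watches push changes to clients, this lookup stays cheap even at high request rates.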


2.2 Job Manager

This component exposes an API to deploy a web application and copies it onto a dedicated location in HDFS. Job Manager adds the web application name as a node in the ZooKeeper hierarchy. The number of web servers to run, with an optional schedule, is associated as metadata of the node (Figure 2). It then launches the corresponding number of Hadoop jobs. Job Manager registers a watcher with ZooKeeper to receive notifications in case any of the servers become unresponsive. It also runs a monitor thread that periodically checks that the number of web servers is consistent with the number configured in ZooKeeper: if there are too many servers, it kills the jobs running the extra web servers; if there are too few, it spawns new jobs.
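A simplified sketch of the deployment path, assuming the classic Hadoop mapred API; the paths, znode layout and class names are hypothetical, and the actual Job Manager interface is not shown in the paper:

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class JobManagerSketch {
        public void deploy(String localWar, String appName, int instances) throws Exception {
            // 1. Copy the WAR to a dedicated location on HDFS so every node can fetch it.
            JobConf conf = new JobConf(JobManagerSketch.class);
            FileSystem fs = FileSystem.get(conf);
            fs.copyFromLocalFile(new Path(localWar), new Path("/pepper/wars/" + appName + ".war"));

            // 2. Record the application and its desired task count as metadata in ZooKeeper,
            //    plus the parent node under which the servers will register themselves.
            ZooKeeper zk = new ZooKeeper("zkhost:2181", 30000, event -> { });
            zk.create("/pepper/webapps/" + appName,
                      ("[{\"tasks\":\"" + instances + "\"}]").getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            zk.create("/pepper/webapps/" + appName + "/hostnames", new byte[0],
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

            // 3. Launch one map-only Hadoop job per desired web server instance.
            for (int i = 0; i < instances; i++) {
                JobConf jobConf = new JobConf(JobManagerSketch.class);
                jobConf.setJobName("pepper-" + appName + "-" + i);
                jobConf.setNumMapTasks(1);     // 1 job = 1 map task = 1 web server
                jobConf.setNumReduceTasks(0);
                // Input/output formats and the Map Web Engine mapper class are omitted here.
                new JobClient(jobConf).submitJob(jobConf);
            }
            zk.close();
        }
    }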

2.3 Map Web Engine

This is the implementation of a Hadoop map task. It starts, monitors, and then stops an embedded Jetty web server. We choose Jetty as it is optimized for running in embedded mode, supports graceful shutdown and is already used in Hadoop. Our design is not tightly coupled to Jetty; it can be swapped with any web server implementing the Servlet API, such as Tomcat or Grizzly. A ZooKeeper session is opened for the lifetime of the web server. A free port from a predetermined range is selected and the server is started. The server registers itself with ZooKeeper by creating a hostname:port ephemeral node that is automatically deleted on termination of the session (Figure 2). The server also registers a shutdown hook for graceful shutdown on receiving a job kill signal. Since the web application can run on any machine in the cluster, the generated log files are spread across the grid. We have developed a custom log4j appender that rolls the logs over to a common location in HDFS, where any of the standard grid-based aggregation tools can be used.
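A minimal sketch of the Map Web Engine's core steps, assuming a current Jetty embedded API; the port, WAR path and znode layout are illustrative and not taken from the paper:

    import java.net.InetAddress;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;
    import org.eclipse.jetty.server.Server;
    import org.eclipse.jetty.webapp.WebAppContext;

    public class MapWebEngineSketch {
        public static void main(String[] args) throws Exception {
            int port = 1040;                        // in practice picked from a predetermined free range
            String war = "/grid/local/webapp1.war"; // WAR fetched from HDFS to the local task directory

            // Start the embedded Jetty server that hosts exactly one web application.
            Server server = new Server(port);
            WebAppContext context = new WebAppContext();
            context.setWar(war);
            context.setContextPath("/");
            server.setHandler(context);
            server.start();

            // Register the server in ZooKeeper as an ephemeral node; it disappears
            // automatically when the session ends, i.e. when the task dies.
            ZooKeeper zk = new ZooKeeper("zkhost:2181", 30000, event -> { });
            String entry = InetAddress.getLocalHost().getHostName() + ":" + port;
            zk.create("/pepper/webapps/webapp1/hostnames/" + entry,
                      "{\"jobid\":\"1\"}".getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

            // Shutdown hook for graceful termination on a job kill signal.
            Runtime.getRuntime().addShutdownHook(new Thread(() -> {
                try { server.stop(); zk.close(); } catch (Exception ignored) { }
            }));

            server.join(); // block for the lifetime of the web server
        }
    }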

2.4 Proxy Router


This is the single entry point of the system where clients send their requests. It listens on a unique port and is seen by clients as a standard web server. On receiving a request, the Proxy Router gets the list of available web servers (hostname:port) that can run the web application from the hostnames node in ZooKeeper (Figure 2). It then forwards the request to one of the available web servers, with a retry mechanism to handle the race condition in which a web server has just gone down.
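A rough sketch of the forwarding step in a servlet-style Proxy Router; the registry lookup is abbreviated behind a hypothetical pickServer() helper, only GET forwarding is shown, and retries are omitted:

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class ProxyRouterSketch extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws java.io.IOException {
            // pickServer() would read the hostnames node from ZooKeeper and choose
            // one live host:port entry for the requested web application.
            String target = pickServer(req.getRequestURI());   // e.g. "host1:1040"
            URL url = new URL("http://" + target + req.getRequestURI());
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            resp.setStatus(conn.getResponseCode());
            try (InputStream in = conn.getInputStream(); OutputStream out = resp.getOutputStream()) {
                byte[] buf = new byte[8192];
                for (int n; (n = in.read(buf)) != -1; ) {
                    out.write(buf, 0, n);    // stream the backend response to the client
                }
            }
        }

        private String pickServer(String uri) {
            return "host1:1040"; // placeholder; real lookup and retry logic omitted
        }
    }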

Figure 2: ZooKeeper node hierarchy and metadata. The /web applications node contains one child per deployed application, e.g. /web application 1 with metadata [{"tasks":"2", "window":"00:00-02:00"}, {"tasks":"1"}] and /web application 2 with metadata [{"tasks":"1"}]. Each application node has a /hostnames child holding ephemeral host:port entries such as /host1:1040 {"jobid":"1"}, /host2:1041 {"jobid":"3"} and /host1:1041 {"jobid":"2"}. A /web application lock node indicates that web application 1 is locked for modification.

3. Features
We now describe the important features of Pepper.

3.1 Scalability
Pepper provides scalability in two dimensions: a web application handles load in an elastic fashion by configuring more instances for it, and the system as a whole scales simply with the addition of nodes to the Hadoop cluster.

3.2 Performance
Pepper achieves high throughput by ensuring that the heavy lifting steps of copying the application onto HDFS, launching the Hadoop jobs and initializing the web server are done at deployment time. The processing pipeline has minimal overhead, as it only involves a registry lookup and redirecting the request to the actual web server.

3.3 High Availability and Self-healing
Map Web Engine periodically sends a heartbeat to the web server and reports progress to the Hadoop TaskTracker only on receipt of a response. The Hadoop JobTracker will restart the map task in case the server becomes unresponsive, thus ensuring self-healing. Configuring more tasks for a web application achieves reliability. All the components are themselves stateless, with the global state maintained in ZooKeeper. ZooKeeper is also used to coordinate between different instances of Job Manager by using a web application lock (Figure 2). Multiple instances of Proxy Router and Job Manager are run behind a load balancer to achieve scalability and high availability.
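One plausible way to wire this into the classic mapred API (an assumption, not the paper's code; the /status health-check path is hypothetical) is to report progress only when the local web server answers a health check, so an unresponsive server eventually causes a task timeout and restart:

    import java.net.HttpURLConnection;
    import java.net.URL;
    import org.apache.hadoop.mapred.Reporter;

    public class HeartbeatSketch {
        // Called from the Map Web Engine's main loop with the Hadoop-provided reporter.
        void heartbeatLoop(int port, Reporter reporter) throws InterruptedException {
            while (true) {
                try {
                    // Hypothetical health-check URL served by the embedded web server.
                    HttpURLConnection conn = (HttpURLConnection)
                            new URL("http://localhost:" + port + "/status").openConnection();
                    if (conn.getResponseCode() == 200) {
                        reporter.progress();   // keep the task alive only while the server responds
                    }
                } catch (java.io.IOException e) {
                    // No progress reported; the TaskTracker will eventually time the task out
                    // and the JobTracker will reschedule it on another node.
                }
                Thread.sleep(10000);
            }
        }
    }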

3.4 Isolation
Isolation is critical since we execute user code and hence need to contain any resource contention with other web applications. The Hadoop map task provides process isolation for every web application. For example, a memory leak in one web application will not affect other web applications, as each has its own Virtual Machine (VM). It also allows web applications to bundle different versions of native libraries. A common problem with server farms is the need to plan for dedicated resources at peak load; Pepper overcomes the problem of starvation by allotting exclusive web servers per web application.

3.5 Ease of Development
Since we use the standard Servlet API and WAR packaging, developers have the freedom to use widely available tools and IDEs to build, test and debug the web application. The web application can then be deployed on Pepper without any change.
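For instance, a hypothetical feed-processing web application is just a standard servlet packaged in a WAR; nothing Pepper-specific is required:

    import java.io.BufferedReader;
    import java.io.IOException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class FeedServlet extends HttpServlet {
        @Override
        protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
            // Read the incoming feed payload from the request body.
            StringBuilder feed = new StringBuilder();
            try (BufferedReader reader = req.getReader()) {
                String line;
                while ((line = reader.readLine()) != null) {
                    feed.append(line).append('\n');
                }
            }
            // Placeholder for the application's own processing (validation, normalization, ...).
            resp.setContentType("text/plain");
            resp.getWriter().println("processed " + feed.length() + " characters");
        }
    }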

3.6 Reuse of Grid Infrastructure


Pepper runs on a Hadoop grid that is shared across several applications. By building on Hadoop, Pepper leverages its load balancing, task monitoring and resource scheduling.


4. Comparison with other systems


The benefits offered by Pepper are close to those of traditional cloud systems [9]. However, cloud systems rely on bootstrapping a whole OS image in a VM [10]. This requires the creation and management of different images (one per application), which is much more complex than our simple WAR deployment. Besides, the cost of launching a new server instance is high and the overhead of the VM is significant, even though optimizations have been proposed [11]. Server farms, on the other hand, are easy to set up but require external scheduling and monitoring and cannot scale dynamically. In Pepper, since the WAR file is in HDFS, it is automatically available to all machines, enabling hot addition of new machines. In the case of server farms, web applications have to be deployed on all the machines, and when a large number of machines is involved, one needs to rely on external deployment tools for synchronization. Selectively scaling a web application is complex in server farms; in contrast, increasing the capacity for a web application in Pepper involves only invoking a Job Manager API to update the configuration in ZooKeeper. Moreover, in server farms isolation is not guaranteed as the same machine serves several applications: a memory leak in one application can bring down the node completely, affecting the rest of the applications, and a resource usage spike in one application can cause starvation for low throughput applications. The problem amplifies if the application needs native libraries; conflicts across applications are painful to resolve and crashes of native libraries impact all the applications. Several implementations of cloud systems based on public [12] or private [13] components and tools are available. Although they provide an interesting set of features, they are rather complex to set up and maintain. Our approach is much simpler and the cost of development is limited. Running in a Java Virtual Machine (JVM) simplifies maintenance, troubleshooting, logging and reporting.

Table 1. Feature comparison matrix: Pepper, Server Farm and Serving Cloud compared on Elasticity, Isolation, Low Operability Cost, Easy Deployment, Resource Sharing and Non-Java Runtime Support.

Using Hadoop as the base scheduler is a benefit for several reasons. Multiple implementations such as Elastic Site [14] rely on Torque or other independent resource schedulers; Hadoop provides an integrated platform where the scheduler is part of a coherent set of components, yet is easier to install, configure and upgrade with new features such as security. We also close the gap between grid and cloud, since the Hadoop cluster can still be used to run MapReduce jobs. We also note that standard MapReduce jobs are data intensive and rely heavily on I/O, whereas the maps used by Pepper are CPU bound. There is a complementarity here that is not found in systems targeted solely at serving or at data processing. In our case, the choice of Hadoop was reinforced by the fact that it is already used in the PacMan platform and more generally at Yahoo!.

5. Applications
We now briefly describe a few real world applications at Yahoo! that utilize the Pepper platform.

5.1 Web Feeds Processing


As an early adopter and contributor, Yahoo! has been using Hadoop successfully for many years to process large feeds through the PacMan platform. The current trend, however, is to process more and more web feeds such as breaking news, tweets and quotes, and PacMan had to evolve to satisfy this need. Some of the processing was therefore delegated to server farms to avoid the latency of Hadoop job submission and bootstrapping. While we succeeded in reducing the overhead of the platform, we still faced bottlenecks such as the lack of isolation, scale limitations with small files [8] on HDFS and low flexibility in managing multiple versions of native libraries. With the use of the PacMan platform by Yahoo! properties such as news, finance and sports, the number of high frequency small feeds dramatically increased, and requirements from such new customers did not fit the Hadoop paradigm. Although the need for a different architecture was clear, we wanted to retain the cloud qualities of scalability and reliability while at the same time bringing down operational cost by sharing infrastructure. We improve the processing of web feeds using the design presented in Section 2. In Pepper, we are able to reuse and share the PacMan Hadoop cluster and run existing PacMan processing workflows (1 web application = 1 workflow) by simply configuring the workflow orchestration engine to run in-memory. Coupled with PacMan, this new platform allows us to handle the entire gamut of feeds processing. The platform scales dynamically on a cluster of machines and achieves high throughput due to the twin tenets of streaming data and in-memory processing, which avoid HDFS latency. This allows Pepper to be used for near real-time feeds like finance quotes, breaking news and sports results.


5.2 Online Clustering


Clustering refers to computing sets of related pages, called clusters, based on some metrics. The Locality Sensitive Hashing algorithm is a popular clustering algorithm used for offline clustering, exploiting the distributed computing power offered by Hadoop [6]. The offline clustering is performed on a periodic basis (typically daily) by appending the fresh data, and produces a clustering within a couple of hours for corpora containing a million news articles. This scale was still not sufficient for a few Yahoo! properties, as in the case of identification of hot news clusters. An online clustering system built on Pepper extracts features and assigns incoming news articles to clusters predetermined by the offline clustering. The scalability of Pepper enables early cluster assignment to be done online during the acquisition phase. These assignments are also used as feedback for the downstream offline clustering.
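As a simplified illustration of the online assignment step (the production system uses Locality Sensitive Hashing; this sketch merely assigns an article's feature vector to the nearest precomputed centroid by cosine similarity, and all names are hypothetical):

    import java.util.Map;

    public class OnlineClusterAssigner {
        // Centroids are produced by the periodic offline clustering run.
        private final Map<String, double[]> centroids;

        public OnlineClusterAssigner(Map<String, double[]> centroids) {
            this.centroids = centroids;
        }

        // Return the id of the cluster whose centroid is closest to the article's features.
        public String assign(double[] articleFeatures) {
            String best = null;
            double bestScore = -1.0;
            for (Map.Entry<String, double[]> e : centroids.entrySet()) {
                double score = cosine(articleFeatures, e.getValue());
                if (score > bestScore) {
                    bestScore = score;
                    best = e.getKey();
                }
            }
            return best;
        }

        private static double cosine(double[] a, double[] b) {
            double dot = 0, na = 0, nb = 0;
            for (int i = 0; i < a.length; i++) {
                dot += a[i] * b[i];
                na += a[i] * a[i];
                nb += b[i] * b[i];
            }
            return dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-12);
        }
    }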

6. Evaluation
In this section, we evaluate the performance of the Pepper platform. We measure the server response in two modes: linear scaling on statically predetermined capacity, and elastic scaling on dynamically increased capacity. We also illustrate the performance improvement of Pepper over PacMan in the case of small feeds processing. The computing cluster consists of 3 Hadoop nodes, each powered by dual quad-core Intel Xeon L5420 2.50GHz processors with 8GB of DDR2 RAM, running 64-bit Sun JDK 1.6 update 18 on RHEL AS 4 U8 (Linux 2.6.9-89.ELsmp x86_64). Each node is configured with 8 map slots, each with a maximum heap size of 512MB. The embedded Jetty servers run with the default configuration of 25 threads.

Figure 3: Linear scaling on predetermined capacity

Figure 3 shows the throughput, measured as the number of requests handled successfully per second, for a specified number of tasks. As we can see, the performance is comparable with standalone Jetty servers [17].

Figure 4: Elastic scaling on dynamic capacity

Figure 4 shows the throughput as well as rejections, defined as failures to execute within a predefined timeout. The load is increased and additional map tasks are allocated at points A and B based on a predefined schedule. The rejection count increases but stabilizes back within seconds once the new servers bootstrap and start sharing the load. In production, we observe a failure rate of less than 0.001% with judicious allocation of tasks based on load fluctuations. Pepper is designed to handle compute intensive tasks, like processing and enriching web feeds, which involves validation, normalization, geo tagging, persistence in service stores, etc. Table 2 compares the performance of Pepper and PacMan for processing news feeds with sizes below 1MB.

Table 2. Pepper performance numbers

System | Burst Rate (requests/min) | Throughput (requests/day) | Platform Latency (avg.) | Response Time (avg.)
Pepper | 2,000 | 3 million | 75 ms | 4 s
PacMan | 50 | 10,000 | 90 s | 120 s

7. Conclusion
We present a new approach that uses Hadoop and ZooKeeper, both open source cloud technologies, as the base of a simple serving cloud that marries the benefits of traditional server farms, i.e. low latency and high throughput, with those of the cloud, i.e. elasticity and isolation. The cost and complexity are low compared to standard cloud virtualization systems, while at the same time most of the cloud features are retained. Pepper has been in production since December 2009. The simple API and the advantages that it provides have led to its rapid adoption. At present, feeds from a consortium of 340 newspapers, 42 finance partners and 900 news partners, with online clustering, are acquired and pre-processed on Pepper, and we are in the process of on-boarding Sports and Entertainment feeds. We have been able to scale linearly by simply adding more machines, which results in more Hadoop map task slots being available to take on the new processing needs.

8. Future Refinements
This first version of Pepper has a few constraints that call for further enhancements. For example, once a web application is deployed, the corresponding Hadoop task is always running, thereby blocking a slot. Reclaiming tasks automatically when the number of requests drops below a certain threshold and reactivating them on demand would improve the efficiency of the system. The distribution of requests between web servers can be improved by using a dynamic load balancer; this becomes a significant benefit if the request processing time has a high variance. Pepper proves that web serving on Hadoop is viable and that this facet can be incorporated into core Hadoop itself in future releases to make it a more powerful general-purpose computing platform. This can be achieved non-intrusively by implementing a custom map class that acts as the Map Web Engine described previously. A more flexible and extensible solution would be to extend the Hadoop JobTracker and TaskTracker to support a generic task instead of only MapReduce tasks as they do today. Map, reduce and web tasks could then all be instances of the generic task and be managed natively, and a user would have the choice of submitting either a Java ARchive (JAR) or a WAR to Hadoop, as opposed to only a JAR today.

9. Acknowledgements
We would like to thank Alejandro Abdelnur for germinating the idea; Amit Jaiswal, Karteek Jasti, Richa Gupta, Ravikiran Meka, Ruchirbhai Shah and Rupa Satrasala for their contributions in successfully productizing it; Shanmugam Senthil for his support; and Aby Philip, Ashvin Agrawal and Suma Shivaprasad for their valuable feedback on the paper.

10. References
[1] A. Lenk, M. Klems, J. Nimis, S. Tai, and T. Sandholm, "What's Inside the Cloud? An Architectural Map of the Cloud Landscape", ICSE Workshop on Software Engineering Challenges of Cloud Computing, Vancouver, BC, May 2009, pp. 23-31.
[2] Hadoop, http://hadoop.apache.org/
[3] J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters", 6th Symposium on Operating Systems Design and Implementation (OSDI '04), San Francisco, CA, December 2004, pp. 137-150.
[4] T. Mather, S. Kumaraswamy, and S. Latif, Cloud Security and Privacy: An Enterprise Perspective on Risks and Compliance, O'Reilly Media Inc., 2009.
[5] Oozie, http://yahoo.github.com/oozie/, http://www.cloudera.com/blog/2010/07/whats-new-in-cdh3-b2oozie/
[6] T. H. Haveliwala, A. Gionis, and P. Indyk, "Scalable Techniques for Clustering the Web", Third International Workshop on the Web and Databases (WebDB 2000), Dallas, TX, May 2000, pp. 129-134.
[7] P. Hunt, M. Konar, F. P. Junqueira, and B. Reed, "ZooKeeper: Wait-free Coordination for Internet-scale Systems", Proceedings of the 2010 USENIX Annual Technical Conference, Boston, MA, June 2010.
[8] Discussion of the small files issue on Hadoop, http://www.cloudera.com/blog/2009/02/the-small-files-problem/
[9] J. Y. Lee, J. W. Lee, D. W. Cheun, and S. Kim, "A Quality Model for Evaluating Software-as-a-Service in Cloud Computing", Proceedings of the 2009 Seventh ACIS International Conference on Software Engineering Research, Management and Applications, Washington, DC, December 2009, pp. 261-266.
[10] L. Zhong, T. Wo, J. Li, and B. Li, "A Virtualization-Based SaaS Enabling Architecture for Cloud Computing", Proceedings of the 2010 Sixth International Conference on Autonomic and Autonomous Systems, Washington, DC, March 2010, pp. 144-149.
[11] W. Huang, J. Liu, B. Abali, and D. K. Panda, "A Case for High Performance Computing with Virtual Machines", Proceedings of the 20th Annual International Conference on Supercomputing, Queensland, Australia, July 2006, pp. 125-134.
[12] C. Ragusa, F. Longo, and A. Puliafito, "Experiencing with the Cloud over gLite", Proceedings of the 2009 ICSE Workshop on Software Engineering Challenges of Cloud Computing, Washington, DC, May 2009, pp. 53-60.
[13] K. Chard, W. Tan, J. Boverhof, R. Madduri, and I. Foster, "Wrap Scientific Applications as WSRF Grid Services Using gRAVI", Proceedings of the 2009 IEEE International Conference on Web Services, Washington, DC, July 2009, pp. 83-90.
[14] P. Marshall, K. Keahey, and T. Freeman, "Elastic Site: Using Clouds to Elastically Extend Site Resources", Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, Washington, DC, May 2010, pp. 43-52.
[15] C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins, "Pig Latin: A Not-So-Foreign Language for Data Processing", Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, Vancouver, BC, June 2008, pp. 1099-1110.
[16] Computing Platform for Structured Data Processing, United States Patent Application 20090307651.
[17] Web Performance, Inc., http://www.webperformanceinc.com/library/reports/ServletReport/
