It has been a long way: more than 10 years since SAP started with Java, and Java was still very young at that time, at version 1.2!

SAP has been developing Java-based software for more than 10 years. In 2010 the SAP Java Application Server achieved a strong result in the SPECjEnterprise benchmark, a proof that SAP NetWeaver Java Server is scalable and competitive and is an excellent basis for scalable and competitive Java applications!

The Java virtual machine itself brings performance optimizations with every new release. The SPECjbb2005 benchmark is an advanced simulation of a purchase-delivery process; a comparison between Java 5 and Java 6 with this benchmark shows a remarkable increase in throughput.

The same benchmark, executed to compare competitor JVMs with the SAP JVM, shows that on many platforms the SAP JVM is better. The SAP JVM has its own performance optimizations, for example string concatenation, compressed object references (compressed oops) and tiered compilation (tier 1 and tier 2); details can be found on the SAP JVM web pages. Even with the old NetWeaver 7.0, which is still running at customer sites, customers can now switch to and benefit from the stable and solid SAP JVM 1.4 as a replacement for the 1.4 Java versions that are no longer supported by other vendors.

An application is scalable if it is able to serve additional users with additional hardware capacity and extended software configuration.

The hardware capacity can be extended by adding resources, such as memory or an additional/upgraded CPU, to the same machine (so-called vertical scaling), or by adding more machines to the landscape (so-called horizontal scaling).

NetWeaver Java Server 7.30, like previous releases, supports the concept of extendable Java instances in a cluster behind a central load balancer. The load balancer is only required if more than one Java instance is used; with only one instance, the load balancing between the Java server nodes is done by the ICM. Due to its multi-threaded architecture, one Java server can utilize a multi-core machine quite well. A new Java server node (vertical scaling) is added to bring more memory to the cluster to handle higher load; a new Java instance on another machine (horizontal scaling) is usually added to the cluster when the capacity of the already allocated machines is utilized at more than 65%. The components indicated as central are the message server, the enqueue server and the relational database; the cluster throughput can be limited if these components are not able to scale further.

The thread management system is the backbone of NW AS Java. The thread pool concept ensures parallel processing of requests without the runtime overhead of expensive creation of new threads; wrong usage of threads can harm scalability. Resources such as DB connections, JCo connections, etc. are handled by pools of open physical connections which are logically reused by the applications in parallel threads. Inappropriate usage of connections can impact scalability, and opening and closing physical connections at runtime is very expensive and should be avoided. Java memory management with its garbage collection mechanisms is a factor that heavily impacts scalability, especially when garbage collections take longer. For high throughput a cluster is required, and the server nodes in the cluster need to be consistent with one another; cluster communication APIs are available to exchange data in the cluster, such as membership notifications, replications, updates and invalidations. If business logic needs to be consistent across the cluster, in many cases logical locking should be applied; with Java Server 7.30 there is a Locking service which manages locking via the SAP enqueue. Logical locking is required, for example, for deployment or startup of applications in the cluster. Database locking is relevant for transactional applications, where the transaction management mechanisms of the application server should be used.

It is not recommended to start new threads from the applications. Not only is the creation of a new thread expensive, because it is an OS resource, but too many threads can also cause stability problems, because other shared resources, like memory and connections, can run short. For better runtime performance the thread pool size can be configured with min = initial = max number of threads: expanding or shrinking the pool requires synchronization of incoming tasks until the resize is completed. The threads in the pool are precious resources: do not block them for periodic tasks, because if threads are blocked for a long time, the capacity of the system for processing parallel requests is affected. For periodic tasks or asynchronous requests rather use the scheduling mechanisms (in Java Server 7.30: the Job Scheduler), timeout management (register a task class and it will be executed in dedicated threads by the timeout service), or, for J2EE applications, Message Driven Beans, which are taken care of by the platform (a minimal sketch is shown after this section).

A multi-threaded environment requires thread-safe coding. Too much synchronization may cause contention, wrong usage of synchronization can cause deadlocks, and too little synchronization can cause data inconsistency and the danger of infinite loops. If you have ever tried to resolve a contention problem, perhaps you experienced the cascading effect: the first level of synchronization hides the next level. If you provide a fix for the first level, it can happen that contention at the next level is even more severe than the one that was fixed, and the throughput is not increased but decreased (!) after the fix. Load test verification is mandatory for Java applications when changing synchronization.
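
Returning to the Message Driven Bean recommendation above, here is a minimal, hedged sketch of offloading asynchronous work to the container instead of spawning application threads. The bean name, the queue name jms/ReportQueue and the generateReport() method are illustrative assumptions, not taken from the slides.

```java
import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.Message;
import javax.jms.MessageListener;

// Asynchronous work is handed over to the container instead of creating own threads.
@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType",
                              propertyValue = "javax.jms.Queue"),
    @ActivationConfigProperty(propertyName = "destination",
                              propertyValue = "jms/ReportQueue") // hypothetical queue name
})
public class ReportGenerationBean implements MessageListener {

    public void onMessage(Message message) {
        // Runs in a container-managed thread from the pool,
        // not in a thread created by the application.
        generateReport(message);
    }

    private void generateReport(Message message) {
        // ... application-specific processing ...
    }
}
```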

Here is an example of how contention looks in a thread dump: many threads are waiting to lock an object which is locked by another thread. If each thread locks the object for 10 ms and there are 10 threads waiting for the resource, the last thread adds at least 90 ms of wait time to the response time delivered to the end user. In this concrete example the contention is related to getting connections from the DB connection pool: normally this synchronization is very cheap, but if there are no free database connections in the pool, the wait time can be unpredictably long. SAP delivers an appropriate profiler, the SAP JVM Profiler, with a synchronization trace feature that can be applied to identify such hotspots. This trace should preferably be used with single-user requests (not under load).

Deadlock occurs when synchronization (locking) on objects is done by different threads in different order. In all execution paths of the program, the shared structures or objects should be locked in the same order. It is a good idea to take special care of this and to evaluate the required synchronization theoretically already in the architecture and development phase.
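
A minimal sketch of consistent lock ordering, with an illustrative Account class (not from the slides): both directions of a transfer acquire the two locks in the same global order, here by id, so two threads transferring in opposite directions cannot deadlock.

```java
public class Account {
    private final int id;
    private long balance;

    public Account(int id, long balance) {
        this.id = id;
        this.balance = balance;
    }

    public static void transfer(Account from, Account to, long amount) {
        // Establish a global lock order (here by id) before acquiring the monitors.
        Account first  = from.id < to.id ? from : to;
        Account second = from.id < to.id ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }
}
```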

Insufficient locking can lead to infinite loops or to data inconsistencies that are hard to reproduce and debug. It is a scalability problem because every thread that falls into infinite looping reduces the system capacity by at least one full core/CPU. An infinite loop can be stopped only by restarting the complete Java server, which is not a situation customers can afford.

Opening and closing physical connections at runtime is expensive. It is recommended to configure the DB connection pool with a fixed size. Never allocate JDBC/JCo/other connections in recursive methods or in loops: connections which have been taken cannot be released because the method has not finished, but it cannot finish because it waits to get yet another connection. Some persistency implementations try to detect and handle such application behavior automatically, but this is additional (and possibly error-prone) code to be executed, so applications should not rely on such mechanisms and should simply code correctly. Caching connections in the application layer, outside the main connection pool, is a bad idea; it hardly makes any sense. Connections should be explicitly closed in a try-finally block so that they return to the pool and become available for other threads.

Example of how to correctly close connections and resources (a hedged sketch is shown below).
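
The code of the original example is not contained in the extracted text, so the following is a hedged reconstruction of the pattern, with an assumed data source name jdbc/MyDataSource and an illustrative table: result set, statement and connection are closed in finally blocks, so the pooled physical connection is returned even if an exception occurs.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.naming.InitialContext;
import javax.sql.DataSource;

public class CustomerDao {

    public String readCustomerName(int customerId) throws Exception {
        // Illustrative JNDI name; in a real application this comes from the deployment.
        DataSource ds = (DataSource) new InitialContext().lookup("jdbc/MyDataSource");
        Connection con = ds.getConnection();
        try {
            PreparedStatement stmt =
                con.prepareStatement("SELECT NAME FROM CUSTOMERS WHERE ID = ?");
            try {
                stmt.setInt(1, customerId);
                ResultSet rs = stmt.executeQuery();
                try {
                    return rs.next() ? rs.getString(1) : null;
                } finally {
                    rs.close();
                }
            } finally {
                stmt.close();
            }
        } finally {
            con.close(); // returns the logical connection to the pool for other threads
        }
    }
}
```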

There are different situations in which an OutOfMemoryError can happen: for example, there could be a memory leak, or simply a memory shortage due to wrong memory sizing. A memory shortage is resolved by adding more Java server nodes; adding more Java server nodes cannot resolve, but only postpone, a crash caused by a memory leak. Typically the reason for memory leaks is a wrong decision about the scope of objects: for example, if user-session-specific data is added to central structures, it will most probably remain on the server after the session is terminated. Soft and weak references are mechanisms to handle this problem, but they bring a different problem: there is no exact control over what is kept in memory and what is not, and some frequently used cached data might have to be regenerated after collection, which can require even more resources. The execution of the finalize() method increases the duration of garbage collection and can also contribute to stability problems.
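
A small illustrative sketch of the scope problem (class and method names are assumptions): session data registered in a static, strongly referenced map would survive the session, whereas a java.util.WeakHashMap keyed by the session object lets the collector reclaim the entry once the session itself is gone, with the reduced control over eviction described above.

```java
import java.util.Map;
import java.util.WeakHashMap;

public class SessionDataRegistry {

    // Entries disappear automatically once the session key is no longer strongly reachable.
    private static final Map<Object, Object> SESSION_DATA =
            new WeakHashMap<Object, Object>();

    public static synchronized void put(Object session, Object data) {
        SESSION_DATA.put(session, data);
    }

    public static synchronized Object get(Object session) {
        return SESSION_DATA.get(session);
    }
}
```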

Garbage collection time influences the end-user response time: if full GCs are frequent, end-user response times are increasingly dominated by wait times. A full garbage collection is responsible only for peaks in response time and a temporary reduction of throughput; it does not harm the overall availability, stability and scalability of the system.

Less cluster communication always improves scalability. The communication APIs should be selected appropriately: with Java Server 7.30, no message bodies should be sent through the message server; instead, the lazy channels between the different server nodes should be used. The volume of exchanged data should be kept small in all applicable ways. A Java server may sometimes be unreliable as a receiver of data packages from other server nodes, so notifications that require a result to be returned (an answer from the receiving server) should practically never be used.

When transferring messages, the system uses one of the following types of communication:
- Message Server Communication: the communication is established through the Message Server, which is used as a dispatcher when sending messages. On Cluster Manager level, the message body size is checked against a threshold value: if the size is below the threshold, the message is sent through the Message Server; if it is above, the connection goes through a specially opened lazy communication channel.
- Lazy Communication: lazy communication is used when transferring large messages. It allows large amounts of information to be exchanged quickly between two servers without using the Message Server as an intermediary; instead, the information is transported through sockets opened on both servers. The main goal is to avoid overloading the Message Server.

The most essential aspect of locking with regard to scalability and performance is the duration of the lock: lock as short as possible and only as long as required. There are reader locks (shared locks) and writer locks (exclusive locks). If a shared lock is used, readers can still access the data concurrently; the stronger the isolation (exclusive lock), the more throughput and scalability decrease. There are different locking techniques:
- Database locks: a locking technique provided by the database vendor. For more information, see the documentation of your database, because database vendors do not offer uniform semantics for locks.
- Logical locks: a locking technique provided and managed centrally by the Web AS Java. Logical locks are managed by the Enqueue Server via a central lock table.

J2EE applications use the LogicalLocking and TableLocking interfaces provided by the Locking Adapter Service. These interfaces access the Locking Manager, which in turn communicates with the Enqueue Server. If the lifetime of the locks is the user session then, considering the usual default session duration of 30 minutes, those locks can be held for a really long time. It is recommended not to use such coarse lock lifetimes.

The architecture of the application determines to a very high degree the resource consumption (as part of TCO): the customer will have lower hardware and maintenance/administration costs. By optimizing the software we reduce the TCO for the customer! The best and most efficient optimizations are achieved when
- scenario execution paths are optimized by eliminating all function calls that can be avoided
- all function results that can be reused are reused, to avoid repeated calculation
An architecture which is based on multiple software components and involves a lot of remote calls will not have good performance. A design which is based on multiple software layers and on data structures with high access/time complexity cannot have good performance.

The major factors impacting the resource consumption of an application are the user interface design, the number of calls and the volumes of data remotely sent/received between the different systems/clients, the design of the service and component APIs involved in the processing, and the appropriate design and choice of data structures, alignment of data types and correct scope of data.

When designing the user interface, the main concern should be the volume of data transferred to the user UI. Correct planning and minimization of the exchanged data volumes guarantees the lowest possible resource consumption in all application layers. The main areas where unnecessary waste is typically observed are:
- Technical keys, IDs and page layout, which are overhead information calculated by the server on every request. Especially wasteful is when applications (UI frameworks) use human-readable, unnecessarily long UI element IDs like in the given example.
- Displaying data which requires scroll bars on the end-user UI, which wastes resources because data is fetched that might never be used. It is better to provide pagination functionality and to minimize the data required for one page.
- Similarly, when the data volume is not predictable (for example the number of search results found), correct handling should be planned.

No boomerang calls to the system itself should be sent: if an HTTP, web service or other call is sent from the server to itself, there is a danger of blocking the system under higher load. The goal should always be minimum remote communication (to the DB, the ABAP server, MDM, HANA, TREX, cloud services, etc.). Minimal data exchange saves not just network resources, but also memory and CPU. Compression should be planned only when its benefit outweighs its cost: it is a trade-off between the size of the exchanged data and the resources required for compression/decompression. Some protocol optimizations, such as MTOM (Message Transmission Optimization Mechanism), reduce the transferred volume (bytes sent/received) of web services, but require additional CPU and memory resources for encoding.

Performance and scalability depend on the quality of the service interfaces and component APIs.
- Merged data APIs: the server-side APIs need to be planned in sync with the screens that will be shown in the end-user UI, to make sure that for the most performance-critical screens optimal APIs for data retrieval (with one remote call) are always available.
- (Sorted) pagination: the existence of such APIs is absolutely essential, especially because memory in Java is a shared resource and should never be challenged to hold an unpredictably large amount of data, even in the scope of one request. The pagination implementation is typically provided by the source-of-information layer (be it the database, TREX, HANA, a remote cloud service, etc.).
- Bulk APIs (also called mass calls): these are very important for scalability, as they save remote communication and greatly simplify the execution flow of the applications; millions of singleton method invocations can be avoided.

The implementation of the Bulk API should not be fake! It is very simple to put a loop around the singleton method to program the mass call, but this does not help performance and scalability at all. Mass calls should have their own implementation, optimized as much as possible. In this example the correct implementation of the mass call is a database statement with multiple members in the WHERE clause, which takes advantage of the database's optimized algorithms for mass data retrieval and avoids multiple remote calls to the database.
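
A hedged sketch of such a "real" bulk implementation, with illustrative table and column names: one statement with an IN list replaces one remote round trip per key.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class ProductBulkReader {

    public List<String> readNames(Connection con, List<Integer> ids) throws SQLException {
        List<String> names = new ArrayList<String>();
        if (ids.isEmpty()) {
            return names; // an empty IN list would be invalid SQL
        }
        // Build "SELECT ... WHERE ID IN (?, ?, ...)" with one placeholder per key.
        StringBuilder sql = new StringBuilder("SELECT NAME FROM PRODUCTS WHERE ID IN (");
        for (int i = 0; i < ids.size(); i++) {
            sql.append(i == 0 ? "?" : ", ?");
        }
        sql.append(")");

        PreparedStatement stmt = con.prepareStatement(sql.toString());
        try {
            for (int i = 0; i < ids.size(); i++) {
                stmt.setInt(i + 1, ids.get(i));
            }
            ResultSet rs = stmt.executeQuery();
            try {
                while (rs.next()) {
                    names.add(rs.getString(1));
                }
                return names;
            } finally {
                rs.close();
            }
        } finally {
            stmt.close();
        }
    }
}
```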

The overall decision on the scope of data is one of the most important decisions for achieving optimal memory allocation by the application and improving scalability with multiple concurrent users. Stateless applications scale better, but usually an application needs to be stateful for functional reasons. If there is a possibility to provide some application screens in anonymous (stateless) mode, this opportunity should be used. To keep the number of active sessions lower, an appropriate timeout should be chosen by the applications. Caching is a trade-off between CPU utilization and memory usage, and the ideal balance for this trade-off depends on how much memory is available. With too little caching, the desired performance benefit will not be achieved; with too much, performance may suffer because too much memory is spent on caching and not enough is available for other purposes.

If the cache is too big, more memory is consumed by cached objects, which reduces the available free heap space. If the free heap space is low, full GCs happen more frequently, because cached objects typically live in the tenured space; the application server may even run out of memory without any real memory leak. If the cache is too small, there will be many persistency accesses, frequent regeneration of objects, and correspondingly heavy load on CPU and disk I/O. A general guideline is that data structures which cost a lot of memory and CPU at runtime should be used rarely, only when no alternative is appropriate.
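
One common way to keep the "too big" side of this trade-off under control is a bounded cache. The following sketch uses java.util.LinkedHashMap in access order with an assumed limit of 1000 entries; it is only an illustration, not the caching mechanism referred to in the slides.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedCache<K, V> extends LinkedHashMap<K, V> {

    private final int maxEntries;

    public BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true gives LRU eviction behavior
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once the configured limit is exceeded.
        return size() > maxEntries;
    }
}
```

Such a cache is not thread-safe by itself; in a server environment it would have to be wrapped, for example with Collections.synchronizedMap(), or replaced by a dedicated concurrent cache implementation.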

The software performance KPIs are defined along the hardware resources: CPU, memory, disk and network. The Java-specific KPIs do not differ much, apart from the breakdown of the memory KPI, which, for the logical purpose of memory allocation from the application standpoint and in the context of the special JVM memory management, is split into processing memory, session memory and framework memory. Different layers and applications can be integrated to work together on the same Java server; only very early measurements, evaluations and addressing of optimization requirements to badly performing and non-scalable components give a chance for success. In modern Infrastructure as a Service (IaaS) offerings, some KPIs, for example CPU and throughput, are often used for billable quotas, and if the total allowed CPU or throughput quota is exceeded, the application might no longer be accessible to end users. This emphasizes the need to minimize and optimize resource consumption.

For practical reasons the memory KPIs are split into categories: framework memory, session memory and processing memory. The framework memory consists of objects initialized at startup and warm-up of the Java server, which live as long as the Java server is running. The session memory contains only objects which live while the user is active on the system, until the session is destroyed or times out; its size is usually almost identical to the size of the serialized session object. The processing memory is only valid in the scope of one request and is expected to be garbage collected already in the Eden space.

If we face a choice, it is better to increase processing memory consumption rather than session memory consumption, because session memory stays allocated even if the user does not send requests for a while, until logout or session timeout, whereas processing memory is only allocated while the user is actively sending requests. The intervals of the minor garbage collections, which are supposed to free the processing memory, may vary within certain ranges without too significant an impact. In any case, it is best to optimize both session memory and processing memory to the lowest possible values.

The Java Distributed Statistical Records (JDSR) are available in SAP NetWeaver Administrator in the section for troubleshooting and analysis. All KPIs related to resource consumption on the Java server itself, as well as the number of calls and bytes exchanged with other systems, are provided out of the box (no additional configuration is required). Only the processing memory per dialog step is measured with JDSR; the session memory and the framework memory are determined by analyzing heap dumps with the SAP Memory Analyzer tool.

For Java applications, load testing is simply mandatory. No static check or code review can guarantee that a concurrency issue has not found its way into the deliverables. Even a load test cannot give a 100% guarantee, since not all possible execution paths can be tested, but at least the bread-and-butter scenarios should be load tested. The load simulation tool alone is not sufficient: the metrics collected in the load generation tool are black-box data which give no insight in case of problems. To save time and gain more confidence in the test results, an automation environment can be used to operate the load generation tool; together with the operation of the testing flow, specific log files and other sources of information can be collected from the system under test. For load testing in general it is very important to build reliable, realistically parameterized and randomized scripts.

For every typical performance complaint, SAP offers a tool for deep analysis and breakdown of the problem. The Java Distributed Statistical Records provide a breakdown of the response time to identify whether the slowdown is on the Java server side or related to a remote component or service. If the problem is on the Java server side itself, the SAP JVM Profiler can be applied.

The Eclipse Memory Analyzer, developed by SAP, is the tool to analyze any heap-related issues, like a too big session space, a too big framework space, memory leaks and so on.

Wily Introscope is extendable in the development environment too: probes can be added on demand to narrow down a problem. It is a good idea to instrument the APIs used for communication between the different application layers. Do not instrument fine-granular methods: it does not make sense to instrument methods with an execution time of less than 50 ms, as the overhead would be higher than the benefit.

Reference documentation for the mentioned tools is available on the web.

Most of our precious time is still spent in arguing and in doubts about our own capability to resolve performance bugs. Unlike functional correctness fixes, performance fixes are typically perceived by management, wrongly, as risks and are often avoided; and in many such situations the pressure to provide the fix comes later, at an unplanned time, escalated by some customer.

Here are some coding examples. The reasons why it was coded this way are unknown: there are more memory-efficient ways to implement this! In particular, generating log messages without checking the log level is waste which can easily be avoided by doing boolean checks such as beTrace(), beError(), beDebug(), etc. before the actual message concatenation.
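
The following sketch shows the guarded pattern using java.util.logging for illustration; the SAP Logging API offers the equivalent beDebug()/beError() checks mentioned above. The class name and message text are assumptions.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class OrderProcessor {

    private static final Logger LOG = Logger.getLogger(OrderProcessor.class.getName());

    public void process(String orderId, int itemCount) {
        // The guard skips the string concatenation entirely when the level is disabled,
        // which is the normal case in production.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("Processing order " + orderId + " with " + itemCount + " items");
        }
        // ... actual processing ...
    }
}
```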

Object by object, even with a small delta of a few bytes, the Java heap gets full: there are millions of objects generated on every end-user request. We should not allow waste, and we should have a good reason for every object we create.

A very prominent bug is the concatenation of strings in frequently executed methods. In this example, before every access to a cached value, the programmer generates the access key to the cache. Caches are intended to be accessed frequently (the more accesses, the more efficient the cache is), so it is not a problem that the cache is accessed 3336 times with the get() method. The problem is that the same key is generated 3336 times, wasting in total about 35 MB of processing memory! The situation would be similar if a frequently used toString() method of an object internally created string objects on every invocation.
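
A hedged sketch of the fix for this pattern (the cache and the key format are illustrative assumptions): the invariant part of the key is built once, outside the loop, instead of being concatenated on every lookup.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PriceLookup {

    private final Map<String, Double> cache = new ConcurrentHashMap<String, Double>();

    public double sumPrices(String region, String currency, String[] productIds) {
        // Build the constant part of the key once, not once per cache access.
        String keyPrefix = region + "/" + currency + "/";
        double sum = 0;
        for (String productId : productIds) {
            Double price = cache.get(keyPrefix + productId);
            if (price != null) {
                sum += price.doubleValue();
            }
        }
        return sum;
    }
}
```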

To minimize the generation of objects, ThreadLocal-based pools can be created for the most frequently used objects. In this way a lower memory footprint can be achieved without synchronization and without central pool size tuning. Behind the PreparedStatement class stands the prepared statement cache, which reduces the runtime and memory consumption for the preparation of SQL statements; this cache does not work if a plain Statement is used in the code.
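
A minimal sketch of a ThreadLocal-based pool, here reusing one StringBuilder per thread (names are illustrative): each thread gets its own instance, so no synchronization and no central pool sizing is needed.

```java
public class MessageFormatter {

    // One reusable builder per thread; lazily created on first access.
    private static final ThreadLocal<StringBuilder> BUILDER =
            new ThreadLocal<StringBuilder>() {
                @Override
                protected StringBuilder initialValue() {
                    return new StringBuilder(256);
                }
            };

    public String format(String user, String action) {
        StringBuilder sb = BUILDER.get();
        sb.setLength(0); // reset the reused buffer instead of allocating a new one
        return sb.append(user).append(" performed ").append(action).toString();
    }
}
```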

To take advantage of the cache, the statement should be implemented appropriately, avoiding hard-coded parameters in the SQL statement string and using the setter methods instead.
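
A short hedged contrast with illustrative table and column names: the concatenated literal produces a different SQL text per call and bypasses the prepared statement cache, while the parameterized form is prepared once and reused.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class CustomerNameReader {

    public String read(Connection con, int customerId) throws SQLException {
        // Anti-pattern: a new SQL text for every id defeats the statement cache.
        // Statement stmt = con.createStatement();
        // ResultSet rs = stmt.executeQuery(
        //         "SELECT NAME FROM CUSTOMERS WHERE ID = " + customerId);

        // Parameterized form: one SQL text, prepared once, served from the cache afterwards.
        PreparedStatement stmt =
                con.prepareStatement("SELECT NAME FROM CUSTOMERS WHERE ID = ?");
        try {
            stmt.setInt(1, customerId);
            ResultSet rs = stmt.executeQuery();
            try {
                return rs.next() ? rs.getString(1) : null;
            } finally {
                rs.close();
            }
        } finally {
            stmt.close();
        }
    }
}
```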

Network and disk access should be implemented with buffered reading and an appropriate buffer size. Compression should be used when appropriate.

This is an example of how reading a file should be implemented correctly, reusing the buffer and closing the stream in a finally block. When deciding on an appropriate buffer size, the following example can help: for the big SDA/SCA files (several hundred MB in size) which are used to package SAP software for deployment to the Java server, it would be totally ineffective to read with a buffer of only a few KB; an appropriate buffer size in this case could be some tens of MB. For an HTTP request/response read channel, on the other hand, even a 1 MB buffer would be highly oversized, because the HTTP communication between browser and server is typically only 20-30 KB per request.
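
A hedged sketch of such buffered reading with a reused buffer and a finally block; the 64 KB buffer size is an assumed value and should be adapted to the expected data size as discussed above.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class FileReaderExample {

    public static long countBytes(String path) throws IOException {
        byte[] buffer = new byte[64 * 1024]; // reused for every read() call
        InputStream in = new FileInputStream(path);
        try {
            long total = 0;
            int read;
            while ((read = in.read(buffer)) != -1) {
                total += read;
            }
            return total;
        } finally {
            in.close(); // always release the OS file handle, even on exceptions
        }
    }
}
```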

There is no need to implement your own algorithms as alternatives to algorithms provided by the Java virtual machine. If something is missing there or needs to be optimized, address it to the SAP JVM team, so that everyone can later benefit from the optimization.

Teams are often reluctant to measure performance in the test environment, with the argument that the machines used for development and functional testing are not compliant with the reference performance landscapes. This is true. On the other hand, early measurements reduce the risk of finding performance issues and degradations very late, after the code has been submitted and consumed by other developers. Development has to ensure that there is no degradation in CPU, memory, number of roundtrips and number of bytes sent/received from one change list to another. If an optimization is requested, it can be verified on a relative-comparison basis: for example, CPU time was reduced by 15% due to the submitted optimized code. KPIs such as the end-to-end response time, the duration of remote calls to the ABAP server, etc., which may be affected by network latency, should be evaluated carefully to prevent false alarms and wasted time.

JLin already has a history of more than 10 years at SAP. It has proven its value for identifying basic performance anti-patterns in Java code. It cannot troubleshoot all potential performance problems of Java applications, but it ensures that we do not waste time in the testing phases discovering and repairing amateur mistakes.

There are many different variants of automation possible. The number of variants is driven mainly by the goal to reuse existing functional tests for performance measurements as much as possible, and thus to ensure no additional effort in test maintenance, only the added value of performance checks. When the functional tests are implemented with JUnit, the performance measurements can be done by integrating the SAP JVM API or the JDSR API.
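
As a simplified illustration of this approach: the SearchService class and the 200 ms budget are assumptions, and in the setup described above the measurement would come from the SAP JVM API or JDSR API rather than from System.nanoTime().

```java
import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class SearchServicePerformanceTest {

    @Test
    public void searchStaysWithinBudget() {
        long start = System.nanoTime();

        // The existing functional test call is reused unchanged.
        new SearchService().search("laptop");

        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        assertTrue("search took " + elapsedMs + " ms", elapsedMs < 200);
    }
}
```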

When the functional test automation is based on the Selenium framework, it can be extended to collect performance KPIs from the SAP JVM API and/or the JDSR APIs.

LoadRunner-based automation makes sense for functions without a user UI, such as web services or proprietary remote interfaces, which are intended for reuse in both single-call tests and load tests.

Java gives us the opportunity to implement scalable and fast applications: it depends entirely on developers and architects to use this opportunity!
