Step by Step Troubleshooting

Performance SOP

Identify the Bottleneck

Usually the bottleneck can be found in one of these areas:

  • client
  • network
  • server

To break down a slow web application, it is a good idea to start where the bad performance is perceived: in the browser. Most modern browsers provide sophisticated tools that support the search for a bottleneck, help you draw some conclusions, and make it easy to eliminate other possible causes.

Developer tools - Net(work)
Chrome -> [F12] / [CTRL + SHIFT + I]
Firefox -> [CTRL + SHIFT + Q] or Firebug [F12]
IE9+ -> [F12]
IE8 & others -> fiddler2

Chrome developer tools network.png

If you investigate the network traffic by following the questions below, the biggest problem area(s) should become apparent after a few minutes:

1. Are there one or more long running requests?

NO → #Client Side Issue

YES

2. Is it a static resource?

YES (js, css, images...) → check #ZK Server Configuration (debug / cache / compression) → STILL SLOW → #Network Issue

NO (dynamic request into ZK application)

  • *.zul = full page request (can be followed by ajax requests)
  • zkau/* = ajax request

3. Which PHASE of the request is slowest? (examples based on Chrome developer tools)

Chrome developer tools network example phases.png

CONNECTING (or one of Proxy, DNS Lookup, Blocking, SSL)

3.a) Is this a network problem (everything between browser and ZK Application)?
  • test ping / trace route to different servers
  • test DNS lookup timing
YES → #Network Issue
NO → #Server Side Issue (application takes a long time to accept the connection, or even times out)
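
The DNS lookup and connection checks from 3.a) can also be scripted, which makes it easier to repeat them from different machines. The following is a minimal sketch using only the Python standard library; the helper names are made up for this example, and the numbers are only rough indicators because the operating system may cache DNS results.

```python
import socket
import time


def time_call(fn, *args, **kwargs):
    """Run fn once and return (elapsed_milliseconds, result)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return (time.perf_counter() - start) * 1000.0, result


def dns_lookup_ms(host, port=80):
    """Time one name resolution for host (repeats may be served from a cache)."""
    elapsed, _ = time_call(socket.getaddrinfo, host, port)
    return elapsed


def tcp_connect_ms(host, port, timeout=5.0):
    """Time a plain TCP connect to host:port and close the socket right away."""
    elapsed, sock = time_call(socket.create_connection, (host, port), timeout)
    sock.close()
    return elapsed


# Example usage (hypothetical host name, replace it with your own server):
#   print("DNS lookup:  %.1f ms" % dns_lookup_ms("zk-app.example.com"))
#   print("TCP connect: %.1f ms" % tcp_connect_ms("zk-app.example.com", 80))
```

A slow lookup with a fast connect points towards DNS, while a slow connect to several unrelated servers points towards the network rather than the application.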


SENDING

3.b) Is the request unreasonably big?
YES → #Client Side Issue
NO
3.c) Is the bandwidth low?
  • e.g. try to upload the same amount of data to the server via ftp/scp to check the possible upload speed
YES → #Network Issue
NO → #Server Side Issue (application server receiving request data slowly)
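
To judge whether the bandwidth is "low", it helps to compare the observed transfer time with the ideal time for the same amount of data. The sketch below is plain arithmetic; the function names and the example figures are illustrative assumptions, and latency and protocol overhead are ignored.

```python
def expected_transfer_seconds(size_bytes, bandwidth_bits_per_second):
    """Ideal time to move size_bytes over a link of the given nominal bandwidth."""
    return size_bytes * 8 / bandwidth_bits_per_second


def effective_bandwidth_bits_per_second(size_bytes, elapsed_seconds):
    """Bandwidth actually achieved when size_bytes took elapsed_seconds."""
    return size_bytes * 8 / elapsed_seconds


# Illustrative numbers: a 2,000,000 byte upload over a nominal 10 Mbit/s link
ideal = expected_transfer_seconds(2_000_000, 10_000_000)        # 1.6 seconds
achieved = effective_bandwidth_bits_per_second(2_000_000, 4.0)  # 4,000,000 bit/s
```

If the ftp/scp test reaches roughly the nominal bandwidth but the browser request does not, the network is an unlikely cause.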


WAITING

→ #Server Side Issue (application server taking a long time to prepare the response)


RECEIVING

3.d) Is the response unreasonably big?
YES → #ZK Server Configuration (render on demand / compression)
NO
3.e) Is the bandwidth low?
  • e.g. try to download the same amount of data from the server via ftp/scp to check the download speed
YES → #Network Issue (ask your administrator to fix it ;)
NO → #Server Side Issue (application server sending response data slowly)
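
The phase check in step 3 can be sketched in code for requests exported from the browser as a HAR file. The example assumes the timing fields defined by the HAR 1.2 format (blocked, dns, connect, send, wait, receive, with -1 meaning "not applicable"); the grouping into the four phase names of this guide and the function names are assumptions made for this sketch.

```python
# Map HAR 1.2 timing fields to the phase names used in this guide.
# "ssl" is left out on purpose: in HAR 1.2 it is already included in "connect".
PHASE_OF_FIELD = {
    "blocked": "CONNECTING",
    "dns": "CONNECTING",
    "connect": "CONNECTING",
    "send": "SENDING",
    "wait": "WAITING",
    "receive": "RECEIVING",
}

# Which question of this guide to look at next for each phase.
NEXT_STEP = {
    "CONNECTING": "3.a) network problem?",
    "SENDING": "3.b) request unreasonably big? / 3.c) bandwidth low?",
    "WAITING": "#Server Side Issue",
    "RECEIVING": "3.d) response unreasonably big? / 3.e) bandwidth low?",
}


def phase_totals(timings):
    """Sum the HAR timing values (milliseconds) per phase, skipping -1 / missing."""
    totals = {"CONNECTING": 0.0, "SENDING": 0.0, "WAITING": 0.0, "RECEIVING": 0.0}
    for field, phase in PHASE_OF_FIELD.items():
        value = timings.get(field, -1)
        if value is not None and value > 0:
            totals[phase] += value
    return totals


def slowest_phase(timings):
    """Return the name of the phase with the largest total time."""
    totals = phase_totals(timings)
    return max(totals, key=totals.get)


# Illustrative timings for one request (made-up values, in milliseconds)
example = {"blocked": 2.1, "dns": -1, "connect": -1, "ssl": -1,
           "send": 0.3, "wait": 812.0, "receive": 40.5}
phase = slowest_phase(example)
print(phase, "->", NEXT_STEP[phase])
```

This only automates picking the slowest phase; the follow-up questions still need a human look at request sizes and bandwidth.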

Client Side Issue

If no significant time is spent on the network or server side, the slowdown must happen somewhere on the client side.

Client side performance is affected by many factors and may vary between browser types and versions. Other factors are:

  • operating system
  • available memory
  • CPU speed / load
  • screen resolution
  • graphics card speed

So it is good to compare the client performance on different computers with different browsers, to identify the configuration causing the issue.

If client performance among configurations / browsers is equally bad, the issue will more likely be found in the Rendering area → #Client Side Profiling

Once you have identified that the client side rendering takes very long, check the size of the response. If the client engine needs to render a lot (e.g. a Grid with 1000 lines), it will take its time. Compare the timing with a smaller response, and consider whether the slowdown can be prevented by reducing the data sent to the client using Render on Demand or Pagination (most users don't need 1000 lines visible at once).


Performance degrading over time when using the application might indicate a memory leak → #Client Memory Issue

Client Side Profiling

Make sure your local computer is not under heavy CPU load and has "enough" memory available before you start profiling the JavaScript execution in the browser.

To measure and break down the time spent in the JS engine you can try the following steps (in Chrome), and interpret the results:

  1. switch to the "Profiles" tab
  2. choose "Collect JavaScript CPU Profile"
  3. click "start"
  4. perform your action e.g. reload the screen, or load the search results
  5. click "stop"
  6. switch to "Flame Chart" (choice at the bottom)

You'll get a nice view like this:

Js profile flame chart.png

This brilliant visualization of the JS execution flow and stack depth can be used / interpreted in many ways to extract the information you require.

The timeline at the top shows the whole period between "start" and "stop"; I selected the range we are interested in. The colorful area at the bottom gives details about which methods are actually called and their timing (you can zoom in and out using the mouse wheel). Clicking on a method leads you directly to the associated line in the source code (enabling debug-js helps when using this feature).

The small peak (at 1300ms) on the left side is my actual event refreshing the page. The gap until 1500ms represents the idle time of the JS engine waiting for the first response from the server. After that you can follow the order in which the JS files are executed and see which of them consume most of the time.

Additional waiting times in the middle indicate the load time of additional JS files and garbage collection times. At about 1840ms the JS engine stops, meaning ZK has finished rendering the page widgets and updating the DOM elements.

We can conclude that our page took about 540ms (after the initial event) to load and render in order to become available to the user, and that there is no significant slowdown on either the JS or the network side.
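
The 540ms figure follows directly from the timestamps read off the flame chart. A small worked calculation, with variable names chosen only for this example:

```python
# Timestamps read off the flame chart, in milliseconds
event_ms = 1300           # small peak: the event refreshing the page
first_response_ms = 1500  # JS engine starts working on the first response
render_done_ms = 1840     # JS engine stops, page is rendered

waiting_ms = first_response_ms - event_ms              # idle, waiting for the server
after_response_ms = render_done_ms - first_response_ms # JS execution plus shorter waits
total_ms = render_done_ms - event_ms                   # event until page is usable

print(waiting_ms, after_response_ms, total_ms)  # 200 340 540
```

Note that the second interval still contains the shorter waiting times for additional JS files and garbage collection mentioned above, so it is an upper bound for the pure JS time.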

General conclusions:

  • more or wider mountains → more JS time (e.g. ZK render time, a third party library)
  • more valleys (flat lines) → more waiting time (mostly network, maybe also CSS formatting or other time the browser does not assign to the JS engine)

A similar but less colorful view is available in Firefox [Shift + F5], showing a basic timeline. IE also provides profiling data.

I find the flame chart in Chrome the most powerful. Even if the performance issue only occurs in a different browser, it is a good starting point to visualize the timing of the problem and to understand the complexity of the execution path (e.g. in IE you get the information that most time is spent in method x(), and in Chrome you can actually see the method in a bigger context). When using an older version of IE, it can also help you choose the methods of interest in which to put some log statements to trace performance manually.

Another interesting view for determining the render time is the Timeline - Events view in Chrome, e.g.:

Js timeline events.png

Client Memory Issue

Use a system management tool (e.g. on windows: task manager or process explorer) to watch the memory consumption of your browser over time.

If memory keeps increasing while you repeat an action on a page, successively remove items from your ZUL file to identify the component causing the issue.

Again, Chrome offers a timeline showing garbage collection and memory usage statistics.
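
If you note down the browser's memory consumption after each repetition of the action, a simple trend check can help to tell steady growth from the normal up and down caused by garbage collection. The following sketch fits a least-squares line through the samples; the function names and the threshold are assumptions for this example and need tuning for a real application.

```python
def slope_per_step(samples):
    """Least-squares slope of the samples against their index (units per repetition)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def looks_like_leak(samples, min_growth_per_step=1.0):
    """True if memory grows by at least min_growth_per_step (e.g. MB) per repetition."""
    return slope_per_step(samples) >= min_growth_per_step


# Illustrative samples in MB, one per repetition of the same action
growing = [100, 110, 120, 130]        # steady growth, slope 10 MB per step
sawtooth = [100, 120, 100, 120, 100]  # garbage collection pattern, slope 0
```

A positive result is only a hint; the actual cause still has to be narrowed down by removing components as described above.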

Server Side Issue

OS or VM

http://www.computerperformance.co.uk/HealthCheck/Disk_Health.htm#Disk%20Bottleneck%202

Perfmon.exe

 LogicalDisk(_Total)		-> Avg. Disk Queue Length
 LogicalDisk(_Total)		-> Current Disk Queue Length
 PhysicalDisk(_Total)		-> % Disk Time
 Processor Information(_Total)	-> % Processor Time


JVisualVM

 - Busy vs. IO wait
   - Java Debugging
   - Task manager / Process explorer / top
 - GC logging
 - profiling
 - memory leaks

Appserver config e.g. check number of concurrent requests

ZK Monitor/Statistics...

 - Sessions
 - Desktops

Server Side Profiling

Memory Issue

ZK Server Configuration

Network Issue

If something in your network infrastructure (routers, proxies, web servers...) is causing the performance issues, there is little you can do as a web application developer.

Here are some ideas for identifying possible bottlenecks by reducing network complexity:

  • using IP addresses directly (an improvement indicates a DNS problem)
  • avoiding proxies, routers, firewalls (e.g. access the application server from a remote desktop "closer" to the actual server)
  • accessing the application server directly, instead of going through a webserver or load balancer
  • disabling SSL and checking the difference
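
The comparisons listed above can be made repeatable by timing the same request against each variant of the URL. This is a rough sketch using the Python standard library; the function names, host names and addresses are hypothetical, and a single measurement per variant is noisy, so repeat it a few times before drawing conclusions.

```python
import time
import urllib.request


def time_request_ms(url, opener=urllib.request.urlopen, timeout=10.0):
    """Fetch url once and return the elapsed wall-clock time in milliseconds."""
    start = time.perf_counter()
    with opener(url, timeout=timeout) as response:
        response.read()  # include the download of the body in the measurement
    return (time.perf_counter() - start) * 1000.0


def compare_variants(variants, opener=urllib.request.urlopen):
    """Return {label: elapsed_ms} for each (label, url) pair, fastest first."""
    results = {label: time_request_ms(url, opener) for label, url in variants}
    return dict(sorted(results.items(), key=lambda item: item[1]))


# Example usage (hypothetical URLs, replace them with your own setup):
#   variants = [
#       ("via host name", "http://zk-app.example.com/app/index.zul"),
#       ("via IP address", "http://192.0.2.10/app/index.zul"),
#       ("direct to app server", "http://192.0.2.10:8080/app/index.zul"),
#   ]
#   for label, ms in compare_variants(variants).items():
#       print("%-22s %8.1f ms" % (label, ms))
```

The opener parameter exists only so that the fetch can be replaced, for example to go through or around a proxy, or in tests.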

→ "Kindly" inform your network administrator about your observations and ask for help identifying, excluding, fixing these infrastructure problems.