How To Mitigate "Timeout Waiting For Data" Real Server Failures
When a real server fails its health check with "Timeout waiting for data", the real server’s response is taking longer than the configured Connect Timeout to reach the LoadMaster.
By default the LoadMaster’s Service Check Parameters run health checks every 9 seconds, wait up to 4 seconds to receive a response from the real server, and retry the check twice. See Figure 1.1 below for the location of the Service Check Parameters on the Kemp LoadMaster.
Figure 1.1: Service Check Parameters
The Check Interval is calculated using the following equation:
Check Interval (minimum) = (Connect timeout) * (Retry Count) + 1
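The equation above can be checked with a short calculation. The following is a minimal sketch only; the function name min_check_interval is hypothetical and is not part of any LoadMaster tooling. It assumes that "retry the check twice" corresponds to a Retry Count of 2.

```python
def min_check_interval(connect_timeout: int, retry_count: int) -> int:
    """Minimum Check Interval in seconds, per the equation above.

    Hypothetical helper for illustration; not a LoadMaster API.
    """
    return connect_timeout * retry_count + 1


# Default values from the article: 4-second Connect Timeout, 2 retries.
print(min_check_interval(4, 2))  # 9
```

With the defaults this gives 4 * 2 + 1 = 9 seconds, which matches the default check interval. By the same equation, a Connect Timeout of 6 seconds with 2 retries would give a minimum Check Interval of 13 seconds.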
To remedy the Timeout waiting for data health check failure, first determine how long your real servers are taking to respond to the LoadMaster health checks. You can do this by reviewing the LoadMaster’s System Message Logs and Warn logs. Figure 1.2 shows an example of a real server failing due to Timeout waiting for data.
Figure 1.2: Timeout Waiting For Data In Warn Logs
In Figure 1.2, notice that there are three separate instances in which the real server fails its health check due to Timeout waiting for data and is re-added some time later. For example:
08:42:28: The real server 10.10.0.152:8045 is removed.
08:42:33: The real server is re-added after it successfully responds to the LoadMaster’s health check.
In the scenario above, the real server took 5 seconds to respond to the LoadMaster, which is longer than the default 4-second Connect Timeout, so it failed its health check. The same is true of every instance in Figure 1.2: each time, the real server fails its health check because it takes 5 seconds to respond.
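The interval between the two log entries above can be worked out by hand or with a few lines of code. This is a minimal sketch that assumes the timestamps are copied manually from the Warn logs in HH:MM:SS form and fall on the same day; the function name seconds_between is hypothetical, and the sketch does not parse the LoadMaster log format itself.

```python
from datetime import datetime


def seconds_between(removed_at: str, readded_at: str) -> int:
    """Seconds elapsed between two HH:MM:SS timestamps on the same day.

    Hypothetical helper for illustration; timestamps are entered by hand.
    """
    fmt = "%H:%M:%S"
    delta = datetime.strptime(readded_at, fmt) - datetime.strptime(removed_at, fmt)
    return int(delta.total_seconds())


# Timestamps from the example above: removed at 08:42:28, re-added at 08:42:33.
print(seconds_between("08:42:28", "08:42:33"))  # 5
```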
If health checks are failing because the real server is taking 5 seconds to respond to the LoadMaster, increasing the Connect Timeout to 6 seconds would allow the real server health check to pass.
The Connect Timeout in the LoadMaster Service Check Parameters can be set to a maximum of 60 seconds. At that setting, a real server’s response to the LoadMaster’s health check has up to 60 seconds to reach the LoadMaster.
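One way to pick a new Connect Timeout is to add a small margin to the slowest response you have observed and cap the result at the 60-second maximum. The sketch below illustrates that reasoning only; the function name suggest_connect_timeout and the 1-second default margin are assumptions made for this example, not a Kemp recommendation or API.

```python
MAX_CONNECT_TIMEOUT = 60  # seconds; maximum configurable value stated above


def suggest_connect_timeout(observed_response_seconds: int, margin_seconds: int = 1) -> int:
    """Observed response time plus a margin, capped at the maximum.

    Hypothetical helper; the 1-second margin is an assumption.
    """
    return min(observed_response_seconds + margin_seconds, MAX_CONNECT_TIMEOUT)


# A real server that takes 5 seconds to respond, as in the example above.
print(suggest_connect_timeout(5))  # 6
```

If the suggested value lands at or near the cap, the latency itself is likely the issue to investigate rather than the timeout setting.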
A small increase to the Connect Timeout is appropriate when a real server is failing health checks because it misses the Connect Timeout limit by a few seconds.
Increasing this value too much can make it harder to identify latency between the real server and the LoadMaster. Latency issues should ultimately be addressed within the customer environment.