Health checks on non load-balanced / unrelated servers in other subnets

Hello,

Is it possible to build the following environment:

In the DMZ we have two View Security Servers. Only port 443 is load-balanced between the two (so completely ignoring PCoIP, BLAST etc. as Internet based View Clients bypass the Load Balancer for these protocols).

In the LAN we have the two Horizon View connection brokers, each paired with its respective Security Server.

Security Server "A" with 172.16.1.1 (DMZ) is paired with Connection Broker "X" with 192.168.10.1 (internal LAN)
Security Server "B" with 172.16.1.2 (DMZ) is paired with Connection Broker "Y" with 192.168.10.2 (internal LAN)

I'm not load balancing anything on X or Y at all. I just need to check the health of each one together with its paired partner server.

I need this as a health check:
- when port 443 and/or 4172 on "A" **AND / OR** port 4002 and/or 8009 on "X" goes down, treat the entire server A as down.
Same for the other pair:
- when port 443 and/or 4172 on "B" **AND / OR** port 4002 and/or 8009 on "Y" goes down, treat the entire server B as down.

In other words, Servers A and X are essentially treated as one unit. If either one goes (partially or completely) down, consider Server A as down even though A has no issues at all.
Same goes for B and its partner Y. They need to be checked as one unit.

The reason is that Horizon View Security servers are useless without their paired connection brokers on the inside. They need to be checked / treated as a chain so to speak.
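
The requirement above can be sketched outside the LoadMaster as a plain TCP check. This is only a minimal illustration of the "chain" logic, not how the LoadMaster implements its health checks. The function names (`tcp_port_open`, `chain_is_up`) are made up for this sketch; the addresses and ports are the ones listed in the question.

```python
import socket
from typing import Callable, Iterable, Tuple

Endpoint = Tuple[str, int]  # (IP address, TCP port)


def tcp_port_open(endpoint: Endpoint, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the endpoint can be opened."""
    try:
        with socket.create_connection(endpoint, timeout=timeout):
            return True
    except OSError:
        return False


def chain_is_up(endpoints: Iterable[Endpoint],
                probe: Callable[[Endpoint], bool] = tcp_port_open) -> bool:
    """A chain is up only if every endpoint in it answers the probe."""
    return all(probe(ep) for ep in endpoints)


# Security Server A + Connection Broker X, and B + Y, as in the question.
CHAIN_A = [("172.16.1.1", 443), ("172.16.1.1", 4172),
           ("192.168.10.1", 4002), ("192.168.10.1", 8009)]
CHAIN_B = [("172.16.1.2", 443), ("172.16.1.2", 4172),
           ("192.168.10.2", 4002), ("192.168.10.2", 8009)]

# Simulated failure: broker X loses port 8009 while A itself is healthy.
down = {("192.168.10.1", 8009)}
fake_probe = lambda ep: ep not in down
print(chain_is_up(CHAIN_A, fake_probe))  # False
print(chain_is_up(CHAIN_B, fake_probe))  # True
```

Calling `chain_is_up(CHAIN_A)` without a second argument would use the real TCP probe instead of the simulated one.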

I've already spent many hours on this, trying to use SubVSs for example, but I can't get it to work (getting the checks of the X and Y servers to work, I mean).

Can this be done at all (chained / nested checks, I mean)?
(PS: I've looked at the View Templates, but they work in a totally different way and have no "chained checks", so I cannot use them at all.)

Kind regards,
Steven Rodenburg

10 comments

Barry Gleeson

Hi Steven,

It would probably be best to discuss this by opening a support ticket. From the info you have provided, the approach I would suggest is as follows:

 

You will need a separate VS for each Security Server: VSA and VSB. Within each of these VSs you can create 4 SubVSs, each with a single Real Server:

1. For actual traffic: RS is the Security Server on port 444

2. For port 4172 on the Security Server

3. For the Connection Broker on port 4002

4. For the Connection Broker on port 8009

You will need a content rule to force all user traffic to SubVS1. However, all the SubVSs will be marked as critical, so that if any of them fails the whole VS fails.

Finally, as this solution means you have separate VSs (VSA and VSB), each sending traffic to only a single Security Server, you will need another VS (VS1) to load balance between these two VSs.

 

So VS1 -------> VSA, VSB as real servers

VSA - SubVS1 (traffic), SubVS2, SubVS3, SubVS4

VSB - SubVS1 (traffic), SubVS2, SubVS3, SubVS4
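
To make the layout above concrete, here is a small model of it. It is a sketch of the intended behaviour only: the classes and functions are invented for illustration and are not a LoadMaster API, and the "critical" rule is modelled as "any failed critical SubVS takes its VS down". Addresses come from the original question and port 444 from the list above.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Endpoint = Tuple[str, int]  # (IP address, TCP port)


@dataclass
class SubVS:
    name: str
    real_server: Endpoint   # exactly one Real Server per SubVS
    critical: bool = True

    def is_up(self, down: Set[Endpoint]) -> bool:
        return self.real_server not in down


@dataclass
class VirtualService:
    name: str
    subvs: List[SubVS] = field(default_factory=list)

    def is_up(self, down: Set[Endpoint]) -> bool:
        # Assumed semantics: one failed critical SubVS fails the whole VS.
        if any(s.critical and not s.is_up(down) for s in self.subvs):
            return False
        return any(s.is_up(down) for s in self.subvs)


def build_pair_vs(name: str, sec_server: str, broker: str) -> VirtualService:
    """Second-level VS for one Security Server / Connection Broker pair."""
    return VirtualService(name, [
        SubVS("SubVS1 (traffic)", (sec_server, 444)),
        SubVS("SubVS2", (sec_server, 4172)),
        SubVS("SubVS3", (broker, 4002)),
        SubVS("SubVS4", (broker, 8009)),
    ])


def available_backends(second_level: List[VirtualService],
                       down: Set[Endpoint]) -> List[str]:
    """Names of the second-level VSs that VS1 may still send traffic to."""
    return [vs.name for vs in second_level if vs.is_up(down)]


vsa = build_pair_vs("VSA", "172.16.1.1", "192.168.10.1")
vsb = build_pair_vs("VSB", "172.16.1.2", "192.168.10.2")

# Broker X loses port 8009: VSA drops out although Security Server A is fine.
print(available_backends([vsa, vsb], {("192.168.10.1", 8009)}))  # ['VSB']
```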

 

See the attachment for a rough idea of what I am suggesting.

Barry Gleeson

Hi Steven,

The first feature that springs to mind would be the use of the "Critical SubVS".

I'm not sure if you tried this, but to explain its operation:

Let's take a VS with multiple SubVSs: 1, 2 and 3.

If SubVS-1 is marked as critical, then the whole VS will be deemed down if SubVS-1 is down.

In your example you could potentially use this along with a nested VS:

 

(I have used three IPs to illustrate.)

Main VS: 172.16.1.1:443   RS1 172.16.1.100:443, RS2 172.16.1.200:443 (these RSs then point to the other VSs)

Server "A" VS: 172.16.1.100:443   SUBVS1 (critical), SUBVS2

Server "B" VS: 172.16.1.200:443   SUBVS1 (critical), SUBVS2

 

For both Server "A" VS and Server "B" VS:

On SUBVS1, which is critical, you would set the Connection Broker as the Real Server, so the health check runs against the Connection Broker IP. As this SubVS is critical, if these health checks fail the whole SubVS is marked as down. (In this scenario ALL of the health checks would have to fail.) This in turn makes the Security Server VS be deemed down.

A content Rule would then be created to send all actual user Traffic to SUBVS2. This would have the actual Security Server as a Real Server with the appropriate Real Server Health Check.

 

Let me know if this addresses your problem. If not, it may be best to create a support ticket so that one of our engineers can see if this is possible.

Barry

steven

Hi Barry,

I used critical SubVSs. The problem is this: I created a SubVS for Real Server "A". One of the SubVS's Real Servers has port 443 and the other has 4172. Both are marked critical.

That works fine. But now I have also created a listener for 4172 besides 443, and that is not the idea. I only want to load balance 443 (and check it, of course) and simply check 4172 on Server A without doing LB.

Second: in the SubVS list for server "A" there are now two Real Servers "A". One is 443 and the other is 4172. I can add a totally different Real Server "Y" with 4002, but it is always seen as "down" (red), even though that server is reachable from the LM directly and 4002 is open.

I noticed that, in general, a Real Server in a SubVS with a different IP address than the "master" SubVS is always seen as down.

Barry Gleeson

"That works fine. But now I also created a listener for 4172 besides 443 and that is not the idea. I only want to Load balance 443 (and check it of course) and simply check 4172 on Server A without doing LB."

My approach would be to have a top level

VS on Port 443

Two Second level VS's on Port 443 (one for each Security Server)

As this is only on 443 only 443 traffic will be processed

The Second Level VS's will have 4 SubVSs, one for everything needed to be checked. Each SubVs will have just a single Real Server:port.

 

"Second: in the subSV-list for server "A" there are now two real servers A. One is 443 and the other is 4172. I can add a totally different realserver "Y" with 4002 but it always sees it as "down" even though that server is reachable from the LM directly and 4002 is open. It still, always sees it as down (red)."

As mentioned, I would decouple these into different SubVSs. I am not sure why some of these are being marked as down. Perhaps a routing issue?

steven

"VS on Port 443
Two Second level VS's on Port 443 (one for each Security Server)
As this is only on 443 only 443 traffic will be processed"

This I did. I kicked out port 4172 on the SubVSs, so I'm left with only one listener port, on 443 (as desired).

"The Second Level VS's will have 4 SubVSs"
This I don't understand. Are you talking about SubVSs inside SubVSs?

 

"Perhaps a routing issue?"
The "other" real servers (X and Y) are all local  (the LM has one NIC in the 172.16.1 network and one in the 192.168.10.0 network). Traceroute and pings confirm reachability.

steven

Addition: If I add a server "C" and an open port (same network as A and B) for checking something, it shows as "down". If I add a server "Z" and an open port (same network as X and Y), it also shows as "down". In all cases, the port is open and reachable. I use a generic port check that tests a TCP connection, so that should always work. I can telnet to such hosts and ports just fine.

I therefore have the impression that, for checks, the LM simply doesn't accept any IP addresses other than those defined as VS Real Servers / SubVS Real Servers. One can enter them without problems, but for some reason it does not check them. The checks themselves cannot be failing, as the ports are open.

steven

To clarify, I created a separate environment with SSL / HTTPS on port 444 (so I don't mess up the real environment on other, actual servers using port 443) and have two (test / mess about) Security Servers running on port 444 (HTTPS) and 4172 (PCoIP).

I tried to build what I think you are proposing, but as I don't entirely understand what you mean with:
"VS on port 443 (here 444)"
and
"Two Second level VS's on Port 444 (one for each Security Server)"
with
"The Second Level VS's will have 4 SubVS"
so I might be doing it wrong...

Anyway, the two DMZ-based Security Servers that were A and B before are now 192.168.30.18 and 192.168.30.25 respectively. Those two servers need port 444 load balanced and 4172 "just checked".
Note: I know I'm also creating LB for 4172 this way, but I see no other way to check 4172's health while keeping them linked.

As you can see, I have two SubVSs representing the two Security Servers, with 2 Real Server entries each (one on 444, the other on 4172 on the same server). Checking those works fine, as you can see (both are green in each SubVS). Both are marked as critical, and if 4172 goes down the entire SubVS is marked as offline.

In the second SubVS I went a step further and added the internal / paired connection broker, which I called "Y" before, to the second SubVS, which is Security Server "B".
That paired connection broker "Y" has IP address 192.168.10.44 (in the inside LAN) and listens on 4002 and 8009, both of which I need to check and both of which are critical. They are critical in the sense that when either 4002 or 8009 fails, the entire stack of Security Server B and connection broker Y must be treated as failed. They are a chain.
So I added Real Server Y with 192.168.10.44 twice, and according to the GUI this is allowed (red circle).
Still, they show up as down (red) even though they are perfectly reachable and both 4002 and 8009 are alive.
That is the part that I don't get.
This setup matches the requirement of having these four ports, divided over 2 Real Servers, checked so that if any of them goes down, both Real Servers are treated as down and the entire corresponding SubVS (here the second) is marked as down.

 

By the way: this is what I miss in your Horizon View 6.2 solution / example guide. Security Servers and their paired connection brokers **HAVE** to be treated as one unit. They live and die together. If the paired connection broker goes down, its corresponding Security Server must be marked as down too. That can only be done when you check both Real Servers on several ports: 443, 4101 (not yet configured in this test setup, by the way) and 4172 on the Security Servers + 4001 and 8009 on the paired connection brokers.
There are more ports than that, but they are spawned by the same processes / PIDs behind the mentioned ports, so checking these few ports is enough to catch all eventualities / services.

This is how Horizon View 6 Security servers etc. work.
In the guide and in the Template, they are treated as if there were no connection brokers in the back end to check. But if such a connection broker dies, its paired Security Server is dead too, as there is nothing left to forward authentication traffic etc. to. No internal, paired broker = no Security Server.

I also noticed that you LB all protocols, incl. PCoIP, which is very unusual and not best practice. The guide, in chapter 1.2, says "The LoadMaster is deployed in-line as a proxy for all services including PCoIP". You do actually mention the **correct** setup (LB only 443) as the alternative deployment option. Again, chapter 1.2: "Alternative deployment options could have PCoIP bypass the LoadMaster as it is only the initial session establishment (HTTPS) that needs to be load balanced." That is actually the preferred deployment option, and it is exactly what I'm trying to build.



steven

Hi Barry,

Thanks a million! It now works with the health-check stuff.
If I kill any relevant service anywhere, regardless of whether it is on the Security Server or on its paired connection broker, the entire chain is marked offline and traffic forwarding to that Security Server is stopped. It works like a charm!

By the way, on the RS 192.168.30.24:444, the certificate column does not say "on real server" like it does everywhere else. The config of RS .30.24:444 and its SubVSs is identical to that of RS 192.168.30.19:444 (where it does show as expected). I already recreated the entire .30.24:444 thing out of pure desperation, but it never appears. Any ideas? GUI bug maybe?

So the cascaded/chained health-checking works great now.

I do have difficulties with the content rule though. You wrote "You will need a content rule to force all user traffic to SubVS1" and I understand why, because indeed the two Real Servers behind the main VS (192.168.30.19 and .30.24) are not responding (pinging them works, but not 444). But I can't get it to work. I'm not a master of regex :-(

May I ask you one teensy weensy final question:  how to implement that content rule in my setup?

If I get this to work, we won't need F5's and NetScalers anymore as KEMP covers all our requirements. I'm getting the taste of it. Sweet :-)

Barry Gleeson

Hey Steven,

That's great news that the health check part is working.

"By the way, on the RS 192.168.30.24:444, in the certificate column, it does not say "on real server" just like everywhere else. The config method of RS .30.24:444 and its subSV's, is identical to the RS 192.168.30.19:444 (where it does show as expected). I already recreated the entire .30.24:444 thing out of pure desperation but it never appears. Any ideas? GUI Bug maybe?"

This is strange. The "On Real Server" text is shown when you have an HTTPS service using port 443 which does not have SSL offload and re-encrypt enabled. The fact that it is port 444 may be why it is not there. Is it possible that .30.19 was initially a port 443 VS (which would explain why the text is present)?

"I do have difficulties with the content rule though. You wrote "You will need a content rule to force all user traffic to SubVS1" and I understand why because indeed, the two Real-servers behind the main SV (192.168.30.19 and .30.24) are not responding (pinging them works, but not 444), but I can't get it to work. I'm not a master of regex :-("

This part is actually easy. At the VS level, at the bottom of the menu, you will see each SubVS. Under Advanced Properties you need to enable content switching.

Content Switching-Enable

Then under SubVS you will see a column with Rules. By default each SubVS will have "none". In your case you only need to apply a rule to the SubVS that should handle actual traffic. If you click on "none" you can then assign the default rule. All other SubVSs can be left blank.
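
The intended effect of that rule assignment can be pictured with a small model. This is a sketch only: `select_subvs` and the rule table are hypothetical, and it assumes that "default" acts as a catch-all used when no other rule matches and that a SubVS with no rule assigned receives no user traffic. The actual LoadMaster behaviour should be checked against its documentation.

```python
import re
from typing import Dict, List, Optional

DEFAULT = "default"  # name of the catch-all rule mentioned above


def select_subvs(url: str,
                 assignments: Dict[str, List[str]],
                 rules: Dict[str, str]) -> Optional[str]:
    """Pick the SubVS that should receive a request for `url`.

    assignments: SubVS name -> names of the rules assigned to it
    rules:       rule name  -> regular expression matched against the URL
    """
    # Pass 1: explicit rules, in SubVS order.
    for subvs, rule_names in assignments.items():
        for name in rule_names:
            if name != DEFAULT and re.search(rules[name], url):
                return subvs
    # Pass 2: the catch-all default rule.
    for subvs, rule_names in assignments.items():
        if DEFAULT in rule_names:
            return subvs
    # Nothing matched: assumed to mean the request is not forwarded.
    return None


# Only the traffic SubVS gets a rule; the check-only SubVSs stay blank.
assignments = {
    "SubVS1 (traffic)": [DEFAULT],
    "SubVS2": [],
    "SubVS3": [],
    "SubVS4": [],
}
print(select_subvs("/", assignments, rules={}))  # SubVS1 (traffic)
```

In this model the SubVSs without a rule still take part in health checking but never receive a request, which is the split between "traffic" and "check only" that the setup needs.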

 

Let me know if this works!

Also, as mentioned before, if you would like a WebEx to assist, please raise a ticket and we'll be happy to help.

Barry

steven

Hi Barry,

Sorry to bother you.

I enabled Content Switching on both VSs (192.168.30.19 and .30.24), and on their respective "first SubVS", which is the Security Server IP and port 444, I clicked on "None" and selected the "default" entry. So now it says "1" instead of "None".

But it does not work. I can call the webpage on 444 directly from the actual security server  (.30.18 for example) but not from the VS that sits in front of it (the .30.19).

The default rule is now applied but it's empty. My common sense tells me it should have an action or something. It has nothing to match or to do or anything. It's blank.

If I look at Main menu -> Rules & Checking -> Content Rules, there is nothing configured at all.

I googled my butt off by now, but I can't put my idea of "a content rule that simply forwards all traffic that reaches the VS to the first SubVS only" into practice.

 

I just realized that I am learning KEMP on a free edition. I can therefore not open support tickets :-(