PureConnect

Router maintenance causes a switchover event? This is not logical.

  • 1.  Router maintenance causes a switchover event? This is not logical.

    Posted 09-03-2014 12:51
    Yesterday we upgraded IOS on a branch office router. We have a CIC server in our HQ and one in the branch office. When the reload command was issued to that branch office router, the branch went down (after hours, of course). Here's what I don't understand. We were running on the primary CIC server in our HQ. Why would it attempt to switch over to the DR server at the branch office once we lost connection to it due to the router reload? Once the router booted, everything failed over to the DR site. But for those few minutes everything went unregistered (clients and phones) while they attempted to switch over to a branch that was not online. Here's my concern. Say there was a legitimate outage at that branch office. Could be network, power, environmental, etc... and the HQ CIC server loses connection to the DR CIC server. Why in the world would the system try to fail over to a server that is unreachable? How can we get this fixed? To switch back we had to reboot the primary CIC server. I just do not understand this logic. God forbid we lose that other site, since there is no generator over there.


  • 2.  RE: Router maintenance causes a switchover event? This is not logical.

    GENESYS
    Posted 09-03-2014 13:27
    Historically, the way that switchover worked is that a switchover is initiated by the backup server because it is unable to contact the primary. Is it possible that the switchover "started" during the network outage, but that it didn't have any impact on your HQ environment until connectivity to the branch office was restored?


  • 3.  RE: Router maintenance causes a switchover event? This is not logical.

    Posted 09-03-2014 13:35
    As soon as I initiated the reload command at that router, my I3 client said a switchover operation was in progress and it lost connection to our HQ, where I physically reside. My I3 client did not reconnect, and my Polycom VVX phone had a red triangle on its display with "Line unregistered" until a few minutes later, when the router at the other site where our CIC DR server resides had fully booted. Once that router came up, everything re-registered to the DR server. To get everything back to the HQ, we had to reboot the CIC servers. We are thinking of requesting that switchover be a manual process. At the remote site they have a rack mount UPS, but that may only keep that server up for a half hour at most. If they lose power, I can't have our entire phone system go down because the HQ would try to switch over to a server that doesn't exist.


  • 4.  RE: Router maintenance causes a switchover event? This is not logical.

    GENESYS
    Posted 09-03-2014 13:41
    Adding to what Jason said... When you don't want a network outage to initiate a switchover, you need to set a couple of Server Parameters. From the Interaction Administrator help:

    Switchover NetTest A
    Add this parameter, if necessary, for switchover in WAN environments. It applies to "Pure SIP" configurations only. Switchover NetTest A specifies the name or IP address of a computer on the same network segment as SwitchoverServer B. The Switchover system uses the IP address on SwitchoverServer A when SwitchoverServer A is the backup server. Whenever a failure condition is detected, the Switchover system on the backup server uses ICMP echo to ping this IP endpoint. It must find the IP endpoint on the same network segment as the active server. If the Switchover system cannot ping this endpoint, it assumes that the active server is still operable and doesn't switch because there is a WAN failure.
    Important: Since the Switchover system no longer has a network connection (and thus cannot replicate changes), it logs an error to the event log and shuts down processing. Restart the backup server, so that the Switchover system can resume its monitoring and replication.
    Recommendation: The value for Switchover NetTest A is the closest "pingable" (ICMP echo) IP address to SwitchoverServer B from SwitchoverServer A.

    Switchover NetTest B
    Add this parameter, if necessary, for switchover in WAN environments. It applies to "Pure SIP" configurations only. This parameter specifies the name or IP address of a computer on the same network segment as SwitchoverServer A. The Switchover process on SwitchoverServer B uses the IP address when SwitchoverServer B is the backup server.
    Recommendation: The value for Switchover NetTest B is the closest "pingable" (ICMP echo) IP address to SwitchoverServer A from SwitchoverServer B.


  • 5.  RE: Router maintenance causes a switchover event? This is not logical.

    Posted 09-04-2014 13:23
    Ok, I'm not sure how those parameters would apply here, because though both the Switchover NetTest A and Switchover NetTest B options you describe state "WAN environments"... they also state "IP address of a computer on the same network segment". Well, in a WAN, if you have a HQ office and a DR office, obviously they are not going to be on the same network segment. So in my case I have a HQ at 172.16.1.1 255.255.255.0 and a DR at 172.16.30.1 255.255.255.0. I want the HQ to ALWAYS run the phone system, unless something happens to the HQ. During network maintenance (switch / router) or, heck, even a power loss at the DR site, where 172.16.30.1 is inaccessible, I don't think a switchover should happen, because that is not the HQ and, per my statement above, "I want the HQ to ALWAYS run the phone system, unless something happens to the HQ." Now let's say the SAN goes down or the VM server that the CIC HQ is running on dies. Well, THEN and only then would I want the DR server to kick into high gear.


  • 6.  RE: Router maintenance causes a switchover event? This is not logical.

    Posted 09-04-2014 13:30
    What you describe does not sound right. First off, what version of CIC are you running? Maybe things are different in 4.0, as I don't have much experience with 4.0. However, in vanilla out of the box 3.0 and everything (I remember) before 3.0, the backup server monitors the primary and it makes the decision to switch over, not the primary. In other words, when the network connection went down, the primary should just stay up doing its thing as happy as can be. The issue you should have run into is that since the backup server at the branch office was not shut down before you did the router maintenance, the backup server should have noticed it lost its connection to the primary and initiated a switchover. What I would have expected is that when the network connection came back you would have either 2 primaries, or the HQ server would have figured out the backup took over and switched. What is strange is that the primary server somehow initiated a switchover independently of the backup. This is how a vanilla switchover works; if you start using the optional switchover server parameters you can get all sorts of different behavior, so I would be curious what switchover parameters you are using. Also, it is best practice that if you are doing maintenance to the network where the CIC servers will lose connection with one another, you should always shut down the backup server to prevent unplanned switchovers. - Mark


  • 7.  RE: Router maintenance causes a switchover event? This is not logical.

    Posted 09-04-2014 14:08
    Thank you Mark. Just reading through your response makes me think "what if" this is what happened... The router at that remote office was rebooted. Therefore the CIC DR server lost connection to the HQ and initiated switchover. When the remote office router came back online, perhaps the CIC HQ server noticed that the DR server claimed "master" and that's when we switched over. So in the future, to prevent this when doing known maintenance, I should remote in and shut down the CIC DR server, perform the maintenance, then remote in and boot the CIC DR server? I can certainly make note of that. We are on 4.0 SU4ES.

    Switchover parameters:
      Switchover Ping on Aux Connection: 1
      Switchover QoS DSCP: 34
      Switchover TS Failure Retry Delay: 59
      Switchover TS Timeout: 59
      Switchover UDP Initial Ping Delay: 59
      Switchover UDP Monitor: No
      Switchover Use QoS For Ping: Yes
      SwitchoverServerA: ININCICHQ (172.16.1.1)
      SwitchoverServerB: ININCICDR (172.16.30.1)

    I think my concern was what if it was an environmental (hurricane/power/heat/isp) issue that took the remote office offline. Would HQ try to switch over to a non-existent server at that time and we would be completely down? I'm thinking hopefully not, going by your description that the DR side monitors the HQ - not the other way around.

    One of the challenges with switchover is that out of 160 phones, maybe 20-25 do not automatically reregister to the other server. Many of them are always the same phones, but there are a handful of phones that are different each time. All our phones are Polycom VVX500. The majority of our staff have gotten good at navigating the Polycom phone menus to reboot the phone so it comes back online to make/receive calls. For the majority of phones that handle switchover ok, there seems to be no indication (at least to the end users) when we are running on backup - besides the "ININCICDR" indicator in the lower left corner of the Interaction Client.


  • 8.  RE: Router maintenance causes a switchover event? This is not logical.

    Posted 09-04-2014 14:30
    Without looking at your logs, I would hazard a guess that your sequence of events was actually:
      1. Router upgrade initiated, router stops responding.
      2. Backup server notes outage, initiates switchover.
      3. Router upgrade completes, router passes stored traffic.
      4. Primary server receives switchover notification, begins switchover.
      5. Router reboots to complete upgrade process, breaking the connection again.
      6. Router completes reboot, service is restored.
    If you look at the switchover logs on the backup server, you will probably find that that server initiated switchover in response to a network outage. Primary server logs should show that the backup server requested switchover.
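
The guessed sequence above can be walked through with a small toy model. This is only a sketch of the behavior described in this thread (the backup is the side that triggers a switchover, and the primary follows once it hears about it); the class, its fields, and its method names are made up for illustration and are not how CIC is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class SwitchoverPair:
    """Hypothetical model of an HQ primary and a DR backup joined by a WAN."""
    active: str = "HQ"              # server currently running the phone system
    dr_alive: bool = True           # False if the DR site itself is down
    dr_claims_active: bool = False  # backup has decided to take over
    wan_up: bool = True
    log: list = field(default_factory=list)

    def wan_down(self, dr_alive: bool = True) -> None:
        self.wan_up = False
        self.dr_alive = dr_alive
        # Only the backup monitors its peer. If it is running and cannot
        # see HQ, it starts a switchover. HQ on its own never does.
        if self.dr_alive and self.active == "HQ" and not self.dr_claims_active:
            self.dr_claims_active = True
            self.log.append("DR: lost contact with HQ, initiating switchover")

    def wan_restored(self) -> None:
        self.wan_up = True
        # HQ only learns about the takeover once traffic flows again.
        if self.dr_claims_active and self.active == "HQ":
            self.active = "DR"
            self.log.append("HQ: backup requested switchover, yielding to DR")

    def hq_clients_registered(self) -> bool:
        # HQ phones and clients need either a local active server or a
        # working path to a running DR server.
        return self.active == "HQ" or (self.wan_up and self.dr_alive)


pair = SwitchoverPair()
for step in (pair.wan_down, pair.wan_restored, pair.wan_down, pair.wan_restored):
    step()
    print(step.__name__, "active:", pair.active,
          "HQ clients registered:", pair.hq_clients_registered())
```

In this model, calling `wan_down(dr_alive=False)` (the DR site loses power entirely) leaves HQ active with an empty log, because nothing is left running to trigger a switchover.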


  • 9.  RE: Router maintenance causes a switchover event? This is not logical.

    Posted 09-04-2014 21:56
    Yup, you're right. Checked the logs and you are right on the money. So next time we have planned maintenance, I will power down the DR server (it's a VM on one of the ESX hosts - so no problem).


  • 10.  RE: Router maintenance causes a switchover event? This is not logical.

    Posted 09-05-2014 13:37
    All of your switchover server parameters look correct for a switchover environment with geographically diverse servers.
    Originally posted by kjstech;31481
    I think my concern was what if it was an environmental (hurricane/power/heat/isp) issue that took the remote office offline. Would HQ try to switch over to a non-existent server at that time and we would be completely down? I'm thinking hopefully not, going by your description that the DR side monitors the HQ - not the other way around.
    The switchover trigger comes from the backup server. What this means is that in the event you lost the site where the backup server is, the primary server would never be able to switch over, since there is no backup server active to trigger the switchover event. I do agree that the logs will tell you for sure what happened. If it's been more than 7 days and the logs are gone, you can look in the event log, as an event is logged when the servers switch over, although it's not always the clearest as to why they switched over. As far as your phones go, do you have a SIP proxy sitting between the CIC servers and the phones, or do the phones register directly to the CIC servers? What make and model of phones are you using? Are you using TCP or UDP for the station connection? Have you compared the configuration between the phone that always handles the switch and the phone that does not? What I have seen is that when you use SIP via UDP and you are on the backup server, the phone will always try the primary server and have to wait for the timers to expire before it tries the second server. By default this is something like 15 seconds of silence the agent will hear before they should start hearing ring back. In our 3.0 environment we used UDP for the station connections, so we put a SIP proxy between the stations and the server. The SIP proxy keeps track of which server is active, so we didn't have the big delay you talk of. There is some tuning you can do to allow switchover to work without a SIP proxy. I believe there is a technical reference document on the support site on how to configure your phones.


  • 11.  RE: Router maintenance causes a switchover event? This is not logical.

    Posted 09-05-2014 15:19
    Originally posted by MarkT;31488
    As far as your phones go do you have a SIP proxy sitting between the CIC servers and the phones or do the phones register directly to the CIC servers? What make and model of phones are you using? Are you using TCP or UDP for the station connection? Have you compared the configuration between the phone that always handles the switch and the phone that does not? What I have seen is that when you use SIP via UDP when you are on the backup server the phone will always try the primary server and have to wait for the timers to expire before it tries the second server. By default this is something like 15 seconds of silence the agent will hear before they should start hearing ring back. In our 3.0 environment we used UDP for the station connections so we put a SIP proxy between the stations and the server. The SIP proxy keeps track of which server is active so we didn't have the big delay you talk of. There is some tuning you can do to allow switchover to work without a SIP proxy. I believe there is a technical reference document on the support site on how to configure your phones.
    The phones are all Polycom VVX 500, SIP Firmware 4.0.3.9381, and the environment is TLS. There are some phones that do not re-register after a switchover. The end user must reboot the phone and then it comes up fine. However, the majority of phones re-register just fine. Calls are not generally dropped during a switchover. Just the IC Client. On a phone that cannot re-register automatically, once the phone call is ended, it will go line unregistered. I looked in the log for one of the phones and I did see this line: SSL_get_error Error code=6 and SocketFailCb: for REG call m_nExpire. A Google search for that led me to this post: http://community.polycom.com/t5/VoIP/IP550-Loses-Registration-as-soon-as-I-try-to-make-a-call/td-p/21372 The only real difference I could see on a problem phone vs. my phone (which doesn't have the problem) was for the managed IP phones under options > advanced options > Sip Options: the Audio Path on mine is "Always In", whereas a few handpicked "problem phones" were set to "Dynamic". We changed a few from "Dynamic" to "Always In" to see whether those phones register the next time there's a switchover. So pretty much a problem phone just goes "Line unregistered". It has network connectivity, but it will not reach out to reregister itself until rebooted. I'd love to open a support case with our vendor, but we will have to try to budget for that cost, as each support case is very expensive. That will have to make it into next year's planned budget. I do see some very rarely used phones "Not Registered" in Interaction Administrator from this week's switchover. And another thing: I cannot remotely reboot "Not Registered" phones, so we will have to wait until someone tries to use one of those phones and sees it doesn't work. If they know how to reboot it they will, and if they don't, they will call IT support from another phone. Oh, and I checked our line <Stations-TLS> and it has the following: Transport > TLS, SRTP, Security: End-to-Edge


  • 12.  RE: Router maintenance causes a switchover event? This is not logical.

    Posted 09-05-2014 16:11
    Oh I did post this a few months back about our random sampling of phones failing to register: http://community.inin.com/forums/showthread.php?9661-Random-handful-of-Polycom-VVX-do-not-re-register-after-switchover&highlight=polycom+vvx500 So we could continue that discussion over there, or continue it here. Whatever you prefer.


  • 13.  RE: Router maintenance causes a switchover event? This is not logical.

    Posted 09-08-2014 10:39
    Unfortunately I have yet to play with the VVX phones, so I won't be able to be any help. You already did what I would have suggested, and that is check the boot and app logs on the phone to see why it goes unregistered. Another thing that I don't play around with is TLS, but I would suspect that TLS is to "blame". The always in vs. dynamic audio path setting could also play into this. Always in means that the phone will always route its audio through the media server, whereas with dynamic, if there is no need for the phone to go through the media server (for example, if the call is not recorded), the audio will flow directly between the gateway (or whatever endpoint you have) and the phone, bypassing the media server. This used to be a big deal with switchover environments without media servers, as the call would drop if it was always in, but since 4.0 always has media servers this is no longer an issue; as you mentioned, the calls don't drop on switchovers. I would suggest setting some of the problematic phones to always in and seeing what happens on the next switch. You can't reboot phones that are not registered because the CIC server has no connection to the phone; this is normal.


  • 14.  RE: Router maintenance causes a switchover event? This is not logical.

    GENESYS
    Posted 09-05-2014 01:00
    Originally posted by kjstech;31474
    Ok I'm not sure how those parameters would apply here because though both Switchover NetTest A and Switchover NetTest B options you describe state "WAN environments"... they also state "IP address of a computer on the same network segment". Well in a WAN if you have a HQ office and a DR office, obviously they are not going to be on the same network segment.
    Sorry - I should have explained what the help means :-) Let's say your HQ server is set up as the A server and the DR is set up as the B server. SwitchoverNetTestA will have an IP Address of an endpoint on the same LAN as the DR server. SwitchoverNetTestB will have an IP address of an endpoint on the same LAN as the HQ server. If the WAN goes down, the DR server senses the outage. Before it initiates the switchover, however, it tries to contact the SwitchoverNetTestB address. Since the WAN is down, it does not initiate the switch, but goes dormant. After the WAN comes back, you should notice that the DR server is not showing as the Backup any more (I set an alert in Supervisor on the field that shows the name of the Backup server). You will need to reboot the DR server to restore full switchover functionality. So, essentially it does the same thing as manually shutting down the DR server before maintenance, but it also ensures that if the WAN goes down unexpectedly, or glitches for a minute or two, you don't have an automatic switchover and end up running on the DR server due to a short WAN outage (which is your stated desire, I think). Does that make it a bit more clear? I hope?
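
The decision the backup server makes in that explanation can be sketched as a small function. This is a minimal sketch, assuming the behavior is exactly as described in this thread; `backup_decision`, the `Action` names, and the injected `ping` callable are all hypothetical, and 172.16.1.254 is only an assumed example of an HQ-side address, not a value from the poster's network.

```python
from enum import Enum
from typing import Callable, Optional


class Action(Enum):
    STAY_BACKUP = "keep monitoring as backup"
    SWITCH_OVER = "take over as active server"
    GO_DORMANT = "log an error and stop; restart needed"


def backup_decision(primary_reachable: bool,
                    net_test_address: Optional[str],
                    ping: Callable[[str], bool]) -> Action:
    """What the backup does when it checks on the primary (illustrative only)."""
    if primary_reachable:
        return Action.STAY_BACKUP
    if net_test_address is None:
        # No NetTest parameter set: any loss of contact triggers a switchover.
        return Action.SWITCH_OVER
    if ping(net_test_address):
        # The primary's LAN answers but the primary does not: treat it as down.
        return Action.SWITCH_OVER
    # Neither answers: assume the WAN failed and the primary is still fine.
    return Action.GO_DORMANT


# Example: WAN outage with a NetTest address on the HQ LAN (assumed value).
wan_is_down = lambda address: False
print(backup_decision(False, "172.16.1.254", wan_is_down).value)
```

Passing the reachability check in as a function keeps the sketch testable without sending real ICMP traffic.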


  • 15.  RE: Router maintenance causes a switchover event? This is not logical.

    Posted 09-05-2014 15:10
    Originally posted by GGanahl;31487
    Sorry - I should have explained what the help means :-) SwitchoverNetTestA will have an IP Address of an endpoint on the same LAN as the DR server. SwitchoverNetTestB will have an IP address of an endpoint on the same LAN as the HQ server.
    Ok, so what I would do is put maybe the local router's gateway IP here. That way, if the router reboots, or the switch in between reboots, it could save a switchover event from occurring when they come back? However, I do not see these values. I see SwitchoverServer A - ININCICHQ (Our Headquarters) and SwitchoverServer B - ININCICDR (The DR office). I am looking in Interaction Administrator under Server Parameters.
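
When picking candidate addresses for these parameters, a quick subnet check can confirm which side of the WAN an address sits on. The sketch below uses Python's standard `ipaddress` module with the server addresses and masks given earlier in this thread; the helper name and the two `.254` candidate addresses are assumptions for illustration only.

```python
import ipaddress


def on_same_segment(candidate: str, server_ip: str, netmask: str) -> bool:
    """True if candidate is a different host on the server's subnet."""
    network = ipaddress.ip_network(f"{server_ip}/{netmask}", strict=False)
    address = ipaddress.ip_address(candidate)
    return address in network and address != ipaddress.ip_address(server_ip)


HQ_SERVER = ("172.16.1.1", "255.255.255.0")   # SwitchoverServerA (ININCICHQ)
DR_SERVER = ("172.16.30.1", "255.255.255.0")  # SwitchoverServerB (ININCICDR)

# Going by the help text quoted in this thread: NetTest A should sit on
# server B's segment, and NetTest B on server A's segment.
net_test_a = "172.16.30.254"  # hypothetical endpoint at the DR site
net_test_b = "172.16.1.254"   # hypothetical endpoint at HQ

print("NetTest A on DR segment:", on_same_segment(net_test_a, *DR_SERVER))
print("NetTest B on HQ segment:", on_same_segment(net_test_b, *HQ_SERVER))
```

With this check, using the DR site's own gateway as the address the DR server pings would be flagged, since that value belongs to the DR segment rather than the HQ one.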

