This particular issue all started late one night during a cutover to move server connections from an old Brocade switch to a new Nexus infrastructure (this cluster was just a fraction of the servers migrating). Initially all connections were migrated to Nexus 2248TPs hanging off of Nexus 5596UPs [FWIW, running NX-OS version 5.1(3)N1(1)], and all servers appeared to be working just as they were. Once the sysadmin started looking deeper into this particular server cluster, it was found that cluster adjacencies would form, then fail for no apparent reason.
Based on that description alone, I immediately started checking for physical link errors, speed/duplex settings, and logs on the Nexus for any indication of problems. Of course that would be too easy! The links were error free, the logs were clean, the links negotiated at 1000/full, and to top it off, the interface packet counters for the servers were incrementing as if everything was operating just fine. And it was, mostly: the sysadmin had no issue logging into the servers; only the application cluster operation was failing. The cluster had been operating just fine on the Brocade switch, which was L2-only and essentially a dumb switch.
Me: "Ok Sysadmin, what does the application cluster software need from the network in order to operate?"
Sysadmin: "Well I believe it's multicast."...after further digging in documentation... "Yes it is multicast and it's using multicast group 224.1.1.1."
Me: "Are the adjacencies just never forming, or are some partially up?"
Sysadmin: "It appears some of the servers form an adjacency, but then a few minutes later it drops. It appears to keep cycling through randomly."
Now I had more info on where to further isolate the problem and extra details about the failure occurring. What did the server switchport configs look like?
interface Ethernet141/1/22
  switchport access vlan 200
  spanning-tree port type edge
Ok, that's a pretty plain-jane server config. Interesting to note that VLAN 200 in this case is a non-routed VLAN, meaning there is no SVI, router, or any other L3 gateway in that VLAN. The Nexus 5596UPs in this instance did not have the L3 module either. No L3 device on the VLAN - that could be a problem - let's investigate IGMP, which is what hosts use to communicate multicast group membership.
N5k-A# sh ip igmp snooping
Global IGMP Snooping Information:
  IGMP Snooping enabled
  Optimised Multicast Flood (OMF) disabled
  IGMPv1/v2 Report Suppression enabled
  IGMPv3 Report Suppression disabled
  Link Local Groups Suppression enabled
  VPC Multicast optimization disabled
[...other VLAN output omitted...]
IGMP Snooping information for vlan 200
  IGMP snooping enabled
  Optimised Multicast Flood (OMF) disabled
  IGMP querier none
  Switch-querier disabled
  IGMPv3 Explicit tracking enabled
  IGMPv2 Fast leave disabled
  IGMPv1/v2 Report suppression enabled
  IGMPv3 Report suppression disabled
  Link Local Groups suppression enabled
  Router port detection using PIM Hellos, IGMP Queries
  Number of router-ports: 1
  Number of groups: 0
  VLAN vPC function enabled
  Active ports:
    Po2 Po999 Eth144/1/1 Eth144/1/42 Eth144/1/43 Eth141/1/18 Eth141/1/20 Eth141/1/22
    Eth141/1/24 Eth142/1/39 Po416 Po434 Po436 Po437 Po438 Po444

N5k-A# sh mac address-table int e141/1/22
Legend:
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link
   VLAN     MAC Address      Type      age     Secure NTFY   Ports/SWID.SSID.LID
---------+-----------------+--------+---------+------+----+------------------
* 200      0011.1111.a2ec    dynamic   0          F    F  Eth141/1/22
Ok, so all the server ports in VLAN 200 are listed as active ports in the IGMP snooping output, and snooping is enabled by default. But the same output shows 'IGMP querier none': there's no L3 device on the VLAN sending IGMP Queries (hosts answer Queries with IGMP Reports, and also send Reports unsolicited when they join a multicast group). That Query/Report exchange is what keeps an intermediary switch running IGMP snooping 'in-the-know' about the multicast needs on the VLAN. Also note that there's no MAC address learned for the multicast group the server is trying to join.
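For reference, the multicast MAC you would expect to see in the table can be worked out by hand: an IPv4 group maps to the fixed prefix 01:00:5e followed by the low 23 bits of the group address. Here's a minimal Python sketch of that mapping; the function name `multicast_mac` is just something I made up for illustration, and the dotted formatting mimics how NX-OS prints MACs.

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet MAC, in Cisco dotted form."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    # Keep only the low 23 bits of the group address...
    low23 = int(addr) & 0x7FFFFF
    # ...and place them under the fixed 01:00:5e prefix.
    mac = 0x01005E000000 | low23
    hex12 = f"{mac:012x}"
    return ".".join(hex12[i:i + 4] for i in range(0, 12, 4))


print(multicast_mac("224.1.1.1"))  # 0100.5e01.0101
```

So for this cluster, 0100.5e01.0101 is the entry to look for in 'sh mac address-table'. Because only 23 bits are carried over, 32 different groups share each MAC (239.129.1.1 maps to the same one, for example).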
What I really like about NX-OS is the very detailed logs (debug level) that it stores for just about every process running, at all times.
N5k-A# sh ip igmp snooping event-history vlan
vlan Events for IGMP Snoop process
2012 Aug 21 02:07:26.634368 igmp [3344]: [3370]: Noquerier timer expired, remove all the groups in this vlan.
2012 Aug 21 02:07:26.634356 igmp [3344]: [3370]: IGMPv3 proxy report: no records to send
2012 Aug 21 02:04:27.421333 igmp [3344]: [3557]: Forwarding the packet to router-ports
2012 Aug 21 02:04:27.421299 igmp [3344]: [3557]: IGMPv3 proxy report: no records to send
2012 Aug 21 02:04:27.421262 igmp [3344]: [3557]: Updated oif Eth141/1/22 for (*, 224.1.1.1) entry
2012 Aug 21 02:04:27.421242 igmp [3344]: [3557]: Received v3 report: group 224.1.1.1 from 10.11.200.11 on Eth141/1/22
2012 Aug 21 02:04:27.421233 igmp [3344]: [3557]: Record type: "change-to-exclude-mode" for group 224.1.1.1, sources count: 0
2012 Aug 21 02:04:27.421228 igmp [3344]: [3557]: Processing v3 report with 1 group records, packet-size 16 from 10.11.200.11 on Eth141/1/22
2012 Aug 21 02:04:27.421167 igmp [3344]: [3557]: Process a valid IGMP packet type:34 iod:390
In bottom-to-top order of events:
- IGMP packet is received
- packet type = 34 = IGMP Version 3 Membership Report
- Processing packet with 1 multicast group listed
- IGMPv3 Membership Report Message = 'change-to-exclude-mode' for our group 224.1.1.1
- There are 0 multicast sources listed to exclude - meaning any source will do
- IGMPv3 report with 1 multicast group seen from host 10.11.200.11 on Eth141/1/22
- OIF (Outgoing Interface) Eth141/1/22 is updated for the (*,224.1.1.1) entry
- The N5k has no IGMP proxy records to send
- From Cisco N5k documentation - "The [IGMP] proxy feature builds the group state from membership reports from the downstream hosts and generates membership reports in response to queries from upstream queriers."
- Forward the IGMP packet to the 'router-ports'; the only one in this system is the vPC peer-link
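To make the decoding above a bit more concrete, here's a small Python sketch that builds and parses a report like the one in the log. The type and record-type numbers come from the IGMPv3 spec (RFC 3376); the function names and the zeroed checksum are my own shortcuts for illustration, so don't put this packet on a wire. Note that a report with one group record and no sources works out to 16 bytes, which matches the 'packet-size 16' in the event history.

```python
import socket
import struct

# IGMP message types (first byte of the IGMP header)
IGMP_TYPES = {
    0x11: "Membership Query",
    0x12: "IGMPv1 Membership Report",
    0x16: "IGMPv2 Membership Report",
    0x17: "IGMPv2 Leave Group",
    0x22: "IGMPv3 Membership Report",  # 0x22 == 34, the 'type:34' in the log
}

# IGMPv3 group record types
V3_RECORD_TYPES = {
    1: "mode-is-include",
    2: "mode-is-exclude",
    3: "change-to-include-mode",
    4: "change-to-exclude-mode",
    5: "allow-new-sources",
    6: "block-old-sources",
}


def build_v3_report(group, record_type=4):
    """Build an IGMPv3 report with one group record and no sources.

    The checksum is left at zero: illustration only.
    """
    # type, reserved, checksum, reserved, number of group records
    header = struct.pack("!BBHHH", 0x22, 0, 0, 0, 1)
    # record type, aux data len, number of sources, group address
    record = struct.pack("!BBH4s", record_type, 0, 0, socket.inet_aton(group))
    return header + record


def decode_v3_report(packet):
    """Return (message name, [(record type, group, source count), ...])."""
    msg_type, _, _, _, num_records = struct.unpack("!BBHHH", packet[:8])
    if msg_type != 0x22:
        raise ValueError("not an IGMPv3 membership report")
    records = []
    offset = 8
    for _ in range(num_records):
        rtype, aux_len, num_src, group = struct.unpack(
            "!BBH4s", packet[offset:offset + 8])
        records.append((V3_RECORD_TYPES.get(rtype, "unknown"),
                        socket.inet_ntoa(group), num_src))
        # skip the source list and any auxiliary data (both in 32-bit words)
        offset += 8 + 4 * num_src + 4 * aux_len
    return IGMP_TYPES[msg_type], records


packet = build_v3_report("224.1.1.1")
print(len(packet))
print(decode_v3_report(packet))
```

A 'change-to-exclude-mode' record with an empty source list is simply how an IGMPv3 host says "join this group, from any source".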
Almost exactly 3 minutes later (this was a consistent timer, but I can't find any documentation on why: the IGMP group timeout defaults to 260 sec and the querier timeout defaults to 255 sec. A bug, maybe?):
- Still no IGMP proxy records, and since the Nexus never 'saw' an IGMP Query, the no-querier timer fires: 'Noquerier timer expired, remove all the groups in this vlan.'
Boom! This correlates to the constant up/down of multicast adjacencies the sysadmin was seeing. We were then also able to watch a particular server that had successfully peered drop its adjacency, and match the timestamps.
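If you want to put a number on that interval rather than eyeball it, the timestamps in the event history are easy to parse. A quick Python sketch, using the first and last lines of the event-history output above (the helper name `parse_nxos_timestamp` is made up for this example, and it assumes the default English month abbreviations):

```python
from datetime import datetime


def parse_nxos_timestamp(line):
    """Parse the leading 'YYYY Mon DD HH:MM:SS.ffffff' stamp of a log line."""
    stamp = " ".join(line.split()[:4])
    return datetime.strptime(stamp, "%Y %b %d %H:%M:%S.%f")


report = ("2012 Aug 21 02:04:27.421167 igmp [3344]: [3557]: "
          "Process a valid IGMP packet type:34 iod:390")
expiry = ("2012 Aug 21 02:07:26.634368 igmp [3344]: [3370]: "
          "Noquerier timer expired, remove all the groups in this vlan.")

elapsed = parse_nxos_timestamp(expiry) - parse_nxos_timestamp(report)
print(f"{elapsed.total_seconds():.3f} seconds")  # 179.213 seconds
```

This only measures the gap between the report arriving and the groups being removed in that one excerpt; it doesn't tell you what actually started the timer.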
Using the debug command will give you slightly more information than the 'event-history' command (note this output is on the peer N5k, receiving the IGMP packet from the Peer-Link Po999):
N5k-B# debug ip igmp snooping vlan
2012 Aug 21 03:31:07.115153 igmp: SNOOP: [vlan 200] Process a valid IGMP packet type:34 iod:15
2012 Aug 21 03:31:07.115195 igmp: SNOOP: [vlan 200] Processing v3 report with 1 group records, packet-size 16 from 10.11.200.11 on Po999
2012 Aug 21 03:31:07.115215 igmp: SNOOP: [vlan 200] Record type: "change-to-exclude-mode" for group 224.1.1.1, sources count: 0
2012 Aug 21 03:31:07.115290 igmp: SNOOP: [vlan 200] Received v3 report: group 224.1.1.1 from 10.11.200.11 on Po999
2012 Aug 21 03:31:07.115318 igmp: SNOOP: [vlan 200] Created ET port Po999 for group 224.1.1.1
2012 Aug 21 03:31:07.115342 igmp: SNOOP: [vlan 200] Created ET host-entry 10.11.200.11 on port Po999 for group 224.1.1.1
2012 Aug 21 03:31:07.115371 igmp: SNOOP: [vlan 200] Created igmpv3 oif Po999 for (*, 224.1.1.1)
2012 Aug 21 03:31:07.115548 igmp: SNOOP: In function igmp_snoop_copy_del_ifindex_list:
2012 Aug 21 03:31:07.115719 igmp: SNOOP: [vlan 200] Updated oif Po999 for (*, 224.1.1.1) entry
2012 Aug 21 03:31:07.115809 igmp: SNOOP:Processing reportfrom_cfs: 1, on_internal_mcec: 0, im_is_iod_valid: 1im_is_if_up: 1 im_id_ifindex_veth = 0, rc = 0, ifindex = Po999
2012 Aug 21 03:31:07.115826 igmp: SNOOP: [vlan 200] IGMPv3 proxy report: no records to send
2012 Aug 21 03:31:07.115854 igmp: SNOOP: [vlan 200] Forwarding the packet to router-ports , came from cfs
2012 Aug 21 03:31:07.115875 igmp: SNOOP: [vlan 200] not sending the CFS packet back to MCT
[...]
2012 Aug 21 03:34:10.102439 igmp: SNOOP: [vlan 200] IGMPv3 proxy report: no records to send
2012 Aug 21 03:34:10.212437 igmp: SNOOP: [vlan 200] IGMPv3 proxy report: no records to send
2012 Aug 21 03:34:23.432430 igmp: SNOOP: [vlan 200] Noquerier timer expired, remove all the groups in this vlan.
And even more ways to see this 'flapping' problem in action through the use of the debug command above and regular 'show' commands:
N5k-A# sh ip igmp snooping groups vlan 200
Type: S - Static, D - Dynamic, R - Router port, F - Fabricpath core port
Vlan  Group Address      Ver  Type  Port list
200   */*                -    R     Po999
200   224.1.1.1          v3   D     Po416

2012 Aug 21 03:41:28.765007 igmp: SNOOP: [vlan 200] Noquerier timer expired, remove all the groups in this vlan.

N5k-A# sh ip igmp snooping groups vlan 200
Type: S - Static, D - Dynamic, R - Router port, F - Fabricpath core port
Vlan  Group Address      Ver  Type  Port list
200   */*                -    R     Po999

N5k-A# sh ip igmp snooping mrouter vlan 200
Type: S - Static, D - Dynamic, V - vPC Peer Link, I - Internal, F - Fabricpath core port
      C - Co-learned, U - User Configured
Vlan  Router-port   Type      Uptime      Expires
200   Po999         SV        33w5d       never
So how do we fix this issue? Well there are a few ways, but in this instance I added Static IGMP Snooping mappings for the 224.1.1.1 multicast group to each server switchport (there were only a handful of ports).
Other methods to fix this would be:
- Adding an L3 gateway to the VLAN to send the IGMP Queries that snooping depends on
- Configuring a manual IGMP Snooping Querier (for situations like this where there is no PIM running because the traffic isn't routed)
- Disabling snooping for that VLAN altogether (multicast is then flooded to every port in the VLAN)
N5k-A(config)# vlan configuration 200
N5k-A(config-vlan-config)# ip igmp snooping static-group 224.1.1.1 interface po416
Warning: This command should be executed on peer VPC switch [vlan 200] as well.
N5k-A(config-vlan-config)#
2012 Aug 21 03:49:17.225983 igmp: SNOOP: [vlan 200] Interface Po416 (mode trunk) check for vlan 200: access 0, native 0, trunk-allowed 1

N5k-A# sh ip igmp snooping groups vlan 200
Type: S - Static, D - Dynamic, R - Router port, F - Fabricpath core port
Vlan  Group Address      Ver  Type  Port list
200   */*                -    R     Po999
200   224.1.1.1          v3   S     Po416

N5k-A(config)# vlan configuration 200
N5k-A(config-vlan-config)# ip igmp snooping static-group 224.1.1.1 interface Ethernet141/1/22
N5k-A(config-vlan-config)# ip igmp snooping static-group 224.1.1.1 interface port-channel438
N5k-A(config-vlan-config)# ip igmp snooping static-group 224.1.1.1 interface port-channel444

N5k-2A# sh ip igmp snooping event-history vlan
vlan Events for IGMP Snoop process
2012 Aug 21 04:12:37.019267 igmp [3344]: [3451]: Interface Eth141/1/22 (mode access) check for vlan 200: access 1, native 0, trunk-allowed 1
2012 Aug 21 04:06:31.977383 igmp [3344]: [3451]: Interface Po444 (mode trunk) check for vlan 200: access 0, native 0, trunk-allowed 1
2012 Aug 21 03:59:55.853893 igmp [3344]: [3451]: Interface Po416 (mode trunk) check for vlan 200: access 0, native 0, trunk-allowed 1
2012 Aug 21 03:48:37.843537 igmp [3344]: [3451]: Interface Po438 (mode trunk) check for vlan 200: access 0, native 0, trunk-allowed 1

N5k-2A# sh ip igmp snooping groups
Type: S - Static, D - Dynamic, R - Router port, F - Fabricpath core port
Vlan  Group Address      Ver  Type  Port list
200   */*                -    R     Po999
200   224.1.1.1          v3   S     Eth141/1/22 Po416 Po438 Po444

N5k-2A# sh mac address-table int e141/1/22
Legend:
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link
   VLAN     MAC Address      Type      age     Secure NTFY   Ports/SWID.SSID.LID
---------+-----------------+--------+---------+------+----+------------------
* 200      0011.1111.a2ec    dynamic   10         F    F  Eth141/1/22
  200      0100.5e01.0101    igmp      0          F    F  Po999 Eth141/1/22 Po416 Po438 Po444

So as you can see from the configuration and output above, the static IGMP Snooping mappings were added, the 'sh ip igmp snooping groups' command showed the server ports joined to the proper group, and the MAC address table showed a multicast MAC for the server ports. Once the statics were added, the sysadmin immediately saw the application cluster form all its adjacencies, and it remained stable.