Wednesday, January 29, 2014

Troubleshooting Cisco Nexus 5500 IGMP and Non-Routed Multicast

I came across a unique issue a while ago that I thought would make a great blog topic: the Nexus 5500/2248 platforms and a server cluster attempting to sync/peer through the use of IP multicast. Strangely, the cluster would constantly drop adjacencies, and the cause was a bit of a mystery. As an IT consultant who works with customers to design and implement data center infrastructure, I find that most of the time we (the consultants) don't have any background info on lesser-known or custom applications. To compound that even further, many times sysadmins are not very network savvy and do not understand how the application operates from a low-level network perspective.

This particular issue all started late one night during a cutover to move server connections from an old Brocade switch to a new Nexus infrastructure (this cluster was just a fraction of the servers migrating). Initially all connections were migrated to Nexus 2248TPs hanging off of Nexus 5596UPs [FWIW, running NX-OS version 5.1(3)N1(1)], and all servers appeared to be working just as they were. Once the sysadmin started looking deeper into this particular server cluster, it was found that cluster adjacencies would form, then fail for no apparent reason.

Based on that description alone, I immediately started checking for physical link errors, speed/duplex settings, and logs on the Nexus for any indication of problems. Of course that would be too easy! The links were error free, logs were clean, links negotiated at 1000/full, and to top it off, the packet counters on the server interfaces were incrementing as if everything was operating just fine. And it was, somewhat: the sysadmin had no issue logging into the servers, and only the application cluster operation was failing. The cluster had been operating just fine on the Brocade switch - which was L2-only, and essentially a dumb switch.

Me: "Ok Sysadmin, what does the application cluster software need from the network in order to operate?"

Sysadmin: "Well I believe it's multicast."...after further digging in documentation... "Yes it is multicast and it's using multicast group 224.1.1.1."

Me: "Are the adjacencies just never forming, or are some partially up?"

Sysadmin: "It appears some of the servers form an adjacency, but then a few minutes later it drops. It appears to keep cycling through randomly."

Now I had more info on where to further isolate the problem and extra details about the failure occurring. What did the server switchport configs look like?

interface Ethernet141/1/22
  switchport access vlan 200
  spanning-tree port type edge

Ok, that's a pretty plain-jane server config. Interesting to note that VLAN 200 in this case is a non-routed VLAN, meaning there is no SVI, router, or any other L3 gateway in that VLAN. The Nexus 5596UPs in this instance did not have the L3 module either. No L3 device on the VLAN - that could be a problem - let's investigate IGMP, which is what hosts use to communicate multicast group membership.

N5k-A# sh ip igmp snooping 
Global IGMP Snooping Information:
  IGMP Snooping enabled
  Optimised Multicast Flood (OMF) disabled
  IGMPv1/v2 Report Suppression enabled
  IGMPv3 Report Suppression disabled
  Link Local Groups Suppression enabled
  VPC Multicast optimization disabled

[...other VLAN output omitted...]

IGMP Snooping information for vlan 200
  IGMP snooping enabled
  Optimised Multicast Flood (OMF) disabled
  IGMP querier none
  Switch-querier disabled
  IGMPv3 Explicit tracking enabled
  IGMPv2 Fast leave disabled
  IGMPv1/v2 Report suppression enabled
  IGMPv3 Report suppression disabled
  Link Local Groups suppression enabled
  Router port detection using PIM Hellos, IGMP Queries
  Number of router-ports: 1
  Number of groups: 0
  VLAN vPC function enabled
  Active ports:
    Po2 Po999   Eth144/1/1      Eth144/1/42
    Eth144/1/43 Eth141/1/18     Eth141/1/20     Eth141/1/22
    Eth141/1/24 Eth142/1/39     Po416   Po434
    Po436       Po437   Po438   Po444

N5k-A# sh mac address-table int e141/1/22
Legend: 
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link
   VLAN     MAC Address      Type      age     Secure NTFY   Ports/SWID.SSID.LID
---------+-----------------+--------+---------+------+----+------------------
* 200      0011.1111.a2ec    dynamic   0          F    F  Eth141/1/22

Ok, so all the server ports in VL200 show in the IGMP Snooping table, and Snooping is enabled by default. However, there's no L3 device on the VLAN to send IGMP Queries. Hosts send IGMP Reports to request a multicast group, both unsolicited when they join and in response to each Query, and that Query/Report exchange is what keeps an intermediary switch with IGMP Snooping 'in-the-know' about the multicast needs on the VLAN. Also note that there's no MAC address learned for the multicast group that the server is trying to join.

What I really like about NX-OS are the very detailed logs (debug level) that it stores for just about every process running, at all times.

N5k-A# sh ip igmp snooping event-history vlan

 vlan Events for IGMP Snoop process
2012 Aug 21 02:07:26.634368 igmp [3344]: [3370]: Noquerier timer expired, remove all the groups in this vlan.
2012 Aug 21 02:07:26.634356 igmp [3344]: [3370]: IGMPv3 proxy report: no records to send
 
2012 Aug 21 02:04:27.421333 igmp [3344]: [3557]: Forwarding the packet to router-ports  
2012 Aug 21 02:04:27.421299 igmp [3344]: [3557]: IGMPv3 proxy report: no records to send
2012 Aug 21 02:04:27.421262 igmp [3344]: [3557]: Updated oif Eth141/1/22 for (*, 224.1.1.1) entry
2012 Aug 21 02:04:27.421242 igmp [3344]: [3557]: Received v3 report: group 224.1.1.1 from 10.11.200.11 on Eth141/1/22
2012 Aug 21 02:04:27.421233 igmp [3344]: [3557]: Record type: "change-to-exclude-mode" for group 224.1.1.1, sources count: 0
2012 Aug 21 02:04:27.421228 igmp [3344]: [3557]: Processing v3 report with 1 group records, packet-size 16 from 10.11.200.11 on Eth141/1/22

2012 Aug 21 02:04:27.421167 igmp [3344]: [3557]: Process a valid IGMP packet type:34 iod:390

In bottom-to-top order of events:
  • IGMP packet is received
    • packet type = 34 = IGMP Version 3 Membership Report
  • Processing packet with 1 multicast group listed
  • IGMPv3 Membership Report Message = 'change-to-exclude-mode' for our group 224.1.1.1
    • There are 0 multicast sources listed to exclude - meaning any source will do
  • IGMPv3 report with 1 multicast group seen from host 10.11.200.11 on Eth141/1/22
  • OIF (Outgoing Interface) Eth141/1/22 for (*,224.1.1.1)
  • The N5k has no IGMP proxy records stored to send
    • From Cisco N5k documentation - "The [IGMP] proxy feature builds the group state from membership reports from the downstream hosts and generates membership reports in response to queries from upstream queriers."
  • Forward the IGMP packet to 'router-ports', the only one in this system being the vPC Peer-Link
Almost exactly 3 minutes later (this timer was consistent, but I can't find any documentation explaining it - the IGMP group timeout defaults to 260 sec and the querier timeout defaults to 255 sec, so maybe a bug?):
  • Still no IGMP proxy records, and since the Nexus never 'saw' an IGMP Query: 'Noquerier timer expired, remove all the groups in this vlan.'
Boom! This correlates to the constant up/down of multicast adjacencies the sysadmin was seeing. We were then also able to watch a particular server that had peered successfully drop its adjacency, and the timestamps matched.
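
To make the decoded fields above a little more concrete, here is a minimal Python sketch of the packet the switch logged. It assumes the IGMPv3 Membership Report layout from RFC 3376 (an 8-byte header followed by 8 bytes per group record when no sources are listed); the function names are my own illustration and have nothing to do with NX-OS internals.

```python
import socket
import struct

IGMP_V3_REPORT = 0x22  # decimal 34, the "type:34" in the event-history

# Group record types defined by RFC 3376
RECORD_TYPES = {
    1: "mode-is-include",
    2: "mode-is-exclude",
    3: "change-to-include-mode",
    4: "change-to-exclude-mode",
    5: "allow-new-sources",
    6: "block-old-sources",
}


def inet_checksum(data: bytes) -> int:
    """Standard 16-bit ones' complement Internet checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_v3_join(group: str) -> bytes:
    """Build a report with one change-to-exclude-mode record and no sources."""
    record = struct.pack("!BBH4s", 4, 0, 0, socket.inet_aton(group))
    header = struct.pack("!BBHHH", IGMP_V3_REPORT, 0, 0, 0, 1)
    csum = inet_checksum(header + record)
    header = struct.pack("!BBHHH", IGMP_V3_REPORT, 0, csum, 0, 1)
    return header + record


def parse_v3_report(packet: bytes):
    """Return a list of (record type name, group, [sources]) tuples."""
    msg_type, _, _, _, count = struct.unpack("!BBHHH", packet[:8])
    if msg_type != IGMP_V3_REPORT:
        raise ValueError("not an IGMPv3 membership report")
    records = []
    offset = 8
    for _ in range(count):
        rtype, aux_len, nsrc, group = struct.unpack(
            "!BBH4s", packet[offset:offset + 8])
        offset += 8
        sources = [socket.inet_ntoa(packet[offset + 4 * i:offset + 4 * i + 4])
                   for i in range(nsrc)]
        offset += 4 * nsrc + 4 * aux_len  # skip sources and auxiliary data
        records.append((RECORD_TYPES.get(rtype, "unknown"),
                        socket.inet_ntoa(group), sources))
    return records


packet = build_v3_join("224.1.1.1")
records = parse_v3_report(packet)
```

An 8-byte header plus one 8-byte record with an empty source list comes to 16 bytes, which lines up with the 'packet-size 16' in the log. An exclude-mode record with zero sources excludes nothing, which is why it reads as a plain (*, 224.1.1.1) join.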

Using the debug command will give you slightly more information than the 'event-history' command (note this output is on the peer N5k, receiving the IGMP packet from the Peer-Link Po999):

N5k-B# debug ip igmp snooping vlan
2012 Aug 21 03:31:07.115153 igmp: SNOOP: [vlan 200] Process a valid IGMP packet type:34 iod:15
2012 Aug 21 03:31:07.115195 igmp: SNOOP: [vlan 200] Processing v3 report with 1 group records, packet-size 16 from 10.11.200.11 on Po999 
2012 Aug 21 03:31:07.115215 igmp: SNOOP: [vlan 200] Record type: "change-to-exclude-mode" for group 224.1.1.1, sources count: 0 
2012 Aug 21 03:31:07.115290 igmp: SNOOP: [vlan 200] Received v3 report: group 224.1.1.1 from 10.11.200.11 on Po999 
2012 Aug 21 03:31:07.115318 igmp: SNOOP: [vlan 200] Created ET port Po999 for group 224.1.1.1 
2012 Aug 21 03:31:07.115342 igmp: SNOOP: [vlan 200] Created ET host-entry 10.11.200.11 on port Po999 for group 224.1.1.1
2012 Aug 21 03:31:07.115371 igmp: SNOOP: [vlan 200] Created igmpv3 oif Po999 for (*, 224.1.1.1)
2012 Aug 21 03:31:07.115548 igmp: SNOOP: In function igmp_snoop_copy_del_ifindex_list: 
2012 Aug 21 03:31:07.115719 igmp: SNOOP: [vlan 200] Updated oif Po999 for (*, 224.1.1.1) entry 
2012 Aug 21 03:31:07.115809 igmp: SNOOP:  Processing reportfrom_cfs: 1, on_internal_mcec: 0, im_is_iod_valid: 1im_is_if_up: 1 im_id_ifindex_veth = 0, rc = 0, ifindex = Po999
2012 Aug 21 03:31:07.115826 igmp: SNOOP: [vlan 200] IGMPv3 proxy report: no records to send 
2012 Aug 21 03:31:07.115854 igmp: SNOOP: [vlan 200] Forwarding the packet to router-ports , came from cfs  
2012 Aug 21 03:31:07.115875 igmp: SNOOP: [vlan 200] not sending the CFS packet back to MCT 
[...]
2012 Aug 21 03:34:10.102439 igmp: SNOOP: [vlan 200] IGMPv3 proxy report: no records to send 
2012 Aug 21 03:34:10.212437 igmp: SNOOP: [vlan 200] IGMPv3 proxy report: no records to send 
2012 Aug 21 03:34:23.432430 igmp: SNOOP: [vlan 200] Noquerier timer expired, remove all the groups in this vlan. 
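
For reference, the two default timers mentioned earlier fall out of the RFC 3376 formulas, and neither of them lands on the roughly 3-minute expiry seen in the logs. The sketch below recomputes them from the RFC defaults (robustness 2, query interval 125 sec, query response interval 10 sec) and measures the gap between two timestamps copied from the event-history output further up. The helper names are hypothetical, and this only restates the arithmetic; it does not explain where the Nexus no-querier timer value comes from.

```python
from datetime import datetime

# RFC 3376 default values, in seconds
ROBUSTNESS = 2
QUERY_INTERVAL = 125
QUERY_RESPONSE_INTERVAL = 10


def group_membership_interval(rv=ROBUSTNESS, qi=QUERY_INTERVAL,
                              qri=QUERY_RESPONSE_INTERVAL):
    """Time before a group with no reports is considered gone."""
    return rv * qi + qri


def other_querier_present_interval(rv=ROBUSTNESS, qi=QUERY_INTERVAL,
                                   qri=QUERY_RESPONSE_INTERVAL):
    """Time before a non-querier decides the querier has disappeared."""
    return rv * qi + qri / 2


def seconds_between(first: str, second: str) -> float:
    """Elapsed seconds between two NX-OS style log timestamps."""
    fmt = "%Y %b %d %H:%M:%S.%f"
    delta = datetime.strptime(second, fmt) - datetime.strptime(first, fmt)
    return delta.total_seconds()


# Report received on Eth141/1/22, then the no-querier timer expiry
observed = seconds_between("2012 Aug 21 02:04:27.421242",
                           "2012 Aug 21 02:07:26.634368")
```

With the defaults this gives 260 sec and 255 sec, while the observed gap in the event-history is a little over 179 sec.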

And even more ways to see this 'flapping' problem in action through the use of the debug command above and regular 'show' commands:

N5k-A# sh ip igmp snooping groups vlan 200
Type: S - Static, D - Dynamic, R - Router port, F - Fabricpath core port

Vlan  Group Address      Ver  Type  Port list
200   */*                -    R     Po999
200   224.1.1.1          v3   D     Po416

2012 Aug 21 03:41:28.765007 igmp: SNOOP: [vlan 200] Noquerier timer expired, remove all the groups in this vlan. 

N5k-A# sh ip igmp snooping groups vlan 200
Type: S - Static, D - Dynamic, R - Router port, F - Fabricpath core port

Vlan  Group Address      Ver  Type  Port list
200   */*                -    R     Po999

N5k-A# sh ip igmp snooping mrouter vlan 200
Type: S - Static, D - Dynamic, V - vPC Peer Link,       I - Internal, F - Fabricpath core port
      C - Co-learned, U - User Configured
Vlan  Router-port   Type      Uptime      Expires
200   Po999         SV        33w5d       never

So how do we fix this issue? Well there are a few ways, but in this instance I added Static IGMP Snooping mappings for the 224.1.1.1 multicast group to each server switchport (there were only a handful of ports).

Other methods to fix this would be:
  • Adding an L3 gateway into the VLAN to send IGMP Queries so that snooping works correctly
  • Configuring a manual IGMP Snooping Querier (for situations like this where there is no PIM running because the traffic isn't routed)
  • Disabling snooping for that VLAN altogether
I definitely didn't want to disable snooping since we don't want that traffic flooded throughout the VLAN, and due to the existing customer layout of things (and security) I also did not want to create a routable entry point into that VLAN. Going the route of a manual IGMP Snooping Querier would have been an option, but that would have required locating an IP address to use late at night (IPAM is usually an afterthought for many). Using static mappings for specific interfaces allows the most granular control, but at the cost of a little extra complexity (document!).

N5k-A(config)# vlan configuration 200
N5k-A(config-vlan-config)# ip igmp snooping static-group 224.1.1.1 interface po416
Warning: This command should be executed on peer VPC switch [vlan 200] as well.
N5k-A(config-vlan-config)# 
2012 Aug 21 03:49:17.225983 igmp: SNOOP: [vlan 200] Interface Po416 (mode trunk) check for vlan 200: access 0, native 0, trunk-allowed 1 


N5k-A# sh ip igmp snooping groups vlan 200
Type: S - Static, D - Dynamic, R - Router port, F - Fabricpath core port

Vlan  Group Address      Ver  Type  Port list
200   */*                -    R     Po999
200   224.1.1.1          v3   S     Po416

N5k-A(config)# vlan configuration 200
N5k-A(config-vlan-config)# ip igmp snooping static-group 224.1.1.1 interface Ethernet141/1/22
N5k-A(config-vlan-config)# ip igmp snooping static-group 224.1.1.1 interface port-channel438
N5k-A(config-vlan-config)# ip igmp snooping static-group 224.1.1.1 interface port-channel444
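
Since the same static mappings have to be entered on both vPC peers (per the warning above), it can help to generate the lines once and paste them into each switch. This is just a small sketch that renders the exact command form used above; the function name and the interface list are illustrative, and it only produces text, it does not talk to a switch.

```python
import ipaddress
from typing import List


def static_group_config(vlan: int, group: str,
                        interfaces: List[str]) -> List[str]:
    """Render 'ip igmp snooping static-group' lines for one VLAN."""
    if not ipaddress.IPv4Address(group).is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast group")
    lines = [f"vlan configuration {vlan}"]
    for ifname in interfaces:
        lines.append(
            f"  ip igmp snooping static-group {group} interface {ifname}")
    return lines


config = static_group_config(
    200, "224.1.1.1",
    ["Ethernet141/1/22", "port-channel416",
     "port-channel438", "port-channel444"],
)
print("\n".join(config))
```

Keeping the interface list in one place also doubles as the documentation for which ports carry the static join.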


N5k-2A# sh ip igmp snooping event-history vlan 

 vlan Events for IGMP Snoop process
2012 Aug 21 04:12:37.019267 igmp [3344]: [3451]: Interface Eth141/1/22 (mode access) check for vlan 200: access 1, native 0, trunk-allowed 1
2012 Aug 21 04:06:31.977383 igmp [3344]: [3451]: Interface Po444 (mode trunk) check for vlan 200: access 0, native 0, trunk-allowed 1
2012 Aug 21 03:59:55.853893 igmp [3344]: [3451]: Interface Po416 (mode trunk) check for vlan 200: access 0, native 0, trunk-allowed 1
2012 Aug 21 03:48:37.843537 igmp [3344]: [3451]: Interface Po438 (mode trunk) check for vlan 200: access 0, native 0, trunk-allowed 1

  
N5k-2A# sh ip igmp snooping groups 
Type: S - Static, D - Dynamic, R - Router port, F - Fabricpath core port

Vlan  Group Address      Ver  Type  Port list
200   */*                -    R     Po999
200   224.1.1.1          v3   S     Eth141/1/22 Po416 Po438 Po444
          
          
N5k-2A# sh mac address-table int e141/1/22
Legend: 
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link
   VLAN     MAC Address      Type      age     Secure NTFY   Ports/SWID.SSID.LID
---------+-----------------+--------+---------+------+----+------------------
* 200      0011.1111.a2ec    dynamic   10         F    F  Eth141/1/22
  200      0100.5e01.0101    igmp      0          F    F  Po999 
                                                          Eth141/1/22 Po416 Po438 
                                                          Po444           

So as you can see from the configuration and output above, the static IGMP Snooping mappings were added, the 'sh ip igmp snooping groups' command showed the server ports joined to the proper group, and the MAC address table showed a multicast MAC for the server ports. Once the statics were added, the sysadmin immediately saw the application cluster form all its adjacencies, and they remained stable.
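
As a side note, the 0100.5e01.0101 entry in the MAC table is not arbitrary: an IPv4 multicast group maps to a MAC address by taking the fixed prefix 01:00:5e and appending the low 23 bits of the group address. A minimal Python sketch of that mapping follows (the function name is my own, and the output is formatted in the dotted style the switch uses):

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet MAC, Cisco dotted style."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast group")
    low23 = int(addr) & 0x7FFFFF          # keep only the low 23 bits
    mac = (0x01005E << 24) | low23        # fixed 01:00:5e prefix
    hex12 = f"{mac:012x}"
    return ".".join(hex12[i:i + 4] for i in range(0, 12, 4))


print(multicast_mac("224.1.1.1"))
```

Because only 23 of the 28 variable group bits survive, 32 different groups share each MAC (225.129.1.1 maps to the same address, for example), which is worth remembering when reading an 'igmp' type entry in the MAC table.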