Monday, November 18, 2013

Cisco ACI - SDN in the DC, Cisco's Way

As you may have read by now, Cisco has announced its first big 'SDN' (Software Defined Networking) solution, named ACI (Application Centric Infrastructure), which pairs tightly with the Nexus 9000 line (announced along with ACI). However, as with most product announcements made far in advance of the actual product release, technical details are few and far between. I recently attended a conference that included an ACI and Nexus 9000 breakout session presented by Joe Onisick (www.definethecloud.net), a Cisco TME for ACI/N9k.

Here are the points and thoughts about ACI and the N9k that stuck out to me from the discussions that followed:

  • As Cisco has already stated, the N9k will be shipping soon, but it won't be able to run in ACI mode until 2HCY14. The upgrade from standalone mode (standard NX-OS) to ACI mode will be a major one, as the whole underlying OS/firmware is completely different. There is no ISSU upgrade path.
  • The N9k and ACI are currently a Data Center-only solution, built as a Clos fabric (spines and leafs) with the APIC controller (Application Policy Infrastructure Controller). They were not designed to replace Core, WAN-edge, or Campus network environments - ACI will likely expand into these other environments after the technology gains momentum in the DC space. The whole concept of SDN is still in its infancy - at least for everyone who isn't Google.
  • The N9k will be priced very competitively - partly due to the use of merchant silicon and the elimination of the mid-plane, but I would say more importantly due to the DC-focused scope of its software functionality. Technologies like OTV, LISP, etc. will still require an N7k or ASR. Design guides covering how to integrate the ACI DC infrastructure with other areas of the network will become available. Since ACI uses VXLAN as an overlay, there will certainly be VXLAN gateway functionality to provide that integration.
  • 40G BiDi optics - man, these are great (also announced along w/ ACI and the N9k)! They carry 40GE over a single pair of OM3 MMF (good for 100m) using what is essentially CWDM, but with only 2 waves (20G each). And Cisco is able to manufacture and sell them very cost-effectively. This could be a major Cisco differentiator when 40G becomes more of the norm. Businesses already have a lot of sunk cost in their fiber cable plants - would they rather replace or add on to them to accommodate the 12-strand MTP fiber cables for MMF 40G, or use their existing 10G fiber plant? A 'no-brainer' decision. Some great info on them here: http://www.cisco.com/en/US/prod/collateral/switches/ps9441/ps13386/white-paper-c11-729493_ns1261_Networking_Solutions_White_Paper.html
  • How will the APIC controller look/feel/operate? It's still somewhat of a mystery, but I expect it to be very similar to the successful UCS Manager (configuring network/application policies with various metrics/SLAs). After all, the people at Insieme were also the people who created UCS and the Nexus products.
  • NSH - Network Service Header - a Cisco vPath-like technology that has been submitted to the IETF as a draft (http://tools.ietf.org/html/draft-quinn-nsh-00). See who co-authored it? Cisco and a certain company Cisco announced it was acquiring at the launch of ACI (Insieme). This appears to be one of the major underlying technologies the APIC (the controller) will use to chain network services (firewalls, load balancers, etc.). vPath is a really cool technology that the Cisco Nexus 1000v uses to communicate with VMware ESX and virtual network appliances (VSG, vASA, etc.) for logically 'chaining' network services. That makes the Cisco AVS (Application Virtual Switch, also announced along w/ ACI) seem to fit quite nicely in the mix, as it's essentially a Nexus 1000v that communicates with the ACI infrastructure. Because NSH has a fixed header, it is easy to implement in hardware - it essentially performs the same function as the N1000v and vPath, but with the ability to have hardware ASICs participate in the service chaining.
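To put the BiDi bullet into numbers, here is a back-of-the-envelope Python sketch comparing the two 40G multimode options. The lane and fiber counts are assumptions labeled in the code: 40GBASE-SR4 is modeled as 4 x 10G lanes per direction using 8 of the 12 fibers in an MTP cable, and BiDi as 2 x 20G channels per direction over one duplex pair (the "2 waves, 20G each" figure above).

```python
# Sketch: compare 40G multimode optics by rate and fiber count.
# Assumptions (not vendor specs quoted from this post):
#   40GBASE-SR4: 4 lanes x 10G per direction, 8 fibers of a 12-fiber MTP
#   40G BiDi:    2 channels x 20G per direction, 1 duplex pair (2 fibers)

from dataclasses import dataclass


@dataclass(frozen=True)
class Optic:
    name: str
    lanes_per_direction: int
    gbps_per_lane: int
    fibers_used: int

    @property
    def gbps(self) -> int:
        """Aggregate rate in each direction."""
        return self.lanes_per_direction * self.gbps_per_lane

    @property
    def fits_duplex_plant(self) -> bool:
        """True if an existing two-strand (10G-style) fiber run is enough."""
        return self.fibers_used <= 2


SR4 = Optic("40GBASE-SR4", lanes_per_direction=4, gbps_per_lane=10, fibers_used=8)
BIDI = Optic("40G BiDi", lanes_per_direction=2, gbps_per_lane=20, fibers_used=2)

for optic in (SR4, BIDI):
    print(f"{optic.name}: {optic.gbps}G, {optic.fibers_used} fibers, "
          f"reuses 10G duplex plant: {optic.fits_duplex_plant}")
```

Both options land at 40G per direction; the difference that matters for the sunk-cost argument is the fiber count.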
Starting to see the potential of ACI now? Lots of technical details are still missing, as is the actual product. It'll be very interesting to see how the market reacts to ACI and VMware's NSX. VMware has already released NSX, but will customers adopt it? Will NSX be production-ready by the time ACI/APIC is released? Will customers see the need for tighter integration with the network and other hardware (VMware has stated that it is working with networking vendors on interop, but how well will that turn out)? These are the questions that come to mind in the race to see who wins SDN in the DC. The next couple of years in the networking field are going to be really interesting.


Saturday, October 5, 2013

Cisco Catalyst 4500-X EtherChannel Auto-QoS

In any routing and switching infrastructure that real-time and business-critical applications rely on, regardless of size, QoS is an absolute must. One of the main advantages of buying Cisco equipment is the extensive set of services IOS can provide - one of which is granular QoS control.

As real-time services, namely voice and video, increasingly converge onto the IP network, QoS is becoming even more important to ensure a quality end-user experience. Gigabit and multi-Gigabit (EtherChannel bundle) uplinks are becoming more saturated as users and businesses demand more data-intensive network connectivity. Queue the requirement for QoS!

Unfortunately, because different Business Units within Cisco are responsible for the various Catalyst switch products (Catalyst 2960, 3560/3750, 4500, 6500 - and the Nexus lines), there is a decent amount of feature and hardware disparity between them. QoS configurations can differ a lot between products, which makes QoS in switched environments cumbersome to understand and easy to forget. This is made worse by the fact that LANs often have an abundance of bandwidth, which makes it easy for a network engineer/admin to discount the need for QoS.

Network management tools may suggest to an engineer that a link is not congested. However, these tools rely on SNMP to poll the device for interface stats, taking the delta of the counters between polls to show a rate. Often these tools cannot poll any faster than 30-second intervals, and they are usually set to 1-5 minute intervals. The result is really an average that doesn't account for spikes in utilization or microbursts. Once these spikes and microbursts become too large for the device buffers, packets drop. For real-time traffic, even buffering degrades the service, because buffering adds delay and jitter.
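To make the averaging problem concrete, here is a small Python sketch (not a monitoring tool) that computes the rate an NMS would report from two polls of a 64-bit octet counter, such as IF-MIB's ifHCOutOctets. The traffic profile is hypothetical: a 10G link carrying 500 Mb/s for most of a 5-minute polling window, plus a 2-second burst at full line rate.

```python
# Sketch: why a 5-minute SNMP average hides a microburst.
# Assumes a 64-bit octet counter; the traffic profile is hypothetical.

COUNTER_MAX = 2 ** 64  # 64-bit high-capacity counters wrap at this value


def rate_bps(old_octets: int, new_octets: int, interval_s: float) -> float:
    """Average bits/sec between two counter polls, tolerating one wrap."""
    delta = (new_octets - old_octets) % COUNTER_MAX
    return delta * 8 / interval_s


def octets_sent(profile: list[tuple[float, float]]) -> int:
    """Total octets for a list of (duration_s, rate_bps) segments."""
    return int(sum(duration * rate / 8 for duration, rate in profile))


LINK_BPS = 10e9
# 298 s at 500 Mb/s plus a 2 s burst at line rate = one 300 s polling window.
profile = [(298, 500e6), (2, LINK_BPS)]

start = 1_000_000                      # arbitrary counter value at first poll
end = start + octets_sent(profile)     # counter value at second poll
avg = rate_bps(start, end, 300)
print(f"polled average: {avg / 1e6:.1f} Mb/s ({avg / LINK_BPS:.1%} of link)")
# prints: polled average: 563.3 Mb/s (5.6% of link)
```

The graph would show a lightly loaded link, even though it was saturated for two full seconds inside that window.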

Fortunately, Cisco has a great design guide for QoS called the "Medianet Campus QoS Design 4.0", also known as the QoS Solution Reference Network Design (SRND) 4.0 guide. Here are the web and PDF links to that document:

Most of the time, the granular control explained in the SRND 4.0 guide for campus switches isn't needed, thanks to a simple Cisco switch feature called Auto-QoS. Auto-QoS on switches is essentially a macro containing the recommended configurations from the QoS SRND 4.0 guide. It reduces switch QoS to just a few commands and covers probably close to 95% of an enterprise's QoS needs. Since it's just a macro, the generated configuration is easy to modify for any specific or custom requirements.

There can be a few hiccups when using Auto-QoS. One that I've run into on numerous occasions happens when trying to apply the Auto-QoS generated policies to an EtherChannel interface or to the physical interfaces bundled into it.

The information in this article is in reference to Auto-QoS VoIP for the Cat4500-X.

To enable Auto-QoS on an interface, use the following command:
4500X#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
4500X(config)#int te1/1
4500X(config-if)#auto qos voip trust

Once the above command is applied, IOS automatically generates the following QoS policy (based on SRND 4.0):
ip access-list extended AutoQos-4.0-ACL-Bulk-Data
 permit tcp any any eq ftp
 permit tcp any any eq ftp-data
 permit tcp any any eq 22
 permit tcp any any eq smtp
 permit tcp any any eq 465
 permit tcp any any eq 143
 permit tcp any any eq 993
 permit tcp any any eq pop3
 permit tcp any any eq 995
 permit tcp any any eq 1914
ip access-list extended AutoQos-4.0-ACL-Default
 permit ip any any
ip access-list extended AutoQos-4.0-ACL-Multimedia-Conf
 permit udp any any range 16384 32767
ip access-list extended AutoQos-4.0-ACL-Scavenger
 permit tcp any any eq 1214
 permit udp any any eq 1214
 permit tcp any any range 2300 2400
 permit udp any any range 2300 2400
 permit tcp any any eq 3689
 permit udp any any eq 3689
 permit tcp any any range 6881 6999
 permit tcp any any eq 11999
 permit tcp any any range 28800 29100
ip access-list extended AutoQos-4.0-ACL-Signaling
 permit tcp any any range 2000 2002
 permit tcp any any range 5060 5061
 permit udp any any range 5060 5061
ip access-list extended AutoQos-4.0-ACL-Transactional-Data
 permit tcp any any eq 443
 permit tcp any any eq 1521
 permit udp any any eq 1521
 permit tcp any any eq 1526
 permit udp any any eq 1526
 permit tcp any any eq 1575
 permit udp any any eq 1575
 permit tcp any any eq 1630
 permit udp any any eq 1630

class-map match-all AutoQos-4.0-Scavenger-Classify
  match access-group name AutoQos-4.0-ACL-Scavenger
class-map match-all AutoQos-4.0-Signaling-Classify
  match access-group name AutoQos-4.0-ACL-Signaling
class-map match-any AutoQos-4.0-Priority-Queue
  match cos  5 
  match  dscp ef 
  match  dscp cs5 
  match  dscp cs4 
class-map match-all AutoQos-4.0-VoIP-Data-Cos
  match cos  5 
class-map match-any AutoQos-4.0-Multimedia-Stream-Queue
  match  dscp af31 
  match  dscp af32 
  match  dscp af33 
class-map match-all AutoQos-4.0-Network-Mgmt
  match  dscp cs2 
class-map match-all AutoQos-4.0-VoIP-Signal-Cos
  match cos  3 
class-map match-any AutoQos-4.0-Multimedia-Conf-Queue
  match cos  4 
  match  dscp af41 
  match  dscp af42 
  match  dscp af43 
  match access-group name AutoQos-4.0-ACL-Multimedia-Conf
class-map match-any AutoQos-4.0-Transaction-Data
  match  dscp af21 
  match  dscp af22 
  match  dscp af23 
class-map match-all AutoQos-4.0-Network-Ctrl
  match  dscp cs7 
class-map match-all AutoQos-4.0-Scavenger
  match  dscp cs1 
class-map match-all AutoQos-4.0-Default-Classify
  match access-group name AutoQos-4.0-ACL-Default
class-map match-any AutoQos-4.0-Signaling
  match  dscp cs3 
  match cos  3 
class-map match-any AutoQos-4.0-Bulk-Data-Queue
  match cos  1 
  match  dscp af11 
  match  dscp af12 
  match  dscp af13 
  match access-group name AutoQos-4.0-ACL-Bulk-Data
class-map match-all AutoQos-4.0-Transaction-Classify
  match access-group name AutoQos-4.0-ACL-Transactional-Data
class-map match-all AutoQos-4.0-Broadcast-Vid
  match  dscp cs5 
class-map match-any AutoQos-4.0-Bulk-Data
  match  dscp af11 
  match  dscp af12 
  match  dscp af13 
class-map match-any AutoQos-4.0-Scavenger-Queue
  match  dscp cs1 
  match cos  1 
  match access-group name AutoQos-4.0-ACL-Scavenger
class-map match-any AutoQos-4.0-VoIP
  match  dscp ef 
  match cos  5 
class-map match-any AutoQos-4.0-Multimedia-Conf
  match  dscp af41 
  match  dscp af42 
  match  dscp af43 
class-map match-any AutoQos-4.0-Control-Mgmt-Queue
  match cos  3 
  match  dscp cs7 
  match  dscp cs6 
  match  dscp cs3 
  match  dscp cs2 
  match access-group name AutoQos-4.0-ACL-Signaling
class-map match-all AutoQos-4.0-Bulk-Data-Classify
  match access-group name AutoQos-4.0-ACL-Bulk-Data
class-map match-any AutoQos-4.0-Trans-Data-Queue
  match cos  2 
  match  dscp af21 
  match  dscp af22 
  match  dscp af23 
  match access-group name AutoQos-4.0-ACL-Transactional-Data
class-map match-any AutoQos-4.0-Multimedia-Stream
  match  dscp af31 
  match  dscp af32 
  match  dscp af33 
class-map match-any AutoQos-4.0-VoIP-Data
  match  dscp ef 
  match cos  5 
class-map match-all AutoQos-4.0-Internetwork-Ctrl
  match  dscp cs6 
class-map match-all AutoQos-4.0-Realtime-Interact
  match  dscp cs4 
class-map match-all AutoQos-4.0-Multimedia-Conf-Classify
  match access-group name AutoQos-4.0-ACL-Multimedia-Conf
class-map match-any AutoQos-4.0-VoIP-Signal
  match  dscp cs3 
  match cos  3 
!
policy-map AutoQos-4.0-Input-Policy
 class AutoQos-4.0-VoIP
 class AutoQos-4.0-Broadcast-Vid
 class AutoQos-4.0-Realtime-Interact
 class AutoQos-4.0-Network-Ctrl
 class AutoQos-4.0-Internetwork-Ctrl
 class AutoQos-4.0-Signaling
 class AutoQos-4.0-Network-Mgmt
 class AutoQos-4.0-Multimedia-Conf
 class AutoQos-4.0-Multimedia-Stream
 class AutoQos-4.0-Transaction-Data
 class AutoQos-4.0-Bulk-Data
 class AutoQos-4.0-Scavenger
policy-map AutoQos-4.0-Output-Policy
 class AutoQos-4.0-Scavenger-Queue
    bandwidth remaining percent 1
 class AutoQos-4.0-Priority-Queue
    priority
    police cir percent 30 bc 33 ms
 class AutoQos-4.0-Control-Mgmt-Queue
    bandwidth remaining percent 10
 class AutoQos-4.0-Multimedia-Conf-Queue
    bandwidth remaining percent 10
 class AutoQos-4.0-Multimedia-Stream-Queue
    bandwidth remaining percent 10
 class AutoQos-4.0-Trans-Data-Queue
    bandwidth remaining percent 10
    dbl
 class AutoQos-4.0-Bulk-Data-Queue
    bandwidth remaining percent 4
    dbl
 class class-default
    bandwidth remaining percent 25
    dbl
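As a rough model of how the output policy above sorts marked traffic into queues, here is a Python sketch that covers only the DSCP match statements. The CoS and ACL matches in the same class-maps are deliberately left out, so this is a simplification and not a description of the switch's hardware classification. Classes are listed in policy-map order because MQC checks classes top-down and the first match wins.

```python
# Sketch: first-match classification of a DSCP value against the DSCP-only
# part of AutoQos-4.0-Output-Policy. CoS and ACL matches are omitted.

DSCP = {
    "cs1": 8, "af11": 10, "af12": 12, "af13": 14,
    "cs2": 16, "af21": 18, "af22": 20, "af23": 22,
    "cs3": 24, "af31": 26, "af32": 28, "af33": 30,
    "cs4": 32, "af41": 34, "af42": 36, "af43": 38,
    "cs5": 40, "ef": 46, "cs6": 48, "cs7": 56,
}

# Same order as the policy-map; each entry is (class name, match-any DSCPs).
OUTPUT_POLICY = [
    ("AutoQos-4.0-Scavenger-Queue", ["cs1"]),
    ("AutoQos-4.0-Priority-Queue", ["ef", "cs5", "cs4"]),
    ("AutoQos-4.0-Control-Mgmt-Queue", ["cs7", "cs6", "cs3", "cs2"]),
    ("AutoQos-4.0-Multimedia-Conf-Queue", ["af41", "af42", "af43"]),
    ("AutoQos-4.0-Multimedia-Stream-Queue", ["af31", "af32", "af33"]),
    ("AutoQos-4.0-Trans-Data-Queue", ["af21", "af22", "af23"]),
    ("AutoQos-4.0-Bulk-Data-Queue", ["af11", "af12", "af13"]),
]


def output_class(dscp: int) -> str:
    """Return the first output class whose DSCP list includes dscp."""
    for name, markings in OUTPUT_POLICY:
        if dscp in (DSCP[m] for m in markings):
            return name
    return "class-default"  # anything unmatched, e.g. DSCP 0 (best effort)


for value in (46, 34, 24, 10, 0):
    print(value, "->", output_class(value))
```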

The actual interface configuration now shows as:
interface TenGigabitEthernet1/1
 auto qos trust 
 service-policy input AutoQos-4.0-Input-Policy
 service-policy output AutoQos-4.0-Output-Policy

Now let's look at the problems that arise when configuring Auto-QoS on an EtherChannel. Here are the configurations for the two physical interfaces, with Auto-QoS already applied, that we want to bundle into an EtherChannel:
interface TenGigabitEthernet1/15
 switchport mode trunk
 auto qos trust 
 service-policy input AutoQos-4.0-Input-Policy
 service-policy output AutoQos-4.0-Output-Policy
!
interface TenGigabitEthernet1/16
 switchport mode trunk
 auto qos trust 
 service-policy input AutoQos-4.0-Input-Policy
 service-policy output AutoQos-4.0-Output-Policy

This is what happens when attempting to configure the ports into the EtherChannel:
4500X(config)#int range te1/15-16
4500X(config-if-range)#channel-group 1 mode active 
% The attached policymap is not suitable for member either due to non-queuing actions or due to type of classmap filters.

TenGigabitEthernet1/15 is not added to port channel 1
% Range command terminated because it failed on TenGigabitEthernet1/15
4500X(config-if-range)#

The error IOS gave us says that either the 'non-queuing actions' or the type of class-map filters in the policy-map applied to the interface make it unsuitable for an EtherChannel member port. Because of this error, the EtherChannel configuration was not applied:
4500X(config-if-range)#do sh run | b 1/15
interface TenGigabitEthernet1/15
 switchport mode trunk
 auto qos trust 
 service-policy input AutoQos-4.0-Input-Policy
 service-policy output AutoQos-4.0-Output-Policy
!
interface TenGigabitEthernet1/16
 switchport mode trunk
 auto qos trust 
 service-policy input AutoQos-4.0-Input-Policy
 service-policy output AutoQos-4.0-Output-Policy

Ok, so what if we just apply the 'auto qos voip trust' command to the Port-Channel interface itself? Maybe that will work (nope):
4500X(config)#int po1
4500X(config-if)#auto ?            
% Unrecognized command

Damn, that would have been nice, Cisco (hint hint). Ok, what happens if we remove the policy-map configurations from the physical interfaces, add the interfaces to the EtherChannel, and then reapply Auto-QoS to the physical interfaces?
4500X(config-if)# int range te1/15-16
4500X(config-if-range)#auto?
% Unrecognized command
4500X(config-if-range)#a?  
aaa  access-expression  access-group  arp

Still no dice. What if we try to apply the policy-maps to the Port-Channel interface?
4500X(config-if-range)#int po1
4500X(config-if)# service-policy input AutoQos-4.0-Input-Policy
4500X(config-if)# service-policy output AutoQos-4.0-Output-Policy
% A service-policy with queuing actions can be attached in output direction only on physical ports.

4500X(config-if)#do sh run int po1
interface Port-channel1
 switchport
 switchport mode trunk
 service-policy input AutoQos-4.0-Input-Policy
end

Ok, so at least something was configured that time. We also received a different error, saying that the output policy-map has queuing actions, which are only supported on physical ports. Let's try applying only the output policy-map to the physical interfaces and leaving the input policy on the Port-Channel interface:
4500X(config-if)#int range te1/15-16
4500X(config-if-range)#service-policy output AutoQos-4.0-Output-Policy
% A service-policy with more than one  type of marking field based filters in the  class-map is not allowed on the channel member ports. 

Yet another error. Fun. Now there appears to be an issue with the class-maps themselves: several of them mix different types of match statements (CoS and DSCP values, plus ACLs) in a single class-map.
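Since the error points at class-maps that combine more than one type of match statement, here is a small Python sketch, a hypothetical helper rather than a Cisco tool, that scans class-map text and lists the ones mixing match kinds (cos, dscp, access-group). The sample is a trimmed excerpt of the Auto-QoS class-maps from this post, and the parser only understands the simple 'class-map' and 'match' lines shown here.

```python
# Sketch: flag class-maps that use more than one kind of match statement.
# Only "class-map ..." and "match ..." lines are understood.

SAMPLE = """\
class-map match-any AutoQos-4.0-Priority-Queue
  match cos  5
  match  dscp ef
  match  dscp cs5
  match  dscp cs4
class-map match-any AutoQos-4.0-Multimedia-Stream-Queue
  match  dscp af31
  match  dscp af32
  match  dscp af33
class-map match-any AutoQos-4.0-Trans-Data-Queue
  match cos  2
  match  dscp af21
  match access-group name AutoQos-4.0-ACL-Transactional-Data
"""


def match_types(config: str) -> dict[str, set[str]]:
    """Map each class-map name to the set of match kinds it uses."""
    kinds: dict[str, set[str]] = {}
    current = None
    for line in config.splitlines():
        words = line.split()  # also collapses the doubled spaces IOS prints
        if not words:
            continue
        if words[0] == "class-map":
            current = words[-1]  # the class-map name is the last token
            kinds.setdefault(current, set())
        elif words[0] == "match" and current is not None:
            kinds[current].add(words[1])  # "cos", "dscp", "access-group", ...
    return kinds


def mixed_class_maps(config: str) -> list[str]:
    """Class-maps that combine more than one kind of match statement."""
    return [name for name, k in match_types(config).items() if len(k) > 1]


print(mixed_class_maps(SAMPLE))
```

Run against the full Auto-QoS output, a check like this would point at the queuing class-maps that need to be rewritten with a single match type.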

To avoid dragging this troubleshooting scenario out any further, these are the EtherChannel QoS limitations you need to work around in the Auto-QoS generated policy:

  • Output policing needs to be configured on the Port-Channel interface, and separated from any queuing.
  • Output queuing needs to be configured on the physical interfaces.
  • The class-maps for the queuing policy-map can only have one type of match statement (i.e. an ACL, or matching on QoS tags) per class-map.
  • The policing policy-map cannot use the 'police cir percent' form; specify the CIR as an absolute rate in bps instead.
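The first two limitations amount to splitting one output policy into two. Here is a Python sketch of that split under stated assumptions: a policy is modeled as an ordered dict of class name to action lines, only the actions used in this post (priority, bandwidth, dbl, police) are recognized, and split_policy/render are hypothetical helpers, not IOS features. Percentage-based policers are rejected, mirroring the last limitation.

```python
# Sketch: split a combined output policy into a policing-only policy
# (Port-channel) and a queuing-only policy (member ports).

QUEUING_KEYWORDS = {"priority", "bandwidth", "dbl"}
POLICING_KEYWORDS = {"police"}

Policy = dict[str, list[str]]  # class name -> action lines, in policy order


def split_policy(policy: Policy) -> tuple[Policy, Policy]:
    """Return (policing-only, queuing-only) policies, preserving class order."""
    policing: Policy = {}
    queuing: Policy = {}
    for cls, actions in policy.items():
        for action in actions:
            words = action.split()
            if words[0] in POLICING_KEYWORDS:
                # Percentage-based policers were not accepted on the Port-channel.
                if "percent" in words:
                    raise ValueError(f"{cls}: '{action}' needs an absolute CIR")
                policing.setdefault(cls, []).append(action)
            elif words[0] in QUEUING_KEYWORDS:
                queuing.setdefault(cls, []).append(action)
            else:
                raise ValueError(f"{cls}: don't know where '{action}' belongs")
    return policing, queuing


def render(name: str, policy: Policy) -> str:
    """Format a modeled policy back into IOS-style policy-map text."""
    lines = [f"policy-map {name}"]
    for cls, actions in policy.items():
        lines.append(f" class {cls}")
        lines.extend(f"    {action}" for action in actions)
    return "\n".join(lines)


# Hypothetical, trimmed combined policy with an absolute CIR already chosen.
combined: Policy = {
    "PRIORITY-QUEUE": ["priority", "police cir 2000000000"],
    "CONTROL-MGMT-QUEUE": ["bandwidth remaining percent 10"],
    "TRANSACTIONAL-DATA-QUEUE": ["bandwidth remaining percent 10", "dbl"],
    "class-default": ["bandwidth remaining percent 25", "dbl"],
}

policing, queuing = split_policy(combined)
print(render("OUTPUT-PRIORITY-POLICING-EC", policing))   # for the Port-channel
print(render("OUTPUT-QUEUING-NOPOLICING-EC", queuing))   # for each member port
```

Keeping the priority command with the queuing half and the police command with the policing half is the same split the working configuration in this post uses.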

The following is a functional QoS policy I adapted for use on EtherChannels. I tried to match the SRND 4.0 configs closely, and most of it comes straight from them:
class-map match-any MULTIMEDIA-STREAMING-QUEUE
  match  dscp af31  af32  af33 
class-map match-any CONTROL-MGMT-QUEUE
  match  dscp cs7 
  match  dscp cs6 
  match  dscp cs3 
  match  dscp cs2
class-map match-any TRANSACTIONAL-DATA-QUEUE
  match  dscp af21  af22  af23 
class-map match-any SCAVENGER-QUEUE
  match  dscp cs1 
class-map match-any MULTIMEDIA-CONFERENCING-QUEUE
  match  dscp af41  af42  af43 
class-map match-any BULK-DATA-QUEUE
  match  dscp af11  af12  af13 
class-map match-any PRIORITY-QUEUE
  match  dscp ef 
  match  dscp cs5 
  match  dscp cs4 

! The default Auto-QoS output policy polices the priority queue at 30%. However, in this scenario with the 4500-X and 10Gig interfaces, there isn't a need for 6Gig (30% of the 20G aggregate) to be allocated for voice and video traffic. The 2Gig in the example below is 10% of the EtherChannel aggregate bandwidth (20G). Adjust accordingly for your needs.

policy-map OUTPUT-PRIORITY-POLICING-EC
 class PRIORITY-QUEUE
    police cir 2000000000

policy-map OUTPUT-QUEUING-NOPOLICING-EC
 class PRIORITY-QUEUE
    priority
 class CONTROL-MGMT-QUEUE
    bandwidth remaining percent 10
 class MULTIMEDIA-CONFERENCING-QUEUE
    bandwidth remaining percent 10
 class MULTIMEDIA-STREAMING-QUEUE
    bandwidth remaining percent 10
 class TRANSACTIONAL-DATA-QUEUE
    bandwidth remaining percent 10
    dbl
 class BULK-DATA-QUEUE
    bandwidth remaining percent 4
    dbl
 class SCAVENGER-QUEUE
    bandwidth remaining percent 1
 class class-default
    bandwidth remaining percent 25
    dbl

interface Port-channel1
 switchport
 switchport mode trunk
 service-policy input AutoQos-4.0-Input-Policy
 service-policy output OUTPUT-PRIORITY-POLICING-EC

interface TenGigabitEthernet1/15
 switchport mode trunk
 channel-group 1 mode active
 service-policy output OUTPUT-QUEUING-NOPOLICING-EC
!
interface TenGigabitEthernet1/16
 switchport mode trunk
 channel-group 1 mode active
 service-policy output OUTPUT-QUEUING-NOPOLICING-EC
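Because the Port-channel policer has to be given as an absolute rate, the percentage has to be converted by hand. Here is a minimal Python sketch of that arithmetic, assuming two 10G members (20G aggregate) as in this post. The burst helper only converts the 33 ms window from the generated Auto-QoS policy into bytes at a given CIR; it makes no claim about how the 4500-X interprets an explicit bc value.

```python
# Sketch: convert a percentage-based priority-queue policer into an
# absolute CIR for the Port-channel, plus the bytes a time window represents.


def police_cir_bps(percent: int, members: int, member_bps: int) -> int:
    """CIR in bits/sec for `percent` of the EtherChannel aggregate bandwidth."""
    return members * member_bps * percent // 100


def burst_bytes(cir_bps: int, window_ms: int) -> int:
    """Bytes sent at cir_bps over window_ms milliseconds."""
    return cir_bps * window_ms // (8 * 1000)


TEN_GIG = 10_000_000_000

default_cir = police_cir_bps(30, 2, TEN_GIG)  # Auto-QoS default of 30%
chosen_cir = police_cir_bps(10, 2, TEN_GIG)   # the 10% used in the config above

print(f"30% of 20G: police cir {default_cir}")
print(f"10% of 20G: police cir {chosen_cir}")
print(f"33 ms at {chosen_cir} bps = {burst_bytes(chosen_cir, 33)} bytes")
```

The 10% result, 2000000000, is the value used in OUTPUT-PRIORITY-POLICING-EC; change the percent or member count to match your own bundle.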

CCIE RS Written Exam - Passed (Again)

Since my last post reviving this blog, I have been studying for the CCIE RS Written exam again, both to recertify my existing Professional-level certifications and to qualify to attempt the lab again. The good news is that I passed the written exam a week ago! I have some renewed interest in going for the IE, so we'll see how the studies go over the next few months.

In other news, I have been exploring other blog hosts to get better design and layout options. Wix.com seemed like a great alternative to Blogger, but their 'blog' widget is still a work in progress. Hopefully soon!

Thursday, June 27, 2013

Blog Update

So it's hard to believe, but it's been 2 years since I last posted here. Lots of things have happened since then - professional and personal.

I did end up taking the CCIE RS Lab in November of 2011, but unfortunately did not pass. The Troubleshooting section was very difficult, even for someone who considers themselves skilled in the art of tshoot, but the Config section was very reasonable. I've been pecking at re-studying to retake the lab, but I've been lead engineer on a couple of large data center upgrades that have consumed a LOT of my time.

Almost daily, I weigh the benefits of committing several months or more to studying for this cert against using that time to learn other things (security/wireless/DC, programming, Linux). Now, with the prospects for SDN looking very strong, I'm starting to see a decline in the value of the CCIE (for me personally).

Back in May of 2013, I purchased the lah.io domain a few hours after Google announced it was promoting the status of the .io TLD for search (to the same level as .com, etc.), and I was able to pick up this 3-character domain out of the last hundred or two still available (lucked out!).

I have a bit of renewed interest in blogging; however, posts will focus more on day-to-day technology and interesting bits learned from my consulting job (for a major Cisco partner).

-Mark