I was doing a clear out of some paperwork and came across these write performance tests, which I did a few years back on an HP DL380 G5 with a P400 RAID controller.

  • 4 drive RAID10 = 144MB/s (~72MB/s per drive)
  • 4 drive RAID0 = 293MB/s (~73MB/s per drive)
  • 8 drive RAID0 = 518MB/s (~64MB/s per drive)
  • 8 drive RAID5 = 266MB/s (~38MB/s per drive)
  • 8 drive RAID6 = 165MB/s (~28MB/s per drive)
  • 8 drive RAID10 = 289MB/s (~72MB/s per drive)

The “per drive” figure is the data being written per active data drive, excluding any RAID overhead.
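For example, an 8-drive RAID6 array has six drives' worth of data capacity (two drives' worth goes to parity), so 165MB/s ÷ 6 ≈ 28MB/s per drive, while an 8-drive RAID10 array has four data drives, so 289MB/s ÷ 4 ≈ 72MB/s per drive.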

I inferred from the above that the P400 RAID controller maxes out at around 518MB/s, as it is unable to saturate eight drives in a RAID0 array (64MB/s versus ~72MB/s per drive). I'm not sure whether this is a controller throughput limitation or a PCIe bus limitation.

Either way, this testing was (I think) done on Ubuntu 10.x or 12.x with pretty bog-standard settings, using a simple command such as:

dd if=/dev/zero of=/dev/cciss/c0d0 bs=1024k
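
If I were rerunning a test like this today, I would probably bound the run and use direct I/O so the figure is not skewed by the page cache – something along these lines (the count is just illustrative, and note that this writes destructively to the raw array device):

dd if=/dev/zero of=/dev/cciss/c0d0 bs=1024k count=10240 oflag=direct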

I thought I’d capture these figures here since I have nowhere else to save them.

This is just a short post about an annoying issue I encountered today while updating my automated Ubuntu installer for Ubuntu 14.04 (Trusty Tahr). I have a PXE-based network boot process to automatically install and configure Ubuntu server instances: the server to be installed PXE boots and then fetches the installation files and preseed configuration file via HTTP. This worked well for prior LTS releases. I don’t bother automating non-LTS releases as their lifespan is far too short for “production” use.

I updated the install server with the new 14.04 server images (AMD64 and i386), updated my standard preseed configuration file with the new paths to 14.04 and set off a server installation. Unfortunately, not long into the installation process an error message was displayed. The title of the message was “Install the system” and the error read: “Installation step failed. An installation step failed. You can try to run the failing item again from the menu, or skip it and choose something else. The failing step is: Install the system”.

Error during a netboot install of Ubuntu 14.04

Not terribly useful, if I do say so myself. Looking at the installation log on VTY-4 (accessed via ALT-F4 on the console), I saw messages about “main menu: INFO: Menu item ‘live-installer’ selected” followed by “base-installer: error: Could not find any live images”. Again, not very useful.

To cut a long story short, after much time on Google I found the solution. The way the Ubuntu base system is installed seems to have changed with Ubuntu 12.10 (Quantal Quetzal): rather than installing individual packages initially, a preconfigured base filesystem is now deployed. This is contained in a file called “filesystem.squashfs”, located at “/install/filesystem.squashfs” on the installation media. When installing via the network (in some situations, at least), you need to configure the preseed file to fetch this “default” filesystem over the network. This is done by adding the “d-i live-installer/net-image” option to your preseed file, as in the following line:

d-i live-installer/net-image string http://10.1.1.2/trusty-server-amd64/install/filesystem.squashfs

where 10.1.1.2 is your network installation server and /trusty-server-amd64 is the location of the installation media on that webserver.
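
For context, the relevant part of my preseed file now looks something like this (the mirror lines are illustrative only; the IP address and directories are the example values above and will differ in your environment):

d-i mirror/country string manual
d-i mirror/http/hostname string 10.1.1.2
d-i mirror/http/directory string /ubuntu
d-i live-installer/net-image string http://10.1.1.2/trusty-server-amd64/install/filesystem.squashfs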

Once that is in place, you’re good to go! As I said before, this is only necessary since Ubuntu 12.10, so anyone upgrading their installation infrastructure from 12.04 LTS to 14.04 LTS may need to be aware of it. There is surprisingly little reference to this on the Internet. Do not many people install over the network in isolated install networks?


I thought I’d write a quick post on using Linux and Quagga (zebra, ospfd and bgpd) in place of a Cisco router. Given how expensive data-centre space and power are, I thought I would evaluate using a Linux server in place of a Cisco router; a further benefit is that it is possible to use a Linux VM for this, if one so wishes. Now, how viable or wise this is depends on one’s circumstances, so I don’t expect everyone to agree – it raises similar points to the debate around the wisdom of virtualising a VMware Virtual Center server instance. Every deployment scenario is different, and the various requirements and their associated costs, benefits and risks need to be evaluated.

Onward. Technically, yes – a Linux server coupled with Quagga can perform a similar routing function to a Cisco router. I did a functional test in a lab where I swapped out a border BGP/OSPF-speaking router for a Linux VM running Quagga. I did it in steps: first I added the Linux box into the existing OSPF area 0 and checked that routes were sent and received. I then added the BGP component (iBGP initially) and checked for route propagation, which worked. All was looking good: BGP and OSPF routes were being exchanged, and traffic was being routed (via routes advertised out over OSPF) to a newly created test subnet behind the Linux router.
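
For reference, getting the Linux box speaking OSPF in area 0 only needs a minimal ospfd configuration, roughly along these lines (the router-id, interface and prefix are placeholders rather than the lab’s actual addressing):

! /etc/quagga/ospfd.conf – minimal sketch
router ospf
 ospf router-id 192.0.2.1
 network 192.0.2.0/24 area 0.0.0.0
 passive-interface eth1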

So I did some configuration work in preparation for the eBGP peering. I added the peering /30 subnet to a VMNIC (without bringing the interface up in the VM or on the host – nothing like double protection), and added said subnet to the OSPF configuration so that the next-hop address would be propagated once the interface came up. I also added the eBGP peer’s configuration to the bgpd instance, including a “shutdown” statement.
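
The pre-staged eBGP neighbour in bgpd.conf looked roughly like this (the AS numbers, peer address and route-map names are placeholders; the key part is the “shutdown” statement keeping the peer down until cut-over):

! /etc/quagga/bgpd.conf – pre-staged eBGP peer, kept shut down
router bgp 64512
 neighbor 198.51.100.1 remote-as 64513
 neighbor 198.51.100.1 description upstream-peer
 neighbor 198.51.100.1 shutdown
 neighbor 198.51.100.1 route-map UPSTREAM-IN in
 neighbor 198.51.100.1 route-map UPSTREAM-OUT out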

At this point, I figured all that would be needed to switch over (sketched with example commands after the list) would be to:

  • shutdown the eBGP peer on the Cisco router
  • shutdown the peering interface on the Cisco router
  • up the peering interface on the Linux router
  • no “shutdown” the eBGP peer on the Linux router

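In other words, something like the following on each side (the AS number, peer address and interface names are placeholders):

! on the Cisco router
configure terminal
 router bgp 64512
  neighbor 198.51.100.1 shutdown
 interface GigabitEthernet0/1
  shutdown
 end

# on the Linux router
ip link set eth1 up
vtysh -c 'configure terminal' -c 'router bgp 64512' -c 'no neighbor 198.51.100.1 shutdown'
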
I had double-checked the iBGP and eBGP configuration, including route-maps, prefix lists, etc. on the Linux router, so I was pretty confident of success. At this point I nearly went ahead, but then remembered the ACLs on the eBGP peering interface. Right – I ought to migrate them too.

Clickety click: a copy and paste and an import into fwbuilder. Hmm – some slight anomalies with the import from an IOS access-list into the iptables-equivalent rules. I ended up needing to go line by line through the ACL to ensure that each generated fwbuilder rule was correct and efficient. Some tweaking of the inbound/outbound/both parameters and the addition of the generic “any/any both on loopback” rule, and we looked to be in business.

The next trick was to make the ruleset stateless, since we have asymmetric routing due to the BGP peering – no simple single in-and-out default route here. So I went through and marked all the rules as “stateless”. I also ensured that the option to allow ESTABLISHED and RELATED packets was ticked in the fwbuilder GUI. I pushed the ruleset to the Linux box and promptly broke the OSPF routing. No real harm – update the policy and republish it. EEK – with the OSPF loopback address no longer being announced, I could not easily push the corrected ruleset. Two options: disable the firewall rules, let OSPF catch up and then change the ruleset, or point fwbuilder at a different IP address. I chose the former 🙂 So, clickety click, I added OSPF/IGMP to the ruleset and pushed the rules. OSPF looked good and routing to the test subnet worked too. Success.
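
The OSPF/IGMP addition amounts to something like the following in raw iptables terms (a sketch of the idea rather than fwbuilder’s literal output):

# permit OSPF (IP protocol 89) and IGMP to and from the router itself
iptables -A INPUT  -p ospf -j ACCEPT
iptables -A OUTPUT -p ospf -j ACCEPT
iptables -A INPUT  -p igmp -j ACCEPT
iptables -A OUTPUT -p igmp -j ACCEPT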

Righto – check the firewall logs: nothing unexpected is being dropped. So I decided to press forward, following the four switch-over steps above. I saw the route withdrawal and the subsequent announcement. Looking good. The lab has asymmetric BGP routing (two eBGP peers, with different routes out via each but a single route in – configured to mimic a real-life transit, partial and private peering setup), and this is what caused the next two problems that needed to be resolved.

Firstly, the routing all looked to be correct, yet connectivity was only partially working. By this I mean that only some hosts on the “inside” could communicate with some hosts on the outside. At first I thought I must have muddled some firewall rules, so I turned the firewall off, but the problems persisted. Then I remembered reverse path filtering on Linux…

To see the filters’ state you can use:

sysctl -a | grep -F .rp_filter

So a quick

for i in /proc/sys/net/ipv4/conf/*/rp_filter
do
    echo 0 > $i
done

got things working, but with IPTABLES still disabled. Traceroutes looked correct and some telnet-initiated connections to various ports were working as expected. I enabled the firewall rules and once more some connections broke. Hmm…
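
That loop only changes the running state; to make the change survive a reboot the equivalent settings would go into /etc/sysctl.conf (or a file under /etc/sysctl.d/), something like:

# disable reverse path filtering - needed for asymmetric routing
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0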

Connections going in and out of the Linux router worked, but connections being routed asymmetrically were having issues. I knew I’d enabled “ESTABLISHED” connections, but this did not seem to be working. A quick investigation revealed that the definition of ESTABLISHED in Cisco IOS access-list terms is different from that in IPTABLES rule terms. In Cisco-land it means any packet with the ACK or RST flag set, but in IPTABLES-land it means packets belonging to connections where both a SYN and a SYN-ACK packet have been seen (i.e. IPTABLES’ ESTABLISHED is stateful, whereas Cisco’s is stateless). I had thought I’d covered this by making the rules stateless – but apparently not.

Anyway, I created two TCP “services” – one matching the ACK flag and one matching the RST flag – and added these to the ruleset. What do you know: this mimics the Cisco “established” behaviour and allowed traffic to flow as expected. I performed some more testing, announced and withdrew some routes, and all seemed OK.
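
In raw iptables terms the two “services” boil down to rules roughly like these (the chain and the lack of interface/address matches are illustrative; the point is matching the flags the way a Cisco “established” ACL does):

# mimic Cisco's "established" keyword: permit TCP packets with ACK or RST set
iptables -A FORWARD -p tcp --tcp-flags ACK ACK -j ACCEPT
iptables -A FORWARD -p tcp --tcp-flags RST RST -j ACCEPT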

I then switched out the Linux box and put the Cisco back in its original place in the network.

I didn’t test the combination of OSPF and keepalived with redundant Linux routers (which would be needed for a Cisco HSRP equivalent) in this instance; I’ve used keepalived previously and it works well. I did not do any performance testing either, but I expect a Linux router would certainly be sufficient to replace some of the smaller Cisco products – 8xx, 18xx, 19xx, 28xx, 29xx and possibly 38xx/39xx or higher – depending on its configuration.

Care needs to be taken when using asymmetric routing with your Linux router: neither rp_filter nor IPTABLES copes well with it, and rp_filter is the easier of the two to fix – IPTABLES rules will need to be carefully tuned. If using Linux routers, my recommendation would be to do only INPUT/OUTPUT filtering on the border Linux routers and then bring traffic into a “core” firewall failover cluster built with keepalived and conntrackd. This would allow a proper set of stateful rules to be put in place on a highly available firewall.

Something like this would be my recommendation as a starting point. Obviously, this is just one example of what is possible.

Linux HA network diagram


The firewalls could be made redundant using keepalived and VRRP addresses, or possibly using Quagga and advertised routes with specific preferences to ensure one firewall is primary and the other the failover. As always, any redundant/HA setup should be fully tested to ensure that all failure modes – or as many as practically possible – result in the desired backup state.
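
As a flavour of the keepalived side of such a setup, a minimal VRRP instance on the primary firewall might look like this (the interface, virtual router ID, priority and virtual address are all placeholder values; the backup node would carry a lower priority):

# /etc/keepalived/keepalived.conf - primary firewall, minimal sketch
vrrp_instance FW_INSIDE {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 150
    advert_int 1
    virtual_ipaddress {
        10.1.1.254/24
    }
}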

So – in summary – Linux, Quagga and IPTABLES could be a good fit in certain organisations or situations. There are many things to factor in when deciding on your routing platform – cost, performance, features, maintenance, skill sets, upgradeability, reuse, supportability, recoverability, reliability, etc. – however, I do believe that Linux coupled with the tools mentioned offers a compelling option these days.


Last night I was upgrading some ESX 3.5 VMs from “flexible” NICs to “VMXNET Enhanced” NICs and ran into a problem with one of the servers. As an aside, doing some rudimentary throughput testing with iperf, I was surprised by the significant throughput increase and CPU usage decrease when switching from flexible (i.e. VMXNET) to VMXNET Enhanced (i.e. VMXNET2) NICs. Well worth doing – apart from the gotcha I ran into…

The particular VM which showed the problem runs Quagga (a BGP daemon) to inject routes for a local AS112 server. Once I switched over the NICs, the remote Cisco router started logging the following errors:

  • %TCP-6-BADAUTH: No MD5 digest from
  • %TCP-6-BADAUTH: Invalid MD5 digest from

and would not bring up the BGP peering session. BGP MD5 authentication is enabled between the router and the Quagga daemon. The use of “debug ip tcp transactions” also showed the invalid MD5 signatures.
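
For context (not the actual lab configuration), BGP MD5 authentication is just a shared secret configured on each end, along these lines – the addresses, AS numbers and password here are placeholders:

! Cisco side
router bgp 64512
 neighbor 192.0.2.2 password s3cr3tpass

! Quagga side (bgpd.conf)
router bgp 64513
 neighbor 192.0.2.1 password s3cr3tpass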

I initially suspected it was related to an offload checksumming issue which I had previously observed (http://communities.vmware.com/thread/250159). It turned out not to be a case of incorrect TCP checksums being calculated (the TCP analogue of that post’s incorrect UDP checksums).

Digging a bit deeper, I came across a post on the quagga-users mailing list describing a similar problem to the one I was observing. The use of TCP MD5 signatures “complicates” the process of offloading checksumming to NICs.

The VMXNET Enhanced NIC enables additional offloading from the VM to the NIC to improve performance. This works well for many use cases but causes problems when using TCP MD5 signatures. In my case, turning off TCP segmentation offload worked around the problem. Adding commands such as

ethtool -K eth0 tso off
ethtool -K eth0 sg off

to a startup script has worked around the issue to some degree. In an ideal world, the VMXNET driver would allow “tx-checksumming” to be turned off using ethtool as well.

In fairness to VMware on this one, the issue appears not to be specific to virtual machines and may in fact also be observed on physical hardware with NICs providing offload functions.

Having used the above two ethtool commands to allow the BGP session to be established, I still continue to see the following errors on the Cisco router:

TCP0: bad seg from x.x.x.x -- no MD5 string: port 179 seq
%TCP-6-BADAUTH: No MD5 digest from
%TCP-6-BADAUTH: Invalid MD5 digest from


Interestingly, when using the flexible (and hence original VMXNET) vNIC within the VM – oddly, with the same actual VMXNET driver binary! – tx-checksumming and scatter-gather are enabled for the NIC:

$ ethtool -k eth0
Offload parameters for eth0:
Cannot get device rx csum settings: Operation not supported
rx-checksumming: off
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off
$

For the record, all of the above is using the latest ESX 3.5 VMware Tools build, 317866, from VMwareTools-3.5.0-317866.tar.gz.


Doing a tcpdump (from a physical server) of the traffic between the router and the VM reveals that packets which leave the VM with a valid MD5 signature (as seen by tcpdump within the VM) arrive on the wire with an invalid MD5 signature (checked with “tcpdump -s0 -M md5password port 179”). This indicates that VMware ESXi may in fact be altering the packets in some way between the VM and the wire, which invalidates the MD5 signature 🙁

For now, some errors with an established session are better than lots of errors and no BGP session – but ideally there would be no errors being logged by the Cisco router at all.