Lose ICMP intermittently to one IP address through VPN

Issues related to configuring your network
daikamar
Posts: 4
Joined: 2014/10/06 15:22:33

Post by daikamar » 2014/10/06 16:05:00

Hello,

I've been troubleshooting this problem for weeks now with no success. We've been intermittently losing ICMP (ping) to a single IP address across our IPsec VPN, and we can't find out why. Here is our setup, followed by a more detailed explanation of the problem:

Setup

Left Network (172.27.1.0/24) <-> SonicWALL 8500 <-> Juniper Firewall <-> Internet <-> Right Network (Multiple Client networks)

The Left Network is small and consists of 2 CentOS 6 servers, each with 2 NICs: one on the 172.27.1.0 network and the other on the local network (192.168.1.0).
The Right Network at each site consists of a single CentOS 5 server that is also the Right side VPN endpoint, with 2 NICs: one on the VPN network and one on the local network (192.168.10.0).
- The Right side VPN networks use a few different subnets depending on the site. For the examples here I'll use 192.168.24.0/24.

The VPN is IPsec IKEv2 with certificate-based authentication. We run our own CA and sign and issue all the certificates used in the configuration.

On the CentOS 5 machines we're using Openswan 2.6.32-5 with NSS for the certificates. Here is a sample config file:

ipsec.conf

# /etc/ipsec.conf - Libreswan IPsec configuration file

# This file:  /etc/ipsec.conf
#
# Enable when using this configuration file with openswan instead of libreswan
#version 2
#
# Manual:     ipsec.conf.5

# basic configuration
config setup
        # which IPsec stack to use, "netkey" (the default), "klips" or "mast".
        # For MacOSX use "bsd"
        protostack=netkey
        #
        # The interfaces= line is only required for the klips/mast stack
        #interfaces="%defaultroute"
        #interfaces="ipsec0=eth0 ipsec1=ppp0"
        #
        # If you want to limit listening on a single IP - not required for
        # normal operation
        #listen=127.0.0.1
        #
        # Do not set debug options to debug configuration issues!
        #
        # plutodebug / klipsdebug = "all", "none" or a combination from below:
        # "raw crypt parsing emitting control kernel pfkey natt x509 dpd
        #  private".
        # Note: "crypt" is not included with "all", as it can show confidential
        #       information. It must be specifically specified
        # examples:
        # plutodebug="control parsing"
        # plutodebug="all crypt"
        plutodebug="control"
        # Again: only enable plutodebug or klipsdebug when asked by a developer
        #plutodebug=none
        #klipsdebug=none
        #
        # Normally, pluto logs via syslog. If you want to log to a file,
        # specify below or to disable logging, eg for embedded systems, use
        # the file name /dev/null
        # Note: SElinux policies might prevent pluto writing to a log file at
        #       an unusual location.
        plutostderrlog=/var/log/pluto.log
        #
        # Enable core dumps (might require system changes, like ulimit -c)
        # This is required for abrtd to work properly
        # Note: SElinux policies might prevent pluto writing the core at
        #       unusual locations
        dumpdir=/var/run/pluto/
        #
        # NAT-TRAVERSAL support
        # exclude networks used on server side by adding %v4:!a.b.c.0/24
        # It seems that T-Mobile in the US and Rogers/Fido in Canada are
        # using 25/8 as "private" address space on their wireless networks.
        # This range has not been announced via BGP (at least up to 2010-12-21)
        nat_traversal=yes
        #virtual_private=%v4:10.0.0.0/8,%v4:192.168.0.0/16,%v4:172.16.0.0/12,%v4:25.0.0.0/8,%v4:100.64.0.0/10,%v6:fd00::/8,%v6:fe80::/10
        virtual_private=%v4:172.16.0.0/12,%v4:192.168.0.0/16,%v4:10.0.0.0/8,%v4:!192.168.24.6/32

# Add connections here

# For example connections, see your distribution's documentation directory,
# or the documentation which could be located at
#  /usr/share/docs/libreswan-3.*/ or look at https://www.libreswan.org/
#
# There is also a lot of information in the manual page, "man ipsec.conf"

# You may put your configuration (.conf) file in the "/etc/ipsec.d/" directory
# by uncommenting this line
include /etc/ipsec.d/*.conf
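
For reference, the virtual_private line above mixes allowed private ranges with one exclusion (%v4:!192.168.24.6/32). The ipsec.conf man page describes virtual_private as the networks a remote client behind NAT is allowed to use as its subnet, and the "!" prefix excludes a range. The rough Python sketch below models how that include/exclude list reads. The function names are hypothetical, only %v4 entries are handled, and treating any overlap with an excluded range as a rejection is my assumption, not pluto's exact matching rule.

```python
import ipaddress


def parse_virtual_private(spec):
    """Hypothetical parser for a virtual_private-style string.

    Returns (allowed, excluded) lists of IPv4 networks. Entries starting
    with "!" after the %v4: prefix are exclusions; %v6 entries are skipped
    in this sketch.
    """
    allowed, excluded = [], []
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry.startswith("%v4:"):
            continue  # IPv6 (%v6:) entries are ignored here
        net = entry[len("%v4:"):]
        if net.startswith("!"):
            excluded.append(ipaddress.ip_network(net[1:]))
        else:
            allowed.append(ipaddress.ip_network(net))
    return allowed, excluded


def subnet_permitted(subnet, allowed, excluded):
    """Assumption: a subnet is rejected if it overlaps any exclusion,
    otherwise accepted if it lies entirely inside an allowed range."""
    net = ipaddress.ip_network(subnet)
    if any(net.overlaps(ex) for ex in excluded):
        return False
    return any(net.subnet_of(a) for a in allowed)


spec = "%v4:172.16.0.0/12,%v4:192.168.0.0/16,%v4:10.0.0.0/8,%v4:!192.168.24.6/32"
allowed, excluded = parse_virtual_private(spec)
for candidate in ("192.168.24.6/32", "192.168.10.0/24", "8.8.8.0/24"):
    print(candidate, subnet_permitted(candidate, allowed, excluded))
```

Under this reading, 192.168.24.6/32 is carved out of the otherwise allowed 192.168.0.0/16 range.
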

site1.conf

conn site1
        ike=aes256-sha1;modp1024
        phase2=esp
        phase2alg=aes256-sha1;modp1024
        ikev2=insist
        pfs=yes
        salifetime=480m
        rekeymargin=9m
        rekeyfuzz=100%
        ikelifetime=28800s
        left=<our external IP address>
        leftid="<our cert CN>"
        leftsubnet=172.27.1.0/24
        leftnexthop=%defaultroute
        leftrsasigkey=%cert
        right=<client side external IP>
        rightsourceip=192.168.24.6
        rightid="<client cert CN>"
        rightsubnet=192.168.24.6/32
        rightnexthop=192.168.80.1
        rightrsasigkey=%cert
        rightcert=site1
        rightupdown="ipsec _updown --route yes"
        auto=start
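
For reference, the lifetime settings in site1.conf imply a fairly narrow rekey window. Per the rekeymargin and rekeyfuzz descriptions in the ipsec.conf man page, the margin is randomly increased by up to rekeyfuzz percent, and renegotiation starts that long before expiry. The Python sketch below works that out for salifetime=480m, rekeymargin=9m, rekeyfuzz=100%. ikelifetime=28800s is also 480 minutes, so assuming the same margin applies, the IKE SA falls in the same window. The function names are hypothetical, and this only models this end's timer; the SonicWALL side may rekey on its own schedule.

```python
import random


def rekey_window(lifetime_s, margin_s, fuzz_pct):
    """Return (earliest, latest) seconds after SA setup at which this end
    starts rekeying, assuming margin is increased by 0..fuzz_pct percent."""
    max_margin = margin_s * (1 + fuzz_pct / 100)
    if max_margin > lifetime_s:
        # The man page says the fuzzed margin must not exceed the lifetime.
        raise ValueError("fuzzed rekeymargin exceeds the SA lifetime")
    return lifetime_s - max_margin, lifetime_s - margin_s


def sample_rekey_time(lifetime_s, margin_s, fuzz_pct, rng=random):
    """One randomized rekey time drawn from inside the window above."""
    margin = margin_s * (1 + rng.uniform(0, fuzz_pct / 100))
    return lifetime_s - margin


# site1.conf: salifetime=480m, rekeymargin=9m, rekeyfuzz=100%
earliest, latest = rekey_window(480 * 60, 9 * 60, 100)
print(f"rekey starts between {earliest / 60:.0f} and {latest / 60:.0f} minutes")
# prints "rekey starts between 462 and 471 minutes"
```
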

Problem
We can connect and establish the VPN fine. The two servers on the 172.27.1.0 network can communicate with the server at 192.168.24.6 and vice versa. We use this VPN tunnel as a hop to reach devices on the client side local network via IP forwarding. For example:

VNC to a client local machine
A tech on a Windows machine (192.168.1.10) connects to a client Linux machine (192.168.10.10).
Server 1 (VPN IP 172.27.1.21) forwards traffic arriving on port 64905 (an arbitrarily chosen port) to the client server (Server 2) at 192.168.24.6, and SNATs it to 172.27.1.21.
Server 2 forwards traffic arriving on port 64905 to the client Linux machine (192.168.10.10) on port 5900, and SNATs it to 192.168.24.6.
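
The two hops above amount to a chain of DNAT plus SNAT rewrites. The Python sketch below is a toy model of that chain, not the actual iptables rules: Packet and forward_hop are hypothetical names, each hop matches on destination port only, and "<server1 LAN IP>" is a placeholder for Server 1's local address, which isn't given here.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    dport: int


def forward_hop(pkt: Packet, listen_port: int, to_ip: str, to_port: int,
                snat_ip: str) -> Optional[Packet]:
    """Toy model of one forwarding hop: if the destination port matches,
    rewrite the destination (DNAT) and the source (SNAT) so replies come
    back through this hop. Returns None when the rule does not match."""
    if pkt.dport != listen_port:
        return None
    return replace(pkt, src=snat_ip, dst=to_ip, dport=to_port)


# The tech's VNC client connects to Server 1 on port 64905.
pkt = Packet(src="192.168.1.10", dst="<server1 LAN IP>", dport=64905)
# Hop 1: Server 1 -> Server 2 across the VPN, SNAT to 172.27.1.21.
pkt = forward_hop(pkt, 64905, "192.168.24.6", 64905, "172.27.1.21")
# Hop 2: Server 2 -> client Linux machine's VNC port, SNAT to 192.168.24.6.
pkt = forward_hop(pkt, 64905, "192.168.10.10", 5900, "192.168.24.6")
print(pkt)  # Packet(src='192.168.24.6', dst='192.168.10.10', dport=5900)
```

Because each hop SNATs, the client Linux machine sees the connection coming from 192.168.24.6, and the tunnel itself only carries traffic between 172.27.1.21 and 192.168.24.6.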

This forwarding works and we don't have problems with it.

On the client side we run a background process that pings 172.27.1.21 to verify that the VPN is up and passing traffic. If the check fails for 3 minutes, we restart ipsec. This is where we first noticed the problem: we lose ICMP to 172.27.1.21 in chunks. If we ping that address from the client server, out of 1000 packets we lose about 15-20%, all in one contiguous block (e.g., sequence numbers 700-850 fail). It only happens with this IP address, and although the problem occurs on every client server, the packet loss does not show up on all of them at the same time.
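
To quantify the loss pattern from a long ping run, the Python sketch below groups missing sequence numbers into contiguous runs, and also models the 3-minute restart rule described above. Both function names are hypothetical; should_restart is a guess at the monitoring logic, not the actual script, and it assumes results arrive as (timestamp in seconds, success) pairs in time order.

```python
def loss_runs(sent, received):
    """Return contiguous (first, last) runs of sequence numbers that were
    sent but never answered. `sent` must be in sending order."""
    answered = set(received)
    runs = []
    start = prev = None
    for seq in sent:
        if seq in answered:
            if start is not None:
                runs.append((start, prev))  # close the current gap
                start = None
        else:
            if start is None:
                start = seq  # first lost packet of a new gap
            prev = seq
    if start is not None:
        runs.append((start, prev))
    return runs


def should_restart(results, window_s=180):
    """Return True once the check has failed continuously for at least
    window_s seconds (180 s matches the 3-minute rule above)."""
    first_failure = None
    for ts, ok in results:
        if ok:
            first_failure = None  # any success resets the timer
        elif first_failure is None:
            first_failure = ts  # start of a failure streak
        elif ts - first_failure >= window_s:
            return True
    return False


sent = list(range(1, 1001))
received = [s for s in sent if not 700 <= s <= 850]
runs = loss_runs(sent, received)
lost = sum(last - first + 1 for first, last in runs)
print(runs, f"{lost / len(sent):.1%} lost")  # [(700, 850)] 15.1% lost
```

With ping's default one-second interval, a 700-850 gap is 151 lost replies spanning roughly 2.5 minutes.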

tcpdump from the client server side during the problem:
11:10:11.108816 IP 172.27.1.21 > 192.168.24.6: ICMP echo request, id 14602, seq 119, length 64
11:10:12.108833 IP 172.27.1.21 > 192.168.24.6: ICMP echo request, id 14602, seq 120, length 64
11:10:13.108772 IP 172.27.1.21 > 192.168.24.6: ICMP echo request, id 14602, seq 121, length 64
11:10:14.108707 IP 172.27.1.21 > 192.168.24.6: ICMP echo request, id 14602, seq 122, length 64
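
To check a longer capture for echo requests that never got a reply, something like the Python sketch below could help. It assumes tcpdump's default one-line IPv4 output for ICMP echo, as in the lines above, and pairs requests with replies by (id, seq); the function name is made up.

```python
import re

# tcpdump's default one-line summary for IPv4 ICMP echo packets.
ICMP_ECHO = re.compile(
    r"IP \S+ > \S+: ICMP echo (?P<kind>request|reply), "
    r"id (?P<id>\d+), seq (?P<seq>\d+)"
)


def unanswered_echoes(lines):
    """Return sorted (id, seq) pairs for echo requests with no matching reply."""
    requests, replies = set(), set()
    for line in lines:
        m = ICMP_ECHO.search(line)
        if m is None:
            continue  # not an ICMP echo line
        key = (int(m["id"]), int(m["seq"]))
        (requests if m["kind"] == "request" else replies).add(key)
    return sorted(requests - replies)


capture = """\
11:10:11.108816 IP 172.27.1.21 > 192.168.24.6: ICMP echo request, id 14602, seq 119, length 64
11:10:12.108833 IP 172.27.1.21 > 192.168.24.6: ICMP echo request, id 14602, seq 120, length 64
11:10:13.108772 IP 172.27.1.21 > 192.168.24.6: ICMP echo request, id 14602, seq 121, length 64
11:10:14.108707 IP 172.27.1.21 > 192.168.24.6: ICMP echo request, id 14602, seq 122, length 64
"""
print(unanswered_echoes(capture.splitlines()))
# [(14602, 119), (14602, 120), (14602, 121), (14602, 122)]
```
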

The SonicWALL 8500 packet sniffer shows that it never receives the ICMP packet, and a Wireshark capture on the switch shows that the ICMP/ESP packet never leaves the client CentOS 5 server.
Pinging in the other direction (172.27.1.21 to 192.168.24.6), we can trace the packet through the 8500 and into the tcpdump on the CentOS 5 machine, yet we never see the reply.

Troubleshooting
We've tried several fixes, all to no avail, and even tracking the problem down this far has been difficult. We've tried different switches, different ipsec configs, and replacing Openswan with Libreswan 3.10-1. The log files don't seem to show anything relevant (I'll post anything you want to see, though). We're at our wits' end at this point.

Any assistance anyone may have would be greatly appreciated.

Thanks!

TrevorH
Site Admin
Posts: 33202
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: Lose ICMP intermittently to one IP address through VPN

Post by TrevorH » 2014/10/07 10:18:25

How do you know this is a VPN problem and not a general problem with the network on that CentOS 5 machine? What Ethernet card is installed?

daikamar
Posts: 4
Joined: 2014/10/06 15:22:33

Re: Lose ICMP intermittently to one IP address through VPN

Post by daikamar » 2014/10/07 11:18:17

I don't know that this is a VPN problem; I just wanted to give full details of our configuration. While it could be the network on the CentOS 5 machine, I think a problem with the OS on the machines themselves is more likely, since there are about 200 of these machines, all in different locations.

This is the NIC installed on these machines:

NetXtreme II BCM5716 Gigabit Ethernet
