I've been troubleshooting this problem for weeks now with no success. We've been losing ICMP ping to a single IP on our IPSEC VPN and we can't seem to find out why. Here is our setup followed by a more detailed explanation of the problem:
Setup
Left Network (172.27.1.0/24) <-> SonicWALL 8500 <-> Juniper Firewall <-> Internet <-> Right Network (Multiple Client networks)
The Left Network is small and consists of 2 CentOS 6 servers (2 NICs each: one on the 172.27.1.0 network and the other on the local network, 192.168.1.0)
The Right Network at each site consists of a single CentOS 5 server that is also the Right side VPN endpoint (2 NICs: one on the VPN network and one on the local network, 192.168.10.0)
- Right side VPN networks cover a couple of different subnets depending on the site. For the examples here I'll use 192.168.24.0/24
The VPN is IPSEC IKEv2 using Certificate based authentication. We have our own CA and sign and issue all the certs in the configuration.
On the CentOS 5 machines we're using OpenSwan 2.6.32-5 using NSS for the certs. Here is a sample config file:
ipsec.conf
# /etc/ipsec.conf - Libreswan IPsec configuration file
# This file: /etc/ipsec.conf
#
# Enable when using this configuration file with openswan instead of libreswan
#version 2
#
# Manual: ipsec.conf.5
# basic configuration
config setup
    # which IPsec stack to use, "netkey" (the default), "klips" or "mast".
    # For MacOSX use "bsd"
    protostack=netkey
    #
    # The interfaces= line is only required for the klips/mast stack
    #interfaces="%defaultroute"
    #interfaces="ipsec0=eth0 ipsec1=ppp0"
    #
    # If you want to limit listening on a single IP - not required for
    # normal operation
    #listen=127.0.0.1
    #
    # Do not set debug options to debug configuration issues!
    #
    # plutodebug / klipsdebug = "all", "none" or a combation from below:
    # "raw crypt parsing emitting control kernel pfkey natt x509 dpd
    # private".
    # Note: "crypt" is not included with "all", as it can show confidential
    # information. It must be specifically specified
    # examples:
    # plutodebug="control parsing"
    # plutodebug="all crypt"
    plutodebug="control"
    # Again: only enable plutodebug or klipsdebug when asked by a developer
    #plutodebug=none
    #klipsdebug=none
    #
    # Normally, pluto logs via syslog. If you want to log to a file,
    # specify below or to disable logging, eg for embedded systems, use
    # the file name /dev/null
    # Note: SElinux policies might prevent pluto writing to a log file at
    # an unusual location.
    plutostderrlog=/var/log/pluto.log
    #
    # Enable core dumps (might require system changes, like ulimit -C)
    # This is required for abrtd to work properly
    # Note: SElinux policies might prevent pluto writing the core at
    # unusual locations
    dumpdir=/var/run/pluto/
    #
    # NAT-TRAVERSAL support
    # exclude networks used on server side by adding %v4:!a.b.c.0/24
    # It seems that T-Mobile in the US and Rogers/Fido in Canada are
    # using 25/8 as "private" address space on their wireless networks.
    # This range has not been announced via BGP (at least upto 2010-12-21)
    nat_traversal=yes
    #virtual_private=%v4:10.0.0.0/8,%v4:192.168.0.0/16,%v4:172.16.0.0/12,%v4:25.0.0.0/8,%v4:100.64.0.0/10,%v6:fd00::/8,%v6:fe80::/10
    virtual_private=%v4:172.16.0.0/12,%v4:192.168.0.0/16,%v4:10.0.0.0/8,%v4:!192.168.24.6/32
# Add connections here
# For example connections, see your distribution's documentation directory,
# or the documentation which could be located at
# /usr/share/docs/libreswan-3.*/ or look at https://www.libreswan.org/
#
# There is also a lot of information in the manual page, "man ipsec.conf"
# You may put your configuration (.conf) file in the "/etc/ipsec.d/" directory
# by uncommenting this line
include /etc/ipsec.d/*.conf
Connection definition:
conn site1
    ike=aes256-sha1;modp1024
    phase2=esp
    phase2alg=aes256-sha1;modp1024
    ikev2=insist
    pfs=yes
    salifetime=480m
    rekeymargin=9m
    rekeyfuzz=100%
    ikelifetime=28800s
    left=<our external IP address>
    leftid="<our cert CN>"
    leftsubnet=172.27.1.0/24
    leftnexthop=%defaultroute
    leftrsasigkey=%cert
    right=<client side external IP>
    rightsourceip=192.168.24.6
    rightid="<client cert CN>"
    rightsubnet=192.168.24.6/32
    rightnexthop=192.168.80.1
    rightrsasigkey=%cert
    rightcert=site1
    rightupdown="ipsec _updown --route yes"
    auto=start
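As a rough sketch of what the lifetime settings above imply, here is the rekey window worked out in Python. It assumes the rekeymargin/rekeyfuzz semantics described in the ipsec.conf man page (rekeying starts rekeymargin before expiry, and rekeyfuzz randomly increases that margin by up to the given percentage). The function name rekey_window is made up for illustration and is not part of any IPsec tool.

```python
def rekey_window(lifetime_s, margin_s, fuzz_pct):
    """Return (earliest, latest) seconds after SA setup at which rekeying may start.

    Assumption: the margin is randomly increased by up to fuzz_pct percent,
    as described for rekeymargin/rekeyfuzz in the ipsec.conf man page.
    """
    max_margin = margin_s * (1 + fuzz_pct / 100.0)
    return lifetime_s - max_margin, lifetime_s - margin_s


# Values from the conn above: salifetime=480m, rekeymargin=9m, rekeyfuzz=100%
earliest, latest = rekey_window(480 * 60, 9 * 60, 100)
print("rekey starts between %d s and %d s after the SA comes up" % (earliest, latest))
```

Note that salifetime=480m and ikelifetime=28800s are the same duration (8 hours), so under these assumptions the IKE SA and the IPsec SA would both come up for rekeying in roughly the same window.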
We connect and establish the VPN fine. The two servers on the 172.27.1.0 network can communicate with the server at 192.168.24.6 and vice versa. We use this VPN tunnel as a hop to reach devices on the client side local network via IP forwarding. For example:
VNC to a client local machine
Tech on a Windows machine (192.168.1.10) connecting to a client Linux machine (192.168.10.10)
- Server 1, with VPN IP 172.27.1.21, is configured to forward traffic arriving on port 64905 (randomly chosen) to the client server (Server 2) at 192.168.24.6, with SNAT to 172.27.1.21.
- Server 2 is configured to forward traffic arriving on port 64905 to the client Linux machine (192.168.10.10) on port 5900, with SNAT to 192.168.24.6.
This process works and we don't have any problems with it.
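To make the two forwarding hops concrete, here is a minimal sketch that prints the kind of NAT rules described above. It assumes the forwarding is done with iptables DNAT/SNAT rules over TCP, which the description does not actually state. The helper forward_rules is hypothetical, and the real rules on the servers may look different.

```python
def forward_rules(listen_port, dest_ip, dest_port, snat_ip, proto="tcp"):
    """Build iptables argument lists for one forwarding hop (assumed iptables NAT)."""
    # Rewrite the destination of traffic arriving on listen_port.
    dnat = ["iptables", "-t", "nat", "-A", "PREROUTING",
            "-p", proto, "--dport", str(listen_port),
            "-j", "DNAT", "--to-destination", "%s:%d" % (dest_ip, dest_port)]
    # Rewrite the source so replies come back through this host.
    snat = ["iptables", "-t", "nat", "-A", "POSTROUTING",
            "-p", proto, "-d", dest_ip, "--dport", str(dest_port),
            "-j", "SNAT", "--to-source", snat_ip]
    return [dnat, snat]


# Server 1: port 64905 -> Server 2 (192.168.24.6), SNAT to 172.27.1.21
server1 = forward_rules(64905, "192.168.24.6", 64905, "172.27.1.21")
# Server 2: port 64905 -> client Linux machine 192.168.10.10:5900, SNAT to 192.168.24.6
server2 = forward_rules(64905, "192.168.10.10", 5900, "192.168.24.6")

for rule in server1 + server2:
    print(" ".join(rule))
```

The sketch only prints the commands; nothing is applied to the running firewall.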
We have a background process set up on the client side that pings 172.27.1.21 to verify that the VPN is up and that we can pass traffic through it. If the check fails for 3 minutes, we restart ipsec. This is where we first noticed the problem.
It seems that we are losing ICMP communication to 172.27.1.21 in chunks. If we ping that IP address from the client server, out of 1000 packets we lose about 15-20% of them in one large chunk (e.g. packets 700-850 will fail). This only happens to this IP address, and while the problem does occur on every client server, the packet loss does not show up on all client servers simultaneously.
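The monitoring logic described above can be sketched roughly as follows. This is not the actual background process, only an illustration of the "restart after 3 minutes of failed pings" rule. It assumes Python 3 and the Linux iputils ping flags (-c, -W). The names Watchdog and ping_once are hypothetical.

```python
import subprocess

FAIL_WINDOW_S = 180  # "fails for 3 minutes"


def ping_once(host, timeout_s=2):
    """Return True if a single echo request gets a reply (Linux iputils ping)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


class Watchdog:
    """Track consecutive ping failures and signal when a restart is due."""

    def __init__(self, fail_window_s=FAIL_WINDOW_S):
        self.fail_window_s = fail_window_s
        self.first_failure = None  # timestamp of the first failure in the current run

    def record(self, ok, now):
        """Record one check result; return True if ipsec should be restarted."""
        if ok:
            self.first_failure = None
            return False
        if self.first_failure is None:
            self.first_failure = now
        if now - self.first_failure >= self.fail_window_s:
            self.first_failure = None  # start counting again after the restart
            return True
        return False
```

In a real loop, each iteration would call something like watchdog.record(ping_once("172.27.1.21"), time.time()) and run the ipsec restart when it returns True.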
tcpdump from the client server side during the problem:
11:10:11.108816 IP 172.27.1.21 > 192.168.24.6: ICMP echo request, id 14602, seq 119, length 64
11:10:12.108833 IP 172.27.1.21 > 192.168.24.6: ICMP echo request, id 14602, seq 120, length 64
11:10:13.108772 IP 172.27.1.21 > 192.168.24.6: ICMP echo request, id 14602, seq 121, length 64
11:10:14.108707 IP 172.27.1.21 > 192.168.24.6: ICMP echo request, id 14602, seq 122, length 64
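For anyone wanting to check a longer capture for the same pattern, here is a small sketch that scans tcpdump text output for echo requests that never get a matching echo reply and groups them into contiguous runs. The function name unanswered_runs is hypothetical, and the regular expression only assumes the line format shown above. It ignores sequence-number wraparound.

```python
import re

ICMP_RE = re.compile(r"ICMP echo (request|reply), id (\d+), seq (\d+)")


def unanswered_runs(lines):
    """Return [(id, first_seq, last_seq), ...] for runs of requests with no reply."""
    requests, replies = set(), set()
    for line in lines:
        m = ICMP_RE.search(line)
        if not m:
            continue
        key = (int(m.group(2)), int(m.group(3)))
        (requests if m.group(1) == "request" else replies).add(key)
    runs = []
    for ident, seq in sorted(requests - replies):
        # Extend the current run if this seq directly follows the previous one.
        if runs and runs[-1][0] == ident and runs[-1][2] == seq - 1:
            runs[-1][2] = seq
        else:
            runs.append([ident, seq, seq])
    return [tuple(r) for r in runs]


sample = [
    "11:10:11.108816 IP 172.27.1.21 > 192.168.24.6: ICMP echo request, id 14602, seq 119, length 64",
    "11:10:12.108833 IP 172.27.1.21 > 192.168.24.6: ICMP echo request, id 14602, seq 120, length 64",
    "11:10:13.108772 IP 172.27.1.21 > 192.168.24.6: ICMP echo request, id 14602, seq 121, length 64",
    "11:10:14.108707 IP 172.27.1.21 > 192.168.24.6: ICMP echo request, id 14602, seq 122, length 64",
]
print(unanswered_runs(sample))
```

On the four lines above, every request is unanswered, so they come back as a single run covering seq 119 to 122.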
The SonicWALL 8500 packet sniffer indicates that it never receives the ICMP packet. A Wireshark capture on the switch also indicates that the ICMP/ESP packet never leaves the client CentOS 5 server.
Pinging from the other direction (172.27.1.21 to 192.168.24.6), we can trace the packet through the 8500 and even see it in the tcpdump on the CentOS 5 machine, yet we never see the reply.
Troubleshooting
We've tried several things to fix the problem, to no avail. It's been difficult even to track it down as far as we have. We've tried different switches, different ipsec configs, and replacing OpenSwan with libreswan 3.10-1. The log files don't seem to show any relevant information (I'll post anything you want to see, though). We're at our wits' end at this point.
Any assistance anyone may have would be greatly appreciated.
Thanks!