IPv6 ICMP “packet too big” filtering considered harmful

If you intend to seriously run Internet servers or firewalls in the future (hence, IPv6 servers and firewalls), please read this.

This problem is so well-known, so old and yet still so unfixed and pervasive nowadays that, after pulling my hair for days on many hanging or time-outing IPv6 sessions, I felt I had to write this.

Executive summary: there are a huge number of sites with misconfigured firewalls who filter out “ICMP6 packet too big” packets. This breaks Path MTU discovery, causing hanging or broken IPv6 sessions.

Many sites unknowingly assume that the Internet MTU is at least 1500 bytes. This is wrong, whether in IPv4 or IPv6.

Many Internet hosts are connected through tunnels reducing the real MTU. Use of PPPoE for example, on ADSL links, reduces the MTU by a few bytes, and use of 6rd (“6 rapid deployment” tunneling) reduces it more than that. As 6rd is used extensively in France (Free ISP), this is a big problem.

1. The symptom: hanging IPv6 connections

Here’s a sample capture for a request where the server has more than 1 data packet.

08:39:57.785196 IP6 2a01:e35:8b50:2c40::7.39738 > 2001:xxx.43: S 165844086:165844086(0) win 65535 <mss 1440,nop,wscale 3,sackOK,timestamp 901

08:39:57.807709 IP6 2001:xxx.43 > 2a01:e35:8b50:2c40::7.39738: S 883894656:883894656(0) ack 165844087 win 14280 <mss 1440,sackOK,timestamp 2377433946 90108,nop,wscale 7>

08:39:57.808452 IP6 2a01:e35:8b50:2c40::7.39738 > 2001:xxx.43: .ack 1 win 8211 <nop,nop,timestamp 90132 2377433946>

08:39:57.808655 IP6 2a01:e35:8b50:2c40::7.39738 > 2001:xxx.43: P 1:9(8) ack 1 win 8211 <nop,nop,timestamp 90132 2377433946>

08:39:57.833052 IP6 2001:xxx.43 > 2a01:e35:8b50:2c40::7.39738: .ack 9 win 112 <nop,nop,timestamp 2377433972 90132>

08:39:57.888981 IP6 2001:xxx.43 > 2a01:e35:8b50:2c40::7.39738: P 1:1025(1024) ack 9 win 112 <nop,nop,timestamp 2377434026 90132>

(missing packet here : 1025:2453 containing 1428 bytes)

08:39:57.889315 IP6 2001:xxx.43 > 2a01:e35:8b50:2c40::7.39738: FP 2453:2723(270) ack 9 win 112 <nop,nop,timestamp 2377434027 90132> 08:39:57.890100 IP6 2a01:e35:8b50:2c40::7.39738 > 2001:xxx.43: .ack 1025 win 8211 <nop,nop,timestamp 90213 2377434026,nop,nop,sack 1 {2453:2723}>

(session hangs here, unterminated because of the missing bytes)

This is difficult to debug as modern Unices have a “TCP host cache” keeping track of Path MTUs on a host-by-host basis, causing the problem to suddenly disappear. in unpredictable ways depending on the size of transmitted data.

2. A sample successful session with working trial-and-error Path MTU discovery

10:09:55.291649 IP6 2a01:e35:8b50:2c40::7.40948 > 2a01:e0d:1:3:58bf:fa61:0:1.43: S 1032533547:1032533547(0) win 65535 <mss 1440,nop,wscale 3,sackOK,timestamp 5487603 0>

10:09:55.291787 IP6 2a01:e0d:1:3:58bf:fa61:0:1.43 > 2a01:e35:8b50:2c40::7.40948:S 3695299654:3695299654(0) ack 1032533548 win 65535 <mss 1440,nop,wscale 3,sackOK,timestamp 3185067848 5487603>

10:09:55.316234 IP6 2a01:e35:8b50:2c40::7.40948 > 2a01:e0d:1:3:58bf:fa61:0:1.43: . ack 1 win 8211 <nop,nop,timestamp 5487628 3185067848>

10:09:55.317965 IP6 2a01:e35:8b50:2c40::7.40948 > 2a01:e0d:1:3:58bf:fa61:0:1.43: P 1:9(8) ack 1 win 8211 <nop,nop,timestamp 5487628 3185067848> 10:09:55.417301 IP6 2a01:e0d:1:3:58bf:fa61:0:1.43 > 2a01:e35:8b50:2c40::7.40948: . ack 9 win 8210 <nop,nop,timestamp 3185067974 5487628>

Now the big packet that was missing in the broken session above:

10:09:56.084457 IP6 2a01:e0d:1:3:58bf:fa61:0:1.43 > 2a01:e35:8b50:2c40::7.40948: . 1:1429(1428) ack 9 win 8210 <nop,nop,timestamp 3185068641 5487628>

The 6rd gateway replies with an ICMP6 message:

10:09:56.085221 IP6 2a01:e00:1:11::2 > 2a01:e0d:1:3:58bf:fa61:0:1: ICMP6, packet too big, mtu 1480, length 584

Missing data is retransmitted by the server using a lower packet size (and an entry is created in the server’s host cache to remember that):

10:09:56.085489 IP6 2a01:e0d:1:3:58bf:fa61:0:1.43 > 2a01:e35:8b50:2c40::7.40948: . 1:1409(1408) ack 9 win 8210 <nop,nop,timestamp 3185068642 5487628> 10:09:56.085522 IP6 2a01:e0d:1:3:58bf:fa61:0:1.43 > 2a01:e35:8b50:2c40::7.40948: . 1409:1429(20) ack 9 win 8210 <nop,nop,timestamp 3185068642 5487628>

Then the connection goes on to correct completion (no use showing the packets here).

Interestingly, trying an identical request then shows that the MSS negotiation takes the host cache into account, with a MSS set to 1420 instead of 1440 from the start in the server reply:

10:10:14.053218 IP6 2a01:e35:8b50:2c40::7.20482 > 2a01:e0d:1:3:58bf:fa61:0:1.43: S 2231600544:2231600544(0) win 65535 <mss 1440,nop,wscale 3,sackOK,timestamp 5506365 0>

10:10:14.053382 IP6 2a01:e0d:1:3:58bf:fa61:0:1.43 > 2a01:e35:8b50:2c40::7.20482: S 2676514636:2676514636(0) ack 2231600545 win 65535 <mss 1420,nop,wscale 3,sackOK,timestamp 1128201317 5506365>

3. The simple fix

The fix is dead simple: just make sure that your filters are configured so that ICMP6 “packet too big”, type number 2, messages are correctly transmitted end-to-end, and correctly handled.