Category Archives: en

Converting a FreeBSD ezjail configuration to VNET

I have recently converted my self-hosted FreeBSD jails (including this very blog) to the VNET architecture.

A few words about VNET

The purpose of this post is not to explain jails, or VNET, but to provide examples for migration from the traditional jail networking environment (in my case, using ezjail), to the VNET architecture. There are numerous documents online for jail environments based on iocage, but not that much about ezjail-based ones.

Up to VNET, networking in jails had severe limitations on addressing, in particular limitations on the loopback interfaces (::1 and 127.0.1) and usage of IP aliases, which caused numerous configuration headaches. This was due to the jails sharing network interfaces and the full networking stack with the host. It was possible to alleviate some of this with multiple routing tables (setfib & al), but it was still limited.

VNET allows the jails to run networking stacks totally separated from the host’s, like it would in a fully virtualized guest. As a consequence, it allows running virtual routers with specific firewalls filters to better organize and isolate jail networking.

VNET basically works by moving network interfaces to the guest jails, in a separate instance of the network stack, hiding them from the host environment. This is done at jail startup, but it can also be done dynamically to a running jail with:

ifconfig <interface> vnet <jail_id>

VNET works on any kind of interface: physical or virtual. It is thus perfectly possible to assign a physical interface, or a VLAN tagged interface, etc, to a jail.

Enabling VNET

First, we need to enable VNET in the kernel. From FreeBSD 12, the default kernel has VNET already, so there is nothing to do, unless you have a custom kernel. On FreeBSD 11, you need to recompile a kernel after adding the following line:

options VIMAGE # Subsystem virtualization, e.g. VNET

VNET and ezjail

What to do next with ezjail?

ezjail‘s configuration files are stored in /usr/local/etc/ezjail, one file per jail, named after the jail’s name. ezjail uses environment variables based on the former jail configuration variables stored in /etc/rc.conf. Under the hood, the system converts these lines to the new jail syntax, .conf files stored in /var/run.

The line that configures networking looks like the following (may be wrapped on your screen):

export jail_jailname_ip="re0|,re0|2a01:e34:ec2a:94a0::11,lo0|"

To convert this configuration to VNET, we have to:

  • disable the traditional jail networking system: this done by providing an empty value for the above line
  • enable VNET for the jail
  • specify the VNET interface(s) the jail is going to use

Which is done using the following lines:

export jail_jailname_ip=""
export jail_jailname_vnet_enable="YES"
export jail_jailname_vnet_interface="epair17b"

Note that we don’t specify IP addresses or the loopback interface anymore. Configuration will be done by the jail itself, possibly in the regular /etc/rc.conf way:


We still have to create the interface the jail is going to use, here epair17b. I chose the epair/if_bridge architecture as it seemed the most flexible and easier to get a grip of, but it is also possible to use netgraph-based interfaces, or anything other the system supports.

epair interfaces are 2 virtual network interfaces linked with a virtual crossover cable. if_bridge is a bridge interface which switches traffic between the interfaces you attach to it. By combining both and adding routers, you can create any virtual network architecture.

To prepare the interfaces,

ifconfig epair17

creates two interfaces, epair17a and epair17b.

epair17b will be given to the jail; epair17a will stay on the host, and will have to get connectivity somehow. This is typically done by making it a bridge member.

epair17a may or may not have an IP address assigned to it (it does not need one if it is only used for bridging), but it needs to be up:

ifconfig epair17a up

We also need to add one of the interfaces to a bridge, so it gets connectivity to the rest of the network:

ifconfig bridge0 create up
ifconfig bridge0 addm epair17a

To make it easier to understand, I made a view images showing possible architectures.

First, example of a basic configuration where all the jails are configured on the same local network as the host through bridge0, mimicking the traditional jail networking.

Figure 1

Here, the jails are organized on two separate subnetworks, with Host possibly providing IP routing and firewalling.

Figure 2

Lastly, on Figure 3, another architecture where the first group of guests, Guest 1 and Guest 2, is directly configured on the local network, whereas Guest 4 and Guest 5 are connected through virtual router Guest 3. For example, this can be used in a setting where Guest 1 and Guest 2 provide the front-end to a service, and Guest 3 and Guest 4 provide the backend (databases, etc). Guest 4 and Guest 5 don’t even need full connectivity to the Internet, this can be enforced with firewall rules on Host or Guest 3.

Figure 3

Making the configuration persistent

The above commands were meant to explain the workings of the setup, but they are ephemeral. The configurations need to be made persistent in the boot configuration of Host, for example in /etc/rc.conf:

cloned_interfaces="bridge0 bridge1 epair1 epair2 ... ifconfig_bridge0="up addm re0 addm epair1a addm epair2a ..."

Note that the epair interfaces on the guests don’t need to be up from the host configuration. The guest startup code will manage this.

Using jib to create/destroy interfaces dynamically

The above static configuration has a small issue: VNET takes quite some time (dozens of seconds) to reassign an interface of a deleted jail to the host, making it invisible in the meantime. This means that a jail restart will fail for lack of the adequate interface.

To avoid this, and create persistent MAC addresses for the interface, which comes-in handy, there are scripts provided in /usr/share/examples/jails, jib (for epair/bridge-based interfaces) and jng (for netgraph-based interfaces).

We just need to install these scripts in /usr/local/sbin and make them executable.

cp /usr/share/examples/jails/jib /usr/local/sbin
chmod a+rx /usr/local/sbin/jib
cp /usr/share/examples/jails/jng /usr/local/sbin
chmod a+rx /usr/local/sbin/jng

jib creates epair interfaces and adds one interface of the pair to a bridge connected to an output interface, ie:

jib addm TEST re0

will create interfaces e0a_TEST and e0b_TEST and add e0a_TEST to a bridge named re0bridge if it exists, or failing that, create such a bridge and connect it to re0. The jail will be configured to use nterface e0b_TEST.

The cherry on the cake with jib/jng : they try and keep MAC addresses persistent.

To create and destroy interfaces dynamically with ezjail, instead of tweaking /etc/rc.conf, we only need to add the following lines to the ezjail configuration file for the jail:

export jail_jailname_vnet_enable="YES"
export jail_jailname_vnet_interface="e0b_jailname"
export jail_jailname_exec_prestart0="/usr/local/sbin/jib addm jailname re0"
export jail_jailname_exec_poststop0="/usr/local/sbin/jib destroy jailname"


Note that it is possible to directly set-up IP addresses on bridge0 bridge1 etc, which may save a couple of epair interfaces in the second and third examples. This is left as an exercise for the reader.

Also, it seems currently difficult or impossible to use VLAN interfaces (if_vlan) in a bridge configuration. I’m still digging on this subject.


I have found the following pages useful when preparing my setup and this post:

Thanks to Jacques Foucry for his work on the nice graphics, Mat Arnold for pointing me to /usr/share/examples/jails and Éric Walter for the idea of the SVG WordPress plugin, avoiding the use of pixelated graphics 🙂

Post-mortem of a DNSSEC incident at

(or: the good, the bad and the ugly)


Due a bug in zone generation, all updates for the EU.ORG zone were stuck from 2020-08-29 02:19 UTC to 2020-09-04 14:40 UTC. Then an incorrect fix was made, resulting in the publication of incorrect DNSSEC signatures for the zone from 2020-09-04 14:40 UTC to 2020-09-04 19:37:00 UTC. Then the final, correct fix was implemented.

This episode, unoriginal albeit humbling, nevertheless yielded interesting returns of experience.

All times in the rest of this document are UTC times.

The software setup at

The primary DNS server for EU.ORG runs ISC‘s BIND. The zone is currently generated by Python and shell scripts from a Postgresql database. This does not include DNSSEC records for the zone (except DS records for delegations). DNSSEC records are generated and refreshed by dnssec-signzone, one of the tools provided with bind. Once the zone file has been updated, it is reloaded using rndc reload, another command-line tool provided with bind.

Zone key rotation is handled by custom scripts which periodically check for key age and schedule key generation, pre-publication, activation and de-activation as needed, calling dnssec-keygen to manage the key files.

Setup for the failure: blocked updates

2020-08-29 02:19: due to a race condition in the zone generation process (issue #1), the EU.ORG zone file disappeared.

The last good and published version of the EU.ORG zone file, still loaded in the primary server, had serial number 2020082907, generated at 2020-08-29 01:12. In the case of a missing file, the reload obviously fails but bind behaves nicely and keeps serving its older in-memory version of the file.

However, the disappearance of the zone file caused all subsequent zone file generation processes to fail (issue #2), as they were accessing the current version of the file to fetch the currently published serial number.

The problem remained unnoticed (issue #3: incomplete monitoring) until 4 September 2020, when a user notified us that his new domain was still undelegated.

The ugly

Around 2020-09-04 14:40, a first fix was attempted: a known good version of the zone file was reinstalled to allow the zone generation process to succeed, then a new zone was generated, freshly DNSSEC-signed, and loaded.

However, the above timeline conflicted with a scheduled key rotation of the zone-signing keys. The theoretical key rotation schedule was as follows:

Theoretical key rotation schedule

The new key (14716) was due to be published from 2020-08-29 05:37, a few hours after the zone update process failed. It should have been present in concerned resolver caches about 24 hours later, alongside the previous key (22810), ready to be used to check signatures (RRSIG records) of the zone which were supposed to be published from 2020-09-03 05:37.

However, due the zone update suspension, this happened instead. The skipped steps are shown in gray.

Actual key rotation schedule (before fix)

The zone was directly updated from the 2020-08-14/2020-08-29 key configuration to 2020-09-04 14:40.

A few minutes after 2020-09-04 14:40, it was apparent that something was amiss: the resolution of EU.ORG domains failed for people using resolvers with DNSSEC validation.

The cause was quickly identified: since pre-publication for DNSKEY 14716 was missed, most resolvers only had the unexpired DNSKEY 22810 in their cache, while the only RRSIG records available in the zone servers required key 14716.

The bad

The obvious fix was to reactivate key 22810 and regenerate the zone signatures (RRSIG records) with dnssec-signzone. This also leaves in place the signatures with key 14716 (keeping the latter was needed for resolvers which had begun to cache key 14176).

As a side note, it helped that the EU.ORG switched a few months ago to NSEC3 “opt-out” mode. This saves a lot of space (especially in nameserver memory) for zones with many delegations, which is especially useful if you temporarily need double signatures such as in this episode.

A first implementation attempt was made at 2020-09-04 14:52 by updating the dates in the public key file (.key) for key 22810, pushing the inactivation date to 2020-09-07 05:37:00 and the deletion date to 2020-09-09 05:37:00.

Before update:

; Created: 20200808100738 (Sat Aug  8 12:07:38 202)
; Publish: 20200809053700 (Sun Aug  9 07:37:00 202)
; Activate: 20200814053700 (Fri Aug 14 07:37:00 202)
; Inactive: 20200903053700 (Thu Sep  3 07:37:00 202)
; Delete: 20200905053700 (Sat Sep  5 07:37:00 202)
EU.ORG. 172800 IN DNSKEY 256 3 8 AwEAAcHAqfeFzQqo9vFq8ZziaQs2...

Side remarks:

  • the TTL value above is ignored by dnssec-signzone, which by default reuses the TTL in the zone file. The actual TTL is 86400.
  • note the weird year 202 instead of 2020

After update:

; Created: 20200808100738 (Sat Aug  8 12:07:38 202)
; Publish: 20200809053700 (Sun Aug  9 07:37:00 202)
; Activate: 20200814053700 (Fri Aug 14 07:37:00 202)
; Inactive: 20200907053700
; Delete: 20200909053700
EU.ORG. 172800 IN DNSKEY 256 3 8 AwEAAcHAqfeFzQqo9vFq8ZziaQs2...

However… (issue #4: when working in a hurry, expect stupid mistakes), this fix was wrong, albeit harmless. As should have been obvious from the “;” prefix, the above lines are informational. The change above was without any effect, but this was initially unnoticed for lack of adequate testing. (issue #5: don’t reset resolver caches too early, it may hamper testing; if you are expecting specific RRSIG records, test this explicitly).

The good

The actual dates are in the adjoining .private file, which was finally updated as follows:

Private-key-format: v1.3
Algorithm: 8 (RSASHA256)
Successor: 14716
Created: 20200808100738
Publish: 20200809053700
Activate: 20200814053700
Inactive: 20200907053700
Delete: 20200909053700

This resulted in the following key rotation schedule, implemented from 2020-09-05 19:37, which finally fixed the issue and probably reduced the zone downtime by almost 19 hours.

It was tested on an untouched resolver which failed EU.ORG requests and recovered from the update (hypothesis: is this because of heuristics on RRSIG records when no cached DNSKEY matches the cached RRSIG records?).

Fixed key rotation schedule

Lessons learned

The above incident will result in several procedural changes on the EU.ORG servers. Some of these are marked as issue #n; others are being considered, like using bind‘s automated signature mode, coupled with dynamic zone updates, which would have made the whole episode moot (but would introduce a strong dependency on bind). Writing this post-mortem text helped make the most of the incident.

Thanks to Stéphane Bortzmeyer, always vigilant when it comes to DNS and DNSSEC bugs, who noticed and notified us that the zone was still broken after the initial incorrect fix, and who read and commented an initial version of this text.

Article 13 of the Copyright Directive considered harmful

[this is a translation+partial update of my original post in French here]

The “directive on copyright in the Digital Single Market“,  in short “Copyright Directive”, having passed the JURI commission vote with amendments on 20 June  2018, will soon be voted in a plenary session of the European parliament, 5 July 2018.

I wrote the following text before calling some Members of the European Parliament (MEPs), thus participating in the campaign started by

I would like to invite you to do the same, not before you have read some of the references quoted at the end of this page, and consulted

Two articles are especially dangerous.

  • Article 11, about referencing and quoting press articles; we will not develop this issue any further here.
  • Article 13, about so-called “upload filters” on all content sharing sites (ie all sites who have a function of sharing content, including comments/videos/photographs/audio on social networks).

The stated goal of article 13 is to protect rightholders of the entertainment industry against the hegemony of the big web sharing platforms, most notably Youtube, which alledgedly results in revenue “evasion” when rightholder’s contents are illegally uploaded and consulted on these platforms.

The proposed solution is to create a legal obligation to deploy system blacklisting protected contents, on all content sharing sites, for all types of content, even those that don’t need protection (for example, computer software source code).

We are going to examine how such systems work, why they are costly to implement, with significant collateral damage, and why the targeted platform already implement measures to satisfy the stated goal.

Content blacklist systems

They can be roughly classified in three categories :

“Exact match” detection

They are relatively cheap in terms of resources. They work on raw digital data. They don’t need to be aware of formats or media type, not even of the detailed original content to protect, thanks to the use of so-called “hashing” or “digest” algorithms.

These features make these systems very easy to implement and operate, and very cheap. The algorithms are free and open source software, or public domain (for the underlying mechanism), and they are easily adapted to any platform.

On the other hand, these systems are very easy to bypass, through minor changes in the protected file. In consequence, they constitute a very poor protection for rightholders.

Detection “by similarity”

These systems are much more sophisticated. They have a knowledge of media formats, and are able to extract characteristic elements, similar to a fingerprint of the protected content.

This process enables a much wider detection of the content, even heavily modified, for example a barely audible background sound in a family video ou amateur show.

The most famous system in this category is Content-Id, implemented by Youtube, described here by Google. A lot of comments on Article 13 refer to Content-Id as a model. Article 13 itself seems to have been written with Content-Id in mind.

Systems “by similarity” are very expensive to develop and implement. According the the Google video quoted above, Content-Id required an investment of over $100 million.

There are also no free and open source implementation of such systems, which makes it even more difficult to deploy: you need to develop a custom, in-house system, or acquire a license for an existing commercial system, if you find one.  The companies in a position to provide such specific services are rare.

Furthermore, the detection performance (false positive and false negative rates) of these systems is difficult to estimate. First, for the above mentioned reasons (proprietary systems with limited access), second, because the underlying technical processes are based on heuristics which stops them from being fully reliable.

Finally, these system present an important drawback: as explained by Google in the Content-Id presentation video, rightholders must provide the original content, or protected excerpts from the content, which is difficult to achieve on a wide scale (many works and many actors on both roles, rightholders and content sharing platforms).

“watermarking” systems

These systems are mentioned in the annex of the directive. They are only presented here for the sake of completeness. Their costs are comparable to those of similarity detection systems, but they are of limited scope, probably not reasonably usable in the context of Article 13.

Blacklist management

Black list management, independently from the above technical criteria, constitutes an issue in itself.

Article 13 does not really provide satisfactory solutions to the following issues:

  • false positive (over-blocking): blocking legitimate content.
    • erroneous blacklisting by an alleged rightholder
    • erroneous blocking of content protected by an exception (parody, memes, etc), but in which the blacklisting systems have identified protected content.
    • erroneous insertions in the blacklist for other reasons. This happened repeatedly, for example, in the French police DNS blocking systems, including by misconfigured test systems. See [FR] bloqué pour apologie du terrorisme suite à une « erreur humaine » d’Orange.
  • false negative (under-blocking): not blocking illegitimate rightholder content. Content protection is difficult to implement, even on the rightholder side: many works have not even been digitalized by their legitimate rightholders.
  • adding new content to the blacklist may require manual, hence heavy, checks, to reduce false positives, but does not guarantee their elimination.
  • unwieldy and unreliable complaint mechanisms: all over-blocking and under-blocking issues have to be handled via human, or even judicial, intervention. But there are daily reports of abusive content removal here or there. For example, under the United States DCMA (Digital Millennium Copyright Act), some rightholders have been known to request content removal on works they didn’t own, by mere title similarity, or by claiming DMCA procedures to force removal of price lists in price comparators.
  • individuals and small companies are defenceless against abusive blocking of their content, if the site-internal reporting mechanism fails to address the issue in time. In most cases, action in court or even using an alternative dispute resolution system (13a) will be too expensive and too slow, resulting in a posteriori self-censorship.

Article 13 in its final redaction does not satisfactorily address these concerns, the last point above being the most worrisome.

The Content-Id system

Although Content-Id is owned by Google and Youtube-specific, it deserves a more thorough examination, as it seems to have been an implicit model for Article 13.

Content-Id is a “detection by similarity”. To use it, rightholders have to provide Youtube with the videos they wish to protect, or samples of these.

When a protected content is identified in a posted video, 3 options are available:

  • block the video
  • monetize the video (advertisement)
  • obtain traffic data, for example to know in which countries the video is popular.

According to Google, Content-Id has already enabled payment of several billions of dollars to rightholders, and the system includes hundreds of millions of videos.

Impact assessment of the directive

The summary of the impact assessment, as annexed to the project, is very incomplete: as compared to the full impact assessment study, it mentions only in part the impact for rightholders, limiting itself to a legal discussion in the digital single market. It doesn’t mention either the efficiency and technical feasibility of Article 13, or its consequences on Internet sites and the Internet ecosystem. It is advised to refer to the full impact assessment study.

1. Disappearance or marginalization of  contributive sites

Contributive sites based on free (Creative Commons, etc) content will not have the resources to exploit, not to mention develop or even rent/subscribe to systems similar to Content-Id.

The impact assessment study provides a real example of the subscribing costs to such a service: €900/month for a small site (5000 transactions/month, ie about €0.18/transaction; a transaction being a single check, needing to be executed for every post by a user).

The study only considers commercial sites where sharing is the main purpose. This fails to recognize the impact on high volume contributive sites, social networks, amateur or family photo sharing sites, classified advertisement, etc, for which there is no significant revenue stream as compared to the cost of monitoring posted content.

Most notably, social networks are targeted, as Article 2/4b of the directive excludes only 3 very specific types of sites from the requirements of Article 13.

  • services acting in a non-commercial purpose capacity such as online encyclopaedia
  • providers of cloud services for individual use which do not provided direct access to the public
  • open source software developing platforms
  • online market places whose main activity is the online retail of physical goods

As a consequence, this first impact on freedom of speech seems underevaluated.

2. All content types are targeted

Most content protection systems currently operated focus on contents from the entertainment industry:

  • videos and movies
  • music

On the other hand, Internet sharing applies to many other types of contents, for example photographs.

Again, the burden on Internet sites will be significant, with the same risks for abusive blocking, which also amplifies the consequences on the other listed issues.

3. Issues with respect to Freedom of Speech

As explained above and confirmed by many non-profit organizations, similarity detection systems are unable to differentiate illegal use from legal use such as a quote, a meme, a parody, etc.

It also happens frequently that works that are initially free of use are erroneously blacklisted, for example after being presented or quoted in protected TV shows or TV news.

In any case, content detection systems already result, when they are implemented, in abusive censorship. To force their generalization through the Directive can only be severely harmful to Freedom of Speech, especially on social networks, making it more difficult to exercise the above mentioned legal exceptions.

Finally, as explained, widening content detection systems to all types of contents can only make this risk more acute.

4. The proposed legal dispositions are inefficient to protect rightholders

As explained, similarity systems like Content-Id are not usable at global scale because of their cost, and exact match systems are easy to bypass.

Furthermore, similarity systems are already deployed on major sites, as explained by the impact assessment study:

In all, as content recognition technologies are already applied by the major user uploaded content services, it is likely that this option would not lead to significant increases in unjustified cases of prevented uploads compared to the current situation

In other words, Article 13 is not needed since the goals it seeks to achieve are already implemented where it matters.

5. The proposed dispositions may be harmful to cultural diversity

The impact assessment studies estimates that Article 13 will promote cultural diversity, which is assumed to be a natural byproduct of rightholder protection.

But Article 13 hampers the ability of contributive and/or non-profit sites, which without a doubt are also part of cultural diversity. Most of their contents are free of rights, hence with naturally maximized visibility and dissemination.

This is evidenced by Wikipedia’s statistics: 5th site in the world, according to the Alexa study. Furthermore, according to Wikimédia France: “platforms will prefer precaution by blocking more content than necessary, which will hamper their diversity, by preventing participation from people less accustomed to new technologies” (translated from « les plateformes opteront pour un principe de précaution en bloquant plus de contenu que nécessaire ce qui réduira la diversité de ces plateformes en empêchant les personnes peu aguerries aux nouvelles technologies d’y participer » here)

In summary, Article 13:

  • would not improve the rightholder’s situation with respect to the big platforms, since these already have deployed content detection and revenue sharing systems;
  • would not improve, either, the rightholder’s situation with respect to non-profit or low traffic platforms, which don’t have the ability to operate complex detection systems, don’t violate protected works other than accidentally thus in a limited way, and are already in position to remove illegal content.
  • represents, on the other hand, the following risks:
    • arbitrary censorship
    • reinforcement of the hegemony of big platforms by introducing significant barriers to entry
    • disappearance or marginalization of non-profit platforms, or fallback of these platforms on static content, removing the content sharing angle which is a key characteristic of the Internet;
  • represents, as well, serious risks regarding Freedom of Speech and Cultural Diversity.

For the above reasons, and as expressed by numerous organizations and renowned experts, it seems likely that Article 13, if kept in the directive, will do more harm than good on the European Internet.

A few references

The Open Letter on EP Plenary Vote, of which (as CEO) I am a signatory:

2 articles (amongst many others) on Julia Reda’s blog :

Open letter by 70 Internet experts

Positions of the EFF (Electronic Frontiers Foundation)

Other sites campaigning against Article 13:

Statement by the Wikimédia Foundation:

Bad idea: Gmail now discriminates against mail servers without an IPv6 reverse

This new gem is from the SMTP Gmail FAQ at

(Fun note: they call it the “Bulk Senders Guidelines”… hence apparently anyone running their own personal mail server falls in that category…)

“Additional guidelines for IPv6


  • The sending IP must have a PTR record (i.e., a reverse DNS of the sending IP) and it should match the IP obtained via the forward DNS resolution of the hostname specified in the PTR record. Otherwise, mail will be marked as spam or possibly rejected.
  • The sending domain should pass either SPF check or DKIM check. Otherwise, mail might be marked as spam.”

I happen to be running my own mail server, and I happen to also be IPv6-connected, and finally I happen to be lacking a reverse DNS delegation for IPv6 because my ISP (Free) didn’t yet bother providing me with one.

I’m happier than most as my mail is sent through the server, which happens to get its mail accepted by Gmail. But it ends up tagged as “spam”.

I’m not the only one in France. OVH is reported as having the same problem.

So what are my points?

  • obviously, my ISP should provide me with a correctly delegated IPv6 reverse… at some point, of course the sooner would be the better.
  • but, as has been determined for over 15 years now with IPv4, refusing mail based on a lacking reverse delegation is counter-productive… since spammers statistically tend to send spam from hosts with a reverse more often than legitimate users!
  • so measures like the above end up bothering legitimate users more than spammers.

So I hope Google will step back on this one, whether or not the reverse problem gets fixed.




IPv6 ICMP “packet too big” filtering considered harmful

If you intend to seriously run Internet servers or firewalls in the future (hence, IPv6 servers and firewalls), please read this.

This problem is so well-known, so old and yet still so unfixed and pervasive nowadays that, after pulling my hair for days on many hanging or time-outing IPv6 sessions, I felt I had to write this.

Executive summary: there are a huge number of sites with misconfigured firewalls who filter out “ICMP6 packet too big” packets. This breaks Path MTU discovery, causing hanging or broken IPv6 sessions.

Many sites unknowingly assume that the Internet MTU is at least 1500 bytes. This is wrong, whether in IPv4 or IPv6.

Many Internet hosts are connected through tunnels reducing the real MTU. Use of PPPoE for example, on ADSL links, reduces the MTU by a few bytes, and use of 6rd (“6 rapid deployment” tunneling) reduces it more than that. As 6rd is used extensively in France (Free ISP), this is a big problem.

1. The symptom: hanging IPv6 connections

Here’s a sample capture for a request where the server has more than 1 data packet.

08:39:57.785196 IP6 2a01:e35:8b50:2c40::7.39738 > 2001:xxx.43: S 165844086:165844086(0) win 65535 <mss 1440,nop,wscale 3,sackOK,timestamp 901

08:39:57.807709 IP6 2001:xxx.43 > 2a01:e35:8b50:2c40::7.39738: S 883894656:883894656(0) ack 165844087 win 14280 <mss 1440,sackOK,timestamp 2377433946 90108,nop,wscale 7>

08:39:57.808452 IP6 2a01:e35:8b50:2c40::7.39738 > 2001:xxx.43: .ack 1 win 8211 <nop,nop,timestamp 90132 2377433946>

08:39:57.808655 IP6 2a01:e35:8b50:2c40::7.39738 > 2001:xxx.43: P 1:9(8) ack 1 win 8211 <nop,nop,timestamp 90132 2377433946>

08:39:57.833052 IP6 2001:xxx.43 > 2a01:e35:8b50:2c40::7.39738: .ack 9 win 112 <nop,nop,timestamp 2377433972 90132>

08:39:57.888981 IP6 2001:xxx.43 > 2a01:e35:8b50:2c40::7.39738: P 1:1025(1024) ack 9 win 112 <nop,nop,timestamp 2377434026 90132>

(missing packet here : 1025:2453 containing 1428 bytes)

08:39:57.889315 IP6 2001:xxx.43 > 2a01:e35:8b50:2c40::7.39738: FP 2453:2723(270) ack 9 win 112 <nop,nop,timestamp 2377434027 90132> 08:39:57.890100 IP6 2a01:e35:8b50:2c40::7.39738 > 2001:xxx.43: .ack 1025 win 8211 <nop,nop,timestamp 90213 2377434026,nop,nop,sack 1 {2453:2723}>

(session hangs here, unterminated because of the missing bytes)

This is difficult to debug as modern Unices have a “TCP host cache” keeping track of Path MTUs on a host-by-host basis, causing the problem to suddenly disappear. in unpredictable ways depending on the size of transmitted data.

2. A sample successful session with working trial-and-error Path MTU discovery

10:09:55.291649 IP6 2a01:e35:8b50:2c40::7.40948 > 2a01:e0d:1:3:58bf:fa61:0:1.43: S 1032533547:1032533547(0) win 65535 <mss 1440,nop,wscale 3,sackOK,timestamp 5487603 0>

10:09:55.291787 IP6 2a01:e0d:1:3:58bf:fa61:0:1.43 > 2a01:e35:8b50:2c40::7.40948:S 3695299654:3695299654(0) ack 1032533548 win 65535 <mss 1440,nop,wscale 3,sackOK,timestamp 3185067848 5487603>

10:09:55.316234 IP6 2a01:e35:8b50:2c40::7.40948 > 2a01:e0d:1:3:58bf:fa61:0:1.43: . ack 1 win 8211 <nop,nop,timestamp 5487628 3185067848>

10:09:55.317965 IP6 2a01:e35:8b50:2c40::7.40948 > 2a01:e0d:1:3:58bf:fa61:0:1.43: P 1:9(8) ack 1 win 8211 <nop,nop,timestamp 5487628 3185067848> 10:09:55.417301 IP6 2a01:e0d:1:3:58bf:fa61:0:1.43 > 2a01:e35:8b50:2c40::7.40948: . ack 9 win 8210 <nop,nop,timestamp 3185067974 5487628>

Now the big packet that was missing in the broken session above:

10:09:56.084457 IP6 2a01:e0d:1:3:58bf:fa61:0:1.43 > 2a01:e35:8b50:2c40::7.40948: . 1:1429(1428) ack 9 win 8210 <nop,nop,timestamp 3185068641 5487628>

The 6rd gateway replies with an ICMP6 message:

10:09:56.085221 IP6 2a01:e00:1:11::2 > 2a01:e0d:1:3:58bf:fa61:0:1: ICMP6, packet too big, mtu 1480, length 584

Missing data is retransmitted by the server using a lower packet size (and an entry is created in the server’s host cache to remember that):

10:09:56.085489 IP6 2a01:e0d:1:3:58bf:fa61:0:1.43 > 2a01:e35:8b50:2c40::7.40948: . 1:1409(1408) ack 9 win 8210 <nop,nop,timestamp 3185068642 5487628> 10:09:56.085522 IP6 2a01:e0d:1:3:58bf:fa61:0:1.43 > 2a01:e35:8b50:2c40::7.40948: . 1409:1429(20) ack 9 win 8210 <nop,nop,timestamp 3185068642 5487628>

Then the connection goes on to correct completion (no use showing the packets here).

Interestingly, trying an identical request then shows that the MSS negotiation takes the host cache into account, with a MSS set to 1420 instead of 1440 from the start in the server reply:

10:10:14.053218 IP6 2a01:e35:8b50:2c40::7.20482 > 2a01:e0d:1:3:58bf:fa61:0:1.43: S 2231600544:2231600544(0) win 65535 <mss 1440,nop,wscale 3,sackOK,timestamp 5506365 0>

10:10:14.053382 IP6 2a01:e0d:1:3:58bf:fa61:0:1.43 > 2a01:e35:8b50:2c40::7.20482: S 2676514636:2676514636(0) ack 2231600545 win 65535 <mss 1420,nop,wscale 3,sackOK,timestamp 1128201317 5506365>

3. The simple fix

The fix is dead simple: just make sure that your filters are configured so that ICMP6 “packet too big”, type number 2, messages are correctly transmitted end-to-end, and correctly handled.


What to do on June 6th / IPv6 Launch day?

June 6th, 2012 is the “World IPv6 Launch” day: see

As it stands, it is presented as mainly oriented toward ISPs and hardware makers, giving the impression that home users are not concerned.

Actually IPv6 has begun deployment years ago, but has failed so far to be on the radar of most organizations, slowing its adoption.

So let’s get things straight, you can participate from your home:

  • if your home ISP doesn’t yet provide you with IPv6 connectivity yet, he will have to, in the not-too-distant future. Call them and ask them when!
  • if your home ISP does already provide you with IPv6, activate it on your Internet connection and on your computer! In France, Free Telecom and Nerim already have been providing IPv6 connectivity for years.
  • if you run a personal server, activate IPv6 on it if available, and if not, ask for support!

It may be a little too soon to pester mobile phone operators (3G and 4G) to get IPv6 connectivity from them. They are telcos, after all… but if you feel like it, don’t hesitate to ask them, too, what their IPv6 deployment schedule is.

For French users, the G6 association has a nice set of resources on IPv6:

Lossless import of MPEG-TS or HDV video files to iMovie

Here’s a little trick I learned and wanted to share. As it’s not complete, comments and additional hints are welcome!

The problem

I have a Canon HDV camcorder with many hours of HDV video. HDV is mpeg2-compressed video with a bitrate of about 25 Mbps.

I also have a MacOS X computer where I can run iMovie, Apple’s consumer-grade video editing application.

The camcorder video can be easily imported to FreeBSD using the built-in fwcontrol tool. It generates MPEG-TS files (mostly like IP TV channels) which read nicely in vlc, mplayer and other video tools. It’s easy and reliable.

The video can also be imported directly from the camcorder to iMovie, but it is painful and not adapted to easy archiving of the rushes. The import process is slow and buggy and you often have to rewind the tape and restart it.

I wanted to get the best of both worlds — fwcontrol’s easy import followed with iMovie editing.

But iMovie doesn’t know how to directly import MPEG-TS files. It can only import video from .mov (Quicktime) or .mp4 (MPEG4) containers. It’s difficult to know which video codecs are supported by iMovie but it seems to accept MPEG2, which means it can losslessly import HDV files, it’s just a matter of converting their container format from MPEG-TS to Quicktime.. It saves us from the slow, error-prone, lossy and painful process of transcoding.

So how do you do that?

The (mostly complete) solution

Here’s the incantation that mostly works for me. input.mpg is my MPEG-TS file; it can come from a fwcontrol import or from a IPTV capture (Freebox file for example); is the resulting Quicktime-container file:

ffmpeg -i input.mpg -acodec copy -vcodec copy

On my server (a double-core Intel Atom D525 processor with SATA disks, ie not a very fast machine) it converts at about 80-100 frames per second (3x to 4x real time), which is very fair (IO bound probably) and 12 to 20 times faster than transcoding the video.

From an IPTV capture you may have to explicitly transcode audio to AAC using -acodec libvo_aacenc instead.

Your second-best bet if the above doesn’t work is to let ffmpeg make a (much slower) almost-lossless transcoding to MPEG4, using option -sameq, yielding a bigger file (was almost twice as big as the original in my trials):

ffmpeg -i input.mpg -acodec copy -sameq

It works, but…

Why do I say it mostly works? Because there are two remaining gotchas:

  1. the original video timestamps (date and time of the video) are lost and set to the date and time of the conversion process — it’s constant and doesn’t even increment throughout the file duration. It is probably a ffmpeg bug. I tweaked the import with -copyts option but this apparently handles the time index from the camcorder (duration from the beginning of the tape). This may (or may not) be related to the following error message from ffmpeg: [NULL @ 0x806c1d920] start time is not set in av_estimate_timings_from_pts
  2. iMovie doesn’t seem to grok huge files. It works for a couple hundred megabytes, but not for a couple gigabytes. So you may have to split files take by take, and I don’t know how to do that easily, especially given the above regarding broken timestamps.

Thanks to Benjamin Sonntag for the excellent idea of using ffmpeg for this 😉

Comments and especially clues/solutions more than welcome 😉

TCP-Estimated round-trip test

In an attempt to evaluate different methods for measuring the performance of a TCP/IP connection, I’ve bumped into FreeBSD‘s getsockopt(TCP_INFO) system call, cloned from a similar call invented by Linux, which kindly returns interesting data about the current TCP connection.

I was mainly interested about round-trip time (RTT, called tcpi_rtt) and its standard deviation, mistakenly called tcpi_rttvar even though it’s not a variance.

I’ve written a small proof-of-concept tool accessible at to display operating system information retrieved from the current HTTP access. The page currently runs on a FreeBSD 9-CURRENT machine; feel free to try it out, it works either in IPv4 or IPv6. Here’s a sample as of today:

This experimental page displays raw system TCP estimates, in microseconds.

Address: 2a01:e35:8b50:2c40::4
Estimated round-trip time: 15437
Estimated standard deviation: 27937

Note that the measures are very rough. First, the real resolution is about 1 millisecond (one kernel tick), not 1 microsecond. Then, several RTT samples are smoothed into the provided values, with a bigger weight for more recent samples. I left the actual values obtained from the kernel, for clarity, even though giving them up to a 1 microsecond resolution is somewhat misleading.

Then, of course, the results also depend on the number of samples, which tends to be low: the above page waits for the client HTTP headers to be fully received, then emits its own headers in reply, then waits for one second to give some time for the TCP ack(s) to come back, then displays the then-current estimations.

The results are probably sufficient for TCP’s internal needs, but they may differ wildly from real RTT values. Plus, the real RTT value depends on packet size, which TCP doesn’t seem to take into account. The above example is taken from my local network and displays over 15 ms for the RTT, whereas the real RTT is well below 1 ms (0.23 min, 0.4 average with 0.01 standard deviation, according to ping). The results are not always wildly up, I’ve noticed the opposite effect from a remote mobile phone displaying ~100 ms whereas the ping time was more like ~200 ms…

Feel free to use it and add comments below.