Category Archives: ipv6

Converting a FreeBSD ezjail configuration to VNET

I have recently converted my self-hosted FreeBSD jails (including this very blog) to the VNET architecture.

A few words about VNET

The purpose of this post is not to explain jails, or VNET, but to provide examples for migration from the traditional jail networking environment (in my case, using ezjail), to the VNET architecture. There are numerous documents online for jail environments based on iocage, but not that much about ezjail-based ones.

Before VNET, networking in jails had severe addressing limitations, in particular on the loopback interfaces (::1 and 127.0.0.1) and on the use of IP aliases, which caused numerous configuration headaches. This was because the jails shared the network interfaces and the full networking stack with the host. It was possible to alleviate some of this with multiple routing tables (setfib et al.), but it was still limited.

VNET allows the jails to run networking stacks totally separated from the host’s, as a fully virtualized guest would. As a consequence, it allows running virtual routers with specific firewall filters to better organize and isolate jail networking.

VNET basically works by moving network interfaces to the guest jails, in a separate instance of the network stack, hiding them from the host environment. This is done at jail startup, but it can also be done dynamically to a running jail with:

ifconfig <interface> vnet <jail_id>

VNET works on any kind of interface: physical or virtual. It is thus perfectly possible to assign a physical interface, or a VLAN tagged interface, etc, to a jail.
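To illustrate the dynamic case, here is a small sketch that moves an interface into a running jail and later hands it back to the host with ifconfig's -vnet keyword. The jail name myjail, the helper names and the DRY_RUN switch are placeholders of this sketch, not part of FreeBSD or ezjail; with DRY_RUN=1 the commands are only printed.

```sh
#!/bin/sh
# Sketch only: "myjail" and "epair17b" are placeholder names.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Move an interface into a running jail's network stack.
vnet_give() {      # usage: vnet_give <interface> <jail>
    run ifconfig "$1" vnet "$2"
}

# Hand the interface back to the host.
vnet_reclaim() {   # usage: vnet_reclaim <interface> <jail>
    run ifconfig "$1" -vnet "$2"
}

DRY_RUN=1
vnet_give epair17b myjail
vnet_reclaim epair17b myjail
```

Unset DRY_RUN (and run as root on the host) to actually execute the commands.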

Enabling VNET

First, we need to enable VNET in the kernel. From FreeBSD 12, the default kernel has VNET already, so there is nothing to do, unless you have a custom kernel. On FreeBSD 11, you need to recompile a kernel after adding the following line:

options VIMAGE # Subsystem virtualization, e.g. VNET

VNET and ezjail

What to do next with ezjail?

ezjail’s configuration files are stored in /usr/local/etc/ezjail, one file per jail, named after the jail. ezjail uses environment variables based on the former jail configuration variables stored in /etc/rc.conf. Under the hood, the system converts these lines to the new jail syntax, as .conf files stored in /var/run.

The line that configures networking looks like the following (may be wrapped on your screen):

export jail_jailname_ip="re0|192.168.0.17,re0|2a01:e34:ec2a:94a0::11,lo0|127.0.0.17"

To convert this configuration to VNET, we have to:

  • disable the traditional jail networking system: this is done by providing an empty value for the above line
  • enable VNET for the jail
  • specify the VNET interface(s) the jail is going to use

Which is done using the following lines:

export jail_jailname_ip=""
export jail_jailname_vnet_enable="YES"
export jail_jailname_vnet_interface="epair17b"

Note that we don’t specify IP addresses or the loopback interface anymore. Configuration will be done by the jail itself, possibly in the regular /etc/rc.conf way:

ifconfig_epair17b="192.168.0.17/24"
ifconfig_epair17b_ipv6="2a01:e34:ec2a:94a0::11/64"

We still have to create the interface the jail is going to use, here epair17b. I chose the epair/if_bridge architecture as it seemed the most flexible and the easiest to grasp, but it is also possible to use netgraph-based interfaces, or anything else the system supports.

epair interfaces are 2 virtual network interfaces linked with a virtual crossover cable. if_bridge is a bridge interface which switches traffic between the interfaces you attach to it. By combining both and adding routers, you can create any virtual network architecture.

To prepare the interfaces,

ifconfig epair17 create

creates two interfaces, epair17a and epair17b.

epair17b will be given to the jail; epair17a will stay on the host, and will have to get connectivity somehow. This is typically done by making it a bridge member.

epair17a may or may not have an IP address assigned to it (it does not need one if it is only used for bridging), but it needs to be up:

ifconfig epair17a up

We also need to add one of the interfaces to a bridge, so it gets connectivity to the rest of the network:

ifconfig bridge0 create up
ifconfig bridge0 addm epair17a
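The host-side steps above can be gathered into one small script. This is only a sketch: the function name setup_jail_link and the DRY_RUN switch are my own conventions, and the commands are exactly the four shown above, parameterized on the epair number and the bridge name.

```sh
#!/bin/sh
# Sketch: host-side plumbing for one jail, following the steps above.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_jail_link() {   # usage: setup_jail_link <epair-number> <bridge>
    n=$1
    bridge=$2
    run ifconfig "epair${n}" create            # creates epair<n>a and epair<n>b
    run ifconfig "epair${n}a" up               # the host side must be up
    run ifconfig "$bridge" create up           # the bridge that provides connectivity
    run ifconfig "$bridge" addm "epair${n}a"   # attach the host side to the bridge
}

DRY_RUN=1
setup_jail_link 17 bridge0
```

The b side (epair17b here) is deliberately left alone: it is handed to the jail, which configures it itself.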

To make it easier to understand, I made a few images showing possible architectures.

First, example of a basic configuration where all the jails are configured on the same local network as the host through bridge0, mimicking the traditional jail networking.

Figure 1

Here, the jails are organized on two separate subnetworks, with Host possibly providing IP routing and firewalling.

Figure 2

Lastly, Figure 3 shows another architecture where the first group of guests, Guest 1 and Guest 2, is configured directly on the local network, whereas Guest 4 and Guest 5 are connected through the virtual router Guest 3. For example, this can be used in a setting where Guest 1 and Guest 2 provide the front end to a service, and Guest 4 and Guest 5 provide the back end (databases, etc.). Guest 4 and Guest 5 don’t even need full connectivity to the Internet; this can be enforced with firewall rules on Host or Guest 3.

Figure 3

Making the configuration persistent

The above commands were meant to explain the workings of the setup, but they are ephemeral. The configurations need to be made persistent in the boot configuration of Host, for example in /etc/rc.conf:

cloned_interfaces="bridge0 bridge1 epair1 epair2 ..."
ifconfig_bridge0="up addm re0 addm epair1a addm epair2a ..."
ifconfig_epair1a="up"

Note that the epair interfaces on the guests don’t need to be up from the host configuration. The guest startup code will manage this.

Using jib to create/destroy interfaces dynamically

The above static configuration has a small issue: VNET takes quite some time (tens of seconds) to hand the interface of a deleted jail back to the host, making it invisible in the meantime. This means that a jail restart will fail for lack of the required interface.

To avoid this, and to get persistent MAC addresses for the interfaces, which comes in handy, there are scripts provided in /usr/share/examples/jails: jib (for epair/bridge-based interfaces) and jng (for netgraph-based interfaces).

We just need to install these scripts in /usr/local/sbin and make them executable.

cp /usr/share/examples/jails/jib /usr/local/sbin
chmod a+rx /usr/local/sbin/jib
cp /usr/share/examples/jails/jng /usr/local/sbin
chmod a+rx /usr/local/sbin/jng

jib creates epair interfaces and adds one interface of the pair to a bridge connected to an output interface, i.e.:

jib addm TEST re0

will create interfaces e0a_TEST and e0b_TEST and add e0a_TEST to a bridge named re0bridge if it exists, or failing that, create such a bridge and connect it to re0. The jail will be configured to use interface e0b_TEST.

The cherry on the cake with jib/jng: they try to keep MAC addresses persistent.

To create and destroy interfaces dynamically with ezjail, instead of tweaking /etc/rc.conf, we only need to add the following lines to the ezjail configuration file for the jail:

export jail_jailname_vnet_enable="YES"
export jail_jailname_vnet_interface="e0b_jailname"
export jail_jailname_exec_prestart0="/usr/local/sbin/jib addm jailname re0"
export jail_jailname_exec_poststop0="/usr/local/sbin/jib destroy jailname"
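With more than a couple of jails, these lines are tedious to type by hand. A minimal sketch of a generator follows; ezjail_vnet_lines is a made-up helper name, not part of ezjail. It assumes the jail name contains only letters, digits and underscores (so that it can be used verbatim in the variable names), and it also emits the empty ip line from earlier in this post.

```sh
#!/bin/sh
# Sketch: print the VNET-related lines for one jail's ezjail configuration file.
# ezjail_vnet_lines is a hypothetical helper; the uplink defaults to re0 as in the post.
ezjail_vnet_lines() {   # usage: ezjail_vnet_lines <jailname> [uplink]
    name=$1
    uplink=${2:-re0}
    cat <<EOF
export jail_${name}_ip=""
export jail_${name}_vnet_enable="YES"
export jail_${name}_vnet_interface="e0b_${name}"
export jail_${name}_exec_prestart0="/usr/local/sbin/jib addm ${name} ${uplink}"
export jail_${name}_exec_poststop0="/usr/local/sbin/jib destroy ${name}"
EOF
}

ezjail_vnet_lines jailname
```

The output is meant to be reviewed and pasted into the jail's file under /usr/local/etc/ezjail, replacing any existing lines for the same variables.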

Notes

Note that it is possible to directly set-up IP addresses on bridge0 bridge1 etc, which may save a couple of epair interfaces in the second and third examples. This is left as an exercise for the reader.

Also, it seems currently difficult or impossible to use VLAN interfaces (if_vlan) in a bridge configuration. I’m still digging on this subject.

References

I have found the following pages useful when preparing my setup and this post:

https://www.reddit.com/r/freebsd/comments/je9oxv/can_i_add_vnet_to_an_ezjail/

https://yom.iaelu.net/2019/03/freebsd-12-vnet-jail-using-bridge-epair-and-pf.html

https://www.cyberciti.biz/faq/how-to-configure-a-freebsd-jail-with-vnet-and-zfs/

Thanks to Jacques Foucry for his work on the nice graphics, Mat Arnold for pointing me to /usr/share/examples/jails and Éric Walter for the idea of the SVG WordPress plugin, avoiding the use of pixelated graphics 🙂

VDSL2, or broadband for those who already have it

OK, I’ll make this one quick, because I don’t have time to polish the references.

I hope I’m not uttering too many howlers; experts are welcome to correct me in the comments if needed.

Our national ISPs are announcing with great fanfare the nationwide rollout of VDSL2 this October 1st, under the high patronage of ARCEP, which kindly authorized them to do so (this is France: we are not here to deploy just anything without the approval of the authorities and the validation of France Télécom, which still has a stranglehold on the copper network).

VDSL2, roughly, is an evolution of ADSL that allows higher speeds by changing the electronics at each end: on one side at the telephone exchange (commonly called the NRA, nœud de raccordement d’abonnés, i.e. subscriber connection node), in a piece of equipment called a DSLAM; on the other side at our end, that is, in the ADSL modem or box.

VDSL2 is presented as an economical way to get almost fiber-level speeds without having to pull new cables under the streets.

And that’s not false, but it’s not quite true either.

VDSL2 is only worthwhile below a line length of 1200 to 1500 meters (Free sets the bar at 1200 meters). Even though I live in a very dense area, in the middle of Paris, I am 1279 meters from my NRA. Too bad. On the other hand, this will spare me the temptation of wasting my time on this stopgap technology.

In other words, VDSL2 is only worthwhile (provided you have a recent box that supports it) for people who already have very good ADSL, already enjoying 15 to 20 Mbps downstream. These are, already, the most privileged copper subscribers.

In rural or less dense areas, line lengths of 3 to 10 km are more the norm. In other words, those who are already less well connected (and enjoy a princely 512 kbps to 2 Mbps) will remain so until they get fiber.

On the other hand, small intermediate distribution cabinets can be set up, generally linked to the NRA by optical fiber, which shorten the copper run to the end subscriber in order to make the most of VDSL2’s improvements. This small constraint is practically part and parcel of the technology. But it requires construction work, and so, at present, VDSL2, which has been in testing for nearly a year, is not deployed this way.

As for me, I am therefore still waiting for fiber to finish being laid at my place, the only transmission medium that really has a future.

That’s it.

Update: a much more detailed article on ZDNet shows that such distribution cabinets are not even possible in France.

Bad idea: Gmail now discriminates against mail servers without an IPv6 reverse

This new gem is from the SMTP Gmail FAQ at https://support.google.com/mail/answer/81126?hl=en

(Fun note: they call it the “Bulk Senders Guidelines”… hence apparently anyone running their own personal mail server falls in that category…)

“Additional guidelines for IPv6

  • The sending IP must have a PTR record (i.e., a reverse DNS of the sending IP) and it should match the IP obtained via the forward DNS resolution of the hostname specified in the PTR record. Otherwise, mail will be marked as spam or possibly rejected.
  • The sending domain should pass either SPF check or DKIM check. Otherwise, mail might be marked as spam.”
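The first quoted requirement is what is usually called forward-confirmed reverse DNS, and it is easy to check by hand. A minimal sketch, assuming the dig utility is installed; fcrdns_ok is a name I made up, and the address must be given in the compressed lowercase form that dig prints, since the comparison is a plain string match:

```sh
#!/bin/sh
# Sketch of the check described in the quoted guideline:
# PTR lookup, then AAAA lookup of the returned name, then comparison.
fcrdns_ok() {   # usage: fcrdns_ok <ipv6-address>
    ip=$1
    # Reverse lookup: which name does the address claim?
    ptr=$(dig +short -x "$ip" | head -n 1)
    [ -n "$ptr" ] || return 1              # no PTR record at all
    # Forward lookup: does that name map back to the same address?
    dig +short "$ptr" AAAA | grep -qxF "$ip"
}

# Example (needs network access, 2001:db8::25 is a documentation address):
#   fcrdns_ok 2001:db8::25 && echo "reverse is consistent" || echo "likely to be flagged"
```

A missing delegation, as in my case, fails at the first step: the PTR query simply returns nothing.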

I happen to be running my own mail server, and I happen to also be IPv6-connected, and finally I happen to be lacking a reverse DNS delegation for IPv6 because my ISP (Free) didn’t yet bother providing me with one.

I’m happier than most as my mail is sent through the eu.org server, which happens to get its mail accepted by Gmail. But it ends up tagged as “spam”.

I’m not the only one in France. OVH is reported as having the same problem.

So what are my points?

  • obviously, my ISP should provide me with a correctly delegated IPv6 reverse… at some point, and of course the sooner the better.
  • but, as has been determined for over 15 years now with IPv4, refusing mail based on a lacking reverse delegation is counter-productive… since spammers statistically tend to send spam from hosts with a reverse more often than legitimate users!
  • so measures like the above end up bothering legitimate users more than spammers.

So I hope Google will step back on this one, whether or not the reverse problem gets fixed.

IPv6 ICMP “packet too big” filtering considered harmful

If you intend to seriously run Internet servers or firewalls in the future (hence, IPv6 servers and firewalls), please read this.

This problem is so well known, so old, and yet still so unfixed and pervasive nowadays that, after pulling my hair out for days over many hanging or timing-out IPv6 sessions, I felt I had to write this.

Executive summary: a huge number of sites have misconfigured firewalls that filter out “ICMP6 packet too big” packets. This breaks Path MTU discovery, causing hanging or broken IPv6 sessions.

Many sites unknowingly assume that the Internet MTU is at least 1500 bytes. This is wrong, whether in IPv4 or IPv6.

Many Internet hosts are connected through tunnels that reduce the real MTU. PPPoE on ADSL links, for example, reduces the MTU by 8 bytes, and 6rd (“6 rapid deployment” tunneling) reduces it by 20 bytes, the size of the IPv4 header it adds. As 6rd is used extensively in France (by the ISP Free), this is a big problem.

1. The symptom: hanging IPv6 connections

Here’s a sample capture for a request where the server has more than 1 data packet.

08:39:57.785196 IP6 2a01:e35:8b50:2c40::7.39738 > 2001:xxx.43: S 165844086:165844086(0) win 65535 <mss 1440,nop,wscale 3,sackOK,timestamp 901

08:39:57.807709 IP6 2001:xxx.43 > 2a01:e35:8b50:2c40::7.39738: S 883894656:883894656(0) ack 165844087 win 14280 <mss 1440,sackOK,timestamp 2377433946 90108,nop,wscale 7>

08:39:57.808452 IP6 2a01:e35:8b50:2c40::7.39738 > 2001:xxx.43: . ack 1 win 8211 <nop,nop,timestamp 90132 2377433946>

08:39:57.808655 IP6 2a01:e35:8b50:2c40::7.39738 > 2001:xxx.43: P 1:9(8) ack 1 win 8211 <nop,nop,timestamp 90132 2377433946>

08:39:57.833052 IP6 2001:xxx.43 > 2a01:e35:8b50:2c40::7.39738: . ack 9 win 112 <nop,nop,timestamp 2377433972 90132>

08:39:57.888981 IP6 2001:xxx.43 > 2a01:e35:8b50:2c40::7.39738: P 1:1025(1024) ack 9 win 112 <nop,nop,timestamp 2377434026 90132>

(missing packet here : 1025:2453 containing 1428 bytes)

08:39:57.889315 IP6 2001:xxx.43 > 2a01:e35:8b50:2c40::7.39738: FP 2453:2723(270) ack 9 win 112 <nop,nop,timestamp 2377434027 90132>

08:39:57.890100 IP6 2a01:e35:8b50:2c40::7.39738 > 2001:xxx.43: . ack 1025 win 8211 <nop,nop,timestamp 90213 2377434026,nop,nop,sack 1 {2453:2723}>

(session hangs here, unterminated because of the missing bytes)

This is difficult to debug as modern Unices have a “TCP host cache” keeping track of Path MTUs on a host-by-host basis, causing the problem to suddenly appear or disappear in unpredictable ways depending on the size of the transmitted data.

2. A sample successful session with working trial-and-error Path MTU discovery

10:09:55.291649 IP6 2a01:e35:8b50:2c40::7.40948 > 2a01:e0d:1:3:58bf:fa61:0:1.43: S 1032533547:1032533547(0) win 65535 <mss 1440,nop,wscale 3,sackOK,timestamp 5487603 0>

10:09:55.291787 IP6 2a01:e0d:1:3:58bf:fa61:0:1.43 > 2a01:e35:8b50:2c40::7.40948: S 3695299654:3695299654(0) ack 1032533548 win 65535 <mss 1440,nop,wscale 3,sackOK,timestamp 3185067848 5487603>

10:09:55.316234 IP6 2a01:e35:8b50:2c40::7.40948 > 2a01:e0d:1:3:58bf:fa61:0:1.43: . ack 1 win 8211 <nop,nop,timestamp 5487628 3185067848>

10:09:55.317965 IP6 2a01:e35:8b50:2c40::7.40948 > 2a01:e0d:1:3:58bf:fa61:0:1.43: P 1:9(8) ack 1 win 8211 <nop,nop,timestamp 5487628 3185067848>

10:09:55.417301 IP6 2a01:e0d:1:3:58bf:fa61:0:1.43 > 2a01:e35:8b50:2c40::7.40948: . ack 9 win 8210 <nop,nop,timestamp 3185067974 5487628>

Now the big packet that was missing in the broken session above:

10:09:56.084457 IP6 2a01:e0d:1:3:58bf:fa61:0:1.43 > 2a01:e35:8b50:2c40::7.40948: . 1:1429(1428) ack 9 win 8210 <nop,nop,timestamp 3185068641 5487628>

The 6rd gateway replies with an ICMP6 message:

10:09:56.085221 IP6 2a01:e00:1:11::2 > 2a01:e0d:1:3:58bf:fa61:0:1: ICMP6, packet too big, mtu 1480, length 584

Missing data is retransmitted by the server using a lower packet size (and an entry is created in the server’s host cache to remember that):

10:09:56.085489 IP6 2a01:e0d:1:3:58bf:fa61:0:1.43 > 2a01:e35:8b50:2c40::7.40948: . 1:1409(1408) ack 9 win 8210 <nop,nop,timestamp 3185068642 5487628>

10:09:56.085522 IP6 2a01:e0d:1:3:58bf:fa61:0:1.43 > 2a01:e35:8b50:2c40::7.40948: . 1409:1429(20) ack 9 win 8210 <nop,nop,timestamp 3185068642 5487628>

Then the connection goes on to correct completion (no use showing the packets here).

Interestingly, trying an identical request then shows that the MSS negotiation takes the host cache into account, with an MSS set to 1420 instead of 1440 from the start in the server reply:

10:10:14.053218 IP6 2a01:e35:8b50:2c40::7.20482 > 2a01:e0d:1:3:58bf:fa61:0:1.43: S 2231600544:2231600544(0) win 65535 <mss 1440,nop,wscale 3,sackOK,timestamp 5506365 0>

10:10:14.053382 IP6 2a01:e0d:1:3:58bf:fa61:0:1.43 > 2a01:e35:8b50:2c40::7.20482: S 2676514636:2676514636(0) ack 2231600545 win 65535 <mss 1420,nop,wscale 3,sackOK,timestamp 1128201317 5506365>
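The sizes seen in these captures can be reproduced with a little arithmetic. A minimal sketch, assuming the usual 40-byte IPv6 header, a 20-byte TCP header without options, and 12 bytes of TCP options for the timestamp (the nop,nop,timestamp carried by every data segment above); the function names are mine:

```sh
#!/bin/sh
# Sketch: how the MSS and segment sizes in the captures follow from the path MTU.
IPV6_HDR=40   # fixed IPv6 header, bytes
TCP_HDR=20    # TCP header without options, bytes
TS_OPT=12     # nop,nop,timestamp options, bytes

# Largest MSS that can be announced for a given MTU
mss_for_mtu() {
    echo $(( $1 - $IPV6_HDR - $TCP_HDR ))
}

# Largest data payload per segment once the timestamp option is included
payload_for_mtu() {
    echo $(( $1 - $IPV6_HDR - $TCP_HDR - $TS_OPT ))
}

for mtu in 1500 1480; do
    echo "mtu=$mtu mss=$(mss_for_mtu "$mtu") payload=$(payload_for_mtu "$mtu")"
done
```

With an MTU of 1500 this gives the MSS of 1440 and the 1428-byte segment that got lost; with the MTU of 1480 announced by the 6rd gateway it gives the MSS of 1420 and the 1408-byte retransmission.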

3. The simple fix

The fix is dead simple: just make sure that your filters are configured so that ICMP6 “packet too big”, type number 2, messages are correctly transmitted end-to-end, and correctly handled.
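As an illustration, here is what this could look like on a Linux netfilter firewall. This is a sketch under assumptions, not a complete rule set: it supposes a default-deny policy managed with ip6tables, and the helper names and DRY_RUN switch are mine. Other firewalls have an equivalent way of matching ICMPv6 type 2.

```sh
#!/bin/sh
# Sketch for a Linux netfilter firewall; adapt to your own rule set.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

allow_packet_too_big() {
    for chain in INPUT FORWARD OUTPUT; do
        # -I inserts at the top of the chain, ahead of any later DROP rule
        run ip6tables -I "$chain" -p ipv6-icmp --icmpv6-type packet-too-big -j ACCEPT
    done
}

DRY_RUN=1
allow_packet_too_big
```

The FORWARD rule matters on routers and firewalls in the path, since the message has to travel all the way back to the sending host.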

 

What to do on June 6th / IPv6 Launch day?

June 6th, 2012 is the “World IPv6 Launch” day: see http://www.worldipv6launch.org/

As it stands, it is presented as mainly oriented toward ISPs and hardware makers, giving the impression that home users are not concerned.

IPv6 deployment actually began years ago, but it has so far failed to get on the radar of most organizations, slowing its adoption.

So let’s set things straight. You can participate from home:

  • if your home ISP doesn’t provide you with IPv6 connectivity yet, they will have to in the not-too-distant future. Call them and ask them when!
  • if your home ISP already provides you with IPv6, activate it on your Internet connection and on your computer! In France, Free Telecom and Nerim have been providing IPv6 connectivity for years.
  • if you run a personal server, activate IPv6 on it if available, and if not, ask for support!

It may be a little too soon to pester mobile phone operators (3G and 4G) to get IPv6 connectivity from them. They are telcos, after all… but if you feel like it, don’t hesitate to ask them, too, what their IPv6 deployment schedule is.

For French users, the G6 association has a nice set of resources on IPv6: http://g6.asso.fr/

IPv4 is almost dead

The depletion statistics for free IPv4 addresses currently show 91 days of “stock” left in the regional registries, which means exhaustion in early March 2011.

In fact this is a “no later than” date, because the allocation rate is clearly accelerating: on July 14, 2010 (the memorable date when france.fr crashed on the very day of its launch), the counter showed 365 days, i.e. exhaustion falling on July 14, 2011.

In other words, some have already started to panic and are quietly hoarding addresses. Update: Stéphane Bortzmeyer tells me the reason is that 5 address ranges had not been taken into account in the statistics. A more detailed article was pointed out on Twitter by @nkgl.

By the time the addresses from the regional (continental) registries “percolate” down to the assignees (the end-user sites), there will be about a year of respite.

After that, since the only addresses easy to obtain will be IPv6 addresses, the first sites reachable only over IPv6 will start to appear.

In other words, already having your own little IPv4 address is not enough to feel unconcerned. The Internet will inevitably lose its full connectivity for a few years: the inevitable IPv6-only sites will not be able to reach IPv4-only sites until the latter have moved to IPv6 as well. One can only hope that this transitional period will be as short as possible. It is likely that many vendors of snake oil and of more or less shaky semi-transition technologies will appear, in the vein of the “Y2K” bug.

I invite the unconvinced who think there is no hurry to read Geoff Huston’s excellent article, in English, which debunks a few stubborn myths.

TCP-Estimated round-trip test

In an attempt to evaluate different methods for measuring the performance of a TCP/IP connection, I’ve bumped into FreeBSD’s getsockopt(TCP_INFO) socket option, cloned from a similar one invented by Linux, which kindly returns interesting data about the current TCP connection.

I was mainly interested in the round-trip time (RTT, called tcpi_rtt) and its standard deviation, mistakenly called tcpi_rttvar even though it’s not a variance.

I’ve written a small proof-of-concept tool accessible at http://eu.org:4500/ to display operating system information retrieved from the current HTTP access. The page currently runs on a FreeBSD 9-CURRENT machine; feel free to try it out, it works either in IPv4 or IPv6. Here’s a sample as of today:

This experimental page displays raw system TCP estimates, in microseconds.

Address: 2a01:e35:8b50:2c40::4
Estimated round-trip time: 15437
Estimated standard deviation: 27937

Note that the measures are very rough. First, the real resolution is about 1 millisecond (one kernel tick), not 1 microsecond. Then, several RTT samples are smoothed into the provided values, with a bigger weight for more recent samples. I left the actual values obtained from the kernel, for clarity, even though giving them up to a 1 microsecond resolution is somewhat misleading.
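To give an idea of what this smoothing looks like, here is a sketch of the textbook estimator standardized in RFC 6298 (gains of 1/8 for the RTT and 1/4 for the deviation). I am assuming the kernel does something along these lines; real kernel code uses scaled fixed-point values and differs in detail. The variable and function names are mine.

```sh
#!/bin/sh
# Sketch of RFC 6298 smoothing with integer microseconds (alpha = 1/8, beta = 1/4).
srtt=0
rttvar=0
have_sample=0

rtt_update() {   # usage: rtt_update <sample-in-microseconds>
    sample=$1
    if [ "$have_sample" -eq 0 ]; then
        # First measurement: SRTT = R, RTTVAR = R / 2
        srtt=$sample
        rttvar=$(( $sample / 2 ))
        have_sample=1
    else
        # Absolute error between the current estimate and the new sample
        err=$(( $srtt - $sample ))
        if [ "$err" -lt 0 ]; then err=$(( 0 - $err )); fi
        # RTTVAR moves a quarter of the way toward the error,
        # SRTT moves an eighth of the way toward the sample.
        rttvar=$(( $rttvar + ($err - $rttvar) / 4 ))
        srtt=$(( $srtt + ($sample - $srtt) / 8 ))
    fi
}

for s in 16000 8000 15000; do
    rtt_update "$s"
    echo "sample=$s srtt=$srtt rttvar=$rttvar"
done
```

Each new sample only nudges the estimates, so older samples fade out geometrically. Note that the second value is a smoothed mean absolute deviation, which is neither a variance nor, strictly speaking, a standard deviation.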

Then, of course, the results also depend on the number of samples, which tends to be low: the above page waits for the client HTTP headers to be fully received, then emits its own headers in reply, then waits for one second to give some time for the TCP ack(s) to come back, then displays the then-current estimations.

The results are probably sufficient for TCP’s internal needs, but they may differ wildly from real RTT values. Plus, the real RTT value depends on packet size, which TCP doesn’t seem to take into account. The above example is taken from my local network and displays over 15 ms for the RTT, whereas the real RTT is well below 1 ms (0.23 ms minimum, 0.4 ms average with 0.01 ms standard deviation, according to ping). The results are not always wildly high: I’ve noticed the opposite effect from a remote mobile phone displaying ~100 ms whereas the ping time was more like ~200 ms…

Feel free to use it and add comments below.

No comment

I’m having some problems with the anti-spam detection of comments on this blog. Comments may sometimes be wrongly held for moderation, or even flatly classified as spam without further ado. I try to fish them out, but I have already missed some, so don’t hesitate to nudge me if you don’t see yours appear after several days. I haven’t yet explored all the possible settings (whitelist, blacklist, etc.) in WordPress and spam-karma, so the situation is still evolving.

The underlying reason is that, for some time now, I have had to set up a complicated reverse relay (aka proxy) system for IPv4 access to this blog, which confuses the anti-spam system, as it no longer directly sees the IPv4 address of a comment’s author.

For the record, this technical contraption, meant to make everything work on a single IPv4 address, would have no reason to exist if everyone used IPv6, that famous thing-that-is-good-for-nothing.

Accessing the IPv6 web with squid 3.1

That’s it, Squid 3.1 is in the FreeBSD ports. Squid 3.1 is still a beta but it’s nearing a release.

This means I can now access ipv6.google.com (and obviously other IPv6 sites) through my local proxy. Unfortunately I’m redirected to the French version, which seems to lack the dancing “Google” letters. The good old Kame turtle works, on the other hand. How’s that for a killer app?

IPv6 is the unleaded gas (or the organic food) of IP: it doesn’t seem to bring anything concrete, unless you look at the bigger picture.