Peeking Bear ┬┴┤•ᴥ•ʔ

You should set a priority for your policy routing rules

Adding policy routing rules without a priority broke my network setup last week.

Context: I have two devices that each already have a WireGuard tunnel to my public server, whose IP is within the routed subnet. I was trying to set up a tailnet on them with headscale as the coordination server, with the public server also advertising the same routes in the tailnet.

The two devices set up the WireGuard tunnel and the policy routing through different mechanisms: one uses systemd-networkd, the other uses wg-quick + NetworkManager-dispatcher. Out of laziness, I didn't specify a priority for the rules on either.

The problem happened when I tried bringing up tailscale on the one with the networkd setup: it just... stopped responding. Since it's behind a NAT and I hadn't prepared any backup mechanism for this kind of situation, all I could do was wait until I had physical access to it.

During the wait, I examined the other device, which has a similar setup and differs only in the mechanism it uses. That one accepts the routes advertised in the tailnet and routes through them normally. It didn't yield much useful information, apart from hinting at a possible routing loop.

When I could finally access the unresponsive one directly, the only difference I saw was the order of the rules. The rule routing traffic through the original gateway server sat AFTER the tailscale ones, instead of BEFORE them like on the working device.

After messing with the settings and reading some documentation, I found that if the priority is left unset when a rule is added, the Linux kernel just makes it take precedence over the existing rules. NetworkManager-dispatcher and systemd-networkd trigger at different points during bootup, so the former's rule ends up before the tailscale rules and works fine. The latter's rule can't: it falls down to right before the default ones, which puts it behind the tailscale rules added later, unless you explicitly set the priority in the *.network files.
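
Here is a rough toy model of that behavior in Python, just to make the ordering problem visible. It is not the actual kernel code. It assumes the kernel auto-assigns "priority of the second rule in the list, minus one" (falling back to 0), that the default rules sit at 0, 32766 and 32767, and that tailscale installs its rules at 5210 to 5270. The rule names and the 5000 used for the pinned case are made up for illustration, so check `ip rule show` on your own machine.

```python
# Toy model of rule ordering. Lower priority number = evaluated earlier.
# Assumption: default rules as shown by `ip rule show` on a fresh system.
DEFAULT_RULES = [(0, "local"), (32766, "main"), (32767, "default")]

# Assumption: priorities tailscale uses for its own rules.
TAILSCALE_PRIORITIES = [5210, 5230, 5250, 5270]


def add_rule(rules, name, priority=None):
    """Return a new rule list with `name` added, sorted by priority."""
    if priority is None:
        ordered = sorted(rules)
        # Assumed kernel behavior: the second rule in the list decides
        # the auto-assigned priority (its priority minus one).
        if len(ordered) >= 2 and ordered[1][0] != 0:
            priority = ordered[1][0] - 1
        else:
            priority = 0
    return sorted(rules + [(priority, name)])


def add_tailscale(rules):
    """Add the tailscale rules at their explicit priorities."""
    for prio in TAILSCALE_PRIORITIES:
        rules = add_rule(rules, "tailscale", prio)
    return rules


def order(rules):
    """Collapse the rule list into the order of distinct rule names."""
    names = []
    for _, name in rules:
        if name not in names:
            names.append(name)
    return names


# networkd-like case: gateway rule is added first, tailscale comes up later.
early = add_tailscale(add_rule(DEFAULT_RULES, "gateway"))
# dispatcher-like case: tailscale rules already exist when the gateway rule is added.
late = add_rule(add_tailscale(DEFAULT_RULES), "gateway")
# Explicit priority (5000 is a hypothetical value): add order no longer matters.
pinned = add_tailscale(add_rule(DEFAULT_RULES, "gateway", priority=5000))

for label, rules in [("early", early), ("late", late), ("pinned", pinned)]:
    print(label, order(rules))
```

In this model the "early" case puts the gateway rule behind the tailscale ones, while the "late" and "pinned" cases keep it in front, which matches what I saw on the two devices.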

Maybe I should drop a raspi next to that machine for situations like this, though.

#TIL #policy-routing