Logging in to a VPN shouldn’t cause DNS to stop working, right? Well, you’d like to think that, wouldn’t ya? And the problem wasn’t PEBKAC this time, either. Finding the right setting to fix the problem did take some time, but I got there in the end.
A computer by any other name
While doing some work for a customer[1] recently, I needed to log in to their VPN to check on some stuff. I’d not used their VPN for about a year, so I expected some hiccups along the way. I mean, there’s always something, right? Network settings are changed, systems get updated, and the slow plod of time always causes some problem or other. But I wasn’t expecting DNS not to work when connected to their VPN.
Here was the situation. I was happily using the internet (as you do) and needed to access one of the customer’s services that is only available within their network. Thus, I needed to log in to their VPN. I entered my login details into their VPN client and voilà! I was in their network. All as expected, right? Wrong. Any address I entered into my web browser gave me the same error: server not found. And yes, I was using the correct address. This situation made it rather difficult to access the services I needed to use.
Bring out the machine that goes ping
One of my first instincts when confronted with such a situation is to try to ping something. A habit of mine (and probably a bad one at that) is to ping www.heise.de because, well, it’s always up. That resulted in this error:
$ ping -4 www.heise.de
ping: www.heise.de: Name or service not known
Ok, not good. How about an address closer to home? For instance, the customer’s main website? It turned out that pinging that address also failed at the name resolution step. Crikey! What to do now?
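A "Name or service not known" error means the lookup failed before a single packet was sent, so it helps to test name resolution separately from connectivity. The following is only a minimal sketch, assuming a glibc-based system where getent is available; the function name check_name_resolution is made up for illustration:

```shell
# Sketch: report whether a hostname resolves via the system resolver.
# getent goes through the same NSS configuration (nsswitch.conf,
# /etc/hosts, /etc/resolv.conf) that most programs, including ping, use.
# The function name is hypothetical.
check_name_resolution() {
    if getent hosts "$1" > /dev/null; then
        echo "resolves"
    else
        echo "does not resolve"
    fi
}

# Example usage (needs a working resolver, so it is left commented out):
#   check_name_resolution www.heise.de
```

If a name does not resolve but pinging a known IP address still works, the network path is probably fine and the resolver configuration is the likelier suspect.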
The hard part here was debugging the problem as an outside user. What else could I check? The IP address that the VPN client software had given me looked ok:
$ ip addr
<snip>
10: cscotun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1390 qdisc pfifo_fast state UNKNOWN group default qlen 500
link/none
inet x.x.x.x/22 brd x.x.x.255 scope global cscotun0
valid_lft forever preferred_lft forever
inet6 fe80:<la-la-la>/126 scope link noprefixroute
valid_lft forever preferred_lft forever
The list of routes also looked good:
$ ip route list
default via 192.168.2.1 dev wlp2s0 proto dhcp metric 600
x.x.x.x dev cscotun0 proto unspec scope link
<snip>
Logging out of the VPN, I contacted the IT staff at the customer’s site and told them of my sorrows. Unfortunately, as far as the network engineers on the other side could tell, everything looked good. Nothing in their DNS had changed. After all, the DNS servers were still the same as they’d always been. These servers also appeared in my route list. Hrm.
Persistence pays off
Some time passed, and after some more discussion, we (the IT staff and I) still weren’t any closer to resolving the issue. Thus, I decided to dig a bit further on my side to see if there was anything I could do to fix things. After all, it wouldn’t be the first time that I’d set some configuration value on my laptop, forgotten that I’d set it, and then had it provoke some weird problem a long time later. Lots of digging ensued, but there wasn’t anything obvious that seemed to be causing the issue. Through stubborn perseverance, I eventually homed in on the NetworkManager service’s configuration as a possible origin of my woes.
On a Linux box, DNS is (historically) configured in /etc/resolv.conf. Having a look at this file with the VPN off showed something like this:
$ cat /etc/resolv.conf
# Generated by NetworkManager
search internet-provider.com
nameserver 192.168.2.1
nameserver fe80::1%wlp2s0
Then, when logged in to the VPN, NetworkManager changed this file’s contents to:
$ cat /etc/resolv.conf
# Generated by NetworkManager
search internet-provider.com customer.com
nameserver 192.168.2.1
nameserver fe80::1%wlp2s0
Ok, so the only change was the addition of the customer’s domain to the resolver’s search settings; the nameserver entries stayed exactly the same. This shouldn’t break name resolution. In fact, I would expect it to help with name resolution.
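When comparing resolv.conf snapshots like these, it is easy to look at the search line and overlook the nameserver entries. A small sketch for pulling out just the nameserver addresses; list_nameservers is a hypothetical helper name, not an existing tool:

```shell
# Sketch: print the address from every "nameserver" line of a
# resolv.conf-style file (defaults to /etc/resolv.conf).
# Comments and other directives (search, domain, options) are skipped.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Example usage: take a snapshot before and after connecting to the VPN.
#   list_nameservers > /tmp/ns-before.txt
#   ... connect to the VPN ...
#   list_nameservers > /tmp/ns-after.txt
#   diff /tmp/ns-before.txt /tmp/ns-after.txt
```

An empty diff means the resolver is still asking the same servers as before the VPN connection came up.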
Using a bit of old-fashioned prompt-engineering (i.e. googling), I stumbled upon NetworkManager’s dns configuration parameter. It controls the DNS processing mode and is set in the [main] configuration section of the /etc/NetworkManager/NetworkManager.conf file.
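To see what the dns key is currently set to, one option is to read it straight out of the [main] section. This is a rough sketch only: nm_dns_mode is a made-up helper, it looks at a single file, and it ignores any drop-in files under /etc/NetworkManager/conf.d/ that might override the value.

```shell
# Sketch: print the value of the "dns" key in the [main] section of a
# NetworkManager.conf-style file, or "(unset)" if the key is absent.
# The helper name is hypothetical; drop-in config files are not consulted.
nm_dns_mode() {
    awk -F= '
        # Track which section we are in; only [main] is of interest.
        /^[ \t]*\[/ { in_main = ($0 ~ /^[ \t]*\[main\][ \t]*$/); next }
        # Match the key exactly, so commented-out lines and keys such as
        # "dns-something" are ignored.
        in_main && $1 ~ /^[ \t]*dns[ \t]*$/ {
            gsub(/^[ \t]+|[ \t]+$/, "", $2)
            value = $2
        }
        END { print (value == "" ? "(unset)" : value) }
    ' "${1:-/etc/NetworkManager/NetworkManager.conf}"
}

# Example usage:
#   nm_dns_mode
```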
The comment in /etc/resolv.conf:
# Generated by NetworkManager
somehow gave me the idea that maybe NetworkManager didn’t need to update this file after all. And that maybe some other software component, which was also trying to update /etc/resolv.conf, would get the chance to set things properly. Thus, I tried setting dns=none to stop NetworkManager modifying resolv.conf.
I.e., I changed /etc/NetworkManager/NetworkManager.conf to look like this in the [main] section:
[main]
# ... other settings ...
dns=none
I restarted the service to re-read the config:
$ sudo service NetworkManager restart
Sure enough, something else was now changing /etc/resolv.conf when I connected to the VPN and changing it back after I left the VPN. My guess, in retrospect, is that this was the VPN client software. Since NetworkManager, in its default configuration, thinks that it should have full control over this file, it had kept changing /etc/resolv.conf back to what it thought was the right config, which then screwed up DNS for me.
Here’s what /etc/resolv.conf looked like for me now after connecting to the VPN:
$ cat /etc/resolv.conf
domain customer.com
nameserver <dns-ip-1>
nameserver <dns-ip-2>
search customer.com internet-provider.com
Something about this change told me I was onto something. The new nameserver values in particular signalled an improvement.
Importantly, I could now ping www.heise.de while in the VPN:
$ ping -4 www.heise.de
PING (193.99.144.85) 56(84) bytes of data.
64 bytes from www.heise.de (193.99.144.85): icmp_seq=1 ttl=249 time=9.13 ms
64 bytes from www.heise.de (193.99.144.85): icmp_seq=2 ttl=249 time=10.1 ms
^C
--- ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 9.134/9.607/10.080/0.473 ms
Brilliant!
Now I could access the services I needed to use and continued merrily on my way.
Everything works until it doesn’t
… until …
Time did that thing again that it always does: change stuff. sigh
For other network-related reasons, I needed to update some packages on my (admittedly very outdated) dev system. While watching the progress of an apt upgrade command whiz past, a package name caught my attention: resolvconf. Oh dear.
Guess what? DNS no longer worked. It even turned out that, when I wasn’t connected to the VPN, dns=none needed to be removed for name resolution to work. But dns=none needed to be set (and NetworkManager restarted) for DNS to work inside the VPN. Bugger.
Man, this was so not a solution. I can’t be changing the NetworkManager config and restarting it each time I change a network connection!
Read the fine manual
Thus, lots more reading ensued, including more of the NetworkManager.conf man page. One of the big hints in the section about the dns option is the comment that:
If the key is unspecified, default is used, unless /etc/resolv.conf is a symlink to /run/systemd/resolve/stub-resolv.conf, /run/systemd/resolve/resolv.conf, /lib/systemd/resolv.conf or /usr/lib/systemd/resolv.conf. In that case, systemd-resolved is chosen automatically.
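Whether that automatic choice applies on a given machine depends on what /etc/resolv.conf actually is. Here is a hedged sketch of such a check: resolv_conf_manager is a hypothetical helper, and it matches on the tail of the link target so that relative symlinks are accepted too, which is looser than whatever comparison NetworkManager itself performs.

```shell
# Sketch: report whether a resolv.conf path is a symlink into one of the
# systemd-resolved locations named in the man page excerpt above.
# The helper name is made up. Plain readlink only reports the immediate
# link target, so the patterns match on the end of the path.
resolv_conf_manager() {
    path="${1:-/etc/resolv.conf}"
    if [ ! -L "$path" ]; then
        echo "regular file"
        return
    fi
    target=$(readlink "$path")
    case "$target" in
        */run/systemd/resolve/stub-resolv.conf|\
        */run/systemd/resolve/resolv.conf|\
        */lib/systemd/resolv.conf)
            # The last pattern also covers /usr/lib/systemd/resolv.conf.
            echo "systemd-resolved" ;;
        *)
            echo "symlink to $target" ;;
    esac
}

# Example usage:
#   resolv_conf_manager
```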
An idea formed in my mind. What if I used systemd-resolved? Let’s give it a go! YOLO and all that jazz.
It turned out that the systemd-resolved package wasn’t installed on my laptop. No worries, that’s easily fixed:
$ sudo apt install systemd-resolved
Next, I changed the dns setting in NetworkManager.conf to use systemd-resolved:
[main]
# ... other settings ...
dns=systemd-resolved
After restarting NetworkManager:
$ sudo service NetworkManager restart
… everything worked! By that I mean I could change the network I was using (i.e. to the VPN and back again) and still have name resolution. Phew!
So, it turned out that it was a local setting causing problems. It wasn’t, however, a setting that I’d mucked about with before all these problems appeared, so I couldn’t blame myself (at least, not completely), as is often the case.
The solution turned out not to be setting dns=none and letting the VPN software (or whatever) work out what to put in /etc/resolv.conf. The better, more sustainable solution turned out to be using systemd-resolved to handle things for me. Great! Now I can get back to work!
It’s not DNS
As it turns out, “it’s not DNS” is a meme, at least among sysadmins. There are even t-shirts and haikus and stuff!
It’s not DNS
There’s no way it’s DNS
It was DNS
Things ya learn!
Persistence still pays off
So what did we learn? Computers are hard. Networks are also hard. The fact that the internet, with so many computers connected to it, functions at all is nothing short of a miracle. It’s also a PITA when things that usually “just work”, like DNS, don’t. But stubbornness, persistence, and liberal use of RTFM saved the day in the end.
[1] If you need a software developer who is stubborn, persistent, and thorough, give me a yell! I’m available for freelance Python/Perl backend development and maintenance work. Contact me at paul@peateasea.de and let’s discuss how I can help solve your business’s hairiest problems.