DNS Resolver and cURL

Hi All,

I’m trying to use curl (as per the sample on GitHub) to send some data from Legato to an HTTPS site. There’s something weird going on with the way DNS look-ups are being done inside my Legato app.

My setup: Legato 16.10.1 (it’s an FX30, and that’s the latest version available for it at the moment); eth0 is configured with a non-routable IP address but isn’t plugged into any network; I’m connected via USB; and a SIM card is inserted and connected to the network (manually, using cm data connect), and cm data reports that the network is connected.

I don’t think the problem is curl on the WP itself, as I’ve used tcpdump to monitor DNS traffic on port 53.

When I run curl from the command line, the command completes successfully. tcpdump reports:


07:27:21.040890 IP 100.71.231.xxx.35965 > google-public-dns-a.google.com.domain: 42855+ A? www.example.com.au. (51)
07:27:21.041958 IP 100.71.231.xxx.35965 > google-public-dns-a.google.com.domain: 31148+ AAAA? www.example.com.au. (51)
07:27:21.945178 IP 100.71.231.xxx.56804 > google-public-dns-a.google.com.domain: 63014+ PTR? xxx.231.71.100.in-addr.arpa. (45)
07:27:22.576309 IP google-public-dns-a.google.com.domain > 100.71.231.xxx.35965: 42855 1/0/0 A 103.229.49.xx (67)
07:27:22.736327 IP google-public-dns-a.google.com.domain > 100.71.231.xxx.35965: 31148 0/1/0 (127)
07:27:22.785679 IP google-public-dns-a.google.com.domain > 100.71.231.xxx.56804: 63014 NXDomain 0/1/0 (102)
07:27:22.787205 IP 100.71.231.xxx.33975 > google-public-dns-a.google.com.domain: 43778+ PTR? 8.8.8.8.in-addr.arpa. (38)
07:27:23.845743 IP google-public-dns-a.google.com.domain > 100.71.231.xxx.33975: 43778 1/0/0 PTR google-public-dns-a.google.com. (82)

When I run curl from my app using libcurl as per the sample, curl fails with an ‘unable to resolve’ error. tcpdump reports:
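One way to split the problem in two is to call the C library resolver directly from the app, without libcurl. On Linux, libcurl’s usual resolvers end up in getaddrinfo() unless libcurl was built with c-ares. So if this minimal sketch also fails inside the sandbox, curl itself is off the hook. The function name check_resolve() is made up for illustration, and port 443 is only there to match the HTTPS use case.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: resolve `host` with the C library resolver and print
 * every address returned. Returns 0 on success, or the getaddrinfo() error
 * code (printed via gai_strerror) on failure. */
int check_resolve(const char *host)
{
    struct addrinfo hints, *res = NULL, *ai;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* ask for both A and AAAA results */
    hints.ai_socktype = SOCK_STREAM;

    int rc = getaddrinfo(host, "443", &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "getaddrinfo(%s): %s\n", host, gai_strerror(rc));
        return rc;
    }

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        char buf[INET6_ADDRSTRLEN];
        /* Pick the address field that matches the returned family. */
        const void *addr = (ai->ai_family == AF_INET)
            ? (const void *)&((struct sockaddr_in *)ai->ai_addr)->sin_addr
            : (const void *)&((struct sockaddr_in6 *)ai->ai_addr)->sin6_addr;
        if (inet_ntop(ai->ai_family, addr, buf, sizeof buf) != NULL)
            printf("%s -> %s\n", host, buf);
    }
    freeaddrinfo(res);
    return 0;
}
```

You could call check_resolve("www.example.com.au") once from the app and once from a small test program run on the command line. That would show whether the two environments really resolve differently.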


05:39:53.369138 IP localhost.localdomain.41887 > localhost.localdomain.domain: 2730+ A? www.example.com.au. (51)
05:39:53.369565 IP localhost.localdomain.41887 > localhost.localdomain.domain: 31712+ AAAA? www.example.com.au. (51)
05:39:58.374082 IP localhost.localdomain.41887 > localhost.localdomain.domain: 2730+ A? www.example.com.au. (51)
05:39:58.374418 IP localhost.localdomain.41887 > localhost.localdomain.domain: 31712+ AAAA? www.example.com.au. (51)
05:40:03.376432 IP localhost.localdomain.54734 > localhost.localdomain.domain: 1008+ A? www.example.com.au. (51)
05:40:03.376768 IP localhost.localdomain.54734 > localhost.localdomain.domain: 35839+ AAAA? www.example.com.au. (51)
05:40:08.382292 IP localhost.localdomain.54734 > localhost.localdomain.domain: 1008+ A? www.example.com.au. (51)
05:40:08.382628 IP localhost.localdomain.54734 > localhost.localdomain.domain: 35839+ AAAA? www.example.com.au. (51)

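The queries go to localhost (port 53) instead of to 8.8.8.8, are retried every five seconds, and never get a reply. So it looks like the DNS resolver is not doing the right thing for the Legato app.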

As per the sample, I’ve made sure that the hosts file, resolv.conf, etc. are copied into the app via the ‘require’ mechanism. I’ve also looked in the /legato/current/… directory while my app has been running, and I can see that the files have been copied across and appear to be the same as those in the main filesystem.

I’m using the Google DNS resolvers: 8.8.8.8 and 8.8.4.4 are listed in /etc/resolv.conf.
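To check what the app actually sees, rather than what is on the main filesystem, a small helper can print the nameserver lines of any resolv.conf-style file. This is a hypothetical helper (list_nameservers() is not part of Legato). Running it against the path the app opens and against the system copy would make any difference obvious.

```c
#include <stdio.h>

/* Hypothetical helper: print every "nameserver <address>" entry in the
 * resolv.conf-style file at `path` and return how many were found,
 * or -1 if the file cannot be opened. Comment lines (# or ;) and other
 * keywords such as "search" do not match the pattern and are skipped. */
int list_nameservers(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        perror(path);
        return -1;
    }

    char line[256];
    int count = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        char addr[128];
        /* Leading whitespace is allowed; the address is the next word. */
        if (sscanf(line, " nameserver %127s", addr) == 1) {
            printf("%s: nameserver %s\n", path, addr);
            count++;
        }
    }
    fclose(f);
    return count;
}
```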

Note: I’m not using the ‘data connection’ api in my Legato app, but I am certain that the network is connected and working as I can test it from the command line.

Any thoughts? I get on a plane on Sunday night to demo to clients Monday/Tuesday next week.

ciao

Hi @davidc,

Could you have a look at the apps/sample/httpGet sample app and mount the same files?
You can also try a strace -f -p <pid of your app exec>.

Worst case scenario, if it doesn’t work, you can disable the sandbox through sandbox: false in the adef for your demo.

I think this problem might be related to issue LXSWI9X1517-185. I worked around this problem by bundling a resolv.conf into my app instead of requiring the system’s resolv.conf. I bundled a resolv.conf which uses 8.8.8.8 for DNS. This could be problematic if you are required to use the DNS assigned by DHCP.

FYI LXSWI9X1517-185 is ‘Can’t stat bind mounted file after removing source file from aufs’.

Tricky issue: I imagine that the inode of /etc/resolv.conf changes over time, while the mount --bind is only done once, by the supervisor, when the app starts. So it might be OK at first, but as soon as /etc/resolv.conf gets rewritten, the bind mount in the sandbox is dead.
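The theory rests on how bind mounts behave: a bind mount keeps pointing at the inode that existed when it was made. If a file is regenerated by writing a new copy and renaming it over the old one, the path outside the sandbox then refers to a new inode, while the sandbox still holds the old one. This sketch demonstrates the inode change with ordinary files. Whether the FX30’s tools rewrite /etc/resolv.conf this way is an assumption, and the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Return the inode number of `path`, or 0 if stat() fails. */
ino_t inode_of(const char *path)
{
    struct stat st;
    return (stat(path, &st) == 0) ? st.st_ino : 0;
}

/* Simulate a tool that regenerates a config file by writing a new copy and
 * renaming it over the old one. Returns 1 if `path` now refers to a
 * different inode than before, 0 if not, and -1 on error. */
int rewrite_changes_inode(const char *path, const char *contents)
{
    char tmp[512];
    ino_t before = inode_of(path);

    snprintf(tmp, sizeof tmp, "%s.new", path);
    FILE *f = fopen(tmp, "w");
    if (f == NULL)
        return -1;
    fputs(contents, f);
    fclose(f);

    /* rename() swaps the directory entry; the old inode is no longer
     * reachable by this path, but an existing bind mount still holds it. */
    if (rename(tmp, path) != 0)
        return -1;

    ino_t after = inode_of(path);
    return (before != 0 && after != 0 && before != after) ? 1 : 0;
}
```

You could record inode_of("/etc/resolv.conf") outside the sandbox when the app starts and compare it with a later value. A change would show that the file was replaced after the bind mount was made.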

I thought that initially, but I set my app to manual start and ran cm data connect before starting it, and I still had the problem. So I wouldn’t expect resolv.conf to have changed after my app started.

Hi,

Thanks for getting back to me.

@CoRfr: I’m travelling at the moment, and will try your suggestion to use the sample files and see what happens.

I have hacked together a workable solution for the time being. After looking through what tcpdump was reporting, I realized that the DNS resolver in the sandbox was always trying to use the ‘hosts’ (localhost) method to resolve the name, and was not falling back to the ‘bind’ (resolv.conf) method.

I first tried ‘requiring’ /etc/host.conf in my application, but nothing changed. In the end I got around this by bundling a custom host.conf file in my component:

# custom /etc/host.conf
order bind
multi on

as well as the custom resolv.conf suggested by @dfrey.

But what I don’t understand is why the resolver library in the app doesn’t fall back from the ‘hosts’ method to the ‘bind’ method, as specified in the default host.conf file, when that works fine from the command line.

ciao, Dave

Hi,

Just a thought on this: try changing the order of the hosts: line in /etc/nsswitch.conf to put dns first. It may solve the problem.

Hi All.

@mahtab: Thanks for this. I hadn’t thought of the nsswitch setup.

Back to fighting with this.

I tried @dfrey’s suggestion of bundling a resolv.conf file with the application to see if that would work.

But as I have to run the application unsandboxed, this local resolv.conf doesn’t seem to be applied to the application at runtime.

I’ve confirmed this using the resolver state reported by the resolver library (see man 3 resolver). My local resolv.conf has two entries, 8.8.8.8 and 8.8.4.4, but my application reports three entries that appear to be the same as those in the root /etc/resolv.conf.
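For reference, a resolver-state check along those lines might look like this. It is a sketch that assumes glibc. res_init() loads resolv.conf into the _res structure. As far as I know, glibc stores nameservers that aren’t IPv4 elsewhere and leaves their nsaddr_list slot with a zero address family, so a naive dump shows them as 0.0.0.0. This helper flags those slots instead. dump_resolver_state() is a made-up name.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <arpa/nameser.h>
#include <netinet/in.h>
#include <resolv.h>
#include <stdio.h>

/* Hypothetical helper: (re)initialise the resolver from resolv.conf and
 * print the nameservers it loaded. Returns the server count, or -1 if
 * res_init() fails. Slots whose family is not AF_INET (on glibc, this is
 * how IPv6 servers appear in nsaddr_list) are flagged rather than printed
 * as 0.0.0.0. */
int dump_resolver_state(void)
{
    if (res_init() != 0) {
        fprintf(stderr, "res_init failed\n");
        return -1;
    }

    for (int i = 0; i < _res.nscount; i++) {
        const struct sockaddr_in *sa = &_res.nsaddr_list[i];
        char buf[INET_ADDRSTRLEN];
        if (sa->sin_family == AF_INET &&
            inet_ntop(AF_INET, &sa->sin_addr, buf, sizeof buf) != NULL)
            printf("nameserver[%d] = %s\n", i, buf);
        else
            printf("nameserver[%d] = (not IPv4, family %d)\n", i, sa->sin_family);
    }
    return _res.nscount;
}
```

glibc caps this list at MAXNS (3) servers, which fits the three entries reported above.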

I’ve got an issue at a client site where my application can’t connect to an internet service after the rmnet0 interface comes up. The error being reported is that the host cannot be resolved, which leads me to think that resolv.conf is still not being configured properly.

ciao, Dave

OK, a couple of points. First, do use the data connection API and register a handler. resolv.conf gets updated whenever a data connection is established, i.e. as soon as the WAN interface is assigned an IP address and gateway, so usually you never have to think about resolv.conf at all. Otherwise I’m pretty sure it’ll be empty and every DNS request will fail. If you really don’t want to use the data connection API, though, there is an alternative.

Edit the /etc/dnsmasq.conf file. Look for the line

#resolv-file= 

Uncomment it and point it at your own resolv.conf-style file containing the DNS servers you want, e.g. a file called resolv.mine.conf in the /etc/ directory. You will also need to set user=root a bit further down in dnsmasq.conf. In the actual resolv.conf file, you can optionally add the line:

nameserver 127.0.0.1

That will send DNS requests on other interfaces to dnsmasq, which then uses your resolv.mine.conf file, if you need them to.
Run logread and grep for ‘dnsmasq’ to check that it started up OK. That will also show you what happens whenever something looks up a host, and what’s generally happening with DNS.

Hiya,

I think I’ve got to the bottom of the problem.

The client SIM card was registering both an IPv4 AND an IPv6 address, and was being assigned two nameservers each for IPv4 and IPv6. The IPv6 nameservers were listed first, then the IPv4 ones.

And … cURL was only looking at the first two nameservers listed in resolv.conf … which happened to be the IPv6 nameservers … and the hosts cURL was looking up didn’t have IPv6 addresses. So cURL was (correctly) failing because the DNS lookup was failing.

Interestingly, the underlying Linux resolver library calls reported the first two IPv4 nameservers as 0.0.0.0, rather than simply ignoring the IPv6 nameservers for IPv4 lookups.

Solution: set up the APN profile to connect to IPv4 only. This seems to be working OK.
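If changing the APN isn’t an option, a similar effect could in principle be had by generating an IPv4-only resolv.conf-style file to bundle or point at. This sketch classifies each nameserver address with inet_pton() and drops the ones that don’t parse as IPv4, passing every other line through unchanged. write_ipv4_only() and the file paths are hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>

/* Hypothetical helper: copy `in_path` to `out_path`, dropping any
 * "nameserver" line whose address is not IPv4. Returns the number of
 * IPv4 nameservers written, or -1 if either file cannot be opened. */
int write_ipv4_only(const char *in_path, const char *out_path)
{
    FILE *in = fopen(in_path, "r");
    if (in == NULL)
        return -1;
    FILE *out = fopen(out_path, "w");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    char line[256];
    int kept = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        char addr[128];
        struct in_addr v4;
        if (sscanf(line, " nameserver %127s", addr) == 1) {
            if (inet_pton(AF_INET, addr, &v4) != 1)
                continue;               /* IPv6 (or malformed) server: skip */
            kept++;
        }
        fputs(line, out);               /* keep IPv4 servers and all other lines */
    }
    fclose(in);
    fclose(out);
    return kept;
}
```

Since resolv.conf is rewritten whenever a data connection comes up, this would need re-running after each reconnect.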

Thanks for everyone’s help.

ciao, Dave


Hiya,

A quick update on this.

There was apparently an issue with the way Legato’s network connect() was setting the nameservers and resolver on the underlying Linux OS when using a mixed IPv4/IPv6 profile.

This appears to have been fixed in Legato 18.05.1.

Works correctly for me now.

ciao, Dave

Hi @davidc, I am currently stuck on 16.10.m3 because of the FX30s. Do you have a recommended workaround for this? I can’t use 18.05.1, and it seems the next release for the FX30 will still only be on 17.

Is using only IPv4 in the APN settings the solution, or are there other issues? Could you provide a link to the bug this was fixed under, by any chance?

Thanks,

Karl

Hi @CoRfr, is the below bug viewable somewhere? I can’t find it in the Legato Bugs…

Hi Karl,

Sorry for the delay in getting back to you.

I never got this resolved. My contacts seemed to lose interest and there was the continued promise of an update to a later version of Legato that would fix the issue … which still isn’t here.

In the end I just forced the FX30 to be IPv4-only, which solved the issue about 80% of the time. The other issue this caused was that the APN registration would sometimes be automagically changed to another APN, which caused data connectivity issues.

This appeared to be something deep in either the module firmware or in Legato that was reading the SIM and making an APN decision that overrode anything that was manually input. I was repeatedly assured that it couldn’t happen - but it certainly was.

Sorry that I don’t have any better news for you.

ciao, Dave

Hi @davidc, thanks for your response. Agreed that this is disappointing. R17 is taking ages to be released, and nobody seems to be able to say what is causing the bug. I too have set mine to IPv4-only and that mostly works… but every few days or weeks I get a name resolution failure, and the app isn’t able to recover from it.

Strangely, nslookup works from the command line, but getaddrinfo returns a system error.
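When getaddrinfo() returns EAI_SYSTEM, gai_strerror() only says that a system error occurred; the actual cause is in errno, so logging errno is a cheap next step. This sketch also calls res_init() before each retry. glibc versions before 2.26 don’t reload /etc/resolv.conf automatically when it changes, so a long-running process can keep using stale nameservers. Whether that is what happens on the FX30 is an assumption, and resolve_with_retry() is a hypothetical name.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <netdb.h>
#include <resolv.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: try getaddrinfo() up to `attempts` times. On each
 * failure, log the reason (errno for EAI_SYSTEM), ask the resolver to
 * re-read /etc/resolv.conf with res_init(), and wait one second before the
 * next try. On success, *out must be released with freeaddrinfo(). */
int resolve_with_retry(const char *host, const char *port,
                       struct addrinfo **out, int attempts)
{
    struct addrinfo hints;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    int rc = EAI_FAIL;                  /* returned if attempts <= 0 */
    for (int i = 0; i < attempts; i++) {
        rc = getaddrinfo(host, port, &hints, out);
        if (rc == 0)
            return 0;

        if (rc == EAI_SYSTEM)
            fprintf(stderr, "getaddrinfo(%s): system error: %s\n", host, strerror(errno));
        else
            fprintf(stderr, "getaddrinfo(%s): %s\n", host, gai_strerror(rc));

        res_init();                     /* reload resolver config for this thread */
        if (i + 1 < attempts)
            sleep(1);
    }
    return rc;
}
```

For example, resolve_with_retry("www.example.com.au", "443", &res, 3) tries three times, one second apart, before giving up.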

Don’t know where to go with that now…

Thanks,
Karl