fdMonitor only works sporadically

Hello, we have a problem with our UART data monitoring. We’re using le_fdMonitor_Create to monitor the device file /dev/ttyHS0. The event handler either triggers every time, triggers a few times and then stops, or never triggers at all. We implemented a software watchdog which tries to reconnect to the device if no data has been received for 30 seconds. After the watchdog triggers, the connection is reset successfully, but the behaviour described above remains the same. Usually, after a few watchdog triggers (and UART reconnects), the connection becomes stable and we receive all the data. We know that we sometimes receive garbage on our UART wire, which shows up as cryptic characters when monitoring the file with “cat < /dev/ttyHS0”. Could that be a problem for the Legato TTY abstraction? Could something else cause this behaviour? Sometimes we even see that no data arrives via “cat < /dev/ttyHS0” when no apps are running. Really strange…
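For readers who want to reproduce the setup, the 30-second watchdog described above could be sketched roughly as follows. This is not the code from our application: the type idle_watchdog_t and the functions monotonic_ms, watchdog_init, watchdog_feed and watchdog_expired are hypothetical names chosen for illustration, and the sketch assumes a POSIX system with a monotonic clock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical idle watchdog; every name in this block is illustrative. */
typedef struct {
    int64_t last_rx_ms;  /* monotonic timestamp of the last received data */
    int64_t timeout_ms;  /* idle time after which a reconnect is due */
} idle_watchdog_t;

/* Current monotonic time in milliseconds (not affected by wall-clock jumps). */
int64_t monotonic_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Start the watchdog; the idle period is counted from now_ms. */
void watchdog_init(idle_watchdog_t *wd, int64_t timeout_ms, int64_t now_ms)
{
    assert(wd != NULL && timeout_ms > 0);
    wd->last_rx_ms = now_ms;
    wd->timeout_ms = timeout_ms;
}

/* Call whenever data arrives on the UART. */
void watchdog_feed(idle_watchdog_t *wd, int64_t now_ms)
{
    assert(wd != NULL);
    wd->last_rx_ms = now_ms;
}

/* True once no data has been seen for at least timeout_ms. */
bool watchdog_expired(const idle_watchdog_t *wd, int64_t now_ms)
{
    assert(wd != NULL);
    return (now_ms - wd->last_rx_ms) >= wd->timeout_ms;
}
```

In this sketch the receive handler would call watchdog_feed(&wd, monotonic_ms()), and a periodic timer would check watchdog_expired(&wd, monotonic_ms()) and, if it returns true, close and reopen the device before feeding the watchdog again.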

The Legato version is 19.11.6_1b12ff63cbf103c6cae4fff1a4e0fc4e
The device is a WP76xx

If you want to isolate whether this is related to the Legato API, you can simply use the generic open(), read(), write() API to control the UART:
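A minimal sketch of such a test, using only POSIX calls and no Legato API, might look like the following. The function names uart_open and uart_read_timeout are hypothetical helpers invented for this example, and the sketch does not configure baud rate or line settings, which a real test would also need to do.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Open a serial device without making it the controlling terminal.
 * O_NONBLOCK keeps open() from waiting on modem-control lines.
 * Returns a file descriptor, or -1 on error (errno set). */
int uart_open(const char *path)
{
    assert(path != NULL);
    return open(path, O_RDWR | O_NOCTTY | O_NONBLOCK);
}

/* Wait up to timeout_ms for input, then read whatever is available.
 * Returns the number of bytes read (> 0), 0 on timeout (or if the
 * device reports end of file), and -1 on error (errno set). */
ssize_t uart_read_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;

    assert(buf != NULL);

    /* Retry if a signal interrupts the wait. */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0) {
        return -1;
    }
    if (rc == 0) {
        return 0;  /* nothing arrived within timeout_ms */
    }
    if (pfd.revents & (POLLERR | POLLNVAL)) {
        errno = EIO;
        return -1;
    }
    return read(fd, buf, len);
}
```

A test program would call uart_open("/dev/ttyHS0") once and then call uart_read_timeout in a loop, logging each result. If data arrives reliably here but not through le_fdMonitor_Create, the problem is more likely on the Legato side; if it is just as sporadic, the driver or line configuration is the more likely suspect.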

I created this standalone solution which monitors the UART either via fdMonitor or in a while loop, depending on the USE_MONITOR switch. Both result in the same behaviour as described originally. To me it seems like the device is sometimes not ready at startup, because if the UART app is started automatically it never works. If it’s started manually right after boot it also doesn’t work, but if it’s started after a while it works.
UART-Test.zip (6.8 MB)

Do you see the problem when using the generic API (open(), read(), write()) to control the UART?

You can also try the following command to disable the UART auto-suspend behavior:

echo -1 > /sys/devices/78b0000.uart/power/autosuspend_delay_ms

No, that didn’t work either. Maybe it’s a problem with some files which were manipulated. How can I reset the chip to its default setup?

Could it also be a problem that I removed all apps with this command? Because on a fresh chip it seems to work.

for a in $(app status | cut -d" " -f2); do app stop $a; app remove $a; done

You can downgrade to R12 and reset the module to its default state.

Thanks, we will use a new chip for the moment and try to reflash the broken ones later. It seems like the system was corrupted at some point.

Ok, it still gets corrupted for some reason. Is there something that gets reset on a hard-reset? Because after a hard-reset it works again until the next reboot.

Do you see the problem when using the generic API (open(), read(), write()) to control the UART?

Does it relate to this issue where data is received during reboot?

I’m really getting frustrated with these tools… I downloaded the swiflash tool from http://downloads.sierrawireless.com/tools/swiflash/swiflash.zip via wget (because for some reason the download through Brave didn’t work, neither on Windows nor on Linux). I ran swiflash.bat -m wp76xx -r under Windows from an Administrator console (from a path without spaces, because the tool apparently can’t handle whitespace) only to get this log.

Okay, so I switched to Nobara Linux (a Fedora-based distribution), for which no package is provided. Therefore, I followed these instructions: I downloaded the zip (again via wget, because the link didn’t work in Brave, although it seems to work with Firefox, for example), unpacked it, installed the udev rules, restarted the device, and ran the command swiflash -m wp76xx -r, only to end up with this error:

Screenshot from 2023-02-24 07-24-06

It’s absolutely frustrating, and annoying at the same time, when not even the tools provided by Sierra work out of the box. How should we do serious engineering if the tools don’t work as expected?

Yes, as I mentioned in this post. The post you linked is also mine, and we already solved that issue: it was caused by the second device continuously sending data on the UART port while Legato tried to open the device, which made the device crash.

Did you downgrade to R12 first? (It seems you did not…)

Other users were able to make it work:

After countless attempts it seems like I successfully downgraded it by using the one-click tool from https://source.sierrawireless.com/resources/airprime/software/wp76xx/wp76xx-firmware-release-12/ (…at least the tool didn’t throw an error).

Then I upgraded it again to 16.3 using the one-click installer from https://source.sierrawireless.com/resources/airprime/software/wp76xx/wp76xx-firmware-release-16,-d-,3/

After that the device didn’t ask me to set a password and even the command history is still available… So I guess it was not reset at any point.

So I repeated the downgrade to R12 procedure, ran swiflash.bat -m WP76XX -r (yes, I also tried swiflash.bat -m wp76xx -r) and ended up with the same error…

So the problem that the UART does not work properly after a soft reset still exists. So once again I’m asking: is there something that gets cleared on a hard reset? Otherwise I don’t see a reason why it works after a hard reset but not after a soft reset.

But you don’t see the problem with another, new chip after a soft reset or hard reset, right?
If so, the problem only happens on this particular old module.

Not sure if this can revert your old module back to its default state (be careful with the mtd number):

Yes, we also see it on new chips, at least if the chip is flashed on my colleague’s machine. Could it be a problem if the device is not restarted properly after the update (e.g. by just unplugging the power supply)? The issue might be caused by our software; however, if we cannot reset the device to a clean state, we cannot find out at which point the system gets corrupted. Resetting it to the version (https://source.sierrawireless.com/resources/airprime/software/wp76xx/wp76xx-firmware-release-10,-d-,1,-d-,0,-d-,1/) also didn’t help.

I think using swiflash to reset to factory state cannot work for your module with old firmware like R12.
(I guess it is because your module is using new memory.)

Have you tried this?

But if you can also see this issue on new chips, that means it is not related to the default state.

Have you isolated whether this is related to Legato?

Can you take a new module and run a pure Linux application compiled with the toolchain instead of a Legato application?

Just to give an update on this topic: we got a little bit further. We have two devices, one of which is sending data to the Sierra Wireless module via UART. If the services that send the UART data are shut down and the WP76xx is rebooted, UART works flawlessly. If the services are running, something seems to happen to the Sierra’s UART, but we haven’t figured out what yet. It might be a control character problem, because sometimes we receive something like “^M” as a character. I don’t know, but it’s very strange. Is there a plan to update the kernel any time soon? We are using completely new devices which still run kernel 3.18.140, which is heavily outdated and seems rather buggy to me.

Did you try this Yocto CWE?

No, I haven’t tried that yet.

What does that mean? Because that’s what we get before the UART stops working. Not sure if it is related to the issue, just asking. Is it just an informational message that the interrupt was triggered?