DataConnectionService: No interface up indication after network registration restored

Hi

I’m slowly going mad with DataConnectionService.
Legato version 20.04 on Mangoh RED.

Basic application, register a connection handler (le_data_AddConnectionStateHandler) and request data interface (le_data_Request). App starts and gets correct message indicating data interface connected.

Then I find that if network registration is lost (by removing the antenna, for example), the connection handler gets the correct message (isConnected = 0): data interface disconnected.

If I then restore network registration (replace antenna) the connection handler never gets a message and the data channel does not seem to start.

I traced back to dcsCellular.c in function le_dcsCellular_RetryConn() to this message:
LE_DEBUG("Cellular connection %s already up with no need to retry", cellConnName);
return LE_DUPLICATE;

If I remove the return line (HACK, to allow the code to run further), then all works as expected: I get connection lost and restored reliably. I’m pretty sure I’m doing something else wrong, or else there would be thousands of tickets for this. Losing network registration is a common occurrence in any application. But then I see a lot of posts saying the data connection breaks after several hours and so on, which may be related.

Can anyone confirm this operation before I delve into the complex world of Legato source and start posting trace logs?

Here is the simple app

#include "legato.h"
#include "interfaces.h"

static bool wasConnected;
static le_data_RequestObjRef_t modemDataConnectionRef = NULL;

static void ConnectionStateHandler
(
    const char *intfName,
    bool isConnected,
    void *contextPtr
)
{
    if (isConnected)
    {
        LE_INFO("ConnectionStateHandler: Interface %s CONNECTED, wasConnected=%d", intfName, wasConnected);
    }
    else
    {
        LE_INFO("ConnectionStateHandler: Interface %s DISCONNECTED, wasConnected=%d", intfName, wasConnected);
    }
    wasConnected = isConnected;
}

COMPONENT_INIT
{
    LE_INFO("FMP DataConnectionServices Test\n");

    le_data_AddConnectionStateHandler(ConnectionStateHandler, NULL);
    modemDataConnectionRef = le_data_Request();
}

Logs at point registration is restored:

Jun 27 22:02:27 swi-mdm9x15 user.info Legato: INFO | dcsDaemon[1078]/dcsCellular T=main | dcsCellular.c DcsCellularPacketSwitchHandler() 732 | Packet switch state: previous 5, new 0
Jun 27 22:02:27 swi-mdm9x15 user.info Legato: INFO | dcsDaemon[1078]/dcsCellular T=main | dcsCellular.c DcsCellularPacketSwitchHandler() 732 | Packet switch state: previous 0, new 1
Jun 27 22:02:27 swi-mdm9x15 user.info Legato: INFO | dcsDaemon[1078]/dcs T=main | dcs_db.c dcs_EventNotifierTechStateTransition() 311 | Notify all channels of technology 2 of system state transition to up
Jun 27 22:02:27 swi-mdm9x15 user.debug Legato: DBUG | dcsDaemon[1078]/dcsCellular T=main | dcsCellular.c le_dcsCellular_RetryConn() 1328 | Cellular connection 1 already up with no need to retry

How about using the modemDemo sample application, which has an offline option?

root@swi-mdm9x28-wp:~# app runProc modemDemo1 send --exe=send -- 1234567 "Sim"
root@swi-mdm9x28-wp:~#
root@swi-mdm9x28-wp:~# cat /legato/systems/current/appsWriteable/modemDemo1/smsChat.txt

SIM 1 is inserted and unlocked. ICCID=8985200012741552068 IMSI=454003074155206

root@swi-mdm9x28-wp:~#
root@swi-mdm9x28-wp:~# cm data
Index: 1
APN: hkcsl
PDP Type: IPV4V6
Connected: no
root@swi-mdm9x28-wp:~#
root@swi-mdm9x28-wp:~# app runProc modemDemo1 send --exe=send -- 1234567 "Online"
root@swi-mdm9x28-wp:~#
root@swi-mdm9x28-wp:~# cat /legato/systems/current/appsWriteable/modemDemo1/smsChat.txt

SIM 1 is inserted and unlocked. ICCID=8985200012741552068 IMSI=454003074155206

Requesting data connection.

root@swi-mdm9x28-wp:~#
root@swi-mdm9x28-wp:~# cm data
Index: 1
APN: hkcsl
PDP Type: IPV4V6
Connected: yes
Interface: rmnet_data0
Family[IPv4]: inet
IP[IPv4]: 10.128.167.61
Gateway[IPv4]: 10.128.167.62
Dns1[IPv4]: 10.145.148.1
Dns2[IPv4]: 10.144.148.133
root@swi-mdm9x28-wp:~#
root@swi-mdm9x28-wp:~# ping www.google.com
PING www.google.com (172.217.26.132): 56 data bytes
64 bytes from 172.217.26.132: seq=0 ttl=59 time=16.461 ms
64 bytes from 172.217.26.132: seq=1 ttl=59 time=26.180 ms
^C
--- www.google.com ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 16.461/21.320/26.180 ms
root@swi-mdm9x28-wp:~# app runProc modemDemo1 send --exe=send -- 1234567 "Offline"
root@swi-mdm9x28-wp:~#
root@swi-mdm9x28-wp:~# cat /legato/systems/current/appsWriteable/modemDemo1/smsChat.txt

SIM 1 is inserted and unlocked. ICCID=8985200012741552068 IMSI=454003074155206

Requesting data connection.

Releasing data connection.

root@swi-mdm9x28-wp:~#
root@swi-mdm9x28-wp:~# ping www.google.com
ping: bad address 'www.google.com'
root@swi-mdm9x28-wp:~#
root@swi-mdm9x28-wp:~# app runProc modemDemo1 send --exe=send -- 1234567 "Online"
root@swi-mdm9x28-wp:~#
root@swi-mdm9x28-wp:~# ping www.google.com
PING www.google.com (172.217.26.132): 56 data bytes
64 bytes from 172.217.26.132: seq=0 ttl=59 time=19.727 ms
64 bytes from 172.217.26.132: seq=1 ttl=59 time=18.251 ms
^C
--- www.google.com ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 18.251/18.989/19.727 ms
root@swi-mdm9x28-wp:~# cat /legato/systems/current/appsWriteable/modemDemo1/smsChat.txt

SIM 1 is inserted and unlocked. ICCID=8985200012741552068 IMSI=454003074155206

Requesting data connection.

Releasing data connection.

Requesting data connection.

root@swi-mdm9x28-wp:~#
root@swi-mdm9x28-wp:~# cm info
Device: WP7607
IMEI: 359779081234565
IMEISV: 6
FSN: VN730485080103
Firmware Version: SWI9X07Y_02.28.03.03 000000 jenkins 2019/05/21 03:33:04
Bootloader Version: SWI9X07Y_02.28.03.03 000000 jenkins 2019/05/21 03:33:04
MCU Version: 002.011
PRI Part Number (PN): 9908958
PRI Revision: 001.000
Carrier PRI Name: GENERIC
Carrier PRI Revision: 002.068_000
SKU: 1104301
Last Reset Cause: Crash
Resets Count: Expected: 195 Unexpected: 16

Hi jyijyi

Thank you for your response. I was already looking at the mdc interface as an alternative, but what I’m trying to establish is whether DataConnectionService is buggy, and I’ve already concluded that yes, it is. I’m finding other issues with Legato framework services but can find no evidence of them on the forum, so I’m trying to establish whether this is just a problem with what I’m doing or whether others are experiencing the same issues.

I read in another post that Dataservices is going to be deprecated. Does anyone know if this is really the case?

I am experiencing the same issue. The callback is called when down, but not when going back up.
For me it appears to happen when the DNS is changed.

I can manually poll the connection status using the same method that “cm data info” uses and can confirm that the data connection has gone back up.
When I try to use the data connection I get “no route to host” errors.
Looking at the routing table I can see the same issue outlined in this post.
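For anyone who wants to check for this symptom from code rather than by eye, here is a minimal sketch. It assumes the standard Linux /proc/net/route text format (one route per line, hex fields); the function name HasDefaultRoute is made up for illustration and is not part of Legato. The interface name would be whatever the data session reports, for example rmnet_data0 as in the cm data output earlier in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define ROUTE_FLAG_UP 0x0001UL  /* same value as the kernel's RTF_UP */

/* Scan text in /proc/net/route format. Returns true if ifName has an
 * active default route (destination 0, mask 0, UP flag set). On success
 * the gateway is stored in *gatewayOut exactly as the kernel prints it. */
bool HasDefaultRoute(const char *routeTable, const char *ifName,
                     unsigned long *gatewayOut)
{
    assert(routeTable != NULL && ifName != NULL);

    const char *line = routeTable;
    while (line != NULL && *line != '\0')
    {
        char iface[32];
        unsigned long dest, gateway, flags, mask;

        /* Fields: Iface Destination Gateway Flags RefCnt Use Metric Mask ...
         * The header line fails to parse fully and is skipped. */
        int n = sscanf(line, "%31s %lx %lx %lx %*u %*u %*u %lx",
                       iface, &dest, &gateway, &flags, &mask);

        if (n == 5 && strcmp(iface, ifName) == 0 &&
            dest == 0 && mask == 0 && (flags & ROUTE_FLAG_UP) != 0)
        {
            if (gatewayOut != NULL)
            {
                *gatewayOut = gateway;
            }
            return true;
        }

        /* Advance to the start of the next line, if there is one. */
        line = strchr(line, '\n');
        if (line != NULL)
        {
            line++;
        }
    }
    return false;
}
```

To use it, read /proc/net/route into a buffer and pass it in with the interface name. Note that on a little-endian target the gateway bytes appear reversed, so 10.128.167.62 shows up as 3EA7800A. If the session is reported as connected but this returns false, or the gateway differs from what cm data shows, the route was probably not refreshed.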

Hi shib.

You are 100% correct, thanks for the pointer.

So what happens is that when the network signal is restored, the network issues a new IP address (I’m not using a private APN for the testing phase just yet). DataConnectionService never updates the route and gateway, so the device routing table still holds the old IP and gateway, and therefore the data connection won’t work.

Looking at the service code, I found that the default route is set by ChannelEventHandler in dcsServer.c. ChannelEventHandler never gets called at any point during the transition to the UP state, so SetDefaultRouteAndDns never gets called once the connection is restored, which ties up with what you and I are seeing. At some point there should be an LE_DCS_EVENT_UP to the channel handler, but this never happens, so the route does not get set up properly.

In my humble opinion this is a bug in DataConnectionService. What I can’t understand is how this basic problem is still around; surely there would have been thousands of tickets.

I think that on the state transition there should be a notification to ChannelEventHandler that the interface is up, instead of a channel retry, so I added this function in dcs_db.c:

static void DcsApplyTechSystemUpEventAction
(
    le_dcs_channelDb_t *channelDb
)
{
    le_dls_Link_t *evtHdlrPtr;
    le_dcs_channelDbEventHdlr_t *channelAppEvt;
    le_dcs_channelDbEventReport_t evtReport;

    LE_INFO("FMP: DcsApplyTechSystemUpAction");

    evtHdlrPtr = le_dls_Peek(&channelDb->evtHdlrs);
    while (evtHdlrPtr)
    {
        // traverse all event handlers to trigger an event notification
        channelAppEvt = CONTAINER_OF(evtHdlrPtr, le_dcs_channelDbEventHdlr_t, hdlrLink);
        LE_DEBUG("Send Up event notice for channel %s to app with session reference %p",
                 channelDb->channelName, dcs_GetSessionRef(channelAppEvt->appSessionRefKey));
        evtReport.channelDb = channelDb;
        evtReport.event = LE_DCS_EVENT_UP;
        le_event_Report(channelAppEvt->channelEventId, &evtReport, sizeof(evtReport));
        evtHdlrPtr = le_dls_PeekNext(&channelDb->evtHdlrs, evtHdlrPtr);
    }
}

and modified dcs_EventNotifierTechStateTransition to call this function instead of dcsTech_RetryChannel.

LE_INFO("Notify all channels of technology %d of system state transition to %s",
        tech, techState ? "up" : "down");
if (!techState)
{
    func = &DcsApplyTechSystemDownEventAction;
}
else
{
    LE_INFO("FMP: Added DcsApplyTechSystemUpEventAction, removed Channel Retry");
    func = &DcsApplyTechSystemUpEventAction;
    //func = &dcsTech_RetryChannel;
}

This corrects the particular problem I’m facing, but I’m not sure what other issues it will introduce, as I don’t know whether the channel retry belongs here. Hence, with my two weeks of super knowledge, I don’t recommend this to anyone. It would be nice if someone from Legato Development could comment, particularly if I’ve lost the plot.

Bump.

Would be nice if someone from Legato Development can comment

here says we can use le_data_Request() instead.

I am already using le_data_Request().
What other ways are there to establish a data connection?

My understanding was le_data was the easy way, but if you wanted to do things manually you can use le_mdc.

How about you do data offline sequence when you receive the disconnect callback?

This sounds like a good approach.

I haven’t tried it yet but will update once I do.

Hi,

What is the “data offline sequence”? @shib @jyijyi

We seem to have the same issue on a wp7611 with ATT modem FW.
Legato version: 19.11.2_ae979affcbb00c5b919be71fc9cb1b98
Firmware Version: SWI9X07Y_02.37.00.00 6c0fe9 jenkins 2020/01/17 01:29:47

For the record, exactly the same code seems to work on our wp7702 with PTCRB and GCF modem FW.

It means

app runProc modemDemo1 send --exe=send -- 1234567 "Offline"

Hi,

Nice!

And if we wanted to incorporate it in an application that has previously run le_data_Request() and also monitors the connection with a callback, how would we do it?

When the callback for the disconnected state comes, we call some API, I hope.

I guess we could run
system("app runProc modemDemo1 send --exe=send -- 1234567 \"Offline\"");

But hopefully there is a correct C-API sequence for this?

As we have run le_data_Request(), maybe we should use
le_data_Release()?
When? Directly in the disconnect callback? Or wait a while? On another modem we don’t see this problem.

Previous indications have said that le_data_Request() should take care of everything (including reconnects). But could this, which seems like a bug, prevent it in this case?

Grateful for any help, @jyijyi
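To make the question concrete, here is a minimal sketch of one possible release-then-request sequence, written as a small decision helper so it can be tried off-target. Everything named Recovery* is hypothetical and not part of Legato, and nothing in this thread confirms that this is the recommended sequence. The only real API calls it assumes are le_data_Release() and le_data_Request(), which the application would make when the helper says so; the hold-off timer is also left to the application.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What the application should do next (hypothetical names). */
typedef enum
{
    RECOVERY_NONE,           /* nothing to do */
    RECOVERY_START_HOLDOFF,  /* start the hold-off timer */
    RECOVERY_CYCLE_REQUEST   /* release the old request, make a new one,
                                then restart the hold-off timer */
} RecoveryAction;

typedef struct
{
    bool waitingForUp;  /* a disconnect was seen, no connect since */
} RecoveryState;

/* Call from the connection state handler. */
RecoveryAction Recovery_OnConnectionState(RecoveryState *s, bool isConnected)
{
    assert(s != NULL);
    if (isConnected)
    {
        s->waitingForUp = false;  /* link recovered, cancel any pending cycle */
        return RECOVERY_NONE;
    }
    if (s->waitingForUp)
    {
        return RECOVERY_NONE;     /* hold-off already running */
    }
    s->waitingForUp = true;
    return RECOVERY_START_HOLDOFF;
}

/* Call when the hold-off timer expires. */
RecoveryAction Recovery_OnHoldoffExpired(RecoveryState *s)
{
    assert(s != NULL);
    if (!s->waitingForUp)
    {
        return RECOVERY_NONE;     /* the connected callback arrived in time */
    }
    /* Still down. In a Legato app this is where
     *   le_data_Release(ref); ref = le_data_Request();
     * would go. The state stays "waiting" so the cycle repeats on the
     * next expiry until a connected callback arrives. */
    return RECOVERY_CYCLE_REQUEST;
}
```

The idea is to give the service a chance to reconnect by itself first, and only cycle the request if no connected callback has arrived by the time the hold-off expires. How long the hold-off should be is exactly the open timing question. Also note that the release itself may trigger another disconnected callback; the helper ignores it while it is already waiting.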

you can see the offline code in the sample :

Hi,

Thanks for the help @jyijyi !

With regard to issues like the above, we suspect that we may introduce new timing problems if we call le_data_Release directly on the data disconnected event and then call le_data_Request very soon after. Do you know of any (timing) guidelines for le_data_Request and le_data_Release?

And @fpereiraEWC, how did it go for you? Are you running happily with your added code, and does it seem to work fine? Or did you add le_data_Release()?
BR

You might need to do a stress test on the online and offline sequence and see if there is a timing requirement.
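One way to read the outcome of such a stress test, as a rough sketch: run a batch of offline/online cycles at each of several delays, count the cycles where the connected callback never arrived, and then pick the shortest delay from which every longer delay was also clean. The struct and function names below are made up for illustration, and the counts would have to come from runs on the actual target.

```c
#include <assert.h>
#include <stddef.h>

/* Outcome of one batch of offline/online cycles at a fixed delay. */
typedef struct
{
    unsigned delayMs;   /* wait between going offline and online again */
    unsigned cycles;    /* number of cycles run at this delay */
    unsigned failures;  /* cycles where the connection did not come back */
} SweepResult;

/* results must be sorted by ascending delayMs.
 * Returns the index of the smallest delay such that it and every longer
 * delay had zero failures, or -1 if the longest delay already failed
 * (or there are no usable results). */
int SmallestReliableDelay(const SweepResult *results, size_t count)
{
    assert(results != NULL || count == 0);

    int best = -1;
    /* Walk from the longest delay towards the shortest and stop at the
     * first batch that was empty or had any failure. */
    for (size_t i = count; i > 0; i--)
    {
        const SweepResult *r = &results[i - 1];
        if (r->cycles == 0 || r->failures != 0)
        {
            break;
        }
        best = (int)(i - 1);
    }
    return best;
}
```

Requiring all longer delays to be clean as well avoids trusting a short delay that happened to pass by luck. A result of -1 would suggest the problem is not purely a matter of timing.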

Hi again,
@fpereiraEWC did you find any drawbacks with the proposed code? Are you still running it?
We will test it now.

Best regards,
Hans

We found too many issues with Legato, so did not move into production with this product. The basic testing we did with this particular problem and our solution did not reveal any issues, but like I said, we never went into production.

Hi,

Thank you for your answer @fpereiraEWC! I can fully understand your decision.

Hey,

Is there a proper workaround for this problem yet?
This is obviously a bug in the DataConnectionService…