MCP2515 CAN Data loss

Hi there ,
I am using MCP2515 based CAN IOT card . It was observed that there is a data loss if I try to send 10 DATA frames with different ID’s with a delay of 10ms . It was observed that out of 10 only 6-7 Data are received where as if i try to send only 4 Data Frames with 10ms of delay i don’t see any Data loss .

Note : Peak CAN is the master device which will send the CAN data and WP7608(MCP2515) is the slave device which will receive the CAN data . (CAN dump is used to receive CAN data )

Basically the SPI driver is not reading the messages from mcp2515 chip fast enough and it was observed that IRQ thread irq/259-mcp251x priority is 50 which is lower (That means the IRQ-Handler can be served or might not be served at the exact time of interrupt) . Is there a way to overcome this issue ? Is it possible to change the rtprio of the irq/259-mcp251x ?

Thanks & Regards

Akshay
1 Like

How about setting the task priority of the driver to a value larger than 50?
e.g.


struct sched_param PARAM = {.sched_priority = MAX_RT_PRIO };

PARAM.sched_priority = 51;

sched_setscheduler(task, SCHED_FIFO, &PARAM);


Another way is to use local_irq_disable() and local_irq_enable() to stop the interrupt from other priority when driver trying to read the data.

Hi @jyijyi ,
I tried increasing the priority for spi to 51 but it doesn’t seems to be helpful . Can you please let me know any other ways to overcome the data Loss ?

That means there is other higher priority thread is running.

how about using local_irq_disable() and local_irq_enable() to stop the interrupt from other priority when driver trying to read the data???

Hi @jyijyi ,
I will try local_irq_disable and enable and will update you . Onto the priority change
I have changed the priority in spi.c as shown below

struct sched_param param = { .sched_priority = 51 };

but when observed in the device it was still showing 50

ps -eo pid,pri,rtprio,cmd | grep mcp
9377 90 50 [irq/131-mcp251x]
9378 39 - [mcp251x_wq]
10167 19 - grep mcp

Am i going in the right direction ?

Seems you did not change the task priority…

Attached “priority_test.rar” is my testing code.
In low_priority_helloworld.c, I have set the task priority to be 51.
In high_priority_helloworld.c, I have set the task priority to be 50 which is having lower priority than the other one.

If you run the two kernel drivers together, you can check by:


root@fx30:~# ps -eo pid,pri,rtprio,cmd | grep LED
4755 91 51 [LED_flash_threa]
4786 90 50 [LED_flash_threa]
5613 19 - grep LED


I can set the task priority!

Moreover, the high_priority_helloworld printing (dmesg | grep priority) will never be interrupted by another one. It is because I have added the code “local_irq_disable” and “local_irq_enable”.
If I comment out the code “local_irq_disable” and “local_irq_enable” in high_priority_helloworld.c, the high_priority_helloworld printing (dmesg | grep priority) will be interrupted by another one.

I think this can show “local_irq_disable” and “local_irq_enable” are actually working fine to block another task with high priority to interrupt own task.

priority_test.rar (169.7 KB)

Hi @jyijyi ,
Can you give a try by changing the priority for spi_irq ? I am still facing issue in changing it .

Regards

  • Ak

have you tried my priority_test.rar?
which spi.c are you modifying ?
Have you tried “local_irq_disable” and “local_irq_enable” as this can block all the interrupt even others has higher priority?

Hi @jyijyi ,
I am modifying yocto/kernel/drivers/spi/spi.c . I have tried local_irq_disable and local_irq_enable but no luck .

Regards

  • Ak

what do you mean by no luck?
It is interrupted by other IRQ?
which IRQ is interrupting?

Hi @jyijyi ,
No luck in the sense I am still missing the CAN Data .

how can I reproduce the issue on mangoh board?
i.e. to have the IRQ:
9377 90 50 [irq/131-mcp251x]

You can try sending CAN data from another Device .
In my case I am using a PCAN (Peak System) from which i am trying to send around 3-4 CAN frames at 10 ms .

In your logs you can see that the priority is 50 ( 9377 90 50 [irq/131-mcp251x]) . We need to figure out a way to increase it to 51 .

Regards
Ak

Can you give the full procedure?
After you modify yocto/kernel/drivers/spi/spi.c, how do you start the irq/131-mcp251x?
Here I have tried the mangoh source, but I cannot see the SPI IRQ as yours, do you know why?

root@swi-mdm9x28-wp:~# ./start_can.sh red 0
Unload of module can_iot.ko has been successful.
sh: write error: Device or resource busy
sh: write error: Device or resource busy
Load of module can_iot.ko has been successful.
Cannot find device “can0”
ifconfig: SIOCGIFFLAGS: No such device
root@swi-mdm9x28-wp:~#
root@swi-mdm9x28-wp:~# dmesg | grep mcp
[ 34.212888] can_iot_init: mcp2515 (gpio:79 irq:131).
[ 50.480982] can_iot_init: mcp2515 (gpio:79 irq:131).
root@swi-mdm9x28-wp:~#
root@swi-mdm9x28-wp:~# ^C
root@swi-mdm9x28-wp:~# ps -eo pid,pri,rtprio,cmd | grep irq
3 19 - [ksoftirqd/0]
20 90 50 [irq/47-cpr]
133 90 50 [irq/31-tsens_in]
140 90 50 [irq/43-7864900.]
141 90 50 [irq/42-mmc0]
148 90 50 [irq/205-modem]
1768 19 - grep irq
root@swi-mdm9x28-wp:~#
root@swi-mdm9x28-wp:~# lsmod
Tainted: G
can_iot 1308 0 - Live 0xbf012000 (O)
wl18xx 92042 0 - Live 0xbf139000
wlcore_sdio 5584 0 - Live 0xbf134000
wlcore 202433 1 wl18xx, Live 0xbf0f3000
mac80211 286130 2 wl18xx,wlcore, Live 0xbf09b000
cfg80211 258877 3 wl18xx,wlcore,mac80211, Live 0xbf04a000
mt7697serial 8102 0 - Live 0xbf00d000 (O)
mt7697q 19687 0 - Live 0xbf040000 (O)
mcp251x 9652 0 - Live 0xbf023000 (O)
mangoh_red 9598 0 - Live 0xbf039000 (O)
ltc294x 5575 0 - Live 0xbf034000 (O)
led 2100 0 - Live 0xbf030000 (O)
cp2130 20350 1 mt7697q, Live 0xbf027000 (O)
bq27xxx_battery 25279 0 - Live 0xbf01b000 (O)
bq24296 10844 0 - Live 0xbf014000 (O)
bmp280_i2c 2607 0 - Live 0xbf010000 (O)
bmp280 9509 1 bmp280_i2c, Live 0xbf009000 (O)
bmi160_i2c 1851 0 - Live 0xbf005000 (O)
bmi160 5432 1 bmi160_i2c, Live 0xbf000000 (O)

Hi @jyijyi ,
We can enable canutils in yocto and use candump to receive the CAN frames
below is the sample procedure .

root@swi-mdm9x28-wp:~# ./start_can.sh
$CAN driver is loading start and can0 need to up
echo: write error: Device or resource busy
$CAN driver is load and can0 UP
root@swi-mdm9x28-wp:~#
root@swi-mdm9x28-wp:~# lsmod | grep mcp
mcp251x 8827 0 - Live 0xbf01c000 (O)
root@swi-mdm9x28-wp:~# lsmod | grep can
can_iot 922 0 - Live 0xbf000000 (O)
root@swi-mdm9x28-wp:~#
root@swi-mdm9x28-wp:~# ifconfig can0
can0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
UP RUNNING NOARP MTU:16 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:10
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

**root@swi-mdm9x28-wp:~# candump can0 **
interface = can0, family = 29, type = 3, proto = 1
**<0x18f00801> [8] 88 88 88 88 88 88 88 88 **
**<0x18f00801> [8] 88 88 88 88 88 88 88 88 **
**<0x18f00801> [8] 88 88 88 88 88 88 88 88 **
**<0x18f00801> [8] 88 88 88 88 88 88 88 88 **
<0x18f00801> [8] 88 88 88 88 88 88 88 88

In the above sample I am sending 1 frame every 10ms
0x18f00801 => Frame ID
88 88 88 88 88 88 88 88 => CAN Data

irq/131-mcp251x is IRQ-thread

Are you using mangoh red board with CAN IOT card?
Here I cannot see can0:

root@swi-mdm9x28-wp:~# ./start_can.sh red 0
Unload of module can_iot.ko has been successful.
sh: write error: Device or resource busy
sh: write error: Device or resource busy
Load of module can_iot.ko has been successful.
Cannot find device “can0”
ifconfig: SIOCGIFFLAGS: No such device
root@swi-mdm9x28-wp:~#
root@swi-mdm9x28-wp:~# lsmod | grep mcp
mcp251x 9652 0 - Live 0xbf023000 (O)
root@swi-mdm9x28-wp:~#
root@swi-mdm9x28-wp:~# lsmod | grep can
can_iot 1308 0 - Live 0xbf037000 (O)
root@swi-mdm9x28-wp:~# ifconfig can0
ifconfig: can0: error fetching interface information: Device not found

Do you know why?

No , I am using custom board . But the POC was done on MangoH Red Board itself .
I think you need to check the start_can.sh .

@jyijyi ,
You can have look at this blog where in similar issue is discussed .
==> NVIDIA Jetson TK1 – SPI to CAN Interface – mcp251x / mcp25625

Regards

  • Ak

I think you should add local_irq_disable and local_irq_enable in this file ( mcp251x_can_ist) but not the spi.c

Hi @jyijyi ,
local_irq_disable & local_irq_enable was tried in mcp251x.c itself and not in spi driver . SPI driver was referred just to check if priority has to be increased from there
Below is the code snippet

**local_irq_disable();** // Added by Akshay to disable other irqs	
mutex_lock(&priv->mcp_lock);
while (!priv->force_quit) {
	enum can_state new_state;
	u8 intf, eflag;
	u8 clear_intf = 0;
	int can_id = 0, data1 = 0;

	mcp251x_read_2regs(spi, CANINTF, &intf, &eflag);

	// mask out flags we don't care about 
	intf &= CANINTF_RX | CANINTF_TX | CANINTF_ERR;

	// receive buffer 0 
	if (intf & CANINTF_RX0IF) {
			js= jiffies;
		printk("ITBLR2:INSIDE-IRQ-THREAD-JEFS=%lu\n",js);
		mcp251x_hw_rx(spi, 0);
		
		// Free one buffer ASAP
		// (The MCP2515 does this automatically.)
		 
		if (mcp251x_is_2510(spi))
			mcp251x_write_bits(spi, CANINTF, CANINTF_RX0IF, 0x00);
	}

	// receive buffer 1 
	if (intf & CANINTF_RX1IF) {
		mcp251x_hw_rx(spi, 1);
		// the MCP2515 does this automatically 
		if (mcp251x_is_2510(spi))
			clear_intf |= CANINTF_RX1IF;
	}

	// any error or tx interrupt we need to clear? 
	if (intf & (CANINTF_ERR | CANINTF_TX))
		clear_intf |= intf & (CANINTF_ERR | CANINTF_TX);
	if (clear_intf)
		mcp251x_write_bits(spi, CANINTF, clear_intf, 0x00);

	if (eflag)
		mcp251x_write_bits(spi, EFLG, eflag, 0x00);

	// Update can state 
	if (eflag & EFLG_TXBO) {
		new_state = CAN_STATE_BUS_OFF;
		can_id |= CAN_ERR_BUSOFF;
	} else if (eflag & EFLG_TXEP) {
		new_state = CAN_STATE_ERROR_PASSIVE;
		can_id |= CAN_ERR_CRTL;
		data1 |= CAN_ERR_CRTL_TX_PASSIVE;
	} else if (eflag & EFLG_RXEP) {
		new_state = CAN_STATE_ERROR_PASSIVE;
		can_id |= CAN_ERR_CRTL;
		data1 |= CAN_ERR_CRTL_RX_PASSIVE;
	} else if (eflag & EFLG_TXWAR) {
		new_state = CAN_STATE_ERROR_WARNING;
		can_id |= CAN_ERR_CRTL;
		data1 |= CAN_ERR_CRTL_TX_WARNING;
	} else if (eflag & EFLG_RXWAR) {
		new_state = CAN_STATE_ERROR_WARNING;
		can_id |= CAN_ERR_CRTL;
		data1 |= CAN_ERR_CRTL_RX_WARNING;
	} else {
		new_state = CAN_STATE_ERROR_ACTIVE;
	}

	// Update can state statistics 
	switch (priv->can.state) {
	case CAN_STATE_ERROR_ACTIVE:
		if (new_state >= CAN_STATE_ERROR_WARNING &&
		    new_state <= CAN_STATE_BUS_OFF)
			priv->can.can_stats.error_warning++;
	case CAN_STATE_ERROR_WARNING:	// fallthrough 
		if (new_state >= CAN_STATE_ERROR_PASSIVE &&
		    new_state <= CAN_STATE_BUS_OFF)
			priv->can.can_stats.error_passive++;
		break;
	default:
		break;
	}
	priv->can.state = new_state;

	if (intf & CANINTF_ERRIF) {
		// Handle overflow counters
		if (eflag & (EFLG_RX0OVR | EFLG_RX1OVR)) {
			if (eflag & EFLG_RX0OVR) {
				net->stats.rx_over_errors++;
				net->stats.rx_errors++;
			}
			if (eflag & EFLG_RX1OVR) {
				net->stats.rx_over_errors++;
				net->stats.rx_errors++;
			}
			can_id |= CAN_ERR_CRTL;
			data1 |= CAN_ERR_CRTL_RX_OVERFLOW;
		}
		mcp251x_error_skb(net, can_id, data1);
	}

	if (priv->can.state == CAN_STATE_BUS_OFF) {
		if (priv->can.restart_ms == 0) {
			priv->force_quit = 1;
			can_bus_off(net);
			mcp251x_hw_sleep(spi);
			break;
		}
	}

	if (intf == 0)
		break;

	if (intf & CANINTF_TX) {
		net->stats.tx_packets++;
		net->stats.tx_bytes += priv->tx_len - 1;
		can_led_event(net, CAN_LED_EVENT_TX);
		if (priv->tx_len) {
			can_get_echo_skb(net, 0);
			priv->tx_len = 0;
		}
		netif_wake_queue(net);
	}

}
mutex_unlock(&priv->mcp_lock);
**local_irq_enable();** // Added by Akshay to enable other irqs