Technology | Why Is the Single-Module CAN Architecture Obsolete? The Advantages of Dual-Modular Redundancy in One Article

The DMR Function and Its Application in Redundant CAN Networks

Introduction

In 2004, CiA (CAN in Automation) document 307, "Framework for maritime electronics" [1], first proposed a standard for using two CAN bus cables in mission-critical applications. However, that standard applied only to classic CANopen and has since been withdrawn. Building on the original concept of CiA 307, a new standard emerged: DMR (Dual-Modular Redundancy). DMR is suitable for both CAN and CAN FD networks, and for CAN devices and networks that require safety-critical, mission-critical, or high-availability communication. It is designed to be independent of any specific application layer (e.g. CANopen, CANopen FD, J1939).

Overview

As described in [2], a High Availability (HA) integrated system should have sufficient redundancy and isolation to prevent a single failure from causing the loss of an essential function or of multiple major functions. In addition, any network that integrates control or monitoring systems should tolerate a single point of failure, so that one failure does not affect overall operation. This requires the network, its necessary components, and its cables to be fully redundant.

Therefore, CAN networks for HA applications (e.g. avionics, maritime, etc.) must have at least two independent CAN interfaces, each driving physically separate CAN lines.

A single fault in a redundant CAN network is defined as follows:

  • A single interruption of a CAN bus line
  • A single failure of a CAN transceiver
  • A single failure of a CAN controller (including the bus-off state)

(Internal application failures, such as heartbeat events in CANopen applications, are not considered CAN network failures.)

Based on these requirements, a CAN device that supports redundant communication should have two CAN nodes (see Figure 1). Each CAN node contains a Data Link Layer (DLL), a Physical Coding Sublayer (PCS), and a Physical Medium Attachment (PMA) sublayer. Note that the two CAN nodes share certain configuration settings (e.g. the bit rate).

Figure 1: CAN interface with DMR function

The purpose of the DMR function is to replicate the packets that the application sends across both CAN nodes on the transmit side, and to de-duplicate the packets received by the two CAN nodes of the same CAN interface. This means that the communication on both CAN buses must be identical.

CAN Line Naming

The CAN interface connects to two cables. One cable links the first CAN node of every device and is called the default CAN line (DCL); the other links the second CAN node of every device and is called the redundant CAN line (RCL). These names are taken from the original CiA 307 standard. The correct wiring must be ensured during system installation.

Figure 2: Messages in a redundant CAN network

On reception, the DMR function forwards to the application only the CAN packets from one selected line, called the active CAN line (ACL). By default, the default CAN line (DCL) is the ACL during system initialization, and the other line is the passive CAN line (PCL).

Figure 2 shows an example of a redundant CAN network in which CAN devices 1 through 3 generate traffic. Because of execution-time variations within the DMR function or frame retransmissions on the network, CAN messages are not guaranteed to appear in the same order on both CAN lines. Handling this difference in ordering is the task of the DMR function.
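The transmit replication and ACL-only forwarding described in this section can be illustrated with a minimal Python sketch. It is a toy model, not the DMR implementation: the `DmrInterface` class, the `Frame` tuple, and the `send_dcl`/`send_rcl` driver hooks are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Frame = Tuple[int, bytes]  # (CAN-ID, payload); hypothetical minimal frame type


@dataclass
class DmrInterface:
    """Toy model of a CAN interface with two lines (assumed structure)."""
    send_dcl: Callable[[Frame], None]  # hypothetical driver hook, default line
    send_rcl: Callable[[Frame], None]  # hypothetical driver hook, redundant line
    active_line: str = "DCL"           # the DCL is the ACL after initialization
    delivered: List[Frame] = field(default_factory=list)

    def transmit(self, frame: Frame) -> None:
        # Replicate: the same frame goes out on both CAN lines.
        self.send_dcl(frame)
        self.send_rcl(frame)

    def on_receive(self, line: str, frame: Frame) -> None:
        # De-duplicate: only frames from the active line reach the application.
        if line == self.active_line:
            self.delivered.append(frame)


dcl_bus: List[Frame] = []
rcl_bus: List[Frame] = []
dmr = DmrInterface(send_dcl=dcl_bus.append, send_rcl=rcl_bus.append)
dmr.transmit((0x123, b"\x01"))           # appears once on each line
dmr.on_receive("DCL", (0x200, b"\xAA"))  # forwarded to the application
dmr.on_receive("RCL", (0x200, b"\xAA"))  # duplicate from the PCL, dropped
```

Because forwarding depends only on which line is currently the ACL, this sketch does not need the two lines to deliver frames in the same order.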

DMR Functional Principle

As mentioned earlier, the purpose of the DMR function (see Figure 3) is to replicate outgoing packets across the two CAN nodes of the CAN interface on the transmit side, and to de-duplicate the packets received by the two CAN nodes on the receive side. The Hardware Abstraction Layer (HAL) of the CAN node uses a mailbox model, in which a mailbox usually has a single direction, either receive or transmit.

Figure 3: DMR Functions

The DMR function provides a counter and a timer for each mailbox. The counter is incremented on each successful send or receive on the DCL and decremented on each successful send or receive on the RCL. Once the mailbox timer reaches the timeout value, the counter should have returned to 0, which confirms that the number of transmissions on the DCL and RCL remains balanced.

At the same time, the DMR process checks the actual DLL status of each CAN node, e.g., bus off, etc. It also generates DMR events for deviation limits or timeout value expiration. Limits such as counter deviation and mailbox timeout are configurable.
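As a rough illustration of the per-mailbox counter and timer, the following Python sketch models the bookkeeping described above. The class name, the event strings, and the default limit values are assumptions made for this example; resetting the counter on timeout and raising the timeout event only when the counter is unbalanced are also assumptions, not behavior confirmed by the standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MailboxMonitor:
    """Per-mailbox balance counter and timer (illustrative model only)."""
    deviation_limit: int = 3   # assumed value; configurable in DMR
    timeout_s: float = 0.5     # assumed value; configurable in DMR
    counter: int = 0
    last_check: float = 0.0
    events: List[str] = field(default_factory=list)

    def on_success(self, line: str) -> None:
        # +1 for a successful transfer on the DCL, -1 for one on the RCL.
        self.counter += 1 if line == "DCL" else -1
        if abs(self.counter) > self.deviation_limit:
            self.events.append("DEVIATION_LIMIT")

    def tick(self, now: float) -> None:
        # On timer expiry, report an unbalanced counter and start over.
        if now - self.last_check >= self.timeout_s:
            if self.counter != 0:
                self.events.append("TIMEOUT")
            self.counter = 0
            self.last_check = now


monitor = MailboxMonitor(deviation_limit=2, timeout_s=1.0)
monitor.on_success("DCL")
monitor.on_success("RCL")   # balanced again: counter is back at 0
```

A healthy network keeps the counter oscillating around 0; a sustained drift in one direction is what the fault detection described in the next section looks for.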

DMR Fault Detection

DMR fault detection runs periodically during DMR operation. It verifies that both CAN nodes send or receive the same number of messages by comparing the mailbox counter against the deviation limit. The counter may be temporarily unbalanced (non-zero) at a given point in time because of CAN error frames or execution-time effects.

Figure 4: Fault Detection

If the mailbox counter is still within the deviation limits but unbalanced, the DCL and RCL mailbox timeout values are checked for expiration to determine which line is at fault.
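One possible reading of this decision logic is sketched below in Python. The function name and its parameters are hypothetical, and treating the lagging line as the faulty one when the deviation limit is exceeded is an assumption; the sign convention (positive means more traffic on the DCL) follows from the counter description above.

```python
from typing import Optional


def detect_faulty_line(counter: int, deviation_limit: int,
                       dcl_timed_out: bool, rcl_timed_out: bool) -> Optional[str]:
    """Return "DCL", "RCL", or None if no single faulty line is identified."""
    # Beyond the limit: the line that fell behind is suspected (assumption).
    # A positive counter means more traffic on the DCL, so the RCL is lagging.
    if counter > deviation_limit:
        return "RCL"
    if counter < -deviation_limit:
        return "DCL"
    # Within the limits but unbalanced: fall back to the mailbox timeouts.
    if counter != 0:
        if dcl_timed_out and not rcl_timed_out:
            return "DCL"
        if rcl_timed_out and not dcl_timed_out:
            return "RCL"
    return None
```

In this sketch a balanced counter never reports a fault, even if a timeout flag is set, because there is no evidence that the two lines carried different traffic.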

For example, Figure 5 illustrates a single interruption of the DCL at device DEV3. DEV3 switches to the RCL once the mailbox timeout expires without the message having been sent successfully on the ACL (the DCL); the other devices detect the fault from the difference in their receive counters.

Figure 5: Example of Failure Condition

At the same time, the DMR function synchronizes the FIFO buffer of the Layer 2 driver and deletes any messages still pending on the faulty line. DMR itself does not send alarm events; it is up to the application to report the fault externally.

DMR State Machine

The DMR function has a built-in state machine that is controlled through the DMR processing algorithm as shown in Figure 6.

Figure 6: DMR State Machine

In the redundancy enabled state, both DCL and RCL work normally; in the redundancy passive state, although both lines can transmit, a fault is detected on reception; in the redundancy disabled state, only a single line is available.

In the network disconnected state, both CAN lines are inoperative. The DMR state machine helps with error diagnosis and distinguishes between receive and transmit fault conditions. The following table shows the state of each device in the Figure 5 example:

Device    DMR Status
DEV1      Redundancy Passive
DEV2      Redundancy Passive
DEV3      Redundancy Disabled

Table 1: DMR states in the Figure 5 example

Each device can forward this information to a network control application via a higher-level protocol to help locate faults.
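The four states and the Figure 5 example can be summarized in a short Python sketch. The enum, the `classify` function, and its three health flags are hypothetical simplifications; the actual transitions are defined by the DMR processing algorithm in Figure 6, which this sketch does not reproduce.

```python
from enum import Enum


class DmrState(Enum):
    REDUNDANCY_ENABLED = "both lines work normally"
    REDUNDANCY_PASSIVE = "both lines transmit, fault detected on reception"
    REDUNDANCY_DISABLED = "only a single line is available"
    NETWORK_DISCONNECTED = "no line is available"


def classify(dcl_tx_ok: bool, rcl_tx_ok: bool, rx_fault: bool) -> DmrState:
    """Map line health flags to a DMR state (assumed decision order)."""
    if not dcl_tx_ok and not rcl_tx_ok:
        return DmrState.NETWORK_DISCONNECTED
    if dcl_tx_ok != rcl_tx_ok:
        return DmrState.REDUNDANCY_DISABLED
    if rx_fault:
        return DmrState.REDUNDANCY_PASSIVE
    return DmrState.REDUNDANCY_ENABLED


# Figure 5 example: the DCL is interrupted at DEV3.
dev1 = classify(dcl_tx_ok=True, rcl_tx_ok=True, rx_fault=True)
dev3 = classify(dcl_tx_ok=False, rcl_tx_ok=True, rx_fault=False)
```

Under these assumptions DEV1 ends up in the redundancy passive state and DEV3 in the redundancy disabled state, which matches Table 1.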

DMR Connection Service

The purpose of the DMR connection service is to check for miswiring (DCL connected to RCL) and to support fault recovery. For this purpose, each CAN line uses its own unique CAN identifier (see Figure 7), and these two CAN identifiers are used only by the DMR function.

Figure 7: DMR Connection Service

The CAN-IDs of the DMR connection service are configured by the local application and must not interfere with the application protocol. Using a different CAN-ID on each line makes it possible to check whether both DCL and RCL are connected. However, the telegrams carry no source address (DLC = 0), so the source of a fault is difficult to pinpoint; the DMR function therefore only reports the number of "good" and "bad" telegrams and leaves the assessment to the application layer.
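The good/bad counting can be pictured with the following Python sketch. The two identifier values are placeholders (the real ones are chosen by the local application), and the rule used here, that a telegram is "bad" when it carries the other line's identifier, is an assumption about how miswiring would show up.

```python
from dataclasses import dataclass

# Hypothetical identifiers; the real values are set by the local application.
DCL_SERVICE_ID = 0x7E0
RCL_SERVICE_ID = 0x7E1


@dataclass
class ConnectionCounters:
    good: int = 0
    bad: int = 0


def check_service_frame(line: str, can_id: int, stats: ConnectionCounters) -> None:
    """Count a received connection-service telegram as good or bad."""
    expected = DCL_SERVICE_ID if line == "DCL" else RCL_SERVICE_ID
    if can_id == expected:
        stats.good += 1   # telegram arrived on the line it belongs to
    elif can_id in (DCL_SERVICE_ID, RCL_SERVICE_ID):
        stats.bad += 1    # the other line's telegram: DCL and RCL are crossed
    # Any other CAN-ID is application traffic and is ignored here.


stats = ConnectionCounters()
check_service_frame("DCL", DCL_SERVICE_ID, stats)  # correct wiring
check_service_frame("RCL", DCL_SERVICE_ID, stats)  # crossed cables somewhere
```

Since the telegrams carry no payload, the counters say that a miswiring exists but not at which device, which is why the evaluation is left to the application layer.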

Fault Recovery

An important feature of DMR is automatic fault recovery, shown in Figure 8. During fault recovery, the system waits 2 seconds to allow the disconnection to clear, then switches the CAN line back to the running state and transmits the DMR connection service.

Figure 8: DMR Fault Recovery

Because the DMR connection service telegram is sent from the network layer with its own CAN-ID, it does not interfere with the upper-layer protocols. When the transmission succeeds, the DMR status is updated, and the device can rejoin redundant operation if the originally faulty line has been restored.
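The recovery sequence can be sketched in a few lines of Python. Only the 2-second delay comes from the text; the function name and the `set_running` and `send_connection_service` callbacks are hypothetical stand-ins for whatever the driver layer provides.

```python
import time
from typing import Callable

RECOVERY_DELAY_S = 2.0  # the delay named in the text


def try_recover(line: str,
                set_running: Callable[[str], None],
                send_connection_service: Callable[[str], bool],
                sleep: Callable[[float], None] = time.sleep) -> bool:
    """Attempt to bring a faulty line back; True means it can rejoin."""
    sleep(RECOVERY_DELAY_S)               # give the disconnection time to clear
    set_running(line)                     # switch the CAN line back to running
    return send_connection_service(line)  # True only if the telegram was sent
```

The `sleep` parameter is injectable so that the sequence can be exercised in a test without actually waiting; a caller would update the DMR status only when the function returns True.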

DMR Testing

To validate the DMR functionality, we developed a DMR test plan and used the test setup shown in Figure 9. During the test, fault conditions are applied at markers F1 through F10.

Figure 9: Schematic diagram of DMR test setup

Marker        Operating condition             Fault condition
F1 ... F8     CAN bus connected               CAN bus disconnected
F9 ... F10    Termination resistor present    CAN line shorted

Table 2: Failure conditions for test setups

During the test, the CAN bit rate was 125 kbit/s and the average bus load was 85%. The following is an example for the "RCL startup failure" test:

POWER ON DUT
@SubTest 01
VerifyNmtState($dut)
IF NMT STATE $dut != OPERATIONAL THEN ERROR

Example 1: RCL Startup Failure
      
Table 3: Acceptance Criteria for RCL Startup Failure

Based on the new CiA 701 standard, the DMR functionality has been tested and validated on various platforms and will be adapted for CANopen CC and CANopen FD in the future, with DMR parameters accessed via standardized data elements in the object dictionary.

Product Recommendation

Kvaser Memorator Pro 2xHS v2

  • Efficient Recording: Supports massive CAN data recording and automatic segmentation.
  • Intelligent Trigger: Trigger function for precise capture of critical events.
  • Easy to Operate: Automated log splitting can be set up through simple configuration.

Kvaser Memorator Pro 5xHS

  • Extra Large Capacity: For larger CAN network logging.
  • Multi-trigger: Supports multiple trigger settings to meet complex application requirements.
  • Flexible Configuration: Users can customize the segmentation conditions and time intervals according to their needs.