Some customers have reported that joining becomes less reliable in their larger network deployments, especially as the network grows out to several hops from the coordinator / Trust Center [TC]. The common thread in these cases is that the failed joins (where the Trust Center Join Handler is not called and no key is passed to the joining device) occur when the “joinee” (the node accepting the association request and sending the APS Update Device command) is more than one hop away from the TC (i.e., the joining node is more than two hops from the TC). This kind of problem is often found to be an address resolution issue: the node sending the Update Device command is a non-neighbor of the TC and is therefore unlikely to be tracked in any of the neighbor/child/binding/address tables used for address resolution. That in turn affects APS decryption of these APS-encrypted Update Device messages, because the unencrypted portion of each frame contains a sender node ID but no sender EUI64, whereas the decryption process used by ZigBee relies on knowledge of the EUI64.
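As a rough illustration of the failure described above, the sketch below models the lookup the TC must perform before it can decrypt an Update Device message. It is not Ember stack code; every name in it (DemoAddressEntry, demoLookupEui64, demoCanDecryptUpdateDevice) is hypothetical, and the "table" stands in for whatever combination of neighbor/child/binding/address tables the TC actually consults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_EUI64_SIZE 8

/* Hypothetical entry pairing a short node ID with its EUI64. */
typedef struct {
  uint16_t nodeId;
  uint8_t eui64[DEMO_EUI64_SIZE];
} DemoAddressEntry;

/* Resolve a node ID to an EUI64. Returns false if the sender is untracked. */
bool demoLookupEui64(const DemoAddressEntry *table, size_t count,
                     uint16_t nodeId, uint8_t *euiOut)
{
  assert(table != NULL && euiOut != NULL);
  for (size_t i = 0; i < count; i++) {
    if (table[i].nodeId == nodeId) {
      memcpy(euiOut, table[i].eui64, DEMO_EUI64_SIZE);
      return true;
    }
  }
  return false;
}

/* The frame carries only the sender's node ID, so decryption can proceed
 * only if that ID resolves to an EUI64. (Simplified model, not stack code.) */
bool demoCanDecryptUpdateDevice(const DemoAddressEntry *table, size_t count,
                                uint16_t senderNodeId)
{
  uint8_t eui[DEMO_EUI64_SIZE];
  return demoLookupEui64(table, count, senderNodeId, eui);
}
```

In this model, an Update Device message from a tracked node ID can be decrypted, while one from an untracked, multi-hop sender cannot, which corresponds to the join silently failing at the TC.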
If you believe you are encountering this kind of problem, the best way to solve it is through the use of Ember’s “security address cache” utility code. If you are using an EZSP NCP platform for your TC, this utility code is already built into it, so the resolution is a bit simpler, but a similar implementation is still possible on SoC platforms as well.
Below are the steps needed to (hopefully) resolve this and the explanation of what each step accomplishes:
•Ensure that the TC app is set up with a non-zero EMBER_ADDRESS_TABLE_SIZE. (Default is 8, but at 10 bytes each you may have reduced this already to save RAM.) On EZSP NCP platforms, this setting is controlled by EZSP_CONFIG_ADDRESS_TABLE_SIZE.
Rationale: The best way to guarantee that a node’s long and short address info is being tracked is for the app to create an entry into the Address Table (where entries are created by the app and the node ID info can be refreshed by the stack as new short ID info for that entry’s EUI64 comes to light).
•(For EZSP NCP platforms only) Ensure that the TC app is set up with a non-zero value for EZSP_CONFIG_TRUST_CENTER_ADDRESS_CACHE_SIZE. Note that (unlike the SoC model) this number of entries is not dependent on the address table entries configured by EZSP_CONFIG_ADDRESS_TABLE_SIZE. (The NCP firmware keeps these separate and makes the internal address table large enough to accommodate both sets of entries.)
Rationale: The value chosen here corresponds to the “Y” value in the description of securityAddressCacheInit() further down in this list of steps. See that step’s description for more details about how this is used at runtime.
•Ensure that the TC app is set up as a concentrator (high RAM or low RAM is OK) and uses source routing, including the EMBER_APPLICATION_HAS_SOURCE_ROUTING build symbol.
Rationale: TC needs a route to each device (at least during the (re)joining process) to be able to send the key to its joinee during the authentication process that follows association or rejoining. Discovering that route in the middle of the process takes too long and is too disruptive, so gathering route info from Route Records that arrive along the Many-to-one routes (MTORs) ensures that the route is available before it’s needed.
•(For SoC platforms only) Include the Security Address Cache utility module (from app/util/security/security-address-cache.c) in your TC’s code project. Make sure you’re not using the security-address-cache-STUB module by accident.
Rationale: This module contains code to add address table entries on the fly as new Route Records arrive. If a Route Record precedes an APS Update Device message from a previously untracked sender, that sender’s address info is cached to facilitate proper resolution of the EUI64 for decryption of (and response to) the APS Update Device message during the TC’s post-join authentication process. To avoid contention with existing address table entries that the application may be using for its own purposes, it uses a subset of address table entries as a “cache” that can be overwritten as new Route Records arrive.
•(For SoC platforms only) Add a call to securityAddressCacheInit(X, Y) in your TC app’s initialization routines. X is the index into the Address Table [AT] where you want the security address caching mechanism to start adding its entries, and Y is the number of entries to dedicate to this cache. Make sure to increase EMBER_ADDRESS_TABLE_SIZE to make room for the cache in addition to any other AT entries your app might require for its own use. Y should be as large as the maximum number of simultaneous post-join/rejoin authentications you expect the TC to be handling at any point in time. You can get away with 1, but we usually recommend 2 or 3 to be safe, since any Route Record to the TC will trigger this code and cause a new AT entry to be created or an old one replaced, and we can’t predict which Route Records will precede an Update Device message.
Rationale: You need to tell the security address cache module which portion of your AT it can use for the AT entries it’s creating. At the stack level, these are just like any other entries in your AT, but having the utility code add them to the table during the Route Record process ensures they are available when the stack needs them for the subsequent unicasts.
•(For SoC platforms only) Add a call to securityAddToAddressCache(source, sourceEui) in your TC’s implementation of emberIncomingRouteRecordHandler(). (If you used our sample implementation in app/util/source-route.c, it’s already doing this.)
Rationale: This code creates a new entry in the “security address cache” subset of the AT (or overwrites the oldest one in the cache if cache is already full) to track the identity (node ID and EUI64) of the device sending you the Route Record so that you have long/short address resolution for this device in future communications (so long as the AT entry is still in the table, regardless of whether it is part of the subset added by the security address cache mechanism).
…If you follow the above steps (see our Sink sample application in the app/sensor directory for an example of a coordinator/TC app design that does all of this), you should have the security address cache mechanism working properly, and this should ideally resolve the issue described in this FAQ.
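To make the X/Y arrangement from the steps above more concrete, here is a minimal, self-contained simulation of a cache region carved out of an address table. It is not the code in security-address-cache.c. All names (demoCacheInit, demoCacheAdd, and so on) and the sizes chosen are hypothetical, and both the round-robin replacement and the "refresh an existing entry instead of adding a duplicate" behavior are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_EUI64_SIZE         8
#define DEMO_APP_ENTRIES        4  /* hypothetical: entries the app manages itself */
#define DEMO_CACHE_ENTRIES      3  /* "Y": entries reserved for the cache */
#define DEMO_ADDRESS_TABLE_SIZE (DEMO_APP_ENTRIES + DEMO_CACHE_ENTRIES)
#define DEMO_UNUSED_NODE_ID     0xFFFF

typedef struct {
  uint16_t nodeId;
  uint8_t eui64[DEMO_EUI64_SIZE];
} DemoAddressEntry;

DemoAddressEntry demoAddressTable[DEMO_ADDRESS_TABLE_SIZE];
uint8_t demoCacheStart; /* "X": first table index owned by the cache */
uint8_t demoCacheSize;  /* "Y": number of slots owned by the cache */
uint8_t demoCacheNext;  /* offset of the next cache slot to overwrite */

/* Mark every table entry as unused (call once at startup). */
void demoTableReset(void)
{
  for (int i = 0; i < DEMO_ADDRESS_TABLE_SIZE; i++) {
    demoAddressTable[i].nodeId = DEMO_UNUSED_NODE_ID;
  }
}

/* Reserve table indices [startIndex, startIndex + size) for the cache. */
void demoCacheInit(uint8_t startIndex, uint8_t size)
{
  assert(size > 0 && startIndex + size <= DEMO_ADDRESS_TABLE_SIZE);
  demoCacheStart = startIndex;
  demoCacheSize = size;
  demoCacheNext = 0;
  for (int i = startIndex; i < startIndex + size; i++) {
    demoAddressTable[i].nodeId = DEMO_UNUSED_NODE_ID;
  }
}

/* Return the index of the entry holding this EUI64, or -1 if untracked. */
int demoFindByEui64(const uint8_t *eui64)
{
  for (int i = 0; i < DEMO_ADDRESS_TABLE_SIZE; i++) {
    if (demoAddressTable[i].nodeId != DEMO_UNUSED_NODE_ID
        && memcmp(demoAddressTable[i].eui64, eui64, DEMO_EUI64_SIZE) == 0) {
      return i;
    }
  }
  return -1;
}

/* Record the sender of a Route Record. A known EUI64 only has its node ID
 * refreshed; a new one takes the next cache slot, replacing the oldest. */
int demoCacheAdd(uint16_t nodeId, const uint8_t *eui64)
{
  int index = demoFindByEui64(eui64);
  if (index < 0) {
    index = demoCacheStart + demoCacheNext;
    demoCacheNext = (uint8_t)((demoCacheNext + 1) % demoCacheSize);
    memcpy(demoAddressTable[index].eui64, eui64, DEMO_EUI64_SIZE);
  }
  demoAddressTable[index].nodeId = nodeId;
  return index;
}
```

With a start index of 4 and a size of 3, four successive new senders land in slots 4, 5, 6, and then 4 again, while the application's own entries in slots 0 through 3 are never touched. In this sketch the table holds 4 + 3 = 7 entries; at the 10 bytes per entry mentioned above, that would be 70 bytes of RAM.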
•The reason many customers don’t notice this behavior until later in their testing/deployments is that often they weren’t testing networks where joins could occur at 3 hops from the TC (or they had enough potential paths to the TC that a node that failed for this reason once or twice might get lucky and join a closer parent thereafter).
•In this instance and in general, it’s often a good idea to check out our Sensor/Sink application example (or our AppFramework V2 utility code in 4.2 and later releases) for an illustration of best practices where application design is concerned. The Sink app uses the address table to track devices to which it is partnered, and it also uses the security address cache as a failsafe for address resolution during APS security processes (like post-join/rejoin authentication).
•Remember that the AT (and the security address cache contained within) is stored in RAM, so any entries created by the caching mechanism will be wiped out when you reboot. This is mostly a concern if you use High RAM concentrator mode, where Route Records are only sent until the first point (after an MTORR) at which the concentrator demonstrates knowledge of its source route to the target node (by source-routing a unicast or ACK to it). In that case, you’d want your TC (if it’s a High RAM concentrator) to send an MTORR upon rebooting so that all devices send a fresh Route Record to the TC before unicasting to it.
•Tracking nodes in the Binding Table (which is non-volatile, except for the current node ID information) is also an acceptable way of ensuring address resolution for non-neighbor, non-child devices (and might be useful if you already were going to create binding table entries for most nodes anyway and don’t want to use extra RAM to remember them in the AT also). However, if you wanted the security address cache module to use the Binding Table instead of the AT, you’d need to change Ember’s address caching code to achieve that.
•As alluded to above, direct children and neighbors of the TC get a “free pass” on the address resolution because the TC has to track their long/short addresses anyway for the NWK Layer security that happens over each hop of ZigBee communication. However, our security address cache code doesn’t check for these exceptions since the node’s status as neighbor of the TC could potentially change without warning. (Neighbor relationships are dynamic, unlike child/parent relationships that only change with join/rejoin/leave activity.)
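The reboot caveat in the notes above (for High RAM concentrators) can be sketched as a tiny state model. This is a deliberate simplification with one TC and one remote node, not stack behavior reproduced in detail; the DemoLink structure and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state for one TC/node pair. */
typedef struct {
  bool nodeSendsRouteRecord; /* node still precedes unicasts with a Route Record */
  bool tcHasNodeAddress;     /* TC's RAM-based cache holds the node's address pair */
} DemoLink;

/* An MTORR makes the node resume sending Route Records. */
void demoTcSendMtorr(DemoLink *link)
{
  assert(link != NULL);
  link->nodeSendsRouteRecord = true;
}

/* Node unicasts to the TC. Returns true if the TC can resolve the sender. */
bool demoNodeUnicastToTc(DemoLink *link)
{
  assert(link != NULL);
  if (link->nodeSendsRouteRecord) {
    link->tcHasNodeAddress = true; /* the Route Record populates the cache */
  }
  return link->tcHasNodeAddress;
}

/* TC source-routes a unicast or ACK; the node then stops sending Route Records. */
void demoTcSourceRouteToNode(DemoLink *link)
{
  assert(link != NULL);
  if (link->tcHasNodeAddress) {
    link->nodeSendsRouteRecord = false;
  }
}

/* A reboot wipes the RAM cache; optionally send an MTORR right afterwards. */
void demoTcReboot(DemoLink *link, bool sendMtorrOnBoot)
{
  assert(link != NULL);
  link->tcHasNodeAddress = false;
  if (sendMtorrOnBoot) {
    demoTcSendMtorr(link);
  }
}
```

In this model, rebooting without an MTORR leaves the node silent about its route while the TC has forgotten its address, so the next unicast cannot be resolved. Rebooting with an MTORR restores the Route Record and, with it, the cache entry.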
…Hopefully, the above information helps you clarify what’s going on, how to resolve it, and why the resolution is necessary.
If you still see problems with multi-hop joining after following this advice, please contact Ember Support.