I've been leaving devices under test running overnight and for at least the last three nights I've come back to the office to find that the WGM110 has shut down with an 'evt_sme_wifi_is_off' event with error code 0x187, hardware failure.
Last night this happened after about 2.5 hours, according to the log timestamp. The module was not connected to an AP at the time, but was sitting in idle state after having failed a WPS scan right before I left for the day.
There was no traffic at all between the host and the WGM110 during that time. The only other thing I could think to check is the power supply, and it's stable with no more than 10 mV of ripple.
The module on this board was just replaced yesterday, and the problem appears to be the same as the last module.
What are the possible causes of the 'hardware failure' error message?
Could you please let us know if this issue also occurs on the WSTK board?
Please keep in mind that the WGM110 datasheet has two relevant sections: Power supply requirements and PCB design guidelines. Could you please check whether your board fulfills all of the design requirements described there?
My WSTK board seems to have given up the ghost. I've been using it for flashing external WGM110s but the one on the BRD4320A doesn't seem to respond to JTAG. The board is detected by Simplicity Commander but it can't flash it. If you can tell me where I can get a replacement BRD4320A I'll give that a try.
My board meets all of the requirements laid out in the datasheet, as far as I can tell. The antenna end of the module is flush with the edge of the board, the keep-out area under the antenna is kept clear, and the ground plane extends 9mm on either side of the module, which was as much as the form factor allowed. VDDPA is 4.2 volts when on external power or 3.7 volts nominal (1 cell Li-ion) otherwise, and VDDCPU is 3.3 volts.
My original question still stands: What conditions cause a 'hardware failure' error?
Bumping this again since it's been close to a year and I still don't have an answer, and I still have devices getting 'hardware failure' messages seemingly at random.
What are the possible causes of a 'hardware failure' message? And what's the recommended troubleshooting procedure when the error doesn't seem to be connected to any particular activity?
The procedure provided in the past for this issue is still valid.
If this error appears, you may perform two operations: try to turn the chip on, and if this fails, then reset the module.
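As a rough illustration of that two-step recovery, a host-side handler might look like the sketch below. It is only a sketch under stated assumptions: `wifi_on` and `reset_module` are hypothetical callables standing in for whatever the host uses to issue the BGAPI "Wi-Fi on" and reset commands, and the function name is made up for this example.

```python
from typing import Callable

# Result code reported with evt_sme_wifi_is_off in this thread.
HARDWARE_FAILURE = 0x0187


def recover_from_wifi_off(result: int,
                          wifi_on: Callable[[], bool],
                          reset_module: Callable[[], None]) -> str:
    """Apply the two-step recovery and report which step was taken.

    wifi_on and reset_module are hypothetical host-side wrappers
    (assumptions, not BGAPI names): wifi_on returns True if the
    Wi-Fi chip came back up, reset_module resets the whole module.
    """
    if result != HARDWARE_FAILURE:
        return "ignored"      # not the error discussed here
    if wifi_on():
        return "wifi_on"      # step 1 succeeded, no reset needed
    reset_module()            # step 2: fall back to a module reset
    return "reset"
```

For example, `recover_from_wifi_off(0x0187, my_wifi_on, my_reset)` would try to turn the Wi-Fi chip back on first and only reset the module if that fails. This recovers the link but does not identify the underlying cause.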
In general, error 0x187 is the hardware failure error. It means that the Wi-Fi chip cannot respond with the requested data (no matter what the requested data is).
Our module responds with hardware failure in three cases:
- when the Wi-Fi chip reports an error (the type of error is not specified)
- when the module does not respond to our requests for more than 1 minute
- when MLME-SET requests fail.
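Since the event itself does not say which of these three cases occurred, one thing a host could do is record some context each time 0x187 is reported, so that patterns (time since Wi-Fi was turned on, connection state, last command sent) can be compared across failures. The sketch below is only an illustration: the class names, fields, and the `on_wifi_is_off` method are hypothetical and are not part of BGAPI.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional

HARDWARE_FAILURE = 0x0187  # result code discussed in this thread


@dataclass
class FailureRecord:
    uptime_s: float      # seconds since Wi-Fi was last turned on
    connected: bool      # was the module associated with an AP?
    last_command: str    # last command the host sent (free-form label)


@dataclass
class FailureLog:
    # Time at which the host last turned Wi-Fi on (monotonic seconds).
    wifi_on_time: float = field(default_factory=time.monotonic)
    records: List[FailureRecord] = field(default_factory=list)

    def on_wifi_is_off(self, result: int, connected: bool,
                       last_command: str,
                       now: Optional[float] = None) -> Optional[FailureRecord]:
        """Record context for a hardware-failure event; ignore other results."""
        if result != HARDWARE_FAILURE:
            return None
        now = time.monotonic() if now is None else now
        rec = FailureRecord(now - self.wifi_on_time, connected, last_command)
        self.records.append(rec)
        return rec
```

A host would call `log.on_wifi_is_off(result, connected, last_command)` from its handler for the Wi-Fi-off event and dump `log.records` after an overnight run.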
The issue here is not that I can't reset the module, it's that my customers are getting upset that they can't maintain a connection for more than about a day. Turning it off and back on again is not troubleshooting and doesn't do anything to identify or fix the underlying problem.
From my perspective, this shows every indication of being a problem with the WGM110 itself. I'm trying to narrow down the possibilities. Is the Wi-Fi chipset made by SiLabs? Is there reference material available that might give more information on the errors it might encounter? If the Wi-Fi chipset reports more detailed error information to the WGM110's Giant Gecko processor, would it be possible to get that information passed through to BGAPI, either as additional error codes or perhaps another event with the raw error information as payload?
Would it help SiLabs take this problem more seriously if I could demonstrate it with the SLWSTK6120A kit? I can't port my application to the WSTK, but I could tap into TXD/RXD/CTS/RTS and connect those from my board to the WSTK, so that the WGM110 is running on (presumably) proven hardware. I don't have schematics for the WSTK, though, and I'm not clear how the VCOM feature is set up. Can you let me know the recommended procedure for connecting an external host MCU?
My own board is using an external antenna, and power is provided through two independent supplies, mixed through a dual diode and connected to the WGM110 with a short 24 mil trace. Power seems to be clean and stable. That would seem to leave RFI as the most likely external source of trouble, but the board is pretty quiet, with a linear regulator and very few high-speed traces, all of them on the opposite side of the host MCU from the WGM110.
Bumping this again because it's still an issue. It's a good day when a WGM110 manages to run for more than a few hours without a 'hardware failure' message.
I'll reiterate my question - would it be helpful if I connected the UART lines from one of my own boards to the WSTK board, to test if the failure still happens that way? And if so, what data collection would you like me to perform to show the problem?
Also, is it likely that this problem can be fixed? If it's the wifi chip reporting a failure, can that be fixed with a firmware update? Or are they just going to stay broken?
It's been 3 years since I started working with the WGM110 and it's still not a dependable device. Now that the WGM160 is out, I'm worried that it will never be fixed, and the blow to my company's reputation will be significant.
If your engineers would like to add some additional diagnostic output, I would be happy to let it run and capture any messages it produces when the failure occurs. Or if you'd prefer, I could prepare another board to ship to your engineers to illustrate the problem.