Bug bounty: Kill the 'Cyan flash of death'

OMNOMNOM

For router settings, I don't have any particular ideas in mind, unfortunately.

Whatever you’re comfortable posting would be great, so we can get more eyes on it. If there’s info you don’t want to share, send me a PM.

I am not getting a CFOD but my Spark Core randomly reboots.

My program on the Spark Core stores some data in a global variable. I have an Android application that reads the variable through the cloud, and I have noticed that the value sometimes gets reset to 0.

At first I thought it was because I was powering my Spark Core through a NAS at home, but I have now connected it to a USB phone charger and the same thing happens.

The good thing is that it doesn't freeze. It resets and reconnects to the cloud.

Again, not sure if it's related, but I figured I’d share it.

Carsten

I flashed https://github.com/spark/core-firmware/blob/feature/debug-cfod/build/core-firmware.bin to my Spark via dfu-util, which also enabled Tinker on the Spark (so I didn’t push a new sketch via the cloud). I then turned a Python script loose on it via cron, which queried the A0 value every minute via the Cloud API/REST and logged the results (see the table below).

After the CFOD occurred, the Spark would not reset or reconnect on its own, only responding to a RESET or a power cycle. Once reset, it would immediately reconnect.

I’m using an Asus RT-N16 running “EasyTomato Firmware Version 0.8” (http://www.easytomato.org). I’m using WPA2 with AES on Channel 6 (2.437 GHz). My SSID contains a “-” (dash) and a “ ” (space) character. I’m running in “Auto” mode and have B and G clients; I don’t believe I have any N clients. I’m using 192.168.1.0/32 for an IP range and am located in the US.

Event  Seconds since Epoch  JSON result        Time (UTC)     Uptime (minutes)
Start  1389798541                              1/15/14 15:09
End    1389815832           error: Timed out.  1/15/14 19:57  288.18

Start  1389816842                              1/15/14 20:14
End    1389818951           error: Timed out.  1/15/14 20:49  35.15

Start  1389819002                              1/15/14 20:50
End    1389819431           error: Timed out.  1/15/14 20:57  7.15

Start  1389819782                              1/15/14 21:03
End    1389828071           error: Timed out.  1/15/14 23:21  138.15

Start  1389828122                              1/15/14 23:22
End    1389829631           error: Timed out.  1/15/14 23:47  25.15

Start  1389831482                              1/16/14 0:18
End    1389831731           error: Timed out.  1/16/14 0:22   4.15

Start  1389831782                              1/16/14 0:23
End    1389834132           error: Timed out.  1/16/14 1:02   39.17

Start  1389839222                              1/16/14 2:27
End    1389843851           error: Timed out.  1/16/14 3:44   77.15
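
For anyone who wants to check or extend the numbers: each uptime figure is simply the End timestamp minus the Start timestamp, divided by 60. A minimal Python sketch that recomputes the column from the logged epoch values (the names RUNS, uptime_minutes and utc_label are illustrative only, not taken from the original script):

```python
from datetime import datetime, timezone

# (start, end) epoch seconds copied from the table above.
RUNS = [
    (1389798541, 1389815832),
    (1389816842, 1389818951),
    (1389819002, 1389819431),
    (1389819782, 1389828071),
    (1389828122, 1389829631),
    (1389831482, 1389831731),
    (1389831782, 1389834132),
    (1389839222, 1389843851),
]


def uptime_minutes(start, end):
    """Minutes between the first good poll and the first timed-out poll."""
    return round((end - start) / 60, 2)


def utc_label(epoch):
    """Format an epoch timestamp as a UTC date and time."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d %H:%M")


uptimes = [uptime_minutes(start, end) for start, end in RUNS]

if __name__ == "__main__":
    for (start, end), minutes in zip(RUNS, uptimes):
        print(utc_label(start), "->", utc_label(end), minutes)
```

Since cron only polls once a minute, each figure is accurate to about a minute at best.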

Dave O

My setup (if it helps in any way):

I have two cores in “production”.
One controls my blinds and has a Spark function set up to receive commands. This one goes down with CFOD approximately every 36 hours.

The other one is a simple light switch with a dimmer control (actually, a PSP joystick). It analogRead's a button pin and an X and Y pin for the joystick on each loop iteration. It has no reason to talk to the outside world. It's encased in a box, so I cannot confirm it's actually flashing cyan, but it dies every few hours, maybe between 2 and 6 hours. I have this connected to an LED strip. Sometimes the LED strip comes on by itself when the freeze occurs. But if I power cycle the core, the strip is always off at the start (as designed).

My router is an Airport Extreme (not most recent gen, the one before it). Running B/G/N on 2.4 and 5GHz with visible SSID. Internal IP range is 10.0.0.1 - 255 with a static external IP. Airport does DHCP. Connected to ADSL.

In the last week I’ve been having packet loss issues (about 3.5%) with my ISP. I only noticed this because my VPN to the office has started to drop since late last week; I'm not sure if that's affecting the cores. I can't confirm that I did NOT have packet loss before last week, but if I did, it was not affecting me in any noticeable way. However, the cores were dropping out before I started to notice the packet loss.

The Cores pretty much have line of sight to the base station, and are within 5 - 6 metres of it. Let me know if there is any other info I can provide.

Symptom: CFOD
Router: Apple Airport Extreme (MB763LL/A)
Wifi Repeater: Apple Airport Express (MC414LL/A)
Wireless Protocol: 802.11n
Location: Chattanooga, TN

Primary Network SSID: Shea Weber
Primary Network Security: WPA2 passphrase
Primary Network Range: 10.0.0.1/24

Guest Network SSID: Digital Minions
Guest Network Security: WPA2 passphrase
Guest Network Range: Don't remember off the top of my head

I didn't create a guest network until it was suggested as a possible fix in the other thread. Since then, I haven't seen a drop at all. To my knowledge, no other wireless devices on my primary network have had any issues.

Thanks for that, I'll try it later and report back after the weekend if I see an improvement too.

No improvement. Still got 'em…

:frowning:

Sometimes the LED fades, saying “I’m OK”, but loop() has stopped!

Frido.

@timb, would you mind telling me what display you have? I have been looking for a nice display for the Spark and was looking at serial displays, but I2C sounds much better.

Symptom: CFOD
Router: Asus RT-N66U running dd-wrt
Wireless Protocol: 802.11n
Location: PA
Network Security: WPA2 passphrase

Can you tell us how the Spark.Core is supposed to behave when the network is lost? I understand the CBoD that occurs when it is trying to reconnect, but why does the user application stop? I would think that we want the Spark.Core to continue running the user application and keep trying to re-establish a connection with the cloud. It would be great if we could register an event handler that gets notified when the cloud connection is either lost or re-established. Having a device that hangs when the network is down is not very useful. I understand the current problem with CBoD is related to having a good connection, but what happens when the real connection is down? We can’t have the system go into a state where the user application does not get any cycles. Anyway, those are my thoughts.

So I continue to get the CBOD using the updated firmware base. I have a Linksys E4200 router and a 1.5Mb DSL connection (sigh - I used to have a 40Mb connection back East). My application just blinks LEDs and responds to Cloud function calls to change the state and rate of blinking. It outputs an iteration counter to the serial port.

Hey @mtnscott,

There is separate work in progress to decouple the wifi connection from user code. See this thread:

I don’t know if this report will help, but I have not seen a CFOD since I stopped calling TCPClient.stop. My playing around does not currently use the cloud features other than for the build IDE and downloading to the core. I have several little toy applications that scrape web pages or RSS feeds for interesting things and display the info on a 2x16 LCD.

I was having the core keep a count of the number of times it hit a web page which I scheduled for every 5 minutes and display that count on the LCD. The biggest number I saw was 109, which is just over 9 hours of uptime.

I also have a UDP NTP client that I have been working on and it runs overnight without crashing as well. I don’t recall ever seeing UDP have a CFOD.

My apps DO spontaneously crash sometimes, but the core reboots gracefully and reconnects in a normal way as if I hit the reset button. During development, I have walked off the end of memory and had to reset the WiFi credentials and reflash tinker to get it to work sometimes. I would say these crashes are my fault, to the best of my understanding.

I am using an el-cheapo Netgear router with WPA2, since WEP didn’t work for me. I am having the core print its MAC and IP addresses to the display at startup, so I know that I am getting a 10.x.x.x address.

I don’t know if my good luck comes from the router (doubt it) or the lack of cloud IO or the lack of TCPClient.stop calls.

@mtnscott If you want a nice little graphical backpack that will work over UART, I2C or SPI, I’d take a look at Digole! They sell a range of LCDs and OLEDs with an integrated backpack, as well as the backpacks by themselves. I’m using the 1.3" White OLED with the Core right now and it’s working great! One of the nice things is that the protocol is universal amongst all the different displays, so one library fits all.

I’d also recommend checking out the 1.8" Color OLED Module, 2.7" Backlit LCD Module, 1.8" White Backlight LCD Module and the Universal KS0108 Adapter.

Another handy feature is the built-in (user replaceable) UG8-compatible fonts and the ability to upload a startup bitmap or animation.

The raw command set is pretty simple and generally consists of ASCII characters followed by X number of option bytes. For example:

Wire.beginTransmission(0x4E);
Wire.print("CL");
Wire.print("SF");
Wire.write(18);
Wire.print("TT");
Wire.print("Cloud Uptime");
Wire.write(0x00);
Wire.endTransmission();

CL = Clear
SF = Set Font, 18 = Font
TT = Text (Followed by the text you want to display and 0x00 for EOL.)
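
To make the framing explicit, here is a small Python sketch that assembles the same byte sequence the Wire calls above put on the bus. It only builds bytes on the host and talks to no hardware; the helper names (clear_screen, set_font, text) are hypothetical and not part of any Digole library.

```python
def clear_screen():
    """CL: clear the display. Takes no option bytes."""
    return b"CL"


def set_font(font_id):
    """SF: select a font. Followed by a single option byte."""
    if not 0 <= font_id <= 255:
        raise ValueError("font id must fit in one byte")
    return b"SF" + bytes([font_id])


def text(message):
    """TT: draw text. The string is terminated with 0x00."""
    return b"TT" + message.encode("ascii") + b"\x00"


# Same sequence as the Wire example: clear, pick font 18, print a string.
frame = clear_screen() + set_font(18) + text("Cloud Uptime")
```

The whole frame here is 20 bytes: two-letter ASCII commands, one raw option byte for the font, and a NUL-terminated string.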

Anyway, I’ve almost got the full Digole Arduino Library ported over to the Spark Core. If you want to step up to a graphic display and don’t need a touchscreen, I highly recommend picking up at least one of the Digole OLED units! As an aside, Digole ships displays from Canada and China (it should tell you somewhere on the product page); the stuff coming from Canada normally ships to the US in about a week!

I am not sure whether my report will be helpful but I am experiencing CFOD as well since today.

I have not started using the web IDE yet, but I used the core with the relay shield and Tinker. It worked normally before, then I did not power it up for a few days. Today, I powered it up, sent a few commands from Tinker, and it worked for a few seconds before the CFOD occurred. I thought something was wrong, so I used the reflash Tinker command. After that, none of the commands went through.

The LED cycle: white -> green -> breathing cyan for 30 seconds -> flashing cyan for a few minutes -> red -> flashing cyan again.

I do not think the problem is with the router since it worked before, but here are the wifi details:

Router: TP-Link TL-WR41N
Mode: 11bgn mixed
Channel: Auto (current channel 7)
Location: MY
Network Range: 192.168.1.1/24
Security: WPA/WPA2 - Personal

I think that this problem is more than just losing connection with the cloud. Polling a variable read will CFOD my core consistently. Without the polling, the same program runs for days (although, as you will see from my previous post, it appears online but a variable read returned nonsense).

I was looking into this, but so far I am unable to duplicate it: my RGB brightness demo has been running for 96 hours on the JTAG shield with a 1A USB wall power supply. I have actually never seen this problem occur.

Can anyone provide a reliable way to duplicate? Running tinker and polling variables at a fixed rate?

@dorth, can you provide the Python script you're using to poll the core and report uptimes?

In my case, the PWS makes no difference: I have used a 1A PWS as well as having it connected to my MacBook Pro. I expose some functions to the cloud, but no variables. I get CBOD within 30 minutes consistently; once it lasted for 1 hour, but never longer than that. If I let it sit, I will get a brief flashing red during the CBOD, and then it goes back to CBOD.

Here is the python script I run with cron (every minute) to query an analog value from the Tinker app running on the core. If I poll every 15 minutes, my core will run for well over 24 hours. If I run cron every minute, it will CFOD within a few hours (see my post above for my results).

The script uses "requests" for REST handling, so you need to have that installed (http://docs.python-requests.org/en/latest/). This is running on an Ubuntu box, but should be fairly portable.

Python Script to Poll Spark
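
The linked script itself is not reproduced in this thread. For anyone trying to duplicate the test, here is a minimal sketch of the same idea and not the original script. It swaps "requests" for the standard library so it has no dependencies. The endpoint URL, the "params" field and the "return_value" key are assumptions about how the Cloud API exposes Tinker's analogread, and DEVICE_ID and ACCESS_TOKEN are placeholders you must fill in.

```python
import json
import time
import urllib.parse
import urllib.request

API_BASE = "https://api.spark.io/v1/devices"  # assumed Cloud API endpoint
DEVICE_ID = "YOUR_DEVICE_ID"                  # placeholder
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"            # placeholder


def http_post(url, fields, timeout=30):
    """POST form-encoded fields and return the decoded JSON body."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    with urllib.request.urlopen(url, data=data, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def poll_once(post=http_post, now=time.time):
    """Ask Tinker for the A0 reading and return one tab-separated log line."""
    url = "{}/{}/analogread".format(API_BASE, DEVICE_ID)
    try:
        body = post(url, {"access_token": ACCESS_TOKEN, "params": "A0"})
        result = body.get("return_value", body.get("error", body))
    except Exception as exc:  # timeouts, DNS failures, HTTP errors
        result = "error: {}".format(exc)
    return "{}\t{}".format(int(now()), result)
```

To use it from cron, add a final line such as print(poll_once()) and redirect the output to a log file, for example with a "* * * * *" schedule for once a minute. The post and now parameters exist only so the function can be exercised without a network connection.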

Dave O