Simple UDP program breaks the core

I just wanted to add that I played around with this for a while last night but not get any good results.

Adding instrumenting code to the above, I was able to see that for my tests, it would send 6 packets and then the user loop no long ran, and then things went further south. You do have to look at the return value from udp.write() since it can and does return -1 and -2 values indicating problems (I will write some doc on this). But in my tests, this only happened occasionally.

I don’t know what is different about this faster transmit rate vs. my NTP UDP code that sends typically one packet per hour that runs for days.

1 Like

Hmm - 6 is a magic number.

There are approximately 6 buffers free when your average CC3000 starts. I suspect something is not ensuring the buffers are freed.

This could be coincidence, but I wanted to document this.

3 Likes

@bko thank you! Glad you ran the program and had the same problems. Well glad I am not nuts or my patching was not defective, you get the idea. What do the -1 and -2 values represent? And if you get that should we be taking some action or just keep carrying on? Also is there any merit to the comment about patching to 1.28 but didn’t update any of the host driver files. Not sure what version we have in the spark firmware but when I get home I’ll see if there are any diffs.

@AndyW Nice. How did you figure that out? Maybe I could add monitoring to that in the loop and we could see if it is a buffer issue.

There should be no host driver mods required for this CC3000 firmware patch - at least that is my understanding.

I spent a lot of time decoding the SPI traffic between the STM32 and the CC3000 as part of the other CFOD investigation.

1 Like

Trying to do some deeper troubleshooting. How do I get the core debug prints to work? I tried defining DEBUG_BUILD and the output function but no luck… Anyone? And by the codes here I am getting HardFault :frowning:

/********************* Red Flash Codes *********************
1(Faults,RGB_COLOR_RED,HardFault) 
12 (Faults,RGB_COLOR_RED,NMIFault)
3 (Faults,RGB_COLOR_RED,MemManage)
4 (Faults,RGB_COLOR_RED,BusFault)
5 (Faults,RGB_COLOR_RED,UsageFault)
6 (Cloud,RGB_COLOR_RED,InvalidLenth)
7 (System,RGB_COLOR_RED,Exit)
8 (System,RGB_COLOR_RED,OutOfHeap)
9 (System,RGB_COLOR_RED,SPIOverRun)
10 (Softare,RGB_COLOR_RED,AssertionFailure)
11 (Softare,RGB_COLOR_RED,InvalidCase)
************************************************************/

#define DEBUG_BUILD

UDP Udp;

unsigned char TxMsg[12] = { 1, 254, 1, 254, 1, 254, 43, 212, 71, 184, 3, 252 };
unsigned char recbuf[12];
bool isDebugReady = false;

void debug_output_(const char *p) { 
    if (isDebugReady)
        Serial.print(p); 
}

void setup() {
    Serial.begin(9600);
    pinMode(D7, OUTPUT);
    digitalWrite(D7, LOW);

    Udp.begin(9000);
    while (!Serial.available()) {
        RGB.color(0, 255, 0);
        delay(100);
        RGB.color(0, 0, 255);
        delay(100);
        RGB.color(255, 0, 0);
        delay(100);
        Serial.println("Send some data to continue to loop.");
    };
    isDebugReady = true;
}

void loop() {
    int32_t packetLen = Udp.parsePacket();
    
    //Dump any packets RX
    while (packetLen > 0) {
        Udp.read(recbuf, packetLen >= 12 ? 12 : packetLen);
        packetLen = Udp.parsePacket();
    }

    //TX send our test
    Udp.beginPacket(IPAddress(10,0,0,3), 9000);
    Udp.write(TxMsg, 12);
    Udp.endPacket();
    
    //I am still alive
    digitalWrite(D7, LOW);
    delay(200);
    digitalWrite(D7, HIGH);
}

Hi @SomeFixItDude

The red flashing panic codes are doc’ed here

You will need to do local builds in order to do a DEBUG_BUILD, I believe. Local build instructions are here:

I have the local build working but what I can’t find is how to enable the debug build. I defined it in the top of my program same one I just posted. I am somewhat of a cpp newbie. Every time I issue the make command I see in the parameters -DRELEASE_BUILD How do I actually get it to use the debug output statement I put in there. I think this would help debug not only this issue but other issues.

Thanks Again,
Brian

Try this:

make DEBUG_BUILD=y clean all

1 Like

@bko that did it! Here are the results. The program ran a little bit longer but I think that was because I was staring at it.

Debug Output:

earlier:<DEBUG> int Spark_Receive(unsigned char*, int) (419):bytes_received 2
earlier:<DEBUG> int Spark_Receive(unsigned char*, int) (419):bytes_received 16

    0000339797:<DEBUG> virtual size_t UDP::write(const uint8_t*, size_t) (148):sendto(buffer=20000000, size=12)=12
    0000340013:<DEBUG> virtual size_t UDP::write(const uint8_t*, size_t) (148):sendto(buffer=20000000, size=12)=12
    0000340230:<DEBUG> virtual size_t UDP::write(const uint8_t*, size_t) (148):sendto(buffer=20000000, size=12)=12
    0000340445:<DEBUG> virtual size_t UDP::write(const uint8_t*, size_t) (148):sendto(buffer=20000000, size=12)=12
    0000340662:<DEBUG> virtual size_t UDP::write(const uint8_t*, size_t) (148):sendto(buffer=20000000, size=12)=12
    0000340873:<DEBUG> int Spark_Connect() (671):sparkSocket Now =0
    0000340879:<DEBUG> int Spark_Disconnect() (742):
    0000340885:<DEBUG> int Spark_Disconnect() (751):Close
    0000344490:<DEBUG> set_socket_active_status (837):Sd=0, Status SOCKET_STATUS_IACTIVE
    0000344499:<DEBUG> int Spark_Disconnect() (753):Closed retVal=0
    0000344506:<DEBUG> set_socket_active_status (837):Sd=0, Status SOCKET_STATUS_ACTIVE
    0000344515:<DEBUG> int Spark_Connect() (678):socketed sparkSocket=0
    0000344526:<DEBUG> int Spark_Connect() (732):connect
    0000352532:<ERROR> hci_event_handler (257):Timeout now 352532 start 344532 elapsed 8000 cc3000__event_timeout_ms 8000
    0000352544:<ERROR> hci_event_handler (258):Timeout waiting on tSLInformation.usRxEventOpcode 0x1007
    0000352554:<DEBUG> int Spark_Connect() (734):connected connect=-1
    0000352561:<DEBUG> int Internet_Test() (624):socket
    0000352567:<DEBUG> set_socket_active_status (837):Sd=2, Status SOCKET_STATUS_ACTIVE
    0000352576:<DEBUG> int Internet_Test() (626):socketed testSocket=2
    0000352583:<DEBUG> int Internet_Test() (648):connect
    0000360589:<ERROR> hci_event_handler (257):Timeout now 360589 start 352589 elapsed 8000 cc3000__event_timeout_ms 8000
    0000360601:<ERROR> hci_event_handler (258):Timeout waiting on tSLInformation.usRxEventOpcode 0x1007
    0000360611:<DEBUG> int Internet_Test() (650):connected testResult=-1
    0000360619:<DEBUG> int Internet_Test() (660):Close
    0000360625:<DEBUG> set_socket_active_status (837):Sd=2, Status SOCKET_STATUS_IACTIVE
    0000360634:<DEBUG> int Internet_Test() (662):Closed rv=0
    0000360640:<ERROR> void SPARK_WLAN_Loop() (684):Resetting CC3000 due to 2 failed connect attempts
    0000360661:<DEBUG> virtual size_t UDP::write(const uint8_t*, size_t) (148):sendto(buffer=20000000, size=12)=12
    0000360873:<DEBUG> void SPARK_WLAN_Loop() (528):Resetting CC3000!

Going to get some rest, but will look at this output tomorrow. Not sure if much is gained by it or not.

-Bri

Just quick update. Unless someone else tries before me, I am thinking it would be a great test to try this with disabling the cloud. Just to see if using udp by itself causes the CC3000 to die. If the core works fine then maybe it is a problem with cloud service itself or how the core interacts with the cloud?

-Bri

The cloud hates me!

So if I call Spark.disconnect() as my last line of the setup function, I have no problems at all with resetting. I decreased my delay and have sent lots of packets to and from the core (discarding them and not discarding them) and I don’t freeze or reset. I have been going strong now for 4+ hours.

So this makes me think the problem is in the cloud service layer or the heartbeat with the core. Perhaps when we are having communication problems with cloud instead of just resetting take a tiered approach. First try to rebuild the connection, next reattempt after some delay. Maybe even continually doing that retry loop all while allowing the user code to run. I think auto-resetting is a little harsh and doesn’t allow me to take care of any tasks to prepare for reset.

Anyway I think we should concentrate on the actual cloud interaction. I don’t seem to be tripping up the core with any network activity that is local. Which I think is great news, I think we can more easily change the cloud interactions.

Thoughts? Maybe some suggestions on how to debug cloud interactions?

I know you describe how it works in Spark UDP but the OP is correct in his assumption that the no data should be sent until UDP.endPacket() is called. When, in the underlying C code, a sendto() is called on a UDP socket it is then that one UDP datagram is sent. Spark UDP is broken in this regard as the documentation says (or implies unmistakably) that one UDP datagram is sent for each beginPacket() endPacket() pair. It is imperative that the Spark UDP class does its own buffering, and that sendto() is called only once per UDP datagram.

With UDP I would expect that the newer data should replace the older data - the unread buffer should be cleared if there is no more buffer space rather than the newer data being lost. With TCP the older data in the full buffer should be retained and the newer data discarded.

Well, the CC3000 doesn’t always do what is expected, and we work with TI to try and fix as many of these problems as possible,

There are severe problems in code of the Spark UDP class which, if fixed, would make Spark UDP much better. Not all the issues are in the CC3000. That the UDP packet must be assembled by UDP.write() and the C sendto() not called until UDP.endPacket() is one of those. Indeed, this one fix will improve things a whole lot.

As UDP datagrams have a maximum length (all that is guaranteed by the protocol spec is 576 bytes) it would be entirely valid AND HIGHLY DESIRABLE for UDP.write() to return an error indicator if this length is exceeded.

But I actually see no need for UDP.startPacket() and UDP.endPacket(). Better if these were discarded and no false impression gained by their availability: UDP.write() is all that is required. All that UDP.write() should do is call sendto(). No need then for the UDP class to do any buffering.

I’m not disputing that, but we have no direct control over the internal buffer reuse scheme in the CC3000.

AFAIK, fixing UDP in the :spark: firmware is on the list, the problems are well documented and understood.

ok back on topic! Is there any way to debug the cloud interactions with the core? Is the cloud code available yet? Lets get the cloud working when using udp :smile:

-Bri

1 Like

Heya @SomeFixItDude,

Local cloud is coming this summer, and I’d love any help or feedback :slight_smile:

The local cloud beta list thread is here: https://community.spark.io/t/where-is-the-source-code-for-the-cloud/1381

Thanks,
David

2 Likes

With deep update installed. This simple program (almost the above). User loop stops running in under a minute. Two minutes later rapid CFOD. Another minute goes by flashing red, back to CFOD. Rinse and repeat.

UDP Udp;

unsigned char TxMsg[12] = { 1, 254, 1, 254, 1, 254, 43, 212, 71, 184, 3, 252 };
unsigned char recbuf[12];


void setup() {
    Udp.begin(9000);
    pinMode(D7, OUTPUT);
    digitalWrite(D7, LOW);
}

void loop() {
    int32_t packetLen = Udp.parsePacket();
    
    //Dump any packets RX
    while (packetLen > 0) {
        Udp.read(recbuf, packetLen >= 12 ? 12 : packetLen);
        packetLen = Udp.parsePacket();
    }
    
    Udp.beginPacket(IPAddress(10,0,0,2), 9000);
    Udp.write(TxMsg, 12);
    Udp.endPacket();

    digitalWrite(D7, HIGH);
    delay(200);
    digitalWrite(D7, LOW);
    delay(200);
}

Side note, there is nothing listening at 10.0.0.2.

Please note after you flash this code you can’t flash OTA it rarely starts to take the flash and always never finishes. I am updating the title of the problem to encompass more info. Hope someone can help debug this problem.

Thanks

2 Likes

Hi @SomeFixItDude

I feel your pain but deep update really didn’t change the behavior of this code. It failed before deep update and it fails after in the same way for me.

I have a slightly modified version of this test that looks at the return values from beginPacket() and write() so it does not CFOD, but does have failures where write() returns -1 indicating failure. Over in the CFOD thread I pointed out that not receiving is what is triggering the bug. I changed the test to send NTP packets to a host that responds and it never fails. I adjusted for packet size since NTP is 48 bytes. I don’t think it is a race condition per se since I have slowed the rate down to one packet every 2 sec.

I think there is something in the UDP TX host driver that depends on the UDP RX side doing something.

I believe this code runs out of sockets on the TI CC3000 since it always fails after exactly 6 packets are sent. From that point forward, it will send one packet and then fail (catching the failure with return codes) and repeat from there every other packet failing, since my code does UDP.stop() on getting -1 from UDP.write() releasing the socket.

I will keep looking at it but we may need to get the Spark team involved.