Warning: v2.0 is not reliable

03 Dec 2012 20:30 #3591 by PhracturedBlue
One thing that I think makes the Complex page more likely to cause a reboot is that it also runs a 'live' calculation of actual inputs in order to draw the graphs (actually all the mixers do, but the complex page does more due to the bar graph).

This may result in the ADC query being interrupted by the mixer, which would also read the ADC, and that could cause issues.

It won't explain the issue on the main page, but it is certainly worth investigating further.

04 Dec 2012 02:40 #3598 by Tom Z
I have flown a lot with the V2 firmware for the Devo 10 and haven't had a reboot during flying. I am not using telemetry at all for any of the helicopters or quads I am flying. My TX power is at 10 mW or 100 mW depending on the model I am using.
I have both simple and complex mixers depending on the models I fly.

04 Dec 2012 04:07 - 04 Dec 2012 04:08 #3601 by PhracturedBlue
Well, I've spent a couple very frustrating hours with my Devo8, and cannot reproduce reboots on either the main page or the mixer page.
I've tried with and without telemetry, and I've not seen reboots with either my Ladybird or MiniCP (nor DSM2 or Hubsan).

I can reproduce the reboot when changing the transmit power. That one is due to sending SPI commands from the main loop and from the interrupt handler concurrently. I've released a fix for that one.
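The thread does not say how that fix was implemented. As a rough illustration of the general problem only, here is a host-side C sketch in which a simulated radio interrupt arrives in the middle of a main-loop SPI transaction, and a guarded variant that masks the interrupt for the duration of the transfer. Every name in it (`spi_write_reg_raw`, `spi_write_reg_safe`, `irq_disable`, `irq_enable`, `radio_irq`) is hypothetical and not taken from the Deviation source.

```c
#include <stdbool.h>
#include <stdint.h>

/* All state and functions here are hypothetical stand-ins for a host-side demo. */
int      irq_mask_depth = 0;      /* > 0 means interrupts are masked */
bool     irq_pending    = false;  /* an interrupt arrived while masked */
bool     in_irq         = false;  /* true while the simulated handler runs */
bool     spi_busy       = false;  /* true while chip-select is asserted */
unsigned spi_collisions = 0;      /* transfers attempted while the bus was busy */
unsigned spi_transfers  = 0;      /* transfers that completed cleanly */

void spi_write_reg_raw(uint8_t reg, uint8_t val);

/* Simulated radio interrupt handler: it also talks to the radio over SPI. */
void radio_irq(void)
{
    in_irq = true;
    spi_write_reg_raw(0x01, 0xAA);
    in_irq = false;
}

void irq_disable(void) { irq_mask_depth++; }

/* Unmasking runs any interrupt that was deferred while masked. */
void irq_enable(void)
{
    if (--irq_mask_depth == 0 && irq_pending) {
        irq_pending = false;
        radio_irq();
    }
}

/* Models the worst case: the interrupt arrives mid-transaction. */
void maybe_fire_irq(void)
{
    if (in_irq)
        return;                   /* the handler does not nest */
    if (irq_mask_depth > 0) {
        irq_pending = true;       /* deferred until irq_enable() */
        return;
    }
    radio_irq();
}

/* Unprotected transfer: anything may interrupt it. */
void spi_write_reg_raw(uint8_t reg, uint8_t val)
{
    (void)reg;
    (void)val;
    if (spi_busy) {               /* someone else is mid-transaction */
        spi_collisions++;
        return;
    }
    spi_busy = true;              /* chip-select low */
    maybe_fire_irq();
    spi_busy = false;             /* chip-select high */
    spi_transfers++;
}

/* Guarded transfer for main-loop callers. */
void spi_write_reg_safe(uint8_t reg, uint8_t val)
{
    irq_disable();
    spi_write_reg_raw(reg, val);
    irq_enable();
}
```

Calling `spi_write_reg_raw` from the "main loop" records one collision, while `spi_write_reg_safe` lets both the main-loop transfer and the deferred interrupt transfer complete one after the other.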

I'll keep trying, but if I can't find any way to trigger a reboot, all I can do is guess at what might fix it.

I did see a reboot while changing pages while binding (after shutting off the dialog). I'm not sure if that is related to the other issues or not.
Last edit: 04 Dec 2012 04:08 by PhracturedBlue.

04 Dec 2012 04:33 #3602 by Hexperience
PB, just a shot in the dark, but I noticed that the config that caused the reboot was set for 12 channels and had a few virtual channels as well. Were you using a config like that in your tests? Again, just a layman's guess, but could it be that a more complex config with more channels is causing a slowdown in some way? Enough to set off the watchdog?

There are 10 types of people in this world. Those that understand binary and those that don't.

04 Dec 2012 04:50 #3604 by PhracturedBlue

Hexperience wrote: PB, just a shot in the dark, but I noticed that the config that caused the reboot was set for 12 channels and had a few virtual channels as well. Were you using a config like that in your tests? Again, just a layman's guess, but could it be that a more complex config with more channels is causing a slowdown in some way? Enough to set off the watchdog?

Yep. In fact I was using that exact config. Of course, I couldn't actually fly my birds around with it, but it is fine for binding, testing the telemetry, pumping the throttle, and generally playing around.

04 Dec 2012 05:17 #3605 by suvsuv
Moving the watchdog-feeding logic into the ISR might be a solution eventually. But it will hide many performance issues in channel calculation and GUI redrawing. Maybe we could create 2 versions:
1) release version: watchdog-feeding inside the ISR
2) developer version: watchdog-feeding stays as it is now
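To make the trade-off between the two variants concrete, here is a small host-side C simulation. It is only a sketch: the 1000 msec timeout is taken from the "1sec watchdog" mentioned elsewhere in this thread, the 1 msec ISR period is an assumption, and `watchdog_trips` is a hypothetical helper, not Deviation code. In real firmware the choice would more likely be a compile-time switch than a runtime flag.

```c
#include <stdbool.h>

#define WDT_TIMEOUT_MS 1000   /* assumed from the 1 sec watchdog discussed in the thread */

/* Returns true if the simulated watchdog expires while the main loop is
 * stalled for stall_ms. The timer ISR is assumed to fire every millisecond
 * and to keep running during the stall. */
bool watchdog_trips(int stall_ms, bool feed_in_isr)
{
    int since_feed = 0;                 /* msec since the watchdog was last fed */

    for (int t = 0; t < stall_ms; t++) {
        since_feed++;                   /* one millisecond passes */
        if (feed_in_isr)
            since_feed = 0;             /* "release" variant: the ISR feeds it */
        if (since_feed >= WDT_TIMEOUT_MS)
            return true;                /* watchdog resets the transmitter */
    }
    return false;
}
```

With feeding in the main loop, a 500 msec stall survives and a 1500 msec stall triggers a reset. With feeding in the ISR, the same 1500 msec stall goes unnoticed, which is the "hiding" effect described above.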

04 Dec 2012 05:56 #3606 by PhracturedBlue
Well, if there is actually a conflict between the main loop and the interrupts, moving the watchdog will have the effect of causing the transmitter to hang instead of rebooting.
Of course, I don't know what is actually going on. My next step is to add timing code to determine how long each of the interrupt handlers takes to execute. That will, at least, eliminate one of the possible causes.

04 Dec 2012 12:34 #3616 by suvsuv
Another noticeable reboot happens while loading model ini files. I experienced this kind of reboot several times, and it has been confirmed that it can be reproduced on the devo8s as well.
To reproduce it, just keep loading model files, then the TX will reboot from time to time.
I think we can insert a watch-dog reset in the middle of ini_handler();
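As a sketch of that idea, the callback below reloads the watchdog once per parsed line. The callback signature follows the style of the inih library, which is an assumption about how `ini_handler()` is declared; `wdt_reset`, `struct model`, and the key names are hypothetical stand-ins so the example can run on a PC.

```c
#include <stdlib.h>
#include <string.h>

unsigned wdt_resets = 0;

/* Hypothetical stand-in for reloading the hardware watchdog counter. */
void wdt_reset(void) { wdt_resets++; }

/* Hypothetical, heavily reduced model structure. */
struct model {
    char name[24];
    int  num_channels;
};

/* Called once per key=value line; returns 1 if the key was handled, 0 otherwise. */
int ini_handler(void *user, const char *section, const char *name, const char *value)
{
    struct model *m = user;

    /* One reload per line bounds the time between feeds, no matter how
     * long the file is or how slow the filesystem reads are. */
    wdt_reset();

    if (section[0] == '\0' && strcmp(name, "name") == 0) {
        strncpy(m->name, value, sizeof m->name - 1);
        m->name[sizeof m->name - 1] = '\0';
        return 1;
    }
    if (strcmp(section, "radio") == 0 && strcmp(name, "num_channels") == 0) {
        m->num_channels = atoi(value);
        return 1;
    }
    return 0;   /* unknown key */
}
```

Feeding inside the per-line callback keeps the reset tied to actual parsing progress, so a parser that is genuinely stuck would still trip the watchdog.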

05 Dec 2012 04:07 #3646 by PhracturedBlue
I've added timing debug code to deviation, and here's an example of what I see:
Avg: radio: 0 mix: 0 med: 0/5 low: 8/102
Average # msec of the past 100 loops for each of:
radio: time to run the radio callback routine
mix: time to run the mixer evaluation routine
med xx/yy:
xx = time to run the 'medium' priority section (button/touch) in the main loop
yy = time between calls to the 'medium' priority loop (should be '5')
low xx/yy:
xx = time to run the 'low' priority section (gui update) in the main loop
yy = time between calls to the 'low' priority loop (should be '100')
Max: radio: 0 mix: 0 med: 301/301 low: 301/301
Same as for Avg, except it is the largest value seen since the last time status was printed
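The bookkeeping behind a report like this could look roughly like the following C sketch. It is not the actual Deviation debug code: `struct timing_stat`, `timing_add`, and `timing_report_max` are hypothetical names, and the average here is computed per block of 100 samples rather than as a sliding window.

```c
#include <stdint.h>

#define WINDOW 100   /* number of loops per average, as in the description above */

struct timing_stat {
    uint32_t sum;     /* total msec accumulated in the current window */
    uint32_t count;   /* samples in the current window */
    uint32_t avg;     /* average of the last completed window */
    uint32_t max;     /* largest sample since the last report */
};

/* Record one measurement, for example the msec spent in the 'low' priority section. */
void timing_add(struct timing_stat *s, uint32_t elapsed_ms)
{
    s->sum += elapsed_ms;
    if (elapsed_ms > s->max)
        s->max = elapsed_ms;
    if (++s->count == WINDOW) {
        s->avg = s->sum / WINDOW;   /* integer msec, truncated */
        s->sum = 0;
        s->count = 0;
    }
}

/* Return the maximum seen since the previous report and start a new period. */
uint32_t timing_report_max(struct timing_stat *s)
{
    uint32_t m = s->max;
    s->max = 0;
    return m;
}
```

Feeding 99 samples of 4 msec and one sample of 452 msec gives an average of 8 and a maximum of 452, which shows how a single slow redraw barely moves the Avg line while dominating the Max line.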

Here are some results
Avg: radio: 0 mix: 0 med: 0/5 low: 9/100
Max: radio: 0 mix: 0 med: 20/20 low: 20/100

Avg: radio: 0 mix: 0 med: 0/5 low: 6/100
Max: radio: 0 mix: 0 med: 20/20 low: 20/100

Avg: radio: 0 mix: 0 med: 0/5 low: 6/100
Max: radio: 0 mix: 0 med: 20/20 low: 20/100

Avg: radio: 0 mix: 0 med: 0/5 low: 14/100
Max: radio: 0 mix: 0 med: 34/34 low: 34/100

Avg: radio: 0 mix: 0 med: 0/5 low: 11/100
Max: radio: 0 mix: 0 med: 27/27 low: 27/100

Avg: radio: 0 mix: 0 med: 0/5 low: 7/103
Max: radio: 0 mix: 0 med: 452/452 low: 452/452

Avg: radio: 0 mix: 0 med: 0/5 low: 4/103
Max: radio: 0 mix: 0 med: 422/422 low: 422/422

Avg: radio: 0 mix: 0 med: 0/5 low: 0/100
Max: radio: 0 mix: 0 med: 0/8 low: 0/100

Avg: radio: 0 mix: 0 med: 0/5 low: 3/102
Max: radio: 0 mix: 0 med: 356/356 low: 356/356

Avg: radio: 0 mix: 0 med: 0/5 low: 29/118
Max: radio: 0 mix: 0 med: 459/459 low: 459/459

What I've found is that it takes less than ~35 msec to redraw the main screen on a Devo8 with the stock layout while moving the sticks around (such that most objects are being redrawn frequently).
It takes ~500 msec to redraw the screen when changing pages.
It takes ~300 msec to dismiss the binding dialog.
At least in normal circumstances, the radio transmission and mixer evaluation are not a significant amount of time.
Telemetry has no measurable impact on timing.
Setting the protocol to 'None' does not change these numbers, which confirms that the radio interrupt is not generally a significant performance impact.

So...drawing the GUI is really slow. I'm not sure why, as last time I measured it, I think it was ~30msec. I need to further investigate this.

Also, we can see that a 1 sec watchdog is generally sufficient. If we hit it, then something besides the GUI delayed execution by an additional 500 msec beyond the typical situation.
In the end, we didn't find any smoking guns from the timing info, though spending some time looking at GUI efficiency is warranted.

07 Dec 2012 03:45 #3725 by PhracturedBlue
As I have been unable to figure out the cause of the reboots, I have reverted back to calculating the mixer in the main loop.
This should be virtually the same as in 1.2. There is some noticeable lag in the controls (since it can take 30 msec to refresh the main screen, and 500 msec when switching pages), but hopefully it will no longer crash.
I also increased the watchdog timeout and added a watchdog reset to the model-reading.

07 Dec 2012 06:22 - 07 Dec 2012 06:22 #3734 by rbe2012
Yesterday I noticed two reboots of my Devo8 (unpatched 2.0.0): the first after configuring quick pages and switching around with left and right, and the second only a short time later after going to tx config and tapping in the headline. Neither has been reproducible.
I don't know if this helps you; it seems to be random.
Last edit: 07 Dec 2012 06:22 by rbe2012.

07 Dec 2012 09:15 #3744 by FDR

PhracturedBlue wrote: As I have been unable to figure out the cause of the reboots, I have reverted back to calculating the mixer in the main loop.
This should be virtually the same as in 1.2. There is some noticeable lag in the controls (since it can take 30 msec to refresh the main screen, and 500 msec when switching pages), but hopefully it will no longer crash.
I also increased the watchdog timeout and added a watchdog reset to the model-reading.


Does this mean that the servo delay/jitter may come back?
Or has that been solved already?

Vlad, could you recheck?

07 Dec 2012 13:20 - 07 Dec 2012 13:33 #3749 by vlad_vy

FDR wrote: Does this mean that the servo delay/jitter may come back?


Yes, it does occur. There are slight delays (movement is not smooth) on the main page with the digital channel value in a box, no delays on the mixer page, and very long delays on the channel monitor page (protocol DSM2, JR RD921).
Last edit: 07 Dec 2012 13:33 by vlad_vy.

07 Dec 2012 13:47 - 07 Dec 2012 14:01 #3752 by vlad_vy
I wonder: do the Tx reboots during mixer setup occur with model files already saved by version 2.00, or with old model files from version 1.1.2?

Next question: do the reboots in flight occur with model files already saved by version 2.00, or during the first flight with old model files from version 1.1.2 (without any saving)?
Last edit: 07 Dec 2012 14:01 by vlad_vy.

07 Dec 2012 14:24 - 07 Dec 2012 14:25 #3754 by vlad_vy
I have the answer: with a saved model file I still get unsystematic reboots, as before (v2.00).
Last edit: 07 Dec 2012 14:25 by vlad_vy.

07 Dec 2012 15:17 - 07 Dec 2012 15:17 #3755 by Fredyp
Hi!

Sorry for my English.

I get the reboot bug on my Devo8s when I select "Expo&DR" or "Complex" on channel 7. I haven't tested other channels, but I made a short video of this. I have reproduced this bug many times, and it always happens when I click the "Ok" button.

Click on the link below to see the video. You will see the reboot at the end.



Just to help you!

Bye, and thank you for your great job!

PS: this video is not public. Only those who know the link can view it.

Fred.
Last edit: 07 Dec 2012 15:17 by Fredyp.

07 Dec 2012 23:11 - 07 Dec 2012 23:11 #3767 by Tom Z
Does this reboot bug only happen if you make changes in the TX while in flight
or does it reboot in flight at random?
Last edit: 07 Dec 2012 23:11 by Tom Z.

07 Dec 2012 23:14 #3768 by PhracturedBlue
At least 2 people have seen it reboot while in flight (though there are many others who have not seen that behavior).

The current code should no longer behave that way, but at a cost of worse response time from the Tx. I hope I'll soon figure out the real root cause and find a proper fix.

08 Dec 2012 00:46 #3770 by Fredyp
So far it has never rebooted in flight.

I have used my Tx with Deviation 2.00 in the Phoenix flight simulator for some hours with no problems.

Thank you!
Fred.

08 Dec 2012 05:46 - 08 Dec 2012 14:29 #3771 by PhracturedBlue
I believe I have identified the causes of the random reboots.
The old code could read the ADC from both interrupt context (for the stick inputs) and from the main-loop context (for the tx voltage). We use the same ADC to read both values. While it would be very rare, if the interrupt happened while reading the voltage, that would likely cause a lockup.

I have changed the code to use DMA for reading all ADC values so that there should no longer be any chance of contention for the ADC.

Additionally on the mixer page, there was the same potential because we were re-reading the ADC values to compute the graph. I've fixed that too.
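The shape of a DMA-based approach can be sketched in C as follows. This is a host-side illustration under stated assumptions, not the actual Deviation change: the channel list, `adc_dma_buf`, `simulated_dma_pass`, and `adc_read` are all hypothetical. On hardware the DMA controller, not a function call, would refill the buffer continuously.

```c
#include <stdint.h>

/* Hypothetical channel layout: four stick axes plus the battery voltage. */
enum {
    ADC_AILERON,
    ADC_ELEVATOR,
    ADC_THROTTLE,
    ADC_RUDDER,
    ADC_VOLTAGE,
    ADC_NUM_CHANNELS
};

/* On real hardware the DMA controller would write conversion results here
 * without CPU involvement; volatile because it changes behind the code's back. */
volatile uint16_t adc_dma_buf[ADC_NUM_CHANNELS];

/* Host-side stand-in for one DMA pass over all channels. */
void simulated_dma_pass(const uint16_t *samples)
{
    for (int i = 0; i < ADC_NUM_CHANNELS; i++)
        adc_dma_buf[i] = samples[i];
}

/* Usable from both the interrupt and the main loop: it only loads one
 * 16-bit value from memory and never starts or waits for a conversion. */
uint16_t adc_read(int channel)
{
    return adc_dma_buf[channel];
}
```

Because neither context touches the ADC registers any more, the mixer interrupt can preempt a voltage read, or the graph calculation on the mixer page, without leaving a conversion half finished.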

Since I think I've fixed the bug, I've put the mixer calculation back in the interrupt, since it gives much better responsiveness.

The new code absolutely needs testing. Included are dfu files for each tx (no need for a new filesystem). I'd like feedback on whether it is possible to trigger a reboot with this firmware. But of course, be careful, since it is brand new.

EDIT: removed files. Newer ones are here:
www.deviationtx.com/forum/3-feedback-que...liable?start=40#3786
Last edit: 08 Dec 2012 14:29 by PhracturedBlue.
