Star Sonata
http://forum.starsonata.com/

Weekly Dev Meeting - 6th April
http://forum.starsonata.com/viewtopic.php?f=9&t=61739
Page 1 of 2

Author:  Star Sonata Bot [ Wed Apr 06, 2016 5:51 pm ]
Post subject:  Weekly Dev Meeting - 6th April

Discussion topic for post: http://www.starsonata.com/blog/weekly-d ... 6th-april/

Author:  ELITE [ Wed Apr 06, 2016 5:55 pm ]
Post subject:  Re: Weekly Dev Meeting - 6th April

Quote:
We’ve picked a date for our universe reset and you’re getting lots of forewarning on it! The chosen date for the reset is the 7th May. A proper announcement will come nearer the time.


ayyy

Author:  redalert150 [ Wed Apr 06, 2016 6:02 pm ]
Post subject:  Re: Weekly Dev Meeting - 6th April

Quote:
No Patch :(

Author:  Chrono Warrior [ Wed Apr 06, 2016 6:03 pm ]
Post subject:  Re: Weekly Dev Meeting - 6th April

redalert150 wrote:
Quote:
No Patch :(


- Chrono

Author:  ELITE [ Wed Apr 06, 2016 6:05 pm ]
Post subject:  Re: Weekly Dev Meeting - 6th April

Chrono Warrior wrote:
redalert150 wrote:
Quote:
No Patch :(


Author:  ShawnMcCall [ Wed Apr 06, 2016 9:53 pm ]
Post subject:  Re: Weekly Dev Meeting - 6th April

Chrono Warrior wrote:
redalert150 wrote:
Quote:
No Patch :(


- Chrono


- Shawn

Author:  sabre198 [ Thu Apr 07, 2016 12:36 am ]
Post subject:  Re: Weekly Dev Meeting - 6th April

Drone looks sexy

Author:  Godsteel [ Thu Apr 07, 2016 1:56 am ]
Post subject:  Re: Weekly Dev Meeting - 6th April

What about the default camera angle? IIRC someone was going to raise this topic at this dev meeting.

Author:  Auxilium [ Thu Apr 07, 2016 2:48 am ]
Post subject:  Re: Weekly Dev Meeting - 6th April

Pretty sure Jey said he's planning to patch on Friday evening / Saturday morning and make a blog post about it before then, but don't quote me on that.

Author:  andsimo [ Thu Apr 07, 2016 3:43 am ]
Post subject:  Re: Weekly Dev Meeting - 6th April

Out of curiosity, what is a memory corruption bug, and what did the one Jey found look like?

I have always thought of it as something like this: a variable is given a certain memory size, and the program violates that size?

For example, a boolean is given the string 'true' instead of the constant true, thus assigning more bytes than the variable allows?

Author:  Godsteel [ Thu Apr 07, 2016 4:03 am ]
Post subject:  Re: Weekly Dev Meeting - 6th April

andsimo wrote:
Out of curiosity, what is a memory corruption bug, and what did the one Jey found look like?

I have always thought of it as something like this: a variable is given a certain memory size, and the program violates that size?

For example, a boolean is given the string 'true' instead of the constant true, thus assigning more bytes than the variable allows?

The compiler wouldn't allow that. The easiest example would be to have an array of size 5 and access the 6th or a later item.
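To make the array example concrete, here is a minimal C++ sketch (the function name is made up for illustration). It uses the checked accessor std::array::at(), which throws on a bad index; the plain operator[] does no check, and reading past the end with it is undefined behaviour, which is how memory ends up corrupted.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: reports whether index i lies outside a 5-element array.
bool access_is_out_of_bounds(std::size_t i) {
    std::array<int, 5> items{10, 20, 30, 40, 50};
    assert(items[4] == 50);  // index 4 is the last valid element

    try {
        (void)items.at(i);   // checked access: throws std::out_of_range if i >= 5
        return false;
    } catch (const std::out_of_range&) {
        return true;         // the "6th or later item" case ends up here
    }
}
```

Calling access_is_out_of_bounds(5) asks for the 6th item and takes the exception path, while indices 0 to 4 are accepted.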

Out of curiosity, in a world where most of human knowledge is available to anyone with internet access, why did you choose to learn from the forum of a niche game with about 50 active forum users, rather than searching for the phrase "memory corruption" and checking online encyclopedias?

Author:  sabre198 [ Thu Apr 07, 2016 4:51 am ]
Post subject:  Re: Weekly Dev Meeting - 6th April

cos this forum has Kane

Author:  andsimo [ Thu Apr 07, 2016 5:02 am ]
Post subject:  Re: Weekly Dev Meeting - 6th April

Godsteel wrote:
Out of curiosity, in a world where most of human knowledge is available to anyone with access to internet, ....
How could I then hope to see the memory corruption Jey found?

Author:  Jey123456 [ Thu Apr 07, 2016 6:53 am ]
Post subject:  Re: Weekly Dev Meeting - 6th April

andsimo wrote:
Out of curiosity, what is a memory corruption bug, and what did the one Jey found look like?

I have always thought of it as something like this: a variable is given a certain memory size, and the program violates that size?

For example, a boolean is given the string 'true' instead of the constant true, thus assigning more bytes than the variable allows?


If it had been a simple array going out of bounds, it would have been much, much easier to find and fix XD (you can use a technique generally called memory fencing, where you add known data after each memory allocation and then validate that data to find bounds issues).
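A minimal sketch of that guard-data idea, assuming a 4-byte marker written right after each allocation. The names guarded_alloc, guard_intact and kGuard, and the marker value, are invented for illustration and are not taken from the server code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Arbitrary marker value written after every block (an assumption of this sketch).
constexpr std::uint32_t kGuard = 0xDEADBEEF;

// Allocates `size` usable bytes followed by a guard marker.
void* guarded_alloc(std::size_t size) {
    auto* p = static_cast<unsigned char*>(std::malloc(size + sizeof kGuard));
    if (p == nullptr) return nullptr;
    std::memcpy(p + size, &kGuard, sizeof kGuard);  // place the marker after the block
    return p;
}

// Returns true while the marker after the block is untouched.
// A false result means something wrote past the end of the block.
bool guard_intact(const void* block, std::size_t size) {
    assert(block != nullptr);
    std::uint32_t found = 0;
    std::memcpy(&found, static_cast<const unsigned char*>(block) + size, sizeof found);
    return found == kGuard;
}
```

Validating the marker periodically (or when the block is freed) narrows down when an overrun happened, which is why plain bounds bugs are comparatively easy to track down. Blocks from guarded_alloc are released with std::free as usual.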

This specific case was much more obscure, and I'll be the first to admit I do not understand 100% of what is going on in it. But after lots of digging through memory and the gcc docs, I found that it was an issue with a very, very rare memory race condition, often described as an L2 cache coherency problem. It's a problem that can occur on a multiprocessor system (not multicore, since a multicore processor shares its L2 cache between all cores, but multiple processors, where each has a separate, independent L2 cache).

For a reason that I'm not 100% sure of, but which likely has to do with the inline call combined with some pretty repetitive sequential memory access inside our spaceobjects container (sobmap), the data ended up being kept in L2 cache longer than it should have been, and the memory writes were done in a different order than written in the code. When a write was done on one CPU, even though it was on the same thread, if the thread was moved to another processor (not core, but processor), the sobmap rebalanced, and the thread then moved back to the original processor, the L2 cache was not always invalidated in time, and the thread ended up reading the memory from before the sobmap rebalance.

When I started suspecting the issue, to help with fixing it and gathering data faster, I set up a virtual machine with 16 processors, each having 2 cores (so that I ended up with 16 emulated L2 caches, albeit small ones), and modified the server code to constantly change sobmaps in every galaxy every frame (forcing constant sobmap rebalances). This in turn caused the error to happen within 24 hours of the test launch and gave me a huge amount of memory to comb through to figure out exactly what was happening.

Once I finally put my finger on the pattern responsible (the 4 bytes just before the "current" node in the map were always pointing toward a node that was before the current node, even though that pointer should have been the "next" node), I was able to write a script to compare the bug with the 150 or so old unresolved server crash dumps I have on disk and confirm that the same pattern was found in 91 of those 150 dumps. That was a great sign: it meant that I had finally found a way to reproduce one of the major remaining "odd" crashes, that is, the crashes where the crash data doesn't make any sense or tell you anything about what caused them.

After that, it was merely a matter of adding a bit of logging to track thread context switches, to be sure it was indeed a multiprocessor issue, alongside a validation on the data so that it would crash right away when said memory mismatch happened. Sure enough, it never happened when the thread was not switched to another context, nor did it ever happen when it was switched to another core on the same processor, but it did happen, rarely, on processor changes (fewer than 1 in 10,000 of them).
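The fail-fast validation can be sketched roughly like this, assuming a simplified node with a key and a "next" link where keys must increase along the chain. The real sobmap layout is not shown in this post, so Node, next_link_is_consistent and validate_or_abort are hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstdlib>

// Simplified stand-in for a map node (assumption: keys increase along `next`).
struct Node {
    int key;
    Node* next;
};

// Returns false when `next` points at a node that sorts before the current
// one, i.e. the "next pointer goes backwards" pattern described above.
bool next_link_is_consistent(const Node* n) {
    assert(n != nullptr);
    return n->next == nullptr || n->next->key > n->key;
}

// Crash immediately on a mismatch, so the dump captures the corrupted state
// at the moment it is detected rather than at some later, unrelated point.
void validate_or_abort(const Node* n) {
    if (!next_link_is_consistent(n)) {
        std::abort();
    }
}
```

Crashing at the point of detection is what makes the resulting dump useful: the thread and processor information logged just before it still matches the bad data.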

Once I had confirmed that the issue was what I suspected, I merely added some memory barriers to our sobmap rebalance that enforce a flush of all invalidated L2 cache (and wait for it to complete) before and after a rebalance. This added a few nanoseconds to the sobmap tree rebalance but made it multiprocessor safe at the same time. Another option would have been to mark its contents volatile, but that would have had a much more serious performance impact, since every access would then have to be re-read from memory rather than reusing an already loaded value (which is slow as hell in comparison).
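As a rough sketch of the barrier-around-rebalance idea (not the actual server code): the post does not say which barrier primitive was used, so std::atomic_thread_fence is an assumption here, and SobMapSketch with its sorted vector is a hypothetical stand-in for the real sobmap tree.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the sobmap; a sorted vector replaces the real tree.
struct SobMapSketch {
    std::vector<int> keys;

    void rebalance() {
        // Full fence: memory operations issued before this point are ordered
        // before anything the rebalance does.
        std::atomic_thread_fence(std::memory_order_seq_cst);

        std::sort(keys.begin(), keys.end());  // placeholder for the tree rebalance
        assert(std::is_sorted(keys.begin(), keys.end()));

        // Second full fence: the rebalance's writes are ordered before any
        // memory operations that follow it.
        std::atomic_thread_fence(std::memory_order_seq_cst);
    }
};
```

With GCC, a full barrier can also be written with the __sync_synchronize() builtin. Either way the cost is paid only once per rebalance, rather than on every access as it would be with volatile.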

This is obviously just an overview of the problem and the process to fix it. I skipped a lot of details to try to keep it simple to understand (as simple as that kind of problem can be to understand).

Author:  DarkSteel [ Thu Apr 07, 2016 7:14 am ]
Post subject:  Re: Weekly Dev Meeting - 6th April

Every time Jey posts one of these I'm reminded why he's a god among men

All times are UTC - 5 hours