Crash just standing around

Posted: Sat Aug 08, 2015 8:39 pm
by John Adams
We had 7 players online, testing some group/raid stuff, when I noticed that sometimes I would chat and it would not show up for 20-30 seconds. One time, it repeated what I said about 5 times even though I only said it once. There were no visible signs of any mutex/lock issues during the chat delays, and it wasn't just chat: group invites and /raidinvite were delayed too (I'm positive I'll find another SleepMS() in there somewhere).

There was a lot of combat happening, but it was scrolling along fine. Then, we got this:

Code:

20:28:45.996 E Mutex    Timeout trying to acquire mutex ChunkServerList::remove_chunk_s's write lock lock after 10000 milliseconds
Seemingly out of nowhere, and before I could even paste it into chat for Faux to see, the world ended. But because I was click happy, I missed the stack - but it was right above this:

Code:

    //Remove old chunk servers
    if (!remove_chunk_servers.empty()){
        m_remove_chunk_servers.WriteLock();
        m_chunk_servers.WriteLock();
        for (auto& itr2 : remove_chunk_servers)
            chunk_servers.erase(itr2->GetChunkID());
        m_chunk_servers.WriteUnlock();
        remove_chunk_servers.clear();
        m_remove_chunk_servers.WriteUnlock();
    }
    if (window_title_update->Check())
        UpdateWindowTitleSecond(Size());
Execution was paused on the if (window_title_update->Check()) line, but clearly the crash was on the line above, in the WriteUnlock. These crashes are very rare, and I cannot seem to easily reproduce them. It's a pretty old bug; hopefully I can convince someone to review this code to see if it can be improved.

Re: Crash just standing around

Posted: Mon Aug 10, 2015 7:49 am
by John Adams
This has now happened two more times. I'll need someone to look into it ASAP.

Seems to only happen when I walk away from the console for the night. Sorry to those who had no access to NT since 6pm last night.

Re: Crash just standing around

Posted: Mon Aug 10, 2015 9:13 am
by zippyzee
Do you believe it is actually caused by the chunkserver removal code or something that happened around that time? I have no real clue as to what is going on that would be a problem in that section.

Can we add in some debug messages inside that code block to let us know what is hanging, if that is the case?
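
For illustration, debug messages of that kind could look roughly like the sketch below. It is only a sketch under assumptions: LoggedLock and SecondThreadTimesOut are made-up names, and std::timed_mutex stands in for the project's own Mutex class, whose real interface is not shown in this thread. The idea is to log before and after each acquisition so that a hang can be pinned on one specific lock.

```cpp
#include <chrono>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical helper (not part of the project): tries to take `m`,
// logging before and after, so a stall shows up next to the lock's name.
// std::timed_mutex is an assumed stand-in for the project's Mutex class.
bool LoggedLock(std::timed_mutex& m, const std::string& name,
                std::chrono::milliseconds timeout) {
    std::clog << "acquiring lock: " << name << '\n';
    const auto start = std::chrono::steady_clock::now();
    const bool ok = m.try_lock_for(timeout);
    const auto waited = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start);
    std::clog << (ok ? "acquired " : "TIMED OUT on ") << name << " after "
              << waited.count() << " ms\n";
    return ok;  // caller must unlock only when this is true
}

// Demonstrates the timeout path: the calling thread holds the lock while a
// second thread tries to take it and gives up after `timeout`.
bool SecondThreadTimesOut(std::chrono::milliseconds timeout) {
    std::timed_mutex m;
    m.lock();
    bool acquired = true;
    std::thread t([&] {
        acquired = LoggedLock(m, "demo", timeout);
        if (acquired) m.unlock();
    });
    t.join();
    m.unlock();
    return !acquired;
}
```

Wrapping each of the two lock calls in the removal block this way would show in the log which of the two locks is the one that never gets acquired.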

Re: Crash just standing around

Posted: Mon Aug 10, 2015 11:27 am
by John Adams
If I were to take a wild guess (and usually it's completely wrong, hah), maybe someone is taking a lock in the chunkserver code but never releasing it, so when it shuts down it just crashes after a timeout. We can certainly add logging; I just don't have time right now.
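
If that guess is right, one common way to rule out a forgotten unlock is a scope-bound guard that releases the lock on every exit path. The sketch below is only an illustration: DemoMutex and WriteGuard are hypothetical names, and only the WriteLock()/WriteUnlock() method names are taken from the code posted above. Whether the project already has something like this is not known from this thread.

```cpp
#include <mutex>
#include <stdexcept>

// Stand-in for the project's Mutex class. Only the WriteLock()/WriteUnlock()
// names come from the thread; the hold counter exists for this demo only.
class DemoMutex {
public:
    void WriteLock()   { m_.lock(); ++held_; }
    void WriteUnlock() { --held_; m_.unlock(); }
    int  Held() const  { return held_; }  // single-threaded demo use only
private:
    std::mutex m_;
    int held_ = 0;
};

// Hypothetical guard: locks in the constructor, unlocks in the destructor,
// so an early return or exception cannot leave the lock held.
template <typename M>
class WriteGuard {
public:
    explicit WriteGuard(M& m) : m_(m) { m_.WriteLock(); }
    ~WriteGuard() { m_.WriteUnlock(); }
    WriteGuard(const WriteGuard&) = delete;
    WriteGuard& operator=(const WriteGuard&) = delete;
private:
    M& m_;
};

// Returns the hold count after a scope that exits early by throwing.
int HeldAfterEarlyExit() {
    DemoMutex m;
    try {
        WriteGuard<DemoMutex> guard(m);
        throw std::runtime_error("early exit");
    } catch (const std::runtime_error&) {
        // guard's destructor has already released the lock here
    }
    return m.Held();
}
```

With manual WriteLock()/WriteUnlock() pairs, the same early exit would leave the lock held, and the next writer would sit there until the timeout.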

Re: Crash just standing around

Posted: Mon Aug 10, 2015 12:00 pm
by zippyzee
Don't worry; when I get back to messing with things I'll look into it. If someone beats me to it, then great!

Re: Crash just standing around

Posted: Mon Aug 10, 2015 4:43 pm
by Blackstorm
ChunkInfoList::GetLoadedChunkIDs()
-> m_InfoList.ReadUnlock();