[Solved] Reproducable CTD
Re: Reproducable CTD
Are you still able to reproduce this, Volt?
I'm trying right now. Here is what I do:
Log in Account 1. Wait until it's fully loaded in.
Log in Account 2. Once it is logged in (both accounts are saved to IoD), I force close with the red X.
Wait...
Account 2's Client objects, both world and chunk, destruct about 30 seconds later. The WorldCharacter object stays in game.
About 30 seconds after this, the WorldCharacter object is destroyed, and Account 2's replicated pawn is removed from the game.
Account 1 remains in game.
What am I doing wrong that keeps me from reproducing this?
Re: Reproducable CTD
Hm.
I used account 1 = a halfling in Temple of Dailuk and account 2 = a high elf in Leth Nurae. I brute-force quit the client for account 2, and the client for account 1 crashed out milliseconds before the WS console showed "Character HighElf Set to Offline". I.e. issue reproduced.
But...
Then I decided to mimic just what I think you did, Lokked, just to verify a second time. I made two lesser giants on IoD and brute-force quit the client for account 2. Issue NOT reproduced. I logged them back in, moved them around the world, did about 4 more tests. Issue NOT reproduced.
Finally, I logged my halfling and my high elf back in. Issue reproduced.
I am running out of time for now, but this is intriguing. I can't see the pattern in why it is reproducible with everything else I tried except the two lesser giants I created on IoD.
Tested on my local installation. Rev 850 with a new 850 database. Yes, all characters are freshly made, 1 yesterday and the other 3 today.
"Gaze in amazement adventurer"
Re: Reproducable CTD
Xinux and I were playing with this last night. He got it to happen once, but I'm not sure what he did different that time.
I did try in 2 different chunks, however, I only tried it after logging both chars into IoD and then rifting to a chunk and force closing.
Did you look at your VGClient's log file after CTDing? I hope to be able to reproduce this so I can look. I wonder if there are any clues about an unopened channel being used. Here is what I've found, though it may not be exactly related to this issue, because I'm only able to make this happen when force quitting and logging back in:
- Client A and B are logged in, within view of each other.
- Client B force quits and logs in again, immediately.
- Client B cannot see Client A, but can see chat and emotes from them. Client A CAN see Client B.
This is because, somehow, Client B's list of which PCs have been replicated to it has not been cleared. The server thinks Client B has already received the IsOpen packet for Client A, so it doesn't send the packet to open the unreal channel, and thus Client B cannot replicate Client A. What's fascinating (or lucky) is that Client B doesn't crash. Normally, when VGClient receives a non-Open packet on a closed channel, it crashes, as we've seen in VGClient logs before. I think what's happening is that an NPC/PPO is opening that particular channel in Client B, and that channel is being sent Client A's replication information (which probably doesn't parse correctly anyway).
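To make the suspected mechanism concrete, here is a minimal sketch of per-client replication bookkeeping. Everything in it (`ReplicationTracker`, `packets_for`, `on_disconnect`, the packet labels) is a hypothetical name for illustration, not the emulator's actual code. It only shows why a list that survives a disconnect would suppress the open packet after a quick relog.

```python
class ReplicationTracker:
    """Hypothetical bookkeeping: which PCs already have an open channel per client."""

    def __init__(self):
        # client_id -> set of pc_ids whose channel the server believes is open
        self._opened = {}

    def packets_for(self, client_id, pc_id):
        """Return the packet kinds the server would send for this PC."""
        opened = self._opened.setdefault(client_id, set())
        if pc_id not in opened:
            opened.add(pc_id)
            # First time for this client: the channel must be opened first.
            return ["open", "update"]
        # Server believes the channel is already open on the client.
        return ["update"]

    def on_disconnect(self, client_id):
        """Forget everything replicated to a client that has gone away."""
        self._opened.pop(client_id, None)


tracker = ReplicationTracker()
first = tracker.packets_for("B", "A")       # open + update
stale = tracker.packets_for("B", "A")       # update only
tracker.on_disconnect("B")                  # the step that seems to be skipped
after_relog = tracker.packets_for("B", "A") # open + update again
```

If `on_disconnect` is never called on a force quit, the relogged Client B only ever gets the update packets, which would match what is described above.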
More Sleuthing is required.
- John Adams
- Retired
- Posts: 4582
- Joined: Wed Aug 28, 2013 9:40 am
- Location: Phoenix, AZ.
- Contact:
Re: Reproducable CTD
Good lord, are we "crossing streams"? (Ghostbusters theme...)
I remember Collector getting confused if 2 clients were on the same box, and I thought that was completely impossible due to how it hooks the window by handle... and you would think, 2 different processes use 2 different handles, no? Apparently not to Vanguard. Is this the same kind of thing, where 2 clients from the same IP are somehow "merging"? I thought this was impossible, and why God invented ports and connections.
If our comms are this screwed up, it's a miracle anything is working at all. Awesome sleuthing.
[img]http://mkgmediagroup.com/wp-content/upl ... anclap.gif[/img]
Re: Reproducable CTD
This was hard but here is how it goes:
The server continuously checks for "heartbeats" on both the world and the chunk connection for a VGClient. The chunk connection plays pretty nice, regularly sending packets that can be interpreted as heartbeats. The world connection, on the other hand, may send these relatively seldom. I have seen examples where it took 31 seconds for the world connection to get a packet it interpreted as a heartbeat. If the server has waited 40 seconds for a heartbeat, it will close the connection.
So, we have a function where heartbeats are "propagated" from the chunk connections to the world connection. This is smart. Problem is, it had a bug and did not properly identify which chunk connections belonged to which world connections, unless all accounts were in the same chunk.
The effect was that the first logged-in account would often get heartbeats propagated from many or all other accounts. This means 1) the first logged-in account could get a 40-second-old heartbeat from another account that had just exited (and thus CTD, as in this bug report), and 2) the other accounts would often not get any heartbeats propagated from their chunk clients to their world clients (presumably putting them at risk of being disconnected prematurely).
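As a rough illustration of that timeout rule, here is a small sketch. The 40-second figure is from the description above; the `Connection` class, `on_packet` and `close_stale` are made-up names for illustration and not the server's real code.

```python
HEARTBEAT_TIMEOUT_SECONDS = 40.0  # figure quoted in the post


class Connection:
    """Hypothetical stand-in for a world or chunk connection."""

    def __init__(self, account_id, kind, now):
        self.account_id = account_id
        self.kind = kind              # "world" or "chunk"
        self.last_heartbeat = now     # time of the last heartbeat-like packet
        self.open = True

    def on_packet(self, now):
        # Any packet interpreted as a heartbeat refreshes the timestamp.
        self.last_heartbeat = now


def close_stale(connections, now, timeout=HEARTBEAT_TIMEOUT_SECONDS):
    """Close every open connection that has waited `timeout` seconds or more."""
    closed = []
    for conn in connections:
        if conn.open and now - conn.last_heartbeat >= timeout:
            conn.open = False
            closed.append(conn)
    return closed
```

With a world connection that only sees a heartbeat every 31 seconds or so, the margin before the 40-second cutoff is thin, which is why propagating heartbeats from the chunk connection matters.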
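A minimal sketch of what correct propagation could look like, assuming each connection carries an account id to match on. The dict layout and the `propagate_heartbeats` name are hypothetical and only illustrate the matching; this is not the code committed to the server.

```python
def propagate_heartbeats(world_conns, chunk_conns):
    """Copy the newest chunk heartbeat onto the world connection of the SAME account."""
    # Newest chunk heartbeat seen per account.
    newest = {}
    for chunk in chunk_conns:
        account = chunk["account_id"]
        ts = chunk["last_heartbeat"]
        if account not in newest or ts > newest[account]:
            newest[account] = ts

    for world in world_conns:
        ts = newest.get(world["account_id"])
        # Only ever move a world heartbeat forward, and only from its own account.
        if ts is not None and ts > world["last_heartbeat"]:
            world["last_heartbeat"] = ts


worlds = [
    {"account_id": 1, "last_heartbeat": 10.0},
    {"account_id": 2, "last_heartbeat": 5.0},
]
chunks = [
    {"account_id": 2, "last_heartbeat": 12.0},  # account 2, in another chunk
    {"account_id": 1, "last_heartbeat": 50.0},
]
propagate_heartbeats(worlds, chunks)
```

The point of keying on the account is that account 1's world connection never picks up a timestamp from account 2's chunk connection, no matter which chunks the two accounts are in.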
Committed as Rev 878.
Lokked, about your finding about unreal channels (two posts up), I suggest you bring that to a new thread. Sounds like there is more bug smashing to do.
Re: [Solved] Reproducable CTD
Nice work Volt.
Re: [Solved] Reproducable CTD
Nice work, Volt! This is really exciting! Thank you so much for investing the time and figuring this out! The solution evaded me and was driving me bonkers. I think this solution of yours will boost morale across the board.
I will create a new thread for the Unreal Channel issue.
Re: [Solved] Reproducable CTD
What a nice holiday gift. Excellent work, as always.