World Crash - Reload NPCs
Re: World Crash - Reload NPCs
I know I've been meaning to redo the entire reloading process, but you know how it goes.
I'd done it this way to minimize reload times. With the current spawns, it takes upwards of 30 seconds, and during this time the whole Chat Server is blocked. 30 seconds doesn't seem like long, but... it's long :p I figured splitting into Chunks would lessen the reload times.
However, in doing it this way, I'd clusterfucked the reload of the _placement and _entry data structs, so they needed to be in a separate .reload if I was to do this. It is the _entry and _placement structs that take so long to reload (there are 180k of each). It makes no sense now. It needs a slight redo (nothing major, as I'll detail below).
Here's what I'm going to do, and we'll see how it works. If it doesn't work, we'll revert to the original plan:
Plan Now: remove the subsets of spawns and have only .reload spawns [all]. What will be reloaded by chunk or by 'all' is the _placement and _entry records. This is because I don't believe you can ixnay the chunk information from the _placement and _entry tables (if you plan to, let me know how), and it still might be valuable to load 3,000 records instead of 180,000 if all that's needed is the current chunk's spawns. The entire spawn, spawn_appearance, spawn_attachments, objects (will we ever rename this to ppo?), sound, music, mover and vehicle tables will be reloaded regardless of whether 'all' is specified. Only the spawn_location_placements and spawn_location_entry tables will be picked out.
Plan Original: Just have everything reloaded each time. It's a matter of deleting like 10 lines to make this happen.
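For illustration only, the chunk-or-all choice in "Plan Now" could come down to an optional filter on the placement query. This is a rough sketch, not the server's actual loader: the function name `BuildPlacementQuery` and the `chunk_id` column are assumptions, and only the table name comes from the post above.

```cpp
#include <optional>
#include <string>

// Hypothetical helper: builds the placement query for either one chunk
// or the whole server. The chunk_id column name is an assumption.
std::string BuildPlacementQuery(std::optional<int> chunk_id) {
    std::string sql = "SELECT * FROM spawn_location_placements";
    if (chunk_id) {
        // Reload only the current chunk's placements.
        sql += " WHERE chunk_id = " + std::to_string(*chunk_id);
    }
    // With no chunk given, this is the '.reload spawns all' case.
    return sql;
}
```

Passing `std::nullopt` would stand in for '.reload spawns all', while passing the current chunk's id would stand in for the plain '.reload spawns' case.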
Re: World Crash - Reload NPCs
I will make a second post here about a related crash, however, which will be resolved as I refactor the above.
The way our Mutex has been set up for Npc objects (or rather the NpcList object) is that Mutexing is done internally on most functions (when I Add an NPC to the npc_list, the mutex locking/unlocking is done inside the Add function). This means that we cannot Mutex Lock an entire process that shouldn't be interrupted, so the thread may switch at the wrong time, between two of those calls. When this crash happens, it has always been, for me, in ChunkServer::CheckSpawnSendsAndRemoves() in the SendSpawnIteration macro.
Re: World Crash - Reload NPCs
Committed Rev 852. I was able to queue up enough .repop commands that I was running around IoD for 6-7 minutes with no crash, and did this multiple times. Previously, although the crash was random, queueing up about 50 .repop commands would be enough to see the crash.
Note that this blocks thread access to the npc_list HARD, but during normal gameplay, the .repop operation wouldn't be used excessively, so this is acceptable.
Log Notes:
UPDATE ON UNSTABLE REV 843:
The issue identified on REV 843 (Crash on .repop) should be resolved.
I've grown the Mutex around NPCList (var npc_list) to include the option to manually Lock and Unlock the Mutex, as opposed to using each function's individual Lock and Unlock.
This is so groups of operations can be protected via the Mutex, rather than just individual functions, should the need arise.
To do this, when calling functions on npc_list, add a true bool as the last argument (i.e. npc_list.Add(Npc object, true)). Calling a function this way skips its internal locking, so the operation is NOT protected by the Mutex on its own: you must lock the mutex prior to calling in this fashion (and unlock once complete).
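As a rough sketch of the pattern described in the log notes (not the actual Rev 852 code), here is what a list with both internal and manual locking might look like. The `Npc` struct, the `Clear` and `Size` functions, the `caller_holds_lock` parameter name and the `Repop` helper are all assumptions made for the example; only `Add` with a trailing bool and the manual Lock/Unlock come from the notes.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the real Npc type.
struct Npc {
    int id;
};

// Sketch of a list whose functions lock internally by default, but can be
// told that the caller already holds the lock.
class NpcList {
public:
    void Lock() { mutex_.lock(); }
    void Unlock() { mutex_.unlock(); }

    // caller_holds_lock == true skips the internal lock, so the caller
    // must have called Lock() beforehand.
    void Add(const Npc& npc, bool caller_holds_lock = false) {
        if (caller_holds_lock) {
            npcs_.push_back(npc);
            return;
        }
        std::lock_guard<std::mutex> guard(mutex_);
        npcs_.push_back(npc);
    }

    void Clear(bool caller_holds_lock = false) {
        if (caller_holds_lock) {
            npcs_.clear();
            return;
        }
        std::lock_guard<std::mutex> guard(mutex_);
        npcs_.clear();
    }

    std::size_t Size(bool caller_holds_lock = false) {
        if (caller_holds_lock) {
            return npcs_.size();
        }
        std::lock_guard<std::mutex> guard(mutex_);
        return npcs_.size();
    }

private:
    std::mutex mutex_;
    std::vector<Npc> npcs_;
};

// A repop-style group of operations protected by one lock, so no other
// thread can see the list half-cleared or half-filled.
std::size_t Repop(NpcList& list, const std::vector<Npc>& fresh) {
    list.Lock();
    list.Clear(true);
    for (const Npc& npc : fresh) {
        list.Add(npc, true);
    }
    const std::size_t count = list.Size(true);
    list.Unlock();
    return count;
}
```

In this sketch a plain `std::mutex` is not recursive, so calling `Add(npc)` without the trailing `true` while the lock is held would deadlock. Real code would probably also want an RAII guard around the Lock/Unlock pair so an exception cannot leave the list locked.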
Re: World Crash - Reload NPCs
[quote="Lokked"]Committed Rev 852. I was able to queue up enough .repop commands that I was running around IoD for 6-7 minutes with no crash, and did this multiple times. Previously, although the crash was random, queueing up about 50 .repop commands would be enough to see the crash.[/quote]
What about .reload commands? I was running .reload npcs then would do a .repop npcs which is where the crash is happening.
Re: World Crash - Reload NPCs
From John's post of the crash log:
WorldServer.exe!memcmp(const void * lhs, const void * rhs, unsigned int siz) Line 255 C
WorldServer.exe!CommandProcess::CommandReloadNPCs(std::shared_ptr<Client> & client, Separator * sep, unsigned char command_index, bool world_client) Line 1270 C++
It highlights that there was still an inappropriate function being used to check whether or not you'd typed in 'all' after '.reload npcs'. I've addressed this. At some point, the method to parse the command string (.reload npcs, or .reload npcs all) was changed to address some Compiler Warnings, and the chosen method did not work. We'd changed it to something more robust, but overlooked the .reload set of functions. Sorry about that!
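The post doesn't show the offending check or its replacement, so the following is only a guess at the general shape of a safer test. A raw memcmp against "all" can read past the end of a shorter (or missing) argument, whereas a length-aware comparison cannot. The helper name `WantsAll` and the convention that a null pointer means "no argument" are assumptions for this sketch.

```cpp
#include <string_view>

// Hypothetical helper: returns true only when the optional argument after
// '.reload spawns' is exactly "all". A null pointer means no argument.
bool WantsAll(const char* arg) {
    if (arg == nullptr) {
        return false;  // nothing typed after the sub-command
    }
    // string_view compares lengths first, so a short or empty argument
    // is never read past its terminating null.
    return std::string_view(arg) == "all";
}
```

The comparison is case-sensitive as written, so whether the real command should also accept 'ALL' is left open here.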
NOTE that the .reload command now only has 'spawns' for spawn related reloads. I've removed the 'npcs', 'ppos' and 'entries' as they were confusing. Initially, I'd separated out what was reloaded because it takes so long, but now I've refined the .reload process. It now properly only reloads data related to the chunk you are in, and this takes about 10 seconds as opposed to around 45 seconds for the whole server. If you still want to reload the whole server, the command is .reload spawns all.
Re: World Crash - Reload NPCs
[quote="Lokked"]From John's post of the crash log:
WorldServer.exe!memcmp(const void * lhs, const void * rhs, unsigned int siz) Line 255 C
WorldServer.exe!CommandProcess::CommandReloadNPCs(std::shared_ptr<Client> & client, Separator * sep, unsigned char command_index, bool world_client) Line 1270 C++
It highlights that there was still an inappropriate function being used to check whether or not you'd typed in 'all' after '.reload npcs'. I've addressed this. At some point, the method to parse the command string (.reload npcs, or .reload npcs all) was changed to address some Compiler Warnings, and the chosen method did not work. We'd changed it to something more robust, but overlooked the .reload set of functions. Sorry about that!
NOTE that the .reload command now only has 'spawns' for spawn related reloads. I've removed the 'npcs', 'ppos' and 'entries' as they were confusing. Initially, I'd separated out what was reloaded because it takes so long, but now I've refined the .reload process. It now properly only reloads data related to the chunk you are in, and this takes about 10 seconds as opposed to around 45 seconds for the whole server. If you still want to reload the whole server, the command is .reload spawns all.[/quote]
Thanks for the clarification
Re: World Crash - Reload NPCs
[quote="Lokked"]I'd done it this way to minimize reload times. With the current spawns, it takes upwards of 30 seconds, and during this time the whole Chat Server is blocked. 30 seconds doesn't seem like long, but... it's long :p I figured splitting into Chunks would lessen the reload times.[/quote]
This is likely for another topic, but (imo) there is no reason loading 180k records from MySQL should be that slow, especially if they are not ORDER BY or GROUP BY in any way. If they are, we probably need to drop that and just load the data. I'd like to set some benchmarks around the SQL calls (LogTrace(LOG_DATABASE....), which is normally noop), and watch how fast it loads 180k placements. My guess from doing it in SQL GUI: 1.020s
So if the C++ code is adding the extra 29s to that, something is horribly wrong, wouldn't you agree? Which brings me to my point: that this is likely the crux of the matter - and has been for some time (I am sure I've bellyached about spawn Load times before - even asked Scat to look at it, but he has no time).
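One way to get the kind of benchmark mentioned above, sketched with only the standard library: time the query and the struct-building as two separate phases, so it is clear which one the seconds go to. `TimeMs`, `LoadTimings` and `TimeLoad` are made-up names for this example, and the two callables stand in for whatever the real loader does. The server's own LogTrace call is deliberately not used here, since its signature isn't shown.

```cpp
#include <chrono>
#include <utility>

// Runs fn() once and returns the elapsed wall-clock time in milliseconds.
template <typename Fn>
double TimeMs(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical result type: one timing per phase of a spawn load.
struct LoadTimings {
    double query_ms;  // time spent waiting on the database
    double build_ms;  // time spent looping over rows and building structs
};

// Times the two phases separately. Both callables are placeholders for
// the real query and struct-building code.
template <typename QueryFn, typename BuildFn>
LoadTimings TimeLoad(QueryFn&& run_query, BuildFn&& build_structs) {
    LoadTimings timings{};
    timings.query_ms = TimeMs(run_query);
    timings.build_ms = TimeMs(build_structs);
    return timings;
}
```

Logging both numbers for the 180k placements would show whether the time is in MySQL or in the C++ loop that follows it.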
Thoughts? Separate topic?
Re: World Crash - Reload NPCs
I would not agree with a portion of your statement, John. Yes, the SQL does not take long. What takes long is looping through the returned query and constructing the objects. That's a ton of data structs being built in memory. Jabantiz would know what to compare against with his work on EQ2EMU. I'd had a brief conversation with him and from it determined we were doing pretty good, although it's certainly possible he didn't have all the facts.
Are we talking about the same load times? The load time to start up a chunk is different from the times I'm talking about. Two things happen, overall, regarding spawns:
- At World Server startup and during the .reload spawns command, the DB is queried and the Spawn Structs are built. These structs are just a collection of the DB records, for the most part. This process takes approx. 45 seconds for the server's worth of structs to be built (Npcs, PPOs, et al). This is only done a single time, unless of course .reload spawns is run.
- When a chunk is loaded or when the .repop command is run, the chunk's Npc, PPO, 3DS, Mover and Vehicle objects are built out of the Spawn Struct information. This takes all the Spawn Struct information and creates the objects, then spawns them in-game. This process takes less than 1 second. You can see how long this takes by running .repop while watching your server console. As soon as you see the green text denoting how many spawns were created in the chunk, it's done. In-game, it takes a half-second or so longer, as the CheckandSendSpawn() function (I may have mislabeled this) is run approx. every half second.
From memory, and I hope Jabantiz can confirm or correct this, EQ2EMU loads ~35,000 spawns in around 20 seconds and we load 180,000 spawns in 45 seconds.
Re: World Crash - Reload NPCs
Yeah, we're talking about the same thing; world startup which doesn't really concern me all that much, except that it doesn't feel "right" to be so slow. I think Jabantiz is smoking crack, since he never uses my `eq2raw` database which has over 52,000 parent spawn records, and well over 500,000 placements. Far as I remember, that world starts just as quickly as a nearly empty one. EQ2's slow load isn't spawns; it's items, and spells. Spawns, considering how many tables are being linked and loaded, are pretty amazingly fast.
I do not assume you have written anything incorrectly. I'm first assuming we have a bad design from the ground up between the database functions and loaders. None of this code is a copy/paste from EQ2Emu (except the Rules system I stole) so it's very likely something is just not as efficient as it should be. That's the basis of my claim -- and you can tell me I'm wrong; I'll believe you. I'm just a user, who sees things differently between two like systems.
FYI: right now, EQ2Emu has only 2,800 parents and 15,000 placements in the Dev DB, so that is definitely speedy.