Page 1 of 2

The "MySQL has gone away" mystery

Posted: Tue Jun 02, 2015 5:22 pm
by John Adams
I spent a little time this afternoon researching (again) why the MySQL server is refusing connections to New Telon after a certain period of time. I've determined there's a couple of things going on on my end, one of which I have no control but the others (high-availability, load balancing) I can spend some time trying to offer. It's good knowledge, regardless.

The info to follow is specific to the MySQL has gone away issue we see on New Telon occasionally. From this article, I can see a few scenarios why this is happening and thought I would post some thoughts on it for anyone else who experiences this non-descriptive problem.

First, the VGOEmulator problem(s) -
[quote="MySQL dot com"]The problem on Windows is that in some cases MySQL does not get an error from the OS when writing to the TCP/IP connection to the server, but instead gets the error when trying to read the answer from the connection. [/quote]
This was originally an issue long ago, and theFoof solved it by adding the mysql_ping() function to the world code.

In common/Database.cpp:

Code: Select all

void Database::PingDatabase() {    mysql_ping(&mysql);} 
and in world/Net.cpp:

Code: Select all

//Time in MS to ping the database (300000 = 5 minutes)#define MYSQL_PING_INTERVAL 300000..bool Net::Process() {    PacketStruct *packet_struct;    ClientPacketData *data;    if (mysql_ping_timer->Check())        database.PingDatabase(); 
We added this ping, every 5 minutes, to keep the MySQL connection open even if there is no activity. We did not add this to LoginServer, because Login is constantly doing maintenance every minute or so, thus keeping it's MySQL connection alive.

I mention this because I noticed something specific on the Windows 7 machine (New Telon) that I do not see on the LoginServer/VGODev Server machine (Linux) and that is Windows seems to be the only one who occasionally gives this error:
[quote]16:27:57.303 D Char Loading 12 characters for account_id: 72
16:27:57.303 E Database Error running MySQL query (2006): MySQL server has gone away
SELECT `character_id`,`firstname`,`lastname`,`race`,`adventure_class`,`gender`,`adventure_level`,`current_chunk_fk`,
`hidehood`,`hidehelm`,`hidecloak`
FROM `characters`
JOIN `character_details` ON `character_id`=`char_id_fk` WHERE `account_id`=72
AND `is_deleted`=0
LIMIT 12

16:27:57.303 D Char Loaded 0 characters for account_id: 72[/quote]
Once Windows 7 gets this disconnect one time, regardless of our PingDatabase() call every 5 mins, the world never reconnects to MySQL. This is bad, and should probably be looked into (World catching this error and doing something more robust than simply trying to ping again)


The second thing I'm sure is happening (and this is environment specific):
[quote="MySQL dot com"]It is also possible to see this error if host name lookups fail (for example, if the DNS server on which your server or network relies goes down). This is because MySQL is dependent on the host system for name resolution, but has no way of knowing whether it is working—from MySQL's point of view the problem is indistinguishable from any other network timeout. [/quote]
New Telon runs on a VM on Windows 7; The MySQL database server is another stand-alone VM on another ESX host, thus a separate IP; Lastly, the web services are also their own dedicated VM, with a 3rd IP. All use an internal DNS, on yet a 4th VM and IP. In a perfect world, separation does not matter.

In my world, one piece goes down, the entire house of cards falls. Especially when it's my webserver and/or DNS. Either machine get borked, every VM on that cluster node tends to say "meh, get back to me when you get your shit together" and crashes.

The reason the DNS fails is that the VM is on the same ESX host with the Web Server, and for the same reasons that take out web services, IRC and the game machines for days (until someone tells me) this causes DNS requests to stop working. So the innocent New Telon -and- MySQL machine on the 2nd cluster node do not have an issue, but can no longer do DNS lookups to the ESX cluster that is running rampant with a crashing web server.


Action Items:
  • VGO Devs should investigate a more robust MySQL Error trap and connection recovery in the code[/*:m]
  • I will pursue two things: splitting up web/db/game services even further, and investigate HA/LB configurations for my services (should have them regardless) for Apache/IIS and MySQL services[/*:m][/list:u]


    I cannot stop "bad people" from attacking my web server, for whatever reason they feel they need to do that. I thought I had preventive measures in place, but apparently it's not enough (and I have yet to catch what it is). Hopefully once I offer HA/LB, it might be moot and the script kiddies can go bother someone elses free fun time.

Re: The "MySQL has gone away" mystery

Posted: Sun Jun 21, 2015 9:27 pm
by Scatman
MySQL disabled auto-reconnect recently. Maybe enabling this flag would help?

https://dev.mysql.com/doc/refman/5.7/en ... nnect.html

Edit:
Oh, and MariaDB is the next big thing..juussst sayin

Re: The "MySQL has gone away" mystery

Posted: Mon Jun 22, 2015 7:21 am
by John Adams
I do love Maria <3

Re: The "MySQL has gone away" mystery

Posted: Mon Jun 22, 2015 7:44 am
by Kandra
o/~When I'm with her I'm confused, out of focus and bemused. And I never know exactly where I am. o/~

Re: The "MySQL has gone away" mystery

Posted: Sun Aug 30, 2015 12:15 pm
by John Adams
First time *ever* in my history of working with emulators, I just had a MySQL has gone away on my localhost (everything on the local machine) just out of the blue. I'm actually starting to blame Windows or Network over our code, since this is so very random. If it was code, it would be consistent, I would think.

Log:
[quote]12:05:11.268 D Char Saved successful for character 'Bob' (1)
12:05:11.933 D Combat Processing 11 Damage
12:05:13.474 D Chunk In ChunkServer::CheckSpawnRespawns: Calling master spawn list to respawn spawn.
12:05:13.559 D Combat Processing 2 Damage
12:05:15.976 D Combat Processing 8 Damage
12:05:16.561 D Combat Processing 4 Damage
12:05:19.563 D Combat Processing 6 Damage
12:05:20.020 D Combat Processing 7 Damage
12:05:22.565 D Combat Processing 7 Damage
12:05:24.060 D Combat Processing 3 Damage
12:05:25.567 D Combat Processing 1 Damage

-- This seemed to happen before the world even reported there was a SQL error
12:05:28.000 D Char ToggleIsOnline: Setting player offline
12:05:28.000 D Combat Processing 3 Damage
12:05:28.000 D Char ToggleIsOnline: Setting player offline
12:05:28.002 I UDP Client from 127.0.0.1:50124 set to disconnect : Timeout



-- Suspicious that I got kicked so abruptly, THEN a SQL update tried to toggle me offline, and the server was gone. Locking problem?
12:05:28.003 E Database Error running MySQL query (2013): Lost connection to MySQL server during query
UPDATE `characters` SET is_online = 0 WHERE character_id = 1
12:05:28.003 E Database Error running MySQL query (2006): MySQL server has gone away
SELECT COUNT(DISTINCT account_id) FROM characters WHERE is_deleted = 0
12:05:28.003 I UDP Client from 127.0.0.1:50123 set to disconnect : Timeout


-- The rest is complete failure
12:05:28.004 E Stats Error counting player accounts in WorldDatabase::WriteServerStatisticsCounts : 6213
12:05:28.004 E Database Error running MySQL query (2006): MySQL server has gone away
SELECT COUNT(character_id) FROM characters WHERE is_deleted = 0
12:05:28.004 E Stats Error counting player characters in WorldDatabase::WriteServerStatisticsCounts : 6229
12:05:28.004 E Database Error running MySQL query (2006): MySQL server has gone away
SELECT ROUND(AVG(adventure_level)) FROM characters WHERE is_deleted = 0
12:05:28.005 E Stats Error averaging character levels in WorldDatabase::WriteServerStatisticsCounts : 6245
12:05:28.005 E Database Error running MySQL query (2006): MySQL server has gone away
SELECT ROUND(AVG(crafting_level)) FROM characters WHERE is_deleted = 0
12:05:28.005 E Stats Error averaging character levels in WorldDatabase::WriteServerStatisticsCounts : 6261
12:05:28.006 E Database Error running MySQL query (2006): MySQL server has gone away
SELECT ROUND(AVG(diplomacy_level)) FROM characters WHERE is_deleted = 0
12:05:28.006 E Stats Error averaging character levels in WorldDatabase::WriteServerStatisticsCounts : 6277
12:05:28.045 D Chunk In ChunkServer::CheckSpawnRespawns: Calling master spawn list to respawn spawn.
12:05:28.053 D Combat Processing 16 Damage
12:05:28.054 W Combat ProcessAttack() Chunk_client not found for character = Bob


-- Not sure what these New client's are about, I was literally at my desktop after being booted from the world
12:05:28.068 I UDP New client connected from 127.0.0.1:50124
12:05:28.069 I UDP Client from 127.0.0.1:50124 set to disconnect : Application
12:05:28.070 I UDP New client connected from 127.0.0.1:50124
12:05:28.070 I UDP Client from 127.0.0.1:50124 set to disconnect : Application
12:05:28.071 I UDP New client connected from 127.0.0.1:50124
12:05:28.071 I UDP Client from 127.0.0.1:50124 set to disconnect : Application

-- what the? why so many?
12:05:31.000 I UDP Client from 127.0.0.1:50123 has been removed.
12:05:31.000 I UDP Client from 127.0.0.1:50124 has been removed.
12:05:31.001 I UDP Client from 127.0.0.1:50124 has been removed.
12:05:31.002 I UDP Client from 127.0.0.1:50124 has been removed.
12:05:31.002 I UDP Client from 127.0.0.1:50124 has been removed.



12:05:31.005 I Chunk Starting chunk shutdown timer for Khal.
12:05:31.006 D Char ToggleIsOnline: Setting player offline
12:05:31.006 E Database Error running MySQL query (2006): MySQL server has gone away
UPDATE `characters` SET is_online = 0 WHERE character_id = 1
12:05:31.006 E Database Error running MySQL query (2006): MySQL server has gone away
UPDATE `characters`
SET `race`=14, `gender`=0, `current_x`=-327.000000, `current_y`=-54708.000000, `current_z`=1253.000000, `current_pitch`=0, `current_yaw`=25248, `current_roll`=0, `adventure_subclass`=0, `adventure_level`=1, `adventure_xp`=0,`crafting_level`=1, `crafting_xp`=0, `diplomacy_level`=1, `diplomacy_xp`=0, `current_chunk_fk`=85, date_lastseen=UNIX_TIMESTAMP(), is_online = 0,is_anon = 0, is_gm = 0, is_dev = 0
WHERE character_id = 1
12:05:31.007 E Char SaveCharacter: Failed to save character at d:\dev\export\vgoemu\trunk\src\world\worlddatabase.cpp:2345. Removing Client.
12:05:31.007 D Guilds Bob is not in a guild.
12:05:32.280 D Combat Processing 12 Damage
12:05:32.281 W Combat ProcessAttack() Chunk_client not found for character = Bob
12:05:36.010 E Database Error running MySQL query (2006): MySQL server has gone away
SELECT COUNT(`config_name`) FROM `variables` WHERE `config_name`='world_gametime'


-- So, who was taking damage if I was gone?
12:05:36.492 D Combat Processing 8 Damage
12:05:36.493 W Combat ProcessAttack() Chunk_client not found for character = Bob
12:05:40.700 D Combat Processing 4 Damage
12:05:40.700 W Combat ProcessAttack() Chunk_client not found for character = Bob
12:05:44.910 D Combat Processing 17 Damage
12:05:44.911 W Combat ProcessAttack() Chunk_client not found for character = Bob
12:05:49.115 D Combat Processing 17 Damage
12:05:49.116 W Combat ProcessAttack() Chunk_client not found for character = Bob
12:05:53.320 D Combat Processing 13 Damage
12:05:53.322 W Combat ProcessAttack() Chunk_client not found for character = Bob
12:05:57.522 D Combat Processing 9 Damage
12:05:57.523 W Combat ProcessAttack() Chunk_client not found for character = Bob
12:06:01.722 D Combat Processing 5 Damage
12:06:01.724 W Combat ProcessAttack() Chunk_client not found for character = Bob
12:06:05.929 D Combat Processing 1 Damage
12:06:05.930 W Combat ProcessAttack() Chunk_client not found for character = Bob
12:06:10.135 D Combat Processing 17 Damage
12:06:10.136 W Combat ProcessAttack() Chunk_client not found for character = Bob[/quote]
The interesting news here is watching the client get bounced - it literally CLOSED my client, which I think is a result of recent "kick that client if something's wrong" efforts. However, notice the ProcessAttack() does not give up Something we'll handle when we get to tuning Combat.

Re: The "MySQL has gone away" mystery

Posted: Thu Sep 24, 2015 4:26 pm
by Xinux
Happened today.

Code: Select all

08:36:49.576 D Char     Saved successful for character 'Kerfuffle' (930)
08:36:49.597 D Char     Saved successful for character 'Kerfuffle' (930)
08:36:49.619 D Char     Saved successful for character 'Kerfuffle' (930)
08:36:49.627 E UDP      Read timed out after 2000 milliseconds! Disconnecting this client to be safe.
08:36:49.628 D Char     ToggleIsOnline: Setting player offline
08:36:49.782 E UDP      Read timed out after 2000 milliseconds! Disconnecting this client to be safe.
08:36:49.783 D Char     ToggleIsOnline: Setting player offline
08:36:59.657 E Mutex    Timeout trying to acquire mutex Database::m_Query's write lock lock after 10000 milliseconds
08:36:59.657 E Database Error running MySQL query (2006): MySQL server has gone away
SELECT COUNT(DISTINCT account_id) FROM characters WHERE is_deleted = 0
08:36:59.658 E Stats    Error counting player accounts in WorldDatabase::WriteServerStatisticsCounts : 6286
08:36:59.658 E Database Error running MySQL query (2006): MySQL server has gone away
UPDATE `characters` SET is_online = 0 WHERE character_id = 930
08:36:59.660 E Database Error running MySQL query (2006): MySQL server has gone away
SELECT COUNT(character_id) FROM characters WHERE is_deleted = 0
08:36:59.660 E Stats    Error counting player characters in WorldDatabase::WriteServerStatisticsCounts : 6302
08:36:59.660 E Database Error running MySQL query (2006): MySQL server has gone away
select c.friend_id,c.character_id,a.character_id,(a.firstname) name,a.adventure_level,a.is_online,adventure_class,a.crafting_level,a.crafting_class, b.coord_x,b.coord_y,a.date_lastseen,a.is_anon from characters a , chunks b ,character_friends c where a.is_online =1 and b.chunk_id=a.current_chunk_fk and c.friend_id =930 and a.character_id !=930
08:36:59.662 E Database Error running MySQL query (2006): MySQL server has gone away
SELECT ROUND(AVG(adventure_level)) FROM characters WHERE is_deleted = 0
08:36:59.663 E Stats    Error averaging character levels in WorldDatabase::WriteServerStatisticsCounts : 6318
08:36:59.663 E Database Error running MySQL query (2006): MySQL server has gone away
UPDATE `characters` 
SET `race`=7, `gender`=0, `current_x`=-37559.000000, `current_y`=-3548.000000, `current_z`=295.000000, `current_pitch`=0, `current_yaw`=31606, `current_roll`=0, `adventure_subclass`=0, `adventure_level`=1, `adventure_xp`=10,`crafting_level`=1, `crafting_xp`=0, `diplomacy_level`=1, `diplomacy_xp`=0, `current_chunk_fk`=1, date_lastseen=UNIX_TIMESTAMP(), is_online = 0,is_anon = 0, is_gm = 0, is_dev = 0  
WHERE character_id = 930
08:36:59.664 E Database Error running MySQL query (2006): MySQL server has gone away
SELECT ROUND(AVG(crafting_level)) FROM characters WHERE is_deleted = 0
08:36:59.664 E Char     SaveCharacter: Failed to save character at c:\vgoemu\src\world\worlddatabase.cpp:2385. Removing Client.
08:36:59.665 E Stats    Error averaging character levels in WorldDatabase::WriteServerStatisticsCounts : 6334
08:36:59.666 E Database Error running MySQL query (2006): MySQL server has gone away
SELECT ROUND(AVG(diplomacy_level)) FROM characters WHERE is_deleted = 0
08:36:59.667 E Stats    Error averaging character levels in WorldDatabase::WriteServerStatisticsCounts : 6350
08:36:59.680 D Char     ToggleIsOnline: Setting player offline
08:36:59.681 E Database Error running MySQL query (2006): MySQL server has gone away

Re: The "MySQL has gone away" mystery

Posted: Sun Sep 27, 2015 11:12 pm
by Nilso
Hi,
Dunno if it is in context but I had "MySQL server has gone away" when filling (creating the World) the database and having not set innodb parameters and not split insert statements in "chunks" to not overload MySQL.
I mean it may be a commit size too big (?).

Regards.

BTW : Thanks for the actual tremendous work done !

Re: The "MySQL has gone away" mystery

Posted: Thu Oct 01, 2015 5:28 pm
by John Adams
I just happened to be sitting here posting something on VGOEmu when the post Preview seemed to lock up, just hung there. I tried logging into the webserver box real quick to see what MySQL was doing, but couldn't even connect to the machine. I then tried launching vSphere client to see what condition the VM was in, and couldn't even get into that! In my last desperate attempt, I simply tried to ping http://www.google.com, and got nothing.

Meanwhile, New Telon and all my other emulators started throwing MySQL has gone away! errors. Tada. Finally, after all this time, it happened right before my eyes and now I can narrow down what the problem likely is; I have either a bad switch or the NAS is freaking out (which hosts all the virtual machine drives) and literally everything in my empire comes to a complete halt once the AD/DNS server vanishes due to network timeouts, and the rippling effect is that every machine using the database loses DNS and doesn't recover (the lock up lasts over 5 minutes, though I have no idea why yet).

So! I can stop blaming MySQL and go back to blaming my internal network and see if rebuilding a few VMs on a different NAS or local SSD fixes the issue. I am definitely going to move game servers databases to their own machines for a while, too. See if maybe we're just ass-pounding my poor webserver too much with 50+ connections.

Re: The "MySQL has gone away" mystery

Posted: Thu Oct 01, 2015 6:11 pm
by Glaxono
Sounds like a faulty switch or a faulty HBA on your DNS server. And an easily tested with a back-up switch and/or HBA Is your AD/DNS a guest on one of your hosts?

Re: The "MySQL has gone away" mystery

Posted: Fri Oct 02, 2015 7:33 am
by John Adams
The DNS servers vmdk is indeed resident on the NAS, so when it was hiccuping, everything halted. After writing this post (3 hrs later) I discovered Disk 1 in the NAS threw a fault and needed to be swapped. The RAID is rebuilding now, though I still believe there's a problem with intermittency in my home network. At least now there's a clue as to why this is happening.

The simplest thing for us (VGO) to do is fix the reconnect issue with MySQL - as all the other emulators I run reconnect fine - VGO is the only one who, once the connection is lost, it's gone forever.