Feedback Tracker

Mass desync resulting in Server FPS 0, sometimes server crash
New, Wishlist, Public

Description

The same issue that people have been complaining about for some time is still occurring on my Altis Life server, which I develop for and moderate. It occurs mainly with high player counts. Sometimes the server recovers. Other times it completely desyncs (people running in place, vehicles flying about, etc.) and needs a restart before it is playable again and people can get in. I've used the #monitor command, and traffic doesn't seem to be the issue; the server FPS just drops to 0. Sometimes it recovers after a couple of minutes, sometimes not.
This is what shows up in the RPT:
Server: Network message xxxxxx is pending

We tried adding guaranteedUpdates false to our basic.cfg, but it didn't fix the issue.

Details

Legacy ID
1764493516
Severity
None
Resolution
Open
Reproducibility
Always
Category
Multiplayer
Steps To Reproduce

Get a high player count, then watch desync kill the server population.

Edit: the player count doesn't even have to be that high before the server starts self-destructing with desync and pending network messages.

Additional Information

Event Timeline

KBW edited Steps To Reproduce. Sep 18 2014, 4:00 AM
KBW edited Additional Information.
KBW set Category to Multiplayer.
KBW set Reproducibility to Always.
KBW set Severity to None.
KBW set Resolution to Open.
KBW set Legacy ID to 1764493516. May 7 2016, 7:27 PM
KBW added a subscriber: KBW.
KBW added a comment. Sep 19 2014, 12:29 AM

New RPT: http://pastebin.com/4jMmWQKX
At this point the server completely locked up and wouldn't allow anyone to join due to the insane desync/pending network requests.

Are you using a lot of SYNC calls with extDB?

KBW added a comment. Sep 20 2014, 3:03 AM

No more than other servers, I would imagine, and others are not really complaining about the issue.

Regarding sync requests, I would think an overflow of those would delay later database requests, not the Arma 3 server and its functionality.

It will lag the server, since the server has to wait for each request to return before starting the next one. It makes sense that the problem occurs with a high player count.
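The reasoning above can be made concrete with a simplified model. It is not a measurement. It assumes every SYNC call blocks the server's main loop until its query returns, so the durations of all blocking calls in a frame add up. The base frame time, the per-query time, and the ratio of queries to players are made-up numbers, and `server_fps` is a hypothetical helper.

```python
def server_fps(base_frame_ms: float, sync_query_ms: list) -> float:
    """Simplified model: each blocking (SYNC) call runs to completion on the
    main thread, one after another, so every query's duration is added to
    the frame time before the next frame can start."""
    frame_ms = base_frame_ms + sum(sync_query_ms)
    return 1000.0 / frame_ms


# Hypothetical numbers: a 20 ms base frame (50 FPS) and 15 ms per query.
for players in (10, 50, 100):
    # Assumption: one blocking query per 10 players lands in the same frame.
    queries = [15.0] * (players // 10)
    print(players, round(server_fps(20.0, queries), 1))
# 10 28.6
# 50 10.5
# 100 5.9
```

Under these assumptions, frame time grows linearly with the number of blocking calls per frame. That is why serial SYNC calls would hurt most at high player counts.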

Bohemia added a subscriber: Bohemia. May 7 2016, 7:27 PM

Altis Life uses the ASYNC function for extDB.
I really don't know anyone who uses SYNC calls in extDB.

Basically, the query is put into a worker queue immediately, and the client is given a unique ID.
The server uses the unique ID to check whether the result is ready.
This way, minimal time is spent in a blocking engine call.
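Here is a minimal Python sketch of the ASYNC pattern described above. It assumes a single worker thread and an in-memory result map. It is not extDB's code: extDB is a native extension called through callExtension. `AsyncWorker`, `submit`, `poll`, and `slow_query` are hypothetical names used only for illustration.

```python
import itertools
import queue
import threading
import time
from typing import Callable, Dict, Optional


class AsyncWorker:
    """Sketch of the ASYNC pattern: submit() queues a query and returns a
    unique ID at once; a worker thread runs it; poll() checks for the result
    without blocking."""

    def __init__(self, run_query: Callable[[str], str]) -> None:
        self._run_query = run_query  # hypothetical stand-in for the database call
        self._ids = itertools.count(1)
        self._jobs: "queue.Queue" = queue.Queue()
        self._results: Dict[int, str] = {}
        self._lock = threading.Lock()
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, sql: str) -> int:
        unique_id = next(self._ids)
        self._jobs.put((unique_id, sql))  # returns immediately, no database wait
        return unique_id

    def poll(self, unique_id: int) -> Optional[str]:
        # Non-blocking map lookup; None means "not ready yet".
        # A ready result is handed over once and then removed from the map.
        with self._lock:
            return self._results.pop(unique_id, None)

    def _worker(self) -> None:
        while True:
            unique_id, sql = self._jobs.get()
            result = self._run_query(sql)  # slow work, off the caller's thread
            with self._lock:
                self._results[unique_id] = result


def slow_query(sql: str) -> str:
    time.sleep(0.2)  # pretend the database takes 200 ms
    return f"result of {sql}"


worker = AsyncWorker(slow_query)
qid = worker.submit("SELECT 1")
while (res := worker.poll(qid)) is None:
    time.sleep(0.05)  # caller backs off between polls
print(res)  # result of SELECT 1
```

The caller only pays for a queue insert and a map lookup. The slow work happens on the worker thread, which is the point being made about avoiding a blocking engine call.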

Also, Altis Life has 2 global locks, so only 1 database query is executed at a time.
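The effect of a global lock can be sketched as follows. The comment above mentions 2 locks in Altis Life. This sketch instead uses a single Python `threading.Lock` and a hypothetical `run_query_serialized` function. Its only purpose is to show that, however many callers arrive at once, only one query is in flight.

```python
import threading
import time

# One global lock shared by every caller (hypothetical stand-in for the
# Altis Life locks); the counters only measure concurrency.
db_lock = threading.Lock()
counter_lock = threading.Lock()
in_flight = 0
max_in_flight = 0


def run_query_serialized(sql: str, results: list) -> None:
    global in_flight, max_in_flight
    with db_lock:  # later callers wait here until the current query finishes
        with counter_lock:
            in_flight += 1
            max_in_flight = max(max_in_flight, in_flight)
        time.sleep(0.01)  # stand-in for the database round trip
        results.append(sql)
        with counter_lock:
            in_flight -= 1


results: list = []
threads = [
    threading.Thread(target=run_query_serialized, args=(f"q{i}", results))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results), max_in_flight)  # 8 1
```

The trade-off is throughput. Callers queue up behind the lock, so one slow query delays every later one.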

There are some long query complete times at the end of your RPT log, but that's probably due to your server FPS taking a hit.

Personally, I would look into your basic.cfg and revisit any code that spams the network.
You could also just wait for the next stable Arma patch, which is supposed to fix MP sync issues; that might be related.

@Torndeco_ Well, maybe the problem is not how your extension handles calls, but how your extension is called. Take one look at this:

https://github.com/TAWTonic/Altis-Life/blob/master/extDB-Build/life_server/Functions/MySQL/fn_asyncCall.sqf

which I suppose is the calling script, and I can tell you straight away that you cannot have waitUntil and sleep in the script and also expect it to return a result. You either call it or spawn it. In the worst case, the while-true loop in this script will ignore the sleep delay and spam your extension with calls. That would be enough to choke the server.

Damn, my session timed out...

OK, simple version:

Altis Life calls extDB and sends it an SQL query.
extDB returns a unique ID to fetch the result later,
so the engine is no longer blocked.

The SQL query is run in another thread.

Altis Life sleeps for a random time before it calls extDB.
Altis Life uses the unique ID to fetch the result.

If the returned string == "[3]", the SQL query is still busy.
This is why there is a sleep: to stop spamming the extension.
This sleep works and has been tested.

Otherwise, Altis Life keeps calling the extension with the unique ID until it receives an empty string.
So basically, you keep calling the extension and appending the results together until you receive an empty string (meaning there is nothing more to fetch).
Altis Life then compiles the string into the result.
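The fetch loop described above can be sketched in Python as follows. This is an assumed shape, not a port of fn_asyncCall.sqf. `fetch_result`, `FakeExtension`, the `GET:<id>` request string, and the sleep bounds are all hypothetical. Only the "[3]" busy marker and the empty-string terminator come from the description.

```python
import random
import time
from typing import Callable, List

BUSY = "[3]"  # per the description: the query is still running


def fetch_result(call_extension: Callable[[str], str], unique_id: int,
                 min_sleep: float = 0.05, max_sleep: float = 0.2) -> str:
    """Poll for an async result: back off while busy, then concatenate
    chunks until an empty string signals there is nothing left."""
    chunks: List[str] = []
    time.sleep(random.uniform(min_sleep, max_sleep))  # initial random wait
    while True:
        reply = call_extension(f"GET:{unique_id}")  # hypothetical request format
        if reply == BUSY:
            # Still busy: sleep instead of spamming the extension.
            time.sleep(random.uniform(min_sleep, max_sleep))
            continue
        if reply == "":
            break  # no more to fetch
        chunks.append(reply)
    return "".join(chunks)


class FakeExtension:
    """Hypothetical stand-in for the extension: busy for a few polls, then
    returns the result in fixed-size chunks, then an empty string."""

    def __init__(self, result: str, busy_polls: int = 2, chunk: int = 4) -> None:
        self._pending = [result[i:i + chunk] for i in range(0, len(result), chunk)]
        self._busy = busy_polls
        self.calls = 0

    def __call__(self, request: str) -> str:
        self.calls += 1
        if self._busy > 0:
            self._busy -= 1
            return BUSY
        return self._pending.pop(0) if self._pending else ""


ext = FakeExtension('["example","result"]')
out = fetch_result(ext, 42, min_sleep=0.0, max_sleep=0.01)
print(out)  # ["example","result"]
```

The final compile step (turning the joined string into an SQF array) is left out. The sketch just returns the joined string.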

---------

This code has been relatively unchanged since 1 July, when Altis Life first started using extDB.
Most changes to extDB have been logging changes and other protocols that Altis Life doesn't use.

I've seen 1 other report of a similar issue with extreme FPS drops. That user reports it happens at random intervals, both with just 1 player (while trying to debug the issue) and with high player counts.
http://www.altisliferpg.com/topic/6270-troubleshooting-server-fps/#entry41051

---------

If this were an issue with spamming the extension, it would be a much more widespread problem.
Plus, I would have seen some evidence of it in all the trace DEBUG extDB logs, which have a timestamped entry for each time Arma calls the extension.
That's why there is a sleep in the while loop, and it works.

Also, his message-pending issues appear well before his SQL complete times go high in his RPT log entries. That points towards the server FPS drop causing the high complete times.

But if the author wants to be sure:
grab the debug version of extDB, edit extdb-conf.ini, and change logging to trace.

Note: there is a bug with trace results (they show the input string instead of the output).
But it will still show you the timestamps for all of Arma's callExtension calls.

Worst-case scenario, Arma calls extDB 3 times a second.

Each call checks an unordered_map to see whether the unique_id (an integer) is ready.
Due to the locking in fn_async for Altis Life RPG, the map normally holds only 0 or 1 entries.
It's not exactly an intensive task.

You're welcome to prove me wrong, but I don't see the problem being Arma spamming the extension or long blocking times in the extension.

----------

Otherwise, there would have been a lot more reports of this issue in the last 2 months.
Most likely it's due to a bug in Arma, the basic.cfg, or some bad SQF code.

Bad SQF code means something like spawning hundreds of fn_async instances until all the waitUntil calls in the code make server FPS drop like a stone.
I've only heard of that from someone trying to stress-test extDB.

Or it could be an Arma issue, especially given the recent MP sync issues, players using up bandwidth for no reason, etc.

"Welcome to prove me wrong, but i don't see the problem been arma is spamming the extension or long blocking times by the extension."

This was a suggestion. If the script is called from an event handler (which I assumed initially), waitUntil or sleep would simply not force the script to wait or sleep, hence the spamming. But I don't know if this is the case. It could well all be running in a scheduled environment from the start. I cannot answer this without looking deeper into the code. Let's hope you are right :)