Feedback Tracker

Server gets stuck since BeClient.dll fix, hitting the 2 GB memory limit rapidly
Assigned, Wishlist, Public

Description

The issue started appearing after Friday's BeClient.dll fix:
https://twitter.com/FoltynD/status/507894699420831745

The first recorded event was on 05/09/2014 (Friday, 5 September) at 22:30.

  1. The server hits the upper memory limit (2,104,688 KB) fairly quickly. With 60 players connected, it takes 4-5 hours for the issue to appear.
  2. The server becomes stale. It does not crash, but keeps consuming 2104 MB of RAM and 12% of CPU (every time).
  3. The server sometimes spawns a ghost process that has to be shut down manually via Task Manager.
  4. Players cannot connect to the server. However, the server still reports the player count from the time of the crash to the game browser and, via API, to services such as Gametracker.
  5. EPM Rcon still shows the player numbers, IPs, GUIDs and pings from the time of the crash, but no Rcon command gets executed.
  6. The server stays in this state until it is restarted or otherwise intervened with manually. It does not crash on its own and generally creates no crash dumps.

It happened a few times on Friday (5th) [2 servers, once per server], more often on Saturday (6th) [4 servers, once each] and frequently on Sunday (7th) [5 servers, 2.5 times on average each, until the players got sick of it].

Side note: before 1.24, the servers did not need to be restarted within 24-48 hours to stay performant. They never hit the 2 GB limit (36 hours) or the perf 3 GB limit (48 hours). The same held true for 1.26 whenever it was not crashing the server because of other, unrelated issues.

Details

Legacy ID
602079472
Severity
None
Resolution
Open
Reproducibility
Always
Category
Server
Steps To Reproduce
  1. Start the server.
  2. Populate it with 60 players.
  3. Wait 4-5 hours.
  4. Wait for complaints from players asking why they cannot join the server even though it shows available slots.
  5. The server is stuck.
Additional Information

The servers have the following lines in their RPT files:

2014/09/08, 0:22:09 NetServer::finishDestroyPlayer(1768956691): DESTROY immediately after CREATE, both cancelled
2014/09/08, 0:23:43 NetServer::finishDestroyPlayer(1823127005): DESTROY immediately after CREATE, both cancelled
2014/09/08, 0:26:44 NetServer::finishDestroyPlayer(2052970268): DESTROY immediately after CREATE, both cancelled
2014/09/08, 0:27:32 NetServer::finishDestroyPlayer(100935339): DESTROY immediately after CREATE, both cancelled

2014/09/08, 0:32:10 NetServer::finishDestroyPlayer(1034569513): DESTROY immediately after CREATE, both cancelled


Exception code: C0000005 ACCESS_VIOLATION at 77051606
graphics: No
resolution: 160x120x32
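To see how often these cancelled joins occur over a long uptime, the `finishDestroyPlayer` lines can be pulled out of an RPT file and timestamped. The following is a minimal Python sketch that assumes every relevant line follows exactly the format in the excerpt above; the regex, the function name `parse_cancelled_joins` and the sample data are hypothetical and not part of any Arma or BattlEye tooling.

```python
import re
from datetime import datetime

# Hypothetical pattern, written only from the RPT excerpt in this report.
RPT_CANCELLED_JOIN = re.compile(
    r"^(?P<date>\d{4}/\d{2}/\d{2}), +(?P<time>\d{1,2}:\d{2}:\d{2}) "
    r"NetServer::finishDestroyPlayer\((?P<player_id>\d+)\): "
    r"DESTROY immediately after CREATE, both cancelled$"
)


def parse_cancelled_joins(lines):
    """Return (timestamp, player_id) tuples for each matching RPT line."""
    events = []
    for line in lines:
        match = RPT_CANCELLED_JOIN.match(line.strip())
        if match is None:
            continue  # skip blank lines and unrelated RPT output
        # %H accepts the single-digit hour ("0:22:09") used in the RPT
        stamp = datetime.strptime(
            f"{match['date']} {match['time']}", "%Y/%m/%d %H:%M:%S"
        )
        events.append((stamp, int(match["player_id"])))
    return events


if __name__ == "__main__":
    sample = [
        "2014/09/08, 0:22:09 NetServer::finishDestroyPlayer(1768956691): "
        "DESTROY immediately after CREATE, both cancelled",
        "",
        "2014/09/08, 0:32:10 NetServer::finishDestroyPlayer(1034569513): "
        "DESTROY immediately after CREATE, both cancelled",
        "Exception code: C0000005 ACCESS_VIOLATION at 77051606",
    ]
    for stamp, player_id in parse_cancelled_joins(sample):
        print(stamp.isoformat(), player_id)
```

Run against a full RPT, the timestamps could be lined up against the 4-5 hour window before a server becomes stuck; nothing in this report establishes whether the two are related.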

Event Timeline

ECID edited Steps To Reproduce. Sep 8 2014, 1:21 AM
ECID edited Additional Information.
ECID set Category to Server.
ECID set Reproducibility to Always.
ECID set Severity to None.
ECID set Resolution to Open.
ECID set Legacy ID to 602079472. May 7 2016, 7:23 PM
ECID edited a custom field.
ECID added a subscriber: ECID. May 7 2016, 7:23 PM
ECID added a comment. Sep 8 2014, 1:44 AM

I managed to create a manual dump of a stuck arma3server.exe:

Task Manager -> Details -> arma3server.exe -> Create Dump File.
It is 90 MB as a RAR archive (1.7 GB unpacked) and is available at this download link:
https://www.dropbox.com/s/dciovms1qu5vwph/arma3server.manual.Dump.rar?dl=0

When I then tried to analyze the "wait chain", arma3server.exe crashed and created a crash dump.
It is available here:
https://www.dropbox.com/s/6zuvik7y91dsixk/server_4_Crash_-08_09_2014_infinite.zip?dl=0

Inch added a subscriber: Inch. May 7 2016, 7:23 PM
Inch added a comment. Sep 8 2014, 2:05 PM

Hi there,

May I ask a few questions?
What mission is it that you're running?
How many objects does the mission have?

Other than the server-side crashes you're getting, those memory figures are normal for Altis. I have several servers that reach that working set quite easily.

Just for verification, could you run a file integrity check? It seems odd that you've started crashing server-side after a client-side update.
Since the A3 server does not use the client.dll, something else must be at fault here.
Could you also re-download the server.dll from http://www.battleye.com/
and place it in the working config DIR/BattlEye.

Iceman added a comment. Sep 8 2014, 2:46 PM

Hey,
Could you please try to reproduce it with BattlEye disabled? We don't think it depends on it, but we'd like to be sure.

Thank you very much!

ECID added a comment. Sep 8 2014, 4:20 PM

@Iceman

Running one server without BE now. Will report back with news.

ECID added a comment. Sep 8 2014, 9:50 PM

The server with BattlEye disabled has now become stuck in the same manner.

As predicted, disabling BattlEye had no bearing on the servers crashing.

ECID added a comment. Sep 11 2014, 2:36 AM

Still happening. We have had 8 occurrences of crashes on 5 different servers (Windows 2012 / Ubuntu Wine) during the last 2 days.

Inch added a comment. Sep 10 2014, 9:09 PM

You're running eight instances on an E3? That is overselling those machines. I'd expect you, or your customers, to have performance issues if they fill their servers at the same time as other instances; you're pushing it with four full instances.

ECID added a comment. Sep 11 2014, 3:22 AM

That is supposed to read "instances of crashes". I should have been more careful with my wording; it is fixed in the comment above.

We are running a maximum of 3 production arma3server instances per physical server, for a grand total of 5 production Arma 3 servers and 2 testing instances (only active when needed).

Even if it were "8 instances on 5 servers", that would be 1.6 Arma instances per machine. But let's not split hairs here.

This is not a "hardware overloaded" issue. It is about arma3server gobbling RAM like candy, 6 times faster than it used to (1.24), and the server then becoming stuck without even crashing, so the only way to see that something is wrong is to join the server and get an infinite loading screen.

ECID added a comment. Oct 2 2014, 5:37 AM

Pretty sure this has been fixed since 1.30. At least it has not appeared since then.