
Linux MP Dedicated randomly segfaults during gameplay
Closed, Resolved (Public)

Description

During gameplay, generally after roughly 3 hours of playtime, the server process crashes with a segfault reported in the OS logs. The Arma server itself reports no fault and the RPT files contain no record of the crash, since the operating system simply terminates the process (SIGSEGV). Nothing about the gameplay at the time, the server load, or any player action seems to contribute to the crashes - they appear purely random, but only seem to occur after roughly 2-3 hours of playtime.

Typical log in /var/log/messages:
Apr 29 14:20:11 arma303 kernel: arma3server[2760]: segfault at 4b97e500 ip 0000000009b6f64f sp 00000000d03fe0b0 error 4 in arma3server[8048000+2693000]
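
For anyone comparing their own logs against this one, here is a minimal Python sketch that pulls the fields out of a kernel segfault line in the format above. The function name `parse_segfault` and the dictionary keys are my own choices for illustration. The error code decoding follows the standard x86 page-fault bits (bit 0 = page present, bit 1 = write access, bit 2 = user mode), and the "module offset" is simply ip minus the mapping base printed in the square brackets.

```python
import re

# Matches the kernel segfault line format shown in the log above.
SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<module>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return the decoded fields of one segfault line, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    err = int(m["err"])
    return {
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        "ip": ip,
        "module": m["module"],
        # Offset of the faulting instruction inside the mapped binary.
        "module_offset": ip - base,
        # x86 page-fault error code bits.
        "page_present": bool(err & 1),
        "write_access": bool(err & 2),
        "user_mode": bool(err & 4),
    }


line = (
    "Apr 29 14:20:11 arma303 kernel: arma3server[2760]: segfault at 4b97e500 "
    "ip 0000000009b6f64f sp 00000000d03fe0b0 error 4 in arma3server[8048000+2693000]"
)
info = parse_segfault(line)
print(hex(info["module_offset"]), info)
```

For the line above this gives a module offset of 0x1b2764f, and "error 4" decodes to a user-mode read of a page that is not mapped.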

This crash is mission and mod independent - I've witnessed these crashes on the following missions:

  • BECTI Warfare
  • Community made "classic" Escape (not the Bohemia one: you start in the Hesco prison)
  • Antistasi
  • Antistasi RHS (modded)

The server(s):
Virtual Machine(s) running on ESX6.5
The underlying physical host is an HP ProLiant DL360 Gen7 rackmount server: 2x 4-core 2.13 GHz CPUs, 124GB RAM.
VM equipped with 2x CPU (2.13 GHz), 8GB RAM, 64GB Disk (LVM, all virtual drives sitting on directly attached physical drive)
OSes tested:

  • CentOS Linux release 7.7.1908 (Core)
  • CentOS Linux 8
  • CentOS Linux 7 (AltArch 32-bit i386)
  • Debian 10 Buster

Packages installed that differ from base minimal installation:

  • ld-linux.so.2 libstdc++.so.6 open-vm-tools python3 sysstat

The arma3server executable is run under standard user privileges with SELinux off (both permissive and disabled were tested), using the following startup line:
/home/arma303/arma3/arma3server -name=arma303server -port=2312 -config=/home/arma303/arma3/server.cfg -mod=@RHSAFRF\;@RHSGREF\;@RHSUSAF >> serverconsolelog.log 2>&1
RCON and server management performed by py3rcon (revolving MOTD messages, automated shutdown and such).
Headless clients appear to make no difference to crash frequency.

Tests performed to isolate the issue:
We've tried simply logging back onto the server after the crash - usually the server will crash at some point within the next 1-3 hours.
Tried rebooting the VM after the crash - no change to crash frequency.
Tried assigning the VM more memory, up to 16GB - no change.
Tried removing all mods (generally 1 or 2 additional at most) that aren't RHS (meaning USAF, AFRF, SAF and GREF) - no change.
Tried removing persistent saves (such as the Antistasi save) and re-starting the campaigns/missions from scratch - no change.
Tried deleting the entire Arma 3 server install folder except server.cfg and reinstalling from scratch via steamcmd - no change.
Tried running the server with no RCON interaction at all - no change.
Tried creating a new server completely from scratch and copying meaningful files over (like server.cfg, saves, etc) and re-running server - no change.
Tried moving virtual disks (VM) between NAS host and directly connected high performance disks - no change.
Manually reseated the RAM in the underlying physical host server.
Ran a full memtest on the underlying physical host server - all 120GB of memory checked for almost 24 hours with 0 errors found. No change.

The only thing I have been able to do to prevent the crashes is to run the mission on a Windows Server (2016 Datacenter) machine using the x64 executable. In fact, the exact same server configuration that was crashing every 3 hours has run stable for more than 24 hours just by moving it to Windows.

GDB's register dump shows several registers holding values at or beyond the upper limit of a signed 32-bit integer (0x80000000 and above).

The following GDB output is from a crash on 29 April 2020 (crash04 is the associated log below).

[root@arma303 arma3]# gdb arma3server core.2746
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/...
Reading symbols from /home/arma303/arma3/arma3server...Missing separate debuginfo for /home/arma303/arma3/arma3server
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/a8/b7592a547af6c97391830d206a6133a579ebee.debug
(no debugging symbols found)...done.
[New LWP 2760]
[New LWP 2747]
[New LWP 2761]
[New LWP 2773]
[New LWP 2762]
[New LWP 2772]
[New LWP 2763]
[New LWP 2764]
[New LWP 2765]
[New LWP 2767]
[New LWP 2771]
[New LWP 2774]
[New LWP 3478]
[New LWP 2751]
[New LWP 2746]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Missing separate debuginfo for /home/arma303/arma3/libsteam_api.so
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/16/bcac040eda9bb32cd8504be30a429c9aa92331.debug
Missing separate debuginfo for /home/arma303/arma3/steamclient.so
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/12/8bd9ee9c294925ea030d60506e3cc16422bcf7.debug
Missing separate debuginfo for /home/arma303/arma3/libsteam.so
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/63/b30c1969f3486fe8711c54d21fdb29ac80bf19.debug
Core was generated by `./arma3server -name=arma303server -port=2312 -config=server.cfg -mod=@rhsafrf;@'.
Program terminated with signal 11, Segmentation fault.
#0 0x09b6f64f in ?? ()
Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.el7.i686 libgcc-4.8.5-39.el7.i686 libstdc++-4.8.5-39.el7.i686
(gdb) bt
Python Exception <class 'gdb.MemoryError'> Cannot access memory at address 0x80000004:
(gdb) info registers
eax 0xcb97e400 -879238144
ecx 0x80000000 -2147483648
edx 0x80000100 -2147483392
ebx 0xa6f89d0 175081936
esp 0xd03fe0b0 0xd03fe0b0
ebp 0x80000000 0x80000000
esi 0x10 16
edi 0xde1f63e0 -568368160
eip 0x9b6f64f 0x9b6f64f
eflags 0x10286 [ PF SF IF RF ]
cs 0x23 35
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x0 0
gs 0x63 99
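
To make the register dump easier to read, here is a small Python sketch that shows why GDB prints several of these registers as negative numbers: any 32-bit value with the top bit set is negative when read as a signed integer. The helper name `as_signed32` is hypothetical. The last few lines record one arithmetic observation only: eax + edx, wrapped to 32 bits, equals the fault address from the kernel log for this crash. Without a disassembly of the instruction at eip I can't say whether the faulting instruction actually combines those two registers, so treat it as a hint, not a diagnosis.

```python
MASK32 = 0xFFFFFFFF


def as_signed32(value):
    """Interpret a 32-bit register value as a two's-complement signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


# Values copied from the `info registers` output above.
registers = {
    "eax": 0xCB97E400,
    "ecx": 0x80000000,
    "edx": 0x80000100,
    "ebp": 0x80000000,
    "edi": 0xDE1F63E0,
}

for name, raw in registers.items():
    print(f"{name}: hex={raw:#010x} unsigned={raw} signed={as_signed32(raw)}")

# Fault address from the kernel log line for the same crash (Apr 29 14:20:11).
fault_addr = 0x4B97E500

# Observation only: the 32-bit wrapped sum of eax and edx matches fault_addr.
wrapped_sum = (registers["eax"] + registers["edx"]) & MASK32
print(hex(wrapped_sum), wrapped_sum == fault_addr)
```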

Details

Severity: Crash
Resolution: Open
Reproducibility: Always
Operating System: Linux x64
Operating System Version: CentOS 7, CentOS 8, Debian 10
Category: Game Crash
Steps To Reproduce
  1. Create a CentOS 7 Linux machine with 2 CPUs and 8GB RAM
  2. Download files via SteamCMD
  3. Start server (I have performed 90% of my test-time with Antistasi 2.2.1)
  4. Play/generate events
  5. A segfault occurs, generally within 2-4 hours (in game it manifests as a yellow chain link, then a red chain link, then "no message received").

I can personally say I've had guaranteed reproducibility with this, but I have not tried it on others' servers or hosted the Arma server on another physical server, so I wouldn't be surprised if this were more difficult for others to reproduce. I did notice that the vast majority of servers in the server list playing Antistasi were being hosted on Windows.

Additional Information

Additional context is provided in my original post: https://a3antistasi.enjin.com/forum/m/38173738/viewthread/33138963-linux-mp-dedi-rhs-randomly-segfaults/post/138741743#p138741743

I wouldn't be surprised if this was simply an issue with the 32-bit executable - are there plans for a 64-bit Arma Linux server executable?

Event Timeline

We've now seen 3 separate confirmed instances of this fault on different physical systems, virtual systems and operating systems - all 32-bit Linux.

https://a3antistasi.enjin.com/forum/m/38173738/viewthread/33138963-linux-mp-dedi-rhs-randomly-segfaults

This comment was removed by BIS_fnc_KK.
dedmen closed this task as Resolved. Edited May 18 2020, 9:42 AM
dedmen claimed this task.
dedmen added a subscriber: dedmen.

Fixed a big memory corruption issue recently, on perf/prof and 1.99. I'd just say that was it.

Not sure I'd call this resolved; it still appears to be happening on all CentOS boxes:

Apr 26 14:42:36 arma303 kernel: arma3server[13193]: segfault at 1ea02000 ip 0000000009b6f64f sp 00000000d03fe2e0 error 4 in arma3server[8048000+2693000]
Apr 26 17:35:48 arma303 kernel: arma3server[2340]: segfault at 4b68c000 ip 0000000009b6f64f sp 00000000d03fe2c0 error 4 in arma3server[8048000+2693000]
Apr 29 14:20:11 arma303 kernel: arma3server[2760]: segfault at 4b97e500 ip 0000000009b6f64f sp 00000000d03fe0b0 error 4 in arma3server[8048000+2693000]
May 21 08:01:11 arma303 kernel: arma3server[20332]: segfault at 1b31f500 ip 0000000009b6f8cf sp 00000000cfffdf60 error 4 in arma3server[8048000+2693000]
May 22 02:42:30 arma303 kernel: arma3server[15201]: segfault at 1e01b100 ip 0000000009b6f8cf sp 00000000cfffdf80 error 4 in arma3server[8048000+2693000]
May 23 02:37:25 arma303 kernel: arma3server[22987]: segfault at 9 ip 000000000a02b515 sp 00000000f6bff2ec error 4 in arma3server[8048000+2693000]
May 24 00:08:57 arma303 kernel: arma3server[18863]: segfault at 1777fd00 ip 0000000009b6f8cf sp 00000000cfffdec0 error 4 in arma3server[8048000+2693000]
May 24 02:40:55 arma303 kernel: arma3server[5352]: segfault at fc72400 ip 0000000009b6f8cf sp 00000000cfffe150 error 4 in arma3server[8048000+2693000]
May 24 03:21:19 arma303 kernel: arma3server[15637]: segfault at 45d26500 ip 0000000009b6f8cf sp 00000000cfffdff0 error 4 in arma3server[8048000+2693000]
May 24 05:21:18 arma303 kernel: arma3server[18429]: segfault at 12f5f900 ip 0000000009b6f8cf sp 00000000cfffdfc0 error 4 in arma3server[8048000+2693000]
May 24 05:44:50 arma303 kernel: arma3server[26490]: segfault at 44456400 ip 0000000009b6f8cf sp 00000000cfffe340 error 4 in arma3server[8048000+2693000]
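
A quick way to see the pattern in the lines above is to group them by instruction pointer. This is only a sketch: the `LOG` string is the list above pasted in verbatim, and the regex looks at nothing but the `ip` field.

```python
import re
from collections import Counter

# The kernel log lines from above, pasted verbatim.
LOG = """\
Apr 26 14:42:36 arma303 kernel: arma3server[13193]: segfault at 1ea02000 ip 0000000009b6f64f sp 00000000d03fe2e0 error 4 in arma3server[8048000+2693000]
Apr 26 17:35:48 arma303 kernel: arma3server[2340]: segfault at 4b68c000 ip 0000000009b6f64f sp 00000000d03fe2c0 error 4 in arma3server[8048000+2693000]
Apr 29 14:20:11 arma303 kernel: arma3server[2760]: segfault at 4b97e500 ip 0000000009b6f64f sp 00000000d03fe0b0 error 4 in arma3server[8048000+2693000]
May 21 08:01:11 arma303 kernel: arma3server[20332]: segfault at 1b31f500 ip 0000000009b6f8cf sp 00000000cfffdf60 error 4 in arma3server[8048000+2693000]
May 22 02:42:30 arma303 kernel: arma3server[15201]: segfault at 1e01b100 ip 0000000009b6f8cf sp 00000000cfffdf80 error 4 in arma3server[8048000+2693000]
May 23 02:37:25 arma303 kernel: arma3server[22987]: segfault at 9 ip 000000000a02b515 sp 00000000f6bff2ec error 4 in arma3server[8048000+2693000]
May 24 00:08:57 arma303 kernel: arma3server[18863]: segfault at 1777fd00 ip 0000000009b6f8cf sp 00000000cfffdec0 error 4 in arma3server[8048000+2693000]
May 24 02:40:55 arma303 kernel: arma3server[5352]: segfault at fc72400 ip 0000000009b6f8cf sp 00000000cfffe150 error 4 in arma3server[8048000+2693000]
May 24 03:21:19 arma303 kernel: arma3server[15637]: segfault at 45d26500 ip 0000000009b6f8cf sp 00000000cfffdff0 error 4 in arma3server[8048000+2693000]
May 24 05:21:18 arma303 kernel: arma3server[18429]: segfault at 12f5f900 ip 0000000009b6f8cf sp 00000000cfffdfc0 error 4 in arma3server[8048000+2693000]
May 24 05:44:50 arma303 kernel: arma3server[26490]: segfault at 44456400 ip 0000000009b6f8cf sp 00000000cfffe340 error 4 in arma3server[8048000+2693000]
"""

# Capture only the instruction pointer of each segfault line.
IP_RE = re.compile(r"segfault at [0-9a-f]+ ip ([0-9a-f]+)")

counts = Counter(int(ip, 16) for ip in IP_RE.findall(LOG))
for ip, n in counts.most_common():
    print(f"{ip:#010x}: {n} crash(es)")
```

Of the 11 crashes listed, 7 are at ip 0x09b6f8cf (all from May 21 onwards), 3 are at 0x09b6f64f (all April), and 1 is at 0x0a02b515. The two common addresses are 0x280 apart; the single outlier is the one that faults at address 9.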

(gdb) bt full
#0 0x0a02b515 in ?? ()
No symbol table info available.
#1 0xf75250ee in clone () from /lib/libc.so.6
No symbol table info available.
(gdb) info registers
eax 0xffffffff -1
ecx 0xf6f6a790 -151607408
edx 0xa02b4d8 167949528
ebx 0x0 0
esp 0xf69fd2ec 0xf69fd2ec
ebp 0xf69fd428 0xf69fd428
esi 0xf6f6a79c -151607396
edi 0x0 0
eip 0xa02b515 0xa02b515
eflags 0x210296 [ PF AF SF IF RF ID ]
cs 0x23 35
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x0 0
gs 0x63 99

Would need a dump, but I can't check Linux dumps currently. Maybe mods or extensions? Maybe the x64 Linux server fixes it.