LD_PRELOADing segfaults Linux dedicated server because of uninitialized pthread_key_t
Closed, Resolved, Public

Description

Hi all,

I wanted to LD_PRELOAD the Intel TBB memory allocator proxy (cf. https://software.intel.com/en-us/node/506097) into arma3server in order to replace 'malloc()' and friends.

I'm working on an Ubuntu 12.04 (amd64) system with lib32gcc1 and libc6:i386 installed.

However, my arma3server process segfaults when started with that LD_PRELOAD.

I dug into this and it seems that the LD_PRELOADing itself causes glibc's 'dlerror.c:init()' to be called fairly early.
That 'dlerror.c:init()' makes the process's first call to 'pthread_key_create()', which returns a key equal to '(pthread_key_t)0', i.e. the first key.

Later on, arma3server calls 'pthread_setspecific()' with the key parameter set to whatever is stored at address 0x9ae86c0.
The problem: address 0x9ae86c0 never gets written to (verified by a watchpoint in gdb). Since it is located in the .bss section, it gets initialized with zero.
Thus, in effect, arma3server sets a thread specific value for a key originally allocated within glibc's dl-framework.

Now, glibc's 'dlerror.c:_dlerror_run()' writes some internal stuff to the address stored at that key '(pthread_key_t)0'.
arma3server seems to expect some call tables to be located at the address stored at key '*(pthread_key_t*)0x9ae86c0 == (pthread_key_t)0'.
Thus, I get a segmentation fault whenever arma3server tries to follow one of these call tables overwritten by the glibc's dl-framework.

In the end, the bug is that the 'pthread_key_t' stored in .bss at 0x9ae86c0 (a static variable?) is never initialized by means of 'pthread_key_create()'.
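
To illustrate the aliasing described above, here is a minimal C++ sketch. The names `g_stale_key`, `clobbers` and `stale_key_clobbers` are hypothetical and are not taken from the arma3server binary. The sketch only assumes what was observed above, namely that the first key handed out by `pthread_key_create()` is 0.

```cpp
#include <pthread.h>

// Static storage duration: lives in .bss and therefore starts out as 0.
// Deliberately never passed to pthread_key_create(), mirroring the bug.
static pthread_key_t g_stale_key;

// Returns true if storing a value through `intruder` replaces the value
// that the legitimate owner stored under `owner`.
bool clobbers(pthread_key_t owner, pthread_key_t intruder) {
    static int owner_data = 1;
    static int intruder_data = 2;
    if (pthread_setspecific(owner, &owner_data) != 0) {
        return false;  // owner key is not usable, nothing to clobber
    }
    // glibc rejects keys that were never allocated with EINVAL; POSIX
    // leaves that case undefined. With an allocated key this succeeds.
    pthread_setspecific(intruder, &intruder_data);
    return pthread_getspecific(owner) != &owner_data;
}

// Checks whether the never-created key shares a slot with `owner`.
bool stale_key_clobbers(pthread_key_t owner) {
    return clobbers(owner, g_stale_key);
}
```

If another component (here: glibc's dlerror machinery) already owns key 0, `stale_key_clobbers()` called with that key returns true, because both parties read and write the same thread-specific slot.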

Details

Legacy ID
3568753664
Severity
None
Resolution
Open
Reproducibility
Always
Category
Dedicated Server
Steps To Reproduce
  1. export LD_LIBRARY_PATH=...
  2. LD_PRELOAD=libtbbmalloc_proxy.so.2 ./arma3server -port=27002 -name=foo -world=empty -noSound

This will result in a segmentation fault.

Additional Information

Where exactly arma3server segfaults depends on the command line, but in all my tests, it DID segfault somewhere.

Event Timeline

nics edited Steps To Reproduce. Apr 23 2014, 1:47 PM
nics edited Additional Information.
nics set Category to Dedicated Server.
nics set Reproducibility to Always.
nics set Severity to None.
nics set Resolution to Open.
nics set Legacy ID to 3568753664. May 7 2016, 6:28 PM
nics edited a custom field.
nics added a comment. Apr 23 2014, 1:49 PM

I suspect that any LD_PRELOADing will segfault arma3server since the real reason seems to be that the dl-framework is entered early.
However, I observed that issue with the Intel TBB allocators.

nics added a comment. Apr 23 2014, 1:55 PM

The server version string seems to be "1.16.113494".

AFAIK, the Linux server doesn't support custom allocators and by default uses only the Linux kernel allocator.

nics added a comment. Apr 23 2014, 2:23 PM

LD_PRELOADing is a (unixish) technique to change a program's behaviour without its explicit support, i.e. without changing its source (see 'man ld.so' on any Linux box).

It would be a very cheap way of actually supporting custom memory allocators without any coding.

As I wrote above, LD_PRELOADing fails because of a bug that, at first glance, is unrelated to LD_PRELOADing (use of uninitialized static variables).

nics added a comment. Apr 24 2014, 1:10 PM

To give you guys some more input:

I managed to fix the bug described above by manually injecting some machine code that performs the missing initialization into the 'arma3server' binary, and LD_PRELOAD works for me now.

My benchmark mission is an MP mission using the @ALIVE mod to automatically spawn some AI. My test system is a remote machine equipped with an Intel Xeon E3 with TurboBoost enabled.

Results (FPS was measured with '#monitor'):

  • Without Intel TBB allocator LD_PRELOADed:
      • The mission, especially @ALIVE, takes incredibly long to load (up to 15 min).
      • FPS is very unstable.
      • FPS with only 35 AI spawned is 3 to 15, most often 3.
  • With Intel TBB allocator LD_PRELOADed:
      • The mission, including @ALIVE, loads fast. Although not measured explicitly, it seems to load faster than with the dedicated server for MS Windows running under Wine.
      • FPS is stable.
      • FPS with 243 AI spawned is 35-45. Even while spawning, the FPS doesn't go below 30. With only 35 AI spawned, it is constantly at 49.

I suspect that the gain in performance is related not so much to the exact memory allocation implementation as to the fact that Intel TBB's 'mallinfo()' implementation is a no-op, i.e. it always returns zero. Cf. the issue I opened at http://feedback.arma3.com/view.php?id=18487

nics added a comment. May 5 2014, 7:44 AM

In case you keep debugging symbols for your released binaries somewhere:
For the 1.18 binary, the uninitialized pthread_key_t variable is located at 0x09b0a5a0.

In case you don't, I'll give you some more details to locate the bug in your sources:
I took a second look at this, and there actually _is_ an initialization function called at program startup. It is located at 0x08154e60.
It is called from the function at 0x0805e270 which is stored in the .init_array (ELF-)section and thus is called at program initialization time.

It looks like 0x08154e60 is the constructor of some C++ object with static storage duration (not necessarily declared with the 'static'-keyword).
That object wraps a pthread_key_t member.
[That 0x0805e270 stored in the .init_array section is certainly a compiler generated helper function calling all your static storage duration objects' constructors.]

This constructor at 0x08154e60 even calls pthread_key_create(), but it throws the result away and stores a hardcoded zero in its pthread_key_t member.
This is the bug.
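
A sketch of what such a constructor might look like in source form, next to a corrected variant. This is a hypothetical reconstruction from the disassembly described above: the class names `BuggyTlsSlot` and `FixedTlsSlot` and their members are invented for illustration and are not the actual names in the game's sources.

```cpp
#include <pthread.h>

// Hypothetical reconstruction of the faulty pattern: the key produced by
// pthread_key_create() is discarded and a hardcoded zero is stored instead.
// (The discarded key is simply leaked in this sketch.)
class BuggyTlsSlot {
public:
    BuggyTlsSlot() {
        pthread_key_t created;
        pthread_key_create(&created, nullptr);  // result thrown away
        key_ = 0;                               // hardcoded zero: the bug
    }
    pthread_key_t key() const { return key_; }

private:
    pthread_key_t key_;
};

// Corrected variant: keep the key that pthread_key_create() handed out.
class FixedTlsSlot {
public:
    FixedTlsSlot() : valid_(pthread_key_create(&key_, nullptr) == 0) {}
    ~FixedTlsSlot() {
        if (valid_) pthread_key_delete(key_);
    }
    FixedTlsSlot(const FixedTlsSlot&) = delete;
    FixedTlsSlot& operator=(const FixedTlsSlot&) = delete;

    bool valid() const { return valid_; }
    pthread_key_t key() const { return key_; }
    // Stores a per-thread value; returns false if the key is unusable.
    bool set(void* value) {
        return valid_ && pthread_setspecific(key_, value) == 0;
    }
    // Returns the calling thread's value, or nullptr if none was set.
    void* get() const { return valid_ ? pthread_getspecific(key_) : nullptr; }

private:
    pthread_key_t key_;  // declared before valid_, which is initialized from it
    bool valid_;
};
```

An object of either class declared at namespace scope has static storage duration, so its constructor runs before `main()` through the compiler-generated helper referenced from .init_array.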

So go, grep for "pthread_key_create" in your sources, pick the one called from a C++ object's constructor and check that constructor for the bug described above.
If there is more than one such class, choose the one which is instantiated with static storage duration, i.e. for which there is a global/static variable of that class' type.

Should be a 10min fix, right? To be honest, I've got no idea why this issue has stayed in the "new" state for nearly two weeks now. At least someone could have hit the magic "Reviewed" button in the meantime (or asked a question if something is unclear).

k0rd added a subscriber: k0rd. May 7 2016, 6:28 PM
k0rd added a comment. May 5 2014, 8:02 AM

some guy on the forums is working on his own chrooted environment with custom glibc that has mallinfo() returning a cached struct.

http://forums.bistudio.com/showthread.php?169926-Linux-Dedicated-Server-feedback/page18

I know, it's a *bit* off-topic, but I wanted to help point you in another direction since you don't seem to be getting much feedback here.
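
For readers unfamiliar with the idea of returning a cached struct: the general pattern is to call the expensive query only occasionally and hand out the last result in between. The following C++ sketch shows that pattern in isolation. It is not the forum poster's glibc patch; `CachedQuery` and its parameters are hypothetical names chosen for this example.

```cpp
#include <chrono>
#include <functional>
#include <utility>

// Wraps an expensive query and re-runs it only when the cached result
// is older than `max_age`. Not thread-safe; a sketch only.
template <typename T>
class CachedQuery {
public:
    using Clock = std::chrono::steady_clock;

    CachedQuery(std::function<T()> query, Clock::duration max_age)
        : query_(std::move(query)), max_age_(max_age) {}

    // Returns the cached value, refreshing it first if it is missing or stale.
    const T& get() {
        const Clock::time_point now = Clock::now();
        if (!has_value_ || now - fetched_at_ >= max_age_) {
            value_ = query_();
            fetched_at_ = now;
            has_value_ = true;
        }
        return value_;
    }

private:
    std::function<T()> query_;
    Clock::duration max_age_;
    T value_{};
    Clock::time_point fetched_at_{};
    bool has_value_ = false;
};
```

With a `max_age` of, say, one second, a caller polling many times per frame triggers the underlying query at most about once per second.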

nics added a comment. May 5 2014, 12:25 PM

Thanks k0rd for pointing me to that forum entry. However, I'm already using a solution which is far easier and achieves the same, cf. my last comment at http://feedback.arma3.com/view.php?id=18487#c70219

The reason I want to use the Intel TBB allocators instead of glibc's malloc + nop-mallinfo is another 10% performance boost I observed on our server with our mission.

k0rd added a comment. May 5 2014, 1:59 PM

hey nics - just curious - do you still need to inject asm to work this, or did you figure that out?

reason why i ask - i don't like to inject code into things that have anti-hack sentries, ya know?

nics added a comment. May 5 2014, 2:05 PM

You still need to inject. However, 1.18 broke it for me due to http://feedback.arma3.com/view.php?id=18706

I don't have a working Intel TBB solution for 1.18 at this time, but I'm working on it.

Btw, the dedicated server part doesn't seem to have any anti-cheat protections, at least I didn't run into any problems.

dedmen closed this task as Resolved. May 18 2020, 10:50 AM