Feedback Tracker

Server Crash: NULL_POINTER_READ_c0000005
Need More Info, Normal, Public

Description

I loaded a test server with our usual mission file (which has seemingly been crashing multiple times a day for several months, without any resolution so far) and was standing still in the pause menu.
No one else was on the server, and it had only been running for about 4 minutes before it crashed.

WinDbg shows only 2 STACK_TEXT lines, so hopefully the cause can be found and fixed fairly easily.

Details

Severity
Crash
Resolution
Open
Reproducibility
Random
Operating System
Windows 10 x64
Category
General
Steps To Reproduce

Unable to reproduce reliably, but it happens at least 2 times a day on our server.

Additional Information

Crash mdmp attached

RPT Fault Description
Version 2.10.149879
Fault time: 2022/10/02 21:56:07
Fault address: 03ED86F2 01:012676F2 %ARMA_PATH%\arma3server_x64.exe
file: O_altis_life (__cur_mp)
world: Altis
Prev. code bytes: C2 B1 01 4D 85 C0 0F 84 B1 00 00 00 4D 8B 48 18
Fault code bytes: 49 8B 01 41 FF 40 10 49 89 40 18 49 3B C2 75 53

Event Timeline

Fraali created this task. Oct 3 2022, 4:11 AM
Tenshi added a subscriber: Tenshi. Oct 3 2022, 4:15 PM

Heya, we analysed your crash log.
Could you try setting the parameter "-malloc=system" and check whether it resolves the issue?

Tenshi changed the task status from New to Need More Info. Oct 3 2022, 4:15 PM
Fraali added a comment. Oct 4 2022, 2:14 AM

I will do that. It may take a while to see whether it fixes the issue completely, but I very much appreciate the response!

Fraali added a comment. Edited Oct 5 2022, 3:04 AM

It seems that this did not fix the issue. I added "-malloc=system" to the startup parameters of our live server, and it crashed with what looks like the same fault address.
--EDIT--

Removed the mdmp because it was from a much earlier crash.

Fraali added a comment. Edited Oct 9 2022, 6:59 PM

I updated our server to the latest profiling build (v06 - 2.10.149973), and the server still crashed. Hopefully the crash from the profiling build can help a bit.

--EDIT--
I've added all of the crash logs since updating to the profiling build. The crashes happen about as often as they did without profiling, but from what I can see there are 3 or 4 different fault addresses.

If you are unable to find or fix the issue because it is so obscure, knowing at least where exactly it is faulting, and ideally which function it originates from, might let us change things server-side in case one of our scripts is causing the issue.
This has been an issue for a long while, so it would be great to get this resolved finally.

Your issues seem to be caused by "futurecept", not by Arma 3

codeYeTi added a subscriber: codeYeTi. Edited Nov 1 2022, 5:35 AM

Your issues seem to be caused by "futurecept", not by Arma 3

Thanks for the information. I had assumed that at the *very* least we were triggering some unexpected behavior with futurecept, even if not *entirely* misusing some API.

Any information you can provide about the stack trace would help greatly, notably what the game is trying to do when it crashes (we don't have debug symbols, so everything is just an offset from ExportSVG for us). Given your suggestion regarding the allocator, I'm guessing that this is *likely* either a race condition leading to a double free, OR that access to the pooled resource allocators for some game data types is not completely thread-safe.
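To make the allocator guess concrete, below is a minimal sketch of a fixed-size pool allocator whose intrusive free list is protected by a mutex. `LockedPool` and all of its members are hypothetical. This is not ArmA's allocator, whose internals are not public. There is one possibly relevant detail in the report. Assume the RPT's "Fault code bytes" begin at the faulting instruction, and that the tail of "Prev. code bytes" decodes cleanly (disassembling backwards is ambiguous). Under those assumptions, the x86-64 sequence is `mov r9,[r8+18h]`, then `mov rax,[r9]` (the fault), then `inc dword ptr [r8+10h]` and `mov [r8+18h],rax`. That "load head, read next, bump counter, store new head" shape resembles a free-list pop like `allocate()` below. A null `r9` there would match the NULL_POINTER_READ in the title. This is only an inference from 32 bytes, not a diagnosis.

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <new>
#include <vector>

// Hypothetical fixed-size pool allocator, for illustration only (NOT the
// engine's allocator). Free blocks form an intrusive singly linked list,
// and a single mutex guards every push and pop.
class LockedPool {
public:
    LockedPool(std::size_t block_size, std::size_t block_count)
        : block_size_(round_up(std::max(block_size, sizeof(Node)))),
          storage_(block_size_ * block_count) {
        // Thread every block onto the free list up front.
        for (std::size_t i = 0; i < block_count; ++i) {
            push_unlocked(storage_.data() + i * block_size_);
        }
    }

    void* allocate() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (head_ == nullptr) throw std::bad_alloc();  // pool exhausted
        Node* node = head_;
        // Without the lock, two threads could read the same head_ here and
        // hand out one block twice; freeing it twice later corrupts the list.
        head_ = node->next;
        ++in_use_;
        return node;
    }

    void deallocate(void* p) {
        std::lock_guard<std::mutex> lock(mutex_);
        push_unlocked(static_cast<unsigned char*>(p));
        --in_use_;
    }

    std::size_t in_use() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return in_use_;
    }

private:
    struct Node { Node* next; };

    // Keep every block suitably aligned for any fundamental type.
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        return (n + a - 1) / a * a;
    }

    // Caller must hold mutex_ (or be the constructor).
    void push_unlocked(unsigned char* block) {
        head_ = new (block) Node{head_};
    }

    std::size_t block_size_;
    std::vector<unsigned char> storage_;
    Node* head_ = nullptr;
    std::size_t in_use_ = 0;
    mutable std::mutex mutex_;
};
```

A mutex is the simplest correct choice here. A lock-free free list would also need protection against the ABA problem, which is one reason pooled allocators are often made per-thread instead.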

I know we cannot talk too much about this kind of thing here, but I would sing your praises pretty much forever if you could give me a little nudge in the right direction. My Discord handle is mcoffin#1270, if that works as a communication platform for any of you.

I'd also be happy to add you to our source tracking repository for futurecept, should you want to at least take a look around, even if just for the sake of enjoyment :) It's an intercept-based asynchronous futures system for SQF, currently with HTTP, MySQL, and Apache Kafka client implementations.
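Below is a rough sketch of the general shape of a handle-based futures system for a single-threaded script VM. `FutureRegistry`, `submit` and `poll` are hypothetical names, not futurecept's or intercept's API. The part that matters for this thread is the threading boundary. Worker threads only produce plain `std::string` values. Anything that would create or free engine types such as `game_value` is assumed to happen inside `poll()`, on the script thread.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <future>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical handle-based futures registry (NOT futurecept's code).
// The registry itself is only touched from the script thread; only the
// submitted work runs on background threads, and it returns plain C++ types.
class FutureRegistry {
public:
    // Start work on a background thread and hand the script side an integer handle.
    std::uint64_t submit(std::function<std::string()> work) {
        const std::uint64_t id = next_id_++;
        pending_.emplace(id, std::async(std::launch::async, std::move(work)));
        return id;
    }

    // Non-blocking check, meant to be called once per frame from the script
    // thread. Yields the result exactly once, then forgets the handle.
    // Converting the result into engine types would happen here, on that thread.
    std::optional<std::string> poll(std::uint64_t id) {
        auto it = pending_.find(id);
        if (it == pending_.end()) return std::nullopt;  // unknown or already consumed
        if (it->second.wait_for(std::chrono::seconds(0)) != std::future_status::ready) {
            return std::nullopt;  // still running
        }
        std::string result = it->second.get();
        pending_.erase(it);
        return result;
    }

private:
    std::uint64_t next_id_ = 1;
    std::unordered_map<std::uint64_t, std::future<std::string>> pending_;
};
```

One caveat of this design: a `std::future` obtained from `std::async` blocks in its destructor until its task finishes. Destroying the registry while work is still pending will therefore stall the calling thread.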

My current best guesses are as follows:

  1. The integer used for refcounting game data types is not actually atomic on ArmA's side, causing a double-free condition - https://github.com/intercept/intercept/blob/344c11dae80a146554abf41cf860426fd697ea01/src/client/headers/shared/containers.hpp#L219 (Futurecept assumes that this is an atomic integer: since ArmA *does* have multiple threads, I assumed it was atomic under the hood, even if it's not represented as such in the intercept API implementation(s).)
  2. The allocator used for game_value types (containing r_string in this example) might not be thread-safe internally, and futurecept has been assuming that it was, causing undefined behavior when two simultaneous calls trigger some kind of race condition (due to freeing game_data types from *outside* of the game-provided SQF threads) - https://github.com/intercept/intercept/blob/344c11dae80a146554abf41cf860426fd697ea01/src/client/headers/shared/containers.hpp#L631
  3. There could be an underlying race condition in disconnect scenarios, where futurecept continues to assume some object is non-nil, but due to something on the engine side, it becomes nil before futurecept asynchronously accesses its data. I think this is NOT likely, given that we'd see more of futurecept in the stack trace if this were the case. We hit issues like these when we first started down this path, but were able to resolve them ourselves because the trace dumps were mostly inside our futurecept code, unlike these (if I'm reading them correctly).
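Guess 1 depends on the refcount's decrement-and-test being a single atomic step. The minimal sketch below shows an intrusive refcount with that property. `RefCounted` is a hypothetical class and assumes nothing about how the engine or intercept actually implement theirs. With a plain `int`, two threads releasing at the same time can both read 1, both store 0, and both delete.

```cpp
#include <atomic>

// Hypothetical intrusive reference count (NOT the engine's or intercept's).
class RefCounted {
public:
    void add_ref() noexcept {
        // Taking another reference needs no ordering with other memory.
        count_.fetch_add(1, std::memory_order_relaxed);
    }

    // Returns true if this call dropped the last reference and deleted the object.
    bool release() noexcept {
        // fetch_sub returns the previous value, so exactly one caller ever sees 1.
        // acq_rel makes all prior writes by other owners visible before deletion.
        if (count_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            delete this;
            return true;
        }
        return false;
    }

    int use_count() const noexcept { return count_.load(std::memory_order_relaxed); }

protected:
    virtual ~RefCounted() = default;  // only release() may destroy the object

private:
    std::atomic<int> count_{1};  // the creator holds the first reference
};
```

Suppose the engine's counter really is a plain, non-atomic integer. Then concurrent decrements from a futurecept thread and a game thread are a data race, and a double free is one possible outcome.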

Looking forward to hearing from you, and big thanks for all the support so far. Thanks also to Jay for taking the time to do the initial writeup for me here, despite me being the original author and really the only maintainer of futurecept as it currently stands.