Server lags out and RPT is spammed when a Zeus-remote-controlled unit is killed
New, Normal, Public

Description

During our weekly large-scale FTX (approx. 60 players against Zeus-generated OPFOR), we again experienced an issue that brought the server to a halt and ultimately forced us to abandon the mission.

One of our Zeus operators was remote-controlling a unit.
When that unit was killed, no body was left behind.
At the same time, the server RPT started getting spammed with the errors below (hundreds per second):

"21:21:12 Unit O Charlie 3-3:4 (0xdcea0180) - network ID 6:580 - no person"
"21:21:40 Server: Network message 4ce2bd7 is pending"

(with the expected variation in message IDs etc. throughout the RPT)
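To put a number on the "hundreds per second" in a log this size, a small script could bucket the two quoted messages by timestamp. This is a minimal, hypothetical sketch: the regex patterns are assumptions based only on the two lines quoted above, and `spam_rate` and `rpt_spam.py` are illustrative names, not part of any Arma or TADST tooling.

```python
import re
import sys
from collections import Counter

# Assumed patterns, derived only from the two RPT lines quoted in this report.
# Group 1 captures the HH:MM:SS timestamp, which serves as a one-second bucket.
NO_PERSON = re.compile(r"^(\d{1,2}:\d{2}:\d{2}) Unit .* - network ID \S+ - no person$")
PENDING = re.compile(r"^(\d{1,2}:\d{2}:\d{2}) Server: Network message \S+ is pending$")


def spam_rate(lines):
    """Count 'no person' and 'pending' messages per one-second timestamp."""
    counts = {"no person": Counter(), "pending": Counter()}
    for raw in lines:
        # Drop surrounding whitespace and any quotes added when copy/pasting.
        line = raw.strip().strip('"')
        for kind, pattern in (("no person", NO_PERSON), ("pending", PENDING)):
            match = pattern.match(line)
            if match:
                counts[kind][match.group(1)] += 1
                break
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python rpt_spam.py path/to/server.rpt  (hypothetical script name)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as rpt:
        result = spam_rate(rpt)
    for kind, per_second in result.items():
        if per_second:
            stamp, peak = per_second.most_common(1)[0]
            total = sum(per_second.values())
            print(f"{kind}: {total} total, peak {peak}/s at {stamp}")
```

Bucketing on the bare HH:MM:SS timestamp keeps the script simple, but timestamps repeat after midnight, so an RPT spanning more than one day would merge buckets.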

The logged-in admin noted that server FPS dropped to 0 and never really recovered.

Footage from one of our Zeus operators' perspective is here:
https://clips.twitch.tv/FitAbstemiousPartridgeStinkyCheese

In this clip, you'll notice that the teleport attempt that immediately follows shows a long delay between each group member being moved, and the radio chatter turns to people reporting technical issues.

I'm currently gathering additional footage from the other Zeus operators in case it yields any further information.

What I'm really looking for are answers to one, if not both, of the following:

  1. What causes this, both in terms of the body not being created when the AI was killed and the knock-on effect of the server dying?
  2. How can we fix/avoid/work around this issue?

The RPT itself is over 70 MB.

Details

Severity: Major
Resolution: Open
Reproducibility: Random
Operating System: Windows 10 x64
Category: Server
Additional Information

The server is a Windows dedicated server, with its config served by TADST.
No major changes to the mission template or mod set since other recent operations, all of which went without issue.
We saw a similar issue ~3 months ago, but not since, despite running 6 of these large operations per month.

LuckySpoon edited Additional Information. Jun 18 2018, 11:43 AM

Additional feedback from the primary Zeus operator:

I believe I may be able to re-create the crash. I think it's the combination of controlling an AI unit whose squad members are already dead and then deleting the dead bodies. Ares usually tells you that the unit is being RC'd by another Zeus operator if you try to delete it; however, I think that having all the Ares slots full, with multiple operators controlling OPFOR in the same squad, can trip the system up and allow you to delete a unit that is being RC'd. We need to test this and see.

Edit: An additional note: at the time of the crash I was not cleaning up any units; I was spectating SPC Loster RC'ing that one unit. I do, however, remember that I was cleaning up units the last time this happened. I think the message "21:21:12 Unit O Charlie 3-3:4 (0xdcea0180) - network ID 6:580 - no person" means that the unit Charlie 3-3:4 existed on the map and was under RC, but then despawned, leaving Ares technically in RC mode with no attached unit. That is a classic logic error that would produce a repeating server log entry.

Some isolated testing has been performed, with different but possibly related results. A frame drop of 15 with little else going on could plausibly become a drop to 0 with 60+ players and a full AO.

Four of us jumped into the official server as Zeus operators. I loaded the same template as last night and tried to recreate the conditions. We noticed a few things:

  1. I was able to drag units that were being RC'd
  2. I was able to delete units that were being RC'd
  3. Killing (hitting End) and then deleting a unit that was being RC'd caused a good amount of server lag. I would need the RPT file checked to see whether we managed to reproduce the same error.

Unfortunately, SSG, the RPT doesn't show the same error:

"16:14:39 Server: Object 92:955 not found (message Type_121)"

Did FPS hit 0 as it did yesterday?
How long before the server recovered?

Negative. The frames dropped by about 15 and all Zeus operators experienced a lot of desync; it took about 10 seconds for everyone to catch up.

This occurred again tonight on multiple occasions. We noted that when the Zeus operator who was remote-controlling the AI disconnected, the server was able to recover some frames.

Luckily, two of our three Zeus operators stream.

This is one assistant operator, whose only role is to remote-control units:
https://www.youtube.com/watch?v=O9Q3SUpVqmk&
Crash 1: 0:45:40 -- interesting TP behaviour around 0:46:08
Crash 2: 1:02:30 (approx., not noticeable from this perspective)
Crash 3: 1:16:30 (approx., same as above)

This is our primary operator, the logged-in admin who places all units:
https://www.twitch.tv/videos/282427458
Crash 1: 0:46:00 -- interesting TP behaviour around 0:46:16
Crash 2: 1:02:15 (based on the #monitor results)
Crash 3: 1:16:30 (approx)

Wulf added a subscriber: Wulf. Jul 24 2018, 5:11 PM

Hello.

Is it possible to recreate this issue with an unmodded Arma? Could you also please upload the RPT containing the spam?

Thank you.

LuckySpoon added a comment. Edited Wed, Sep 12, 7:24 AM

The latest RPT is available here.

I don't think it would really be feasible for me to test this unmodded, but I'll try to set something up.