Page MenuHomeFeedback Tracker

Improvement on the BIS_fnc_arrayShuffle
Closed, ResolvedPublic

Description

I was able to speed up the BIS_fnc_arrayShuffle function a bit with some modification and also fixed a slight bug with the random number being biased against the first and last element by swapping the "round" with "floor" call. It still uses the modern Fisher-Yates shuffle algorithm to generate a random distribution of the finite array set. It also uses the new push_back function that has recently been added in report ID 0019447.

A more detailed analysis and test cases can be found on my blog post here which include screenshots and refactoring the code here: http://jamiesharpe.info/blog/?p=400 {F24277} {F24278} {F24279} {F24280} {F24281}

Details

Legacy ID
2051683511
Severity
None
Resolution
Fixed
Reproducibility
Always
Category
Scripting
Steps To Reproduce

I've added a file which contains a compiled single player mission, the mission editor file, and also an RPT log file which contains logs of both this method of shuffling the array and the BIS_fnc_arrayShuffle one. These logs can be found near the bottom.

Upon loading the mission the there will be four function calls. The first two is the BIS_fnc_arrayShuffle working on an array of length four and one hundred, reporting the processing times to the rpt file. The next two calls will be the same data but called from the modified array shuffle named JSharpe_fnc_ImpArrayShuffle.

Tested in both the editor and as a packaged pbo for single player missions and both results are in favour of the new method.

This has to be tested on the DEV build as it uses the new push_back function.

Additional Information

The new function source code is as follows:

private["_array", "_copy", "_result", "_index"];
_array = [[_this], 0, [], [[]]] call BIS_fnc_param;
_copy = [];
_result = [];

{

_copy push_back _x;

} forEach _array;

for "_i" from 0 to (count _copy-1) do
{

_index = floor(random (count _copy));
_result push_back (_copy select _index);
_copy set [_index, objNull];
_copy = _copy - [objNull];

};

_result;

Event Timeline

JSharpe edited Steps To Reproduce. (Show Details)Jul 10 2014, 7:12 AM
JSharpe edited Additional Information. (Show Details)
JSharpe set Category to Scripting.
JSharpe set Reproducibility to Always.
JSharpe set Severity to None.
JSharpe set Resolution to Fixed.
JSharpe set Legacy ID to 2051683511.May 7 2016, 6:56 PM

Hi and thank you for the feedback, really appreciated!

"Why in the bottom is there a need to have a “DELETE_ME_PLEASE” string?"

  • This is because the element must be as unique as possible, your example would not work, because it would remove all NULL objects from the array. For example:

_array = [5, objnull, "text"];
_shuffled = _array call BIS_fnc_arrayShuffle;
_shuffled; // Will be ["text", 5]

"New Shuffle Inline with the Debug Console"

  • Code executed within the console is run in a non scheduled environment (code is run on one single frame, no matter what), while within a script called with execVM, execFSM, spawn, will be caught and limited by the script scheduler. So when you want to test performance of a script/function, it MUST be done in non scheduled env.

So I did a few tests and the results are (for a 10 element array, from console):

Old BIS_fnc_arrayShuffle - 0.212201 ms
New BIS_fnc_arrayShuffle - 0.161798 ms
Your Array Shuffle suggestion - 0.168295 ms

The only difference between Old and New BIS_fnc_arrayShuffle is that instead of Set, it now uses push_back, as you suggested.

Attaching files in case you would like to perform some benchmarks yourself and give back some more feedback!

BIS_fnc_arrayShuffle > Old function
BIS_fnc_arrayShuffle2 > New version using push_back
BIS_fnc_arrayShuffle3 > Your version

Thanks a lot,
Nelson Duarte

What about my variant :)? It is at least 4 times faster than the original and even has shuffle strength param http://killzonekid.com/arma-scripting-tutorials-arrays-part-4-arrayshuffle/

JSharpe added a subscriber: JSharpe.May 7 2016, 6:56 PM

Hi Nelson,

Thanks for the response and taking the time to explain how the environment works.

I was running some tests on the scripts you uploaded and found similar results with data sets ranging from 10 to 1000 from the console as you suggested. The BIS_fnc_arrayShuffle2 is faster than the old one currently implemented.

I noticed that the _index local variable is now defined outside of the for loop as well, were you able to see any differences in performance where it is either declared outside or inside? Particularly on larger data sets.

Also is there any official technical documents as to how the Arma scripting environment works other than the wiki? I'd love to get a greater understanding in how the environment works on the lower level.

Thanks for your time,
Jamie Sharpe

Hi Killzone_Kid,

I've read your blog post on your implementation on shuffling an array. Although it is not guaranteed that all elements will be shuffled. The second version you made where you can strengthen the randomness of the array will only raise the chances. Sadly it is not a trusted way. A better understand can be found here: http://en.wikipedia.org/wiki/Uniform_distribution_(continuous)

Your method relies on a random number to select the index to be swapped with the previous, iteration. This random number, as it's random, could potentially not select an elements index; in a worst case scenario the random number index could always be 0 for each loop iteration (although unlikely it's possible).

The original BIS_fnc_arrayShuffle also creates a copy of the passed in array to maintain the integrity of the original, and return a shuffled copy of it. As commented here at the top of the file:
We create a copy of the container
This will allow us to keep the old array reference instead of making it unusable for user

Although you are right in that working directly with a referenced array within the function would be quicker than making a copy of it, it does not make it backwards compatible with older scripts that may rely on the the original passed in array to not be modified. So it would be unfair to perform tests against it.

I stepped through your code in pen and paper to give a better understanding on how the algorithm is flawed. The random numbers and index was directly generated through arma itself using the same source code you provided in your blog post here:
KK_fnc_arrayShuffle = {

private ["_cnt","_el1","_rnd","_indx","_el2"];
_cnt = count _this - 1;
_el1 = _this select _cnt;
_this resize _cnt;
_rnd = random diag_tickTime * _cnt;
for "_i" from 0 to _cnt do {
    _indx = floor random _rnd % _cnt;
    _el2 = _this select _indx;
    _this set [_indx, _el1];
    _el1 = _el2;
};
_this set [_cnt, _el1];
_this

};

It's a good attempt at creating an array shuffle function non the less. I've uploaded a picture of my step through of your code, annotating various steps. Hope this helps you, and if you have any questions feel free to ask.

Jamie Sharpe

was too slow with writing - posting it anyway:

@KK:
your version of the shuffle can't be compared to the other ones:

  1. you're not using BIS_fnc_param to check the input
  2. you're not making a copy
  3. your version doesn't create an uniform distribution - not even slightly, some permutations aren't possible at all

10000 runs of [0,1,2] call KK_fnc_arrayShuffle (+ array copy):
permutaton - #
[0,1,2] - 0
[0,2,1] - 3714
[1,0,2] - 2577
[1,2,0] - 0
[2,0,1] - 0
[2,1,0] - 3709

@neokika & JSharpe:
any specific reason for using that forEach-loop for copying instead of "+" (https://community.bistudio.com/wiki/plus_a)?

@JSharpe:
In your blog post you wrote that you're using the modern version of Fisher-Yates shuffle algorithm - you aren't exchanging the 2 elements.

2 implementations:

  1. Modern version of Fisher-Yates shuffle algorithm

MR85_fnc_modernShuffle =
{

private["_array", "_result", "_index", "_tmp"];
_array = [[_this], 0, [], [[]]] call BIS_fnc_param;
_result = +_array;
for "_i" from (count _result-1) to 1 step -1 do
{
    _index = floor random (_i + 1);
    _tmp = _result select _i;
    _result set [_i, _result select _index];
    _result set [_index, _tmp];
};
_result

}

  1. The "inside-out" algorithm as described on the wikipedia (without that i!=j check which hurts more than it helps performancewise)

MR85_fnc_insideOutShuffle =
{

private["_array", "_result", "_index"];
_array = [[_this], 0, [], [[]]] call BIS_fnc_param;
_result = [];
_result resize (count _array);
for "_i" from 0 to (count _array -1) do
{
    _index = floor random (_i + 1);
    _result set [_i, _result select _index];
    _result set [_index, _array select _i];
};
_result

}

Hi Master85,

I was never aware of the + operator copying an array. It does improve readability of the code and there is no change in performance when using it. Thanks!

Also you are right, I messed up in the implementation of the algorithm in my blog post and actually implemented the original Fisher-Yates one.

I ran some more performance tests with the following functions and here are the results for a 100 element array:

BIS_fnc_arrayShuffle - 1.11328
Shuffle_fnc_BISNew (new version using pushback) - 0.730469
Shuffle_fnc_JSFY (version in this ticket) - 0.775293
Shuffle_fnc_M85FYM (Master85's Fisher-Yates modern implementation) - 0.525195
Shuffle_fnc_M85FIO (Master85's Inside Out implementation) - 0.481738

I've uploaded a mission file with all the functions. There's also a tests.txt file with the test cases I ran in the console individually to get the results.

Edit - Just noticed there's a slight spelling mistake in the tests.txt where Shuffle_fnc_M85IO should be Shuffle_fnc_M85FIO. The results are still valid as during testing it was changed in the console edit window, just forgot to fix it in the file.

@Master85 Thanks for pointing out the flaw *facepalm* Was very simple fix. I've optimised Fisher-Yates a bit so it is now even shorter, probably the fastest now:

KK_fnc_arrayShuffleFY = {
_this = [[+_this], 0, [], [[]]] call BIS_fnc_param;
private ["_el","_rnd"];

    for "_i" from count _this - 1 to 0 step -1 do {
		_el = _this select _i;
		_rnd = random _i;
		_this set [_i, _this select _rnd];
		_this set [_rnd, _el];

};

_this

};

@KK

When using random to select from an array you should always explicitly round down.

select will round the number if it is a real number, where <x.5 will round down and >=x.5 will round up. This means the first and last numbers are half as likely to appear.

https://community.bistudio.com/wiki/select
https://community.bistudio.com/wiki/random

@JSharpe

Nah, makes no difference:

arr = [0,1]; arr select random 1;
3 attempts 10000 iterations each

1 / 4987
0 / 5013

1 / 5063
0 / 4937

1 / 4874
0 / 5126

arr = [0,1]; arr select round random 1;
3 attempts 10000 iterations each

1 / 4958
0 / 5042

1 / 5046
0 / 4954

1 / 5002
0 / 4998

EDIT: Also you are right about half the apearance, but it makes no differenc eif you round or not:
with round
2 / 2478
1 / 4931
0 / 2591
without
1 / 5065
0 / 2489
2 / 2446

Edit: floor on the other hand does make it more even, but not sure if this really needed for shuffle operation. It might make it too predictable like for example "if such and such combination is close to 30% the chance of it appearing is greatly reduced". Those functions from Master85 give pretty subtle deviation +- 30%

@KK

Are you saying that the wiki is wrong?

It states in the "select" page that:

"If index has decimal places it gets rounded down for fractions less than or equal .5, otherwise it gets rounded up."

It states in the "random" page that:

"Random real (floating point) value from 0 (inclusive) to x (not inclusive)."

It also states in the notes section of the same page:

"x=round(random 5) will return 0,1,2,3,4 or 5. (non-uniform distribution, 0 and 5 are half as likely to be selected than any of the other numbers) x=floor(random 5) will return 0,1,2,3 or 4. (uniform distribution, all numbers have the same probability of being selected) x=ceil(random 5) will return 0,1,2,3,4 or 5. (0 is very unlikely, but possible, as ceil 0 is 0)"

You should round down. Making a test case with just 2 values isn't a fair test, as both are likely to to appear. Place in a 3rd element and then there is more probability that the middle element will appear.

@JSharpe I've run the actual tests "select round random" is no different to "select random". select floor random is different (see the edit)

@KK

Using "select round random" is pointless anyway as the select will round the value automatically. I mentioned this in the post prior to yours, and that you should round down. Even when dealing with shuffling of an array it should be a fair equal shuffle.

You should round down.

Here's more evidence:
http://bost.ocks.org/mike/shuffle/

Improved version, as suggested in first post is already on development branch, setting as resolved, please assign back to me if something else comes up.

Thank you,
Nelson Duarte

Closing as resolved, feel free to re-open if something else comes up.
Thank you.