While working and re-working through the problem of tracking changes to AD group memberships (see HERE), the final “production” script I wrote came out to over 300 lines and ended up a little different to what I put in that previous post. In practice, some of the groups the script was tracking had more than 10,000 members, and the script was slow when comparing two lists that each had more than 10,000 entries.
So I wrote this test of 3 different methods for comparing two lists of strings to find new and removed entries, and timed each of them.
The sample data I used was two text files with nearly identical contents (random numbers between 10,000 and 10,000,000), with a couple of changes made to each to mimic a small number of additions/removals.
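For anyone wanting to reproduce the test, sample files like the ones described above could be generated along these lines. This is only a rough sketch and not the script I used: the list size, the fixed seed, the number of changes, and the choice of 1, 2 and 3 as "new" values are all assumptions made for illustration.

```powershell
# Assumed parameters (not from the original test): list size and a fixed seed
$count = 10000
$rng   = New-Object System.Random 42

# Collect unique random numbers between 10,000 and 10,000,000.
# Random.Next treats the upper bound as exclusive, hence 10000001.
$seen = New-Object 'System.Collections.Generic.HashSet[int]'
while ($seen.Count -lt $count) {
    [void]$seen.Add($rng.Next(10000, 10000001))
}
$numbers1 = @($seen)

# Mimic a few removals (drop the first 3 entries) and a few additions.
# 1, 2 and 3 sit outside the random range, so they cannot already be in the list.
$added    = 1, 2, 3
$numbers2 = @($numbers1[3..($numbers1.Count - 1)]) + $added

# Write both lists to the current directory, one number per line
$numbers1 | Set-Content numbers1.txt
$numbers2 | Set-Content numbers2.txt
```

With these assumed values, a comparison of the two files should report exactly 3 removed and 3 new entries, which makes it easy to sanity-check each method.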
Test 2 (below) was the first method I tried. I came up with a left-of-center solution in Test 1 because Test 2 was achingly slow and I thought that there had to be a better way.
Test 3, well, the results of Test 3 speak for themselves.
```powershell
# Import numbers from text files
$numbers1 = get-content numbers1.txt;
$numbers2 = get-content numbers2.txt;

#---------- Test 1 ----------#

# Record start of test 1
$starttime = get-date;

# Group numbers (count occurrences) as a sum of the original list ($numbers1), plus 2 x the new list ($numbers2).
# Note: this counting trick assumes neither list contains duplicate entries.
$groupednumbers = $numbers1 + $numbers2 + $numbers2 | group -noelement;

# New numbers will only appear twice (from the 2 x $numbers2, above)
$newnumbers = $groupednumbers | where {$_.count -eq 2} | select name;

# Removed numbers will only appear once (from the single copy of $numbers1, but nothing from $numbers2)
$removednumbers = $groupednumbers | where {$_.count -eq 1} | select name;

# Anything that appears 3 times is a non-changed number (it appears in the old and new lists)
$samenumbers = $groupednumbers | where {$_.count -eq 3} | select name;

# Record end of test 1
$endtime = get-date

# Calculate total test time
$test1time = new-timespan -start $starttime -end $endtime;

#---------- Test 2 ----------#

# Record start of test 2
$starttime = get-date;

# Use the -notcontains comparison operator
$removednumbers = $numbers1 | where {$numbers2 -notcontains $_};
$newnumbers = $numbers2 | where {$numbers1 -notcontains $_};

# Record end of test 2
$endtime = get-date

# Calculate total test time
$test2time = new-timespan -start $starttime -end $endtime;

#---------- Test 3 ----------#

# Record start of test 3
$starttime = get-date;

# Use the compare-object cmdlet
$comparison = compare-object -referenceobject $numbers1 -differenceobject $numbers2;
$removednumbers = $comparison | where {$_.sideindicator -eq "<="};
$newnumbers = $comparison | where {$_.sideindicator -eq "=>"};

# Record end of test 3
$endtime = get-date

# Calculate total test time
$test3time = new-timespan -start $starttime -end $endtime;

#---------- Tests Complete ----------#

write-host -foregroundcolor "red" "Test 1:" $test1time.totalseconds "seconds`nTest 2:" $test2time.totalseconds "seconds`nTest 3:" $test3time.totalseconds "seconds";
```
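The output of each test (for $newnumbers and $removednumbers) is slightly different: Test 1 returns objects with a name property, Test 2 returns the plain strings, and Test 3 returns compare-object results with inputobject and sideindicator properties. All of them can easily be reworked into whatever output you like.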
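As a rough illustration of that reworking, the sketch below flattens the results of all three methods into plain lists of strings. The two small input lists are made-up stand-ins for the real files, and the $new1/$removed1 style variable names are hypothetical, chosen only so the three results can sit side by side.

```powershell
# Made-up stand-in data: 10001 has been removed, 10005 has been added
$numbers1 = '10001', '10002', '10003', '10004'
$numbers2 = '10002', '10003', '10004', '10005'

# Test 1 style: group-object returns objects, so pull out the Name property
$grouped  = $numbers1 + $numbers2 + $numbers2 | Group-Object -NoElement
$new1     = @($grouped | Where-Object { $_.Count -eq 2 } | ForEach-Object { $_.Name })
$removed1 = @($grouped | Where-Object { $_.Count -eq 1 } | ForEach-Object { $_.Name })

# Test 2 style: -notcontains already yields the plain strings
$new2     = @($numbers2 | Where-Object { $numbers1 -notcontains $_ })
$removed2 = @($numbers1 | Where-Object { $numbers2 -notcontains $_ })

# Test 3 style: compare-object returns objects, so pull out InputObject
$comparison = Compare-Object -ReferenceObject $numbers1 -DifferenceObject $numbers2
$new3     = @($comparison | Where-Object { $_.SideIndicator -eq '=>' } | ForEach-Object { $_.InputObject })
$removed3 = @($comparison | Where-Object { $_.SideIndicator -eq '<=' } | ForEach-Object { $_.InputObject })

# All three should now agree: new = 10005, removed = 10001
Write-Host "Test 1: new=$new1 removed=$removed1"
Write-Host "Test 2: new=$new2 removed=$removed2"
Write-Host "Test 3: new=$new3 removed=$removed3"
```

Wrapping each pipeline in @( ) keeps the result an array even when only one entry changed, which saves special-casing later in a script.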
I repeated these tests over and over, and below is a fairly typical result: