Disk Space Usage Trend Analysis – Part 2

Following on from Part 1 of this series, this part takes the previously collected disk usage data and fits a ‘line of best fit’ (a linear trend-line) to it, which gives us each disk’s “average” growth in GB/day.

There is a decent amount of maths going on here, to manually determine a linear trend in the format of y = mx + c. You don’t really need to understand how it works, but I’ll leave a link at the bottom for anyone interested in pursuing the methods to this madness.
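For reference, the calculation in the script corresponds to the standard least-squares formulas for a straight line. In this sketch n is the number of data points, x is the sample index (0, 1, 2, ...) and y is the logged usage in GB; these symbol names are my own shorthand, not names from the script:

```latex
m = \frac{n \sum x y - \sum x \sum y}{n \sum x^{2} - \left(\sum x\right)^{2}},
\qquad
c = \frac{\sum y - m \sum x}{n}
```

Here m is the slope (GB per sample) and c is the value of the trend-line at the first sample.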

To use the script below, you just need to ensure the $disklogdirectory is pointing to the same folder that you previously saved the disk usage logs to (c:\scripts\disklog from Part 1).

By default, this script looks at the last 30 days’ worth of data to determine the disk usage trend. This is set by the $cutoffDate variable, and you can change the -30 to a larger negative number (such as -90) if you want to look for longer-term trends.
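As a rough illustration of how the cut-off behaves, the sketch below filters a few made-up log rows the same way the script does. The rows and the $rows and $recent names are hypothetical, and the dates are generated relative to today so the result does not depend on when you run it:

```powershell
# Hypothetical log rows dated 45, 10 and 1 day(s) ago.
# Real rows come from the Part 1 log files; these are made up for illustration.
$rows = foreach ($daysAgo in 45, 10, 1) {
    New-Object psobject -Property @{
        date = (Get-Date).AddDays(-$daysAgo).ToString("s")
        gb   = "100"
    }
}

# Same cut-off and filter as the main script
$cutoffDate = (Get-Date).AddDays(-30)
$recent = @($rows | where { ($_.date -as [datetime]) -gt $cutoffDate })

write-host "Rows inside the 30-day window:" $recent.Count
```

Only the rows newer than the cut-off are kept, so the 45-day-old row is dropped before any trend is calculated.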

Also note that there is a placeholder towards the very bottom of the script for the graph generation. This is where the script from Part 3 will be inserted, though you can use just the Part 1 and Part 2 scripts without implementing the next part.

# Location of disk data logs (include trailing "\")
$disklogdirectory = "c:\scripts\disklog\";

# Add Active Directory powershell plug-in
import-module activedirectory;

# Get all enabled servers from AD, sorted by name
$adservers = get-adcomputer -filter {operatingsystem -like "*server*"} | where {$_.enabled -eq $true} | sort name ;

# Loop through each server
foreach ($servername in $adservers)
{
    $hostname = $servername.name;

    # Check if server can be pinged
    if(test-connection -cn $hostname -quiet -count 1) {
		
        write-host "Check disk space on" $hostname;

        # Get disk drive letters for current server using WMI
        $drives = $null;
        $drives = gwmi win32_logicaldisk -computername $hostname | where {$_.drivetype -eq 3};

        if ($drives -ne $null) {

            # Loop through each drive found
            foreach ($drive in $drives) {

                # Get current disk capacity
                $capacityGB = [math]::Round($drive.size/1024/1024/1024,0);

                # Defines how many data points, or days into the past, to use to calculate a trend
                $cutoffDate = (Get-Date).AddDays(-30)

                # Get historical data for drive from log file
                $filename =  $disklogdirectory + $hostname + "_" + $drive.deviceid[0] + ".txt";
                $datasource  = @(Import-Csv $filename -header date, gb | where {$_.date -as [datetime] -gt $cutoffdate});



            #############################################
            ############START_CALCULATE_TREND############
            #############################################

            # Use available data to define linear trend in the form y=mx+c
            # aka: y-datapoint = ($slope * x-data-point) + $c

            # Variables required to calculate trend
            $tempnsumxy = 0;
            $sample = 0;
            $sumx = 0;
            $samplecount = $datasource.count;
            $sumy = 0;
            $nsumxy = 0;
            $sumxy = 0;
            $ysquared = 0;
            $xsquaredsum = 0;
            $nsumsquaredx = 0;
            $sumxsquared = 0;
            $slope = 0.0;
            $slopesumx = 0;

            # Loop through data points retrieved from log file
            foreach ($datapoint in $datasource) {

                $sumx = $sumx + $sample;
                $sumy = $sumy + $datapoint.gb;
                $tempnsumxy = $tempnsumxy + ($sample * $datapoint.gb);

                $xsquared = $sample * $sample;
                $xsquaredsum = $xsquaredsum + $xsquared

                $sample = $sample + 1;
            }

            $nsumxy = $samplecount * $tempnsumxy;
            $sumxy = $sumx * $sumy;
            $nsumsquaredx = $samplecount * $xsquaredsum;
            $sumxsquared = $sumx * $sumx;

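            # Final value for slope defined (skip this drive if there are too few data points to fit a line)
            $denominator = $nsumsquaredx - $sumxsquared;
            if ($denominator -eq 0) {
                write-host "Not enough data to calculate a trend for" $filename;
                continue;
            }
            $slope = ($nsumxy - $sumxy) / $denominator;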
				
            # Final value for c defined
            $slopesumx = $slope * $sumx;
            $c = ($sumy - $slopesumx)/$samplecount;

            # Output results to the screen
            write-host "Drive:" $drive.deviceid[0] "  Slope:" ([math]::Round($slope,2)) "  C:"  ([math]::Round($c,2)); 

            ###########################################
            ############END_CALCULATE_TREND############
            ###########################################


            #########################################
            ############INSERT_GRAPH HERE############
            #########################################

            }
        }
    }
}
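If you want to try the trend calculation on its own, the following is a minimal sketch of the same arithmetic wrapped in a function. Get-LinearTrend is a hypothetical helper name (it is not part of the script above), the input figures are made up, and like the script it assumes one evenly spaced sample per data point:

```powershell
# Hypothetical helper: least-squares fit over evenly spaced samples (x = 0, 1, 2, ...)
function Get-LinearTrend {
    param([double[]]$Values)

    $n = $Values.Count
    if ($n -lt 2) { return $null }   # a line needs at least two points

    $sumX = 0.0; $sumY = 0.0; $sumXY = 0.0; $sumXSquared = 0.0
    for ($x = 0; $x -lt $n; $x++) {
        $sumX        += $x
        $sumY        += $Values[$x]
        $sumXY       += $x * $Values[$x]
        $sumXSquared += $x * $x
    }

    # Same formulas as the main script, written out in one step each
    $slope     = (($n * $sumXY) - ($sumX * $sumY)) / (($n * $sumXSquared) - ($sumX * $sumX))
    $intercept = ($sumY - ($slope * $sumX)) / $n

    New-Object psobject -Property @{ Slope = $slope; Intercept = $intercept }
}

# Made-up usage figures that grow by exactly 2 GB per sample, starting at 1 GB
$trend = Get-LinearTrend -Values 1, 3, 5, 7, 9
write-host "Slope:" $trend.Slope "  C:" $trend.Intercept
```

Returning $null for fewer than two points is a design choice in this sketch: with a single point the denominator is zero, which is what produces an “Attempted to divide by zero” error.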

When you run the above, you should see a message indicating which server is being processed, followed by the details of each disk (drive letter, slope, and C value).

The slope is the average change per day (in GB). So for a D: drive on SQL01 showing a slope of 0.31, the disk is growing by an average of 0.31 GB per day, or around 9.4 GB every month.
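The monthly figure is just the daily slope multiplied by the length of a month. A quick sketch of that arithmetic, assuming an average month of 365.25 / 12 days (the $daysPerMonth and $monthlyGrowthGB names are my own, not from the script):

```powershell
# Illustrative figure only (the slope quoted above)
$slope = 0.31                      # GB per day
$daysPerMonth = 365.25 / 12        # average month length (an assumption)
$monthlyGrowthGB = [math]::Round($slope * $daysPerMonth, 1)

write-host "Approximate growth per month (GB):" $monthlyGrowthGB
```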

The C value is used in Part 3 when we plot the values and trend-line in a graph.

If you were just using Part 1 and Part 2 of this series, you could easily wrap the output above inside an If statement to only show you the disks where growth rate was greater than a certain value.
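A minimal sketch of that idea is below. The $growthThreshold name and its value are hypothetical, and the $results list is made-up data standing in for the per-drive values the script calculates:

```powershell
# Hypothetical threshold: only report disks growing faster than this many GB per day
$growthThreshold = 0.25

# Made-up results standing in for the values calculated for each drive
$results = @(
    New-Object psobject -Property @{ drive = "C"; slope = 0.02; c = 41.37 }
    New-Object psobject -Property @{ drive = "D"; slope = 0.31; c = 120.5 }
)

$flagged = @()
foreach ($result in $results) {
    # Same kind of test you would wrap around the write-host line in the main script
    if ($result.slope -gt $growthThreshold) {
        write-host "Drive:" $result.drive "  Slope:" ([math]::Round($result.slope,2)) "  C:" ([math]::Round($result.c,2));
        $flagged += $result.drive
    }
}
```

In the main script the equivalent change would be to put the existing write-host line inside an if ($slope -gt $growthThreshold) { ... } block, so slow-growing disks are simply not printed.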

References
http://en.wikipedia.org/wiki/Linear_regression

7 Comments → Disk Space Usage Trend Analysis – Part 2

Marc November 18, 2015 at 9:07 pm

Hi Kamal,
Great script, many thnx for that!

I only get a failure in the part 2 script.

Attempted to divide by zero.
At D:\scripts\DPP\WFF\StorageTrendAnalyzing\StorageTrendAnalyzing_2.ps1:86 char:13
+ $slope = ($nsumxy – $sumxy) / ($nsumsquaredx – $sumxsquared);
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [], RuntimeException
+ FullyQualifiedErrorId : RuntimeException

I only lowerd the $cutoffDate to -6 days, further everything is the same as youre script. The txt file of the server collected more then 6 day’s of data.

Can you please help me. I realy would like to use this script for analyzing.

regards

Kamal November 19, 2015 at 10:43 am

Hi Marc,

hard to say without seeing your files with the hard disk usage data.
All of the variables are initialized to zero – so, if you get a “divide by zero” it means the variables are still the default values.

So, without seeing your exact setup, it might be that your data files are not in the correct directory ($filename variable) – OR – your data files don’t contain enough data for the $cutoffDate you selected – OR – your date format settings are not consistent and it’s comparing DD/MM to MM/DD format (or vice-versa).

Karl March 14, 2018 at 9:17 pm

Thanks for this! I used your scripts as a template for monitoring disk usage on a few servers.

I implemented a few modifications because I wanted a much more granular monitoring for our DB server. Therefore I’m also tracking the time in my CSV files and I’ve scheduled a check every 30 minutes.

When calculating the trend I’m using the actual timestamp.
Your script ignores the date saved in the CSV file when calculating the trend. It just relies on consistent logging of one line a day because $sample is used as the x-value . This is a bit misleading I think and could introduce a slight error in the calculated trend if logging of one line each day was skipped or halted for a time.

Because I’m generating much more data this way I’ve also implemented a third script that regularly cleans the CSV files. I could write down some of my modifications if anyone is interested.

Thanks again and best regards!

1. Kamal March 15, 2018 at 10:18 am
  
  That’s great Karl!
  Looking over it again (it’s been a while), I can definitely see some room for improvement and extension – so it’s great you’ve been able to use and build on this.
  
Nitz April 19, 2019 at 12:54 am

Hi Kamal

Where do you specify server name on this script?

Thanks

1. Kamal April 20, 2019 at 9:26 am
  
  On the third line (where it specifies $adservers) – that’s where it grabs a list of servers from Active Directory. You could change that to be only a single server if you like, though the loop reads the name property of each entry, so don’t just use a plain string for the server name.
  EG
  This won’t work: $adservers = "SQLServer01";
  This will work: $adservers = @(get-adcomputer -identity "SQLServer01");
  
Nitz April 24, 2019 at 1:36 am

Hi Kamal

Getting below error msg:

}
At line:11 char:56
+ foreach ($servername in $adservers = @(“Servername”);
+ ~
Missing closing ‘)’ after expression part of foreach loop.
+ CategoryInfo : ParserError: (:) [], ParentContainsErrorRecordException
+ FullyQualifiedErrorId : MissingEndParenthesisAfterForeach


HKEY_LOCAL_MACHINE

Every problem is an opportunity.