Monitoring File Integrity

Ask yourself how you might address the following circumstances when managing a website:

  • A file is unintentionally added, modified or deleted
  • A file is maliciously added, modified or deleted
  • A file becomes corrupted
More importantly, would you even know if one of these circumstances occurred? If your answer is no, then keep reading. In this guide I will demonstrate how to create a profile of your file structure which can be used to monitor the integrity of your files.

The best way to determine whether or not a file has been altered is to hash its contents. PHP has several hashing functions available, but for this project I’ve decided to use the hash_file() function. It supports a wide range of hashing algorithms, which will make my code easy to modify later should I decide to switch.

Hashing is used in a wide variety of applications, everything from password protection to DNA sequencing. A hashing algorithm works by transforming data of any size into a fixed-size, repeatable string. Cryptographic hash functions are designed so that even a slight modification to the data produces a very different result. When two or more different pieces of data produce the same result string, it’s referred to as a “collision.” The strength of a hashing algorithm can be judged by both its speed and the probability of collisions.

In my examples I will be using the SHA-1 algorithm because it’s fast, accidental collisions are extremely unlikely and it has been widely used and well tested. Keep in mind that deliberately constructed SHA-1 collisions have since been demonstrated, so if you need to defend against an attacker who can craft file contents, a stronger algorithm such as SHA-256 is a safer choice; hash_file() accepts "sha256" in place of "sha1", but the digest is then 64 hexadecimal characters rather than 40. Of course, you’re welcome to research other algorithms and use any one you like.

Once the file’s hash has been obtained, it can be stored for later comparison. If hashing the file later doesn’t return the same hash string as before then we know the file has somehow been changed.
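To make the idea concrete, here is a minimal sketch of detecting a change by comparing hashes. The temporary file and its contents are made up for the example; only hash_file() comes from the approach described above.

```php
<?php
// Sketch: detect a change to a file by comparing hashes.
// The temporary file and its contents are made up for this example.
$file = tempnam(sys_get_temp_dir(), "fim");
file_put_contents($file, "Hello, world");

// record the baseline hash
$before = hash_file("sha1", $file);

// a one-character change produces a different hash
file_put_contents($file, "Hello, World");
$after = hash_file("sha1", $file);

$changed = ($before !== $after);
echo $changed ? "File has changed\n" : "File is unchanged\n";

// the digest length depends only on the algorithm, not on the file size
$sha1Length = strlen($after);                        // 40 hex characters
$sha256Length = strlen(hash_file("sha256", $file));  // 64 hex characters

unlink($file);
```

The digest lengths matter later on, because the column that stores the hash has to be wide enough for whichever algorithm you pick.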

Database

To begin, we first need to lay out a basic table to store the hashes of our files. I will be using the following schema:
CREATE TABLE integrity_hashes (
    file_path VARCHAR(200) NOT NULL,
    file_hash CHAR(40) NOT NULL,
    PRIMARY KEY (file_path)
);
file_path stores the location of a file on the server. Since two files cannot occupy the same location in the file system, the value will always be unique, so it serves as our primary key. I have specified its maximum length as 200 characters, which should allow for some lengthy file paths. file_hash stores the hash value of a file, which for SHA-1 is a 40-character hexadecimal string.

Collecting Files

The next step is to build a profile of the file structure. We define the path of where we want to start collecting files and recursively iterate through each directory until we’ve covered the entire branch of the file system, and optionally exclude certain directories or file extensions. We collect the hashes we need as we’re traversing the file tree which are then stored in the database or used for comparison. PHP offers several ways to navigate the file tree; for simplicity, I’ll be using the RecursiveDirectoryIterator class.
<?php
define("PATH", "/var/www/");
$files = array();

// extensions to fetch, an empty array will return all extensions
$ext = array("php");

// directories to ignore, an empty array will check all directories
$skip = array("logs", "logs/traffic");

// build profile
$dir = new RecursiveDirectoryIterator(PATH);
$iter = new RecursiveIteratorIterator($dir);
while ($iter->valid()) {
    // skip unwanted directories
    if (!$iter->isDot() && !in_array($iter->getSubPath(), $skip)) {
        // get specific file extensions
        if (!empty($ext)) {
            // PHP 5.3.6+: if (in_array($iter->getExtension(), $ext)) {
            if (in_array(pathinfo($iter->key(), PATHINFO_EXTENSION), $ext)) {
                $files[$iter->key()] = hash_file("sha1", $iter->key());
            }
        }
        else {
            // ignore file extensions
            $files[$iter->key()] = hash_file("sha1", $iter->key());
        }
    }
    $iter->next();
}
Notice how I referenced the same folder logs twice in the $skip array. Just because I choose to ignore a specific directory doesn’t mean that the iterator will also ignore all of the sub-directories, which can be useful or annoying depending on your needs. The RecursiveDirectoryIterator class gives us access to several methods:
  • valid() checks whether the iterator’s current position is valid, i.e. whether there are still entries left to process
  • isDot() determines if the current entry is “.” or “..”
  • getSubPath() returns the directory of the current entry, relative to the starting path
  • key() returns the full path and file name
  • next() moves the iterator forward to the next entry
There are several more methods available, but the ones listed above are all we need for the task at hand. One worth mentioning is getExtension(), added in PHP 5.3.6, which returns the file extension. If your version of PHP supports it, you can use it to filter out unwanted entries instead of calling pathinfo() as I did. When executed, the code should populate the $files array with results similar to the following:
Array
(
    [/var/www/test.php] => b6b7c28e513dac784925665b54088045cf9cbcd3
    [/var/www/sub/hello.php] => a5d5b61aa8a61b7d9d765e1daf971a9a578f1cfa
    [/var/www/sub/world.php] => da39a3ee5e6b4b0d3255bfef95601890afd80709
)
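If you would rather have a skipped directory take all of its sub-directories with it, a prefix comparison on the sub-path is one way to do it. The following is only a sketch: is_skipped() is a hypothetical helper that is not part of the code above, and it assumes the entries in $skip are written relative to the starting path, as they are in this guide.

```php
<?php
// Hypothetical helper: true when $subPath is one of the $skip directories
// or lies anywhere beneath one of them.
function is_skipped($subPath, array $skip)
{
    // normalize Windows separators so the comparison is consistent
    $subPath = str_replace("\\", "/", $subPath);
    foreach ($skip as $dir) {
        $dir = trim(str_replace("\\", "/", $dir), "/");
        if ($dir === "") {
            continue; // ignore empty entries
        }
        if ($subPath === $dir || strpos($subPath, $dir . "/") === 0) {
            return true;
        }
    }
    return false;
}

$skip = array("logs");
var_dump(is_skipped("logs", $skip));          // bool(true)
var_dump(is_skipped("logs/traffic", $skip));  // bool(true)
var_dump(is_skipped("logsold", $skip));       // bool(false)
var_dump(is_skipped("", $skip));              // bool(false)
```

In the collection loop, the condition !in_array($iter->getSubPath(), $skip) could then be swapped for !is_skipped($iter->getSubPath(), $skip), and the $skip array would only need the top-level directory.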
Once we have the profile built, updating the database is easy peasy lemon squeezy.
<?php
$db = new PDO("mysql:host=" . DB_HOST . ";dbname=" . DB_NAME,
    DB_USER, DB_PASSWORD);

// clear old records
$db->query("TRUNCATE integrity_hashes");

// insert updated records
$sql = "INSERT INTO integrity_hashes (file_path, file_hash) VALUES (:path, :hash)";
$sth = $db->prepare($sql);
$sth->bindParam(":path", $path);
$sth->bindParam(":hash", $hash);
foreach ($files as $path => $hash) {
    $sth->execute();
}

Checking For Discrepancies

You now know how to build a fresh profile of the directory structure and how to update records in the database. The next step is to put it together into some sort of real world application like a cron job with e-mail notification, administrative interface or whatever else you prefer. If you just want to gather a list of files that have changed and you don’t care how they changed, then the simplest approach is to pull the data from the database into an array similar to $files and then use PHP’s array_diff_assoc() function to weed out the riffraff.
<?php
// non-specific check for discrepancies
$diffs = array();
if (!empty($files)) {
    $result = $db->query("SELECT * FROM integrity_hashes")->fetchAll();
    if (!empty($result)) {
        $tmp = array();
        foreach ($result as $value) {
            $tmp[$value["file_path"]] = $value["file_hash"];
        }
        // added or altered files, plus files that have been deleted
        $diffs = array_diff_assoc($files, $tmp) + array_diff_assoc($tmp, $files);
        unset($tmp);
    }
}
In this example, $diffs will be populated with any discrepancies found, or it will be an empty array if the file structure is intact. The comparison runs in both directions because array_diff_assoc($files, $tmp) on its own only reports added and altered files; the reverse call picks up files that have been deleted. Unlike array_diff(), array_diff_assoc() uses keys in the comparison, which is important to us because different files can legitimately share a hash value, such as two empty files. If you want to take things a step further, you can throw in some simple logic to determine exactly how a file has been affected, whether it has been deleted, altered or added.
<?php
// specific check for discrepancies
$diffs = array();
if (!empty($files)) {
    $result = $db->query("SELECT * FROM integrity_hashes")->fetchAll();
    if (!empty($result)) {
        $tmp = array();
        foreach ($result as $value) {
            if (!array_key_exists($value["file_path"], $files)) {
                // in the database but no longer on disk
                $diffs["del"][$value["file_path"]] = $value["file_hash"];
            }
            elseif ($files[$value["file_path"]] != $value["file_hash"]) {
                // on disk, but the hash has changed
                $diffs["alt"][$value["file_path"]] = $files[$value["file_path"]];
            }
            // remember every path the database already knows about
            $tmp[$value["file_path"]] = $value["file_hash"];
        }
        // anything on disk that the database has never seen was added
        $added = array_diff_key($files, $tmp);
        if (!empty($added)) {
            $diffs["add"] = $added;
        }
        unset($tmp);
    }
}
As we loop through the results from the database, we make several checks. First, array_key_exists() is used to check if the file path from our database is present in $files, and if not then the file must have been deleted. Second, if the file exists but the hash values do not match, the file must have been altered; otherwise it is unchanged. We record each path we check in a temporary array named $tmp, and finally any entries in $files whose paths are not in $tmp must have been added since the profile was stored. Comparing the sizes of the two arrays is not enough for this last step, because one deleted file and one added file would cancel each other out. When completed, $diffs will either be an empty array or it will contain any discrepancies found in the form of a multi-dimensional array which might appear as follows:
Array
(
    [alt] => Array
        (
            [/var/www/test.php] => eae71874e2277a5bc77176db14ac14bf28465ec3
            [/var/www/sub/hello.php] => a5d5b61aa8a61b7d9d765e1daf971a9a578f1cfa
        )

    [add] => Array
        (
            [/var/www/sub/world.php] => da39a3ee5e6b4b0d3255bfef95601890afd80709
        )

)
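The classification can also be tried without a database. The following is a minimal sketch rather than the code used above: classify_changes() is a hypothetical helper that takes the stored profile and the current profile as arrays of path => hash, and the paths and shortened hashes are placeholders.

```php
<?php
// Hypothetical helper: compare a stored profile with the current one.
// Both arguments are arrays of file path => hash.
function classify_changes(array $stored, array $current)
{
    $diffs = array();

    // in the stored profile but missing from disk
    $deleted = array_diff_key($stored, $current);
    if (!empty($deleted)) {
        $diffs["del"] = $deleted;
    }

    // on disk but unknown to the stored profile
    $added = array_diff_key($current, $stored);
    if (!empty($added)) {
        $diffs["add"] = $added;
    }

    // present in both, but with a different hash (the new hash is reported)
    $altered = array_diff_assoc(array_intersect_key($current, $stored), $stored);
    if (!empty($altered)) {
        $diffs["alt"] = $altered;
    }

    return $diffs;
}

$stored = array(
    "/var/www/a.php" => "aaa111",
    "/var/www/b.php" => "bbb222",
    "/var/www/c.php" => "ccc333",
);
$current = array(
    "/var/www/a.php" => "aaa111", // unchanged
    "/var/www/b.php" => "bbb999", // altered
    "/var/www/d.php" => "ddd444", // added (c.php was deleted)
);

$diffs = classify_changes($stored, $current);
print_r($diffs);
```

Both profiles hold three entries here, which is why the helper relies on key comparisons rather than on array sizes.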
To display the results in a more user-friendly format, for an administrative interface or the like, you could for example loop through the results and output them in a bulleted list.
<?php
// display discrepancies
if (!empty($diffs)) {
    echo "<p>The following discrepancies were found:</p>";
    echo "<ul>";
    foreach ($diffs as $status => $affected) {
        if (is_array($affected) && !empty($affected)) {
            // nest the file list inside the status item so the markup is valid
            echo "<li>" . htmlspecialchars($status);
            echo "<ol>";
            foreach ($affected as $path => $hash) {
                // escape paths, since file names may contain HTML characters
                echo "<li>" . htmlspecialchars($path) . "</li>";
            }
            echo "</ol>";
            echo "</li>";
        }
    }
    echo "</ul>";
}
else {
    echo "<p>File structure is intact.</p>";
}
At this point you can either provide a link which triggers an action to update the database with the new file structure, in which case you might opt to store $files in a session variable, or if you don’t approve of the discrepancies you can address them however you see fit.
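For the cron job with e-mail notification mentioned earlier, a plain-text report is usually more convenient than HTML. Here is one possible sketch: build_report() is a hypothetical helper, the sample $diffs array simply mirrors the structure shown above, and the address in the commented-out mail() call is a placeholder.

```php
<?php
// Hypothetical helper: turn $diffs into a plain-text report.
function build_report(array $diffs)
{
    $labels = array("add" => "Added", "alt" => "Altered", "del" => "Deleted");
    $lines = array();
    foreach ($labels as $status => $label) {
        if (empty($diffs[$status])) {
            continue; // nothing to report for this status
        }
        $lines[] = $label . ":";
        foreach ($diffs[$status] as $path => $hash) {
            $lines[] = "  " . $path;
        }
    }
    return implode("\n", $lines);
}

// sample data in the same shape as the multi-dimensional $diffs array
$diffs = array(
    "alt" => array("/var/www/test.php" => "eae71874e2277a5bc77176db14ac14bf28465ec3"),
    "add" => array("/var/www/sub/world.php" => "da39a3ee5e6b4b0d3255bfef95601890afd80709"),
);

$report = build_report($diffs);
if ($report !== "") {
    echo $report . "\n";
    // in a cron job you might send the report instead, for example:
    // mail("admin@example.com", "File integrity report", $report);
}
```

An empty report means nothing changed, so the script can stay silent and only send a message when there is something to look at.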

Summary

Hopefully this guide has given you a better understanding of monitoring file integrity. Having something like this in place on your website is a valuable security measure: it won’t prevent changes, but it will tell you when your files no longer match what you intended. Of course, don’t forget to keep regular backups. You know… just in case.

Frequently Asked Questions (FAQs) on File Integrity Monitoring

What is the importance of file integrity monitoring in PHP applications?

File integrity monitoring is a critical aspect of maintaining the security and performance of PHP applications. It involves tracking and recording changes in files to ensure they remain in their original, unaltered state. This is important because unauthorized changes to files can lead to security breaches, data loss, and application downtime. By monitoring file integrity, developers can quickly identify and respond to any unauthorized changes, thereby minimizing potential damage.

How does file integrity monitoring work?

File integrity monitoring works by creating a baseline, or a snapshot, of a file’s original state. This includes information such as the file’s size, permissions, and hash value. The monitoring tool then continuously compares the current state of the file with this baseline. If any changes are detected, the tool alerts the administrator or takes predefined actions.
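As a rough illustration of such a baseline, the sketch below records a file’s size, permissions and hash, then reports which of them changed. snapshot() is a hypothetical helper and the temporary file exists only for the demonstration; a real monitoring tool will track more than this.

```php
<?php
// Hypothetical helper: capture the attributes of one file as a baseline.
function snapshot($path)
{
    clearstatcache(true, $path); // avoid stale cached size/permission data
    return array(
        "size"  => filesize($path),
        "perms" => substr(sprintf("%o", fileperms($path)), -4),
        "hash"  => hash_file("sha1", $path),
    );
}

// temporary file used only for this demonstration
$file = tempnam(sys_get_temp_dir(), "fim");
file_put_contents($file, "original contents");
$baseline = snapshot($file);

// later: the file is modified
file_put_contents($file, "tampered contents!");
$current = snapshot($file);

// list the attributes that no longer match the baseline
$changed = array_keys(array_diff_assoc($current, $baseline));
print_r($changed);

unlink($file);
```

Here the size and hash differ while the permissions stay the same, so only those two attributes are reported.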

What are some of the best tools for PHP application monitoring?

There are several effective tools for PHP application monitoring. These include SolarWinds, Atatus, and PHP Server Monitor. These tools offer features such as real-time monitoring, alerting, and detailed reporting, helping developers maintain the performance and security of their PHP applications.

How can I implement file integrity monitoring in my PHP application?

Implementing file integrity monitoring in a PHP application typically involves installing and configuring a monitoring tool. This tool should be set to monitor the files and directories that are critical to the application’s operation. The tool should also be configured to alert the appropriate personnel if any changes are detected.

What are the challenges of file integrity monitoring?

One of the main challenges of file integrity monitoring is managing false positives. This occurs when the monitoring tool incorrectly identifies a change as unauthorized. Another challenge is the potential for performance impact. Monitoring tools need to continuously scan files, which can consume system resources. Therefore, it’s important to choose a tool that offers efficient scanning capabilities.

How can I reduce false positives in file integrity monitoring?

Reducing false positives in file integrity monitoring can be achieved by properly configuring the monitoring tool. This includes setting the tool to ignore changes that are expected or authorized. Additionally, using a tool that offers intelligent alerting can help reduce false positives.

Can file integrity monitoring help with compliance?

Yes, file integrity monitoring can help with compliance. Many regulatory standards, such as PCI DSS and HIPAA, require organizations to implement file integrity monitoring. By demonstrating that you have a robust file integrity monitoring process in place, you can help ensure your organization meets these compliance requirements.

What is the role of file integrity monitoring in DevOps?

In a DevOps environment, file integrity monitoring plays a crucial role in maintaining the security and stability of the application. It allows for early detection of unauthorized changes, which can help prevent security incidents and reduce downtime. Additionally, it provides valuable insights that can inform the continuous improvement process.

How does file integrity monitoring contribute to cybersecurity?

File integrity monitoring is a key component of a comprehensive cybersecurity strategy. By detecting unauthorized changes to files, it can help prevent data breaches and other security incidents. Additionally, it provides a record of file changes, which can be useful for forensic investigations following a security incident.

Can file integrity monitoring be automated?

Yes, file integrity monitoring can be automated. Most monitoring tools offer automation features, such as scheduled scans and automatic alerts. This not only saves time but also ensures that file integrity is continuously monitored, even when administrators are not actively monitoring the system.

Martin Psinas

Martin E. Psinas is a self-taught web developer, published author, and is currently studying Japanese. For more information, visit his website.
