Automating Integrity Checks with Md5deep Scripts

Ensuring file integrity across large datasets or distributed systems is essential for security, backups, and compliance. Md5deep is a suite of command-line programs that compute and compare MD5, SHA-1, SHA-256, and other hashes recursively across directories. This article shows how to automate integrity checks with md5deep (and the related hashdeep tools), including setup, practical scripts, and best practices.

Why automate integrity checks

  • Detect corruption: Identify accidental file corruption during transfer or storage.
  • Detect tampering: Spot unauthorized changes to files.
  • Auditability: Maintain verifiable records for compliance or forensic needs.
  • Scale: Run regular checks across many files without manual effort.

Installing md5deep / hashdeep

On Debian/Ubuntu:

bash
sudo apt-get update
sudo apt-get install md5deep

On macOS with Homebrew:

bash
brew install md5deep

Note: hashdeep, which ships in the same package on many distributions, adds functionality such as multi-hash manifests and an audit mode that can recognize moved files; use it if available.
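
Because a given machine may have one tool, both, or neither, a script can probe for them before hashing. The following is a minimal sketch; the helper name first_available is hypothetical and not part of md5deep or hashdeep.

```bash
# Print the first command from the argument list that exists on PATH.
# Returns non-zero if none of the candidates is installed.
first_available() {
  local cmd
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      printf '%s\n' "$cmd"
      return 0
    fi
  done
  return 1
}

# Usage: prefer hashdeep and fall back to md5deep.
# HASHER=$(first_available hashdeep md5deep) || { echo "no hashing tool found" >&2; exit 1; }
```

Failing early with a clear message is usually easier to debug than a "command not found" buried in a cron log.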

Basic usage

Generate hashes for a directory recursively and save to a file:

bash
md5deep -r /path/to/data > hashes.md5

Or with hashdeep to include multiple algorithms:

bash
hashdeep -r -l -c md5,sha1,sha256 /path/to/data > hashes.txt

Key flags:

  • -r : recursive
  • -l : print relative paths instead of absolute ones (md5deep and hashdeep)
  • -c : specify algorithms (hashdeep)

Script: one-off verification

Create a simple script to verify current files against a baseline:

bash
#!/usr/bin/env bash
# Make the pipeline fail if md5deep fails, not only if sort fails.
set -o pipefail

BASELINE="/var/checksums/hashes.md5"   # must be sorted the same way as the current list
TARGET="/data"

if md5deep -r "$TARGET" | sort > /tmp/current.md5; then
  if diff -u "$BASELINE" /tmp/current.md5 > /tmp/diff.out; then
    echo "No changes detected."
    exit 0
  else
    echo "Changes detected:"
    cat /tmp/diff.out
    exit 2
  fi
else
  echo "Hashing failed."
  exit 1
fi

This generates current hashes, compares with the baseline, and prints differences.

Script: scheduled checks with email alerts

Use a cron job plus a script that emails diffs when changes appear:

bash
#!/usr/bin/env bash
BASELINE="/var/checksums/hashes.md5"
TARGET="/data"
REPORT="/var/log/integritycheck$(date +%F).log"

# Hash the target tree and sort so the list is comparable with the baseline.
md5deep -r "$TARGET" | sort > /tmp/current.md5

if diff -u "$BASELINE" /tmp/current.md5 > "$REPORT"; then
  echo "Integrity OK: $(date)" >> /var/log/integrity_checks_summary.log
else
  # Replace the placeholder recipient below with a real address.
  mail -s "Integrity check changes on $(hostname)" "[email protected]" < "$REPORT"
  # Optionally rotate baseline or record incident
fi

Cron entry to run daily at 02:00:

cron
0 2 * * * /usr/local/bin/integrity_check.sh

Script: incremental monitoring with hashdeep -a

hashdeep supports an audit mode (-a) that checks files directly against a saved manifest, which is simpler for repeated checks than diffing hash lists:

  1. Create a manifest:
bash
hashdeep -r -l -c sha256 /data > manifest.txt
  2. Audit against the manifest, using the same -r and -l options as when it was created:
bash
hashdeep -r -l -a -k manifest.txt /data

By default the audit prints only a pass or fail message. Add -v for per-category counts (matched, moved, new, missing) or -vv to list each discrepancy, which is easier to parse in automation.
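
One way to use the audit from a script is a thin wrapper that turns the result into the 0/1/2 exit codes used elsewhere in this article. This is a sketch under assumptions: it relies on hashdeep printing "Audit passed" or "Audit failed" rather than on hashdeep's own exit status, and the HASHDEEP variable is a hypothetical override for pointing at a different binary or a test stub.

```bash
# Run a hashdeep audit and map the result to wrapper exit codes:
# 0 = audit passed, 2 = audit failed, 1 = anything else (tool error, missing binary).
run_audit() {
  local manifest="$1" target="$2" output
  output=$("${HASHDEEP:-hashdeep}" -r -l -a -vv -k "$manifest" "$target" 2>&1)
  printf '%s\n' "$output"
  case $output in
    *[Aa]udit\ [Pp]assed*) return 0 ;;
    *[Aa]udit\ [Ff]ailed*) return 2 ;;
    *)                     return 1 ;;
  esac
}

# Usage: run_audit manifest.txt /data > /var/log/integrity_audit.log
```

Matching on the message text keeps the wrapper independent of how a particular hashdeep version sets its return value.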

Handling renamed or moved files

  • Use hashdeep’s audit mode (-a with -k manifest.txt): it matches files by hash, so a renamed file is reported as moved rather than as one missing and one new file. Add -vv to list each moved file.
  • Maintain metadata (file size, timestamps) in your manifest if needed to reduce false positives.
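
If only plain md5deep hash lists are available, the same idea can be approximated by matching hashes across the two lists. The sketch below uses a hypothetical function name, find_moved, and assumes lines of the form "<hash>  <path>". It keeps only one baseline path per hash, so files with identical content may be paired arbitrarily.

```bash
# Report files whose hash is unchanged but whose path differs between two lists.
find_moved() {
  local baseline="$1" current="$2"
  awk '
    NF == 0 { next }
    { hash = $1; path = substr($0, length($1) + 3) }
    FILENAME == ARGV[1] { old_path[hash] = path; old_seen[path] = 1; next }
    { cur_hash[path] = hash; cur_seen[path] = 1 }
    END {
      for (p in cur_hash) {
        h = cur_hash[p]
        # Moved: hash is known, the current path is new, and the old path is gone.
        if ((h in old_path) && !(p in old_seen) && !(old_path[h] in cur_seen))
          print "MOVED", old_path[h], "->", p
      }
    }
  ' "$baseline" "$current" | sort
}

# Usage: find_moved /var/checksums/hashes.md5 /tmp/current.md5
```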

Best practices

  • Use stronger hashes (SHA-256) for security-sensitive use cases; MD5 is fast but vulnerable to collisions.
  • Store baselines on write-once or access-restricted storage.
  • Sign manifests (GPG) to detect tampering of the baseline itself.
  • Log results centrally (syslog, SIEM) for long-term auditing.
  • Combine with file system snapshots or version control for easy recovery.
  • Test restores regularly to ensure integrity checks reflect actual recoverability.
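
Signing the manifest, as suggested above, could look roughly like the following. It assumes gpg is installed and a signing key is already set up; the function names and the ".asc" location next to the manifest are illustrative choices.

```bash
# Create an ASCII-armored detached signature next to the manifest (<manifest>.asc).
sign_manifest() {
  gpg --armor --detach-sign --output "$1.asc" "$1"
}

# Succeed only if a signature file exists and verifies against the manifest.
manifest_is_trusted() {
  [ -f "$1.asc" ] && gpg --verify "$1.asc" "$1" >/dev/null 2>&1
}

# Usage: refuse to audit against a baseline that fails verification.
# manifest_is_trusted /var/checksums/manifest.txt || { echo "manifest signature invalid" >&2; exit 1; }
```

Note that gpg --verify accepts a good signature from any key in the keyring, so the verifying machine should hold only the keys that are trusted to sign baselines.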

Parsing and integrating results

  • For programmatic actions, parse hashdeep output (exit codes and categorized lines) and integrate with monitoring systems (Prometheus, ELK).
  • Example: the verification script above exits with 0 for no changes, 1 for hashing errors, and 2 for differences. These codes belong to the wrapper, not to md5deep itself, so check both the output and the exit code.
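
As an illustration of parsing categorized lines, the sketch below converts the summary counts from a verbose audit into Prometheus-style text metrics. The label wording it matches ("Files matched:", "Files moved:", "New files found:", "Known files not found:") is an assumption about the output of hashdeep -a -v and should be checked against your version; the metric name integrity_audit_files is made up for this example.

```bash
# Read a hashdeep audit summary on stdin and print one metric line per category.
audit_to_metrics() {
  awk '
    function emit(state, value) {
      printf "integrity_audit_files{state=\"%s\"} %d\n", state, value
    }
    /^ *Files matched:/         { emit("matched", $NF) }
    /^ *Files moved:/           { emit("moved", $NF) }
    /^ *New files found:/       { emit("new", $NF) }
    /^ *Known files not found:/ { emit("missing", $NF) }
  '
}

# Usage: hashdeep -r -l -a -v -k manifest.txt /data 2>&1 | audit_to_metrics > integrity.prom
```

Writing the result to a file lets a metrics agent pick it up on its own schedule instead of waiting for the hash run.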

Example: systemd timer + service

Use systemd for more reliable scheduling than cron:

  • /etc/systemd/system/integrity-check.service
ini
[Unit]
Description=Integrity check service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/integrity_check.sh
  • /etc/systemd/system/integrity-check.timer
ini
[Unit]
Description=Run integrity check daily

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target

Enable and start:

bash
sudo systemctl enable --now integrity-check.timer

Troubleshooting tips

  • If hashing is slow, limit scope or use faster algorithms; run on dedicated hardware or during low load.
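  • Exclude temporary or volatile paths (e.g., /tmp, caches). md5deep and hashdeep do not, as far as their man pages document, offer an exclude flag, so build the file list with find and pass it to the hasher.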
  • Ensure consistent environment (mount options, character encodings) when comparing across systems.
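
One way to leave volatile paths out of a run is to let find produce the file list and feed it to md5deep through xargs. The helper below is a sketch with a hypothetical name, list_files_excluding; it assumes a find that supports -print0 (GNU and BSD find both do).

```bash
# Print NUL-separated regular files under ROOT, skipping any path that
# matches one of the given shell patterns.
list_files_excluding() {
  local root="$1" remaining
  shift
  remaining=$#
  # Rewrite each pattern argument into a "! -path PATTERN" test for find.
  while [ "$remaining" -gt 0 ]; do
    set -- "$@" ! -path "$1"
    shift
    remaining=$((remaining - 1))
  done
  find "$root" -type f "$@" -print0
}

# Usage: hash everything under /data except cache and tmp directories.
# list_files_excluding /data '*/cache/*' '*/tmp/*' | xargs -0 md5deep | sort > /tmp/current.md5
```

NUL separators keep file names with spaces or newlines intact on their way to xargs.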

Conclusion

Automated integrity checks with md5deep/hashdeep are straightforward to implement and provide a practical defense against corruption and tampering. Use manifests, scheduled scripts or systemd timers, stronger hashes where needed, and keep baselines protected and signed for best results.
