Automating Integrity Checks with Md5deep Scripts
Ensuring file integrity across large datasets or distributed systems is essential for security, backups, and compliance. Md5deep is a suite of command-line programs that compute and compare MD5, SHA-1, SHA-256, and other hashes recursively across directories. This article shows how to automate integrity checks with md5deep (and the related hashdeep tools), including setup, practical scripts, and best practices.
Why automate integrity checks
- Detect corruption: Identify accidental file corruption during transfer or storage.
- Detect tampering: Spot unauthorized changes to files.
- Auditability: Maintain verifiable records for compliance or forensic needs.
- Scale: Run regular checks across many files without manual effort.
Installing md5deep / hashdeep
On Debian/Ubuntu:
sudo apt-get update
sudo apt-get install md5deep
On macOS with Homebrew:
brew install md5deep
Note: hashdeep ships in the same package as md5deep on most distributions (on some, the package itself is named hashdeep). It can compute several hashes per file and has an audit mode for checking files against a saved manifest; use it if available.
Basic usage
Generate hashes for a directory recursively and save to a file:
md5deep -r /path/to/data > hashes.md5
Or with hashdeep to include multiple algorithms:
hashdeep -r -l -c md5,sha1,sha256 /path/to/data > hashes.txt
Key flags:
- -r : recursive
- -l : print relative paths instead of absolute ones (it does not control symlink handling)
- -c : specify algorithms (hashdeep)
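A baseline is only useful if it is complete, so it helps to write it atomically. The following is a minimal sketch, not part of md5deep itself: the function name write_baseline and the temporary file naming are assumptions made for illustration. It hashes into a temporary file, sorts the result so later comparisons do not depend on traversal order, and replaces the baseline only if hashes were actually produced.

```bash
# Sketch: write a sorted baseline without ever leaving a truncated file behind.
# write_baseline is a hypothetical helper name, not an md5deep feature.
write_baseline() {
  target=$1            # directory to hash
  out=$2               # baseline file to create or replace
  tmp="$out.tmp.$$"    # temporary file next to the baseline

  # Hash recursively and sort so the line order is stable between runs.
  md5deep -r "$target" | LC_ALL=C sort > "$tmp"

  # Refuse to install an empty baseline (e.g. wrong path or unreadable target).
  if [ ! -s "$tmp" ]; then
    echo "no hashes produced for $target; baseline left untouched" >&2
    rm -f "$tmp"
    return 1
  fi

  # Rename is atomic on the same file system.
  mv "$tmp" "$out"
}
```

Usage would look like write_baseline /path/to/data /var/checksums/hashes.md5, run once after the data is known to be in a good state.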
Script: one-off verification
Create a simple script to verify current files against a baseline:
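#!/usr/bin/env bash
set -o pipefail   # fail the pipeline if md5deep fails, not only if sort fails

BASELINE="/var/checksums/hashes.md5"
TARGET="/data"
CURRENT="$(mktemp)"
DIFF_OUT="$(mktemp)"
trap 'rm -f "$CURRENT" "$DIFF_OUT"' EXIT

# Sort both lists the same way so ordering differences are not reported as changes.
if md5deep -r "$TARGET" | sort > "$CURRENT"; then
  if sort "$BASELINE" | diff -u - "$CURRENT" > "$DIFF_OUT"; then
    echo "No changes detected."
    exit 0
  else
    echo "Changes detected:"
    cat "$DIFF_OUT"
    exit 2
  fi
else
  echo "Hashing failed."
  exit 1
fi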
This generates current hashes, compares with the baseline, and prints differences.
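A unified diff is readable for a person but awkward for a script to act on. As a rough sketch, the helper below (classify_changes is a hypothetical name) compares two hash lists and labels each path as added, modified, or removed. It assumes md5deep's default line layout of hash, two spaces, then path, and that both lists were produced with the same path style.

```bash
# Sketch: classify differences between two md5deep-style hash lists.
# Assumes each line is "<hash><two spaces><path>"; classify_changes is a made-up name.
classify_changes() {
  baseline=$1
  current=$2
  awk '
    { hash = $1; path = substr($0, length($1) + 3) }   # keep spaces inside paths
    FILENAME == ARGV[1] { base[path] = hash; next }     # first file: the baseline
    {
      seen[path] = 1
      if (!(path in base))         print "ADDED\t" path
      else if (base[path] != hash) print "MODIFIED\t" path
    }
    END {
      for (p in base) if (!(p in seen)) print "REMOVED\t" p
    }
  ' "$baseline" "$current" | LC_ALL=C sort
}
```

The output is one tab-separated line per changed path, which is easy to count, filter, or forward to an alerting tool.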
Script: scheduled checks with email alerts
Use a cron job plus a script that emails diffs when changes appear:
#!/usr/bin/env bash
BASELINE="/var/checksums/hashes.md5"
TARGET="/data"
REPORT="/var/log/integritycheck$(date +%F).log"
RECIPIENT="admin@example.com"   # placeholder address; replace with your own
CURRENT="$(mktemp)"
trap 'rm -f "$CURRENT"' EXIT

md5deep -r "$TARGET" | sort > "$CURRENT"
# Sort the baseline the same way so ordering differences are not reported.
if sort "$BASELINE" | diff -u - "$CURRENT" > "$REPORT"; then
  echo "Integrity OK: $(date)" >> /var/log/integrity_checks_summary.log
else
  mail -s "Integrity check changes on $(hostname)" "$RECIPIENT" < "$REPORT"
  # Optionally rotate baseline or record incident
fi
Cron entry to run daily at 02:00:
0 2 * * * /usr/local/bin/integrity_check.sh
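Script: repeated checks with hashdeep -a
hashdeep supports an audit mode that compares files against a saved manifest in a single step, so no separate diff is needed. It still rehashes every file on each run, so it is simpler rather than faster: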
- Create a manifest:
hashdeep -r -l -c sha256 /data > manifest.txt
- Audit against the manifest:
hashdeep -r -l -a -k manifest.txt /data
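By default the audit prints only a pass or fail verdict. Add -v for summary counts (matched, moved, new, and missing files) or -vv to list each discrepancy, which is the form most useful for parsing in automation.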
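To use the audit in automation, a thin wrapper can turn its verdict into a predictable exit code. The sketch below is one possible shape, with assumptions stated plainly: run_audit is a hypothetical name, the 0/1/2 mapping is this wrapper's own convention, and the "Audit passed" and "Audit failed" strings are what hashdeep 4.x is generally seen to print, so confirm them against your installed version.

```bash
# Sketch: run a hashdeep audit and map the verdict to 0 (ok), 2 (differences), 1 (error).
# The matched strings are an assumption about hashdeep's output; verify locally.
run_audit() {
  manifest=$1
  target=$2

  # -v adds summary counts to the verdict line; stderr is captured as well.
  report=$(hashdeep -r -l -a -v -k "$manifest" "$target" 2>&1)
  printf '%s\n' "$report"

  case $report in
    *"Audit passed"*) return 0 ;;
    *"Audit failed"*) return 2 ;;
    *)                return 1 ;;   # no verdict at all: treat as an error
  esac
}
```

A scheduled job could call run_audit manifest.txt /data and alert only when the return value is non-zero.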
Handling renamed or moved files
- Run the hashdeep audit (-a -k manifest.txt) with -v or -vv: files whose hash matches a manifest entry under a different path are reported as moved rather than as one missing file plus one new file.
- hashdeep manifests already record each file's size next to its hashes; track other metadata such as timestamps separately if you need it to reduce false positives.
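The same idea can be applied to plain md5deep hash lists without audit mode. This is a sketch under assumptions: find_moved is a hypothetical helper, both lists use md5deep's default "hash, two spaces, path" layout, and files with identical content (and therefore identical hashes) may be paired arbitrarily.

```bash
# Sketch: report files that disappeared under one path and reappeared,
# with the same hash, under another. find_moved is a made-up helper name.
find_moved() {
  baseline=$1
  current=$2
  awk '
    { hash = $1; path = substr($0, length($1) + 3) }
    FILENAME == ARGV[1] { base_hash[path] = hash; next }
    { cur_hash[path] = hash }
    END {
      # Index baseline paths that no longer exist by their hash.
      for (p in base_hash) if (!(p in cur_hash)) gone[base_hash[p]] = p
      # A new path whose hash matches a vanished one is treated as a move.
      for (p in cur_hash)
        if (!(p in base_hash) && (cur_hash[p] in gone))
          print gone[cur_hash[p]] " -> " p
    }
  ' "$baseline" "$current" | LC_ALL=C sort
}
```

Paths reported here can be excluded from tampering alerts, while genuinely new or missing files still stand out.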
Best practices
- Use stronger hashes (SHA-256) for security-sensitive use cases; MD5 is fast but vulnerable to collisions.
- Store baselines on write-once or access-restricted storage.
- Sign manifests (GPG) to detect tampering of the baseline itself.
- Log results centrally (syslog, SIEM) for long-term auditing.
- Combine with file system snapshots or version control for easy recovery.
- Test restores regularly to ensure integrity checks reflect actual recoverability.
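Signing only helps if the signature is checked before the manifest is trusted. A minimal sketch follows; verify_manifest is a hypothetical name, and it assumes a detached signature stored next to the manifest as manifest.txt.sig, which is the file that gpg --detach-sign manifest.txt writes by default.

```bash
# Sketch: refuse to use a manifest unless its detached GPG signature verifies.
# verify_manifest and the ".sig" location are assumptions for illustration.
verify_manifest() {
  manifest=$1
  sig=${2:-$manifest.sig}   # default: signature stored beside the manifest

  if [ ! -f "$sig" ]; then
    echo "missing signature: $sig" >&2
    return 1
  fi

  # gpg exits non-zero if the signature is bad or the signing key is unknown.
  gpg --verify "$sig" "$manifest"
}
```

A check script could then run verify_manifest manifest.txt && hashdeep -r -l -a -k manifest.txt /data, so a tampered manifest stops the audit before it can report a false pass.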
Parsing and integrating results
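- For programmatic actions, parse the audit output (the pass or fail verdict and the -v summary lines) together with the exit code, and feed the result into monitoring systems (Prometheus, ELK).
- Example: exit code mapping for a wrapper script such as the one-off verifier above: 0 no changes, 1 errors, 2 differences. These values are the wrapper's own convention and are not the exit status that md5deep or hashdeep themselves return, so check both the output and the exit code.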
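For Prometheus, one low-effort route is a metrics file picked up by node_exporter's textfile collector. The sketch below is an assumption-laden example: the function name, the metric names, and the output path are all invented for illustration, and the status value follows the 0/1/2 wrapper mapping described above.

```bash
# Sketch: publish the result of an integrity check in Prometheus text format.
# Function and metric names are hypothetical; pick names that fit your conventions.
write_integrity_metrics() {
  status=$1      # wrapper exit code: 0 ok, 1 error, 2 differences
  out=$2         # e.g. a .prom file in the textfile collector directory
  tmp="$out.$$"

  {
    echo '# HELP integrity_check_status Exit code of the last integrity check (0 = no changes).'
    echo '# TYPE integrity_check_status gauge'
    echo "integrity_check_status $status"
    echo '# HELP integrity_check_last_run_timestamp_seconds Unix time of the last integrity check.'
    echo '# TYPE integrity_check_last_run_timestamp_seconds gauge'
    echo "integrity_check_last_run_timestamp_seconds $(date +%s)"
  } > "$tmp" && mv "$tmp" "$out"   # rename so the collector never reads a partial file
}
```

Recording the timestamp alongside the status makes it possible to alert on checks that silently stopped running, not only on checks that failed.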
Example: systemd timer + service
Use a systemd timer as an alternative to cron; with Persistent=true it also catches up on a run that was missed while the machine was powered off:
- /etc/systemd/system/integrity-check.service
[Unit]
Description=Integrity check service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/integrity_check.sh
- /etc/systemd/system/integrity-check.timer
[Unit]
Description=Run integrity check daily

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
Enable and start:
sudo systemctl daemon-reload
sudo systemctl enable --now integrity-check.timer
Troubleshooting tips
- If hashing is slow, limit scope or use faster algorithms; run on dedicated hardware or during low load.
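- Exclude temporary or volatile paths (e.g., /tmp, caches). md5deep and hashdeep do not offer a dedicated exclude option, so build the file list with find (using -prune) and pass it to the hashing tool.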
- Ensure consistent environment (mount options, character encodings) when comparing across systems.
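Volatile directories can be filtered out before hashing by building the file list with find. The helper below is a sketch: list_files_excluding is a hypothetical name, the excluded directories are given relative to the target, and -print0 assumes GNU or BSD find.

```bash
# Sketch: print a NUL-separated list of regular files under a target directory,
# skipping the named subdirectories. list_files_excluding is a made-up helper.
list_files_excluding() (
  target=$1
  shift

  # No exclusions requested: list everything.
  if [ "$#" -eq 0 ]; then
    find "$target" -type f -print0
    exit
  fi

  # Append "-o -path DIR" for every excluded directory to the argument list.
  count=$#
  for dir do
    set -- "$@" -o -path "$target/$dir"
  done
  shift "$((count + 1))"   # drop the original names and the leading -o

  # Prune the excluded directories, print every other regular file.
  find "$target" \( "$@" \) -prune -o -type f -print0
)
```

A typical use would be list_files_excluding /data tmp cache | xargs -0 md5deep > hashes.md5. The NUL separators keep file names with spaces or newlines intact.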
Conclusion
Automated integrity checks with md5deep/hashdeep are straightforward to implement and provide a practical defense against corruption and tampering. Use manifests, scheduled scripts or systemd timers, stronger hashes where needed, and keep baselines protected and signed for best results.