Final Assignment Technical Documentation - Snowboundport37/champlain GitHub Wiki
IOC scraping, Apache log matching, and HTML report generation
Author: Andrei Gorlitsky
Course: CSI 230 and SYS 320
Date: 12/5/2025
This document records everything I did to complete the three-challenge final assignment. The workflow pulls IOCs from a webpage, checks them against an Apache access log, and generates a browser-viewable HTML report. Each challenge is implemented as a separate Bash script because the grading requires three independent files.
I used a Linux VM with Apache serving files from:
/var/www/html
My scripts were stored and executed from my home directory:
/home/champuser
Host IOC.html on my own website
The assignment requires scraping an IOC page that will grow over time. To ensure I could reliably curl the page from my own environment, I created or copied the IOC page into the Apache web root.
Commands used:
cd /var/www/html
sudo tee IOC.html >/dev/null << 'EOF'
<html>
<head>
<title>Indicators of Compromise</title>
</head>
<body>
<h1>Indicators of Compromise</h1>
<table border="1">
<tr><th>IOC</th></tr>
<tr><td>/wp-admin.php</td></tr>
<tr><td>/phpmyadmin/</td></tr>
<tr><td>192.168.5.50</td></tr>
<tr><td>evil.example.com</td></tr>
</table>
</body>
</html>
EOF
I verified that the page loaded in the browser.
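Since the assignment expects the IOC page to grow over time, each new indicator has to land inside the existing table. A minimal sketch of one way to do that, assuming a hypothetical helper named add_ioc (not part of the graded scripts) and a page whose table ends with a </table> line, as in the file above:

```shell
# add_ioc FILE IOC: hypothetical helper, not part of the graded scripts.
# Inserts a new <tr><td>IOC</td></tr> row just before the closing </table> line.
add_ioc() {
    file="$1"
    ioc="$2"
    tmp=$(mktemp) || return 1
    # Passing the value through ENVIRON avoids awk -v backslash-escape processing.
    IOC_VALUE="$ioc" awk '
        /<\/table>/ { print "<tr><td>" ENVIRON["IOC_VALUE"] "</td></tr>" }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

For the copy in /var/www/html, the helper would need to run from a shell that has write access to that file.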
Scrape IOC.html and save results to IOC.txt
Create a Bash script that retrieves the IOC webpage and outputs a clean IOC list into IOC.txt.
I used curl to fetch the HTML and extracted the text inside <td> tags using a regex. The output is cleaned and written line by line into IOC.txt.
getIOC.bash
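#!/bin/bash
# getIOC.bash: download the IOC page and write one IOC per line to IOC.txt.
IOC_URL="http://10.0.17.19/IOC.html"
OUT_FILE="IOC.txt"
# Fetch the page quietly; an empty response is treated as a failure.
page=$(curl -s "$IOC_URL")
if [ -z "$page" ]; then
    echo "Failed to download IOC page" >&2
    exit 1
fi
# Extract the text between <td> and </td>, trim whitespace, and drop empty lines.
echo "$page" \
    | grep -oP '(?<=<td>).*?(?=</td>)' \
    | sed 's/^[[:space:]]*//;s/[[:space:]]*$//' \
    | sed '/^[[:space:]]*$/d' \
    > "$OUT_FILE"
echo "Saved IOCs to $OUT_FILE"
Commands used:
cd ~
chmod +x getIOC.bash
./getIOC.bash
cat IOC.txt
Output:
/wp-admin.php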
/phpmyadmin/
192.168.5.50
evil.example.com
This matched the expected format from the assignment.
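The grep -oP lookaround in getIOC.bash depends on a grep built with PCRE support. As a hedged alternative for systems without it, the same extraction can be sketched with basic regular expressions. The function name extract_iocs is hypothetical, and the sketch assumes each cell holds plain text with no nested tags:

```shell
# extract_iocs: hypothetical helper; reads HTML on stdin, prints one IOC per line.
# Assumes each cell is plain text (no nested tags inside <td>...</td>).
extract_iocs() {
    grep -o '<td>[^<]*</td>' \
        | sed -e 's/^<td>//' -e 's/<\/td>$//' \
              -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        | sed '/^$/d'
}

# Offline check against a fragment shaped like IOC.html.
printf '%s\n' '<tr><th>IOC</th></tr>' \
    '<tr><td> /wp-admin.php </td></tr>' \
    '<tr><td>192.168.5.50</td><td>evil.example.com</td></tr>' \
    '<tr><td>  </td></tr>' \
    | extract_iocs
```

The header row is skipped because it uses <th>, and the whitespace-only cell is dropped after trimming, so the fragment yields three lines.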
Search access log for IOC matches and save to report.txt
Create a Bash script that takes two inputs, an Apache access log file and an IOC file, and writes only the IP address, date and time, and page accessed to report.txt.
The provided sample access log did not contain the four IOCs from my IOC webpage. A correct script would return no matches in the default state. To demonstrate that the script logic works end to end in a way that matches the assignment example output, I appended three test log entries that contained IOC patterns.
findIOC.bash
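#!/bin/bash
# findIOC.bash: report IP, timestamp, and requested page for log lines that contain an IOC.
if [ $# -ne 2 ]; then
    echo "Usage: $0 <access_log> <ioc_file>" >&2
    exit 1
fi
LOG="$1"
IOC="$2"
OUT="report.txt"
TMP="matches.tmp"
if [ ! -f "$LOG" ]; then
    echo "Log file not found: $LOG" >&2
    exit 1
fi
if [ ! -f "$IOC" ]; then
    echo "IOC file not found: $IOC" >&2
    exit 1
fi
# Start with empty output and scratch files.
> "$OUT"
> "$TMP"
# Collect every log line that contains an IOC as a fixed string.
# The -- keeps an IOC that begins with a dash from being read as a grep option.
while IFS= read -r ioc; do
    [ -z "$ioc" ] && continue
    grep -F -- "$ioc" "$LOG" >> "$TMP"
done < "$IOC"
if [ ! -s "$TMP" ]; then
    echo "No matches found for IOCs in $LOG"
    rm "$TMP"
    exit 0
fi
# Fields: $1 = client IP, $4 and $5 = [date:time zone], $7 = requested path.
awk '{
    ip = $1;
    dt = $4 " " $5;
    gsub(/\[/, "", dt);
    gsub(/\]/, "", dt);
    page = $7;
    print ip "\t" dt "\t" page
}' "$TMP" | sort -u > "$OUT"
rm "$TMP"
echo "Saved results to $OUT"
Test entries appended to the log:
cat << 'EOF' >> ~/Downloads/access.log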
10.0.17.20 - - [04/Mar/2024:15:00:00 -0500] "GET /wp-admin.php HTTP/1.1" 200 512 "-" "Mozilla"
10.0.17.21 - - [04/Mar/2024:15:00:05 -0500] "GET /phpmyadmin/ HTTP/1.1" 404 295 "-" "Mozilla"
10.0.17.22 - - [04/Mar/2024:15:00:10 -0500] "GET /index.html HTTP/1.1" 200 758 "http://evil.example.com" "Mozilla"
EOF
Commands used:
cd ~
chmod +x findIOC.bash
./findIOC.bash ~/Downloads/access.log IOC.txt
cat report.txt
Output:
10.0.17.20 04/Mar/2024:15:00:00 -0500 /wp-admin.php
10.0.17.21 04/Mar/2024:15:00:05 -0500 /phpmyadmin/
10.0.17.22 04/Mar/2024:15:00:10 -0500 /index.html
This matches the required output fields and structure for the assignment.
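The loop in findIOC.bash runs grep once per IOC, so a log line that contains two IOCs is collected twice before sort -u removes the duplicate. A possible variant, sketched here rather than taken from the submitted script, hands the whole IOC file to grep with -f in a single pass. The name match_iocs is hypothetical:

```shell
# match_iocs LOG IOC_FILE: hypothetical helper, not the graded script.
# Prints "IP<TAB>date time zone<TAB>page" for each log line containing any IOC.
match_iocs() {
    log="$1"
    ioc_file="$2"
    patterns=$(mktemp) || return 1
    # A blank pattern line would match every log line, so drop blanks first.
    grep -v '^[[:space:]]*$' "$ioc_file" > "$patterns"
    # -F: fixed strings, -f: read one pattern per line from the file.
    grep -F -f "$patterns" -- "$log" \
        | awk '{
            dt = $4 " " $5              # e.g. [04/Mar/2024:15:00:00 -0500]
            gsub(/\[|\]/, "", dt)       # strip the square brackets
            print $1 "\t" dt "\t" $7
        }' \
        | sort -u
    rm -f "$patterns"
}
```

Like the original loop, this matches an IOC anywhere on the line, which is why the evil.example.com referrer entry is reported with /index.html as its page.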
Turn report.txt into an HTML report and deploy to Apache web root
Create a Bash script that reads report.txt, builds an HTML table, and moves the final report to:
/var/www/html/report.html
so it can be accessed from a browser after running the script.
htmlReport.bash
#!/bin/bash
# htmlReport.bash: convert report.txt into an HTML table and publish it in the web root.
REPORT="report.txt"
HTML="report.html"
WEBROOT="/var/www/html"
if [ ! -f "$REPORT" ]; then
    echo "report.txt missing. Run findIOC.bash first." >&2
    exit 1
fi
# Build the page in the current directory first.
{
    echo "<html>"
    echo "<head>"
    echo "<title>IOC Apache Log Report</title>"
    echo "<style>"
    echo "table { border-collapse: collapse; }"
    echo "th, td { border: 1px solid black; padding: 6px; }"
    echo "th { background-color: #dddddd; }"
    echo "</style>"
    echo "</head>"
    echo "<body>"
    echo "<h1>IOC Apache Log Report</h1>"
    echo "<table>"
    echo "<tr><th>IP</th><th>Date and Time</th><th>Page</th></tr>"
    # One table row per tab-separated line of report.txt.
    while IFS=$'\t' read -r ip datetime page; do
        [ -z "$ip" ] && continue
        echo "<tr><td>$ip</td><td>$datetime</td><td>$page</td></tr>"
    done < "$REPORT"
    echo "</table>"
    echo "</body>"
    echo "</html>"
} > "$HTML"
# Move the finished page into the Apache web root.
sudo mv "$HTML" "$WEBROOT/report.html"
echo "HTML report created at $WEBROOT/report.html"
echo "Open in browser: http://10.0.17.19/report.html"
Commands used:
cd ~
chmod +x htmlReport.bash
./htmlReport.bash
The report displayed a formatted table with the same three rows from report.txt, confirming that the pipeline worked end to end.
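The row template in htmlReport.bash writes the log fields into the page unchanged, and the requested path in an access log is supplied by the client. A hedged sketch of escaping a field before it goes into a table cell follows. The helper html_escape is hypothetical and is not in the submitted script:

```shell
# html_escape: hypothetical helper; reads text on stdin and escapes HTML metacharacters.
# The ampersand is replaced first so the entities added afterwards are not escaped again.
html_escape() {
    sed -e 's/&/\&amp;/g' \
        -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g' \
        -e 's/"/\&quot;/g'
}

# Example row built from an untrusted path value.
page='/index.html?q=<script>alert(1)</script>'
safe_page=$(printf '%s\n' "$page" | html_escape)
echo "<tr><td>10.0.17.22</td><td>04/Mar/2024:15:00:10 -0500</td><td>$safe_page</td></tr>"
```

In the script's while loop, the same call could be applied to ip, datetime, and page before the echo that builds each row.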
This is the exact sequence I used to run the full pipeline cleanly from scratch:
cd /var/www/html
sudo nano IOC.html
cd ~
chmod +x getIOC.bash
./getIOC.bash
chmod +x findIOC.bash
./findIOC.bash ~/Downloads/access.log IOC.txt
chmod +x htmlReport.bash
./htmlReport.bash
I produced and verified all required outputs:
Challenge 1: getIOC.bash generates IOC.txt
Challenge 2: findIOC.bash generates report.txt
Challenge 3: htmlReport.bash generates report.html and deploys it into /var/www/html
All three scripts were uploaded to GitHub as separate files to match grading requirements.
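The run sequence above could also be wrapped so that a failed step stops the rest. A minimal sketch, assuming the three scripts are executable in the current directory and that each exits with a non-zero status on failure. The name run_pipeline is hypothetical:

```shell
# run_pipeline LOG: hypothetical wrapper around the three challenge scripts.
# Stops at the first script that exits with a non-zero status.
run_pipeline() {
    log_file="$1"
    ./getIOC.bash || { echo "getIOC.bash failed" >&2; return 1; }
    ./findIOC.bash "$log_file" IOC.txt || { echo "findIOC.bash failed" >&2; return 1; }
    ./htmlReport.bash || { echo "htmlReport.bash failed" >&2; return 1; }
}
```

From the home directory it would be called as run_pipeline ~/Downloads/access.log. Note that findIOC.bash exits 0 when nothing matches, so in that case the HTML step still runs and produces a table with only the header row.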