Final Assignment Technical Documentation

IOC scraping, Apache log matching, and HTML report generation

Author: Andrei Gorlitsky
Course: CSI 230 and SYS 320
Date: 12/5/2025

This document is a complete record of everything I did to complete the three-challenge final assignment. The workflow pulls IOCs from a webpage, checks them against an Apache access log, and generates a browser-viewable HTML report. Each challenge is implemented as a separate Bash script because the grading requires three independent files.


Environment and working setup

I used a Linux VM with Apache serving files from:

/var/www/html

My scripts were stored and executed from my home directory:

/home/champuser
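
A quick sanity check (assuming the Debian-style service name apache2) confirms Apache is running and the web root exists:

systemctl is-active apache2
ls -l /var/www/html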


Step 1

Host IOC.html on my own website

The assignment requires scraping an IOC page that will grow over time. To make sure I could reliably curl the page from my own environment, I placed the IOC page in the Apache web root.

Commands used:

cd /var/www/html

sudo tee IOC.html >/dev/null << 'EOF'
<html>
  <head>
    <title>Indicators of Compromise</title>
  </head>
  <body>
    <h1>Indicators of Compromise</h1>
    <table border="1">
      <tr><th>IOC</th></tr>
      <tr><td>/wp-admin.php</td></tr>
      <tr><td>/phpmyadmin/</td></tr>
      <tr><td>192.168.5.50</td></tr>
      <tr><td>evil.example.com</td></tr>
    </table>
  </body>
</html>
EOF

I verified the page loaded in the browser:

http://10.0.17.19/IOC.html
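
The same check can also be done from the command line with curl (a minimal verification sketch; a 200 status line means the page is being served):

curl -sI http://10.0.17.19/IOC.html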


Challenge 1

Scrape IOC.html and save results to IOC.txt

Goal

Create a Bash script that retrieves the IOC webpage and outputs a clean IOC list into IOC.txt.

Approach

I used curl to fetch the HTML and extracted the text inside <td> tags using a regex. The output is cleaned and written line by line into IOC.txt.
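
The extraction uses a Perl-compatible lookbehind/lookahead, which GNU grep supports through -P. A quick way to see what the pattern matches on a single table row:

echo '<tr><td>/wp-admin.php</td></tr>' | grep -oP '(?<=<td>).*?(?=</td>)'

This prints /wp-admin.php, which is exactly the text the script writes to IOC.txt.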

Script file name

getIOC.bash

Script contents:

#!/bin/bash

IOC_URL="http://10.0.17.19/IOC.html"
OUT_FILE="IOC.txt"

# Fetch the IOC page; an empty response is treated as a failed download
page=$(curl -s "$IOC_URL")

if [ -z "$page" ]; then
    echo "Failed to download IOC page"
    exit 1
fi

echo "$page" \
  | grep -oP '(?<=<td>).*?(?=</td>)' \
  | sed 's/^[[:space:]]*//;s/[[:space:]]*$//' \
  | sed '/^[[:space:]]*$/d' \
  > "$OUT_FILE"

echo "Saved IOCs to $OUT_FILE"

Commands used:

cd ~

chmod +x getIOC.bash
./getIOC.bash
cat IOC.txt

Observed output:

/wp-admin.php
/phpmyadmin/
192.168.5.50
evil.example.com

This matched the expected format from the assignment.


Challenge 2

Search access log for IOC matches and save to report.txt

Goal

Create a Bash script that takes two inputs, an Apache access log file and an IOC file, and writes only the IP, date/time, and page accessed to report.txt.

Important testing detail

The provided sample access log did not contain any of the four IOCs from my IOC webpage, so a correct script would report no matches by default. To demonstrate that the script logic works end to end and matches the assignment's example output, I appended three test log entries containing IOC patterns (shown below under the demonstration dataset).

Script file name

findIOC.bash

Script contents:

#!/bin/bash

if [ $# -ne 2 ]; then
    echo "Usage: $0 <access_log> <ioc_file>"
    exit 1
fi

LOG="$1"
IOC="$2"
OUT="report.txt"
TMP="matches.tmp"

if [ ! -f "$LOG" ]; then
    echo "Log file not found: $LOG"
    exit 1
fi

if [ ! -f "$IOC" ]; then
    echo "IOC file not found: $IOC"
    exit 1
fi

> "$OUT"
> "$TMP"

# Collect every log line that contains any IOC string (fixed-string match)
while IFS= read -r ioc; do
    [ -z "$ioc" ] && continue
    grep -F "$ioc" "$LOG" >> "$TMP"
done < "$IOC"

if [ ! -s "$TMP" ]; then
    echo "No matches found for IOCs in $LOG"
    rm "$TMP"
    exit 0
fi

# Print IP (field 1), timestamp (fields 4 and 5 with the brackets stripped),
# and requested page (field 7), tab-separated and de-duplicated
awk '{
    ip = $1;
    dt = $4 " " $5;
    gsub(/\[/, "", dt);
    gsub(/\]/, "", dt);
    page = $7;
    print ip "\t" dt "\t" page
}' "$TMP" | sort -u > "$OUT"

rm "$TMP"
echo "Saved results to $OUT"

Commands used to prepare a demonstration dataset:

cat << 'EOF' >> ~/Downloads/access.log
10.0.17.20 - - [04/Mar/2024:15:00:00 -0500] "GET /wp-admin.php HTTP/1.1" 200 512 "-" "Mozilla"
10.0.17.21 - - [04/Mar/2024:15:00:05 -0500] "GET /phpmyadmin/ HTTP/1.1" 404 295 "-" "Mozilla"
10.0.17.22 - - [04/Mar/2024:15:00:10 -0500] "GET /index.html HTTP/1.1" 200 758 "http://evil.example.com" "Mozilla"
EOF
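
A quick check that the three entries were appended:

tail -n 3 ~/Downloads/access.log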

Commands used to run Challenge 2:

cd ~

chmod +x findIOC.bash
./findIOC.bash ~/Downloads/access.log IOC.txt
cat report.txt

Observed output:

10.0.17.20	04/Mar/2024:15:00:00 -0500	/wp-admin.php
10.0.17.21	04/Mar/2024:15:00:05 -0500	/phpmyadmin/
10.0.17.22	04/Mar/2024:15:00:10 -0500	/index.html

This matches the required output fields and structure for the assignment.


Challenge 3

Turn report.txt into an HTML report and deploy to Apache web root

Goal

Create a Bash script that reads report.txt, builds an HTML table, and moves the final report to:

/var/www/html/report.html

so it can be accessed from a browser after running the script.

Script file name

htmlReport.bash

Script contents:

#!/bin/bash

REPORT="report.txt"
HTML="report.html"
WEBROOT="/var/www/html"

if [ ! -f "$REPORT" ]; then
    echo "report.txt missing. Run findIOC.bash first."
    exit 1
fi

{
  echo "<html>"
  echo "<head>"
  echo "<title>IOC Apache Log Report</title>"
  echo "<style>"
  echo "table { border-collapse: collapse; }"
  echo "th, td { border: 1px solid black; padding: 6px; }"
  echo "th { background-color: #dddddd; }"
  echo "</style>"
  echo "</head>"
  echo "<body>"
  echo "<h1>IOC Apache Log Report</h1>"
  echo "<table>"
  echo "<tr><th>IP</th><th>Date and Time</th><th>Page</th></tr>"

  # Emit one table row for each tab-separated line of report.txt
  while IFS=$'\t' read -r ip datetime page; do
      [ -z "$ip" ] && continue
      echo "<tr><td>$ip</td><td>$datetime</td><td>$page</td></tr>"
  done < "$REPORT"

  echo "</table>"
  echo "</body>"
  echo "</html>"
} > "$HTML"

sudo mv "$HTML" "$WEBROOT/report.html"

echo "HTML report created at $WEBROOT/report.html"
echo "Open in browser: http://10.0.17.19/report.html"

Commands used:

cd ~

chmod +x htmlReport.bash
./htmlReport.bash

Browser verification:

http://10.0.17.19/report.html

The report displayed a formatted table with the same three rows from report.txt, confirming the pipeline worked successfully.
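
A non-browser check is to count the table rows in the deployed file; with one header row plus three data rows, the count should be 4:

curl -s http://10.0.17.19/report.html | grep -c '<tr>'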


Full execution sequence

This is the exact sequence I used to run the full pipeline cleanly from scratch:

cd /var/www/html
sudo nano IOC.html     # or recreate it with the tee heredoc from Step 1

cd ~

chmod +x getIOC.bash
./getIOC.bash

chmod +x findIOC.bash
./findIOC.bash ~/Downloads/access.log IOC.txt

chmod +x htmlReport.bash
./htmlReport.bash
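
For convenience, the three steps can also be chained in a small wrapper (runAll.bash is my own illustrative name, not one of the graded deliverables):

#!/bin/bash
# Run the three challenge scripts in order, stopping at the first failure
./getIOC.bash && \
./findIOC.bash ~/Downloads/access.log IOC.txt && \
./htmlReport.bash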

Final deliverables

I produced and verified all required outputs:

Challenge 1: getIOC.bash generates IOC.txt

Challenge 2: findIOC.bash generates report.txt

Challenge 3: htmlReport.bash generates report.html and deploys it into /var/www/html

All three scripts were uploaded to GitHub as separate files to match grading requirements.
