Scraping Champlain Classes - Snowboundport37/champlain GitHub Wiki

ChamplainScrape

PowerShell scripts that scrape Champlain College course listings from an HTML file served by a local web server and run queries on the resulting data.
This project fulfills the Scraping Champlain Classes assignment.


📂 Project Structure

ChamplainScrape/
│
├── functions.ps1   # Functions for scraping, parsing, and processing course data
├── main.ps1        # Main driver script that calls functions and runs queries
└── screenshots/    # Output evidence (Sample Data + Queries i–iv)

⚙️ How It Works

  1. functions.ps1

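    • Get-ChamplainClasses: Reads each table row (<tr>) and returns one object per course with Class Code, Title, Days, Time Start, Time End, Instructor, and Location.
    • Convert-DaysCode: Expands shorthand day codes (M, T, W, Th, F) into full day names.
    • Convert-DaysInTable: Replaces the Days property of every row in the table with the full day names.
    • Convert-ToTime: Parses a time string into a [datetime] so rows can be sorted chronologically; values that fail to parse fall back to 1/1/1900.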
  2. main.ps1

    • Dot-sources functions.ps1 to load the functions.
    • Scrapes the course page served at http://localhost/Courses2025FA.html.
    • Runs and displays queries:
      • Sample Data (first 3 rows).
      • Query i: All classes taught by Furkan Paligu.
      • Query ii: All classes in JOYC 310 on Mondays, sorted by start time.
      • Query iii: All instructors who teach at least one SYS, NET, SEC, FOR, CSI, or DAT class.
      • Query iv: Number of classes per instructor in those same subjects, sorted from most to fewest.
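
Why Query ii sorts through Convert-ToTime rather than on the raw text can be seen in a small sketch. The values below follow the time format shown in the example output; '9AM' is an illustrative value added here, and the invariant culture is an assumption made so the result does not depend on the machine's regional settings.

```powershell
# Start times in the same format as the scraped table ('9AM' is illustrative).
$times = '9AM', '10AM', '12:45PM'

# A plain string sort compares character by character, so '10AM' lands before '9AM'.
$asText = $times | Sort-Object

# Parsing each value to [datetime] first gives chronological order.
$invariant = [cultureinfo]::InvariantCulture
$asTime = $times | Sort-Object { [datetime]::Parse($_, $invariant) }

"String sort: $($asText -join ', ')"
"Time sort:   $($asTime -join ', ')"
```

The original strings are kept for display; only the sort key is converted.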

📜 Code

functions.ps1

function Get-ChamplainClasses {
    param([string]$Uri)

    # ParsedHtml is only populated in Windows PowerShell 5.1; PowerShell 7 does not provide it.
    $page = Invoke-WebRequest -Uri $Uri -TimeoutSec 5
    $rows = @($page.ParsedHtml.getElementsByTagName("tr"))
    $out  = @()

    for ($i = 1; $i -lt $rows.length; $i++) {
        $tds = @($rows[$i].getElementsByTagName("td"))
        if (-not $tds -or $tds.length -lt 6) { continue }

        # core fields
        $code  = $tds[0].innerText.Trim()
        $title = $tds[1].innerText.Trim()
        $days  = $tds[3].innerText.Trim()
        $time  = $tds[5].innerText.Trim()

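        # The trailing columns vary: a date range (M/D - M/D) may be the last or the
        # second-to-last cell, so locate Instructor and Location relative to it.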
        $last = $tds[$tds.Length-1].innerText.Trim()
        $prev = $tds[$tds.Length-2].innerText.Trim()
        $datesPattern = '^\d{1,2}/\d{1,2}\s*-\s*\d{1,2}/\d{1,2}$'

        if ($last -match $datesPattern) {
            $loc  = $prev
            $inst = $tds[$tds.Length-3].innerText.Trim()
        }
        elseif ($prev -match $datesPattern) {
            $inst = $tds[$tds.Length-3].innerText.Trim()
            $loc  = $last
        }
        else {
            $inst = $prev
            $loc  = $last
        }

        # split time
        $parts = $time -split '\s*-\s*'
        $start = if ($parts.Count -ge 1) { $parts[0] } else { "" }
        $end   = if ($parts.Count -ge 2) { $parts[1] } else { "" }

        $out += [pscustomobject]@{
            "Class Code" = $code
            "Title"      = $title
            "Days"       = $days
            "Time Start" = $start
            "Time End"   = $end
            "Instructor" = $inst
            "Location"   = $loc
        }
    }
    return $out
}

function Convert-DaysCode {
    param([string]$Code)
    $u = ($Code -replace '\s','').ToUpper()
    # Detect and strip TH first so its T is not also counted as Tuesday.
    $hasThursday = $u -match 'TH'
    $u = $u -replace 'TH',''
    $days = @()
    # Add days in calendar order so the output reads Monday through Friday.
    if ($u -match 'M')  { $days += 'Monday' }
    if ($u -match 'T')  { $days += 'Tuesday' }
    if ($u -match 'W')  { $days += 'Wednesday' }
    if ($hasThursday)   { $days += 'Thursday' }
    if ($u -match 'F')  { $days += 'Friday' }
    return $days
}

function Convert-DaysInTable {
    param([object[]]$Table)
    foreach ($row in $Table) { $row.Days = Convert-DaysCode -Code $row.Days }
    return $Table
}

function Convert-ToTime {
    param([string]$t)
    try { return [datetime]::Parse($t) } catch { return (Get-Date '1/1/1900') }
}
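
Get-ChamplainClasses depends on ParsedHtml, which Invoke-WebRequest only populates in Windows PowerShell 5.1. The following is a minimal sketch of one possible fallback for PowerShell 7, assuming the course table is plain, well-formed <tr>/<td> markup. Get-TableCellText is a hypothetical helper that is not part of functions.ps1, and the sample HTML is made up for illustration.

```powershell
# Hypothetical helper (not in functions.ps1): returns one string array per data row.
function Get-TableCellText {
    param([string]$Html)

    # (?is) = case-insensitive, and '.' also matches newlines.
    $rowPattern  = '(?is)<tr\b[^>]*>(.*?)</tr>'
    $cellPattern = '(?is)<td\b[^>]*>(.*?)</td>'

    foreach ($row in [regex]::Matches($Html, $rowPattern)) {
        $cells = @(
            foreach ($cell in [regex]::Matches($row.Groups[1].Value, $cellPattern)) {
                # Drop nested tags, decode entities such as &amp;, then trim.
                $text = $cell.Groups[1].Value -replace '<[^>]+>', ''
                [System.Net.WebUtility]::HtmlDecode($text).Trim()
            }
        )
        # Rows without <td> cells (for example <th> header rows) are skipped.
        if ($cells.Count -gt 0) { , $cells }
    }
}

# Made-up sample standing in for (Invoke-WebRequest -Uri $uri).Content
$sample = @'
<table>
  <tr><th>Code</th><th>Title</th></tr>
  <tr><td>CSI 230-01</td><td> Example &amp; Title </td></tr>
</table>
'@

$rows = @(Get-TableCellText -Html $sample)
$rows[0] -join ' | '
```

Regular expressions are a fragile way to read HTML in general; this only suits a simple, predictable table like the one assumed here.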

main.ps1

. "$PSScriptRoot\functions.ps1"

$uri = "http://localhost/Courses2025FA.html"

$FullTable = Get-ChamplainClasses -Uri $uri
$FullTable = Convert-DaysInTable -Table $FullTable

Write-Host "`n--- Sample Data ---" -ForegroundColor Cyan
$FullTable | Select-Object -First 3 | Format-List

# Query i: all classes taught by Furkan Paligu
Write-Host "`n--- Query i ---" -ForegroundColor Yellow
$FullTable |
  Where-Object { $_.Instructor -like "Furkan Paligu" } |
  Format-List "Class Code", Instructor, Location, Days, "Time Start", "Time End"

# Query ii: classes in JOYC 310 on Mondays, sorted by start time
Write-Host "`n--- Query ii ---" -ForegroundColor Yellow
$FullTable |
  Where-Object { $_.Location -like "*JOYC 310*" -and ($_.Days -contains "Monday") } |
  Sort-Object @{ Expression = { Convert-ToTime $_."Time Start" } } |
  Select-Object "Time Start","Time End","Class Code" |
  Format-Table

# Query iii: instructors who teach at least one SYS, NET, SEC, FOR, CSI, or DAT class
Write-Host "`n--- Query iii ---" -ForegroundColor Yellow
$ITSInstructors =
  $FullTable |
  Where-Object { $_."Class Code" -match "^(SYS|NET|SEC|FOR|CSI|DAT)\s" } |
  Select-Object -ExpandProperty Instructor |
  Sort-Object -Unique
$ITSInstructors | ForEach-Object { [pscustomobject]@{ Instructor = $_ } } | Format-Table

# Query iv: number of classes per instructor in those same subjects, most first
Write-Host "`n--- Query iv ---" -ForegroundColor Yellow
$FullTable |
  Where-Object { $_."Class Code" -match "^(SYS|NET|SEC|FOR|CSI|DAT)\s" } |
  Group-Object Instructor |
  Sort-Object Count -Descending |
  Select-Object Count, Name |
  Format-Table
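
The filter-and-group pattern behind Queries iii and iv can be tried without a web server. This is a sketch on made-up rows: the class codes and the names Instructor A, B, and C are placeholders for illustration, not data from the course listing.

```powershell
# Made-up rows for illustration only; real rows come from Get-ChamplainClasses.
$sampleTable = @(
    [pscustomobject]@{ 'Class Code' = 'CSI 101-01'; Instructor = 'Instructor A' }
    [pscustomobject]@{ 'Class Code' = 'SEC 201-01'; Instructor = 'Instructor A' }
    [pscustomobject]@{ 'Class Code' = 'NET 150-02'; Instructor = 'Instructor B' }
    [pscustomobject]@{ 'Class Code' = 'ART 110-01'; Instructor = 'Instructor C' }
)

# Same pattern as main.ps1: subject prefix followed by whitespace.
$subjectPattern = '^(SYS|NET|SEC|FOR|CSI|DAT)\s'

# Same shape as Query iv: filter by subject, then count classes per instructor.
$counts = $sampleTable |
    Where-Object { $_.'Class Code' -match $subjectPattern } |
    Group-Object Instructor |
    Sort-Object Count -Descending |
    Select-Object Count, Name

$counts | Format-Table
```

Instructor C is dropped because ART is not one of the listed subjects, and Instructor A comes first with two matching classes.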

▶️ Running the Scripts

  1. Place Courses2025FA.html in your local web server root:
    C:\xampp\htdocs\Courses2025FA.html
    
  2. Ensure Apache is running in XAMPP.
  3. Clone or download this repo.
  4. In Windows PowerShell 5.1 (the scraper relies on ParsedHtml, which PowerShell 7 does not provide), navigate into the project folder:
    cd C:\Users\<YourUser>\Documents\ChamplainScrape
  5. Run the main script:
    .\main.ps1
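
If main.ps1 fails straight away, it can help to confirm that the page is actually being served before debugging the scraper. Below is a minimal sketch of such a check; Test-CoursePage is a hypothetical helper, not part of the project.

```powershell
# Hypothetical helper (not part of the project): $true if the URI answers with HTTP 200.
function Test-CoursePage {
    param([string]$Uri)
    try {
        # -UseBasicParsing avoids the Internet Explorer engine in Windows PowerShell 5.1.
        $response = Invoke-WebRequest -Uri $Uri -UseBasicParsing -TimeoutSec 5
        return ($response.StatusCode -eq 200)
    }
    catch {
        # Connection refused, timeouts, and HTTP error statuses all end up here.
        return $false
    }
}

if (Test-CoursePage -Uri 'http://localhost/Courses2025FA.html') {
    'Course page is reachable.'
}
else {
    'Course page is not reachable; check that Apache is running and the file is in htdocs.'
}
```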

📸 Screenshots

Evidence of successful queries is included in the screenshots/ folder.

✅ Example Output (snippet)

--- Query i ---

Class Code : CSI 230-01
Instructor : Furkan Paligu
Location   : JOYC 310
Days       : Monday
Time Start : 10AM
Time End   : 12:45PM

📝 Deliverables

  • GitHub repo link containing scripts + screenshots
  • Screenshots also submitted through course portal if required
