Scraping Champlain Classes - Snowboundport37/champlain GitHub Wiki
PowerShell scripts to scrape Champlain College course listings from a local HTML file and run queries on the data.
This project fulfills the Scraping Champlain Classes assignment.
ChamplainScrape/
│
├── functions.ps1 # Functions for scraping, parsing, and processing course data
├── main.ps1 # Main driver script that calls functions and runs queries
└── screenshots/ # Output evidence (Sample Data + Queries i–iv)
- functions.ps1
  - Get-ChamplainClasses: Scrapes table rows (<tr>) and extracts course info.
  - Convert-DaysCode: Expands shorthand day codes (M, T, W, Th, F) into full names.
  - Convert-DaysInTable: Replaces the Days property in the table with arrays of full names.
  - Convert-ToTime: Helper for parsing and sorting times.
- main.ps1
  - Calls functions.ps1 (dot sourcing).
  - Loads the HTML file from http://localhost/Courses2025FA.html.
  - Runs and displays queries:
    - Sample Data (first 3 rows).
    - Query i: All classes taught by Furkan Paligu.
    - Query ii: All classes in JOYC 310 on Mondays, sorted by start time.
    - Query iii: List of all instructors who teach at least one SYS, NET, SEC, FOR, CSI, or DAT class.
    - Query iv: Group instructors by number of classes taught (those same subjects).
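To make the day-code expansion concrete, here is a minimal sketch that reads a code left to right with a regex. It is an alternative illustration, not the approach used in functions.ps1, and `Expand-DayCode` is a hypothetical name that does not exist in the repo.

```powershell
# Hypothetical helper (not part of functions.ps1): expands a day code such as
# "MTh" into full day names by tokenizing it from left to right.
function Expand-DayCode {
    param([string]$Code)

    # Lookup table from token to full name (hashtable keys are case-insensitive).
    $names = @{ M = 'Monday'; T = 'Tuesday'; W = 'Wednesday'; TH = 'Thursday'; F = 'Friday' }

    # Strip whitespace and normalize case so "Th" and "TH" are treated alike.
    $clean = ($Code -replace '\s', '').ToUpper()

    # 'TH' is listed before 'T' so Thursday is tried before Tuesday at each position.
    $tokens = [regex]::Matches($clean, 'TH|M|T|W|F')

    # Emit one full name per token, in the order the tokens appear in the code.
    foreach ($token in $tokens) { $names[$token.Value] }
}

Expand-DayCode -Code 'MTh'   # Monday, Thursday
Expand-DayCode -Code 'TTh'   # Tuesday, Thursday
```

Because the names come out in the order the tokens appear, the result follows the order of the code as written in the course listing.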
# functions.ps1
function Get-ChamplainClasses {
    param([string]$Uri)
    # ParsedHtml is available in Windows PowerShell 5.1, not in PowerShell 6 and later
    $page = Invoke-WebRequest -Uri $Uri -TimeoutSec 5
    $rows = @($page.ParsedHtml.getElementsByTagName("tr"))
    $out = @()
    # start at 1 to skip the header row
    for ($i = 1; $i -lt $rows.length; $i++) {
        $tds = @($rows[$i].getElementsByTagName("td"))
        # skip rows that do not have enough cells to be a course row
        if (-not $tds -or $tds.length -lt 6) { continue }
        # core fields
        $code = $tds[0].innerText.Trim()
        $title = $tds[1].innerText.Trim()
        $days = $tds[3].innerText.Trim()
        $time = $tds[5].innerText.Trim()
        # handle Instructor/Location/Date columns:
        # some rows carry a date range (e.g. 8/25 - 12/12) in one of the last two cells
        $last = $tds[$tds.Length-1].innerText.Trim()
        $prev = $tds[$tds.Length-2].innerText.Trim()
        $datesPattern = '^\d{1,2}/\d{1,2}\s*-\s*\d{1,2}/\d{1,2}$'
        if ($last -match $datesPattern) {
            $loc = $prev
            $inst = $tds[$tds.Length-3].innerText.Trim()
        }
        elseif ($prev -match $datesPattern) {
            $inst = $tds[$tds.Length-3].innerText.Trim()
            $loc = $last
        }
        else {
            $inst = $prev
            $loc = $last
        }
        # split time range "start - end" into its two parts
        $parts = $time -split '\s*-\s*'
        $start = if ($parts.Count -ge 1) { $parts[0] } else { "" }
        $end = if ($parts.Count -ge 2) { $parts[1] } else { "" }
        $out += [pscustomobject]@{
            "Class Code" = $code
            "Title" = $title
            "Days" = $days
            "Time Start" = $start
            "Time End" = $end
            "Instructor" = $inst
            "Location" = $loc
        }
    }
    return $out
}
function Convert-DaysCode {
    param([string]$Code)
    $u = ($Code -replace '\s','').ToUpper()
    # detect and strip 'TH' first so a remaining 'T' can only mean Tuesday
    $hasThursday = $u -match 'TH'
    $u = $u -replace 'TH',''
    $days = @()
    # append in weekday order
    if ($u -match 'M') { $days += 'Monday' }
    if ($u -match 'T') { $days += 'Tuesday' }
    if ($u -match 'W') { $days += 'Wednesday' }
    if ($hasThursday) { $days += 'Thursday' }
    if ($u -match 'F') { $days += 'Friday' }
    return $days
}
function Convert-DaysInTable {
    param([object[]]$Table)
    foreach ($row in $Table) { $row.Days = Convert-DaysCode -Code $row.Days }
    return $Table
}
function Convert-ToTime {
    param([string]$t)
    try { return [datetime]::Parse($t) } catch { return (Get-Date '1/1/1900') }
}

# main.ps1
. "$PSScriptRoot\functions.ps1"
$uri = "http://localhost/Courses2025FA.html"
$FullTable = Get-ChamplainClasses -Uri $uri
$FullTable = Convert-DaysInTable -Table $FullTable

Write-Host "`n--- Sample Data ---" -ForegroundColor Cyan
$FullTable | Select-Object -First 3 | Format-List

# i classes of Furkan Paligu
Write-Host "`n--- Query i ---" -ForegroundColor Yellow
$FullTable |
    Where-Object { $_.Instructor -like "Furkan Paligu" } |
    Format-List "Class Code", Instructor, Location, Days, "Time Start", "Time End"

# ii JOYC 310 on Monday sorted by start
Write-Host "`n--- Query ii ---" -ForegroundColor Yellow
$FullTable |
    Where-Object { $_.Location -like "*JOYC 310*" -and ($_.Days -contains "Monday") } |
    Sort-Object @{ Expression = { Convert-ToTime $_."Time Start" } } |
    Select-Object "Time Start","Time End","Class Code" |
    Format-Table

# iii instructors who teach at least one SYS NET SEC FOR CSI DAT
Write-Host "`n--- Query iii ---" -ForegroundColor Yellow
$ITSInstructors =
    $FullTable |
    Where-Object { $_."Class Code" -match "^(SYS|NET|SEC|FOR|CSI|DAT)\s" } |
    Select-Object -ExpandProperty Instructor |
    Sort-Object -Unique
$ITSInstructors | ForEach-Object { [pscustomobject]@{ Instructor = $_ } } | Format-Table

# iv group instructors by count on those subjects
Write-Host "`n--- Query iv ---" -ForegroundColor Yellow
$FullTable |
    Where-Object { $_."Class Code" -match "^(SYS|NET|SEC|FOR|CSI|DAT)\s" } |
    Group-Object Instructor |
    Sort-Object Count -Descending |
    Select-Object Count, Name |
    Format-Table

- Place Courses2025FA.html in your local web server root: C:\xampp\htdocs\Courses2025FA.html
- Ensure Apache is running in XAMPP.
- Clone or download this repo.
- In PowerShell, navigate into the project folder:
cd C:\Users\<YourUser>\Documents\ChamplainScrape
- Run the main script:
.\main.ps1
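Get-ChamplainClasses reads `$page.ParsedHtml`, which Windows PowerShell 5.1 provides but PowerShell 6 and later do not, so it may be worth checking the edition before running the script. This is a minimal sketch; `Test-ParsedHtmlSupport` is a hypothetical helper and is not part of the repo.

```powershell
# Hypothetical helper (not in functions.ps1): reports whether this session is an
# edition in which Invoke-WebRequest can expose the ParsedHtml property.
function Test-ParsedHtmlSupport {
    # PSEdition is 'Desktop' on Windows PowerShell 5.1 and 'Core' on PowerShell 6+.
    return ($PSVersionTable.PSEdition -ne 'Core')
}

if (-not (Test-ParsedHtmlSupport)) {
    Write-Warning 'ParsedHtml is unavailable in this edition; run main.ps1 from Windows PowerShell 5.1 (powershell.exe).'
}
```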
Evidence of successful queries is included in the screenshots/ folder:
--- Query i ---
Class Code : CSI 230-01
Instructor : Furkan Paligu
Location : JOYC 310
Days : Monday
Time Start : 10AM
Time End : 12:45PM
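The times in this output are plain strings such as 10AM and 12:45PM, which is why Query ii sorts through Convert-ToTime rather than on the raw text. The sketch below shows the difference on made-up start times. `ConvertTo-SortableTime` is a hypothetical stand-in that assumes only the two time shapes seen above; it is not the repo's Convert-ToTime.

```powershell
# Made-up start times for illustration; real values come from the "Time Start" column.
$starts = '10AM', '8AM', '12:45PM', '2:30PM'

# Hypothetical parser: accepts the two shapes seen above (e.g. 10AM and 12:45PM).
function ConvertTo-SortableTime {
    param([string]$Text)
    $formats = [string[]]@('htt', 'h:mmtt')
    $culture = [System.Globalization.CultureInfo]::InvariantCulture
    $style   = [System.Globalization.DateTimeStyles]::None
    # Time-only input is placed on the current date, so values compare by time of day.
    return [datetime]::ParseExact($Text.Trim(), $formats, $culture, $style)
}

$asText = $starts | Sort-Object                                  # compares characters
$asTime = $starts | Sort-Object { ConvertTo-SortableTime $_ }    # compares clock times

"Text sort: $($asText -join ', ')"   # 8AM lands last
"Time sort: $($asTime -join ', ')"   # 8AM, 10AM, 12:45PM, 2:30PM
```

Unlike Convert-ToTime, which falls back to 1/1/1900 on a parse failure, this sketch throws on an unrecognized format, so bad input is surfaced instead of being sorted to the top.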
- GitHub repo link containing scripts + screenshots
- Screenshots also submitted through course portal if required