Week Two — Traceroute Analysis - matthewlaujh/itpnyu-understandingnetworks GitHub Wiki
Traceroute Analysis
Use traceroute to explore the paths of your network transactions (here’s an additional introduction to traceroute). Try to get a sense of where the routers and servers on which you rely are physically located, and what networks they traverse.
Start by tracing a few of the sites that you visit regularly: Facebook, gmail, bank, school, Zoom, etc. Trace your last few online purchases or meetings. Alternately, download your browsing history from Google (if you use Chrome) from takeout.google.com. You’re not recording the activities, you’re collecting data about the paths over which those activities occurred, after the fact.
Use the various command line tools we cover to extract the ten or twenty domains you visit the most. Then traceroute each one to see what common hosts and autonomous systems (AS) your traffic goes through. You can look up AS registrants on ARIN.net or any of the other Regional Internet Registries (RIRs).
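As one way to do the "ten or twenty domains you visit the most" step, here is a minimal sketch that tallies hostnames from a list of URLs. The function name `top_domains` and the choice to fold `www.` into the bare domain are assumptions for illustration, not part of the assignment; the same count could be done with command line tools instead.

```python
from collections import Counter
from urllib.parse import urlsplit


def top_domains(urls, n=20):
    """Count how often each hostname appears and return the n most common."""
    counts = Counter()
    for url in urls:
        host = urlsplit(url).hostname  # None for strings that are not URLs
        if not host:
            continue
        # Assumption: treat www.example.com and example.com as the same site.
        if host.startswith("www."):
            host = host[4:]
        counts[host] += 1
    return counts.most_common(n)
```

Calling `top_domains(list_of_urls, 20)` returns `(hostname, count)` pairs, most visited first, which can then be fed to traceroute one by one.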
Trace the paths from all of the locations you regularly connect from: home, school, any public places from which you connect. You can even trace from and to your mobile phone, if you know its IP address. Save the traces in files, and make maps of the routes. Summarize the most common network providers in your activities. Identify who the major network providers are in your life. Figure out whose hands the data about your life goes through on a regular basis. Look for patterns from your network-browsing habits through analysis and graphing of your network traces.
See if you can figure out where your most common sites are using IP geolocation. Use a site like ipinfo.io to make a list of where those IPs are located. Make a map of your results. How accurate do you think this map is, geographically? You might try a few different geoIP solutions to find where each IP address is located, and see how the results differ:
Maxmind
IP2Location
IPdata.co
WhatismyIP.com
IPstack.com
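For the ipinfo.io lookups, a small script can save some clicking. This is only a sketch: it assumes the public `https://ipinfo.io/<ip>/json` endpoint and the field names it commonly returns (`city`, `region`, `country`, `loc`, `org`), and unauthenticated requests may be rate limited. The helper names `ipinfo_url`, `summarize` and `lookup` are made up for this example.

```python
import json
from urllib.request import urlopen


def ipinfo_url(ip):
    """Build the lookup URL for one IP address (assumed endpoint shape)."""
    return f"https://ipinfo.io/{ip}/json"


def summarize(record):
    """Keep only the location-related fields of one ipinfo-style JSON record."""
    fields = ("ip", "city", "region", "country", "loc", "org")
    return {name: record.get(name) for name in fields}


def lookup(ip, timeout=10):
    """Fetch and summarize the record for one IP (needs network access)."""
    with urlopen(ipinfo_url(ip), timeout=timeout) as resp:
        return summarize(json.load(resp))
```

Running `lookup()` on each hop of a trace gives a list of cities and coordinates to put on a map. The `org` field usually carries the AS number and owner, which helps with the "whose hands" question, and the same hop list can be run past the other services to see where they disagree.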
Feel free to obfuscate the endpoints if you don’t want us to know what sites you visit. Write a summary of your work and findings on your blog. We’ll compare notes on each other's traces in class.
Finding my browsing history
The first issue is that I'm using a browser called Arc. It's an amazing browser that does so much, but it doesn't have a function for downloading my browsing history, so I'll have to do this assignment another way. I'll do my best to note the websites I visit over the weekend and then traceroute them.
Jasmine uses the same browser, and she found the history deep in the library files. The next hurdle is that it's stored as a sqlite3 database, which I'm not familiar with, so I'll be using some LLM assistance and googling to find the right commands to pull out the information I need. I'm not too surprised by my history: it's mainly email, chat messaging platforms, social media, Spotify and GitHub (for some reason).
I did a quick look through my history, and possibly because of the way the browser handles pinned tabs, the bulk of my top 20 domains were my pinned tabs or domains adjacent to them. They were my messaging sites (Telegram and WhatsApp, and the pages associated with those, i.e. the various chats), productivity sites (Gmail and Google Calendar), social/entertainment sites (Spotify and Instagram), and also GitHub, just because I've been using it a lot. On top of those usual domains, I met up with some ex-colleagues who were visiting, and talking to them reminded me that it could be interesting to trace the route to that company's website (RGA). While I was at it, I also added my current workplace (Accenture) to the list of domains to trace.
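For anyone else stuck at the sqlite3 step, here is a rough sketch of the kind of query involved. It rests on two assumptions that are not confirmed anywhere in these notes: that Arc keeps a Chromium-style `History` file at the path shown in the comment, and that the file has the usual Chromium `urls` table with `url` and `visit_count` columns. The function name `read_history` is made up for this example.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path

# Assumed macOS location of Arc's history database; adjust if yours differs:
# ~/Library/Application Support/Arc/User Data/Default/History


def read_history(db_path, limit=50):
    """Return (url, visit_count) rows from a Chromium-style history file."""
    # Query a copy, because the browser may hold a lock on the live file.
    with tempfile.TemporaryDirectory() as tmp:
        copy = Path(tmp) / "History"
        shutil.copy2(db_path, copy)
        conn = sqlite3.connect(copy)
        try:
            rows = conn.execute(
                "SELECT url, visit_count FROM urls "
                "ORDER BY visit_count DESC LIMIT ?",
                (limit,),
            ).fetchall()
        finally:
            conn.close()
    return rows
```

Something like `read_history(Path.home() / "Library/Application Support/Arc/User Data/Default/History", limit=20)` would then list the most visited URLs, assuming the path guess holds.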
Tracerouting the domains
Link to traced routes
Link to IP info images of each route
The first 3 hops of every trace were always the same: my router, then what I'm assuming is a server somewhere in my building, and then one in the area, before the routes split off to other places. Looking at the initial findings, I found that domains from the same parent company share roughly the same route (WhatsApp and Instagram; Gmail and Google Calendar). In the case of the Google ones, they also took the same number of hops to reach the server. I'm not sure if that's true for the Meta-owned ones because those traces timed out after 64 hops. The Google ones had shorter routes because they just pointed me towards a server in New York. The Meta-owned ones bounced around New York and New Jersey before reaching hidden IPs that timed out. Same for Telegram and Spotify. Based on these, I suspect that because they're web apps, they're all just going to the nearest possible server of the company, and then something else happens there, since they're not pages that get loaded once; they stay on.
But these are pretty boring. The slightly more interesting one was GitHub, which went to Virginia before it hit hidden IPs and timed out. RGA and my own website went to AWS before getting lost in hidden IPs and timing out. Accenture actually made it to an IP similar to the one shown at the start of the trace before hitting hidden IPs and timing out, which I'm assuming means it actually reached the server, or an adjacent server, hosting the site.
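The "same first few hops" observation can be checked against the saved trace files instead of by eye. The sketch below assumes the usual text layout of traceroute output (a hop number, then a host and IP, with `* * *` for hops that don't answer) and only handles IPv4. The names `parse_hops` and `shared_prefix` are made up for this example, and a hostname that itself looks like an IP could fool the simple pattern.

```python
import re

HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")         # lines that start with a hop number
IPV4 = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")   # first dotted-quad on the line


def parse_hops(text):
    """Return hop IPs in order; None marks a hop that never answered."""
    hops = []
    for line in text.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # header line or continuation of the previous hop
        ip = IPV4.search(match.group(2))
        hops.append(ip.group(0) if ip else None)
    return hops


def shared_prefix(traces):
    """Leading hops that are identical (and answered) in every trace."""
    prefix = []
    for column in zip(*traces):
        if column[0] is None or len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix
```

Feeding the parsed hop lists for all of the saved traces into `shared_prefix` returns the hops every trace starts with, and counting the `None` entries at the end of a trace shows how much of it was hidden.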
An interesting thing happened when I was committing the images of the traced route IPs to this repo:
They thought they were keys hahaha!