sp500 constituents - KamarajuKusumanchi/market_data GitHub Wiki
https://en.wikipedia.org/wiki/List_of_S%26P_500_companies lists the current S&P 500 constituents.
https://github.com/wazabata/p4fin/blob/master/Code_part1/12_ML_Classifier.py contains some code to parse the wikipage.
This article might be helpful - https://roche.io/2016/05/scrape-wikipedia-with-python
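As a rough illustration of what parsing the wiki page involves, here is a minimal sketch that pulls the first column (the ticker symbol) out of an HTML table using only the standard library. It assumes the constituents table carries `id="constituents"` and that the symbol is the first `<td>` of each row; check both against the live page. The names `FirstColumnParser`, `extract_symbols` and `SAMPLE_HTML` are made up for this example, and the sample HTML is a hand-written stand-in, not a copy of the real page.

```python
from html.parser import HTMLParser


class FirstColumnParser(HTMLParser):
    """Collect the text of the first <td> in each row of one table."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.in_table = False
        self.cell_index = -1        # index of the current <td> within its row
        self.in_first_cell = False
        self.symbols = []

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.cell_index = -1    # new row: reset the cell counter
        elif self.in_table and tag == "td":
            self.cell_index += 1
            if self.cell_index == 0:
                self.in_first_cell = True
                self.symbols.append("")

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_table = False
        elif tag == "td":
            self.in_first_cell = False

    def handle_data(self, data):
        # Text may arrive in pieces (e.g. inside a nested <a>), so accumulate it.
        if self.in_first_cell:
            self.symbols[-1] += data.strip()


def extract_symbols(html, table_id="constituents"):
    """Return the first-column text of every data row in the given table."""
    parser = FirstColumnParser(table_id)
    parser.feed(html)
    return parser.symbols


# Hand-written sample (assumption: the real table has a similar shape).
SAMPLE_HTML = """
<table id="constituents">
  <tr><th>Symbol</th><th>Security</th></tr>
  <tr><td><a href="#">MMM</a></td><td>3M</td></tr>
  <tr><td><a href="#">AOS</a></td><td>A. O. Smith</td></tr>
</table>
"""

if __name__ == "__main__":
    print(extract_symbols(SAMPLE_HTML))
```

For the real page, the string passed to `extract_symbols` would be the downloaded HTML. A library such as BeautifulSoup or pandas is likely more convenient for anything beyond a single column.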
Snippet to get the revision number of a Wikipedia article:

    import re
    from urllib.parse import urlparse, parse_qs

    import requests
    from bs4 import BeautifulSoup

    url = "https://en.wikipedia.org/wiki/List_of_S&P_500_companies"
    resp = requests.get(url)
    soup = BeautifulSoup(resp.text, "lxml")

    # The "Cite this page" link carries the current revision id in its query string.
    cite_link = soup.find("a", attrs={"title": re.compile("Information on how to cite this page")})
    query_string = urlparse(cite_link.get("href")).query
    # Default to [None] so a missing "id" parameter yields None instead of a TypeError.
    rev_id = parse_qs(query_string).get("id", [None])[0]
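An alternative to scraping the "Cite this page" link is to ask the MediaWiki API for the latest revision directly. This is a different technique from the snippet above, sketched here with only the standard library. The function names and the User-Agent string are made up for this example, and the revision numbers in `SAMPLE_PAYLOAD` are invented values used only to show the response shape.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_url(title):
    """Build a MediaWiki query URL that returns the ids of the latest revision."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids",
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def latest_revision_id(payload):
    """Pick the revision id out of a decoded API response, or None if absent."""
    pages = payload.get("query", {}).get("pages", {})
    for page in pages.values():
        revisions = page.get("revisions")
        if revisions:
            return revisions[0]["revid"]
    return None


def fetch_latest_revision_id(title):
    """Call the API over the network (not exercised in the sample run below)."""
    request = Request(
        build_query_url(title),
        headers={"User-Agent": "sp500-wiki-notes/0.1 (example)"},  # hypothetical
    )
    with urlopen(request, timeout=10) as response:
        return latest_revision_id(json.load(response))


# Invented values that only mirror the structure of a real response.
SAMPLE_PAYLOAD = {
    "query": {
        "pages": {
            "111": {
                "pageid": 111,
                "title": "List of S&P 500 companies",
                "revisions": [{"revid": 222, "parentid": 221}],
            }
        }
    }
}

if __name__ == "__main__":
    print(build_query_url("List_of_S&P_500_companies"))
    print(latest_revision_id(SAMPLE_PAYLOAD))
```

Calling `fetch_latest_revision_id("List_of_S&P_500_companies")` would perform the actual request. Because this approach does not depend on the page's HTML layout, it may hold up better if the sidebar markup changes.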
Check whether this is useful - https://pythonspot.com/en/extract-links-from-webpage-beautifulsoup/