ICE 4 - slcc2c/CS5590_Python GitHub Wiki
In-class exercise: Write a simple program that parses a wiki page and then extracts the headers of the page. Steps:
- Import these libraries: `import requests`, `from bs4 import BeautifulSoup`, `import os`
- Define a variable that holds the URL of the page you want to extract data from.
- Use the `requests` library to download the page and store the response in another variable, e.g. `sourcecode = requests.get(url)`.
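- Parse the page source with BeautifulSoup and save the parsed result in a variable, e.g. `soup = BeautifulSoup(sourcecode.text, 'html.parser')`.
- Use `findAll('div')` (called `find_all` in current BeautifulSoup versions) to find all the `div` elements in the parsed source, e.g. `result = soup.findAll('div')`.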
- Loop over the divs returned by `findAll('div')` to find the heading in each one, e.g. `for div in result: R1 = div.find('h1')`.
- Print the content of `R1` (e.g. `R1.text`). `find` returns `None` for a div that has no `h1`, so skip those divs.
- Do the same to print the body of the page.
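The steps above can be sketched as follows, assuming the `requests` and `beautifulsoup4` packages are installed. The helper names `fetch_html`, `div_headings` and `body_text`, the `SAMPLE_HTML` page, and reading "the body" as the text of the page's `<body>` element are all assumptions for illustration, not the exercise's official solution. The demo parses the hard-coded sample so it runs offline.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical sample page and helper names, for illustration only.
SAMPLE_HTML = """
<html><body>
  <div id="content">
    <div class="title"><h1>Python (programming language)</h1></div>
    <p>Python is a programming language.</p>
  </div>
  <div class="sidebar"><p>No heading here</p></div>
</body></html>
"""


def fetch_html(url):
    """Download a page with requests and return its HTML source."""
    sourcecode = requests.get(url, timeout=10)
    sourcecode.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
    return sourcecode.text


def div_headings(html):
    """Return the text of the <h1> found inside each <div>, without repeats."""
    soup = BeautifulSoup(html, "html.parser")
    headings, seen = [], set()
    for div in soup.find_all("div"):  # find_all is the current name for findAll
        r1 = div.find("h1")
        # find() returns None when a div has no <h1>. It also searches nested
        # children, so an outer div and its inner div can reach the same <h1>.
        if r1 is not None and id(r1) not in seen:
            seen.add(id(r1))
            headings.append(r1.get_text(strip=True))
    return headings


def body_text(html):
    """Return the text of <body> (assumed meaning of "the body"), or "" if absent."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.body.get_text(" ", strip=True) if soup.body else ""


if __name__ == "__main__":
    # Offline demo. For a live page: html = fetch_html(url), with url set to
    # the page you want to parse.
    html = SAMPLE_HTML
    for heading in div_headings(html):
        print(heading)
    print(body_text(html))
```

Because `div.find('h1')` searches all descendants, a page with nested divs would otherwise print the same heading once per enclosing div; the `seen` set keeps each `<h1>` only once.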
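The div-and-`h1` loop only reports `<h1>` titles, while most pages mark their section headers with `<h2>` through `<h6>`. If those are the headers you want, one possible alternative is to ask BeautifulSoup for every heading tag at once: `find_all` accepts a list of tag names and returns the matches in document order. The function `all_headings` and the sample markup below are hypothetical.

```python
from bs4 import BeautifulSoup

HEADING_TAGS = ["h1", "h2", "h3", "h4", "h5", "h6"]

# Hypothetical sample page, for illustration only.
SAMPLE_HTML = """
<html><body>
  <h1>Web scraping</h1>
  <div><h2>Techniques</h2><p>Text.</p><h3>HTML parsing</h3></div>
  <h2>See also</h2>
</body></html>
"""


def all_headings(html):
    """Return (tag name, text) for every h1-h6 heading, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    return [(tag.name, tag.get_text(strip=True)) for tag in soup.find_all(HEADING_TAGS)]


if __name__ == "__main__":
    for name, text in all_headings(SAMPLE_HTML):
        # Indent each heading by its level ("h2" -> 1 step) to show the outline.
        print("  " * (int(name[1]) - 1) + text)
```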