ICE 4 - slcc2c/CS5590_Python GitHub Wiki

In-class exercise: Write a simple program that parses a wiki page and extracts the page's headers. Steps for this program:

1. Import these libraries: import requests, from bs4 import BeautifulSoup, import os
2. Define a variable that holds the link to the page you want to extract data from.
3. Use the requests library to download the URL into another variable, e.g. sourcecode = requests.get(url)
4. Parse the downloaded source (sourcecode.text) with the BeautifulSoup library and save the parsed result in a variable.
5. Use findAll('div') to find all the div elements in the parsed source.
6. Loop over the result from step 5 to find the heading, e.g. for div in result: R1 = div.find('h1')
7. Print the content of R1 (skip any div where find returned None).
8. Do the same to print the body.
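
The steps above can be sketched roughly as follows. This is one possible reading of the exercise, not the official solution: the helper names (extract_headings, extract_paragraphs, print_headings_and_body), the User-Agent string, and the choice to treat "the body" as the text of the p elements are all assumptions made for illustration. It uses find_all, the current spelling of findAll in BeautifulSoup 4, and omits the os import because nothing here needs it.

```python
import requests
from bs4 import BeautifulSoup


def extract_headings(html):
    """Return the text of the first <h1> inside each <div>, without repeats."""
    soup = BeautifulSoup(html, "html.parser")
    headings = []
    for div in soup.find_all("div"):
        h1 = div.find("h1")
        if h1 is None:
            # This div contains no <h1>; nothing to report.
            continue
        text = h1.get_text(strip=True)
        # Nested divs yield the same <h1> more than once, so skip repeats.
        if text not in headings:
            headings.append(text)
    return headings


def extract_paragraphs(html):
    """Return the non-empty text of every <p> (assumed meaning of 'the body')."""
    soup = BeautifulSoup(html, "html.parser")
    texts = (p.get_text(strip=True) for p in soup.find_all("p"))
    return [text for text in texts if text]


def print_headings_and_body(url):
    """Download the page at `url`, then print its headings and paragraph text."""
    # Hypothetical User-Agent; some sites reject requests that send none.
    headers = {"User-Agent": "cs5590-ice4-example/0.1"}
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # stop early on 4xx/5xx responses
    for heading in extract_headings(response.text):
        print(heading)
    for paragraph in extract_paragraphs(response.text):
        print(paragraph)
```

Calling print_headings_and_body with the URL of the wiki page you chose in step 2 prints the headings first and the paragraph text after them. Keeping the parsing in separate functions that take an HTML string makes it possible to try them on a small hand-written snippet without any network access.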

solution is here