ICE 4 - slcc2c/CS5590_Python GitHub Wiki

In-class exercise: write a simple program that parses a wiki page and extracts the headers of the page. Steps to write this program:

  1. Import these libraries: `import requests`, `from bs4 import BeautifulSoup`, `import os`.
  2. Define a variable that holds the URL of the page you want to extract data from.
  3. Use the `requests` library to download the URL into another variable, e.g. `sourcecode = requests.get(url)`.
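  4. Parse the downloaded page with the BeautifulSoup library and save the result in a variable, e.g. `soup = BeautifulSoup(sourcecode.text, 'html.parser')`.
  5. Use `findAll('div')` to find all the divs in the parsed source code, e.g. `result = soup.findAll('div')`.
  6. Loop over the result of step 5 and look for a heading in each div, e.g. `for div in result: R1 = div.find('h1')`.
  7. Print the text of `R1` (`R1.text`), skipping divs where `find` returns `None` because they contain no `h1`.
  8. Do the same to print the body.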
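A minimal sketch of the steps above in Python, assuming the `requests` and `beautifulsoup4` packages are installed. The URL is a placeholder, `html.parser` is an assumed parser choice, and `fetch_page` and `texts_inside_divs` are hypothetical helper names. Reading "the body" in the last step as the text of each div's `<p>` element is also an assumption. `os` from step 1 is left out because nothing here uses it.

```python
import requests
from bs4 import BeautifulSoup

# Placeholder URL (assumption): replace with the wiki page you want to parse.
URL = "https://en.wikipedia.org/wiki/Python_(programming_language)"


def fetch_page(url):
    """Step 3: download the page and return its HTML as text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop early on HTTP errors such as 404
    return response.text


def texts_inside_divs(html, tag_name):
    """Steps 4-7: parse the HTML, loop over every div, and collect the text
    of the first `tag_name` element found inside each div."""
    soup = BeautifulSoup(html, "html.parser")  # step 4
    results = []
    seen = set()  # ids of elements already collected
    for div in soup.find_all("div"):  # step 5; find_all is the newer name for findAll
        match = div.find(tag_name)  # step 6: first matching descendant, or None
        if match is None:
            continue  # this div has no such element
        if id(match) in seen:
            continue  # an enclosing div already returned this same element
        seen.add(id(match))
        results.append(match.get_text(strip=True))
    return results


if __name__ == "__main__":
    html = fetch_page(URL)
    # Step 7: print the headings.
    for heading in texts_inside_divs(html, "h1"):
        print(heading)
    # Step 8 (assumed meaning): print paragraph text the same way.
    for paragraph in texts_inside_divs(html, "p"):
        print(paragraph)
```

Because `div.find('h1')` searches all descendants, an outer div and a div nested inside it both return the same heading, so the sketch skips repeats by object identity. If only the headings are wanted, calling `soup.find_all(['h1', 'h2', 'h3'])` on the whole page avoids the div loop entirely.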

The solution is here.