4.5.3.Webscraping - sj50179/IBM-Data-Science-Professional-Certificate GitHub Wiki
- Define Webscraping
- Beautiful Soup Objects
- Find_all
- Webscraping a website
Webscraping is a process for automatically extracting information from a website. With the right tools it can be accomplished in minutes rather than hours.
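As a rough illustration of what automatic extraction means, the sketch below pulls every link out of a small HTML string. The function name `extract_links` and the sample markup are hypothetical, and it uses Python's built-in `html.parser` so no extra parser package is needed. A real scraper would first download the page contents.

```python
from bs4 import BeautifulSoup


def extract_links(html):
    """Return (text, href) pairs for every <a> tag that has an href attribute."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)  # href=True skips <a> tags without a link
    ]


# Hypothetical page content, used here instead of a downloaded page.
sample_html = (
    "<html><body>"
    "<a href='https://example.com/a'>First</a>"
    "<a>No link here</a>"
    "<a href='https://example.com/b'> Second </a>"
    "</body></html>"
)

links = extract_links(sample_html)
for text, href in links:
    print(text, "->", href)
```

Collecting the results into a list of tuples keeps the extraction step separate from whatever is done with the data afterwards (printing, saving to a file, loading into a table).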
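from bs4 import BeautifulSoup
# The HTML of the page, held as a plain Python string
html = "<!DOCTYPE html><html><head><title>Page ......"
# Parse the string into a tree of objects; the 'html5lib' parser must be installed separately
soup = BeautifulSoup(html, 'html5lib')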
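# Tag object: the first <title> element in the document
tag_object = soup.title
# Reassigned to the first <h3> element
tag_object = soup.h3
# Child: the <b> tag nested inside the <h3>
tag_child = tag_object.b
# Parent: navigates back up to the tag that contains the child
parent_tag = tag_child.parent
# Siblings: the nodes that follow tag_object at the same level of the tree
sibling_1 = tag_object.next_sibling
sibling_2 = sibling_1.next_sibling
# Attributes of the tag, returned as a dictionary
tag_child.attrs
# NavigableString: the text inside the tag
tag_child.string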
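The navigation steps above can be tried end to end with a short sketch. The sample HTML and its names are hypothetical, and the built-in `html.parser` is swapped in for `html5lib` so the snippet runs without extra packages. The markup has no whitespace between tags, so `next_sibling` lands on a tag rather than on a whitespace text node.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, written without whitespace between tags.
html = (
    "<html><head><title>Page Title</title></head><body>"
    "<h3><b id='boldest'>Alice</b></h3><p>Salary: 100</p>"
    "<h3>Bob</h3><p>Salary: 200</p>"
    "</body></html>"
)

soup = BeautifulSoup(html, "html.parser")

title_tag = soup.title               # first <title> tag
heading = soup.h3                    # first <h3> tag
tag_child = heading.b                # <b> nested inside that <h3>
parent_tag = tag_child.parent        # back up to the same <h3>
sibling_1 = heading.next_sibling     # the <p> that follows the first <h3>
sibling_2 = sibling_1.next_sibling   # the second <h3>

print(title_tag.string)   # text of the <title> tag
print(tag_child.attrs)    # attributes as a dictionary
print(tag_child.string)   # text inside the <b> tag
print(sibling_1.string)
print(sibling_2.string)
```

With real pages the source usually contains newlines between tags, so `next_sibling` may return a text node first; calling `next_sibling` again or using `find_next_sibling()` is one way to skip it.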
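# `table` is assumed to be a BeautifulSoup object (or tag) that contains a <table>
# find_all returns a list-like collection of every <tr> tag
table_rows = table.find_all(name='tr')
# Each element of the result is a Tag object
first_row = table_rows[0]
# First <td> cell of that row
first_row.td
# Iterate over every row, then over every cell in the row
for i, row in enumerate(table_rows):
    print("row", i)
    cells = row.find_all("td")
    for j, cell in enumerate(cells):
        print("column", j, "cell", cell)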
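A self-contained version of the row and cell loop might look like the sketch below. The table contents are hypothetical, and `html.parser` is used in place of `html5lib` as an assumption to avoid the extra dependency. It collects the cell text into a nested list rather than only printing the tags.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row table used only for illustration.
table_html = (
    "<table>"
    "<tr><td>Pizza Place</td><td>Orders</td></tr>"
    "<tr><td>Domino</td><td>10</td></tr>"
    "</table>"
)

table = BeautifulSoup(table_html, "html.parser")

# All <tr> tags, in document order.
table_rows = table.find_all(name="tr")

# First row as a Tag object, and its first <td> child.
first_row = table_rows[0]
first_cell = first_row.td

# One inner list per row, holding the text of each <td> cell.
grid = [[cell.get_text() for cell in row.find_all("td")] for row in table_rows]

for i, row in enumerate(grid):
    for j, text in enumerate(row):
        print("row", i, "column", j, "cell", text)
```

Keeping the text in `grid` makes it straightforward to hand the scraped rows to other code later, for example to write them to a CSV file.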