w1_bs4 - steelbear/HMG_Softeer_DE GitHub Wiki

Beautiful Soup 4

  • A library that helps parse text written in markup languages such as HTML or XML.
from bs4 import BeautifulSoup


soup = BeautifulSoup(text, 'html.parser')  # (text to parse, parser name)

soup.find('a')  # find the first matching tag

# narrow the search with attribute filters
tag = soup.find('a', href='...')
soup.find('a', class_=['...'])  # class_ (trailing underscore), because class is a Python keyword
soup.find('a', attrs={'value': '0'}) 

# find all matching tags
soup.find_all('a')

# content inside a tag
tag.contents  # list of the tag's direct children
tag.text      # all nested text joined into one string

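# parent, child, and sibling nodes
tag.parent
tag.contents[0]  # first child, if any (there is no tag.child attribute)
tag.children     # iterator over direct children
tag.next_sibling
tag.previous_sibling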

# search among parent and sibling tags
tag.find_parent(...)
tag.find_next_sibling(...)
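
The calls above can be tried end to end with a small sketch like the one below. The HTML snippet, its attribute values, and the variable names are made up for illustration and are not from the original notes. It also shows one behavior worth knowing: next_sibling may return a whitespace text node, while find_next_sibling('a') skips ahead to the next matching tag.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, invented for this illustration.
html = """
<html><body>
  <div id="menu">
    <a class="nav" href="/home">Home</a>
    <a class="nav external" href="https://example.com" value="0">Example</a>
    <span>not a link</span>
  </div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

first = soup.find("a")                            # first <a> in document order
by_href = soup.find("a", href="/home")            # filter on an attribute value
by_class = soup.find("a", class_="external")      # matches any one of the tag's classes
by_attrs = soup.find("a", attrs={"value": "0"})   # arbitrary attributes go in attrs
all_links = soup.find_all("a")                    # every matching tag

hrefs = [a["href"] for a in all_links]            # read an attribute like a dict key
parent_id = first.find_parent("div")["id"]        # nearest enclosing <div>
next_link = first.find_next_sibling("a")          # next sibling that is an <a> tag
raw_sibling = first.next_sibling                  # here: the whitespace between the tags
```

In this sample, first and by_href refer to the same link, and by_class and by_attrs both resolve to the second link.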