Website: Web_Crawling.doc - ZhaochengLi/Zhaocheng-s GitHub Wiki
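The material in this session comes from 莫烦PYTHON. Sincere thanks to the teacher for the tutorial!
This content is for personal study only and has no commercial purpose. If you are interested, please check out 莫烦PYTHON, a professional teaching uploader!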
- Information collection
- Information processing, e.g., visualization and finding latent connections
- HTML will be used extensively; CSS and JavaScript will also be covered.
- An HTML document is made up of many elements, such as `<head>` and `<body>`. All content is enclosed in elements, e.g. `<head> ... contents ... </head>`.
- Web crawling mainly consists of locating those elements and extracting the information they contain.
- We will use Python for this, in two steps:
  - Use Python to fetch the source code of a web page;
  - Match elements in the source code with Python's regular expressions (the `re` module). This approach is suitable only for entry-level matching; for more advanced needs, we will use BeautifulSoup.
- For better understanding and hands-on practice, the remaining topics will be run as projects in Jupyter Notebook.
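To make the element structure described above concrete, here is a minimal sketch that uses Python's built-in `html.parser` to print each element indented by how deeply it is nested. The sample markup and the class name `ElementOutline` are illustrative assumptions, not part of the original notes.

```python
from html.parser import HTMLParser

# Sample markup written for this illustration. It avoids void elements such as <br> or <meta>,
# which have no closing tag and would throw off the simple depth counter below.
SAMPLE_HTML = """<html>
<head><title>My page</title></head>
<body><h1>Hello</h1><p>Some <a href="https://example.com">link</a> text.</p></body>
</html>"""


class ElementOutline(HTMLParser):
    """Record each start tag and each non-blank text run, indented by nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many elements are currently open
        self.lines = []  # collected outline lines

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + f"<{tag}>")
        self.depth += 1  # everything until the matching end tag is inside this element

    def handle_endtag(self, tag):
        self.depth -= 1  # the element is closed; step back out one level

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only text between tags
            self.lines.append("  " * self.depth + text)


outline = ElementOutline()
outline.feed(SAMPLE_HTML)
outline.close()  # flush any text still buffered by the parser
print("\n".join(outline.lines))
```

Running it shows `<title>` and its text nested under `<head>`, and the `<a>` link nested inside `<p>` within `<body>`.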
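The two steps above (fetch the source code, then match it with regular expressions) might look roughly like the following sketch, assuming the page is UTF-8 encoded. The helper names `fetch_html`, `find_title` and `find_links` are hypothetical, and a `data:` URL stands in for a real website so the example runs without network access.

```python
import re
from typing import List, Optional
from urllib.parse import quote
from urllib.request import urlopen


def fetch_html(url: str) -> str:
    """Step 1: download a page's source code (assumes the page is UTF-8 encoded)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def find_title(html: str) -> Optional[str]:
    """Step 2a: return the text inside the first <title> element, or None if there is none."""
    # Non-greedy (.*?) stops at the first </title>; DOTALL lets the title span line breaks.
    match = re.search(r"<title>(.*?)</title>", html, flags=re.DOTALL | re.IGNORECASE)
    return match.group(1).strip() if match else None


def find_links(html: str) -> List[str]:
    """Step 2b: return every double-quoted href value, in document order."""
    return re.findall(r'href="(.*?)"', html)


# Hypothetical page used so the demo runs offline; urlopen can read data: URLs directly.
SAMPLE_HTML = (
    "<html><head><title>Scraping demo</title></head><body>"
    '<a href="https://example.com/a">A</a> <a href="https://example.com/b">B</a>'
    "</body></html>"
)

# For a real site, pass an http(s) URL instead, e.g. fetch_html("https://example.com").
page = fetch_html("data:text/html;charset=utf-8," + quote(SAMPLE_HTML))
print(find_title(page))  # Scraping demo
print(find_links(page))  # ['https://example.com/a', 'https://example.com/b']
```

Regex matching like this is brittle: it misses `href='...'` written with single quotes and can be fooled by comments or unusual markup. That is why the notes move to BeautifulSoup for anything beyond simple matches.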