wget,urllib,re - zixcon/python GitHub Wiki
urllib
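Compared with Python 2.x, Python 3.x merges the old `urllib` and `urllib2` modules into a single `urllib` package (`urllib.request`, `urllib.parse`, `urllib.error`, `urllib.robotparser`). `urllib3` is a separate third-party library and is not part of this merge.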
http://blog.csdn.net/drdairen/article/details/51149498
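As a rough illustration of the Python 3 layout, the sketch below calls a few helpers that lived in Python 2's `urllib` and `urlparse` modules and now sit in `urllib.parse`. The example URL (`example.com`) and query values are placeholders, not from the original page.

```python
from urllib.parse import quote, unquote, urlencode, urlparse

# Python 2: urllib.urlencode -> Python 3: urllib.parse.urlencode
query = urlencode({'page': 2, 'q': 'python'})

# Python 2: urllib.quote -> Python 3: urllib.parse.quote (non-ASCII is UTF-8 percent-encoded)
path = quote('/files/去库存.pdf')

# Python 2: urlparse.urlparse -> Python 3: urllib.parse.urlparse
parts = urlparse('http://example.com/files/report.pdf?' + query)

print(query)                                   # page=2&q=python
print(unquote(path))                           # /files/去库存.pdf
print(parts.netloc, parts.path, parts.query)   # example.com /files/report.pdf page=2&q=python
```

Exceptions moved the same way: `urllib2.HTTPError` and `urllib2.URLError` became `urllib.error.HTTPError` and `urllib.error.URLError`.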
So a file can be downloaded as follows (both examples below need this import):

```python
import urllib.request
```
- Direct download:

  ```python
  url = 'http://www.cbrc.gov.cn/chinese/files/2017/BF2D2E4669B1458CB1655D0762AD0F60.pdf'
  # "with" closes both the HTTP response and the output file
  with urllib.request.urlopen(url) as data, open("去库存-urllib.pdf", "wb") as f:
      f.write(data.read())
  ```
- Download with a spoofed User-Agent (some servers block urllib's default `Python-urllib/3.x` agent):

  ```python
  headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
  req = urllib.request.Request(url=url, headers=headers)
  with urllib.request.urlopen(req) as data, open("去库存-urllib.pdf", "wb") as f:
      f.write(data.read())
  ```
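The two download variants above can be folded into one helper. This is a minimal sketch, not the wiki's own code: the names `build_request` and `download`, their parameters, and the 30-second timeout are assumptions. It streams the body to disk with `shutil.copyfileobj` instead of loading the whole file into memory with `read()`, and only sends a custom User-Agent when one is passed.

```python
import shutil
import urllib.request

# Browser-like User-Agent string taken from the example above
FIREFOX_UA = ('Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) '
              'Gecko/20100101 Firefox/23.0')


def build_request(url, user_agent=None):
    """Hypothetical helper: wrap url in a Request, adding User-Agent only if given."""
    headers = {'User-Agent': user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers)


def download(url, path, user_agent=None, timeout=30):
    """Hypothetical helper: save the resource at url to path and return path."""
    req = build_request(url, user_agent)
    # Copy the response to disk in chunks rather than reading it all at once
    with urllib.request.urlopen(req, timeout=timeout) as resp, open(path, 'wb') as f:
        shutil.copyfileobj(resp, f)
    return path
```

Usage would look like `download(url, "去库存-urllib.pdf", user_agent=FIREFOX_UA)`. For HTTP URLs, a 4xx/5xx status raises `urllib.error.HTTPError`, so callers that want to skip failed files can catch that exception.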
wget
re
https://docs.python.org/3.6/library/re.html
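```python
import re

# html: page text obtained earlier, e.g. data.read().decode('utf-8')
reg = re.compile(r'(.*?)')  # the r prefix makes a raw string, so backslashes are not treated as escapes
item = reg.findall(html)
```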
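To see why such patterns use the non-greedy `(.*?)`, the sketch below compares it with the greedy `(.*)` on a small hand-written HTML string. The `href="..."` pattern and the sample HTML are illustrative assumptions, not from the original page. `findall` returns the text of the capture group for each match.

```python
import re

html = '<a href="a.pdf">A</a><a href="b.pdf">B</a>'

# Non-greedy: (.*?) stops at the first closing quote, so each link is one match
lazy = re.compile(r'href="(.*?)"')
print(lazy.findall(html))    # ['a.pdf', 'b.pdf']

# Greedy: (.*) runs to the last quote in the string, swallowing everything in between
greedy = re.compile(r'href="(.*)"')
print(greedy.findall(html))  # ['a.pdf">A</a><a href="b.pdf']

# The r prefix keeps backslashes literal: r'\d+' is the same string as '\\d+'
print(r'\d+' == '\\d+')      # True
print(re.findall(r'\d+', 'files/2017/report-03.pdf'))  # ['2017', '03']
```

Regular expressions work for quick extraction like this, but they are fragile on real-world HTML; an HTML parser is the more robust choice for anything beyond simple, predictable markup.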