wget,urllib,re - zixcon/python GitHub Wiki

urllib

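In Python 3.x, the 2.x modules urllib and urllib2 were merged into a single urllib package (urllib.request, urllib.parse, urllib.error, urllib.robotparser). urllib3 is a separate third-party library and was not part of this merge.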
http://blog.csdn.net/drdairen/article/details/51149498
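The sketch below shows where a few common Python 2 calls now live in the Python 3 package. It uses only `urllib.parse` and builds a `urllib.request.Request` without sending it, so it runs without network access. The `example.com` URL and the query values are made up for illustration.

```python
from urllib.parse import urlencode, urlsplit
from urllib.request import Request

# urllib.parse: build a query string from a dict (Python 2: urllib.urlencode)
query = urlencode({"q": "python urllib", "page": 2})
url = "http://example.com/search?" + query

# urllib.parse: split a URL into its components (Python 2: urlparse.urlsplit)
parts = urlsplit(url)

# urllib.request: build a request object without sending it (Python 2: urllib2.Request)
req = Request(url, headers={"User-Agent": "Mozilla/5.0"})

print(query)         # q=python+urllib&page=2
print(parts.netloc)  # example.com
print(req.full_url)  # http://example.com/search?q=python+urllib&page=2
```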

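A file can therefore be downloaded as follows:

  • Direct download:

```python
import urllib.request

url = 'http://www.cbrc.gov.cn/chinese/files/2017/BF2D2E4669B1458CB1655D0762AD0F60.pdf'
# urlopen returns a file-like response; read() gives the body as bytes
with urllib.request.urlopen(url) as data:
    # "wb": write in binary mode, since a PDF is binary data
    with open("去库存-urllib.pdf", "wb") as f:
        f.write(data.read())
```

  • Download with a spoofed User-Agent (by default urllib identifies itself as "Python-urllib/3.x", which some servers reject):

```python
import urllib.request

# url is the same PDF link as in the previous example
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
# wrap the URL in a Request object so the custom headers are sent with it
req = urllib.request.Request(url=url, headers=headers)
with urllib.request.urlopen(req) as data, open("去库存-urllib.pdf", "wb") as f:
    f.write(data.read())
```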
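The two download snippets can be folded into one reusable helper. This is a minimal sketch, not part of the original page: the function name `download`, its `user_agent` and `timeout` parameters, and the 30-second default are assumptions chosen for illustration. Instead of calling `data.read()`, it streams the response to disk with `shutil.copyfileobj`, so a large file is never held fully in memory.

```python
import shutil
import urllib.request

# Hypothetical default; this is the browser string used in the example above.
DEFAULT_UA = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0"


def download(url, path, user_agent=DEFAULT_UA, timeout=30):
    """Save the resource at url to path and return the number of bytes written.

    Errors are not caught here: urllib.error.HTTPError is raised for 4xx/5xx
    responses and urllib.error.URLError for network failures.
    """
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=timeout) as resp, open(path, "wb") as f:
        # copyfileobj copies in fixed-size chunks rather than one big read()
        shutil.copyfileobj(resp, f)
        # after the copy, the file position equals the number of bytes written
        return f.tell()
```

For example, `download(url, "去库存-urllib.pdf")` would fetch the PDF from the examples above. `HTTPError` is a subclass of `URLError`, so a single `except urllib.error.URLError` around the call handles both cases.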

wget

re

https://docs.python.org/3.6/library/re.html

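```python
import re

reg = re.compile(r'(.*?)')  # the r prefix makes a raw string, so backslashes are not treated as escapes
item = re.findall(reg, html)  # html: the page source as a str
```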
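The sketch below illustrates the ideas behind that pattern: non-greedy `.*?` compared with greedy `.*`, the `re.S` flag, and what the `r` prefix actually does. The HTML strings are invented for illustration. In real scraping, `(.*?)` is normally placed between literal text, such as tags, that marks where the wanted part starts and ends. When the pattern has exactly one group, `findall` returns what that group captured.

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy .* runs as far as it can, so a single match spans both tag pairs
greedy = re.findall(r"<b>(.*)</b>", html)
# Non-greedy .*? stops at the first closing tag, giving one match per pair
lazy = re.findall(r"<b>(.*?)</b>", html)

# By default "." does not match "\n"; re.S (re.DOTALL) lifts that restriction
multiline = "<p>line1\nline2</p>"
no_dotall = re.findall(r"<p>(.*?)</p>", multiline)
with_dotall = re.findall(r"<p>(.*?)</p>", multiline, re.S)

# The r prefix only changes how Python reads the literal: r"\d+" equals "\\d+"
same = r"\d+" == "\\d+"

print(greedy)       # ['one</b> and <b>two']
print(lazy)         # ['one', 'two']
print(no_dotall)    # []
print(with_dotall)  # ['line1\nline2']
print(same)         # True
```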