Scraping FamilyMart (全家) store locations with PHP
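Sometimes you need data that someone else holds. The best option is to have them provide an API: you fetch everything in one go, and if something breaks it is their API's problem. That is not always possible, though, and then you have to pull the data off the web yourself. This walkthrough uses FamilyMart store locations as the example.
Fetching data from a web page with PHP's cURL functions
A quick Google search leads to the store-locator page.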
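With the developer tools open, the screen looks like this:
The left side shows the page as it normally loads.
The right side shows the developer tools panel.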
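First clear all the requests recorded so far, so the ones you need are easy to spot: go to the Network tab and click the clear button (the "no entry" icon) next to the record button.
After clearing, the request list should be empty.
Next, click any city. This example clicks 基隆 (Keelung), and a new request appears in the list; it is clearly a GET request.
Click that entry to see the details.
Look at the Response tab first to confirm that this is the request you actually need.
Response body: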
storeTownList([
{
"post": "200",
"town": "仁愛區",
"city": "基隆市"
},
{
"post": "201",
"town": "信義區",
"city": "基隆市"
},
{
"post": "202",
"town": "中正區",
"city": "基隆市"
},
{
"post": "203",
"town": "中山區",
"city": "基隆市"
},
{
"post": "204",
"town": "安樂區",
"city": "基隆市"
},
{
"post": "205",
"town": "暖暖區",
"city": "基隆市"
},
{
"post": "206",
"town": "七堵區",
"city": "基隆市"
}
])
The server's response lists every district in 基隆市 (Keelung City), which confirms that this is the request we need. Now go back to the Headers tab.
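The response is not plain JSON: the array is wrapped in a function call, apparently named after the `fun` parameter (a JSONP-style payload). A minimal sketch of unwrapping it is shown here. The helper name `unwrapJsonp` is hypothetical and not part of the original script, and it assumes the payload has the shape `callback( ...json... )`.

```php
<?php
// Hypothetical helper (not from the original script): unwrap a JSONP-style
// response such as  storeTownList([...])  and decode the JSON inside it.
function unwrapJsonp(string $body): ?array
{
    $start = strpos($body, '(');   // the first "(" ends the callback name
    $end   = strrpos($body, ')');  // the last ")" closes the wrapper
    if ($start === false || $end === false || $end < $start) {
        return null;               // not a JSONP-style payload
    }
    $json = substr($body, $start + 1, $end - $start - 1);
    $data = json_decode($json, true);
    return is_array($data) ? $data : null;
}

// Sample taken from the response shown above, shortened to two entries.
$sample = 'storeTownList([{"post":"200","town":"仁愛區","city":"基隆市"},'
        . '{"post":"201","town":"信義區","city":"基隆市"}])';
$towns = unwrapJsonp($sample);
echo $towns[0]['town'], ' ', $towns[1]['post'], PHP_EOL; // prints "仁愛區 201"
```

Searching for the parentheses instead of counting characters keeps the code working if the callback name changes or the server appends whitespace after the closing parenthesis.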
The most important entries on the Headers tab are:
// URL (truncated)
Request URL: http://api.map.com.tw/net/familyShop.aspx?searchType=...
// Request method
Request Method: GET
// Page the request was sent from
Referer: http://www.family.com.tw/marketing/inquiry.aspx
//Query String Parameters
searchType: ShowTownList
type:
city: 基隆市
fun: storeTownList
key: 6F30E8BF706D653965...
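Because the request uses GET, the parameters simply go in the query string of the URL.
PHP's cURL functions are all that is needed to fetch the data.
Example: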
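As an illustration, the query string can be assembled with PHP's built-in `http_build_query`, which also percent-encodes the Chinese city name. This is only a sketch: the helper name `buildFamilyShopUrl` is hypothetical, and the `key` value is the truncated one shown above, which has to be replaced with the full key from your own developer-tools session.

```php
<?php
// Hypothetical helper (not from the original script): build the request URL
// from the query parameters listed on the Headers tab.
function buildFamilyShopUrl(array $params): string
{
    $base = 'http://api.map.com.tw/net/familyShop.aspx';
    // PHP_QUERY_RFC3986 percent-encodes the raw bytes of each value
    // (UTF-8 bytes, provided this file is saved as UTF-8).
    return $base . '?' . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

$url = buildFamilyShopUrl([
    'searchType' => 'ShowTownList',
    'type'       => '',
    'city'       => '基隆市',
    'fun'        => 'storeTownList',
    // Truncated in the source; substitute the full key you see in DevTools.
    'key'        => '6F30E8BF706D653965...',
]);
echo $url, PHP_EOL;
```

Building the URL from an array keeps the parameter names in one place and avoids sending unencoded non-ASCII characters.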
<?php
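// Remove the execution time limit; the crawl makes many requests
set_time_limit(0);
// Discard the output buffer only if one is active, then flush output as it is produced
if (ob_get_level() > 0) {
    ob_end_clean();
}
ob_implicit_flush(true);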
// Fetch the list of districts for every city
function searchArea() {
    $temp = [];
    $cities = ["基隆市","台北市","新北市","桃園市","新竹縣","新竹市","苗栗縣","台中市",
               "彰化縣","南投縣","雲林縣","嘉義市","嘉義縣","台南市","高雄市","屏東縣",
               "宜蘭縣","花蓮縣","台東縣"];
    $referer = 'http://www.family.com.tw/marketing/inquiry.aspx';
    foreach ($cities as $city) {
        $url = "http://api.map.com.tw/net/familyShop.aspx?searchType=ShowTownList&type=&city=";
        // Percent-encode the city name so the non-ASCII characters are valid in a URL
        $url .= urlencode($city);
        $url .= "&fun=storeTownList&key=6F30E8BF706D653965BD...";
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLINFO_HEADER_OUT, true);
        curl_setopt($ch, CURLOPT_VERBOSE, true);
        curl_setopt($ch, CURLOPT_REFERER, $referer);
        $storeTownList = curl_exec($ch);
        if ($storeTownList === false) {
            continue; // request failed; skip this city
        }
        // Strip the leading "storeTownList(" (14 characters) and the trailing ")"
        $storeTownList = substr($storeTownList, 14, strlen($storeTownList) - 15);
        // Decode the JSON into an associative array
        $storeTownList = json_decode($storeTownList, true);
        if (!is_array($storeTownList)) {
            continue; // unexpected response; skip this city
        }
        // Collect the districts under their city
        foreach ($storeTownList as $data) {
            $temp[$city][] = $data;
        }
    }
    // Uncomment to check that the data was collected correctly
    // print("<pre>".print_r($temp,true)."</pre>");
    searchShop($temp);
}
// Use the same approach to fetch the stores in each district
function searchShop($areaList) {
    $temp = [];
    $referer = 'http://www.family.com.tw/marketing/inquiry.aspx';
    foreach ($areaList as $city => $area) {
        foreach ($area as $data) {
            $paraCity = $data["city"];
            $paraTown = $data["town"];
            $url = "http://api.map.com.tw/net/familyShop.aspx?searchType=ShopList&type=&city=";
            // Percent-encode the non-ASCII city and district names
            $url .= urlencode($paraCity);
            $url .= "&area=".urlencode($paraTown)."&road=&fun=showStoreList&key=6F30E8BF706D6...";
            $ch = curl_init($url);
            curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
            curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
            curl_setopt($ch, CURLINFO_HEADER_OUT, true);
            curl_setopt($ch, CURLOPT_VERBOSE, true);
            curl_setopt($ch, CURLOPT_REFERER, $referer);
            $storeList = curl_exec($ch);
            if ($storeList === false) {
                continue; // request failed; skip this district
            }
            // Strip the leading "showStoreList(" (also 14 characters) and the trailing ")"
            $storeList = substr($storeList, 14, strlen($storeList) - 15);
            // Decode the JSON into an associative array
            $storeList = json_decode($storeList, true);
            // Store the result by city and district
            $temp[$paraCity][$paraTown] = $storeList;
        }
    }
    print("<pre>".print_r($temp,true)."</pre>");
}
// Entry point: start the crawl
searchArea();
?>
That gets you the data; what you do with it next depends on your needs.
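One possible next step, assuming you simply want to keep the result on disk, is to write the collected array to a JSON file. The helper name `saveAsJson` and the output path are hypothetical, and the sample data below is a stand-in shaped like the district list, since the fields of a real store record are not shown here.

```php
<?php
// Hypothetical helper (not from the original script): write the collected
// array to a JSON file and report whether that succeeded.
function saveAsJson(array $data, string $path): bool
{
    // JSON_UNESCAPED_UNICODE keeps Chinese text readable instead of \uXXXX escapes
    $json = json_encode($data, JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT);
    if ($json === false) {
        return false;
    }
    return file_put_contents($path, $json) !== false;
}

// Stand-in data nested by city and district, like $temp in searchShop();
// real store records will contain other fields.
$stores = [
    '基隆市' => [
        '仁愛區' => [
            ['post' => '200', 'town' => '仁愛區', 'city' => '基隆市'],
        ],
    ],
];
$path = sys_get_temp_dir() . '/family_stores.json'; // hypothetical output path
var_dump(saveAsJson($stores, $path));
```

In the script above, the same call could replace the final `print(...)` in `searchShop()`, with `$temp` passed in place of `$stores`.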