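Goutte: PHP Web Crawler Library
Goutte is a PHP library for scraping data from websites. It provides an elegant API that makes it easy to select specific elements from remote pages.
Sample code: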
require_once '/path/to/goutte.phar';
use Goutte\Client;
// Send a request
$client = new Client();
$crawler = $client->request('GET', 'http://www.oschina.net/');
// Click a link
$link = $crawler->selectLink('Plugins')->link();
$crawler = $client->click($link);
// Submit a form
$form = $crawler->selectButton('sign in')->form();
$crawler = $client->submit($form, array('signin[username]' => 'fabien', 'signin[password]' => 'xxxxxx'));
// Extract data
$nodes = $crawler->filter('.error_list');
if ($nodes->count()) {
    die(sprintf("Authentication error: %s\n", $nodes->text()));
}
printf("Nb tasks: %d\n", $crawler->filter('#nb_tasks')->text());
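
The extraction step above reads a single node. To pull several elements in one pass, the crawler's each() method can be used. The following is a minimal sketch, not part of the original sample. It assumes the symfony/dom-crawler and symfony/css-selector packages are installed through Composer, which is the Crawler class that Goutte's request() returns. It runs on an inline HTML string, so no network request is made. The markup, the #tasks selector, and the vendor/autoload.php path are hypothetical.

```php
<?php
// Assumption: Composer autoloader with symfony/dom-crawler and
// symfony/css-selector installed; adjust the path to your project.
require_once __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical markup standing in for a fetched page.
$html = <<<'HTML'
<html><body>
<ul id="tasks">
  <li class="task"><a href="/tasks/1">Write report</a></li>
  <li class="task"><a href="/tasks/2">Review patch</a></li>
</ul>
<span id="nb_tasks">2</span>
</body></html>
HTML;

$crawler = new Crawler($html);

// each() calls the closure once per matched node and collects the return values.
$tasks = $crawler->filter('#tasks li.task a')->each(function (Crawler $node) {
    return ['title' => $node->text(), 'href' => $node->attr('href')];
});

foreach ($tasks as $task) {
    printf("%s -> %s\n", $task['title'], $task['href']);
}

// text() returns a string, so cast it before treating it as a number.
$count = (int) $crawler->filter('#nb_tasks')->text();
printf("Nb tasks: %d\n", $count);
```

With a live page, the same filter() and each() calls can be applied to the $crawler returned by $client->request().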
