Retrieve a URL's HTML using cURL and Drupal's Cache
Need to get the HTML output from a URL and place it in Drupal's cache? Well then, you may do something like this:
function tf_crawl_url($url) {
  // See if we have this URL cached already. If we do, pull the HTML from
  // cache; if we don't, curl the URL and store the result in cache.
  $html = '';
  $cache_key = $url;
  $cache = cache_get($cache_key);
  if ($cache) {
    // t() with an @placeholder escapes the URL before it is displayed.
    drupal_set_message(t('Grabbed @url from cache.', array('@url' => $url)));
    $html = $cache->data;
  }
  else {
    $curl_handle = curl_init();
    curl_setopt($curl_handle, CURLOPT_URL, $url);
    curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
    curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
    $html = curl_exec($curl_handle);
    curl_close($curl_handle);
    // curl_exec() returns FALSE on failure; don't cache a failed request.
    if ($html !== FALSE) {
      cache_set($cache_key, $html, 'cache');
      drupal_set_message(t('Curled @url and placed in cache.', array('@url' => $url)));
    }
  }
  return $html;
}
Example usage:
$html = tf_crawl_url("http://www.drupal.org");
print $html;
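One thing to keep in mind: as far as I know, in Drupal 6 and 7 a cache_set() call without a fourth argument stores the entry permanently, so the cached HTML is never refreshed until the cache is cleared. Below is a rough sketch of a variant that expires entries after a number of seconds. The names tf_crawl_url_expiring(), tf_curl_fetch(), the $lifetime and $fetch parameters, and the tf_demo_cache stand-ins are all made up for this illustration. The stand-ins only exist so the sketch runs outside Drupal; inside Drupal the real cache_get() and cache_set() are used instead.

```php
<?php
// Stand-ins so the sketch runs outside Drupal (assumption: they mimic only
// the small part of cache_get()/cache_set() used below). Inside Drupal the
// real functions already exist and this block is skipped.
if (!function_exists('cache_get')) {
  $GLOBALS['tf_demo_cache'] = array();
  function cache_get($cid, $bin = 'cache') {
    return isset($GLOBALS['tf_demo_cache'][$bin][$cid])
      ? $GLOBALS['tf_demo_cache'][$bin][$cid]
      : FALSE;
  }
  function cache_set($cid, $data, $bin = 'cache', $expire = 0) {
    $GLOBALS['tf_demo_cache'][$bin][$cid] = (object) array(
      'data' => $data,
      'expire' => $expire,
    );
  }
}

// Hypothetical fetcher: same cURL calls as tf_crawl_url().
// Returns the HTML string, or FALSE on failure.
function tf_curl_fetch($url) {
  $curl_handle = curl_init();
  curl_setopt($curl_handle, CURLOPT_URL, $url);
  curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
  curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
  $html = curl_exec($curl_handle);
  curl_close($curl_handle);
  return $html;
}

// Hypothetical variant of tf_crawl_url(). $lifetime is the cache lifetime
// in seconds; $fetch is any callable that takes a URL and returns HTML or
// FALSE.
function tf_crawl_url_expiring($url, $lifetime, $fetch) {
  $cache = cache_get($url);
  // Check the expiry timestamp ourselves instead of assuming the cache
  // backend hides stale entries.
  if ($cache && $cache->expire > time()) {
    return $cache->data;
  }
  $html = call_user_func($fetch, $url);
  if ($html !== FALSE) {
    // The fourth argument is a Unix timestamp after which the entry is stale.
    cache_set($url, $html, 'cache', time() + $lifetime);
  }
  return $html;
}
```

To keep a page for an hour, you might call it like this: $html = tf_crawl_url_expiring("http://www.drupal.org", 3600, 'tf_curl_fetch'); Passing the fetcher in as a callable is just a convenience so the caching logic can be tried without hitting the network.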