Retrieve a URL's HTML using cURL and Drupal's Cache

Submitted by tyler on Wed, 05/25/2011 - 18:43

Updated: 2013-04-02

Category: Code

Tags: drupal 6.x, cache, curl, html, planet drupal

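Need to fetch the HTML output of a URL and keep it in Drupal's cache, so later requests don't hit the remote site again? Then you can combine PHP's cURL functions with Drupal 6's cache_get() and cache_set(), something like this: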

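function tf_crawl_url($url) {
  // See if we have this URL cached already. If we do, pull the HTML from
  // the cache; if we don't, fetch the URL with cURL and store it in the cache.
  $cache_key = $url;
  $cache = cache_get($cache_key, 'cache');
  if ($cache) {
    // t() with an @ placeholder escapes the URL before it is displayed.
    drupal_set_message(t('Grabbed @url from cache.', array('@url' => $url)));
    return $cache->data;
  }

  $curl_handle = curl_init();
  curl_setopt($curl_handle, CURLOPT_URL, $url);
  // Give up if a connection can't be made within 2 seconds.
  curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
  // Make curl_exec() return the response body instead of printing it.
  curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, TRUE);
  $html = curl_exec($curl_handle);
  $status = curl_getinfo($curl_handle, CURLINFO_HTTP_CODE);
  curl_close($curl_handle);

  // curl_exec() returns FALSE on failure. Don't cache failures or non-200
  // responses, otherwise the error would be served from cache from then on.
  if ($html === FALSE || $status != 200) {
    drupal_set_message(t('Could not retrieve @url.', array('@url' => $url)), 'error');
    return FALSE;
  }
  cache_set($cache_key, $html, 'cache');
  drupal_set_message(t('Curled @url and placed in cache.', array('@url' => $url)));
  return $html;
}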

Example usage:

$html = tf_crawl_url("http://www.drupal.org");
print $html;
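If you want to exercise the same check-the-cache-first (cache-aside) logic outside a Drupal site, for example in a unit test, here is a minimal sketch. The function crawl_url_cached(), the plain array standing in for Drupal's cache table, and the injected $fetch callable standing in for the cURL calls are all hypothetical, not part of Drupal's API; it assumes PHP 7 or later.

```php
<?php
// Hypothetical cache-aside sketch, independent of Drupal.
// $cache is a plain array standing in for Drupal's cache table, and
// $fetch is any callable that returns the page HTML or FALSE on failure
// (for example, a wrapper around curl_exec()).

/**
 * Returns the HTML for $url, using $cache when possible.
 *
 * @param string   $url   URL to fetch.
 * @param array    $cache Cache storage, passed by reference.
 * @param callable $fetch Fetcher: function (string $url): string|false.
 * @return string|false   The HTML, or FALSE if the fetch failed.
 */
function crawl_url_cached(string $url, array &$cache, callable $fetch)
{
    // Cache hit: return the stored copy without calling the fetcher.
    if (array_key_exists($url, $cache)) {
        return $cache[$url];
    }
    // Cache miss: fetch the page once.
    $html = $fetch($url);
    // Store only successful fetches, so a failure is retried next time
    // instead of being served from the cache.
    if ($html !== false) {
        $cache[$url] = $html;
    }
    return $html;
}
```

Passing the fetcher in as a callable keeps the network out of tests; on a real site it could wrap the curl_init()/curl_exec() calls from tf_crawl_url(), and the array could be swapped for cache_get()/cache_set().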
