Based on Michael Knapp's code, with some added regex, here's a function that gets the title and all meta tags for a given URL. If there's an error, it returns false. It relies on getUrlContents(), also included, which takes care of META REFRESH redirections, following up to the specified number of them (there is no limit when $maximumRedirections is left as null). Please note that the regular expressions were split into several strings because php.net was complaining about the line being too long ;)
<?php
function getUrlData($url)
{
    $result = false;
    $contents = getUrlContents($url);

    if (isset($contents) && is_string($contents))
    {
        $title = null;
        $metaTags = null;

        preg_match('/<title>([^>]*)<\/title>/si', $contents, $match);
        if (isset($match) && is_array($match) && count($match) > 1)
        {
            $title = strip_tags($match[1]);
        }

        // Only matches tags where name comes before content
        preg_match_all('/<[\s]*meta[\s]*name="?([^>"]*)"?[\s]*'
            . 'content="?([^>"]*)"?[\s]*[\/]?[\s]*>/si', $contents, $match);
        if (isset($match) && is_array($match) && count($match) == 3)
        {
            $originals = $match[0];
            $names = $match[1];
            $values = $match[2];
            if (count($originals) == count($names) && count($names) == count($values))
            {
                $metaTags = array();
                for ($i = 0, $limiti = count($names); $i < $limiti; $i++)
                {
                    $metaTags[$names[$i]] = array (
                        'html' => htmlentities($originals[$i]),
                        'value' => $values[$i]
                    );
                }
            }
        }
        $result = array ('title' => $title, 'metaTags' => $metaTags);
    }
    return $result;
}

function getUrlContents($url, $maximumRedirections = null, $currentRedirection = 0)
{
    $result = false;
    $contents = @file_get_contents($url);

    // Check if we need to go somewhere else
    if (isset($contents) && is_string($contents))
    {
        preg_match_all('/<[\s]*meta[\s]*http-equiv="?REFRESH"?[\s]*'
            . 'content="?[0-9]*;[\s]*URL[\s]*=[\s]*([^>"]*)"?'
            . '[\s]*[\/]?[\s]*>/si', $contents, $match);
        if (isset($match) && is_array($match) && count($match) == 2 && count($match[1]) == 1)
        {
            if (!isset($maximumRedirections) || $currentRedirection < $maximumRedirections)
            {
                return getUrlContents($match[1][0], $maximumRedirections, ++$currentRedirection);
            }
            // Redirection limit reached: report failure
            $result = false;
        }
        else
        {
            $result = $contents;
        }
    }
    // Return $result (not $contents) so the failure case above yields false
    return $result;
}
?>
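To see what the two regular expressions actually capture, here is a minimal sketch that runs them against an inline string instead of a fetched page. The sample HTML and the example.com address are made up for illustration, and the patterns are written the way I understand the ones above to be shaped, so treat them as an assumption rather than a verbatim copy.

```php
<?php
// Hypothetical sample document, kept inline so no network access is needed.
$html = '<html><head>'
      . '<meta http-equiv="refresh" content="0; URL=http://example.com/new/" />'
      . '<meta name="description" content="A sample page" />'
      . '<meta content="ignored" name="reversed" />'
      . '</head></html>';

// Name/content pattern, in the shape getUrlData() uses (assumed).
$metaPattern = '/<[\s]*meta[\s]*name="?([^>"]*)"?[\s]*'
             . 'content="?([^>"]*)"?[\s]*[\/]?[\s]*>/si';
preg_match_all($metaPattern, $html, $meta);

// META REFRESH pattern, in the shape getUrlContents() uses (assumed).
$refreshPattern = '/<[\s]*meta[\s]*http-equiv="?REFRESH"?[\s]*'
                . 'content="?[0-9]*;[\s]*URL[\s]*=[\s]*([^>"]*)"?'
                . '[\s]*[\/]?[\s]*>/si';
preg_match_all($refreshPattern, $html, $refresh);

print_r($meta[1]);    // names:  only "description"
print_r($meta[2]);    // values: only "A sample page"
print_r($refresh[1]); // redirect target: "http://example.com/new/"
?>
```

Note that the tag with content before name is skipped, since the pattern expects name first. Pages that order their attributes differently would need a more tolerant pattern or a real HTML parser.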
Here's an example of its usage. Note that the URL used here has a META REFRESH redirection:
<?php
// $url holds the address of the page to inspect
$result = getUrlData($url);

echo '<pre>';
print_r($result);
echo '</pre>';
?>
For the above code the output would be:
Array
(
    [title] => Mariano Iglesias: El Eternauta
    [metaTags] => Array
        (
            [description] => Array
                (
                    [html] =>
                    [value] => Java, PHP, and some other technological mumble jumble. Also, some real-life stuff as well.
                )

            [DC.title] => Array
                (
                    [html] =>
                    [value] => Mariano Iglesias - Weblog
                )

            [ICBM] => Array
                (
                    [html] =>
                    [value] => -34.6017, -58.3956
                )

            [geo.position] => Array
                (
                    [html] =>
                    [value] => -34.6017;-58.3956
                )

            [geo.region] => Array
                (
                    [html] =>
                    [value] => AR-BA
                )

            [geo.placename] => Array
                (
                    [html] =>
                    [value] => Buenos Aires
                )

        )

)
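Since getUrlData() returns false on error and metaTags can stay null when nothing matched, it's worth guarding before reading a value. Here is a rough sketch of one way to do that. metaValue() is a hypothetical helper, not part of the code above, and the $data array is a hand-written stand-in shaped like the output shown.

```php
<?php
// Hypothetical helper: returns the value of one meta tag, or $default when
// the lookup failed (false) or the tag is not present.
function metaValue($data, $name, $default = null)
{
    if (!is_array($data) || !isset($data['metaTags'][$name]['value'])) {
        return $default;
    }
    return $data['metaTags'][$name]['value'];
}

// Stand-in for a getUrlData() result, trimmed down from the output above.
$data = array(
    'title' => 'Mariano Iglesias: El Eternauta',
    'metaTags' => array(
        'geo.region' => array('html' => '', 'value' => 'AR-BA'),
    ),
);

echo metaValue($data, 'geo.region'), "\n";             // AR-BA
echo metaValue($data, 'keywords', '(none)'), "\n";     // (none)
echo metaValue(false, 'geo.region', '(failed)'), "\n"; // (failed)
?>
```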