Generating a Sitemap Online

Sitemap (site map)

1. What is a sitemap for, and why use one?

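A sitemap is a list of all the URLs on a website. The list should be regenerated automatically at regular intervals so that third-party search engines and feed readers that consume the sitemap can discover new URLs on your site promptly. Maintaining a sitemap is a basic SEO task: webmasters should regularly submit the updated URL list to search engines so that they receive complete and current information about the site's URLs. A sitemap is therefore important for a website, and updating it regularly is just as essential. Some sites add a lot of content but leave the sitemap stale, which makes it hard for software that relies on the sitemap to find the newly added URLs quickly.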

Put simply, a sitemap is a page that collects all of a site's links, which helps Baidu and Google crawl and index the site.
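
To make the idea concrete, here is a minimal sketch of such a URL list built in memory with PHP's XMLWriter. The function name `buildSitemap` and the sample URLs are assumptions made for illustration; they are not part of the script presented later.

```php
<?php
// Minimal sketch: build a sitemap document in memory with XMLWriter.
// buildSitemap() and the sample URLs are hypothetical, for illustration only.
function buildSitemap(array $urls): string
{
    $writer = new XMLWriter();
    $writer->openMemory();
    $writer->setIndent(true);
    $writer->startDocument('1.0', 'UTF-8');
    $writer->startElement('urlset');
    $writer->writeAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');
    foreach ($urls as $url) {
        $writer->startElement('url');
        $writer->writeElement('loc', $url);
        $writer->writeElement('lastmod', date('Y-m-d'));
        $writer->endElement(); // closes <url>
    }
    $writer->endElement(); // closes <urlset>
    $writer->endDocument();

    return $writer->outputMemory();
}

$xml = buildSitemap(array(
    'http://www.example.com/',
    'http://www.example.com/about',
));
echo $xml;
```

Each `<url>` entry needs only a `<loc>`; `lastmod`, `changefreq`, and `priority` are optional hints for the crawler.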

2. File layout:

Config file        - config/config.ini.php
Main sitemap file  - SiteMap.class.php

3. Main file code

Main file (SiteMap.class.php)

<?php
    /**
     * the script's main function is to help us to generate the target web's sitemap.xml file 
     *
     * @category sitemap
     * @version 1.0
     */
    namespace Maweibinguo\SiteMap;
    class SiteMap
    {
        const SCHEMA = 'http://www.sitemaps.org/schemas/sitemap/0.9';
 
        /**
         * @var webUrlList
         * @access public
         */
        public $webUrlList = array();
 
        /**
         * @var siteMapList
         * @access public
         */
        public $siteMapList = array();
 
        /**
         * @var isUseCookie
         * @access public
         */
        public $isUseCookie = false;
 
        /**
         * @var cookieFilePath
         * @access public
         */
        public $cookieFilePath = '';
 
        /**
         * @var xmlWriter
         * @access private
         */
        private $_xmlWriter = '';
 
        /**
         * init basic config
         *
         * @access public
         */
        public function __construct()
        {
            $this->_xmlWriter = new \XMLWriter();
 
            $this->_environmentTest();
        }
 
        /**
         * test the environment for the script 
         *
         * @access private
         */
        private function _environmentTest()
        {
            $sapiType = \php_sapi_name();
            if( strtolower($sapiType) != 'cli' ) {
                echo 'The Script Must Run From The Command Line', "\r\n";
                exit();
            }
        }
 
        /**
         * load the config value used for generating the sitemap, by config name
         *
         * @param string $configName
         * @return mixed $configValue the value, or false if it is not defined
         * @access public
         */
        public function loadConfig($configName)
        {
            /* init return value */
            $configValue = false;
 
            /* load config value */
            $configPath = __DIR__ . '/config/config.ini.php';
            if(file_exists( $configPath )) {
                require $configPath;
            } else {
                echo "Can not find config file", "\r\n";
                exit();    
            }
            /* the config file defines plain variables, so look the name up as a variable variable */
            if( isset($$configName) ) {
                $configValue = $$configName;
            }
 
            /* return config value */
            return $configValue;
        }
 
        /**
         * generate sitemap.xml for the web
         *
         * @param array $siteMapList
         * @return bool $result
         * @access public
         */
        public function generateSiteMapXml($siteMapList)
        {
            /* init return result */
            $result = false;
            if( !is_array($siteMapList) || count($siteMapList) <= 0 ) {
                echo 'The SiteMap Content Is Empty',"\r\n";
                exit();
            }
 
            /* check that the target file exists and is writable */
            $siteMapPath = $this->loadConfig('SITEMAPPATH');
            if(!file_exists($siteMapPath)) {
                touch($siteMapPath);
            }
            if( !is_writable($siteMapPath) ) {
                echo 'The SiteMap File Is Not Writable',"\r\n";
                exit();
            }
            $this->_xmlWriter->openURI($siteMapPath);
            $this->_xmlWriter->startDocument('1.0', 'UTF-8');
            $this->_xmlWriter->setIndent(true);
            $this->_xmlWriter->startElement('urlset');
            $this->_xmlWriter->writeAttribute('xmlns', self::SCHEMA);
            foreach($siteMapList as $siteMapItem) {
                $this->_xmlWriter->startElement('url');
                $this->_xmlWriter->writeElement('loc',$siteMapItem['Url']);
                /* note: <title> is not defined by the sitemaps.org 0.9 schema, strict validators may reject it */
                $this->_xmlWriter->writeElement('title',$siteMapItem['Title']);
                /* the sitemap protocol expects lowercase changefreq values such as "daily" */
                $changefreq = !empty($siteMapItem['ChangeFreq']) ? strtolower($siteMapItem['ChangeFreq']) : 'daily';
                $this->_xmlWriter->writeElement('changefreq',$changefreq);
                $priority = !empty($siteMapItem['Priority']) ? $siteMapItem['Priority'] : 0.5;
                $this->_xmlWriter->writeElement('priority',(string)$priority);
                $this->_xmlWriter->writeElement('lastmod',date('Y-m-d'));
                $this->_xmlWriter->endElement();
            }
            $this->_xmlWriter->endElement();
            $this->_xmlWriter->endDocument();

            /* write the buffered document out to the file */
            $this->_xmlWriter->flush();
            $result = true;
 
            /* return result */
            return $result;
        }
 
        /**
         * send a request to the target url, and get the response 
         *
         * @param string $url
         * @return mixed $responseData the response body, or false on failure
         * @access public
         */
        public function sendRequest($url)
        {
            /* init return value */
            $responseData = false;
 
            /* check the parameter */
            if( !filter_var($url, FILTER_VALIDATE_URL) ) {
                return $responseData;
            }
            $connectTimeOut = $this->loadConfig('CURLOPT_CONNECTTIMEOUT');
            if( $connectTimeOut === false ) {
                return $responseData;
            }
            $timeOut = $this->loadConfig('CURLOPT_TIMEOUT');
            if( $timeOut === false ) {
                return $responseData;
            }
 
            $handle = curl_init();
            curl_setopt($handle, CURLOPT_URL, $url);
            curl_setopt($handle, CURLOPT_HEADER, false);
            curl_setopt($handle, CURLOPT_AUTOREFERER, true);
            curl_setopt($handle, CURLOPT_RETURNTRANSFER , true);
            curl_setopt($handle, CURLOPT_CONNECTTIMEOUT, $connectTimeOut);
            curl_setopt($handle, CURLOPT_TIMEOUT, $timeOut);
            curl_setopt($handle, CURLOPT_USERAGENT, "Mozilla/5.0 (compatible; MSIE 5.01; Windows NT 5.0)" );
            $headersItem = array(    'Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
                                    'Connection: Keep-Alive'     );
            curl_setopt($handle, CURLOPT_HTTPHEADER, $headersItem);
            curl_setopt($handle, CURLOPT_FOLLOWLOCATION, 1);
 
            $cookieList = $this->loadConfig('COOKIELIST');
            $isUseCookie = $cookieList['IsUseCookie'];
            $cookieFilePath = $cookieList['CookiePath'];
            if($isUseCookie) {
                if(!file_exists($cookieFilePath)) {
                    /* create the cookie file without shelling out */
                    touch($cookieFilePath);
                }
                curl_setopt($handle, CURLOPT_COOKIEFILE, $cookieFilePath);
                curl_setopt($handle, CURLOPT_COOKIEJAR, $cookieFilePath);
            }
            $responseData = curl_exec($handle);
            $httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
            if($httpCode != 200) {
                $responseData = false;
            }
            curl_close($handle);
 
            /* return response data */
            return $responseData;
        }
 
        /**
         * get the sitemap content of the url, it contains url, title, priority, changefreq
         *
         * @param string $url 
         * @access public
         */
        public function generateSiteMapList($url)
        {
            /* remember the current url so that links back to it are not crawled again */
            if( !in_array($url, $this->webUrlList) ) {
                $this->webUrlList[] = $url;
            }
            $content = $this->sendRequest($url);
 
            if($content !== false) {
                $tagsList = $this->_parseContent($content, $url);
                $urlItem = isset($tagsList['UrlItem']) ? $tagsList['UrlItem'] : array();
                $title = isset($tagsList['Title']) ? $tagsList['Title'] : '';
 
                $siteMapItem = array(    'Url' => trim($url),
                                        'Title' => trim($title)    );
                $priority = $this->_calculatePriority($siteMapItem['Url']);
                $siteMapItem['Priority'] = $priority;
                $changefreq = $this->_calculateChangefreq($siteMapItem['Url']);
                $siteMapItem['ChangeFreq'] = $changefreq;
 
                $this->siteMapList[] = $siteMapItem;            
                $skipUrlList = $this->loadConfig('SKIP_URLLIST');
                if( !is_array($skipUrlList) ) {
                    $skipUrlList = array();
                }
                foreach($urlItem as $nextUrl) {
                    if( !in_array($nextUrl, $this->webUrlList) ) {
                        /* skip urls that contain one of the configured keywords */
                        foreach($skipUrlList as $keyWords) {
                            if( stripos($nextUrl, $keyWords) !== false ) {
                                continue 2;
                            }
                        }
                        $this->webUrlList[] = $nextUrl;
                        echo $nextUrl,"\r\n";
                        $this->generateSiteMapList($nextUrl);
                    }
                }
            }
        }
 
        /**
         * get the sitemap list of the web
         *
         * @access public
         * @return array $siteMapList
         */
        public function getSiteMapList()
        {
            return $this->siteMapList;
        }
 
        /**
         * calculate the priority of the targeturl
         *
         * @param string $targetUrl
         * @return float $priority
         * @access private
         */
        private function _calculatePriority($targetUrl)
        {
            /* init priority */
            $priority = 0.5;
 
            /* calculate the priority */
            if( filter_var($targetUrl, FILTER_VALIDATE_URL) ) {
                $priorityList = $this->loadConfig('PRIORITYLIST');
                foreach($priorityList as $priorityKey => $priorityValue) {
                    if(stripos($targetUrl, $priorityKey) !== false) {
                        $priority = $priorityValue;
                        break;
                    }
                }
            }
 
            /* return priority */
            return $priority;
        }
 
        /**
         * calculate the changefreq of the targeturl
         *
         * @param string $targetUrl
         * @return string $changefreq
         * @access private
         */
        private function _calculateChangefreq($targetUrl)
        {
            /* init changefreq (the sitemap protocol uses lowercase values) */
            $changefreq = 'daily';
 
            /* calculate the changefreq */
            if( filter_var($targetUrl, FILTER_VALIDATE_URL) ) {
                $changefreqList = $this->loadConfig('CHANGEFREQLIST');
                foreach($changefreqList as $changefreqKey => $changefreqValue) {
                    if(stripos($targetUrl, $changefreqKey) !== false) {
                        $changefreq = strtolower($changefreqValue);
                        break;
                    }
                }
            }
 
            /* return changefreq */
            return $changefreq;
        }
 
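        /**
         * format url 
         * 
         * @param string $url
         * @param string $originUrl
         * @access private
         * @return string $formatUrl
         */
        private function _formatUrl($url, $originUrl)
        {
            /* init url */
            $formatUrl = '';
 
            /* format url */
            if( !empty($url) && !empty($originUrl) ) {
                $badUrlItem = array(    '\\', 
                                        '/' , 
                                        'javascript',
                                        'javascript:;',
                                        ''    );
                /* the regex captures the surrounding quotes, so strip them first */
                $formatUrl = trim($url);
                $formatUrl = trim($formatUrl, '\'"');
                $formatUrl = trim($formatUrl);
                /* drop the fragment, so that "#" and "page#top" do not produce extra urls */
                $hashPos = strpos($formatUrl, '#');
                if($hashPos !== false) {
                    $formatUrl = substr($formatUrl, 0, $hashPos);
                }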
                if(stripos($formatUrl, 'http') === false && !in_array($formatUrl, $badUrlItem)) {
                    if(strpos($formatUrl, '/') === 0) {
                        $domainName = $this->loadConfig('DOMAIN_NAME');    
                        $formatUrl = $domainName . trim($formatUrl, '/');
                    } else {
                        $formatUrl = substr( $originUrl, 0, strrpos($originUrl, '/') ) .'/'. $formatUrl;
                    }
                } elseif( stripos($formatUrl, 'http') === false && in_array($formatUrl, $badUrlItem) ) {
                    $formatUrl = '';
                }
            }
 
            /* return url */
            return $formatUrl;
        }
 
        /**
         * check that the url belongs to the configured domain
         * 
         * @param string $url
         * @return bool $result
         * @access private
         */
        private function _checkDomain($url)
        {
            /* init result */
            $result = false;
 
            /* check domain */
            if($url) {
                $domainName = $this->loadConfig('DOMAIN_NAME');
                if( stripos($url, $domainName) === false ) {
                    return $result;
                }
                $result = true;
            }
        
            /* return url */
            return $result;
        }
 
        /**
         * parse the response content, so that we can get the urls and the title
         *
         * @param string $content
         * @param string $originUrl
         * @return array $tagsList with the keys UrlItem and Title
         * @access public
         */
        public function _parseContent($content, $originUrl)
        {
            /* init return data, so that both keys always exist */
            $tagsList = array(    'UrlItem' => array(),
                                'Title' => ''    );
 
            /* start parse */
            if( !empty($content) && !empty($originUrl) ) {
                /* get the attribute of href for tags <a> (only matches when href is the first attribute) */
                $regStrForTagA = '#<\s*a\s+href\s*=\s*(".*?"|\'.*?\')#uim';
                if( preg_match_all($regStrForTagA, $content, $matches) ) {
                    $urlItem = array_unique($matches[1]);
                    foreach($urlItem as $urlKey => $url) {
                        $formatUrl = $this->_formatUrl($url, $originUrl);
                        if( empty($formatUrl) ) {
                            unset($urlItem[$urlKey]);
                            continue;
                        }
 
                        $result = $this->_checkDomain($formatUrl);
                        if($result === false) {
                            unset($urlItem[$urlKey]);
                            continue;
                        }
                        $urlItem[$urlKey] = $formatUrl;
                    }
                    /* different raw hrefs can format to the same url, so de-duplicate again */
                    $tagsList['UrlItem'] = array_values(array_unique($urlItem));
                }
 
                /* get the title tag content (s: the title may span lines, i: the tag may be upper case) */
                $regStrForTitle = '#<\s*title\s*>(.*?)<\s*\/\s*title\s*>#uis';
                if( preg_match($regStrForTitle, $content, $matches) ) {
                    $tagsList['Title'] = $matches[1];
                }
 
            }
 
            /* return tagsList */
            return $tagsList;
        }
    }
 
    /* here is an example */
 
    $startTime = microtime(true);
    echo "/***********************************************************************/","\r\n";
    echo "/*                    start to run {$startTime}                        */","\r\n";
    echo "/***********************************************************************/","\r\n\r\n";
 
    $siteMap = new SiteMap();
    $domain = $siteMap->loadConfig('DOMAIN_NAME');
    $siteMap->generateSiteMapList($domain);
    $siteMapList = $siteMap->getSiteMapList();
    $siteMap->generateSiteMapXml($siteMapList);
 
    $endTime = microtime(true);
    $takeTime = $endTime - $startTime;
    echo "/***********************************************************************/","\r\n";
    echo "/*          Done, \t total time taken: {$takeTime} seconds       */","\r\n";
    echo "/***********************************************************************/","\r\n";
?> 

Config file (config/config.ini.php)

<?php
    // curl connection timeout (seconds)
    $CURLOPT_CONNECTTIMEOUT = 5;
 
    // curl total request timeout (seconds)
    $CURLOPT_TIMEOUT = 10;
 
    // domain name (the domain whose pages will be crawled)
    $DOMAIN_NAME = 'http://www.example.com/';
 
    // keywords to skip (any URL containing one of these keywords is filtered out and not recorded)
    $SKIP_URLLIST = array(
        'addtocart'
    );
 
    // cookie settings
    $COOKIELIST = array(
        'IsUseCookie' => true,
        'CookiePath' => '/tmp/sitemapcookie'
    );
 
    // path where the sitemap file is saved
    $SITEMAPPATH = './sitemap.xml';
 
    // priority (relative importance of the page), chosen by keyword found in the URL
    $PRIORITYLIST = array(
        'product' => '0.8',
        'device' => '0.6',
        'intelligent' => '0.4',
        'course' => '0.2'
    );
 
    // changefreq (how often the page changes), chosen by keyword found in the URL
    // the sitemap protocol uses lowercase values
    $CHANGEFREQLIST = array(
        'product' => 'always',
        'device' => 'hourly',
        'intelligent' => 'daily',
        'course' => 'weekly',
        'login' => 'monthly',
        'about' => 'yearly'
    );
?>
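
The keyword lists above are applied by substring matching: the first keyword found in a URL decides the value, and a default is used when nothing matches. The sketch below illustrates that lookup in isolation. The helper name `matchByKeyword` and the sample URLs are hypothetical; the class itself does this in `_calculatePriority` and `_calculateChangefreq`.

```php
<?php
// Hypothetical helper: return the value of the first keyword found in $url,
// or $default when no keyword matches (case-insensitive substring match).
function matchByKeyword(string $url, array $keywordMap, string $default): string
{
    foreach ($keywordMap as $keyword => $value) {
        if (stripos($url, $keyword) !== false) {
            return $value;
        }
    }
    return $default;
}

// Shortened copies of the config lists, for illustration.
$priorityList   = array('product' => '0.8', 'device' => '0.6');
$changefreqList = array('product' => 'always', 'about' => 'yearly');

// Contains both "product" and "device": the first list entry wins.
$priority = matchByKeyword('http://www.example.com/product/device-list', $priorityList, '0.5');

// Contains no keyword: the default is used.
$changefreq = matchByKeyword('http://www.example.com/contact', $changefreqList, 'daily');

echo $priority, ' ', $changefreq, "\n";
```

Because the first match wins, the order of the entries in each list matters when a URL contains more than one keyword.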

Approximate content of the generated file

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 <url>
  <loc>http://www.xxx.com/</loc>
  <title>fadsfsd浮动发送扥阿飞啊大 撒扥森阿斯扥!</title>
  <changefreq>Yearly</changefreq>
  <priority>0.5</priority>
  <lastmod>2020-05-25</lastmod>
 </url>
 <url>
  <loc>http://www.xxx.com/#</loc>
  <title>花萼让团卷共扥广泛僧结婚!</title>
  <changefreq>Yearly</changefreq>
  <priority>0.5</priority>
  <lastmod>2020-05-25</lastmod>
 </url>
</urlset>
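
One way to sanity-check the generated file is to read it back and list the `<loc>` values. The following is a minimal sketch using SimpleXML. The inline XML string stands in for the real sitemap.xml and is an assumption made so that the example is self-contained; with a real file, `simplexml_load_file()` could be used instead.

```php
<?php
// Sample document standing in for the generated sitemap.xml (assumption).
$xmlString = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 <url>
  <loc>http://www.example.com/</loc>
  <changefreq>daily</changefreq>
  <priority>0.5</priority>
  <lastmod>2020-05-25</lastmod>
 </url>
 <url>
  <loc>http://www.example.com/about</loc>
  <changefreq>yearly</changefreq>
  <priority>0.5</priority>
  <lastmod>2020-05-25</lastmod>
 </url>
</urlset>
XML;

$xml = simplexml_load_string($xmlString);

// Collect every <loc> so the list can be inspected or compared.
$locs = array();
foreach ($xml->url as $entry) {
    $locs[] = (string) $entry->loc;
}

echo count($locs), " URLs found\n";
foreach ($locs as $loc) {
    echo $loc, "\n";
}
```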

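4. Program logic

The script uses PHP's curl extension to do the crawling.

  1. Read the domain from the config.ini.php config file and fetch that page (the entire page content).
  2. Store the fetched page's title, the current URL, and the URLs found on the page in an array.
  3. Match the current URL against the priority and change-frequency keywords in the config file, and record those values in the array as well.
  4. Recurse into the child URLs, level by level. This step may fail with the error shown after this list.
  5. Once everything has been collected, write the data out to the .xml file.

Error that can occur in step 4 (the message comes from the Xdebug extension, whose limit is the xdebug.max_nesting_level setting):
[PHP Fatal error:  Uncaught Error: Maximum function nesting level of '256' reached, aborting]
[Fix it by editing php.ini](http://www.04007.cn/article/757.html)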
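
An alternative to raising the nesting limit is to crawl with an explicit queue instead of recursion, so that the call depth stays constant however many pages the site has. The sketch below is not the script's own method. `crawlIteratively` and the `$linkGraph` array are hypothetical; the array stands in for "fetch a page and extract its links" so that the example runs without network access.

```php
<?php
// Hypothetical stand-in for fetching a page and extracting its links:
// maps each URL to the URLs it links to.
$linkGraph = array(
    'http://www.example.com/'  => array('http://www.example.com/a', 'http://www.example.com/b'),
    'http://www.example.com/a' => array('http://www.example.com/', 'http://www.example.com/b'),
    'http://www.example.com/b' => array('http://www.example.com/c'),
    'http://www.example.com/c' => array(),
);

// Breadth-first crawl: returns the URLs in the order they were visited.
function crawlIteratively(string $startUrl, array $linkGraph): array
{
    $visited = array($startUrl => true); // URLs already queued, keyed for O(1) lookup
    $queue = new SplQueue();
    $queue->enqueue($startUrl);
    $order = array();

    while (!$queue->isEmpty()) {
        $url = $queue->dequeue();
        $order[] = $url;
        $links = isset($linkGraph[$url]) ? $linkGraph[$url] : array();
        foreach ($links as $nextUrl) {
            if (!isset($visited[$nextUrl])) {
                $visited[$nextUrl] = true;
                $queue->enqueue($nextUrl);
            }
        }
    }

    return $order;
}

$order = crawlIteratively('http://www.example.com/', $linkGraph);
print_r($order);
```

Keying the visited set by URL also avoids the linear `in_array` scan on every link, which can matter on sites with many pages.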

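5. Notes

  1. Raise PHP's function nesting level limit, otherwise the recursion cannot run to completion.
  2. The code must be run from the command line.
  3. You can set up a scheduled task to run it periodically.
  4. Make sure the generated file has permissions that allow it to be read and modified.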

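Other

Online generators

  • XML-Sitemaps: free for up to 500 pages. It is a foreign site, and the paid version is very good.
  • 网站地图制作 (recommended): I do not yet know its upper limit, but it currently captures all 1,100 of my URLs.
  • Free Sitemap Generator: free for up to 5,000 pages. It is a foreign site, and registering a premium account raises the limit to 25,000.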