sphinx xmlpipe2 php,php xmlpipe2

In the last article we successfully created a PHP class that outputs XML as input for Sphinx’s indexer. However, it was very inefficient because we had to hold every document in memory until the end. Here is an updated class that extends XMLWriter, a built-in PHP class that is sparsely documented but works well for creating memory-efficient streams of XML data. Rather than keeping each document in memory, XMLWriter lets us flush each document’s XML elements to standard output as soon as they are written.

/*
 * SphinxXMLFeed - efficiently generate XML for Sphinx's xmlpipe2 data adapter
 * (c) 2009 Jetpack LLC http://jetpackweb.com
 */
class SphinxXMLFeed extends XMLWriter
{
    private $fields = array();
    private $attributes = array();

    public function __construct($options = array())
    {
        $defaults = array(
            'indent' => false,
        );
        $options = array_merge($defaults, $options);

        // Buffer the XML in memory; each public method below prints and clears the buffer
        $this->openMemory();

        if ($options['indent']) {
            $this->setIndent(true);
        }
    }

    public function setFields($fields)
    {
        $this->fields = $fields;
    }

    public function setAttributes($attributes)
    {
        $this->attributes = $attributes;
    }

    public function addDocument($doc)
    {
        $this->startElement('sphinx:document');
        $this->writeAttribute('id', $doc['id']);

        foreach ($doc as $key => $value) {
            // Skip the id key since that is an element attribute.
            // Strict comparison so that a numeric key is never mistaken for 'id'.
            if ($key === 'id') continue;

            $this->startElement($key);
            $this->text($value);
            $this->endElement();
        }

        // end sphinx:document
        $this->endElement();

        // Print this document's XML and clear the buffer
        print $this->outputMemory();
    }

    public function beginOutput()
    {
        $this->startDocument('1.0', 'UTF-8');
        $this->startElement('sphinx:docset');
        $this->startElement('sphinx:schema');

        // add fields to the schema
        foreach ($this->fields as $field) {
            $this->startElement('sphinx:field');
            $this->writeAttribute('name', $field);
            $this->endElement();
        }

        // add attributes to the schema
        foreach ($this->attributes as $attributes) {
            $this->startElement('sphinx:attr');
            foreach ($attributes as $key => $value) {
                $this->writeAttribute($key, $value);
            }
            $this->endElement();
        }

        // end sphinx:schema
        $this->endElement();

        print $this->outputMemory();
    }

    public function endOutput()
    {
        // end sphinx:docset
        $this->endElement();

        print $this->outputMemory();
    }
}

We can use it as follows:

$doc = new SphinxXMLFeed();

// Full-text fields that Sphinx should index
$doc->setFields(array(
    'title',
    'teaser',
    'content',
));

// Attributes stored alongside each document for filtering and sorting
$doc->setAttributes(array(
    array('name' => 'blog_id', 'type' => 'int', 'bits' => '16', 'default' => '0'),
));

$doc->beginOutput();

foreach (range(1, 1000) as $id) {
    $doc->addDocument(array(
        'id' => $id,
        'blog_id' => rand(1, 10),
        'title' => "Article Part {$id}",
        'teaser' => "Article {$id} teaser",
        'content' => "Article {$id} content",
    ));
}

$doc->endOutput();

As you can see, the first thing we need to do is populate the fields and attributes. Once that is done, we call beginOutput, which writes the head of the XML document: the XML declaration, the opening sphinx:docset element, and the sphinx:schema block. Each call to addDocument then immediately prints that document’s XML markup and clears the memory buffer.

Finally we call endOutput, which will close the sphinx:docset element.
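The print-and-clear behaviour described above comes from XMLWriter's outputMemory method, which returns whatever has been buffered so far and empties the buffer by default. Here is a minimal sketch of that mechanism on its own, using a plain XMLWriter rather than the SphinxXMLFeed class. The variable names are only for illustration.

```php
<?php
// Illustration only: shows how outputMemory() hands back the buffered XML
// and leaves the buffer empty, which is what keeps memory use flat.
$writer = new XMLWriter();
$writer->openMemory();

$writer->startElement('sphinx:docset');

// Write one complete document element
$writer->startElement('sphinx:document');
$writer->writeAttribute('id', '1');
$writer->writeElement('title', 'Article Part 1');
$writer->endElement(); // end sphinx:document

// Returns everything written so far and clears the buffer
$firstChunk = $writer->outputMemory();

// Nothing has been written since the last flush, so this chunk is empty
$secondChunk = $writer->outputMemory();

// Closing the root element only adds the closing tag to the buffer
$writer->endElement(); // end sphinx:docset
$lastChunk = $writer->outputMemory();

print $firstChunk . $lastChunk . "\n";
```

Because each chunk is printed and discarded, the writer never holds more than one document's markup at a time, no matter how many documents the feed contains.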

I have used this class in production to index millions of records that take up dozens of gigabytes. Keep in mind that if you are working with that much data, you will probably need to batch your queries so you are not loading all the records at once!
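One way to batch the queries is to wrap the data source in a generator that fetches a fixed number of rows at a time. The sketch below is not part of the original class: iterateInBatches and $fetchBatch are hypothetical names, and an in-memory array stands in for a real database table so the example runs on its own. In a real script, $fetchBatch would run a LIMIT/OFFSET style query (or something similar) and the loop body would call addDocument.

```php
<?php
// Hypothetical helper: yields rows one by one while only ever holding
// a single batch of rows in memory.
function iterateInBatches(callable $fetchBatch, int $batchSize): Generator
{
    $offset = 0;
    do {
        // $fetchBatch stands in for a query such as
        // "SELECT ... ORDER BY id LIMIT :limit OFFSET :offset"
        $rows = $fetchBatch($offset, $batchSize);
        foreach ($rows as $row) {
            yield $row;
        }
        $offset += $batchSize;
        // A short (or empty) batch means the source is exhausted
    } while (count($rows) === $batchSize);
}

// Stand-in data source: an array instead of a real database table
$table = array();
foreach (range(1, 25) as $id) {
    $table[] = array('id' => $id, 'title' => "Article Part {$id}");
}

$fetchBatch = function (int $offset, int $limit) use ($table) {
    return array_slice($table, $offset, $limit);
};

$seen = 0;
foreach (iterateInBatches($fetchBatch, 10) as $row) {
    // In the real feed script this is where $doc->addDocument($row) would go
    $seen++;
}

print "Processed {$seen} rows\n";
```

OFFSET-based paging can slow down on very large tables. Paging by the last seen id is a common alternative, and it fits the same generator shape.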
