sphinx xmlpipe2 php,php xmlpipe2

最新推荐文章于 2021-03-11 22:49:36 发布

weixin_39955938

最新推荐文章于 2021-03-11 22:49:36 发布

阅读量124

点赞数

文章标签： sphinx xmlpipe2 php

In the last article we successfully created a PHP class that outputs XML as input for Sphinx’s indexer. However it was incredibly inefficient as we had to hold everything in memory. Here is an updated class that extends XMLWriter, which is a built in PHP class that is essentially undocumented and works great for creating memory efficient streams of XML data. Rather than keeping each document in memory, XMLWriter will allow us to immediately flush that document’s XML elements to standard output.

* SphinxXMLFeed - efficiently generate XML for Sphinx's xmlpipe2 data adapter

class SphinxXMLFeed extends XMLWriter

{

private $fields = array();

private $attributes = array();

public function __construct($options = array())

{

$defaults = array(

'indent' => false,

);

$options = array_merge($defaults, $options);

// Store the xml tree in memory

$this->openMemory();

if($options['indent']) {

$this->setIndent(true);

}

public function setFields($fields) {

$this->fields = $fields;

}

public function setAttributes($attributes) {

$this->attributes = $attributes;

}

public function addDocument($doc) {

$this->startElement('sphinx:document');

$this->writeAttribute('id', $doc['id']);

foreach($doc as $key => $value) {

// Skip the id key since that is an element attribute

if($key == 'id') continue;

$this->startElement($key);

$this->text($value);

$this->endElement();

}

$this->endElement();

print $this->outputMemory();

}

public function beginOutput() {

$this->startDocument('1.0', 'UTF-8');

$this->startElement('sphinx:docset');

$this->startElement('sphinx:schema');

// add fields to the schema

foreach($this->fields as $field) {

$this->startElement('sphinx:field');

$this->writeAttribute('name', $field);

$this->endElement();

}

// add attributes to the schema

foreach($this->attributes as $attributes) {

$this->startElement('sphinx:attr');

foreach($attributes as $key => $value) {

$this->writeAttribute($key, $value);

}

$this->endElement();

}

// end sphinx:schema

$this->endElement();

print $this->outputMemory();

}

public function endOutput()

{

// end sphinx:docset

$this->endElement();

print $this->outputMemory();

}

We can use it as follows:

$doc = new SphinxXMLFeed();

$doc->setFields(array(

'title',

'teaser',

'content',

));

$doc->setAttributes(array(

array('name' => 'blog_id', 'type' => 'int', 'bits' => '16', 'default' => '0'),

));

$doc->beginOutput();

foreach(range(1, 1000) as $id) {

$doc->addDocument(array(

'id' => $id,

'blog_id' => rand(1, 10),

'title' => "Article Part {$id}",

'teaser' => "Article {$id} teaster",

'content' => "Article {$id} content",

));

}

$doc->endOutput();

As you can see the first thing we need to do is populate the fields and attributes. Once that is done, we call beginOutput, that will create the head of the XML document. After each document is added, the document’s xml markup is immediately outputted and the memory buffer is cleared.

Finally we call endOutput, which will close the sphinx:docset element.

I have used this class in production to index millions of records that take up dozens of gigabytes. Keep in mind if you are working with that much data, you will probably need to bach your queries so you are not loading all the records at once!

weixin_39955938

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
sphinx xmlpipe2 php,php xmlpipe2

In thelast articlewe successfully created a PHP class that outputs XML as input for Sphinx’s indexer. However it was incredibly inefficient as we had to hold everything in memory. Here is an updated...
复制链接

扫一扫

sphinx xmlpipe2 php,php xmlpipe2

“相关推荐”对你有帮助么？