Coding a Lorem Ipsum Alternative

Latest recommended article published 2021-06-03 03:38:25

culi3182

Original article: https://www.sitepoint.com/coding-a-lorem-ipsum-alternative/

Lorem Ipsum generators are well known and are useful for generating text copy during website development. And if you want something that’s a little more to your own taste than pseudo-Latin, SitePoint recently published an article by Craig Buckler which presents ten of the best alternatives to the tried and tested original.

It’s good that we have a wide selection of text generators, but how exactly are these generators made? Can we use PHP and MySQL to build our own? That’s exactly what we’ll tackle in this article. We won’t develop a fully working website; what we will cover are the essentials for building a site such as Fillerati.

Sourcing and Extracting Paragraphs

The project is grouped into just three tasks: sourcing the text content, storing it in a database, and giving front-end access to the content. We’ll take each of these in turn, starting with finding content, and where better to start than Project Gutenberg? Gutenberg offers thousands of public domain texts in various languages, all completely free.

Unfortunately the HTML formatting is not consistent throughout Gutenberg’s publications; that’s not a criticism of the project, rather it’s an aspect of working with their HTML that we need to be aware of. Some paragraph elements don’t contain useful text at all – they are used merely as spacing between paragraphs. Some paragraphs may be too long for the purpose of providing dummy copy. These are details that we’ll need to code around.

Why choose HTML rather than plain text if the formatting isn’t consistent? Simple: the HTML version contains markup that identifies paragraphs, and paragraphs are at the heart of this project. It’s not quite as easy as scanning a stream of text for <p> and </p> tags, but it gives us a good head start.
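
One reason a literal search isn't enough: a paragraph's opening tag may carry attributes (for example `<p class="...">`), which a search for the exact string `<p>` would skip. The following is a minimal sketch, not the approach used in the rest of this article, that matches paragraph tags with or without attributes using a regular expression. The function name `findParagraphs` is hypothetical.

```php
<?php
// Sketch: find <p> elements even when the opening tag has attributes,
// e.g. <p class="center">, which a literal search for '<p>' would miss.
function findParagraphs($html) {
    // \b stops '<pre>' from matching; [^>]* skips any attributes;
    // 's' lets '.' span newlines, 'i' ignores case, '?' keeps the match lazy
    preg_match_all('/<p\b[^>]*>(.*?)<\/p>/is', $html, $matches);
    return $matches[1]; // inner HTML of each matched paragraph
}

$sample = "<p class=\"center\">First\nparagraph</p><pre>code</pre><P>Second</P>";
print_r(findParagraphs($sample));
```

Regular expressions are fine for a one-off extraction like this; for messier markup, PHP's DOM extension is the more robust choice.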

Data gathering won’t happen often, so we can afford ourselves the luxury of loading the entire file into memory, which makes it easier to search for tags and process the text. I’ve selected the HTML copy of On the Origin of Species by Charles Darwin.

Once you’ve downloaded the HTML file, it’s a good idea to open it in an editor and peruse the code to see what we’re up against. We can ignore everything before the first chapter heading on line 426, and runs of extra whitespace (line breaks and indentation inside paragraphs) should be collapsed to make processing easier.
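
Both preparation steps can be sketched as two small helpers. This assumes you locate the first chapter heading by a marker string rather than a hard-coded line number; the marker `'<h2>CHAPTER I.'` and the function names are hypothetical, so check the actual markup around line 426 of your copy.

```php
<?php
// Drop everything before $marker (e.g. the first chapter heading).
// If the marker isn't found, the document is returned unchanged.
function trimFrontMatter($html, $marker) {
    $start = strpos($html, $marker);
    return $start === false ? $html : substr($html, $start);
}

// Collapse spaces, tabs and line breaks into single spaces.
function collapseWhitespace($text) {
    return trim(preg_replace('/\s+/', ' ', $text));
}

$html = "<html><p>Licence</p>\n<h2>CHAPTER I.</h2>\n<p>Variation   under\n  Domestication</p>";
// hypothetical marker: adjust to the real heading markup
echo collapseWhitespace(trimFrontMatter($html, '<h2>CHAPTER I.')), "\n";
```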

The following is a simple approach for extracting and cleaning the text; it’s a function that’s called in a loop to scan the file and extract paragraphs. Such a loop doesn’t need to be complex.

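<?php
// Extract the text between $tag and its closing tag at the start of $html.
// Returns the closing tag, the cleaned text and the raw content length,
// or null when no closing tag is found.
function extractContent($tag, $html) {
    $closeTag = '</' . substr($tag, 1);   // e.g. '<p>' becomes '</p>'
    $startPos = strlen($tag);
    $endPos = strpos($html, $closeTag);
    if ($endPos === false) {
        return null;
    }
    $raw = substr($html, $startPos, $endPos - $startPos);
    // collapse line breaks and runs of spaces into single spaces
    $text = trim(preg_replace('/\s+/', ' ', $raw));
    return array($closeTag, $text, strlen($raw));
}

$htmlFile = 'origin-of-species.html';   // placeholder: path to your downloaded copy
$html = file_get_contents($htmlFile);
$limits = array('min' => 200, 'max' => 2000);
$tag = '<p>';
$paragraphs = array();

$i = 0;
while (($pos = strpos($html, $tag, $i)) !== false) {
    $result = extractContent($tag, substr($html, $pos));
    if ($result === null) {
        break;   // unterminated paragraph at the end of the file
    }
    list($closeTag, $text, $rawLength) = $result;
    // keep the content if it's a suitable size
    $len = strlen($text);
    if ($len >= $limits['min'] && $len <= $limits['max']) {
        $paragraphs[] = $text;
    }
    // advance past the raw (uncleaned) content and its closing tag
    $i = $pos + strlen($tag) + $rawLength + strlen($closeTag);
}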

A complete book can be scanned for usable paragraphs with minimal coding. This includes a simple check on the size of the paragraph to eliminate anything that’s either too small or too large. This test has the additional benefit of eliminating tags that are used for spacing. To be sure what we have is useful, you can display a sample in the browser or write it to a log file.
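
The size test and the sample check can be pulled out into reusable helpers. This is a minimal sketch; `filterBySize` and `logSample` are hypothetical names, and like the loop above it measures length in bytes with `strlen()` (use `mb_strlen()` if you want characters).

```php
<?php
// Keep only candidates between $min and $max bytes long. Spacing paragraphs
// such as '&nbsp;' fall below the minimum and are dropped automatically.
function filterBySize(array $candidates, $min = 200, $max = 2000) {
    $kept = array_filter($candidates, function ($text) use ($min, $max) {
        $len = strlen($text);
        return $len >= $min && $len <= $max;
    });
    return array_values($kept); // re-index so keys run 0, 1, 2, ...
}

// Write the first $count paragraphs to a log file for a quick visual check.
function logSample(array $paragraphs, $file, $count = 3) {
    $sample = array_slice($paragraphs, 0, $count);
    file_put_contents($file, implode("\n\n", $sample) . "\n");
}
```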

Populate the Database

The next step is to store these paragraphs in a database. Keep in mind that we’re building the barebones of a Lorem Ipsum system, so there’s no need for an elaborate design with separate, related tables for authors, publications and so on.

All we really need is one table:

CREATE TABLE paragraphs (
    id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
    content MEDIUMTEXT NOT NULL,
    PRIMARY KEY (id)
);

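In the interest of efficiency, I’ve chosen MEDIUMINT UNSIGNED for the id field: it allows values up to 16,777,215 and takes 3 bytes rather than the 4 used by INT. MEDIUMTEXT stores up to 16 MB per row; since the extracted paragraphs are capped at 2,000 characters, the smaller TEXT type (up to 65,535 bytes) would also be enough. For a large, fully-functional database that stores many publications, you may want to widen id to INT UNSIGNED.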

Now we can insert the paragraphs that we extracted from the HTML file into the database.

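<?php
// DBDSN, DBUSER and DBPASS are assumed to be defined in a config file
$db = new PDO(DBDSN, DBUSER, DBPASS);
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$query = $db->prepare('INSERT INTO paragraphs (content) VALUES (:content)');
// bindParam binds by reference, so each iteration's $content is inserted
$query->bindParam(':content', $content);
$db->beginTransaction();   // one transaction makes bulk inserts much faster
foreach ($paragraphs as $content) {
    $query->execute();
}
$db->commit();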

Depending on the character set and collation you’ve chosen for your database, you may need to convert the paragraph strings before inserting them. This is a niggle of using a third-party data source like Gutenberg: there’s no guarantee that the text uses the same character encoding as your database. The string and multibyte string (mbstring) functions in the PHP manual, such as mb_convert_encoding(), cover the conversions you may need.
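
A minimal normalization step might look like the sketch below. It assumes the database stores UTF-8 (for example a utf8mb4 column) and that you read the source encoding from the Gutenberg file's charset declaration; `normalizeForDb` is a hypothetical name. It also decodes HTML entities such as `&rsquo;`, which this kind of HTML source may contain. If you decode entities here, escape the text with htmlspecialchars() when echoing it later.

```php
<?php
// Convert a paragraph to UTF-8 and decode HTML entities before storing it.
function normalizeForDb($text, $sourceEncoding = 'UTF-8') {
    if (strtoupper($sourceEncoding) !== 'UTF-8') {
        // requires the mbstring extension
        $text = mb_convert_encoding($text, 'UTF-8', $sourceEncoding);
    }
    // turn entities such as &mdash; or &rsquo; into real characters
    return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo normalizeForDb('Darwin&rsquo;s &ldquo;Origin&rdquo;'), "\n";
```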

A Simple Front-End

The final step is to access these paragraphs using a front-end in a browser. How the front-end should provide access to the data is limited only by our imaginations. For example, we could retrieve a certain number of paragraphs, or a specified quantity of text, or perhaps a number of characters rounded to the nearest paragraph. We could select consecutive paragraphs, or perhaps we’d be happy with random paragraphs. Whatever we choose, we need a function to read the table.

<?php
// Fetch the content of a single paragraph by id (null if no such row)
function selectParagraph(PDO $db, $id) {
    $query = $db->prepare('SELECT content FROM paragraphs WHERE id = :id');
    $query->execute(array(':id' => (int) $id));
    $row = $query->fetch(PDO::FETCH_ASSOC);
    $query->closeCursor();
    return $row === false ? null : $row['content'];
}
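
The other retrieval options mentioned earlier can be built on top of rows read this way. For instance, "a number of characters rounded to the nearest paragraph" could be sketched as a pure function over consecutive paragraphs already fetched into an array. The name `takeNearestLength` and the tie-breaking rule (stop on a tie) are assumptions for illustration.

```php
<?php
// Return consecutive paragraphs whose combined length is as close as
// possible to $target characters. Always returns at least one paragraph
// when any are given.
function takeNearestLength(array $paragraphs, $target) {
    $selected = array();
    $total = 0;
    foreach ($paragraphs as $text) {
        $withNext = $total + strlen($text);
        // stop once adding another paragraph moves us further from the target
        if ($selected && abs($withNext - $target) >= abs($total - $target)) {
            break;
        }
        $selected[] = $text;
        $total = $withNext;
    }
    return $selected;
}
```

Because the running total only grows, the first paragraph that moves the total away from the target marks the nearest boundary.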

For demonstration purposes, the algorithm I’ll present uses a simple random number generator to select paragraphs from the database. It needs to know the maximum ID value for the paragraph records, hence the $maxID variable (this assumes the ID values are contiguous).

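<form method="post">
 <label for="slider">How many paragraphs do you want?</label>
 <input type="range" min="1" max="4" step="1" name="slider" id="slider">
 <input type="submit" name="submit" value="Get Excerpt">
</form>
<?php
// $db is the PDO connection; assumes ids run contiguously from 1 to $maxID
$maxID = (int) $db->query('SELECT MAX(id) FROM paragraphs')->fetchColumn();

if (isset($_POST['slider']) && $maxID > 0) {
    // clamp the requested count to the slider's range (1-4)
    $i = max(1, min(4, (int) $_POST['slider']));
    while ($i--) {
        $id = rand(1, $maxID);
        $paragraph = selectParagraph($db, $id);
        echo '<p>' . $paragraph . '</p>';
    }
}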
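
Two limitations of picking `rand(1, $maxID)`: it breaks if ids have gaps (for example after deleting rows), and it can return the same paragraph twice. A swapped-in alternative, sketched below, samples distinct ids from those that actually exist; the ids could be loaded with `$db->query('SELECT id FROM paragraphs')->fetchAll(PDO::FETCH_COLUMN)`. The function name `pickRandomIds` is hypothetical.

```php
<?php
// Choose up to $count distinct ids at random from the ids that exist.
function pickRandomIds(array $ids, $count) {
    $ids = array_values($ids);
    shuffle($ids);   // random order, so slicing gives a sample without repeats
    return array_slice($ids, 0, max(0, (int) $count));
}
```

Loading every id is fine for a single book; for very large tables you would sample in SQL instead.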

And that’s the final piece of the project!

Summary

In this article we’ve covered the essential aspects of building an alternative to the popular Lorem Ipsum text generator. How complex we make it, how many publications and authors we include, how stylish we make the front-end, and whether we limit our choice of text to a specific genre, is entirely open to personal choice. But the essential elements will all be similar to what we’ve covered here, and all built using a smattering of PHP and MySQL. Easy!

Code to accompany this article can be found on GitHub. Feel free to clone it and expand on it.

Image via Fotolia
