Building a Database of News Content: An Approach to Building an Online News Distribution System

News has always been a significant part of our society. In the past, we depended mostly on news channels and newspapers to stay updated. In today's fast-paced world, news media and agencies have started using the internet to reach readers. The move has proven very helpful, as it has allowed media houses to extend their reach among readers.

Today there are numerous media outlets, and with a busy schedule it is practically impossible for one person to gather news from all of them. Besides, each outlet covers a story differently. Some readers like to compare coverage and read the same story from multiple houses to get the full picture of an event. These needs are addressed by a type of application that is currently gaining popularity: online news distribution applications. These applications aim to gather news from multiple sources and present it to the user as a feed. In this article, we will look at an approach to building such an application.

The Idea

The main component of such an application is, of course, the news. I have used four of the most popular media houses in India as the sources for the application. Each media house has its own website, from which we scrape the headline links and the stories. We will use extractive text summarization to condense each story into its 3 to 5 most important sentences. We will store the collected information along with its source, i.e., the name of the publishing media house, the date, the time, and the title of the story, in date-wise files. Each date-wise file will provide the feed for that particular date.
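
To make the summarization step concrete, here is a minimal sketch of one common extractive approach: score each sentence by the frequency of its words in the whole story and keep the top few. The article does not say which summarizer is used, so the word-frequency scoring, the tiny stopword list, and the names `split_sentences` and `summarize` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption); a real system would use a fuller one.
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "is", "are",
    "was", "were", "it", "that", "this", "with", "as", "by", "at", "from",
    "has", "have", "be", "will",
}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def content_words(text):
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=3):
    """Return up to max_sentences of the highest-scoring sentences, in original order."""
    sentences = split_sentences(text)
    if len(sentences) <= max_sentences:
        return sentences

    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Normalise by length so long sentences are not favoured automatically.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore reading order
    return [sentences[i] for i in keep]
```

Calling `summarize(story_text, max_sentences=5)` on a scraped story would give the gist sentences to store in that day's file alongside the source, date, time, and title.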

We can extract another piece of information from the story title: the subject of the story. Each title carries some relevant information. It may be the name of a person, a country, an organization, or any important topic of the time, for instance, COVID-19. These names or topics are mostly the subjects of the story. We will extract these words of interest from the title and use them as labels or tags for the corresponding stories. We will store these labels along with the titles in the files.
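
As a rough illustration of pulling labels out of a title, the sketch below keeps capitalized, non-stopword tokens, on the assumption that titles are written in sentence case so that capitals mostly mark names and topics. The article does not specify the extraction method; this heuristic and the name `extract_tags` are hypothetical, and a named-entity recognizer would likely be more robust in practice.

```python
import re

# Small illustrative stopword list (an assumption).
TITLE_STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "is", "are",
    "as", "by", "at", "from", "with", "after", "over", "how", "why", "what",
}


def extract_tags(title):
    """Return capitalized, non-stopword tokens from a title, without duplicates."""
    # Tokens may contain digits and hyphens so that "COVID-19" stays in one piece.
    tokens = re.findall(r"[A-Za-z0-9][A-Za-z0-9\-]*", title)
    tags = []
    for tok in tokens:
        if tok.lower() in TITLE_STOPWORDS:
            continue
        if tok[0].isupper() and tok not in tags:
            tags.append(tok)
    return tags
```

For a title such as "India reports record COVID-19 cases in Delhi", this keeps "India", "COVID-19", and "Delhi". A sentence-initial common noun would also be kept, which is one known weakness of the heuristic.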

An app can be used by many users of different types, so we must create a filtering or recommender mechanism to customize each user's feed according to their interests. For this, we will need a login system, so that we can record separately the types of stories each user reads and base recommendations only on that user's own account. We will maintain a database that contains the user's name, email, phone number (optional), and password. The email will be our unique key here.
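
The user table described above could be sketched as follows with SQLite from the Python standard library. The article does not name a database engine, so SQLite, the table and column names, and the helper functions are assumptions. Storing a salted hash instead of the raw password is also a choice made here, not something the article states.

```python
import hashlib
import hmac
import os
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the users table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS users (
               email         TEXT PRIMARY KEY,  -- the unique key
               name          TEXT NOT NULL,
               phone         TEXT,              -- optional
               salt          BLOB NOT NULL,
               password_hash BLOB NOT NULL
           )"""
    )
    return conn


def hash_password(password, salt):
    """Derive a salted hash so the raw password is never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def register(conn, name, email, password, phone=None):
    """Insert a new user; return False if the email is already registered."""
    salt = os.urandom(16)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO users (email, name, phone, salt, password_hash) "
                "VALUES (?, ?, ?, ?, ?)",
                (email, name, phone, salt, hash_password(password, salt)),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def check_login(conn, email, password):
    """Return True only if the email exists and the password matches."""
    row = conn.execute(
        "SELECT salt, password_hash FROM users WHERE email = ?", (email,)
    ).fetchone()
    if row is None:
        return False
    salt, stored = row
    return hmac.compare_digest(stored, hash_password(password, salt))
```

Because `email` is the primary key, a second registration with the same address is rejected by the database itself rather than by application logic.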

We will also maintain two JSON files. The first records the stories each user reads and the corresponding labels; here we use the user's email as the key. The labels tell us which topics the user is interested in. The second file records the users who read each story. In this file, we form a unique key in the format:

Publishing House + $ + Publishing Date + $ + Story Title

This unique key will be used as the key in our JSON file. Each key will hold the emails of the users who read the story. The idea is that the labels attached to each email in the users' file allow us to make content-based recommendations, and if we use both files together, we can build a full user-item interaction matrix, which can be used for collaborative-filtering-based recommendations.
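
The two JSON structures and the interaction matrix can be sketched as below. This reads the key format as the three fields joined by a literal "$"; the exact layout of each file (email mapped to a dictionary of story keys and labels, story key mapped to a list of emails) and the function names are assumptions for illustration.

```python
import json


def story_key(house, date, title):
    """Build the unique story key: house, date and title joined by '$'."""
    return "$".join([house, date, title])


def record_read(user_reads, story_readers, email, key, labels):
    """Update both structures when a user opens a story."""
    # File 1: email -> {story key: labels of that story}
    user_reads.setdefault(email, {})[key] = list(labels)
    # File 2: story key -> emails of the users who read it
    readers = story_readers.setdefault(key, [])
    if email not in readers:
        readers.append(email)


def interaction_matrix(user_reads, story_readers):
    """Return (users, stories, matrix) where matrix[i][j] is 1 if user i read story j."""
    users = sorted(user_reads)
    stories = sorted(story_readers)
    matrix = [
        [1 if user in story_readers[story] else 0 for story in stories]
        for user in users
    ]
    return users, stories, matrix


def save(path, data):
    """Write one of the two structures to its JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, ensure_ascii=False, indent=2)
```

Both dictionaries contain only strings and lists, so they serialize to JSON directly. The matrix rows are users and the columns are stories, which is the usual input shape for collaborative filtering.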

Now, we can offer the user three types of news feeds:

  1. Latest Feed: The fresh feed for each day.
  2. Most Popular Stories.
  3. Customized Feed: May contain unvisited stories from the last 2–3 days, tuned according to the user's interests.
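
The three feeds listed above could be selected roughly as follows. This is a sketch under assumptions: each story is a dictionary with hypothetical `key`, `date`, and `labels` fields, popularity is measured by the number of recorded readers, and the customized feed ranks unread recent stories by how often their labels appear among the labels of stories the user has already read.

```python
from collections import Counter
from datetime import date, timedelta


def latest_feed(stories, today):
    """Stories published on the given day."""
    return [s for s in stories if s["date"] == today]


def popular_feed(stories, story_readers, top_n=10):
    """Stories with the most recorded readers first."""
    return sorted(
        stories,
        key=lambda s: len(story_readers.get(s["key"], [])),
        reverse=True,
    )[:top_n]


def customized_feed(stories, user_labels, read_keys, today, days=3, top_n=10):
    """Unread stories from the last few days, ranked by overlap with the user's labels."""
    profile = Counter(user_labels)  # label -> how often the user read about it
    cutoff = today - timedelta(days=days)
    candidates = [
        s for s in stories
        if s["date"] >= cutoff and s["key"] not in read_keys
    ]
    # Counter returns 0 for unseen labels, so unrelated stories simply rank last.
    return sorted(
        candidates,
        key=lambda s: sum(profile[label] for label in s["labels"]),
        reverse=True,
    )[:top_n]
```

Here `user_labels` would be the flattened labels from the user's entry in the first JSON file, and `read_keys` the story keys stored there.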

One thing worth noting is that the Latest Feed is neither tuned nor the most popular; still, it is
