Big Data Search Engine Techniques: Web Data Searching Techniques

Finding the right data through a web search engine is a real problem today. Several search engines do the job, such as Google, Yahoo and Bing, but they do not always return the data we need. Below is a list of search techniques that improve the accuracy of results. None of them is a guarantee, but on the principle that "something is better than nothing", using them usually leads to more useful results.

1. URL Search

A URL (Uniform Resource Locator) is the address of a specific resource on the web, such as a page within a website. For example, ‘http://sciencecongress.nic.in’ is a web address. When we type it into the address bar of a web browser, the home page of that website opens. Suppose we want information about membership of the ‘Indian Science Congress Association’, whose web address is ‘http://sciencecongress.nic.in’. There are three ways to get there. The first is to open the home page and click the membership link on the website. The second is to type the full address (‘http://sciencecongress.nic.in/membership.htm’) into the address bar, which opens the membership page directly. The third is to use a search engine to jump straight to the membership page: type ‘inurl:membership’ followed by the website name, for example ‘inurl:membership sciencecongress’, into the search box. The search engine then returns links whose URLs contain the word ‘membership’ and that relate to that website. The top few results are usually links to the membership page of the website itself. The rest also point to membership pages, but some may be on Facebook or other sites.

If we type only ‘inurl:membership’, the search engine returns links from any website whose URL contains ‘membership’. If the website we want ranks highly, it appears on the first page of results; otherwise we have to look through the following pages. For example, typing ‘inurl:membership’ alone may bring up the membership page of the ‘Club Penguin’ website on the first page, but only because that page happened to rank highly on Google at the time of writing. If we write ‘inurl:membership clubpenguin’ instead, the results are limited to links related to Club Penguin membership. In each case, search engines that support the ‘inurl:’ operator return links that contain the word ‘membership’.
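The relationship between a page's URL and the ‘inurl:’ operator can be sketched in a few lines of Python. This is only an illustration and not part of the original article: the helper names `build_inurl_query` and `search_url` are hypothetical, and the `https://www.google.com/search` address is an assumed search endpoint.

```python
from urllib.parse import urlencode, urlparse

def build_inurl_query(term, site_keyword=None):
    """Return a query such as 'inurl:membership sciencecongress'."""
    query = f"inurl:{term}"
    if site_keyword:
        query += f" {site_keyword}"
    return query

def search_url(query, base="https://www.google.com/search"):
    """Build a search URL for the query (the base address is an assumption)."""
    return f"{base}?{urlencode({'q': query})}"

# The path of the membership page contains the word the operator looks for.
page = urlparse("http://sciencecongress.nic.in/membership.htm")
print(page.netloc)  # sciencecongress.nic.in
print(page.path)    # /membership.htm

print(build_inurl_query("membership"))
print(build_inurl_query("membership", "sciencecongress"))
print(search_url(build_inurl_query("membership", "sciencecongress")))
```

Adding the site keyword narrows the query in the same way as described above: the operator restricts matches to URLs containing ‘membership’, and the extra word steers the results toward one website.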

2. Know the definition of a word

People phrase a request for a definition in different ways. For example, to find the definition of a computer we might write “what is a computer”, “definition of the computer”, “what is the definition of a computer” or something similar. Queries like these can return many different definitions, and some may not satisfy us. A search engine looks for every word we type across different websites, which is why we often get results unrelated to what we meant. The results improve if we write the query like this:

define: computer

The results then give fuller information: first the dictionary meaning of the word ‘computer’, second an encyclopedia entry about computers, and third other related information.

3. Minimum use of prepositions

Search engines give little weight to prepositions such as ‘in’ and ‘for’, and to other very common words such as the conjunction ‘and’. We can see this in practice: when we type some words into the search box, the words we typed are shown in bold in the results, except for the prepositions. Only the words that are not prepositions are shown in bold, whether they are numeric or alphabetic.

For example, suppose we search for ‘ewaste seminar in India in 2018’. The words ‘ewaste’, ‘seminar’, ‘India’ and ‘2018’ are shown in bold, while the preposition ‘in’ is not. The results consist of a number of web addresses, each with some related text. If we repeat the search without the preposition, as ‘ewaste seminar India 2018’, we get almost the same results: all of the links except one are the same for both queries. This suggests that search engines give prepositions very little weight, so we should use as few of them as possible when searching.

More generally, search engines look for every word we type in the search box, so keep the query short and use only the most relevant words. Adding more words does not improve the results, because each extra word can match websites that have nothing to do with what we want and may hold a completely different kind of data. With the minimum number of words related to the subject, we have a better chance of getting the desired result in the least time.
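Trimming a query down to its most relevant words can be sketched as follows. This is a minimal illustration under stated assumptions: the `STOP_WORDS` set and the `strip_stop_words` helper are hypothetical, and the word list is only an example, since search engines do not publish the words they ignore.

```python
# Illustrative list only; this is not the list any real search engine uses.
STOP_WORDS = {"in", "for", "of", "on", "at", "to", "and", "the", "a", "an"}

def strip_stop_words(query, stop_words=STOP_WORDS):
    """Drop prepositions and other common words, keeping the original order."""
    kept = [word for word in query.split() if word.lower() not in stop_words]
    return " ".join(kept)

print(strip_stop_words("ewaste seminar in India in 2018"))
# ewaste seminar India 2018
```

The shortened query keeps exactly the four words that the search engine highlights in the example above.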

4. How to search a set of words

To search for a particular set of words, there is an easy technique: write the words inside quotation marks (inverted commas), for example “the life is not a bed of roses”. The results are then mostly pages in which the words appear together in the given order. Without quotation marks, the search engine looks for each typed word across different websites. Suppose we enter the words Life membership of Indian science congress association without quotation marks. The word ‘life’ may match a website about life insurance, so the link of a life insurance company may appear in the results, and the word ‘membership’ may bring in links from yet other websites. When the same words are written inside quotation marks (“Life membership of Indian science congress association”), most of the links returned are closely related to our search, with fewer links to websites that have nothing to do with the data we need. For an idiom or phrase, the results are mostly pages in which the words occur in the given order.
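The difference between an unquoted and a quoted search can be illustrated with a toy matcher. This is a simplified sketch, not how any real search engine is implemented: the two sample page texts are made up for the example, and the function names are hypothetical.

```python
import re

def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_any_word(text, query):
    """Unquoted search (simplified): any single query word may match."""
    words = set(tokenize(text))
    return any(word in words for word in tokenize(query))

def matches_phrase(text, phrase):
    """Quoted search (simplified): the words must appear together, in order."""
    words, target = tokenize(text), tokenize(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

# Made-up page texts, used only to show the two behaviours.
pages = {
    "insurer": "Compare life insurance plans and membership rewards",
    "isca": "Apply for life membership of Indian Science Congress Association",
}
query = "Life membership of Indian science congress association"

for name, text in pages.items():
    print(name, matches_any_word(text, query), matches_phrase(text, query))
# insurer True False
# isca True True
```

The insurance page matches the loose search through the single words ‘life’ and ‘membership’, but only the page containing the whole phrase in order matches the quoted search.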

5. Selection of category

Search engines offer different categories for particular kinds of information, such as maps, news and images. So first select the category we want to search in, and then choose the words to search for. If we want pictures, we first have to select the images category, otherwise we will not get the desired result. Likewise, if we want information about a location, we first have to select the maps category, or we will have the same problem. The second point is to check the order of the words. Suppose we want information about holidays in India: we write ‘Holidays India’, and if that does not return enough information we can write ‘India vacation’. In other words, change the order of the words and try synonyms for a particular word to get better results.
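Trying different word orders and synonyms can be done systematically. The sketch below generates the alternative queries to try; the `query_variants` helper is hypothetical, and the synonym dictionary is supplied by hand for the example rather than taken from any search engine.

```python
from itertools import permutations, product

def query_variants(words, synonyms):
    """Return every ordering of the words, with each synonym substituted in."""
    # Each position offers the original word plus any synonyms given for it.
    choices = [[word] + synonyms.get(word, []) for word in words]
    variants = []
    for combo in product(*choices):
        for ordering in permutations(combo):
            query = " ".join(ordering)
            if query not in variants:
                variants.append(query)
    return variants

for q in query_variants(["holidays", "India"], {"holidays": ["vacation"]}):
    print(q)
# holidays India
# India holidays
# vacation India
# India vacation
```

The number of variants grows quickly with the number of words, which is one more reason to keep queries short.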

6. ‘I'm Feeling Lucky’ in Google

This is the button next to the ‘Google Search’ button on the Google home page. It opens the page that ranks as most relevant for our search directly. For example, suppose we want a list of conferences in India in 2018. We type “conferences in India in 2018” or “conferences India 2018” (excluding the preposition) and press the ‘Google Search’ button, which returns many different links. With so many results it is hard to decide which to open and which to skip. If we use the ‘I'm Feeling Lucky’ button instead, the most relevant page opens directly. In the example above, we should land on http://www.conferencealerts.com/country-listing.php?page=5&ipp=100&country=India, which contains only the list of conferences in India in 2018 and nothing else. The same page also appears among the links returned by the ‘Google Search’ button, but there we may not know which link is the right one and may fail to find the required information.

Source: https://www.includehelp.com/articles/web-data-searching-techniques.aspx
