Scrape, Scrape, Scrape

Copyright Violation | Yes, ANOTHER One

Another incident of plagiarism and content scraping came to light this morning. The URL structure is as follows. Remove the [] from around the periods in the domain name, insert your Medium name or publication name, and use a fresh incognito browser window; I’d prefer not to reward them with any hits from this story or website!

https://lab.dongri[.]me/<insert publication name or @[username] here>
https://mission.lovecircular[.]com/<insert publication name or @[username] here>
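
If you would rather not edit the address by hand, the bracket removal can be sketched in a few lines of Python. The function names here are hypothetical, and the handle passed in is a placeholder. The code only builds the text of the URL and does not visit the site.

```python
def refang(defanged: str) -> str:
    """Turn a defanged address such as 'lab.dongri[.]me' back into a usable one."""
    return defanged.replace("[.]", ".")


def scraper_url(defanged_domain: str, medium_handle: str) -> str:
    """Build the scraper-site URL for a Medium publication name or @username."""
    return f"https://{refang(defanged_domain)}/{medium_handle}"


# "@your-username" is a placeholder, not a real account.
print(scraper_url("lab.dongri[.]me", "@your-username"))
```

Paste the printed address into the incognito window rather than clicking through from a link.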

A few things here are noteworthy:

  • They’ve scraped the look and feel of the whole site, not just the content.

  • They appear to retain the author credit; in fact, it links back to Medium, not to the scraper site, so you might never know that they have scraped all of the content from Medium, including stories locked behind the Medium Partner Program (MPP) paywall. You may be able to find your name using a Google site search, but it’s a bit hit or miss, possibly because the site is new and not fully indexed yet, or possibly by design. The site search site:dongri[.]me copyright yielded this delicious irony, though the results here are far from complete:

Image for post
Screen capture by author on 9/21/2020
  • Images link back to the original source (e.g., Unsplash).

  • Claps link back to Medium, and you see a pop-up that notifies you that you are being returned to Medium.com to perform that action.

  • Sharing icons appear, but they have been rendered as non-functional images only.
  • They have not figured out how to leave off scraping author info within the body of the story:

Image for post
Although this is not yet showing up in search results, it’s a clear notice to anyone who is violating copyright! (Screen capture by author, 9/21/2020)

There’s another one — different scraper site — found by Sharon Hurley Hall on 9/22/2020:

Image for post
Oh, irony — they’ll even scrape a story about how to make their lives miserable! (Screen capture by author, 9/22/2020)
  • Embeds from Instagram appear to be non-functional on the scraper’s site. I wonder if they’ve been in hot water before?
  • Other embedded links appear to work: Medium links are replaced by the scraper site’s own domain (except for the Medium writer profile links); off-site links go to the original sites.

General Instructions

How to Find a Copyright Infringer’s IP Address and Web Hosting Company

Here’s how to determine an infringer’s web hosting company and what’s needed in a DMCA take-down complaint.

1. First, identify the domain name. That’s the part of the URL that looks like <something>.<ext> (for example, medium.com). If there’s another part in front, such as <blah>.medium.com, you would not include <blah>. On the other hand, if <blah> is followed by something other than a period, it is part of the domain name. For example, my-medium.com is a whole different website that has nothing to do with Medium!

Image for post
Identifying a Domain Name (Screen capture by author, 9/21/2020)
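
The rule in step 1 can be sketched in Python as "keep the last two dot-separated parts of the host name." This is a simplification under a stated assumption: it gives the wrong answer for suffixes with more than one part, such as example.co.uk. The function name is hypothetical.

```python
from urllib.parse import urlsplit


def naive_domain(url: str) -> str:
    """Return the last two dot-separated labels of the URL's host name.

    This is a simplification: it gives the wrong answer for suffixes with
    more than one label, such as example.co.uk.
    """
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    return ".".join(labels[-2:])


print(naive_domain("https://blah.medium.com/some-story"))  # medium.com
print(naive_domain("https://my-medium.com/some-story"))    # my-medium.com
```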

2. Convert the domain to an IP address. It used to be that you could simply do a WHOIS lookup to find out where a domain was hosted. That is not always the case when the site is hosted on a cloud server. You’ll need accurate contact info in order to submit your DMCA take-down request.

Image for post
Example of a Domain Converted to IP Address 115.68.168.138 (Screen capture by author, 9/22/2020)
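
If you prefer not to use a web tool for step 2, the same lookup can be sketched with Python's standard library. The function name is hypothetical, and the optional resolver parameter exists only so the lookup can be swapped out or tested without a network connection.

```python
import socket
from typing import Callable


def domain_to_ip(
    domain: str,
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> str:
    """Look up the IPv4 address that a host name currently resolves to."""
    return resolver(domain)
```

Calling domain_to_ip with the scraper's domain (brackets removed) returns a dotted address like the one in the screen capture. Keep in mind that when a site sits behind a service such as Cloudflare, the address you get back may belong to that service rather than to the underlying web host.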

3. Use DomainTools.com to do a WHOIS lookup on the IP address you discovered in the previous step, instead of on the domain, in order to find out who the web hosting company is for the scraping domain:

Image for post
Sample WHOIS Results for 115.68.168.138 — IMPORTANT: This gives you the hosting company’s contact info. At this point, it does pay to be nice — these are not the crooks! These are the people you will ask to assist in taking them down. (Screen capture by author, 9/21/2020)
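
For readers comfortable with a little code, here is a rough sketch of the same WHOIS lookup done directly, plus a helper that pulls abuse e-mail addresses out of the reply. WHOIS is a plain-text protocol on TCP port 43. The default server below is an assumption: the right registry depends on where the IP address is allocated, and the reply may simply refer you to another registry. Both function names are hypothetical.

```python
import re
import socket


def whois_query(query: str, server: str = "whois.arin.net", port: int = 43) -> str:
    """Send one WHOIS query (plain text over TCP port 43) and return the reply."""
    with socket.create_connection((server, port), timeout=10) as conn:
        conn.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def abuse_contacts(whois_text: str) -> list[str]:
    """Collect e-mail addresses from lines that mention 'abuse'."""
    found = []
    for line in whois_text.splitlines():
        if "abuse" in line.lower():
            for email in EMAIL.findall(line):
                if email not in found:
                    found.append(email)
    return found
```

Passing the text returned by whois_query to abuse_contacts gives a short list of addresses to try. Field labels differ from registry to registry, so read the full record as well.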

If you do a WHOIS lookup by the domain, and see cloudflare under DNS or Hosting, you may wish to include abuse@cloudflare.com on the cc: line of your email, for good measure. For example:

Image for post
Screen capture by author, 9/21/2020
Image for post
Screen capture by author, 9/21/2020

Highlighted above are the sorts of things to look for in the WHOIS record.
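
The Cloudflare check can also be sketched in code. This assumes the WHOIS record labels its name servers with "Name Server:" or "nserver:", which is common but not universal. The function names are hypothetical.

```python
def mentions_cloudflare(whois_text: str) -> bool:
    """True if any name-server line in a WHOIS record mentions Cloudflare."""
    for line in whois_text.splitlines():
        lowered = line.strip().lower()
        if "name server" in lowered or lowered.startswith("nserver"):
            if "cloudflare" in lowered:
                return True
    return False


def cc_addresses(whois_text: str) -> list[str]:
    """Suggest cc: addresses for the take-down e-mail."""
    return ["abuse@cloudflare.com"] if mentions_cloudflare(whois_text) else []
```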

NOTE: You may have to complete a CAPTCHA code when using these tools in order to prove that you are a real human being, and not a bot. You will probably not get the domain owner’s name and information, as more and more website owners opt to pay extra money in order to cloak this for “privacy reasons.” It is discoverable, but generally not even needed. You won’t be dealing with content thieves directly, anyway. There’s just no point in asking nicely.

How to File a DMCA Take-Down Notice

Email is usually sufficient. Some web hosting companies make it easy with an online form. Some purport to require snail mail, but few will refuse to act without that, these days — especially when dealing with the potential financial fall-out from scraping a large, commercial, copyrighted site like Medium.

You will need to provide the following information:

Name, address, phone number, email address (if available) and physical or electronic signature of the copyright owner or a person authorized to act on the copyright owner’s behalf;

Identification of the copyrighted work(s);

Identification of the infringing material you are asking us to remove or disable, and the Internet location of the infringing material;

Any additional information required to be included in a copyright infringement complaint under applicable law (as we may request from you as necessary);

A statement that you have a good faith belief that use of the disputed material is not authorized by the copyright owner, its agent or the law;

A statement that the information in the complaint is accurate, and under penalty of perjury, that you are authorized to act on behalf of the owner of an exclusive right that is allegedly infringed; AND

Your signature.

Please submit your complaint in one of the following ways:

Email the signed notification to: [THE WEB HOSTING COMPANY’S COPYRIGHT or ABUSE DEPARTMENT]

Please note that you may be liable for damages (including costs and attorneys’ fees) if you materially misrepresent that material is infringing your copyright. Accordingly, if you are not sure whether material available online infringes your copyright, we suggest that you first contact an attorney.

In short, do not file a DMCA take-down request for someone else if you are not their legal agent. Do not file one for writing on which you do not hold the copyright, or for copyrighted material that you or your agent licensed to that person or site and that has not been used without your consent.

Do file, if it has.
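
If you expect to send more than one notice, the required elements listed earlier in this section can be collected into a reusable draft. The following is only a sketch: the class, its field names, and the "/s/" style of typed signature are assumptions for illustration, and a hosting company's own form or wording takes precedence.

```python
from dataclasses import dataclass


@dataclass
class TakedownNotice:
    # All field names are hypothetical; they mirror the required elements.
    owner_name: str
    address: str
    phone: str
    email: str
    original_urls: list[str]    # where your copyrighted work lives
    infringing_urls: list[str]  # where the scraped copies live

    def render(self) -> str:
        """Return the notice as plain text, ready to paste into an e-mail."""
        lines = [
            "DMCA Take-Down Notice",
            "",
            f"Copyright owner: {self.owner_name}",
            f"Address: {self.address}",
            f"Phone: {self.phone}",
            f"Email: {self.email}",
            "",
            "Copyrighted work(s):",
            *[f"  {url}" for url in self.original_urls],
            "",
            "Infringing material to remove or disable:",
            *[f"  {url}" for url in self.infringing_urls],
            "",
            "I have a good faith belief that use of the disputed material is "
            "not authorized by the copyright owner, its agent, or the law.",
            "",
            "The information in this complaint is accurate and, under penalty "
            "of perjury, I am authorized to act on behalf of the owner of an "
            "exclusive right that is allegedly infringed.",
            "",
            f"Signature: /s/ {self.owner_name}",
        ]
        return "\n".join(lines)
```

Fill in one TakedownNotice per scraper site, call render(), and read the result carefully before sending it to the hosting company's copyright or abuse address.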

Other Options

Look for the Google Analytics tag and report them to Google. How? Easy.

1. First, go to one of the scraped items (on the scraper site, not on Medium or any site that has a legitimate license to display your content!). Right-click, and select View Source or View page source.

Image for post

2. Don’t be daunted! Press Ctrl+F (Find) and search for UA- on the page. It is a Google Analytics tag, and it’s how they perform analytics: number of visits, number of visitors, etc. Google does not like scammers.

Image for post
Search box (Screen capture by author, 9/21/2020)

What you are looking for looks something like this:

Image for post
Record that whole number: UA-xxxxxxxx-x (Screen capture by author, 9/21/2020)
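
If you have saved the page source to a file, the Ctrl+F search can be sketched as a pattern match. The digit counts in the pattern are an assumption about what these IDs usually look like, and the function name is hypothetical.

```python
import re

# Analytics IDs of this kind look like UA-<account digits>-<property digits>.
UA_TAG = re.compile(r"\bUA-\d{4,10}-\d{1,4}\b")


def find_analytics_ids(page_source: str) -> list[str]:
    """Return each distinct UA-xxxxxxxx-x ID, in order of first appearance."""
    seen = []
    for match in UA_TAG.findall(page_source):
        if match not in seen:
            seen.append(match)
    return seen
```

Newer Google Analytics properties use IDs that begin with G-, which this pattern does not match; if the search for UA- finds nothing, it may be worth looking for those as well.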

Additional Links

Previous Take-Downs

You may find some fun suggestions for dealing with the inevitable onslaught of content thievery here.

Do I Need a Notice?

You do not need to add a copyright notice to your work. Technically, this serves no real legal purpose, but as a courtesy to would-be thieves and to people who have poor understanding of copyright law, you may want to add the notice and explicitly claim your rights.

How Do I Add a Notice?

On Medium, it’s easy — just enclose a small “c” inside parentheses: ( ) with no spaces, and Medium will instantly convert it to a copyright symbol: ©

Next, be sure to include the word Copyright, your legal name, and the year (that’s current year, or first year of publication — current year, if you are re-publishing older pieces that were previously published elsewhere). Do not use a cutesy nickname here — things like Copyright © 2020 MySpaceNymphet won’t do it, unless you’ve registered that as a legal entity.

If it makes you really happy, you can add “All Rights Reserved,” though this is not strictly true and accurate. You’ve already given Medium the rights to display your content. So, remember, you cannot send Medium a DMCA take-down notice if you posted your work here! (You can, if another member steals and reposts it here, though.)

I prefer to simply add some text that’s a bit harder to scrape off with a bit of RegEx (pattern searching) and code, and will make it easier, if the plagiarist is careless, to find my work when it is stolen. For example:

Holly Jahangiri is the author of Trockle; A Puppy, Not a Guppy; and A New Leaf for Lyle. She draws inspiration from her family, from her own childhood adventures (some of which only happened in her overactive imagination), and from readers both young and young at heart. Subscribe to her newsletter at https://hollyjahangiri.substack.com/
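
A signature paragraph like this one doubles as a fingerprint. As a rough sketch, assuming you have already copied the text of a suspect page into a string, you can check for the fingerprint while ignoring differences in spacing and capitalization. The function names are hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Lower-case the text and collapse every run of whitespace to one space."""
    return re.sub(r"\s+", " ", text).strip().lower()


def contains_fingerprint(page_text: str, fingerprint: str) -> bool:
    """True if the page still carries the author's distinctive bio text."""
    return normalize(fingerprint) in normalize(page_text)
```

This only catches careless copies. A scraper that translates or rewrites the page will not match, which is one reason to use a phrase distinctive enough to search for by hand as well.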

A Note on Derivative Rights

Did you know that your copyright includes something called “derivative rights”? This includes things like adapting the work to another form, such as a screenplay or a music video. It includes modifications (amateur plagiarists beware: paraphrasing will not protect you from a copyright violation claim!), and translations or captures of text in an image also count as “derivative works.” So, poets, if you find someone on Instagram who has made pretty graphic images with your text incorporated into them without your consent, that’s an unauthorized derivative work and a copyright violation. If you choose to be “flattered” by this, I can’t help you, but I’d suggest you not be.

The following is a copyrighted image and a derivative work. Scrapers are going to have fun with this story, because each element here is a separate violation, and they tend to translate the site when they scrape it, as well.

Image for post
Screen capture by author, 9/21/2020 (Image Copyright © 2020 by Holly Jahangiri)

Translated from: https://medium.com/swlh/scrape-scrape-scrape-6ff80745d051
