Mailing List Archive: Wikipedia: Foundation

Finding copyright violations with CopyScape tool

 

 



mboverloadlister at gmail

Aug 24, 2008, 12:08 AM

Post #1 of 6 (231 views)

(NOTE: I have absolutely 0 relation to anyone at CopyScape, nor do I
have any interest in the financial situation of the company)

Hey everyone,

Well a few years ago I used to use [[Copyscape]] at
http://www.copyscape.com/ to help me find copyright violations on new
Wikipedia pages. Basically, you input a URL and it scans the internet
for other copies of all or parts of that text. It's a pretty awesome
tool - try it out (try somewhere other than Wikipedia, explained
below).
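As a rough illustration of what "other copies of all or parts of that text" can mean in practice, here is a minimal sketch of one common overlap measure: the share of an article's five-word runs ("shingles") that also appear in another page. This is not how Copyscape works internally (that isn't described here); the function names and sample strings are made up for the example.

```python
import re


def shingles(text, n=5):
    """Return the set of n-word runs in text (lowercased, punctuation dropped)."""
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(candidate, source, n=5):
    """Fraction of candidate's shingles that also occur in source (0.0 to 1.0)."""
    cand = shingles(candidate, n)
    if not cand:
        # Text shorter than n words has no shingles to compare.
        return 0.0
    return len(cand & shingles(source, n)) / len(cand)


# Made-up sample texts, for illustration only.
article = "The quick brown fox jumps over the lazy dog near the old river bank."
webpage = "Yesterday the quick brown fox jumps over the lazy dog, witnesses said."
unrelated = "Copyright law varies considerably between different countries and eras."

print(containment(article, webpage))    # partial overlap
print(containment(article, unrelated))  # no overlap
```

A high score only flags a page for a human to look at; it can't tell a copyright violation from a properly licensed copy or a mirror of Wikipedia itself.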

It is a paid tool; it costs $0.05 per search. However, they used to
have Wikipedia whitelisted to be free of charge. I recently tried
to use it again, and it turns out that they no longer whitelist
Wikipedia addresses. Basically, it had become so popular with
Wikipedia patrollers that they were getting bogged down.

I emailed the people in charge of the site and this is what they had to say:
"We're huge fans of Wikipedia and would love to whitelist it. In fact, we
were doing this for a while, but we had to stop due to constraints we
have on the supply-side, and the large amount of Wikipedia use."

Now, barring me paying for my own searches, is there any way that the
Wikimedia Foundation might step in for funding, either at cost
(probably insanely cheap) or with a blanket plan to whitelist
Wikipedia again? I personally feel this would be an incredibly
valuable tool.

In addition to the benefit I think we would gain, they would have the
great distinction of being a product used by one of the top 10
websites on the Internet (which could be a bargaining chip).

Thanks for any comments!
[[User:mboverload]] @ Wikipedia-EN

_______________________________________________
foundation-l mailing list
foundation-l[at]lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l


vacuum at jeb

Aug 24, 2008, 1:59 AM

Post #2 of 6 (223 views)
Re: Finding copyright violations with CopyScape tool [In reply to]

I'm currently experimenting with a JavaScript gadget that enables live copyvio searches at the Norwegian Wikipedia. The searches use Yahoo! as the search engine. For the moment a few additional features are still missing, such as copyvio searches in old contributions to articles.

If the tool becomes popular, Yahoo! should probably be contacted, as the load generated could be pretty huge. I haven't tried to gather any usage statistics, but I guess a couple of searches are generated per 100 contributions or something similar. As a rough guess, at no.wp we generate somewhere between 100 and 1000 such searches each day.

Unfortunately the documentation is only available in Norwegian.
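For a sense of scale, the volume guess above can be combined with the $0.05 per search price quoted at the start of the thread. This is only back-of-the-envelope arithmetic: it assumes, hypothetically, that every such search were a paid search at that list price with no bulk discount, which nobody in the thread has proposed or confirmed.

```python
PRICE_PER_SEARCH_USD = 0.05      # list price quoted in the first post
SEARCHES_PER_DAY = (100, 1000)   # rough low/high guess for no.wp from this post


def daily_cost(searches_per_day, price=PRICE_PER_SEARCH_USD):
    """Cost in USD of one day's searches at a flat per-search price."""
    return searches_per_day * price


for n in SEARCHES_PER_DAY:
    per_day = daily_cost(n)
    per_year = per_day * 365
    print(f"{n} searches/day: ${per_day:.2f} per day, ${per_year:,.2f} per year")
```

That works out to roughly $5 to $50 per day for a wiki the size of no.wp under these assumptions; a larger wiki would scale with its own contribution rate.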

John



wikimail at inbox

Aug 24, 2008, 4:52 AM

Post #3 of 6 (221 views)
Re: Finding copyright violations with CopyScape tool [In reply to]

On Sun, Aug 24, 2008 at 4:59 AM, <vacuum[at]jeb.no> wrote:

> If the tool should be popular Yahoo! should probably be contacted as the
> load generated could be pretty huge.
>

Does anyone use Grub any more? Some kind of open source, distributed system
would be a good long term solution. I hate to give Jimbo a plug, but...


Platonides at gmail

Aug 24, 2008, 10:14 AM

Post #4 of 6 (221 views)
Re: Finding copyright violations with CopyScape tool [In reply to]

Anthony wrote:
> On Sun, Aug 24, 2008 at 4:59 AM, <vacuum[at]jeb.no> wrote:
>
>> If the tool should be popular Yahoo! should probably be contacted as the
>> load generated could be pretty huge.
>>
>
> Does anyone use Grub any more? Some kind of open source, distributed system
> would be a good long term solution. I hate to give Jimbo a plug, but...

They're supposed to provide their full index for download, so they
probably won't have any problem. You could always fork it... if you had
enough space and computer power, that is.

However, before asking them something like that, we should lift the ban
on Grub, so it can crawl Wikipedia ;)




newsmarkie at googlemail

Aug 24, 2008, 10:22 AM

Post #5 of 6 (213 views)
Re: Finding copyright violations with CopyScape tool [In reply to]

yes the system is still used. the shards of the index are available for
download @ http://search.isc.org/download/ however that is only partial and
(afaik) relatively out of date. the better way would be to query the api of
the system.

also yeah changes in robots.txt would be nice :-p
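For anyone unsure what a robots.txt ban means for a crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the user-agent strings, and the URL are all hypothetical, written for illustration; they are not copied from Wikipedia's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one crawler is banned outright, every other
# crawler is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: grub-client
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://example.org/wiki/Some_article"  # placeholder URL
print(allowed(ROBOTS_TXT, "grub-client", url))  # banned crawler
print(allowed(ROBOTS_TXT, "OtherBot", url))     # falls under the * rules
```

Lifting such a ban would mean removing or relaxing the crawler's own `User-agent` block so that it falls back to the general rules.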

regards

mark



beesley at gmail

Aug 24, 2008, 10:23 AM

Post #6 of 6 (213 views)
Re: Finding copyright violations with CopyScape tool [In reply to]

On Mon, Aug 25, 2008 at 3:14 AM, Platonides <Platonides[at]gmail.com> wrote:
> Anthony wrote:
>> Does anyone use Grub any more? Some kind of open source, distributed system
>> would be a good long term solution. I hate to give Jimbo a plug, but...
>
> They're supposed to provide their full index for download, so they won't
> probably have any problem. You could always fork it... if you had enough
> space and computer power, that is.

The mailing list at http://lists.wikia.com/mailman/listinfo/grub-dev
is a good place to ask about it. The index is pretty small at the
moment, but this sounds like a good potential use for it in the future.

Angela
