
Mailing List Archive: Wikipedia: Foundation

Historical wikipedia dumps

 

 



mboverloadlister at gmail

Aug 21, 2008, 11:33 PM

Post #1 of 22
Historical wikipedia dumps

Does anyone know where old database dumps are kept (preferably with all
revisions)? I asked in #wikimedia-tech but was told that
Wikimedia does not keep that kind of thing.

Anyone have any ideas? It's for a project to develop a new grammar
checker that needs to see how articles are created and deleted over
time - thus just the old revisions wouldn't work.

I thought this quote was a good one, and would be an acceptable solution.

"Only wimps use tape backup: _real_ men just upload their important
stuff on ftp, and let the rest of the world mirror it ;)"
Torvalds, Linus (1996-07-20). Post to linux.dev.kernel
newsgroup. Retrieved on 2006-08-28.

Thanks,
User:mboverload

_______________________________________________
foundation-l mailing list
foundation-l[at]lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l


millosh at gmail

Aug 22, 2008, 3:19 AM

Post #2 of 22
Re: Historical wikipedia dumps

On Fri, Aug 22, 2008 at 8:33 AM, mboverload <mboverloadlister[at]gmail.com> wrote:
> Does anyone know where old database dumps are kept (preferably with all
> revisions)? I asked in #wikimedia-tech but was told that
> Wikimedia does not keep that kind of thing.
>
> Anyone have any ideas? It's for a project to develop a new grammar
> checker that needs to see how articles are created and deleted over
> time - thus just the old revisions wouldn't work.
>
> I thought this quote was a good one, and would be an acceptable solution.
>
> "Only wimps use tape backup: _real_ men just upload their important
> stuff on ftp, and let the rest of the world mirror it ;)"
> Torvalds, Linus (1996-07-20). Post to linux.dev.kernel
> newsgroup. Retrieved on 2006-08-28.

There is not much sense in keeping historical dumps, because the only
"historical information" such dumps would add is a timestamp and,
possibly, a different file format (it is XML now; it was SQL in the
past). All the relevant historical information kept in the dumps is
also in the latest database dump.



mathias.schindler at gmail

Aug 22, 2008, 3:29 AM

Post #3 of 22
Re: Historical wikipedia dumps

On Fri, Aug 22, 2008 at 12:19 PM, Milos Rancic <millosh[at]gmail.com> wrote:

> There is not much sense in keeping historical dumps, because the only
> "historical information" such dumps would add is a timestamp and,
> possibly, a different file format (it is XML now; it was SQL in the
> past). All the relevant historical information kept in the dumps is
> also in the latest database dump.

Deleted articles, oversighted versions, anyone?



sterkebak at gmail

Aug 22, 2008, 3:39 AM

Post #4 of 22
Re: Historical wikipedia dumps

Hi,

Deleted articles are also in the dump, as are oversighted revisions. If
you delete an article, it stays in the database.

Greetings,
Huib

2008/8/22, Mathias Schindler <mathias.schindler[at]gmail.com>:
> Deleted articles, oversighted versions, anyone?


zvandijk at googlemail

Aug 22, 2008, 4:53 AM

Post #5 of 22
Re: Historical wikipedia dumps

Once I had this idea: a tool that shows Wikipedia at a certain, chosen
point of time. For example, I'd like to browse through Wikipedia
seeing always the state of January 1st 2003. Imagine if Wikipedia were
already decades old and we could read the state of 1965. (One can
always use the version history, yes, but that's more work for the
reader.) Maybe this is something more interesting to a historian like
me than to other people. :-)
Ziko


--
Ziko van Dijk
NL-Silvolde



smolensk at eunet

Aug 22, 2008, 5:11 AM

Post #6 of 22
Re: Historical wikipedia dumps

Ziko van Dijk wrote:
> Once I had this idea: a tool that shows Wikipedia at a certain, chosen
> point of time. For example, I'd like to browse through Wikipedia
> seeing always the state of January 1st 2003. Imagine if Wikipedia were
> already decades old and we could read the state of 1965. (One can
> always use the version history, yes, but that's more work for the
> reader.) Maybe this is something more interesting to a historian like
> me than to other people. :-)

Me too, that would be excellent :) Not sure how light on the database it
could be made, but it shouldn't be too hard to make static pages frozen
at a certain point of time.




puppy at KillerChihuahua

Aug 22, 2008, 5:13 AM

Post #8 of 22
Re: Historical wikipedia dumps

I think that's a fascinating idea. But then, I'm a history buff.

Ziko van Dijk wrote:
> Once I had this idea: a tool that shows Wikipedia at a certain, chosen
> point of time. For example, I'd like to browse through Wikipedia
> seeing always the state of January 1st 2003. Imagine if Wikipedia were
> already decades old and we could read the state of 1965. (One can
> always use the version history, yes, but that's more work for the
> reader.) Maybe this is something more interesting to a historian like
> me than to other people. :-)
> Ziko


millosh at gmail

Aug 22, 2008, 5:18 AM

Post #9 of 22
Re: Historical wikipedia dumps

On Fri, Aug 22, 2008 at 1:53 PM, Ziko van Dijk <zvandijk[at]googlemail.com> wrote:
> Once I had this idea: a tool that shows Wikipedia at a certain, chosen
> point of time. For example, I'd like to browse through Wikipedia
> seeing always the state of January 1st 2003. Imagine if Wikipedia were
> already decades old and we could read the state of 1965. (One can
> always use the version history, yes, but that's more work for the
> reader.) Maybe this is something more interesting to a historian like
> me than to other people. :-)

Yes, it is interesting. I was thinking about that, too :) The only
problem is that such a feature would use a lot of computing resources
or a lot of storage. So a lot of work is needed to make it available
on the web. While the computing rule "extract pages earlier than"
shouldn't be very complex, generating such an extract may take a long
time (a couple of hours? a couple of days? -- on an ordinary computer).
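
A minimal sketch of the "extract pages earlier than" rule, in Python: for
each page, keep only the newest revision whose timestamp is at or before a
cutoff. The sample XML is a simplified, made-up stand-in for an export dump
(the element names follow the MediaWiki export format, but real files carry
an XML namespace and many more fields), and `snapshot_at` is a hypothetical
helper name, not part of any existing tool.

```python
import io
import xml.etree.ElementTree as ET

# Simplified, made-up sample; a real full-history dump is far larger and namespaced.
SAMPLE = """<mediawiki>
  <page>
    <title>Apple</title>
    <revision><timestamp>2002-05-01T10:00:00Z</timestamp><text>first</text></revision>
    <revision><timestamp>2002-12-31T23:59:59Z</timestamp><text>second</text></revision>
    <revision><timestamp>2003-02-01T08:00:00Z</timestamp><text>third</text></revision>
  </page>
  <page>
    <title>Pear</title>
    <revision><timestamp>2004-01-01T00:00:00Z</timestamp><text>only</text></revision>
  </page>
</mediawiki>"""


def snapshot_at(xml_file, cutoff):
    """Map each page title to the text of its newest revision at or before cutoff.

    Timestamps are ISO 8601 UTC strings with a fixed layout, so plain string
    comparison orders them correctly. Pages with no revision that old are omitted.
    """
    snapshot = {}
    # iterparse streams the file, so the whole dump never has to fit in memory.
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        if elem.tag != "page":
            continue
        best = None  # (timestamp, text) of the newest qualifying revision so far
        for rev in elem.iter("revision"):
            ts = rev.findtext("timestamp")
            if ts is None:
                continue
            if ts <= cutoff and (best is None or ts > best[0]):
                best = (ts, rev.findtext("text") or "")
        if best is not None:
            snapshot[elem.findtext("title")] = best[1]
        elem.clear()  # free the finished page before moving on
    return snapshot


if __name__ == "__main__":
    data = io.BytesIO(SAMPLE.encode("utf-8"))
    print(snapshot_at(data, "2003-01-01T00:00:00Z"))
```

The sketch looks at revision timestamps only, so it does not account for
page moves, deletions, or merges, and it says nothing about how long a pass
over a real full-history dump would take.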

Such a tool may be very interesting for getting the larger picture, not
only of Wikipedia (and other Wikimedia projects), but also of events,
global and local social developments, public figures, and Wikimedians
themselves.

Also, for a lot of historical information it is not necessary to build
exactly such a tool. It is possible to browse the page histories, or to
make a much simpler tool for connecting them. And it is true that
historians whose job will be to explore the first decade of the 21st
century will have much better material than historians whose job is to
explore the decade before.



mohamed.m.k at gmail

Aug 22, 2008, 6:24 AM

Post #10 of 22
Re: Historical wikipedia dumps

On Fri, Aug 22, 2008 at 1:39 PM, Huib Laurens <sterkebak[at]gmail.com> wrote:

> Hi,
>
> Deleted articles are also in the dump, as are oversighted revisions. If
> you delete an article, it stays in the database.
>

This is wrong. The deleted data (and oversighted data, of course) are not
available at download.wikimedia.org.

--
--alnokta


nemowiki at gmail

Aug 22, 2008, 10:02 AM

Post #11 of 22
Re: Historical wikipedia dumps

On Fri, Aug 22, 2008 at 1:39 PM, Huib Laurens <sterkebak[at]gmail.com> wrote:

> Hi,
>
> Deleted articles are also in the dump, as are oversighted revisions. If
> you delete an article, it stays in the database.

Yes, but these are private data: e.g. http://download.wikimedia.org/enwiki/20080724/ «Deleted page and revision data. (private)».

Nemo


wikimail at inbox

Aug 22, 2008, 10:10 AM

Post #12 of 22
Re: Historical wikipedia dumps

On Fri, Aug 22, 2008 at 6:19 AM, Milos Rancic <millosh[at]gmail.com> wrote:

> All the relevant historical information kept in the dumps is
> also in the latest database dump.


Anyone know when the last valid full history database dump was? I've got a
134 gig one from 20080103, but I seem to remember that being corrupt. I'm
also not sure if I missed a more recent one.


Platonides at gmail

Aug 24, 2008, 10:24 AM

Post #13 of 22
Re: Historical wikipedia dumps

Ziko van Dijk wrote:
> Once I had this idea: a tool that shows Wikipedia at a certain, chosen
> point of time. For example, I'd like to browse through Wikipedia
> seeing always the state of January 1st 2003. Imagine if Wikipedia were
> already decades old and we could read the state of 1965. (One can
> always use the version history, yes, but that's more work for the
> reader.) Maybe this is something more interesting to a historian like
> me than to other people. :-)
> Ziko

I think it was discussed before on wikitech-l, probably when talking
about implementing stable versions.
It wouldn't be too hard to restrict a given page to its history data as
of date X. However:
- You would need to reverse page moves.
- When a page was deleted and some revisions were restored before the
  epoch, you don't know which ones were restored.
- Page merges would be especially difficult, as they're a mix of the two
  above.




Platonides at gmail

Aug 24, 2008, 10:27 AM

Post #14 of 22
Re: Historical wikipedia dumps

mboverload wrote:
> Does anyone know where old database dumps are kept (preferably with all
> revisions)? I asked in #wikimedia-tech but was told that
> Wikimedia does not keep that kind of thing.
>
> Anyone have any ideas? It's for a project to develop a new grammar
> checker that needs to see how articles are created and deleted over
> time - thus just the old revisions wouldn't work.

You have the deletion log, but you'd need some approximation of when the
pages were created (unless you have the deletedhistory right on that wiki).


> I thought this quote was a good one, and would be an acceptable solution.
>
> "Only wimps use tape backup: _real_ men just upload their important
> stuff on ftp, and let the rest of the world mirror it ;)"
> Torvalds, Linus (1996-07-20). Post to linux.dev.kernel
> newsgroup. Retrieved on 2006-08-28.

Then bug WMF about those new storage servers :)




phoebe.wiki at gmail

Aug 25, 2008, 8:29 AM

Post #15 of 22
Re: Historical wikipedia dumps

On Fri, Aug 22, 2008 at 5:12 AM, Nikola Smolenski <smolensk[at]eunet.yu> wrote:
> Ziko van Dijk wrote:
>> Once I had this idea: a tool that shows Wikipedia at a certain, chosen
>> point of time. For example, I'd like to browse through Wikipedia
>> seeing always the state of January 1st 2003. Imagine if Wikipedia were
>> already decades old and we could read the state of 1965. (One can
>> always use the version history, yes, but that's more work for the
>> reader.) Maybe this is something more interesting to a historian like
>> me than to other people. :-)
>
> Me too, that would be excellent :) Not sure how light on the database it
> could be made, but it shouldn't be too hard to make static pages frozen
> at a certain point of time.

Like http://nostalgia.wikimedia.org, but for more dates? :)



zvandijk at googlemail

Aug 25, 2008, 9:26 AM

Post #16 of 22
Re: Historical wikipedia dumps

Oh, this nostalgia wp still exists, yes.

I thought about a tool or a user interface where I simply type
"2003-01-01" (as an example) and Wikipedia shows me the articles
as of that point in time. I understand that there might be problems
with deleted images and merged articles, right. But it would still be
interesting enough, all the more so the older Wikipedia grows. I do not
know much about technical matters, but I cannot imagine that such a
tool would be very complicated. (?)

Greetings
Ziko


2008/8/25 phoebe ayers <phoebe.wiki[at]gmail.com>:
> Like http://nostalgia.wikimedia.org, but for more dates? :)



--
Ziko van Dijk
NL-Silvolde



ben.louis.yates at gmail

Aug 27, 2008, 9:50 PM

Post #17 of 22
Re: Historical wikipedia dumps

What would be ideal is a client-side wiki reader that could load past
revisions at runtime.

On Mon, Aug 25, 2008 at 12:26 PM, Ziko van Dijk <zvandijk[at]googlemail.com> wrote:
> Oh, this nostalgia wp still exists, yes.
>
> I thought about a tool or a user interface where I simply type
> "2003-01-01" (as an example) and Wikipedia shows me the articles
> as of that point in time. I understand that there might be problems
> with deleted images and merged articles, right. But it would still be
> interesting enough, all the more so the older Wikipedia grows. I do not
> know much about technical matters, but I cannot imagine that such a
> tool would be very complicated. (?)



--
Ben Yates
Wikipedia blog - http://enotes.com/blogs/wikipedia



luca at dealfaro

Aug 27, 2008, 10:02 PM

Post #18 of 22
Re: Historical wikipedia dumps

I would really like to be able to access at least the _previous_ dumps: this
can be very useful when the current dumps are either still running, or are
aborted/stopped (as many of them are right now due to disk space issues).
Are the previous dumps available anywhere?

Luca

On Wed, Aug 27, 2008 at 9:50 PM, Ben Yates <ben.louis.yates[at]gmail.com> wrote:

> What would be ideal is a client-side wiki reader that could load past
> revisions at runtime.


nemowiki at gmail

Aug 28, 2008, 1:07 AM

Post #19 of 22
Re: Historical wikipedia dumps

> Are the previous dumps available anywhere?

Click "previous dump" or "last dump". For some projects (e.g. itwikiquote), there are five old dumps.

Nemo


wikimail at inbox

Aug 28, 2008, 5:12 AM

Post #20 of 22
Re: Historical wikipedia dumps

Client-side might be ideal but also a lot harder to make. Something
live-mirrorish would be fairly easy, but would of course violate the "no
live mirrors" rule. To go completely server-side would require a *lot* of
disk space, and/or some tricky db compression which would eat up lots of CPU
cycles. Plus you'd need to find or make a full history dump. I guess if
you've got the bandwidth, disk space, and/or CPU cycles to spare you could
relatively easily scrape up your own full history dump, though.

A MediaWiki extension probably wouldn't be too hard, but I don't know
MediaWiki well enough to be volunteering. For performance reasons it might
require a new db table/column/index. I don't think MediaWiki tables are
optimized for looking up the latest version of a page on a particular date.
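
To illustrate the kind of lookup meant here, a small sketch using Python's
built-in sqlite3 module. The table `rev` and its columns are a made-up,
simplified stand-in, not the actual MediaWiki schema. The composite index on
(page_id, ts) is the sort of extra index that could let a database jump
straight to the newest revision at or before a given date instead of
scanning a page's whole history.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified revision table (not the real MediaWiki schema).
    CREATE TABLE rev (
        rev_id  INTEGER PRIMARY KEY,
        page_id INTEGER NOT NULL,
        ts      TEXT    NOT NULL,   -- UTC, 'YYYYMMDDHHMMSS', sorts as text
        body    TEXT    NOT NULL
    );
    -- Composite index: all revisions of one page, ordered by time.
    CREATE INDEX rev_page_ts ON rev (page_id, ts);
""")
conn.executemany(
    "INSERT INTO rev (page_id, ts, body) VALUES (?, ?, ?)",
    [
        (1, "20020501100000", "first"),
        (1, "20021231235959", "second"),
        (1, "20030201080000", "third"),
        (2, "20040101000000", "only"),
    ],
)


def revision_at(page_id, cutoff):
    """Return the body of the newest revision of page_id at or before cutoff, or None."""
    row = conn.execute(
        "SELECT body FROM rev WHERE page_id = ? AND ts <= ? "
        "ORDER BY ts DESC LIMIT 1",
        (page_id, cutoff),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    print(revision_at(1, "20030101000000"))  # newest revision of page 1 by 2003-01-01
    print(revision_at(2, "20030101000000"))  # page 2 did not exist yet
```

Whether a real wiki database would need such an index, or already has an
equivalent one, would have to be checked against its actual schema.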

I might try hacking up a live-mirrorish version next time I get enough free
time. Let's see - I'd have to find the right templates, articles, categories,
and images, presumably working from the stub dump, and then merge them all
together. Anything else? Historical skins would be nice but unnecessary,
historical parsing algorithms would be cool but probably overkill. Anyone
have a tool to recursively parse templates? I always get stuck there trying
to make a perfect parser. On a similar note, is there a standalone parser
yet, or would I have to import it all into a database?

Seems neat, though. One thing that comes to mind is checking out various
articles on the days on and around 9/11/01.

Anthony

On Thu, Aug 28, 2008 at 12:50 AM, Ben Yates <ben.louis.yates[at]gmail.com> wrote:

> What would be ideal is a client-side wiki reader that could load past
> revisions at runtime.


wikimail at inbox

Aug 28, 2008, 5:14 AM

Post #21 of 22
Re: Historical wikipedia dumps

On Thu, Aug 28, 2008 at 8:12 AM, Anthony <wikimail[at]inbox.org> wrote:

> Let's see - I'd have to find the right templates, articles, categories, and
> images, presumably working from the stub dump, and then merge them all
> together. Anything else?
>

Red vs. Blue links... Boy, I hope there's a standalone parser.
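
The red-versus-blue question can be sketched roughly as follows, in Python:
a link would be rendered blue if its target already had a revision at the
chosen date, and red otherwise. The `first_seen` mapping and the function
name are hypothetical, and the link pattern is a deliberately naive regular
expression, not a real wikitext parser; it also ignores deletions,
redirects, and title normalization.

```python
import re

# Naive pattern for [[Target]], [[Target#Section]] and [[Target|label]].
LINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def classify_links(wikitext, first_seen, cutoff):
    """Return {target: "blue" or "red"} for each link in wikitext as of cutoff.

    first_seen maps a page title to the timestamp of its earliest revision
    (ISO 8601 UTC strings, which compare correctly as plain text).
    """
    result = {}
    for match in LINK.finditer(wikitext):
        target = match.group(1).strip()
        created = first_seen.get(target)
        exists = created is not None and created <= cutoff
        result[target] = "blue" if exists else "red"
    return result


if __name__ == "__main__":
    first_seen = {
        "Apple": "2002-05-01T10:00:00Z",
        "Pear": "2004-01-01T00:00:00Z",
    }
    text = "An [[Apple]] is not a [[Pear|pear]] or a [[Quince]]."
    print(classify_links(text, first_seen, "2003-01-01T00:00:00Z"))
```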


rarohde at gmail

Aug 28, 2008, 12:12 PM

Post #22 of 22
Re: Historical wikipedia dumps

On Thu, Aug 28, 2008 at 5:14 AM, Anthony <wikimail[at]inbox.org> wrote:
> On Thu, Aug 28, 2008 at 8:12 AM, Anthony <wikimail[at]inbox.org> wrote:
>
>> Let's see - I'd have to find the right templates, articles, categories, and
>> images, presumably working from the stub dump, and then merge them all
>> together. Anything else?
>>
>
> Red vs. Blue links... Boy, I hope there's a standalone parser.

You also need the correct versions of the CSS and JS files, which is
a pain since those file locations have changed over time. If you wanted
to be really thorough you'd have to look at the MediaWiki namespace as
well (for example to capture the evolution of the sidebar), but that
has the extra wrinkle that the way the MediaWiki engine parses content
in the MediaWiki namespace has also evolved over time.

Being completely accurate would be nearly impossible, but one could do
a good approximation with enough effort.

-Robert Rohde
