Why Mirroring Is Bad

    Keywords: SL description

Sometimes people would like to have a local copy of Sensei's Library. That's fine, as the content here is published under the Open Content License (see SLCopyright) and thus copying is allowed.

However, instead of downloading the ready-packed archive from SLSnapshot, some people mirror SL with a web copy tool.

So, why is this bad?

Automated mirrors are often stupid

First of all, most scripts are too dumb to follow even the most fundamental principle of mirroring: obeying robots.txt. (robots.txt is a file that tells copy programs which links to follow and, more importantly, which links not to follow.)
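As a rough illustration of what obeying robots.txt means for a script, here is a minimal sketch using Python's standard urllib.robotparser. The rules, the example.org URLs and the agent name "example-bot" are made up for this example and are not SL's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT SL's real robots.txt.
EXAMPLE_RULES = """\
User-agent: *
Disallow: /edit/
Crawl-delay: 2
"""

def build_parser(rules_text):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser

def allowed(parser, url, agent="example-bot"):
    """Return True if the rules permit this agent to fetch the URL."""
    return parser.can_fetch(agent, url)

parser = build_parser(EXAMPLE_RULES)
```

A well-behaved tool would call allowed() before every request and skip any link for which it returns False.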

Automated mirrors are often greedy

Furthermore, some scripts issue many requests/second. As building a library page requires some expensive database lookups, this requires many resources -- resources our server doesn't have in abundance. Therefore, SL shields itself by limiting the number of requests/minute before blocking an IP address.
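How SL actually counts requests is not described here. The following is only a sketch of one common way to do it (a sliding window per IP address); the class name, the limit of 3 and the window length are assumptions chosen for illustration.

```python
from collections import defaultdict, deque

class RequestLimiter:
    """Hypothetical per-IP limiter: block an address once it exceeds
    max_requests within window_seconds. Not SL's real implementation."""

    def __init__(self, max_requests, window_seconds=60.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self.history = defaultdict(deque)  # ip -> timestamps of recent requests
        self.blocked = set()

    def allow(self, ip, now):
        """Record a request at time `now` (seconds); return False if refused."""
        if ip in self.blocked:
            return False
        recent = self.history[ip]
        # Forget requests that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_requests:
            self.blocked.add(ip)  # too many requests: block this address
            return False
        recent.append(now)
        return True
```

In this sketch a blocked address stays blocked; a real server might lift the block after some time.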

Automated mirrors are always wasteful

But even if you deploy a nice script, which obeys robots.txt and issues only one request every 2 seconds, you should refrain from mirroring SL.

Why? Because GuidedTours, Aliases, and similar features give a single page two or more different URLs. Thus the tool downloads the same page several times.

Furthermore, a mirroring tool fetches every page as uncompressed data, whereas the snapshot is compressed.
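The waste can be made visible with a small sketch that compares the content of pages after they have been fetched. The URLs and page bodies below are invented, and the sketch assumes that the same page is served byte-for-byte identically under each of its URLs, which may not hold in practice.

```python
import hashlib

def find_wasted_downloads(fetched):
    """fetched: iterable of (url, body_bytes) pairs already downloaded.

    Returns (duplicate_url, first_url) pairs for every body that had
    already been downloaded under another URL.
    """
    seen = {}    # content digest -> first URL that delivered it
    wasted = []
    for url, body in fetched:
        digest = hashlib.sha256(body).hexdigest()
        if digest in seen:
            wasted.append((url, seen[digest]))
        else:
            seen[digest] = url
    return wasted

# Hypothetical data: one page reachable under two URLs.
pages = [
    ("/PageA", b"same content"),
    ("/AliasOfPageA", b"same content"),
    ("/PageB", b"other content"),
]
duplicates = find_wasted_downloads(pages)
```

Note that the duplicate is only detected after it has been downloaded, so the bandwidth and server time are already spent.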

In effect, in order to make a web copy you download more than 1GB over at least 12 hours (anything faster and your script gets blocked). Compare this with downloading a single 50MB file from SLSnapshot.
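A back-of-the-envelope calculation with these figures, taking "over 1GB" as 1000MB and assuming the script keeps to the pace of one request every 2 seconds mentioned above (the actual blocking threshold is not stated here):

```python
HOURS = 12                 # minimum duration of a web copy, from the text
SECONDS_PER_REQUEST = 2    # assumed pace of a "nice" script
MIRROR_MB = 1000           # "over 1GB", taken as 1000MB
SNAPSHOT_MB = 50           # size of the snapshot file

# Number of requests that fit into 12 hours at that pace.
requests_in_window = HOURS * 3600 // SECONDS_PER_REQUEST

# How many times more data the mirror transfers than the snapshot.
size_ratio = MIRROR_MB / SNAPSHOT_MB

print(requests_in_window, size_ratio)
```

Under these assumptions the mirror issues 21600 requests and transfers 20 times as much data as the snapshot.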

--ArnoHollosi


Also, when downloading the snapshot, you get cool speeds like 250kB/s. So your download is done in about 3 sips of coffee.

--HansWalthaus


Mirroring in the way SL currently permits is bad for all the reasons stated above. However, mirroring can be done in a way that causes none of these problems. This type of mirroring, good mirroring, requires either software installed on the SL server or daily collections of differences placed on the server.

The software that would need to be installed on the server to support the first form of good mirroring is either CVS or rsync.

So don't mirror SL, unless the time comes that SL is configured to support good mirroring. --Velobici


See also: AccessBlocked


Why Mirroring Is Bad last edited by ArnoHollosi on October 14, 2007 - 13:11