
Why Mirroring Is Bad
    Keywords: SL description

Sometimes people would like to have a local copy of Sensei's Library. That's fine, as the content here is published under the Open Content License (see SLCopyright) and thus copying is allowed.

However, instead of downloading the ready-packed archive from SLSnapshot, some people mirror SL with a web copy tool.

So, why is this bad?

First of all, most scripts are just too dumb to follow even the most fundamental principles of mirroring, such as obeying robots.txt (a file that tells copy programs which links to follow and, more importantly, which links not to follow).
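As an illustration of what obeying robots.txt means in practice, here is a minimal sketch using Python's standard urllib.robotparser module. The rules in EXAMPLE_ROBOTS_TXT, the example.org URLs, and the user agent name are made up for the example; the actual robots.txt on SL may look quite different.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
# The real rules published by SL may differ.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /edit/
Disallow: /diff/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a parser that can answer can_fetch()."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, url: str,
               user_agent: str = "example-copy-tool") -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(EXAMPLE_ROBOTS_TXT)
    for url in ("http://example.org/SomePage",
                "http://example.org/edit/SomePage"):
        print(url, "allowed" if is_allowed(rules, url) else "disallowed")
```

A well-behaved copy tool would make this check before every request and simply skip any disallowed link.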

Furthermore, some scripts issue many requests per second. Building a library page requires some expensive database lookups, so this takes a lot of resources. Resources our server doesn't have. Therefore, SL shields itself by limiting the number of requests per minute and blocking any IP address that exceeds the limit.
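The kind of protection described above can be sketched roughly as follows. This is not SL's actual code: the class name RequestLimiter, the limit of requests per window, and the permanent block are assumptions chosen only to show the idea of counting requests per IP address within a time window.

```python
from collections import defaultdict, deque


class RequestLimiter:
    """Hypothetical per-IP limiter: block an address that exceeds
    max_requests within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.recent = defaultdict(deque)  # ip -> timestamps of recent requests
        self.blocked = set()              # addresses that exceeded the limit

    def allow(self, ip: str, now: float) -> bool:
        """Record a request at time `now` (seconds); return False if blocked."""
        if ip in self.blocked:
            return False
        stamps = self.recent[ip]
        # Forget requests that have fallen out of the time window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        stamps.append(now)
        if len(stamps) > self.max_requests:
            self.blocked.add(ip)
            return False
        return True
```

Passing the current time in explicitly keeps the sketch easy to test; a real server would read the clock itself and would probably lift blocks again after a while.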

But even if you deploy a nice script that obeys robots.txt and issues only one request every 3 seconds, you should refrain from mirroring SL.

Why? Because of GuidedTours, Aliases, and similar features, a single page can have two or more different URLs. Thus the tool downloads the same page several times. Furthermore, the snapshot is compressed, while a copy tool fetches every page uncompressed.

In effect, to make a web copy you download some 100 MB over at least 6-7 hours (anything faster and your script gets blocked). That's as much traffic as 150 users would generate. Compare this with downloading a single 6 MB file from SLSnapshot.
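To put rough numbers on this, the following back-of-the-envelope sketch uses the figures quoted above. The 3-second interval is taken from the "nice script" example and is only an assumption about how the 6-7 hours come about.

```python
MIRROR_MB = 100          # approximate size of a web copy (from the text)
SNAPSHOT_MB = 6          # size of the archive at SLSnapshot (from the text)
SECONDS_PER_REQUEST = 3  # assumed pace of a polite script


def requests_in(hours: float,
                seconds_per_request: float = SECONDS_PER_REQUEST) -> int:
    """Number of requests a script issues in `hours` at the given pace."""
    return int(hours * 3600 // seconds_per_request)


def traffic_ratio(mirror_mb: float = MIRROR_MB,
                  snapshot_mb: float = SNAPSHOT_MB) -> float:
    """How many times more traffic the web copy causes than the snapshot."""
    return mirror_mb / snapshot_mb


print(requests_in(6), requests_in(7))  # 7200 8400
print(round(traffic_ratio(), 1))       # 16.7
```

So under these assumptions a web copy means several thousand page requests and roughly 17 times the traffic of the snapshot download.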

Oh yes, and someone (Arno) has to pay for the network traffic.

Free (as in "free speech") is not the same as free (as in "free beer").

--ArnoHollosi



This is a copy of the living page "Why Mirroring Is Bad" at Sensei's Library.
(OC) 2003 the Authors, published under the OpenContent License V1.0.