Tuesday, December 30, 2014

How to mine a ladder.

Once upon a time I used to play Moria, and then, a decade later, its successor Angband. I played in earnest, and though I never finished Moria, invariably getting killed sooner or later, I did beat Angband with a warrior, allegedly a more difficult class than casters or rangers. Well, there is a special name for such folks, nerds^H^H roguelike fans, and a special place to keep their feats, the Angband ladder. My victory, is it not glorious! More than ten years have passed and it is still there, thanks to the internets and to Páv Lučištník.

Yet another decade later I decided to revive my old thrills and sorrows and try to kill the lurking dungeon boss once again; winter holidays and such. The problem was which game to choose: the latest Angband release or one of the variants, and there are lots of them. I thought I should look at what other people choose now, what is cool and what is dead, and to do that I would use the Angband ladder data. It has accumulated 12839 entries since 2002 and continues to grow: 7 entries were added last weekend (it turned out to be even more, but more on that later). And of course, the mere web interface of the ladder wouldn't do. Business means sorting, grouping, dicing, slicing and, mind you, diagramming until one has all the information for this problem of choice. So the little project began...

The first thing was to collect the site data, and it had better take just one try; nobody likes their site being pestered by someone's test runs. The simpler the grabber, the fewer moving parts there are, the better. It might have written the entire HTML pages to output, but then I decided to grab only the table rows with data. It looks nicer that way, one can tell by the line count how many records there are, and it streamlines the further processing. I also decided I won't need the character dumps themselves. So, the grabber (4th iteration, so much for the "one try" approach):
  
grab.js:
function getText(strURL) // MSDN example
{
    var strResult;
    try
    {
        var WinHttpReq = new ActiveXObject("WinHttp.WinHttpRequest.5.1");
        WinHttpReq.Open("GET", strURL, false);  // false = synchronous request
        WinHttpReq.Send();
        strResult = WinHttpReq.ResponseText;
    }
    catch (objError)
    {
        strResult = objError + "\n";
        strResult += "WinHTTP returned error: " +
            (objError.number & 0xFFFF).toString() + "\n\n";
        strResult += objError.description;
    }
    return strResult;
}

function hasData(s)
{
  return s.indexOf("href='ladder-show.php?") != -1;
}

function getLadderPages(url)
{
    var o = 0;  // page index passed to ladder-browse.php

    for (;;)
    {
        var s = getText(url+"o="+o);
        if (hasData(s))  // o still in range ?
        {
           // note: . does not match newlines, so this assumes
           // each <tr>...</tr> fits on a single line of the response
           var re = /<tr.*?<\/tr>/g;
           var result = re.exec(s);
           while (result != null)
           {
              var ss = result[0];
              if (hasData(ss))  WScript.Echo(ss);  // print data lines, not headers
              result = re.exec(s);
           }
        }
        else
            break;
        o = o+1;
    }
}

getLadderPages("http://angband.oook.cz/ladder-browse.php?");
It's a Windows machine, you see.
Now in the command prompt:  cscript /nologo grab.js >dump
After a while there is a 5.5 MB dump, stored locally. I can parse it forward, backward and sideways. That's good.
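As a taste of that further processing, here is a minimal sketch that tallies the grabbed rows by variant. The cell layout is an assumption on my part (the real ladder table may order its columns differently), so the column index is a parameter rather than hard-coded:

```javascript
// Extract the bare text of each <td> cell in a grabbed row,
// dropping any inner tags such as the ladder-show.php links.
function cellTexts(row)
{
    var cells = [];
    var re = /<td[^>]*>([\s\S]*?)<\/td>/g;
    var m;
    while ((m = re.exec(row)) != null)
        cells.push(m[1].replace(/<[^>]+>/g, "").replace(/^\s+|\s+$/g, ""));
    return cells;
}

// Count how many dump rows fall into each value of one column,
// e.g. the variant name. Returns an object mapping value -> count.
function tallyColumn(rows, column)
{
    var counts = {};
    for (var i = 0; i < rows.length; i++)
    {
        var v = cellTexts(rows[i])[column];
        if (v) counts[v] = (counts[v] || 0) + 1;
    }
    return counts;
}
```

Only ES3 constructs are used, so the same code runs under cscript against the dump lines read with a FileSystemObject TextStream, or under Node with fs.readFileSync and a split on newlines.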
