more on user statistics

Posted: Wed Jul 17, 2002 6:04 pm
by xenon
Inspired by an earlier thread on this site by Caesum, I thought about how nice it would be to have more detailed user statistics. I've been enjoying solving contest problems for more than 2 months now, and the thing I miss most is the possibility to link directly to problems I haven't solved yet. I'd also like to know how good my solutions are compared to others'.
So I wrote my own program for it, and I must admit, it got a little out of hand. It directly reads the problemstat pages from the acm-host, and compiles detailed statistics for every problem it finds. The output is a set of HTML-pages.
An example of what it produces can be found here http://joachim.wulff.net/valladolid/userstat.html.
The program executable and the source files are contained in this zip-file http://joachim.wulff.net/valladolid/getstats.zip.
It runs in a DOS box under WIN98 and needs access to the internet. See the readme.txt for details.

The bad thing is that it has to read all problemstat data from the site, and that is a bulky 45 Megs, currently. Also it strongly depends on the current layout of the website.

The good thing is that it shows detailed information (at least the information I like to see), and it has options to do quicker partial scans per volume; you don't have to update all the data every time you use the program.

Well, I hope to get your comments. Btw: I don't supply support! And: I'm not responsible for anything that can go wrong, nor for heavy traffic on the ACM-host.

Posted: Wed Jul 17, 2002 6:38 pm
by Caesum
Wow xenon, just what I need to add to my armory ;)

When I run it though I get as far as:

UHTMPAGE: connecting to acm.uva.es
UHTMPAGE: host connected
UHTMPAGE: accessing page /cgi-bin/OnlineJudge?ProblemsList
UHTMPAGE: bytes recieved 0

and that's it. Not sure why this happens. I'm on cable and my ISP implements a transparent proxy which is not always so transparent. My firewall is zonealarm, and when it pops up I allow the program to access the site. I can't think of any other possible reasons at the moment.

Posted: Thu Jul 18, 2002 12:21 am
by Caesum
The only thing that looks funny to me is the page request code:
[pascal]
WebRequest := 'GET '+pagename+' '+char(10)+char(13)+chr(0);
[/pascal]
Whenever I have done a page request in code before, I have always used
HTTP/1.0 at the end of the GET, and added a few more lines, like:
GET /cgi-bin/OnlineJudge?ProblemsList HTTP/1.0
Accept: */*
Referer: http://acm.uva.es/
Accept-Language: en-gb
User-Agent: you_wish [en] (SomeO/S; Blah; Sowhat)
Host: acm.uva.es
Anyone else using this program?
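
For illustration, a request of the shape described above could be assembled as in the following Python sketch (Python is used only for brevity; the program under discussion is written in Pascal). The helper name `build_get_request` and the choice of headers are assumptions, not part of getstats. Each line ends with CR LF (byte 13, then byte 10), and one empty line marks the end of the request.

```python
def build_get_request(path, host, extra_headers=None):
    """Return the bytes of a minimal HTTP/1.0 GET request.

    Hypothetical helper for illustration; not taken from getstats.
    """
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}"]
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    # Every line ends with CR LF, and one empty line terminates the request.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


if __name__ == "__main__":
    request = build_get_request(
        "/cgi-bin/OnlineJudge?ProblemsList",
        "acm.uva.es",
        {"Accept": "*/*"},
    )
    print(request.decode("ascii"))
```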

Great

Posted: Thu Jul 18, 2002 4:59 am
by wyvmak
it's really good that you have this stats page, but i'm afraid i couldn't get a connection, with a proxy and on my win2k. i don't know Pascal (i've forgotten it already), but it looks like it stopped after "UHTMPAGE: connecting to acm.uva.es". maybe my internet connection is the problem. it's really inspiring, truly. i have the following thoughts:
1. for measuring the difficulty of a problem, your use of the number of solvers seems a more accurate way.
2. if a user has the same name (or the same display name?) as another user in the ranklist, there would be a problem (at least in determining the rank), wouldn't there?
3. you've filtered out judge-not-available problems, that's a good feature.
4. using average running time for comparison seems a bit something to me (i cannot find the word), though i cannot think of an alternative.
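
Point 2 can be made concrete with a small Python sketch. The ranklist rows are invented for illustration, and it is an assumption that every row carries a unique numeric user ID next to the display name. A lookup keyed by name silently keeps only the last matching row, whereas a lookup keyed by ID keeps both users.

```python
# Invented sample rows: (rank, user_id, display_name).
ranklist = [
    (1, 101, "alice"),
    (2, 202, "bob"),
    (3, 303, "alice"),  # a second user with the same display name
]

# Keyed by name: a later row overwrites an earlier one with the same name.
rank_by_name = {name: rank for rank, _uid, name in ranklist}

# Keyed by ID: every user keeps an entry of their own.
rank_by_id = {uid: rank for rank, uid, _name in ranklist}

if __name__ == "__main__":
    print(rank_by_name["alice"])             # rank of the last "alice" row
    print(rank_by_id[101], rank_by_id[303])  # both users are still distinct
```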

Hmmm

Posted: Thu Jul 18, 2002 10:31 am
by xenon
Caesum:
That's the problem with using code you don't thoroughly understand...
As far as I can see, based on the output you supply, the program issues a receive request, and waits forever. Recv() is really a call to 'recv' in WINSOCK.DLL, and it should time out after some seconds if the server can't deliver the requested page in time (afaik). My program should give an error message if this happens, so since it doesn't, I conclude that recv doesn't time out. Strange.
I don't think the format of the GET command is wrong. At least the acm host knows how to handle it. I can use Telnet to log into acm.uva.es, port 80, type in 'get /cgi-bin/OnlineJudge?ProblemsList', and get the page data all right. The remainder (HTTP/1.0 & data fields) is optional and is normally supplied by the browser, I think.
So maybe there's an intervening proxy along the route? I also use zonealarm, but have static IP. I'll try my program using a modem, and see what happens.
I'm clearly in the dark here :-? If I use my program with your userid, it works OK.
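
One way to keep a read from waiting forever is to set an explicit timeout on the socket rather than rely on recv to give up by itself. The Python sketch below assumes a plain TCP connection to port 80; the names `read_until_closed` and `fetch_page` and the 10-second default are illustrative choices, not part of getstats.

```python
import socket


def read_until_closed(sock, timeout=10.0):
    """Read from a connected socket until the peer closes the connection.

    Raises socket.timeout if a single recv waits longer than `timeout` seconds.
    """
    sock.settimeout(timeout)
    chunks = []
    while True:
        data = sock.recv(4096)  # waits at most `timeout` seconds
        if not data:            # empty result: the peer closed the connection
            break
        chunks.append(data)
    return b"".join(chunks)


def fetch_page(host, path, port=80, timeout=10.0):
    """Send a GET request and return everything the server sends back."""
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        return read_until_closed(sock, timeout)
```

With this shape, a stalled connection ends in an exception the caller can report, instead of a program that appears to hang after "bytes recieved 0".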

wyvmak:
2. I didn't think of that before, but you're right. I guess it'll use the last occurrence in every list for its statistics. I'm not sure ACM accepts duplicate names, but it might. Bad luck.
4. Well, it's just something, and that's all it is. Love it or leave it. I think the median value, or the average of the middle 50%, would be better values to compare with (some problems have improbable extremes, like 0 secs for #333, and times above 30 secs), but I don't care too much.
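
The difference between these measures is easy to see on a small example. The Python sketch below compares the plain average with the median and with the average of the middle 50%; the running times are invented, and `middle_half_mean` is a hypothetical helper, not something getstats contains.

```python
import statistics


def middle_half_mean(times):
    """Mean of the middle 50% of the values (a simple interquartile mean).

    Drops the lowest and the highest quarter of the sorted values; the size
    of each quarter is rounded down, so short lists lose fewer values.
    """
    ordered = sorted(times)
    cut = len(ordered) // 4
    middle = ordered[cut:len(ordered) - cut]
    return statistics.mean(middle)


if __name__ == "__main__":
    # Invented running times in seconds, with an extreme at each end.
    times = [0.0, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 35.0]
    print("average:        ", statistics.mean(times))
    print("median:         ", statistics.median(times))
    print("middle-50% mean:", middle_half_mean(times))
```

A single 35-second outlier pulls the plain average far above the typical time, while the median and the middle-50% mean stay close to what most solutions need.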

If anybody can help me find some code to read webpages more reliably (C, C++, Pascal, Assembler), then I would be much obliged.

-xenon

Posted: Thu Jul 18, 2002 12:17 pm
by Adrian Kuegel
I have used your program, and it did work. But my first try was not successful (it was at 22.00 judge time). I think the best time to use this program is after 0.00 judge time.

Posted: Thu Jul 18, 2002 1:40 pm
by AlexandreN
I have used your program and got the output below: :(

UHTMPAGE: connecting to acm.uva.es
UHTMPAGE: host connected
UHTMPAGE: accessing page /cgi-bin/OnlineJudge?ProblemsList
UHTMPAGE: bytes recieved 0
UHTMPAGE: page read complete
UHTMPAGE: closing host... done

SERIOUS ERROR: Problems list not found on host
Program halting.

possible bug, new version

Posted: Thu Jul 18, 2002 2:12 pm
by xenon
Thanks Adrian, at least it works sometimes...

Caesum:
I think I found a possible bug. As your quote indicates, an HTTP request can be a multi-line package, so it needs a way to indicate the end of the request. Most probably this is done by adding an empty line. (I checked the junkbuster source code, a great source for wannabe sockets programmers, and they always end their requests with an extra CRLF.)
So I changed my code:
[pascal]
WebRequest := 'GET '+pagename+' '+char(10)+char(13)+char(10)+char(13)+chr(0);
[/pascal]
I recompiled and put the new version at the link given above. Would you be so kind as to download it and test it?
The reason the previous version worked here and not from your PC could be the 'transparent' proxy. The ACM host just times out waiting for the extra empty line and sends the requested data anyway. Your proxy, however, waits forever for the end-of-request signal before sending it through to the Judge host. Sounds plausible?
Anyway, I'm anxiously awaiting your results.
Re the 'HTTP/1.0' addition to the request: my wild guess is that it makes the server send a reply header (with date, checksum, server version, etc.) prepended to the page data. We don't need them, so we don't ask for them. I don't get them anyway without the 'HTTP/1.0' addition.
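
Whether a reply starts with a header block can be checked mechanically. A GET without a version is an HTTP/0.9-style simple request and is answered with the bare page, while a reply to an HTTP/1.0 request starts with a status line and headers, separated from the page by an empty line. The Python sketch below splits such a reply; `split_response` is a hypothetical helper and the sample replies are invented.

```python
def split_response(raw):
    """Split a raw HTTP reply into (header_lines, body).

    A reply that does not start with "HTTP/" is treated as a headerless
    HTTP/0.9-style simple response, so the whole reply is the body.
    """
    if not raw.startswith(b"HTTP/"):
        return [], raw
    # The first empty line (CR LF CR LF) separates the headers from the body.
    head, _sep, body = raw.partition(b"\r\n\r\n")
    return head.decode("iso-8859-1").split("\r\n"), body


if __name__ == "__main__":
    full = b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html>page</html>"
    simple = b"<html>page</html>"
    print(split_response(full))
    print(split_response(simple))
```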

AlexandreN

Posted: Thu Jul 18, 2002 2:20 pm
by AlexandreN
Yes, my id is 3590, it seems like the system cannot read /cgi-bin/OnlineJudge?ProblemsList

D:\acm\util>getstats 3590 0
UHTMPAGE: connecting to acm.uva.es
UHTMPAGE: host connected
UHTMPAGE: accessing page /cgi-bin/OnlineJudge?ProblemsList
UHTMPAGE: bytes recieved 0
UHTMPAGE: page read complete
UHTMPAGE: closing host... done

SERIOUS ERROR: Problems list not found on host
Program halting.

Posted: Thu Jul 18, 2002 2:23 pm
by Ivor
I don't know what's wrong for other people, but I just got my general info without any problems. Works fine, looks fine. Thanks.

Ivor

Posted: Thu Jul 18, 2002 6:36 pm
by Caesum
Xenon,

Yes! Working now. As you can see, my ISP's transparent proxy is not very transparent (and this isn't the only instance where its lack of transparency shows up :( )

and completed in between 10 and 15 minutes :)

Posted: Thu Jul 18, 2002 7:41 pm
by wyvmak
i don't know why, but i still cannot use it. therefore, i wrote my own version, with fewer features. it's slow, plus i put some sleep() calls in the code.

source code at:
http://www.zdtech.net/~vincent/acm_stat.cpp
i'm not sure how long i'll keep it up, sorry. the code is a bit naive. it works under Linux; i think it won't work through a proxy or firewall (though i haven't tested that). also, any comments on my code would be welcome.

my stats at:
http://www.zdtech.net/~vincent/acm.txt

it looks like most of my rankings aren't high (which is quite disappointing to me).

Posted: Thu Jul 18, 2002 11:06 pm
by xenon
Well Caesum, I'm glad it works now. 't Was the extra CRLF which, I guess, is part of the official HTTP standard :)
Your proxy might not be that transparent, but at least your connection is twice as fast as mine: it takes me 30 mins for a full scan.

wyvmak: My code will never work under Linux, since it uses WINSOCK.DLL and SOCKETS.DLL. I guess a port is possible, in principle, because Free Pascal also comes for Linux. Looking at your code, there is not much difference in the way Internet is accessed, so the adjustments will be small. I currently have no Linux installed, so I won't be able to do it myself.

Enjoy,
-xenon

Posted: Fri Jul 19, 2002 3:30 am
by wyvmak
actually, i tried your code on my win2k, not on Linux.

>getstats 5656
UHTMPAGE: connecting to acm.uva.es
UHTMPAGE: host connected
UHTMPAGE: accessing page /cgi-bin/OnlineJudge?ProblemsList
UHTMPAGE: bytes recieved 0

then i waited for a minute, but it still stayed there. do you know why? or am i just too impatient to wait? i write on Linux, as i'm not good at sockets programming, and comparatively Linux seems an easier platform for me to write on.

Posted: Fri Jul 19, 2002 3:07 pm
by xenon
AFAIK you shouldn't have to wait more than a few seconds, otherwise the program is stuck waiting for the page. Are you sure you have the latest version of my program (as uploaded yesterday)? I fixed a small bug to cope with some proxies, and as a side effect it performs twice as fast (if your internet connection is fast enough).
I just tested the program under win2k, and it works fine.

I'm off on holiday now,
-xenon