Page 2 of 2

hmm

Posted: Mon Apr 18, 2005 9:28 am
by shahriar_manzoor
I look like a fool to myself, let alone to others. Probably I was misinformed by someone who has seen the IOI live.

About open voting: how is it ensured that the person voting understands all the problems well? :) It requires a lot of ability; I cannot confidently say that I understand all the problems.

In some countries one person does the bulk of the job and another person becomes the IOI leader (please note that I am not talking about Bangladesh). So how can you ensure that the voting is fair? I mean, is the leader capable of voting on the problemset?

hmm

Posted: Mon Apr 18, 2005 1:36 pm
by shahriar_manzoor
We always like to think from our own point of view, and I am no exception. The reason I like the ICPC policy more is that I was able to be associated with it by submitting a single problem, without ever meeting any person from ICPC in person. Can someone do that for the IOI? I have no plans (nor the potential, as I have very little idea about IOI problems) to be associated with the IOI, but if some quite good problemsetter wants to be associated with the IOI, can he do it like I did, without the help of a middleman? Because no matter how democratic the IOI is, its grassroots level may not be that democratic :).

Re: hmm

Posted: Mon Apr 18, 2005 1:52 pm
by gvcormac
shahriar_manzoor wrote: About open voting: how is it ensured that the person voting understands all the problems well? :) It requires a lot of ability; I cannot confidently say that I understand all the problems.

In some countries one person does the bulk of the job and another person becomes the IOI leader (please note that I am not talking about Bangladesh). So how can you ensure that the voting is fair? I mean, is the leader capable of voting on the problemset?
I think that's a big problem. The GA doesn't really have the time or the expertise to do a detailed analysis of the problems. There's a fair amount of tension, I would say, and the balance has swung between them tinkering with and second-guessing the SC, and them merely rubber-stamping or feeling railroaded.

In my opinion, this meeting can provide "openness" to the process but it is way too late in the process to make meaningful amendments to the problems. The SC does prepare alternate problems, so if the GA feels a problem is inappropriate for whatever reason, they can reject it and an alternate is substituted.

Re: hmm

Posted: Mon Apr 18, 2005 1:55 pm
by gvcormac
shahriar_manzoor wrote:We always like to think from our own point of view, and I am no exception. The reason I like the ICPC policy more is that I was able to be associated with it by submitting a single problem, without ever meeting any person from ICPC in person. Can someone do that for the IOI? I have no plans (nor the potential, as I have very little idea about IOI problems) to be associated with the IOI, but if some quite good problemsetter wants to be associated with the IOI, can he do it like I did, without the help of a middleman? Because no matter how democratic the IOI is, its grassroots level may not be that democratic :).
There's a call for problems for the IOI. The deadline is December, I think. As far as I know, submitters don't have much to do with their problems after they're submitted. They don't become judges as they often do in ICPC.

One thing I can do is to make sure that you know next year about the call for problems.

Re: hmm

Posted: Mon Apr 18, 2005 1:59 pm
by misof
First of all I have to say that I agree with almost everything Gordon said here. Making test data from important contests open is a Good Thing.
gvcormac wrote:[At the IOI] Judging is automated, but the ISC and SC are both involved. The raw results (and test cases) are released to competitors, who may appeal their scores. Appeals are considered and final results are given.

So making the test cases available is very much part of the adjudication process.
Here one has to note that if an error is found in the test data at the IOI (and I'm aware of several occasions when a bug WAS found), rejudging the corresponding problem is 100% fair. This is because the contestants submit all their solutions before testing takes place. If the test data had been correct all along, the contestants would have submitted exactly the same solutions.

At the ACM, the situation is more difficult. If a bug in the test data is found after the contest, there is no 100% fair solution. Keeping the results may punish teams that actually solved the problem. Rejudging may lead to lots of "if-we-had-known" issues. There are various tradeoffs possible, but the main issue remains the same: A bug in ACM test data may influence the teams during the contest.

The best way out is (of course :)) making all test data flawless. And (in my opinion) if you know that after the contest everything will be available, it will encourage you to spend more time checking whether everything is correct.

[As a side note, it may not necessarily be the case. Recently I tried to use a Rocky Mountains Regional set (RMRC 2003) as a training set for my teams... and saying that it was flawed is an understatement. Don't repeat my mistake :P]
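One practical way to put that extra checking to work is to run an input validator over every test file before the contest. Below is a minimal sketch; the constraints it checks (a count N followed by N integers in [1, 1000]) are purely hypothetical, since the real format depends on the problem:

```python
def validate(lines):
    """Validate one test file, given as a list of stripped lines.

    Hypothetical format: the first line holds N (1 <= N <= 100),
    followed by exactly N lines, each an integer in [1, 1000].
    Raises AssertionError or ValueError on any violation.
    """
    n = int(lines[0])
    assert 1 <= n <= 100, "N out of range"
    assert len(lines) == n + 1, "wrong number of lines"
    for line in lines[1:]:
        v = int(line)  # raises ValueError if not an integer
        assert 1 <= v <= 1000, f"value {v} out of range"
    return True

# Example: a well-formed file passes silently.
validate(["2", "5", "1000"])
```

Even a checker this small catches off-by-one line counts and out-of-range values, the kinds of bugs that only surface once the data is published.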

Re: hmm

Posted: Mon Apr 18, 2005 2:14 pm
by shahriar_manzoor
misof wrote: The best way out is (of course :)) making all test data flawless. And (in my opinion) if you know that after the contest everything will be available, it will encourage you to spend more time checking whether everything is correct.

[As a side note, it may not necessarily be the case. Recently I tried to use a Rocky Mountains Regional set (RMRC 2003) as a training set for my teams... and saying that it was flawed is an understatement. Don't repeat my mistake :P]
Yes, I was also talking about this situation. When people start treating a contest as just another set of eight lab assignments, mistakes will happen. And the way people now often say, "That is a small mistake; the contestants should have guessed it," is simply remarkable. If regions like Rocky Mountain (a US region, and US computer science schools are world famous) make gross mistakes, it would be ridiculous to think that many new regions will not do the same. For example, although we had a number of people in Dhaka working on the regionals, without the help of Derek Kisman and Jimmy Mardell it would have been difficult for us to make the entire contest error-free (it would then have been a contest with beautiful problems but one or two mistakes).

Re: hmm

Posted: Mon Apr 18, 2005 2:18 pm
by misof
shahriar_manzoor wrote:So how can you ensure that the voting is fair? I mean, is the leader capable of voting on the problemset?
A very good question indeed. No offense intended, but many of them are not capable of doing it [in the available window of time].
gvcormac wrote:I think that's a big problem. The GA don't really have the time or the expertise to do a detailed analysis of the problems. There's a fair amount of tension I would say, and the balance has swung between them tinkering and second-guessing the SC and them merely rubber-stamping or feeling railroaded.

In my opinion, this meeting can provide "openness" to the process but it is way too late in the process to make meaningful amendments to the problems. The SC does prepare alternate problems, so if the GA feels a problem is inappropriate for whatever reason, they can reject it and an alternate is substituted.
Rejected? It has never happened (for any reason other than "we had this problem in our national competition"). And it will never happen. To put it bluntly, most of the GA is happy to agree on the problemset as soon as possible, translate it, and go to sleep.

At the last IOI I tried to object against one of the problems with the reason that it is not suitable for the IOI. Even as I was doing it I knew that it (sadly) was only a formal protest. There was nothing I (as a member of the GA) could do to get my point through and to make GA reject the problem. (Even though it turned out my objection was correct. You may have read the analysis I wrote after the last IOI: http://people.ksp.sk/~misof/ioi/tasks.html )

In my eyes, the only role of the GA in the process of selecting competition tasks currently reduces to: "check whether you have already seen this task". If I want to change something I don't like, GA is not the place to be. (In fact, I'm thinking about a candidacy to be an ISC member, but that's a different story ;))

Re: hmm

Posted: Mon Apr 18, 2005 2:24 pm
by gvcormac
misof wrote: Here one has to note that if an error is found in the test data at the IOI (and I'm aware of several occasions when a bug WAS found), rejudging the corresponding problem is 100% fair. This is because the contestants submit all their solutions before testing takes place. If the test data had been correct all along, the contestants would have submitted exactly the same solutions.
An excellent point. The IOI (currently) does all judging after the end of the contest, so fixing errors is easier to do. But we are discussing possibilities (not for this year, but maybe for the future) of giving competitors more feedback in IOI, and the same sort of problems will arise.
misof wrote: At the ACM, the situation is more difficult. If a bug in the test data is found after the contest, there is no 100% fair solution. Keeping the results may punish teams that actually solved the problem. Rejudging may lead to lots of "if-we-had-known" issues. There are various tradeoffs possible, but the main issue remains the same: A bug in ACM test data may influence the teams during the contest.

The best way out is (of course :)) making all test data flawless. And (in my opinion) if you know that after the contest everything will be available, it will encourage you to spend more time checking whether everything is correct.
Tom Verhoeff wrote some guidelines for problem sets. It might be a good idea to put together some guidelines for test data and judge solutions. Then the trick would be to try to ensure they were followed.

No matter how diligently one tries to create correct data, errors may occur. As you point out, there's no fair way to address this. Waterloo have been the beneficiary and the victim of post-contest rejudging. (Beneficiary because our correct solution was later judged correct; victim because we figured out the judges' error during the contest but later those who didn't were awarded credit anyway.) We have also been the victim of in-contest rejudging, losing a balloon after a couple of hours.
misof wrote: [As a side note, it may not necessarily be the case. Recently I tried to use a Rocky Mountains Regional set (RMRC 2003) as a training set for my teams... and saying that it was flawed is an understatement. Don't repeat my mistake :P]
Expect a big improvement in Mountain problemsets. As of 2004, Howard Cheng is the chief judge. There's still inertia but the problem set was much nicer this year and the data was correct, if too easy. Howard represented Alberta at the 1998 world finals (8th place) and was a chief judge and judge for ECNA for many years. Now he's back in Mountain region.

Re: Publishing tests after the contests

Posted: Mon Apr 18, 2005 2:31 pm
by misof
gvcormac wrote:2. I promised to put together a proposal to make the ACM contest more "spectator friendly." This proposal will have several components, starting with simply making sure that scoreboards appear in predictable places. More sophisticated things that might be done could involve commentary - viewing either submissions or work-in-progress, with experts explaining strategies, errors, likely outcomes. This could be done using multimedia or just a blog. For an analogy, think of television coverage of chess, billiards, or other sports that would be mind-numbing without real-time analysis.
As Shahriar already mentioned, TopCoder is a really spectator-friendly contest. (Recently I was at the TCCC'05 finals, so I have some first-hand experience :))

One of the major problems I see with making the ACM contests more spectator-friendly is their length. The whole contest round at TopCoder takes approximately 2 hours and consists of different phases (coding, challenge, waiting for the systests). The ACM contest is a plain 5 hours of coding time. IMHO nobody will ever spend the entire 5 hours watching people write code.

Do you already have any ideas on how to address this situation?

Re: hmm

Posted: Mon Apr 18, 2005 2:44 pm
by gvcormac
misof wrote: In my eyes, the only role of the GA in the process of selecting competition tasks currently reduces to: "check whether you have already seen this task". If I want to change something I don't like, GA is not the place to be. (In fact, I'm thinking about a candidacy to be an ISC member, but that's a different story ;))
Please do run for ISC. Lots of people, including myself, are aware of the current structural and political difficulties. How the structures should be changed is another matter. Since I'm an elected member I consider it my role to represent you and the rest of the GA in this process. We probably don't want to go into all the details of IOI organization and potential improvements here - drop me a line. I leave for the meeting in about 36 hours ...

I glanced at your problem analysis of the 2004 contest and I'll look at it further. One comment I have is that I think that throughout the years efficiency has become the paramount evaluation criterion at the expense of correctness. Algorithms that don't work can get most of the marks, while correct ones get almost nothing. This must be changed. Also, competitors are in the dark about what sort of test cases they might expect. This also must be changed.

I'm only one member but I think I'm not alone in my views. And things won't change overnight.

Re: Publishing tests after the contests

Posted: Mon Apr 18, 2005 2:47 pm
by shahriar_manzoor
Well, first I would like to ask: was TCCC05 the contest where only one problem was solved in the final round? Was the problemset fair, or was it way too difficult? What I mean is that we sometimes need to criticize TopCoder as well :).

I think a five-hour contest is fair because you can then come back from a disaster. In TopCoder I often see very good coders do badly, like Derek Kisman finishing out of the top 20. In a five-hour contest you could never expect Derek out of the first 20, could you?

Re: Publishing tests after the contests

Posted: Mon Apr 18, 2005 3:10 pm
by gvcormac
shahriar_manzoor wrote:Well, first I would like to ask: was TCCC05 the contest where only one problem was solved in the final round? Was the problemset fair, or was it way too difficult? What I mean is that we sometimes need to criticize TopCoder as well :).

I think a five-hour contest is fair because you can then come back from a disaster. In TopCoder I often see very good coders do badly, like Derek Kisman finishing out of the top 20. In a five-hour contest you could never expect Derek out of the first 20, could you?
I'd be happy to criticize TopCoder. But I won't; at least not now.

I think "unbalanced" is a better word than "unfair." I reserve unfair for cases in which one contestant or group has a particular advantage or disadvantage.

TopCoder has two forms of spectator interaction: on-site, and on-line. I've never been on-site, but I've heard accounts. That seems to work pretty well. On-line is also interesting - people follow along in the chat rooms, but don't have access to the problems so there is much speculation. One critical problem is that you have to be a member to spectate. I was once shut out of watching a Google Code Jam because I didn't register and it used a different version of the TopCoder software. Competitors and spectators are drawn from different populations.

Re: Publishing tests after the contests

Posted: Mon Apr 18, 2005 3:23 pm
by misof
shahriar_manzoor wrote:Well, first I would like to ask: was TCCC05 the contest where only one problem was solved in the final round? Was the problemset fair, or was it way too difficult? What I mean is that we sometimes need to criticize TopCoder as well :).
Well, in my opinion the problemset in the finals was difficult, maybe too difficult, but not way too difficult. All of the problems were solvable... but not at that time, under all the pressure and with so little time on our hands. Still, I think that making the last problemset this hard is a good decision -- the finals shouldn't be a contest in fast typing.
shahriar_manzoor wrote:I think a five-hour contest is fair because you can then come back from a disaster. In TopCoder I often see very good coders do badly, like Derek Kisman finishing out of the top 20. In a five-hour contest you could never expect Derek out of the first 20, could you?
Yeah, on the skill vs. luck scale, in ACM a little bit less luck is involved. (Failing to solve 1 of 8 problems matters less than failing to solve 1 of 3. Also, in ACM you have the chance to fix your mistakes -- and this may be the biggest difference.)

On the other hand, the form of the TopCoder contest is the result of a different tradeoff -- the tradeoff between fairness and attraction. By making the contests shorter, they are more attractive to watch, and also they may be more attractive to participate in -- even a less skilled contestant can be lucky and place well if the 3 given problems suit him. (This is mainly the case with the practice matches, sometimes the whole problemset covers a single topic, e.g. formal languages. The competition sets tend to be more balanced.)

And about Derek (aka SnapDragon)... well, I don't really remember seeing him out of top 20 anywhere :D

Posted: Mon Apr 18, 2005 5:53 pm
by Adrian Kuegel
And about Derek (aka SnapDragon)... well, I don't really remember seeing him out of top 20 anywhere
The last time that happened was in SRM 218 (27th place for SnapDragon, 11 SRMs ago). So I wouldn't say it happens often. But of course that is a subjective view; after all, one could say even one time is too much.

Posted: Tue Jan 03, 2006 12:51 am
by subbu
After going through misof's article, I had a thought about how judging can be done at the IOI to promote correctness ahead of heuristics.

For each problem, have 3 or 4 different input files, each containing multiple test cases, each file testing a particular aspect of the solution.

A solution will be awarded the points for a file iff it clears ALL the test cases in that file.

For example:
Input File 1: weight 60, Efficient AND Correct
Input File 2: weight 30, Correct AND NOT efficient
Input File 3: weight 10, Heuristic

The weighting and the categories of the input files can depend on the problem and on the expected types of solutions contestants will come up with (fast heuristics, etc.), and the relative weights can reflect what the judges perceive as more important (correctness or efficiency).

The probability of a heuristic producing correct answers for a whole file will surely be less than for a particular test case, however well chosen that testcase might have been.
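The all-or-nothing grouping above can be sketched in a few lines. This is only an illustration of the scoring rule; the weights and verdicts are the hypothetical 60/30/10 example from the post, not any real IOI scheme:

```python
def score_submission(group_weights, verdicts):
    """Grouped all-or-nothing scoring.

    group_weights: one point value per input file.
    verdicts: one list of booleans per input file, True meaning the
    corresponding test case within that file was answered correctly.
    A file's weight is awarded iff ALL of its test cases pass.
    """
    return sum(w for w, results in zip(group_weights, verdicts)
               if all(results))

# Example with the 60/30/10 weighting above: the solution fails one
# case in file 1 but clears files 2 and 3 completely.
weights = [60, 30, 10]
verdicts = [[True, True, False], [True, True], [True]]
print(score_submission(weights, verdicts))  # → 40
```

Note how a single failed case in the 60-point file forfeits the whole 60 points, which is exactly what makes a lucky heuristic unlikely to score: it would have to survive every case in the file.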