SECURITY FLAW IN ONLINE JUDGE

General topic about Valladolid Online Judge

Moderator: Board moderators

epsilon0
Experienced poster
Posts: 112
Joined: Tue Nov 12, 2002 11:15 pm
Location: Paris, France.

SECURITY FLAW IN ONLINE JUDGE

Post by epsilon0 » Thu Jan 30, 2003 9:20 am

I've discovered a security flaw that lets a malicious hacker with lots of time on her hands to gain knowledge of Online Judge's input on any problem in the sets.

The procedure goes like this:

we can safely assume the input of a problem is a string of ASCII characters, and terminates with either a special sequence of characters or an End Of File.

Thus, this input can be encoded as a binary sequence of bits in any case. The "end of input" can be encoded by any non-ascii byte (ie any byte whose most significant bit is one).

All the malicious hacker has to do then is to write a little program that reads the input of the problem and take the following action:

if the n-th bit of the input is 1, print the message "i'm a hacker".
since this is most likely the wrong answer for the problem, the online judge will send you a WA (Wrong Answer).

if tbe n-th bit of the input is 0, enter an infinite loop, in C: while(1);
the judge will send you a TL (Time Limit) after 10 seconds.

She then repeats the procedure for each bit in the input.

If the input is N bytes, she has to submit her program 8*N times.

since half the bytes are 1 on average, this will take roughly

4*N*10 seconds.

so for 100 bytes of input, the hacker can get the whole input for the problem in 4000 seconds, just more than an hour.

Then our malicious hacker can sell this information for big bucks, or even use it herself to cheat and solve the problem.

QED.

note: a hacker with several identities could obviously do it even faster.
We never perform a computation ourselves, we just hitch a ride on the great Computation that is going on already. --Tomasso Toffoli

Ivan Golubev
Experienced poster
Posts: 167
Joined: Fri Oct 19, 2001 2:00 am
Location: Saint Petersburg, Russia

Post by Ivan Golubev » Thu Jan 30, 2003 10:10 am

Actually TLE is a worst choice -- better to use assert(), division by zero, invalid memory reference, MLE, OLE. So it's possible to get more than one bit of information from one submission. However, most judge's inputs are large, so this won't work. But looks like earlier somebody tried this method to get judge's input...

junjieliang
Experienced poster
Posts: 169
Joined: Wed Oct 31, 2001 2:00 am
Location: Singapore

Post by junjieliang » Thu Jan 30, 2003 12:38 pm

Sure this will work...and most people know about it. Problem is...who will bother? And I for one will definitely not pay for the IO...

minskcity
Experienced poster
Posts: 199
Joined: Tue May 14, 2002 10:23 am
Location: Vancouver

Post by minskcity » Fri Feb 21, 2003 9:07 am

A few days ago I got a mail from someone, who asked me to help him in solving #103 and #102 problems by sending my programs. When I checked his name at OJ site I found out, that he's solved 50 problems in last 48 hrs (total 120 problems in 8 days after joining OJ). :roll:

:wink: There are many easy ways to have problems solved... :wink:

The way discribed above can be used to increase speed of your program. I was thinking about this way a year ago and I'm sure some people use this way to make their programs solve problems really fast... :-?

Not me. Yet? :roll:

Experimenter
Learning poster
Posts: 76
Joined: Thu Mar 13, 2003 5:12 am
Location: Russia

Post by Experimenter » Thu Jun 12, 2003 10:26 am

I thought that judges use random intput to avoid this hacker attempts. Do they?

epsilon0
Experienced poster
Posts: 112
Joined: Tue Nov 12, 2002 11:15 pm
Location: Paris, France.

Post by epsilon0 » Thu Jun 12, 2003 2:18 pm

of course not.

the input has to be the same, otherwise each problem would need a specific judge program.
actually there is one judge program for all problems. it just compares your output against a reference output file. the input file, thus, has to be constant.

there are programs with a special judge program, though.
those are for problems with multiple correct answers. the regular judge cannot work with these problems.
although variable input could be possible with these special problems, i'd bet they dont bother implementing it :D
We never perform a computation ourselves, we just hitch a ride on the great Computation that is going on already. --Tomasso Toffoli

Experimenter
Learning poster
Posts: 76
Joined: Thu Mar 13, 2003 5:12 am
Location: Russia

Post by Experimenter » Thu Jun 12, 2003 4:21 pm

I don't think they were the ones who produced the software for checking the answers, though they might be the ones. I am sure that they use random input because I read it on the site where the software for checking the answers is available, though I don't remember the URL. anyway, it would be really stupid not to use random input.
it would be great if you replied this post. really.

minskcity
Experienced poster
Posts: 199
Joined: Tue May 14, 2002 10:23 am
Location: Vancouver

Post by minskcity » Thu Jun 12, 2003 6:28 pm

I think it's not a problem for online judge to deal with variable input. If he has a right solution for each problem, he can run his solution on random input first and then compare his output with output generated by tested solution. :wink:
it would be really stupid not to use random input.
I hope random input will never be implemented. Because otherwise we all will get more problems with troubleshooting our solutions and we will never be able to find out how fast our solutions are - CPU time can be 1000's or even 1000000's times different for random inputs - rank lists will be completely messed up. :-?

Experimenter
Learning poster
Posts: 76
Joined: Thu Mar 13, 2003 5:12 am
Location: Russia

Post by Experimenter » Thu Jun 12, 2003 6:39 pm

minskcity wrote: I hope random input will never be implemented. Because otherwise we all will get more problems with troubleshooting our solutions and we will never be able to find out how fast our solutions are - CPU time can be 1000's or even 1000000's times different for random inputs - rank lists will be completely messed up. :-?
I don't think so, because the amount of calculation can be the same, just numbers will be different.
what is the difference in CPU time in multiplying this 123 by 345 and 901 by 234. I see no difference.
the same can be done with arrays. Is there any difference in the amount of calculation for finding a reminder got from division of a number stored in the array:
[c]
int array[0xff];
[/c]
by
[c]
int c;
[/c]
I don't think so as well whatever is stored in the array and what is stored in the array is what the judges can give at random. Of course, you can say that for some people the input will be more favourable, as getting a zero array requires no calculation, but then when the program is tested it is not the case that the program just gets one input, it is always the case that it is tested for different input, so the probability of getting a "fast" input is quite small, if not zero at all.
all in all, the random input I am tallking about will have no impact on rank-lists, it just that they will be more close to reality and will certainly help avoiding this "hacker" things described above. :wink:

Hope you got my idea.
it would be great if you replied this post. really.

minskcity
Experienced poster
Posts: 199
Joined: Tue May 14, 2002 10:23 am
Location: Vancouver

Post by minskcity » Thu Jun 12, 2003 6:56 pm

Your example is too simple.

My example is basic as well - fast sorting algorithm can work 1000 times faster or slower, depending on input. (Same array lenght)
There are hundreds of such algorithms, not just sorting.
CPU time depends on quality of input, not quantity in 95% problems posted by online judge.

Experimenter
Learning poster
Posts: 76
Joined: Thu Mar 13, 2003 5:12 am
Location: Russia

Post by Experimenter » Thu Jun 12, 2003 7:21 pm

minskcity wrote:Your example is too simple.

My example is basic as well - fast sorting algorithm can work 1000 times faster or slower, depending on input. (Same array lenght)
of course you are right, but it seems that judges might seem of such possibility as well. So, to my mind it is still possible to give a random input which will not affect the amount of time required to solve the problem.
Besides, what I tried to underline is that it is obiously not the case to vary the length of the arrays. that is clear from my first example.
minskcity wrote: There are hundreds of such algorithms, not just sorting.
CPU time depends on quality of input, not quantity in 95% problems posted by online judge.
sorry, where did you get this statistics from?
it would be great if you replied this post. really.

minskcity
Experienced poster
Posts: 199
Joined: Tue May 14, 2002 10:23 am
Location: Vancouver

Post by minskcity » Thu Jun 12, 2003 9:56 pm

sorry, where did you get this statistics from?
It comes from my experience - I've solved nearly 100 problems, only few of my solutions will run in the same time if you use random input.
So, to my mind it is still possible to give a random input which will not affect the amount of time required to solve the problem.
Only if you can think about all possible algorithms to solve a problem at the same time. Even those new algorithms that will be invented by 1000's of people who will try to solve a problem. Are you sure you can make it?

Most algorithms have worst case, best case and average case time. Your input generator should be really smart to predict all the algorithms used in solutions, links between those algorithms and all changes that will occur in each sub-algorithm input if you change something in problem input.

For me it's easier to create a winning strategy in chess - at least you can be sure there is finite number of variations. :wink:

Experimenter
Learning poster
Posts: 76
Joined: Thu Mar 13, 2003 5:12 am
Location: Russia

Post by Experimenter » Thu Jun 12, 2003 10:51 pm

you might be right. :D
it would be great if you replied this post. really.

User avatar
shamim
A great helper
Posts: 498
Joined: Mon Dec 30, 2002 10:10 am
Location: Bozeman, Montana, USA

Post by shamim » Sat Jun 14, 2003 12:49 pm

Well u can even fool judges during online programming contests,

contests using pc2 under windows mode, are vunerable to such attacks.
I doubt whether the judges take proper care to prevent these things.

Experimenter
Learning poster
Posts: 76
Joined: Thu Mar 13, 2003 5:12 am
Location: Russia

Post by Experimenter » Sat Jun 14, 2003 1:30 pm

shamim wrote:Well u can even fool judges during online programming contests,

contests using pc2 under windows mode, are vunerable to such attacks.
I doubt whether the judges take proper care to prevent these things.
hm. could you explain this bit? I did not understand it very well. :)
it would be great if you replied this post. really.

Post Reply

Return to “General”