Page 1 of 2

11860 - Document Analyzer

Posted: Sun Oct 10, 2010 6:40 pm
by durjay
I think this is a simple adhoc problem.....
but i got TLE in the contest :(
can any one give me any clue.....

thanx for advance.

Re: 11860-Document Analyzer

Posted: Sun Oct 10, 2010 7:35 pm
by Angeh
use roling ...
O(n)

Re: 11860-Document Analyzer

Posted: Wed Oct 20, 2010 9:19 pm
by lord_burgos
durjay wrote:I think this is a simple adhoc problem.....
but i got TLE in the contest :(
can any one give me any clue.....

thanx for advance.
For solve this problem, i'm used Suffix Tree to find words, for example 'a abba a c a' = 1 2 1 3 1, and with this I know the max number of different words = 3 in this case, use interval tree, for know the minimum position of each word, and insert it one by one.

in the example i have 3 different words, [ a, abba, c], the first iteration i don't know the position [ INF, INF, INF], but insert one by one,
1) a = [ 0, INF, INF]
2) abba = [0, 1, INF];
3) a = [2, 1, INF];
4) c = [2, 1, 3]; // in this point, we know all position, and, insert the last one in position 3, so the minimum of all words position is 1, then 3- 1 + 1 = 3 = diff
5) a = [4, 1, 3] // the same, last position 4, the minimum of all is 1 then 4 - 1 + 1 = 4 = diff
the solution for this case is 2 4.

there are many words, and the time to find the minimum position is so slow, for that point i use interval tree, but you can use any other algorithm.

Re: 11860-Document Analyzer

Posted: Sat Oct 23, 2010 12:18 am
by Leonid
I think Suffix Tree is a bit of an overkill for such a simple problem.

Suppose you have an interval of words [a, b], where b > a. There are 2 possibilities: either there are all unique words in the interval or not. If interval contains all unique words, then we can shorten the interval to [a + 1, b]. If there aren't yet all words then we should increase the length of the interval: [a, b + 1]. The solution can be found by starting with interval [0, 0], and moving this interval towards [n, n]. It has complexity O(N * LOG(N)), if map is used to lookup similar words, and can be increased to O(N) by using hashes (disregarding word length). I leave it up to you to prove why it works :)

Re: 11860-Document Analyzer

Posted: Sat Oct 30, 2010 1:56 pm
by peratu
durjay wrote:I think this is a simple adhoc problem.....
but i got TLE in the contest :(
can any one give me any clue.....

thanx for advance.
You can read the whole document and put it into a string variable. For example, the case one would be stored in a string with this content:

"1. a case is a case, 2. case is not a case~"

Then, write a function that cleans that "dirty" document an puts the clean words into a vector (the clean document). Here is an example in C++:

Code: Select all

void cleanDocument( string dirty_document, vector<string> &clean_document ) {
  string clean_word = "";
  
  for ( unsigned int i = 0; i < dirty_document.size(); ++i )
    if ( dirty_document[i] > 96 && dirty_document[i] < 123 )
      clean_word += dirty_document[i];
    else
      if ( clean_word != "" ) {
        clean_document.push_back( clean_word );
        clean_word = "";
      }
}

Re: 11860-Document Analyzer

Posted: Fri Jun 22, 2012 12:42 pm
by sith
Hello

I've got TLE, but I believe that my solution is pretty optimal. I don't have any string comparision, I read all input data . what is wrong?

Code: Select all

import java.io.ByteArrayInputStream;
import java.io.CharArrayReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.TreeSet;

class Main {

    public static final String END = "END";

    public static void main(String[] args) {

        InputStreamReader reader = new InputStreamReader(System.in);

        char[] buf = new char[1024 * 1024 * 9];
        try {
            reader.read(buf);
        }
        catch (IOException e) {
        }

        CharArrayReader bufReader = new CharArrayReader(buf);

        int intch;
        try {
            while ((intch = bufReader.read()) != -1) {

                if (intch == 0) {
                    break;
                }

                Integer count = null;
                try {
                    count = Integer.valueOf(Character.valueOf((char) intch).toString());
                }
                catch (NumberFormatException e) {
                    continue;
                }

                for (int i = 1; i <= count; i++) {
                    int position = 0;

                    Map<String, PriorityQueue<Integer>> words = readTheWords(bufReader, position);


                    if (words == null) {
                        return;
                    }

                    PriorityQueue<Interval> resultQueue = new PriorityQueue<Interval>();
                    while (true) {


                        WordsDataStructure dataStructure = new WordsDataStructure();

                        for (Map.Entry<String, PriorityQueue<Integer>> wordAndPositions : words.entrySet()) {
                            dataStructure.addWord(wordAndPositions.getKey(), wordAndPositions.getValue().peek());
                        }
                        resultQueue.add(new Interval(dataStructure.minPosittion, dataStructure.maxPostition));

                        PriorityQueue<Integer> integers = words.get(dataStructure.maxWord);
                        if (integers.size() > 1) {
                            integers.poll();
                        }
                        else {
                            Interval poll = resultQueue.poll();
                            System.out.println("Document: " + i + ": " + poll.start + " " + poll.end);
                            break;
                        }
                    }
                }
            }
        }
        catch (IOException e) {
        }
    }

    private static Map<String, PriorityQueue<Integer>> readTheWords(final Reader reader, int position)
        throws IOException {
        int intch;


        Map<String, PriorityQueue<Integer>> words = new HashMap<String, PriorityQueue<Integer>>();


        StringBuilder builder = new StringBuilder();
        boolean isEnd = false;
        while (!isEnd) {
            intch = reader.read();
            if (intch == -1) {
                return null;
            }
            char c = (char) intch;
            if (Character.isLetter(c)) {
                builder.append(c);
            }
            else {
                String value = builder.toString();
                isEnd = value.equals(END);
                if (builder.length() > 0 && !isEnd) {
                    PriorityQueue<Integer> priorityQueue = words.get(value);
                    if (priorityQueue == null) {
                        priorityQueue = new PriorityQueue<Integer>(10000, new Comparator<Integer>() {
                            public int compare(final Integer o1, final Integer o2) {
                                return o2.compareTo(o1);
                            }
                        });
                        words.put(value, priorityQueue);
                    }
                    priorityQueue.add(++position);
                }
                builder = new StringBuilder();
            }
        }
        return words;
    }


    static class WordsDataStructure {
        String maxWord;
        int maxPostition = Integer.MIN_VALUE;
        int minPosittion = Integer.MAX_VALUE;


        public void addWord(String word, int position) {
            if (position > maxPostition) {
                maxPostition = position;
                maxWord = word;
            }
            if (position < minPosittion) {
                minPosittion = position;
            }

        }
    }


    static class Interval implements Comparable<Interval> {
        final Integer start;
        final Integer end;


        Interval(final int start, final int end) {
            this.end = end;
            this.start = start;
        }

        public int compareTo(final Interval o) {
            int compareResult = (end - start) - (o.end - o.start);
            if (compareResult != 0) {
                return compareResult;
            }
            return start.compareTo(o.start);
        }
    }
}

Re: 11860-Document Analyzer

Posted: Tue Jun 26, 2012 12:31 am
by brianfry713
Try using BufferedReader and BufferedWriter

Re: 11860-Document Analyzer

Posted: Tue Jun 26, 2012 6:09 pm
by sith
It didn't help. I still get TLE

Here is updated solution

Code: Select all

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.CharArrayReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.Reader;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.TreeSet;

class Main {

    public static final String END = "END";

    public static void main(String[] args) {

        BufferedReader reader= new BufferedReader(new InputStreamReader(System.in),1024*1024*9);

        BufferedWriter writer = new BufferedWriter(new PrintWriter(System.out));
        int intch;
        try {
            while ((intch = reader.read()) != -1) {

                if (intch == 0) {
                    break;
                }

                Integer count = null;
                try {
                    count = Integer.valueOf(Character.valueOf((char) intch).toString());
                }
                catch (NumberFormatException e) {
                    continue;
                }

                for (int i = 1; i <= count; i++) {
                    int position = 0;

                    Map<String, PriorityQueue<Integer>> words = readTheWords(reader, position);


                    if (words == null) {
                        return;
                    }

                    PriorityQueue<Interval> resultQueue = new PriorityQueue<Interval>();
                    while (true) {


                        WordsDataStructure dataStructure = new WordsDataStructure();

                        for (Map.Entry<String, PriorityQueue<Integer>> wordAndPositions : words.entrySet()) {
                            dataStructure.addWord(wordAndPositions.getKey(), wordAndPositions.getValue().peek());
                        }
                        resultQueue.add(new Interval(dataStructure.minPosittion, dataStructure.maxPostition));

                        PriorityQueue<Integer> integers = words.get(dataStructure.maxWord);
                        if (integers.size() > 1) {
                            integers.poll();
                        }
                        else {
                            Interval poll = resultQueue.poll();

                            writer.write("Document: " + i + ": " + poll.start + " " + poll.end + "\n");
                            break;
                        }
                    }
                }
            }
            writer.flush();
        }
        catch (IOException e) {
        }
    }

    private static Map<String, PriorityQueue<Integer>> readTheWords(final Reader reader, int position)
        throws IOException {
        int intch;


        Map<String, PriorityQueue<Integer>> words = new HashMap<String, PriorityQueue<Integer>>();


        StringBuilder builder = new StringBuilder();
        boolean isEnd = false;
        while (!isEnd) {
            intch = reader.read();
            if (intch == -1) {
                return words;
            }
            char c = (char) intch;
            if (Character.isLetter(c)) {
                builder.append(c);
            }
            else {
                String value = builder.toString();
                isEnd = value.equals(END);
                if (builder.length() > 0 && !isEnd) {
                    PriorityQueue<Integer> priorityQueue = words.get(value);
                    if (priorityQueue == null) {
                        priorityQueue = new PriorityQueue<Integer>(10000, new Comparator<Integer>() {
                            public int compare(final Integer o1, final Integer o2) {
                                return o2.compareTo(o1);
                            }
                        });
                        words.put(value, priorityQueue);
                    }
                    priorityQueue.add(++position);
                }
                builder = new StringBuilder();
            }
        }
        return words;
    }


    static class WordsDataStructure {
        String maxWord;
        int maxPostition = Integer.MIN_VALUE;
        int minPosittion = Integer.MAX_VALUE;


        public void addWord(String word, int position) {
            if (position > maxPostition) {
                maxPostition = position;
                maxWord = word;
            }
            if (position < minPosittion) {
                minPosittion = position;
            }

        }
    }


    static class Interval implements Comparable<Interval> {
        final Integer start;
        final Integer end;


        Interval(final int start, final int end) {
            this.end = end;
            this.start = start;
        }

        public int compareTo(final Interval o) {
            int compareResult = (end - start) - (o.end - o.start);
            if (compareResult != 0) {
                return compareResult;
            }
            return start.compareTo(o.start);
        }
    }
}

Re: 11860-Document Analyzer

Posted: Wed Jun 27, 2012 12:20 am
by brianfry713
Input:

Code: Select all

10
a b c d e
END
a b c d e
END
a b c d e
END
a b c d e
END
a b c d e
END
a b c d e
END
a b c d e
END
a b c d e
END
a b c d e
END
a b c d e
END
100
Correct output:

Code: Select all

Document 1: 1 5
Document 2: 1 5
Document 3: 1 5
Document 4: 1 5
Document 5: 1 5
Document 6: 1 5
Document 7: 1 5
Document 8: 1 5
Document 9: 1 5
Document 10: 1 5

Re: 11860-Document Analyzer

Posted: Wed Jun 27, 2012 8:27 pm
by sith
Thank you for your case. It helped me to fix one bug.
In your case you have 100 at the and of input - is it typo?

But I've still get TLE :((((((((((((((((


Code: Select all

import java.io.*;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

class Main {

    public static final String END = "END";

    public static void main(String[] args) {

        BufferedReader reader = new BufferedReader(new InputStreamReader(System.in), 1024 * 1024 * 9);

        BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(System.out));
        int intch;
        try {
            while ((intch = reader.read()) != -1) {

                if (intch == 0) {
                    break;
                }

                Integer count = null;
                try {
                    StringBuilder builder = new StringBuilder();

                    char intch1 = (char) intch;
                    while (intch1 != '\n') {
                        builder.append(intch1);
                        intch1 = (char) reader.read();
                    }
                    count = Integer.valueOf(builder.toString().trim());
                } catch (NumberFormatException e) {
                    continue;
                }

                for (int i = 1; i <= count; i++) {
                    int position = 0;

                    Map<String, PriorityQueue<Integer>> words = readTheWords(reader, position);


                    if (words == null) {
                        return;
                    }

                    PriorityQueue<Interval> resultQueue = new PriorityQueue<Interval>();
                    while (true) {


                        WordsDataStructure dataStructure = new WordsDataStructure();

                        for (Map.Entry<String, PriorityQueue<Integer>> wordAndPositions : words.entrySet()) {
                            dataStructure.addWord(wordAndPositions.getKey(), wordAndPositions.getValue().peek());
                        }
                        resultQueue.add(new Interval(dataStructure.minPosittion, dataStructure.maxPostition));

                        PriorityQueue<Integer> integers = words.get(dataStructure.maxWord);
                        if (integers.size() > 1) {
                            integers.poll();
                        } else {
                            Interval poll = resultQueue.poll();

                            writer.write("Document: " + i + ": " + poll.start + " " + poll.end + "\n");
                            break;
                        }
                    }
                }
            }
            writer.flush();
        } catch (IOException e) {
        }
    }

    private static Map<String, PriorityQueue<Integer>> readTheWords(final Reader reader, int position)
            throws IOException {
        int intch;


        Map<String, PriorityQueue<Integer>> words = new HashMap<String, PriorityQueue<Integer>>();


        StringBuilder builder = new StringBuilder();
        boolean isEnd = false;
        while (!isEnd) {
            intch = reader.read();
            if (intch == -1) {
                return words;
            }
            char c = (char) intch;
            if (Character.isLetter(c)) {
                builder.append(c);
            }
            else {
                String value = builder.toString();
                isEnd = value.equals(END);
                if (builder.length() > 0 && !isEnd) {
                    PriorityQueue<Integer> priorityQueue = words.get(value);
                    if (priorityQueue == null) {
                        priorityQueue = new PriorityQueue<Integer>(10000, new Comparator<Integer>() {
                            public int compare(final Integer o1, final Integer o2) {
                                return o2.compareTo(o1);
                            }
                        });
                        words.put(value, priorityQueue);
                    }
                    priorityQueue.add(++position);
                }
                builder = new StringBuilder();
            }
        }
        return words;
    }


    static class WordsDataStructure {
        String maxWord;
        int maxPostition = Integer.MIN_VALUE;
        int minPosittion = Integer.MAX_VALUE;


        public void addWord(String word, int position) {
            if (position > maxPostition) {
                maxPostition = position;
                maxWord = word;
            }
            if (position < minPosittion) {
                minPosittion = position;
            }

        }
    }


    static class Interval implements Comparable<Interval> {
        final Integer start;
        final Integer end;


        Interval(final int start, final int end) {
            this.end = end;
            this.start = start;
        }

        public int compareTo(final Interval o) {
            int compareResult = (end - start) - (o.end - o.start);
            if (compareResult != 0) {
                return compareResult;
            }
            return start.compareTo(o.start);
        }
    }
}

Re: 11860-Document Analyzer

Posted: Thu Jun 28, 2012 12:03 am
by brianfry713
I intentionally put 100 at the end of the input to demonstrate an issue with your code. You should only read the number of documents on the first line and then quit after reading that number of documents. Often the problems on this website will have garbage at the end of the input to see if you are following the problem statement and reading the input correctly.

Re: 11860-Document Analyzer

Posted: Thu Jun 28, 2012 10:09 am
by sith
Thank you.
I fixed It, but TLE has been happend again :(((

Code: Select all

import java.io.*;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

class Main {

    public static final String END = "END";

    public static void main(String[] args) {

     BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));

        BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(System.out));
        int intch;
        try {
            while ((intch = reader.read()) != -1) {

                if (intch == 0) {
                    break;
                }

                Integer count = null;
                try {
                    StringBuilder builder = new StringBuilder();

                    char intch1 = (char) intch;
                    while (intch1 != '\n') {
                        builder.append(intch1);
                        intch1 = (char) reader.read();
                    }
                    count = Integer.valueOf(builder.toString().trim());
                } catch (NumberFormatException e) {
                    continue;
                }

                for (int i = 1; i <= count; i++) {
                    int position = 0;

                    Map<String, PriorityQueue<Integer>> words = readTheWords(reader, position);


                    if (words == null) {
                        return;
                    }

                    PriorityQueue<Interval> resultQueue = new PriorityQueue<Interval>();
                    while (true) {


                        WordsDataStructure dataStructure = new WordsDataStructure();

                        for (Map.Entry<String, PriorityQueue<Integer>> wordAndPositions : words.entrySet()) {
                            dataStructure.addWord(wordAndPositions.getKey(), wordAndPositions.getValue().peek());
                        }
                        resultQueue.add(new Interval(dataStructure.minPosittion, dataStructure.maxPostition));

                        PriorityQueue<Integer> integers = words.get(dataStructure.maxWord);
                        if (integers.size() > 1) {
                            integers.poll();
                        } else {
                            Interval poll = resultQueue.poll();

                            writer.write("Document: " + i + ": " + poll.start + " " + poll.end + "\n");
                            break;
                        }
                    }
                }
                writer.flush();
                return;
            }
        } catch (IOException e) {
        }
    }

    private static Map<String, PriorityQueue<Integer>> readTheWords(final Reader reader, int position)
            throws IOException {
        int intch;


        Map<String, PriorityQueue<Integer>> words = new HashMap<String, PriorityQueue<Integer>>();


        StringBuilder builder = new StringBuilder();
        boolean isEnd = false;
        while (!isEnd) {
            intch = reader.read();
            if (intch == -1) {
                return words;
            }
            char c = (char) intch;
            if (Character.isLetter(c)) {
                builder.append(c);
            } else {
                String value = builder.toString();
                isEnd = value.equals(END);
                if (builder.length() > 0 && !isEnd) {
                    PriorityQueue<Integer> priorityQueue = words.get(value);
                    if (priorityQueue == null) {
                        priorityQueue = new PriorityQueue<Integer>(10000, new Comparator<Integer>() {
                            public int compare(final Integer o1, final Integer o2) {
                                return o2.compareTo(o1);
                            }
                        });
                        words.put(value, priorityQueue);
                    }
                    priorityQueue.add(++position);
                }
                builder = new StringBuilder();
            }
        }
        return words;
    }


    static class WordsDataStructure {
        String maxWord;
        int maxPostition = Integer.MIN_VALUE;
        int minPosittion = Integer.MAX_VALUE;


        public void addWord(String word, int position) {
            if (position > maxPostition) {
                maxPostition = position;
                maxWord = word;
            }
            if (position < minPosittion) {
                minPosittion = position;
            }

        }
    }


    static class Interval implements Comparable<Interval> {
        final Integer start;
        final Integer end;


        Interval(final int start, final int end) {
            this.end = end;
            this.start = start;
        }

        public int compareTo(final Interval o) {
            int compareResult = (end - start) - (o.end - o.start);
            if (compareResult != 0) {
                return compareResult;
            }
            return start.compareTo(o.start);
        }
    }
}

Re: 11860-Document Analyzer

Posted: Fri Jun 29, 2012 1:25 am
by brianfry713
My C++ code uses a map<string,int> and an array, no priority queue. I use an algorithm similar to the one described by Leonid in this thread and get AC in 0.868 sec.
I rewrote my code in JAVA and got AC in 2.044 sec.

Re: 11860-Document Analyzer

Posted: Sat Sep 08, 2012 12:57 am
by Scarecrow
please help someone. getting WA

Code: Select all

AC

Re: 11860-Document Analyzer

Posted: Thu Jun 05, 2014 7:04 am
by uDebug
Leonid wrote: Suppose you have an interval of words [a, b], where b > a. There are 2 possibilities: either there are all unique words in the interval or not. If interval contains all unique words, then we can shorten the interval to [a + 1, b]. If there aren't yet all words then we should increase the length of the interval: [a, b + 1]. The solution can be found by starting with interval [0, 0], and moving this interval towards [n, n]. It has complexity O(N * LOG(N)), if map is used to lookup similar words, and can be increased to O(N) by using hashes (disregarding word length).
Thank you very much, Leonid. This was very useful.