11860 - Document Analyzer

durjay · Post by **durjay** » Sun Oct 10, 2010 6:40 pm

I think this is a simple adhoc problem.....
but i got TLE in the contest

can any one give me any clue.....

thanx for advance.

Angeh · Post by **Angeh** » Sun Oct 10, 2010 7:35 pm

use roling ...
O(n)

lord_burgos · Post by **lord_burgos** » Wed Oct 20, 2010 9:19 pm

durjay wrote:I think this is a simple adhoc problem.....
but i got TLE in the contest
can any one give me any clue.....

thanx for advance.

For solve this problem, i'm used Suffix Tree to find words, for example 'a abba a c a' = 1 2 1 3 1, and with this I know the max number of different words = 3 in this case, use interval tree, for know the minimum position of each word, and insert it one by one.

in the example i have 3 different words, [ a, abba, c], the first iteration i don't know the position [ INF, INF, INF], but insert one by one,
1) a = [ 0, INF, INF]
2) abba = [0, 1, INF];
3) a = [2, 1, INF];
4) c = [2, 1, 3]; // in this point, we know all position, and, insert the last one in position 3, so the minimum of all words position is 1, then 3- 1 + 1 = 3 = diff
5) a = [4, 1, 3] // the same, last position 4, the minimum of all is 1 then 4 - 1 + 1 = 4 = diff
the solution for this case is 2 4.

there are many words, and the time to find the minimum position is so slow, for that point i use interval tree, but you can use any other algorithm.

Leonid · Post by **Leonid** » Sat Oct 23, 2010 12:18 am

I think Suffix Tree is a bit of an overkill for such a simple problem.

Suppose you have an interval of words [a, b], where b > a. There are 2 possibilities: either there are all unique words in the interval or not. If interval contains all unique words, then we can shorten the interval to [a + 1, b]. If there aren't yet all words then we should increase the length of the interval: [a, b + 1]. The solution can be found by starting with interval [0, 0], and moving this interval towards [n, n]. It has complexity O(N * LOG(N)), if map is used to lookup similar words, and can be increased to O(N) by using hashes (disregarding word length). I leave it up to you to prove why it works

peratu · Post by **peratu** » Sat Oct 30, 2010 1:56 pm

durjay wrote:I think this is a simple adhoc problem.....
but i got TLE in the contest
can any one give me any clue.....

thanx for advance.

You can read the whole document and put it into a string variable. For example, the case one would be stored in a string with this content:

"1. a case is a case, 2. case is not a case~"

Then, write a function that cleans that "dirty" document an puts the clean words into a vector (the clean document). Here is an example in C++:

Code: Select all

void cleanDocument( string dirty_document, vector<string> &clean_document ) {
  string clean_word = "";
  
  for ( unsigned int i = 0; i < dirty_document.size(); ++i )
    if ( dirty_document[i] > 96 && dirty_document[i] < 123 )
      clean_word += dirty_document[i];
    else
      if ( clean_word != "" ) {
        clean_document.push_back( clean_word );
        clean_word = "";
      }
}

sith · Post by **sith** » Fri Jun 22, 2012 12:42 pm

Hello

I've got TLE, but I believe that my solution is pretty optimal. I don't have any string comparision, I read all input data . what is wrong?

Code: Select all

import java.io.ByteArrayInputStream;
import java.io.CharArrayReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.TreeSet;

class Main {

    public static final String END = "END";

    public static void main(String[] args) {

        InputStreamReader reader = new InputStreamReader(System.in);

        char[] buf = new char[1024 * 1024 * 9];
        try {
            reader.read(buf);
        }
        catch (IOException e) {
        }

        CharArrayReader bufReader = new CharArrayReader(buf);

        int intch;
        try {
            while ((intch = bufReader.read()) != -1) {

                if (intch == 0) {
                    break;
                }

                Integer count = null;
                try {
                    count = Integer.valueOf(Character.valueOf((char) intch).toString());
                }
                catch (NumberFormatException e) {
                    continue;
                }

                for (int i = 1; i <= count; i++) {
                    int position = 0;

                    Map<String, PriorityQueue<Integer>> words = readTheWords(bufReader, position);


                    if (words == null) {
                        return;
                    }

                    PriorityQueue<Interval> resultQueue = new PriorityQueue<Interval>();
                    while (true) {


                        WordsDataStructure dataStructure = new WordsDataStructure();

                        for (Map.Entry<String, PriorityQueue<Integer>> wordAndPositions : words.entrySet()) {
                            dataStructure.addWord(wordAndPositions.getKey(), wordAndPositions.getValue().peek());
                        }
                        resultQueue.add(new Interval(dataStructure.minPosittion, dataStructure.maxPostition));

                        PriorityQueue<Integer> integers = words.get(dataStructure.maxWord);
                        if (integers.size() > 1) {
                            integers.poll();
                        }
                        else {
                            Interval poll = resultQueue.poll();
                            System.out.println("Document: " + i + ": " + poll.start + " " + poll.end);
                            break;
                        }
                    }
                }
            }
        }
        catch (IOException e) {
        }
    }

    private static Map<String, PriorityQueue<Integer>> readTheWords(final Reader reader, int position)
        throws IOException {
        int intch;


        Map<String, PriorityQueue<Integer>> words = new HashMap<String, PriorityQueue<Integer>>();


        StringBuilder builder = new StringBuilder();
        boolean isEnd = false;
        while (!isEnd) {
            intch = reader.read();
            if (intch == -1) {
                return null;
            }
            char c = (char) intch;
            if (Character.isLetter(c)) {
                builder.append(c);
            }
            else {
                String value = builder.toString();
                isEnd = value.equals(END);
                if (builder.length() > 0 && !isEnd) {
                    PriorityQueue<Integer> priorityQueue = words.get(value);
                    if (priorityQueue == null) {
                        priorityQueue = new PriorityQueue<Integer>(10000, new Comparator<Integer>() {
                            public int compare(final Integer o1, final Integer o2) {
                                return o2.compareTo(o1);
                            }
                        });
                        words.put(value, priorityQueue);
                    }
                    priorityQueue.add(++position);
                }
                builder = new StringBuilder();
            }
        }
        return words;
    }


    static class WordsDataStructure {
        String maxWord;
        int maxPostition = Integer.MIN_VALUE;
        int minPosittion = Integer.MAX_VALUE;


        public void addWord(String word, int position) {
            if (position > maxPostition) {
                maxPostition = position;
                maxWord = word;
            }
            if (position < minPosittion) {
                minPosittion = position;
            }

        }
    }


    static class Interval implements Comparable<Interval> {
        final Integer start;
        final Integer end;


        Interval(final int start, final int end) {
            this.end = end;
            this.start = start;
        }

        public int compareTo(final Interval o) {
            int compareResult = (end - start) - (o.end - o.start);
            if (compareResult != 0) {
                return compareResult;
            }
            return start.compareTo(o.start);
        }
    }
}

brianfry713 · Post by **brianfry713** » Tue Jun 26, 2012 12:31 am

Try using BufferedReader and BufferedWriter

sith · Post by **sith** » Tue Jun 26, 2012 6:09 pm

It didn't help. I still get TLE

Here is updated solution

Code: Select all

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.CharArrayReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.Reader;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.TreeSet;

class Main {

    public static final String END = "END";

    public static void main(String[] args) {

        BufferedReader reader= new BufferedReader(new InputStreamReader(System.in),1024*1024*9);

        BufferedWriter writer = new BufferedWriter(new PrintWriter(System.out));
        int intch;
        try {
            while ((intch = reader.read()) != -1) {

                if (intch == 0) {
                    break;
                }

                Integer count = null;
                try {
                    count = Integer.valueOf(Character.valueOf((char) intch).toString());
                }
                catch (NumberFormatException e) {
                    continue;
                }

                for (int i = 1; i <= count; i++) {
                    int position = 0;

                    Map<String, PriorityQueue<Integer>> words = readTheWords(reader, position);


                    if (words == null) {
                        return;
                    }

                    PriorityQueue<Interval> resultQueue = new PriorityQueue<Interval>();
                    while (true) {


                        WordsDataStructure dataStructure = new WordsDataStructure();

                        for (Map.Entry<String, PriorityQueue<Integer>> wordAndPositions : words.entrySet()) {
                            dataStructure.addWord(wordAndPositions.getKey(), wordAndPositions.getValue().peek());
                        }
                        resultQueue.add(new Interval(dataStructure.minPosittion, dataStructure.maxPostition));

                        PriorityQueue<Integer> integers = words.get(dataStructure.maxWord);
                        if (integers.size() > 1) {
                            integers.poll();
                        }
                        else {
                            Interval poll = resultQueue.poll();

                            writer.write("Document: " + i + ": " + poll.start + " " + poll.end + "\n");
                            break;
                        }
                    }
                }
            }
            writer.flush();
        }
        catch (IOException e) {
        }
    }

    private static Map<String, PriorityQueue<Integer>> readTheWords(final Reader reader, int position)
        throws IOException {
        int intch;


        Map<String, PriorityQueue<Integer>> words = new HashMap<String, PriorityQueue<Integer>>();


        StringBuilder builder = new StringBuilder();
        boolean isEnd = false;
        while (!isEnd) {
            intch = reader.read();
            if (intch == -1) {
                return words;
            }
            char c = (char) intch;
            if (Character.isLetter(c)) {
                builder.append(c);
            }
            else {
                String value = builder.toString();
                isEnd = value.equals(END);
                if (builder.length() > 0 && !isEnd) {
                    PriorityQueue<Integer> priorityQueue = words.get(value);
                    if (priorityQueue == null) {
                        priorityQueue = new PriorityQueue<Integer>(10000, new Comparator<Integer>() {
                            public int compare(final Integer o1, final Integer o2) {
                                return o2.compareTo(o1);
                            }
                        });
                        words.put(value, priorityQueue);
                    }
                    priorityQueue.add(++position);
                }
                builder = new StringBuilder();
            }
        }
        return words;
    }


    static class WordsDataStructure {
        String maxWord;
        int maxPostition = Integer.MIN_VALUE;
        int minPosittion = Integer.MAX_VALUE;


        public void addWord(String word, int position) {
            if (position > maxPostition) {
                maxPostition = position;
                maxWord = word;
            }
            if (position < minPosittion) {
                minPosittion = position;
            }

        }
    }


    static class Interval implements Comparable<Interval> {
        final Integer start;
        final Integer end;


        Interval(final int start, final int end) {
            this.end = end;
            this.start = start;
        }

        public int compareTo(final Interval o) {
            int compareResult = (end - start) - (o.end - o.start);
            if (compareResult != 0) {
                return compareResult;
            }
            return start.compareTo(o.start);
        }
    }
}

brianfry713 · Post by **brianfry713** » Wed Jun 27, 2012 12:20 am

Input:

Code: Select all

10
a b c d e
END
a b c d e
END
a b c d e
END
a b c d e
END
a b c d e
END
a b c d e
END
a b c d e
END
a b c d e
END
a b c d e
END
a b c d e
END
100

Correct output:

Code: Select all

Document 1: 1 5
Document 2: 1 5
Document 3: 1 5
Document 4: 1 5
Document 5: 1 5
Document 6: 1 5
Document 7: 1 5
Document 8: 1 5
Document 9: 1 5
Document 10: 1 5

sith · Post by **sith** » Wed Jun 27, 2012 8:27 pm

Thank you for your case. It helped me to fix one bug.
In your case you have 100 at the and of input - is it typo?

But I've still get TLE

(((((((((((((((

Code: Select all

import java.io.*;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

class Main {

    public static final String END = "END";

    public static void main(String[] args) {

        BufferedReader reader = new BufferedReader(new InputStreamReader(System.in), 1024 * 1024 * 9);

        BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(System.out));
        int intch;
        try {
            while ((intch = reader.read()) != -1) {

                if (intch == 0) {
                    break;
                }

                Integer count = null;
                try {
                    StringBuilder builder = new StringBuilder();

                    char intch1 = (char) intch;
                    while (intch1 != '\n') {
                        builder.append(intch1);
                        intch1 = (char) reader.read();
                    }
                    count = Integer.valueOf(builder.toString().trim());
                } catch (NumberFormatException e) {
                    continue;
                }

                for (int i = 1; i <= count; i++) {
                    int position = 0;

                    Map<String, PriorityQueue<Integer>> words = readTheWords(reader, position);


                    if (words == null) {
                        return;
                    }

                    PriorityQueue<Interval> resultQueue = new PriorityQueue<Interval>();
                    while (true) {


                        WordsDataStructure dataStructure = new WordsDataStructure();

                        for (Map.Entry<String, PriorityQueue<Integer>> wordAndPositions : words.entrySet()) {
                            dataStructure.addWord(wordAndPositions.getKey(), wordAndPositions.getValue().peek());
                        }
                        resultQueue.add(new Interval(dataStructure.minPosittion, dataStructure.maxPostition));

                        PriorityQueue<Integer> integers = words.get(dataStructure.maxWord);
                        if (integers.size() > 1) {
                            integers.poll();
                        } else {
                            Interval poll = resultQueue.poll();

                            writer.write("Document: " + i + ": " + poll.start + " " + poll.end + "\n");
                            break;
                        }
                    }
                }
            }
            writer.flush();
        } catch (IOException e) {
        }
    }

    private static Map<String, PriorityQueue<Integer>> readTheWords(final Reader reader, int position)
            throws IOException {
        int intch;


        Map<String, PriorityQueue<Integer>> words = new HashMap<String, PriorityQueue<Integer>>();


        StringBuilder builder = new StringBuilder();
        boolean isEnd = false;
        while (!isEnd) {
            intch = reader.read();
            if (intch == -1) {
                return words;
            }
            char c = (char) intch;
            if (Character.isLetter(c)) {
                builder.append(c);
            }
            else {
                String value = builder.toString();
                isEnd = value.equals(END);
                if (builder.length() > 0 && !isEnd) {
                    PriorityQueue<Integer> priorityQueue = words.get(value);
                    if (priorityQueue == null) {
                        priorityQueue = new PriorityQueue<Integer>(10000, new Comparator<Integer>() {
                            public int compare(final Integer o1, final Integer o2) {
                                return o2.compareTo(o1);
                            }
                        });
                        words.put(value, priorityQueue);
                    }
                    priorityQueue.add(++position);
                }
                builder = new StringBuilder();
            }
        }
        return words;
    }


    static class WordsDataStructure {
        String maxWord;
        int maxPostition = Integer.MIN_VALUE;
        int minPosittion = Integer.MAX_VALUE;


        public void addWord(String word, int position) {
            if (position > maxPostition) {
                maxPostition = position;
                maxWord = word;
            }
            if (position < minPosittion) {
                minPosittion = position;
            }

        }
    }


    static class Interval implements Comparable<Interval> {
        final Integer start;
        final Integer end;


        Interval(final int start, final int end) {
            this.end = end;
            this.start = start;
        }

        public int compareTo(final Interval o) {
            int compareResult = (end - start) - (o.end - o.start);
            if (compareResult != 0) {
                return compareResult;
            }
            return start.compareTo(o.start);
        }
    }
}

brianfry713 · Post by **brianfry713** » Thu Jun 28, 2012 12:03 am

I intentionally put 100 at the end of the input to demonstrate an issue with your code. You should only read the number of documents on the first line and then quit after reading that number of documents. Often the problems on this website will have garbage at the end of the input to see if you are following the problem statement and reading the input correctly.

sith · Post by **sith** » Thu Jun 28, 2012 10:09 am

Thank you.
I fixed It, but TLE has been happend again

((

Code: Select all

import java.io.*;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

class Main {

    public static final String END = "END";

    public static void main(String[] args) {

     BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));

        BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(System.out));
        int intch;
        try {
            while ((intch = reader.read()) != -1) {

                if (intch == 0) {
                    break;
                }

                Integer count = null;
                try {
                    StringBuilder builder = new StringBuilder();

                    char intch1 = (char) intch;
                    while (intch1 != '\n') {
                        builder.append(intch1);
                        intch1 = (char) reader.read();
                    }
                    count = Integer.valueOf(builder.toString().trim());
                } catch (NumberFormatException e) {
                    continue;
                }

                for (int i = 1; i <= count; i++) {
                    int position = 0;

                    Map<String, PriorityQueue<Integer>> words = readTheWords(reader, position);


                    if (words == null) {
                        return;
                    }

                    PriorityQueue<Interval> resultQueue = new PriorityQueue<Interval>();
                    while (true) {


                        WordsDataStructure dataStructure = new WordsDataStructure();

                        for (Map.Entry<String, PriorityQueue<Integer>> wordAndPositions : words.entrySet()) {
                            dataStructure.addWord(wordAndPositions.getKey(), wordAndPositions.getValue().peek());
                        }
                        resultQueue.add(new Interval(dataStructure.minPosittion, dataStructure.maxPostition));

                        PriorityQueue<Integer> integers = words.get(dataStructure.maxWord);
                        if (integers.size() > 1) {
                            integers.poll();
                        } else {
                            Interval poll = resultQueue.poll();

                            writer.write("Document: " + i + ": " + poll.start + " " + poll.end + "\n");
                            break;
                        }
                    }
                }
                writer.flush();
                return;
            }
        } catch (IOException e) {
        }
    }

    private static Map<String, PriorityQueue<Integer>> readTheWords(final Reader reader, int position)
            throws IOException {
        int intch;


        Map<String, PriorityQueue<Integer>> words = new HashMap<String, PriorityQueue<Integer>>();


        StringBuilder builder = new StringBuilder();
        boolean isEnd = false;
        while (!isEnd) {
            intch = reader.read();
            if (intch == -1) {
                return words;
            }
            char c = (char) intch;
            if (Character.isLetter(c)) {
                builder.append(c);
            } else {
                String value = builder.toString();
                isEnd = value.equals(END);
                if (builder.length() > 0 && !isEnd) {
                    PriorityQueue<Integer> priorityQueue = words.get(value);
                    if (priorityQueue == null) {
                        priorityQueue = new PriorityQueue<Integer>(10000, new Comparator<Integer>() {
                            public int compare(final Integer o1, final Integer o2) {
                                return o2.compareTo(o1);
                            }
                        });
                        words.put(value, priorityQueue);
                    }
                    priorityQueue.add(++position);
                }
                builder = new StringBuilder();
            }
        }
        return words;
    }


    static class WordsDataStructure {
        String maxWord;
        int maxPostition = Integer.MIN_VALUE;
        int minPosittion = Integer.MAX_VALUE;


        public void addWord(String word, int position) {
            if (position > maxPostition) {
                maxPostition = position;
                maxWord = word;
            }
            if (position < minPosittion) {
                minPosittion = position;
            }

        }
    }


    static class Interval implements Comparable<Interval> {
        final Integer start;
        final Integer end;


        Interval(final int start, final int end) {
            this.end = end;
            this.start = start;
        }

        public int compareTo(final Interval o) {
            int compareResult = (end - start) - (o.end - o.start);
            if (compareResult != 0) {
                return compareResult;
            }
            return start.compareTo(o.start);
        }
    }
}

brianfry713 · Post by **brianfry713** » Fri Jun 29, 2012 1:25 am

My C++ code uses a map<string,int> and an array, no priority queue. I use an algorithm similar to the one described by Leonid in this thread and get AC in 0.868 sec.
I rewrote my code in JAVA and got AC in 2.044 sec.

Scarecrow · Post by **Scarecrow** » Sat Sep 08, 2012 12:57 am

please help someone. getting WA

Code: Select all

AC

uDebug · Post by **uDebug** » Thu Jun 05, 2014 7:04 am

Leonid wrote: Suppose you have an interval of words [a, b], where b > a. There are 2 possibilities: either there are all unique words in the interval or not. If interval contains all unique words, then we can shorten the interval to [a + 1, b]. If there aren't yet all words then we should increase the length of the interval: [a, b + 1]. The solution can be found by starting with interval [0, 0], and moving this interval towards [n, n]. It has complexity O(N * LOG(N)), if map is used to lookup similar words, and can be increased to O(N) by using hashes (disregarding word length).

Thank you very much, Leonid. This was very useful.

OJ Board

11860 - Document Analyzer

11860 - Document Analyzer

Re: 11860-Document Analyzer

Re: 11860-Document Analyzer

Re: 11860-Document Analyzer

Re: 11860-Document Analyzer

Re: 11860-Document Analyzer

Re: 11860-Document Analyzer

Re: 11860-Document Analyzer

Re: 11860-Document Analyzer

Re: 11860-Document Analyzer

Re: 11860-Document Analyzer

Re: 11860-Document Analyzer

Re: 11860-Document Analyzer

Re: 11860-Document Analyzer

Re: 11860-Document Analyzer