11860 - Document Analyzer

All about problems in Volume 118. If there is a thread about your problem, please use it. If not, create one with its number in the subject.

Moderator: Board moderators

durjay
New poster
Posts: 13
Joined: Tue Oct 06, 2009 5:09 pm
Location: ctg

11860 - Document Analyzer

Post by durjay »

I think this is a simple adhoc problem.....
but i got TLE in the contest :(
can any one give me any clue.....

thanx for advance.
Angeh
Experienced poster
Posts: 108
Joined: Sat Aug 08, 2009 2:53 pm

Re: 11860-Document Analyzer

Post by Angeh »

use roling ...
O(n)
>>>>>>>>> A2
Beliefs are not facts, believe what you need to believe;)
lord_burgos
New poster
Posts: 43
Joined: Mon Oct 13, 2003 4:54 pm
Location: Mexico
Contact:

Re: 11860-Document Analyzer

Post by lord_burgos »

durjay wrote:I think this is a simple adhoc problem.....
but i got TLE in the contest :(
can any one give me any clue.....

thanx for advance.
For solve this problem, i'm used Suffix Tree to find words, for example 'a abba a c a' = 1 2 1 3 1, and with this I know the max number of different words = 3 in this case, use interval tree, for know the minimum position of each word, and insert it one by one.

in the example i have 3 different words, [ a, abba, c], the first iteration i don't know the position [ INF, INF, INF], but insert one by one,
1) a = [ 0, INF, INF]
2) abba = [0, 1, INF];
3) a = [2, 1, INF];
4) c = [2, 1, 3]; // in this point, we know all position, and, insert the last one in position 3, so the minimum of all words position is 1, then 3- 1 + 1 = 3 = diff
5) a = [4, 1, 3] // the same, last position 4, the minimum of all is 1 then 4 - 1 + 1 = 4 = diff
the solution for this case is 2 4.

there are many words, and the time to find the minimum position is so slow, for that point i use interval tree, but you can use any other algorithm.
Leonid
Experienced poster
Posts: 146
Joined: Thu Dec 22, 2005 5:50 pm
Contact:

Re: 11860-Document Analyzer

Post by Leonid »

I think Suffix Tree is a bit of an overkill for such a simple problem.

Suppose you have an interval of words [a, b], where b > a. There are 2 possibilities: either there are all unique words in the interval or not. If interval contains all unique words, then we can shorten the interval to [a + 1, b]. If there aren't yet all words then we should increase the length of the interval: [a, b + 1]. The solution can be found by starting with interval [0, 0], and moving this interval towards [n, n]. It has complexity O(N * LOG(N)), if map is used to lookup similar words, and can be increased to O(N) by using hashes (disregarding word length). I leave it up to you to prove why it works :)
peratu
New poster
Posts: 7
Joined: Sat Jul 10, 2010 9:22 am

Re: 11860-Document Analyzer

Post by peratu »

durjay wrote:I think this is a simple adhoc problem.....
but i got TLE in the contest :(
can any one give me any clue.....

thanx for advance.
You can read the whole document and put it into a string variable. For example, the case one would be stored in a string with this content:

"1. a case is a case, 2. case is not a case~"

Then, write a function that cleans that "dirty" document an puts the clean words into a vector (the clean document). Here is an example in C++:

Code: Select all

void cleanDocument( string dirty_document, vector<string> &clean_document ) {
  string clean_word = "";
  
  for ( unsigned int i = 0; i < dirty_document.size(); ++i )
    if ( dirty_document[i] > 96 && dirty_document[i] < 123 )
      clean_word += dirty_document[i];
    else
      if ( clean_word != "" ) {
        clean_document.push_back( clean_word );
        clean_word = "";
      }
}
sith
Learning poster
Posts: 72
Joined: Sat May 19, 2012 7:46 pm

Re: 11860-Document Analyzer

Post by sith »

Hello

I've got TLE, but I believe that my solution is pretty optimal. I don't have any string comparision, I read all input data . what is wrong?

Code: Select all

import java.io.ByteArrayInputStream;
import java.io.CharArrayReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.TreeSet;

class Main {

    public static final String END = "END";

    public static void main(String[] args) {

        InputStreamReader reader = new InputStreamReader(System.in);

        char[] buf = new char[1024 * 1024 * 9];
        try {
            reader.read(buf);
        }
        catch (IOException e) {
        }

        CharArrayReader bufReader = new CharArrayReader(buf);

        int intch;
        try {
            while ((intch = bufReader.read()) != -1) {

                if (intch == 0) {
                    break;
                }

                Integer count = null;
                try {
                    count = Integer.valueOf(Character.valueOf((char) intch).toString());
                }
                catch (NumberFormatException e) {
                    continue;
                }

                for (int i = 1; i <= count; i++) {
                    int position = 0;

                    Map<String, PriorityQueue<Integer>> words = readTheWords(bufReader, position);


                    if (words == null) {
                        return;
                    }

                    PriorityQueue<Interval> resultQueue = new PriorityQueue<Interval>();
                    while (true) {


                        WordsDataStructure dataStructure = new WordsDataStructure();

                        for (Map.Entry<String, PriorityQueue<Integer>> wordAndPositions : words.entrySet()) {
                            dataStructure.addWord(wordAndPositions.getKey(), wordAndPositions.getValue().peek());
                        }
                        resultQueue.add(new Interval(dataStructure.minPosittion, dataStructure.maxPostition));

                        PriorityQueue<Integer> integers = words.get(dataStructure.maxWord);
                        if (integers.size() > 1) {
                            integers.poll();
                        }
                        else {
                            Interval poll = resultQueue.poll();
                            System.out.println("Document: " + i + ": " + poll.start + " " + poll.end);
                            break;
                        }
                    }
                }
            }
        }
        catch (IOException e) {
        }
    }

    private static Map<String, PriorityQueue<Integer>> readTheWords(final Reader reader, int position)
        throws IOException {
        int intch;


        Map<String, PriorityQueue<Integer>> words = new HashMap<String, PriorityQueue<Integer>>();


        StringBuilder builder = new StringBuilder();
        boolean isEnd = false;
        while (!isEnd) {
            intch = reader.read();
            if (intch == -1) {
                return null;
            }
            char c = (char) intch;
            if (Character.isLetter(c)) {
                builder.append(c);
            }
            else {
                String value = builder.toString();
                isEnd = value.equals(END);
                if (builder.length() > 0 && !isEnd) {
                    PriorityQueue<Integer> priorityQueue = words.get(value);
                    if (priorityQueue == null) {
                        priorityQueue = new PriorityQueue<Integer>(10000, new Comparator<Integer>() {
                            public int compare(final Integer o1, final Integer o2) {
                                return o2.compareTo(o1);
                            }
                        });
                        words.put(value, priorityQueue);
                    }
                    priorityQueue.add(++position);
                }
                builder = new StringBuilder();
            }
        }
        return words;
    }


    static class WordsDataStructure {
        String maxWord;
        int maxPostition = Integer.MIN_VALUE;
        int minPosittion = Integer.MAX_VALUE;


        public void addWord(String word, int position) {
            if (position > maxPostition) {
                maxPostition = position;
                maxWord = word;
            }
            if (position < minPosittion) {
                minPosittion = position;
            }

        }
    }


    static class Interval implements Comparable<Interval> {
        final Integer start;
        final Integer end;


        Interval(final int start, final int end) {
            this.end = end;
            this.start = start;
        }

        public int compareTo(final Interval o) {
            int compareResult = (end - start) - (o.end - o.start);
            if (compareResult != 0) {
                return compareResult;
            }
            return start.compareTo(o.start);
        }
    }
}
brianfry713
Guru
Posts: 5947
Joined: Thu Sep 01, 2011 9:09 am
Location: San Jose, CA, USA

Re: 11860-Document Analyzer

Post by brianfry713 »

Try using BufferedReader and BufferedWriter
Check input and AC output for thousands of problems on uDebug!
sith
Learning poster
Posts: 72
Joined: Sat May 19, 2012 7:46 pm

Re: 11860-Document Analyzer

Post by sith »

It didn't help. I still get TLE

Here is updated solution

Code: Select all

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.CharArrayReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.Reader;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.TreeSet;

class Main {

    public static final String END = "END";

    public static void main(String[] args) {

        BufferedReader reader= new BufferedReader(new InputStreamReader(System.in),1024*1024*9);

        BufferedWriter writer = new BufferedWriter(new PrintWriter(System.out));
        int intch;
        try {
            while ((intch = reader.read()) != -1) {

                if (intch == 0) {
                    break;
                }

                Integer count = null;
                try {
                    count = Integer.valueOf(Character.valueOf((char) intch).toString());
                }
                catch (NumberFormatException e) {
                    continue;
                }

                for (int i = 1; i <= count; i++) {
                    int position = 0;

                    Map<String, PriorityQueue<Integer>> words = readTheWords(reader, position);


                    if (words == null) {
                        return;
                    }

                    PriorityQueue<Interval> resultQueue = new PriorityQueue<Interval>();
                    while (true) {


                        WordsDataStructure dataStructure = new WordsDataStructure();

                        for (Map.Entry<String, PriorityQueue<Integer>> wordAndPositions : words.entrySet()) {
                            dataStructure.addWord(wordAndPositions.getKey(), wordAndPositions.getValue().peek());
                        }
                        resultQueue.add(new Interval(dataStructure.minPosittion, dataStructure.maxPostition));

                        PriorityQueue<Integer> integers = words.get(dataStructure.maxWord);
                        if (integers.size() > 1) {
                            integers.poll();
                        }
                        else {
                            Interval poll = resultQueue.poll();

                            writer.write("Document: " + i + ": " + poll.start + " " + poll.end + "\n");
                            break;
                        }
                    }
                }
            }
            writer.flush();
        }
        catch (IOException e) {
        }
    }

    private static Map<String, PriorityQueue<Integer>> readTheWords(final Reader reader, int position)
        throws IOException {
        int intch;


        Map<String, PriorityQueue<Integer>> words = new HashMap<String, PriorityQueue<Integer>>();


        StringBuilder builder = new StringBuilder();
        boolean isEnd = false;
        while (!isEnd) {
            intch = reader.read();
            if (intch == -1) {
                return words;
            }
            char c = (char) intch;
            if (Character.isLetter(c)) {
                builder.append(c);
            }
            else {
                String value = builder.toString();
                isEnd = value.equals(END);
                if (builder.length() > 0 && !isEnd) {
                    PriorityQueue<Integer> priorityQueue = words.get(value);
                    if (priorityQueue == null) {
                        priorityQueue = new PriorityQueue<Integer>(10000, new Comparator<Integer>() {
                            public int compare(final Integer o1, final Integer o2) {
                                return o2.compareTo(o1);
                            }
                        });
                        words.put(value, priorityQueue);
                    }
                    priorityQueue.add(++position);
                }
                builder = new StringBuilder();
            }
        }
        return words;
    }


    static class WordsDataStructure {
        String maxWord;
        int maxPostition = Integer.MIN_VALUE;
        int minPosittion = Integer.MAX_VALUE;


        public void addWord(String word, int position) {
            if (position > maxPostition) {
                maxPostition = position;
                maxWord = word;
            }
            if (position < minPosittion) {
                minPosittion = position;
            }

        }
    }


    static class Interval implements Comparable<Interval> {
        final Integer start;
        final Integer end;


        Interval(final int start, final int end) {
            this.end = end;
            this.start = start;
        }

        public int compareTo(final Interval o) {
            int compareResult = (end - start) - (o.end - o.start);
            if (compareResult != 0) {
                return compareResult;
            }
            return start.compareTo(o.start);
        }
    }
}
brianfry713
Guru
Posts: 5947
Joined: Thu Sep 01, 2011 9:09 am
Location: San Jose, CA, USA

Re: 11860-Document Analyzer

Post by brianfry713 »

Input:

Code: Select all

10
a b c d e
END
a b c d e
END
a b c d e
END
a b c d e
END
a b c d e
END
a b c d e
END
a b c d e
END
a b c d e
END
a b c d e
END
a b c d e
END
100
Correct output:

Code: Select all

Document 1: 1 5
Document 2: 1 5
Document 3: 1 5
Document 4: 1 5
Document 5: 1 5
Document 6: 1 5
Document 7: 1 5
Document 8: 1 5
Document 9: 1 5
Document 10: 1 5
Check input and AC output for thousands of problems on uDebug!
sith
Learning poster
Posts: 72
Joined: Sat May 19, 2012 7:46 pm

Re: 11860-Document Analyzer

Post by sith »

Thank you for your case. It helped me to fix one bug.
In your case you have 100 at the and of input - is it typo?

But I've still get TLE :((((((((((((((((


Code: Select all

import java.io.*;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

class Main {

    public static final String END = "END";

    public static void main(String[] args) {

        BufferedReader reader = new BufferedReader(new InputStreamReader(System.in), 1024 * 1024 * 9);

        BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(System.out));
        int intch;
        try {
            while ((intch = reader.read()) != -1) {

                if (intch == 0) {
                    break;
                }

                Integer count = null;
                try {
                    StringBuilder builder = new StringBuilder();

                    char intch1 = (char) intch;
                    while (intch1 != '\n') {
                        builder.append(intch1);
                        intch1 = (char) reader.read();
                    }
                    count = Integer.valueOf(builder.toString().trim());
                } catch (NumberFormatException e) {
                    continue;
                }

                for (int i = 1; i <= count; i++) {
                    int position = 0;

                    Map<String, PriorityQueue<Integer>> words = readTheWords(reader, position);


                    if (words == null) {
                        return;
                    }

                    PriorityQueue<Interval> resultQueue = new PriorityQueue<Interval>();
                    while (true) {


                        WordsDataStructure dataStructure = new WordsDataStructure();

                        for (Map.Entry<String, PriorityQueue<Integer>> wordAndPositions : words.entrySet()) {
                            dataStructure.addWord(wordAndPositions.getKey(), wordAndPositions.getValue().peek());
                        }
                        resultQueue.add(new Interval(dataStructure.minPosittion, dataStructure.maxPostition));

                        PriorityQueue<Integer> integers = words.get(dataStructure.maxWord);
                        if (integers.size() > 1) {
                            integers.poll();
                        } else {
                            Interval poll = resultQueue.poll();

                            writer.write("Document: " + i + ": " + poll.start + " " + poll.end + "\n");
                            break;
                        }
                    }
                }
            }
            writer.flush();
        } catch (IOException e) {
        }
    }

    private static Map<String, PriorityQueue<Integer>> readTheWords(final Reader reader, int position)
            throws IOException {
        int intch;


        Map<String, PriorityQueue<Integer>> words = new HashMap<String, PriorityQueue<Integer>>();


        StringBuilder builder = new StringBuilder();
        boolean isEnd = false;
        while (!isEnd) {
            intch = reader.read();
            if (intch == -1) {
                return words;
            }
            char c = (char) intch;
            if (Character.isLetter(c)) {
                builder.append(c);
            }
            else {
                String value = builder.toString();
                isEnd = value.equals(END);
                if (builder.length() > 0 && !isEnd) {
                    PriorityQueue<Integer> priorityQueue = words.get(value);
                    if (priorityQueue == null) {
                        priorityQueue = new PriorityQueue<Integer>(10000, new Comparator<Integer>() {
                            public int compare(final Integer o1, final Integer o2) {
                                return o2.compareTo(o1);
                            }
                        });
                        words.put(value, priorityQueue);
                    }
                    priorityQueue.add(++position);
                }
                builder = new StringBuilder();
            }
        }
        return words;
    }


    static class WordsDataStructure {
        String maxWord;
        int maxPostition = Integer.MIN_VALUE;
        int minPosittion = Integer.MAX_VALUE;


        public void addWord(String word, int position) {
            if (position > maxPostition) {
                maxPostition = position;
                maxWord = word;
            }
            if (position < minPosittion) {
                minPosittion = position;
            }

        }
    }


    static class Interval implements Comparable<Interval> {
        final Integer start;
        final Integer end;


        Interval(final int start, final int end) {
            this.end = end;
            this.start = start;
        }

        public int compareTo(final Interval o) {
            int compareResult = (end - start) - (o.end - o.start);
            if (compareResult != 0) {
                return compareResult;
            }
            return start.compareTo(o.start);
        }
    }
}
brianfry713
Guru
Posts: 5947
Joined: Thu Sep 01, 2011 9:09 am
Location: San Jose, CA, USA

Re: 11860-Document Analyzer

Post by brianfry713 »

I intentionally put 100 at the end of the input to demonstrate an issue with your code. You should only read the number of documents on the first line and then quit after reading that number of documents. Often the problems on this website will have garbage at the end of the input to see if you are following the problem statement and reading the input correctly.
Check input and AC output for thousands of problems on uDebug!
sith
Learning poster
Posts: 72
Joined: Sat May 19, 2012 7:46 pm

Re: 11860-Document Analyzer

Post by sith »

Thank you.
I fixed It, but TLE has been happend again :(((

Code: Select all

import java.io.*;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

class Main {

    public static final String END = "END";

    public static void main(String[] args) {

     BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));

        BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(System.out));
        int intch;
        try {
            while ((intch = reader.read()) != -1) {

                if (intch == 0) {
                    break;
                }

                Integer count = null;
                try {
                    StringBuilder builder = new StringBuilder();

                    char intch1 = (char) intch;
                    while (intch1 != '\n') {
                        builder.append(intch1);
                        intch1 = (char) reader.read();
                    }
                    count = Integer.valueOf(builder.toString().trim());
                } catch (NumberFormatException e) {
                    continue;
                }

                for (int i = 1; i <= count; i++) {
                    int position = 0;

                    Map<String, PriorityQueue<Integer>> words = readTheWords(reader, position);


                    if (words == null) {
                        return;
                    }

                    PriorityQueue<Interval> resultQueue = new PriorityQueue<Interval>();
                    while (true) {


                        WordsDataStructure dataStructure = new WordsDataStructure();

                        for (Map.Entry<String, PriorityQueue<Integer>> wordAndPositions : words.entrySet()) {
                            dataStructure.addWord(wordAndPositions.getKey(), wordAndPositions.getValue().peek());
                        }
                        resultQueue.add(new Interval(dataStructure.minPosittion, dataStructure.maxPostition));

                        PriorityQueue<Integer> integers = words.get(dataStructure.maxWord);
                        if (integers.size() > 1) {
                            integers.poll();
                        } else {
                            Interval poll = resultQueue.poll();

                            writer.write("Document: " + i + ": " + poll.start + " " + poll.end + "\n");
                            break;
                        }
                    }
                }
                writer.flush();
                return;
            }
        } catch (IOException e) {
        }
    }

    private static Map<String, PriorityQueue<Integer>> readTheWords(final Reader reader, int position)
            throws IOException {
        int intch;


        Map<String, PriorityQueue<Integer>> words = new HashMap<String, PriorityQueue<Integer>>();


        StringBuilder builder = new StringBuilder();
        boolean isEnd = false;
        while (!isEnd) {
            intch = reader.read();
            if (intch == -1) {
                return words;
            }
            char c = (char) intch;
            if (Character.isLetter(c)) {
                builder.append(c);
            } else {
                String value = builder.toString();
                isEnd = value.equals(END);
                if (builder.length() > 0 && !isEnd) {
                    PriorityQueue<Integer> priorityQueue = words.get(value);
                    if (priorityQueue == null) {
                        priorityQueue = new PriorityQueue<Integer>(10000, new Comparator<Integer>() {
                            public int compare(final Integer o1, final Integer o2) {
                                return o2.compareTo(o1);
                            }
                        });
                        words.put(value, priorityQueue);
                    }
                    priorityQueue.add(++position);
                }
                builder = new StringBuilder();
            }
        }
        return words;
    }


    static class WordsDataStructure {
        String maxWord;
        int maxPostition = Integer.MIN_VALUE;
        int minPosittion = Integer.MAX_VALUE;


        public void addWord(String word, int position) {
            if (position > maxPostition) {
                maxPostition = position;
                maxWord = word;
            }
            if (position < minPosittion) {
                minPosittion = position;
            }

        }
    }


    static class Interval implements Comparable<Interval> {
        final Integer start;
        final Integer end;


        Interval(final int start, final int end) {
            this.end = end;
            this.start = start;
        }

        public int compareTo(final Interval o) {
            int compareResult = (end - start) - (o.end - o.start);
            if (compareResult != 0) {
                return compareResult;
            }
            return start.compareTo(o.start);
        }
    }
}
brianfry713
Guru
Posts: 5947
Joined: Thu Sep 01, 2011 9:09 am
Location: San Jose, CA, USA

Re: 11860-Document Analyzer

Post by brianfry713 »

My C++ code uses a map<string,int> and an array, no priority queue. I use an algorithm similar to the one described by Leonid in this thread and get AC in 0.868 sec.
I rewrote my code in JAVA and got AC in 2.044 sec.
Check input and AC output for thousands of problems on uDebug!
Scarecrow
Learning poster
Posts: 69
Joined: Wed Oct 19, 2011 9:06 pm

Re: 11860-Document Analyzer

Post by Scarecrow »

please help someone. getting WA

Code: Select all

AC
Do or do not. There is no try.
uDebug
A great helper
Posts: 475
Joined: Tue Jul 24, 2012 4:23 pm

Re: 11860-Document Analyzer

Post by uDebug »

Leonid wrote: Suppose you have an interval of words [a, b], where b > a. There are 2 possibilities: either there are all unique words in the interval or not. If interval contains all unique words, then we can shorten the interval to [a + 1, b]. If there aren't yet all words then we should increase the length of the interval: [a, b + 1]. The solution can be found by starting with interval [0, 0], and moving this interval towards [n, n]. It has complexity O(N * LOG(N)), if map is used to lookup similar words, and can be increased to O(N) by using hashes (disregarding word length).
Thank you very much, Leonid. This was very useful.
Check input and AC output for over 7,500 problems on uDebug!

Find us on Facebook. Follow us on Twitter.
Post Reply

Return to “Volume 118 (11800-11899)”