First non-repeating word in a file
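Divide the file into N chunks, cutting only at whitespace so that no word straddles a chunk boundary.
Read each chunk and use a map to store each word:
the key is the word (string) and the value is [file_position, count], where file_position is the word's first position in the whole file, not its position within the chunk.
Use map-reduce to aggregate the maps from all chunks: for each word, sum the counts and keep the minimum file_position.
Then select all words whose count == 1.
Sort these words by file_position; the one with the smallest position is the first non-repeating word. Since only the smallest is needed, a single pass that tracks the minimum also works.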
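
The steps above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation. It assumes the file content fits in memory as one string, that words are whitespace-separated and compared case-sensitively (punctuation stays attached), and that the map step runs sequentially rather than on separate workers. The function names (split_chunks, map_chunk, merge, first_non_repeating_word) are hypothetical and chosen only for this example.

```python
import re
from functools import reduce


def split_chunks(text, n):
    """Split text into roughly n chunks, cutting only at whitespace.

    Returns (offset, chunk) pairs, where offset is the chunk's start
    position in the whole text.
    """
    chunks = []
    size = max(1, len(text) // n)
    start = 0
    while start < len(text):
        end = min(len(text), start + size)
        # Move the cut forward to the next whitespace so no word is split.
        while end < len(text) and not text[end].isspace():
            end += 1
        chunks.append((start, text[start:end]))
        start = end
    return chunks


def map_chunk(offset, chunk):
    """Map step: word -> [first global position, count] for one chunk."""
    stats = {}
    for match in re.finditer(r"\S+", chunk):
        word = match.group()
        if word in stats:
            stats[word][1] += 1
        else:
            stats[word] = [offset + match.start(), 1]
    return stats


def merge(left, right):
    """Reduce step: sum the counts and keep the minimum position."""
    merged = {word: list(value) for word, value in left.items()}
    for word, (pos, count) in right.items():
        if word in merged:
            merged[word][0] = min(merged[word][0], pos)
            merged[word][1] += count
        else:
            merged[word] = [pos, count]
    return merged


def first_non_repeating_word(text, n_chunks=4):
    """Return the earliest word that occurs exactly once, or None."""
    partials = [map_chunk(off, chunk) for off, chunk in split_chunks(text, n_chunks)]
    totals = reduce(merge, partials, {})
    unique = [(pos, word) for word, (pos, count) in totals.items() if count == 1]
    # Only the smallest position is needed, so min() replaces a full sort.
    return min(unique)[1] if unique else None
```

For a file too large for memory, the same map_chunk and merge logic could be applied to byte ranges read by separate workers, as long as each worker reports positions relative to the start of the file.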