# First non-repeating word in a file

## Find the first non-repeating word in a 100GB file

1. Divide the file into N chunks.
2. Read each chunk and store every word in a map.
3. The key is the word (string); the value is \[file\_position, count], where file\_position is the position of the word's first occurrence.
4. Use MapReduce to aggregate the maps from all chunks: for each word, sum the counts and keep the minimum file\_position.
5. Keep only the words whose count == 1.
6. Sort them by file\_position; the word with the smallest position is the first non-repeating word.
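
The steps above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not a tuned implementation: each chunk is already an in-memory list of words, chunk boundaries fall on word boundaries, and file\_position is taken to be the global word index. The function names `count_chunk`, `merge`, and `first_non_repeating` are hypothetical.

```python
def count_chunk(words, base_position):
    """Map step: build word -> [first position, count] for one chunk."""
    stats = {}
    for offset, word in enumerate(words):
        if word in stats:
            stats[word][1] += 1
        else:
            # First time we see the word in this chunk: record its global position.
            stats[word] = [base_position + offset, 1]
    return stats


def merge(a, b):
    """Reduce step: sum the counts and keep the minimum position per word."""
    merged = {word: list(value) for word, value in a.items()}
    for word, (position, count) in b.items():
        if word in merged:
            merged[word][0] = min(merged[word][0], position)
            merged[word][1] += count
        else:
            merged[word] = [position, count]
    return merged


def first_non_repeating(chunks):
    """Combine all chunks and return the earliest word with count == 1."""
    total = {}
    base = 0  # assumed: position = index of the word in the whole file
    for words in chunks:
        total = merge(total, count_chunk(words, base))
        base += len(words)
    unique = [(position, word) for word, (position, count) in total.items() if count == 1]
    # Taking the minimum is equivalent to sorting and picking the first entry.
    return min(unique)[1] if unique else None
```

For example, `first_non_repeating([["a", "b", "a"], ["c", "b", "d"]])` returns `"c"`: both `a` and `b` repeat, and `c` appears before `d`. Because only the smallest position is needed, a single `min` pass can stand in for the full sort in step 6.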

{% embed url="<https://leetcode.com/discuss/interview-question/system-design/124858/First-non-repeating-word-in-a-file-File-size-can-be-100GB>" %}
