Full instructions are in [login to view URL]
1/ Implement two classes that read individual documents from trectext-format and trecweb-format collection files.
• [login to view URL] is a general interface for sequentially reading documents from collection files
• [login to view URL] is the class for trectext format
• [login to view URL] is the class for trecweb format
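As a rough illustration of the sequential-reading idea in part 1, here is a minimal Python sketch for the trectext case. It assumes the commonly used TREC text layout, where each document is wrapped in `<DOC>` ... `</DOC>`, carries its identifier in `<DOCNO>`, and has its body between `<TEXT>` and `</TEXT>` tags that sit on their own lines. The function name `read_trectext` and the sample data are hypothetical and are not taken from the assignment files.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def read_trectext(stream: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (docno, text) pairs one document at a time.

    Assumes <DOCNO> is on a single line and that <TEXT>, </TEXT>
    and </DOC> each appear on a line of their own.
    """
    docno: Optional[str] = None
    text_lines: List[str] = []
    in_text = False
    for raw in stream:
        line = raw.strip()
        if line.startswith("<DOCNO>"):
            docno = line.replace("<DOCNO>", "").replace("</DOCNO>", "").strip()
        elif line == "<TEXT>":
            in_text = True
        elif line == "</TEXT>":
            in_text = False
        elif line == "</DOC>":
            # End of a document: emit it and reset the state.
            if docno is not None:
                yield docno, "\n".join(text_lines)
            docno, text_lines, in_text = None, [], False
        elif in_text:
            text_lines.append(line)


# Hypothetical two-document sample in the assumed layout.
SAMPLE = (
    "<DOC>\n<DOCNO> D-1 </DOCNO>\n<TEXT>\nHello World.\nSecond line.\n</TEXT>\n</DOC>\n"
    "<DOC>\n<DOCNO> D-2 </DOCNO>\n<TEXT>\nAnother doc.\n</TEXT>\n</DOC>\n"
)

docs = list(read_trectext(io.StringIO(SAMPLE)))
```

Because the function is a generator, only one document is held in memory at a time, which matches the "sequentially reading" requirement. A trecweb reader would follow the same pattern but would need to handle that format's own header and content layout.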
2/ Implement classes that tokenize document text into individual words, normalize each word to lowercase, and finally filter out stop words.
• [login to view URL] is a class for sequentially reading words from a sequence of characters
• [login to view URL] is the class that transforms each word to its lowercase version
• [login to view URL] is the class that recognizes whether a word is a stop word. A stop word list file will be provided, and the class should take that file as input.
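The three steps in part 2 (tokenize, lowercase, filter stop words) can be sketched as a small Python pipeline. This is only one possible reading of the task: it assumes a word is a maximal run of ASCII letters or digits and that the stop word list has one word per line. All function names here (`tokenize`, `normalize`, `load_stopwords`, `preprocess`) are hypothetical.

```python
import re
from typing import Iterable, Iterator, List, Set

# Assumption: a "word" is a maximal run of ASCII letters or digits.
_WORD = re.compile(r"[A-Za-z0-9]+")


def tokenize(text: str) -> Iterator[str]:
    """Yield words from the text one at a time, in order."""
    for match in _WORD.finditer(text):
        yield match.group()


def normalize(word: str) -> str:
    """Return the lowercase version of a word."""
    return word.lower()


def load_stopwords(lines: Iterable[str]) -> Set[str]:
    """Build a stop word set from lines (for example, an open file),
    assuming one word per line and skipping blank lines."""
    return {line.strip().lower() for line in lines if line.strip()}


def preprocess(text: str, stopwords: Set[str]) -> List[str]:
    """Tokenize, lowercase, and drop stop words."""
    return [w for w in (normalize(t) for t in tokenize(text)) if w not in stopwords]


# Hypothetical stop word list and input text.
stopwords = load_stopwords(["the", "a", "of", ""])
tokens = preprocess("The Retrieval of a Document", stopwords)
```

Since `load_stopwords` accepts any iterable of lines, an opened stop word file can be passed to it directly. Storing the words in a set keeps each stop word lookup at roughly constant time.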