I'm looking for someone to configure a job search engine similar to [url removed, login to view], but on a smaller scale.
PROJECT MAIN OBJECTIVE
To develop a fully customizable, fully operational and easy-to-manage (through a graphical user interface) job search engine like [url removed, login to view], comparable in capacity, accuracy, search speed and features.
I'm thinking about two main objectives:
Configure Nutch and HBase with hcode to crawl pages and store them in RAW HTML format. The Nutch crawler should be able to crawl around 10,000 websites daily.
The collected pages will then be separated into two categories, JOB POSTING PAGES and NON JOB POSTING PAGES, using Apache Mahout, GATE or UIMA, whichever achieves the best accuracy and speed.
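To make the two-category split concrete, here is a minimal sketch of a page classifier. It is not Mahout, GATE or UIMA: it is a small standard-library Naive Bayes stand-in, and the class names, labels and training pages are hypothetical, chosen only to illustrate the idea of labelling RAW HTML pages as job postings or not.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text from RAW HTML, skipping script and style blocks."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def tokenize(raw_html):
    """Turn a RAW HTML page into a list of lowercase word tokens."""
    parser = TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return re.findall(r"[a-z]+", " ".join(parser.parts).lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token frequencies
        self.doc_counts = Counter()  # label -> number of training pages
        self.vocab = set()

    def train(self, raw_html, label):
        tokens = tokenize(raw_html)
        self.word_counts.setdefault(label, Counter()).update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def classify(self, raw_html):
        tokens = tokenize(raw_html)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training pages, far too few for real use.
model = NaiveBayes()
model.train("<h1>Software Engineer</h1><p>Apply now. Salary and requirements listed.</p>", "JOB")
model.train("<p>We are hiring a nurse. Apply today. Requirements: experience. Salary negotiable.</p>", "JOB")
model.train("<p>Our company history and news. Contact us for products.</p>", "NON_JOB")
model.train("<p>Read our blog about products, news and events.</p>", "NON_JOB")
```

A real deployment would need thousands of labelled pages and a proper accuracy and speed comparison between the candidate tools; this sketch only shows the shape of the input (RAW HTML) and output (one of two labels).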
JOB POSTING PAGES in RAW HTML FORMAT will then be pushed into a CDH 5 machine running Cloudera Express and Cloudera Search (single node) and indexed, so that web users can access the Solr index through a very simple interface (see [url removed, login to view]). The CDH 5 machine should be configured on a single node in a way that permits very fast search and management of about 10 million RAW HTML pages.
Web user query: the query should be very simple and very fast. After a query, web users should see a list like the one below and be able to follow the links to the original job posting websites:
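As a rough illustration of the indexing step, the sketch below builds a request for Solr's JSON update handler. The collection name `jobs`, the field names `url` and `raw_html`, and the host and port are assumptions made for the example; the actual schema and endpoint would depend on how Cloudera Search is configured.

```python
import json
from urllib.parse import urlencode


def build_update_request(pages, base_url="http://localhost:8983/solr", collection="jobs"):
    """Build (url, body, headers) for a Solr JSON update request.

    `pages` is a list of dicts with hypothetical keys "url" and "raw_html".
    The page URL doubles as the document id so re-crawled pages overwrite
    their earlier versions instead of creating duplicates.
    """
    docs = [
        {"id": page["url"], "url": page["url"], "raw_html": page["raw_html"]}
        for page in pages
    ]
    # commit=true makes the documents searchable as soon as the request ends.
    url = f"{base_url}/{collection}/update?{urlencode({'commit': 'true'})}"
    body = json.dumps(docs).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return url, body, headers


# Hypothetical page, used only to show the request shape.
url, body, headers = build_update_request(
    [{"url": "http://example.com/jobs/1", "raw_html": "<h1>Nurse wanted</h1>"}]
)
```

The request could then be sent with any HTTP client as a POST. For around 10 million pages, sending documents in batches and committing less often than once per request would likely be preferable to the per-request commit shown here.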
[url removed, login to view]
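The query side described above could look roughly like the following sketch: build a Solr `select` URL from the user's keywords, then turn the JSON response into a list of links back to the original job posting sites. The collection name `jobs` and the fields `title` and `url` are hypothetical, and the sample response is hand-written for illustration, not real output.

```python
import json
from urllib.parse import urlencode


def build_search_url(keywords, rows=10, base_url="http://localhost:8983/solr", collection="jobs"):
    """Build a Solr select URL that returns only the fields the result list needs."""
    params = {"q": keywords, "rows": rows, "fl": "url,title", "wt": "json"}
    return f"{base_url}/{collection}/select?{urlencode(params)}"


def extract_links(response_text):
    """Return (title, url) pairs from a Solr JSON response body."""
    data = json.loads(response_text)
    return [
        (doc.get("title", doc["url"]), doc["url"])  # fall back to the URL if no title
        for doc in data["response"]["docs"]
    ]


search_url = build_search_url("java developer")

# Hand-written sample in the shape of a Solr JSON response.
sample_response = json.dumps({
    "responseHeader": {"status": 0},
    "response": {
        "numFound": 2,
        "start": 0,
        "docs": [
            {"title": "Java Developer", "url": "http://example.com/jobs/1"},
            {"url": "http://example.org/careers/42"},
        ],
    },
})
links = extract_links(sample_response)
```

Requesting only the `url` and `title` fields keeps responses small, which matters for query speed when the index stores full RAW HTML pages.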
This is a very brief summary of the project.
PLEASE CHECK THE DETAILED VERSION OF THE PROJECT ATTACHED AND LET ME KNOW YOUR COMMENTS AND OFFER.
12 freelancers are bidding on average $3858 for this job
Thanks for inviting me to your project, but it has been a long time since I used Nutch. I guess I could get back to it, but I would need some extra time. Regards, Sergio.