Web crawler needed. LAMP environment. Simple quick job for experienced programmer.
$100-500 USD
In Progress
Posted over 12 years ago
Paid on delivery
We require a crawler to crawl real estate related websites in the UK. We want to detect when the content of the listings we are tracking changes and, when a listing meets certain criteria, display the details of that property on our website.
On a daily basis we want to crawl real estate websites that list properties for sale all over the country. This particular project will involve crawling only 3 websites to begin with, so that we can test your solution. If it works as required, we intend to scale it up to monitor around 5,000 sites in total by creating a new, more comprehensive project.
## Deliverables
The types of sites the crawler will need to crawl can be viewed through this portal [login to view URL]: type any UK town to see a list of results, click through to a result on the Globrix website, then click on the listing to go through to the website that was crawled to provide the listing. All of these sites are OK with crawling taking place on their website and are mainly simple HTML. Some of them have an API. To see more about the technical side of Globrix, go to <[login to view URL]>
As new properties are added to the sites being monitored, the crawler will need to find them and start monitoring them. Likewise, if properties are removed from a site, they need to be removed from our system.
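As a rough illustration of the add/remove behaviour described above, here is a minimal Python sketch. The function name `reconcile` and the idea of identifying each listing by a stable ID (for example its URL) are assumptions for illustration only, not a required design.

```python
def reconcile(tracked_ids, crawled_ids):
    """Compare the listings we already track with those found in today's crawl.

    Both arguments are iterables of listing identifiers (hypothetical:
    e.g. the listing URL on the agent's website).
    """
    tracked = set(tracked_ids)
    crawled = set(crawled_ids)
    return {
        # Found today but not yet in our system: start monitoring these.
        "new": sorted(crawled - tracked),
        # In our system but no longer on the site: remove these.
        "removed": sorted(tracked - crawled),
        # Present in both: check these for changes.
        "still_listed": sorted(tracked & crawled),
    }
```

Running this once per site per day would give the three groups the crawler has to act on.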
The following monitoring is required:
* Time since listing (difference between date of initial listing and today)
* Price changes (up or down)
* Does property still exist?
* Have any of the description details changed?
* Flag whether certain keywords exist in the property description
If any of the above changes, our database needs to be updated.
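The monitoring checks listed above could be sketched roughly as follows. This is only an illustration under assumed names: `Snapshot`, `days_listed`, `keyword_flags` and `detect_changes` are hypothetical, and prices are assumed to be stored as whole pounds.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Snapshot:
    """What one crawl saw for a single listing (hypothetical structure)."""
    price: int
    description: str


def days_listed(first_listed: date, today: date) -> int:
    # Time since listing: difference between the initial listing date and today.
    return (today - first_listed).days


def keyword_flags(description: str, keywords):
    # Case-insensitive substring check for each keyword of interest.
    text = description.lower()
    return {kw: kw.lower() in text for kw in keywords}


def detect_changes(previous: Snapshot, current: Snapshot):
    return {
        # Positive means the price went up, negative means it went down.
        "price_change": current.price - previous.price,
        "description_changed": current.description != previous.description,
    }
```

Whether a property still exists is not covered here; it falls out of comparing the set of listings found in each crawl with the set already tracked.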
Only properties that fit our criteria are to be displayed on our website - the rest stay in our database and continue to be monitored. For properties that fit the set criteria, we want to save the following information to our database:
* Property ID [new unique identifier created in our DB for each property]
* Source of property [which website did the property come from?]
* Address
* Description
* Images
* Date of first listing
* Price
* Price change since last crawl
* Features of the property (e.g. 2 bedroom, has a garage etc)
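To make the field list above concrete, here is one possible table layout, sketched in Python with the built-in `sqlite3` module purely as a stand-in for the database in our LAMP environment. The table name, column names and the choice to store images and features as JSON text are all assumptions for illustration.

```python
import json
import sqlite3

# Hypothetical schema mirroring the fields listed above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS properties (
    property_id   INTEGER PRIMARY KEY AUTOINCREMENT, -- unique ID created in our DB
    source_site   TEXT NOT NULL,                     -- website the property came from
    address       TEXT NOT NULL,
    description   TEXT,
    images        TEXT,                              -- JSON list of image URLs
    first_listed  TEXT NOT NULL,                     -- ISO date of first listing
    price         INTEGER,
    price_change  INTEGER DEFAULT 0,                 -- change since last crawl
    features      TEXT                               -- JSON list, e.g. ["2 bedroom", "garage"]
)
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_property(conn, source_site, address, description,
                  images, first_listed, price, features):
    """Insert one property and return its newly created property_id."""
    cur = conn.execute(
        "INSERT INTO properties (source_site, address, description, images,"
        " first_listed, price, features) VALUES (?, ?, ?, ?, ?, ?, ?)",
        (source_site, address, description, json.dumps(images),
         first_listed, price, json.dumps(features)),
    )
    conn.commit()
    return cur.lastrowid
```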
These properties are to be displayed on our website with the following additional information:
* Postcode of the property (easily obtainable using the street name displayed on the real estate agent's website - there are external websites that provide postcodes if you send an address)
* Rental valuation - available by posting property details to an external website via its API
* Approximate value - available by posting property details to an external website via its API
* Google map - available by posting property details to Google via its API
Further information:
The crawler:
* Will not crawl portals that aggregate lots of properties from individual real estate agents. It will only crawl the individual websites of real estate agents.
* Will not crawl pages on the agent's site that are unrelated to property listings.
* Will remove exact duplicates.
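The three rules above might look something like this in Python. Everything named here is a placeholder: `PORTAL_DOMAINS` and `LISTING_PATH_HINTS` are hypothetical values that would need to be set per site, and the sketch assumes two listings count as exact duplicates when address, price and description match after collapsing whitespace and case.

```python
import hashlib
from urllib.parse import urlparse

# Hypothetical values; the real lists would be configured per site.
PORTAL_DOMAINS = {"portal.example.com"}
LISTING_PATH_HINTS = ("/property", "/for-sale")


def should_crawl(url):
    """Skip aggregator portals and pages that do not look like listings."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    if host in PORTAL_DOMAINS:
        return False
    path = parts.path.lower()
    return any(hint in path for hint in LISTING_PATH_HINTS)


def fingerprint(address, price, description):
    """Hash of the normalised fields, used to spot exact duplicates."""
    fields = (" ".join(str(f).lower().split()) for f in (address, price, description))
    return hashlib.sha256("|".join(fields).encode("utf-8")).hexdigest()


def drop_exact_duplicates(listings):
    """Keep the first occurrence of each listing, preserving order."""
    seen = set()
    unique = []
    for listing in listings:
        key = fingerprint(listing["address"], listing["price"], listing["description"])
        if key not in seen:
            seen.add(key)
            unique.append(listing)
    return unique
```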
If you know what you are doing, the crawler is a couple of days' work using a scripting language like Perl or Python. If you can do this in Java and you feel this is the best way for us to go, then please tell me more about your reasoning and I can make a decision.