Simple web crawler + MySQL database

$100-500 USD

Cancelled
Posted over 17 years ago

Paid on delivery
We need an experienced MySQL 4.1.14 & PHP 4.3.9 developer to write a basic web crawler that uses a MySQL DB. We have a PHP script that parses web pages, but it must be changed to save data to MySQL.

PHP web crawler (see diagram)
- Written in PHP 4.3.9 using OOP, must be flexible & well documented, must return success or failure outcome & fail gracefully: doesn’t break if errors occur & returns error status

Running our PHP script
* List of URLs to be crawled created & passed to Queue each time script runs
* Date & time of each URL’s successful completion is recorded in DB

Queue
- FIFO Queue of URLs to be crawled. Rule for adding to Queue: URL has not been crawled before OR URL was last crawled over [60] days ago
- [60] day time frame must be flexible so it’s easy to change: NO hardcoding
- Each item must have status field: Empty status (not been touched), Pending status (currently processing), Failed status (processing failed). Only process Queue items w/ Empty or Failed status
- If script succeeds, remove URL & place in Archives. If script fails, URL stays in Queue to be re-crawled later. Our script saves current state in temp files on failure so re-crawling can resume using same state, but it needs to be changed to save state to DB

Scheduler
- Use Linux crond daemon to run web crawler every [30] seconds (NO PHP daemon). [30] second time frame must be flexible so it’s easy to change: NO hardcoding
- Scheduler to be optimized to run max of [2] concurrent sessions of our PHP script. Max concurrent sessions must be flexible so it’s easy to change: NO hardcoding
- Scheduler starts new crawling session IF: we haven't reached max concurrent sessions AND Queue has URL with Empty status

Build MySQL 4.1.14 DB
- DATA MODEL DESIGN MUST BE APPROVED BEFORE ANY DB WORK IS STARTED. Data model will incl. these entities:
* Queue
* Archive of successfully completed items
* Meta data we parse out when crawling
* Crawling session state (only used when crawling fails & current state is saved)

## Deliverables

1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.

2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):
a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment: Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.
b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.

3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).

## Platform

LAMP environment
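
The Queue and Scheduler rules above can be sketched in a few lines. This is only an illustration of the logic, not the requested deliverable: the posting calls for PHP 4.3.9 with a MySQL 4.1.14 backend, while the sketch below uses Python with in-memory structures. All names (`CONFIG`, `CrawlQueue`, `claim_next`, `may_start_session`) are hypothetical, and the check that skips URLs already sitting in the Queue is an added assumption not stated in the posting.

```python
from collections import deque
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Deque, Dict, Optional

# Hypothetical settings: the posting only requires that these values
# are easy to change and not hardcoded in the crawler logic.
CONFIG = {"recrawl_days": 60, "max_sessions": 2}

# The three statuses named in the posting.
EMPTY, PENDING, FAILED = "", "pending", "failed"


@dataclass
class QueueItem:
    url: str
    status: str = EMPTY


@dataclass
class CrawlQueue:
    items: Deque[QueueItem] = field(default_factory=deque)      # FIFO order
    archive: Dict[str, datetime] = field(default_factory=dict)  # url -> last success

    def should_enqueue(self, url: str, now: datetime, recrawl_days: int) -> bool:
        """True if the URL was never crawled or was last crawled over recrawl_days ago."""
        last = self.archive.get(url)
        return last is None or now - last > timedelta(days=recrawl_days)

    def enqueue(self, url: str, now: datetime, recrawl_days: int) -> bool:
        # Assumption: a URL already waiting in the Queue is not added twice.
        already_queued = any(item.url == url for item in self.items)
        if already_queued or not self.should_enqueue(url, now, recrawl_days):
            return False
        self.items.append(QueueItem(url))
        return True

    def claim_next(self) -> Optional[QueueItem]:
        """Mark the oldest Empty or Failed item as Pending and return it."""
        for item in self.items:
            if item.status in (EMPTY, FAILED):
                item.status = PENDING
                return item
        return None

    def finish(self, item: QueueItem, ok: bool, now: datetime) -> None:
        """On success move the URL to the archive; on failure keep it queued."""
        if ok:
            self.items.remove(item)
            self.archive[item.url] = now
        else:
            item.status = FAILED


def may_start_session(queue: CrawlQueue, running: int, max_sessions: int) -> bool:
    """Scheduler rule: below the session cap AND the Queue has an Empty item."""
    has_empty = any(item.status == EMPTY for item in queue.items)
    return running < max_sessions and has_empty
```

In a real implementation the Pending transition would need to be atomic at the database level so that two concurrent sessions cannot claim the same URL. Note also that cron schedules jobs with one-minute granularity, so a [30] second interval would need a workaround such as a wrapper that runs the crawler, sleeps, and runs it again.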
Project ID: 2820275

About the project

5 proposals
Remote project
Active 17 yrs ago

5 freelancers are bidding on average $183 USD for this job
See private message.
$212.50 USD in 7 days
4.7 (229 reviews)
6.2

See private message.
$314.50 USD in 7 days
5.0 (26 reviews)
3.9

See private message.
$114.75 USD in 7 days
3.9 (9 reviews)
3.4

See private message.
$102 USD in 7 days
4.9 (12 reviews)
2.4

See private message.
$108.80 USD in 7 days
4.3 (11 reviews)
2.4

See private message.
$170 USD in 7 days
3.5 (2 reviews)
0.7

About the client

4.8
46
Member since Dec 21, 2006

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)