Attached is an outdated script that was used to scrape two sites. The sites have not changed dramatically, but they have changed enough to break the script. You may reuse the existing script or start from scratch, whichever you find easier.
Below is a description of what is needed. The existing script does not have the scheduling function described below, so it will have to be added.
The following two example sites are what the script was written to scrape.
[login to view URL]
[login to view URL]
The script must be dynamic and able to scrape all of the different universities. The URLs for the universities are already available in a MySQL database.
The scraper should collect the following fields and insert them into a MySQL database table:
Term
Department
Course
Section
Title
Author
Publisher
ISBN
NewPrice
UsedPrice
After following the links above, you will see that browsing manually to all of the possible books requires using the pull-down menus.
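To make the target record concrete, here is a minimal sketch of one scraped row being stored with bound parameters. It is an illustration of the logic only, written in Python with an in-memory SQLite database standing in for MySQL so it runs without a server; the delivered script would target the platform listed below. The table name `books`, the all-text column types, and the sample values are assumptions, not part of the existing schema.

```python
import sqlite3

# Field names come from the list above; the table name "books" is hypothetical.
FIELDS = ["Term", "Department", "Course", "Section", "Title",
          "Author", "Publisher", "ISBN", "NewPrice", "UsedPrice"]


def create_table(conn):
    """Create the stand-in table with one text column per field."""
    cols = ", ".join(f"{name} TEXT" for name in FIELDS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS books ({cols})")


def insert_book(conn, row):
    """Insert one scraped row (a dict keyed by FIELDS) using bound parameters.

    Fields missing from the row are stored as NULL.
    """
    placeholders = ", ".join("?" for _ in FIELDS)
    values = [row.get(name) for name in FIELDS]
    conn.execute(
        f"INSERT INTO books ({', '.join(FIELDS)}) VALUES ({placeholders})",
        values,
    )


# In-memory SQLite is used here only so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
create_table(conn)

# Made-up sample values, for illustration only.
sample = {
    "Term": "Spring 2011", "Department": "MATH", "Course": "101",
    "Section": "001", "Title": "Example Title", "Author": "Example Author",
    "Publisher": "Example Press", "ISBN": "0000000000",
    "NewPrice": "100.00", "UsedPrice": "75.00",
}
insert_book(conn, sample)
conn.commit()
```

Binding the values rather than concatenating them into the SQL string keeps titles containing quotes or other special characters from breaking the insert.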
The script must also be able to stop and restart where it left off, because there is a lot of data and server errors may occur. A scheduling feature may be needed to help with this.
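One possible way to meet the stop-and-restart requirement is a checkpoint that records how far the run has progressed. The sketch below is an assumption about how this could work, not a description of the existing script: it is written in Python purely to show the logic, and the function names, the JSON checkpoint file, and the work-item labels are all hypothetical. It assumes the work items (for example, one per menu combination) can be listed in a stable order.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the index of the next item to scrape (0 if no checkpoint yet)."""
    if not os.path.exists(path):
        return 0
    with open(path) as fh:
        return json.load(fh)["next_index"]


def save_checkpoint(path, next_index):
    """Write the checkpoint to a temp file, then rename it into place."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump({"next_index": next_index}, fh)
    os.replace(tmp, path)


def run(work_items, scrape_one, path):
    """Scrape items in order, resuming from the saved checkpoint."""
    start = load_checkpoint(path)
    for i in range(start, len(work_items)):
        scrape_one(work_items[i])      # may raise on a server error
        save_checkpoint(path, i + 1)   # only advance after success


# Demo with a fake scraper that fails once to simulate a server error.
items = ["dept-A", "dept-B", "dept-C", "dept-D"]
done = []
fail_once = {"dept-C"}


def fake_scrape(item):
    if item in fail_once:
        fail_once.discard(item)
        raise RuntimeError("simulated server error")
    done.append(item)


checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
try:
    run(items, fake_scrape, checkpoint)   # stops at dept-C
except RuntimeError:
    pass
run(items, fake_scrape, checkpoint)       # resumes at dept-C
```

Because the checkpoint only advances after an item succeeds, a scheduler such as cron could simply rerun the script at intervals and each run would continue from the first unfinished item. In the delivered script the checkpoint could equally be stored in a MySQL table instead of a file.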
Please feel free to ask any questions before bidding.
## Deliverables
1) All deliverables will be considered "work made for hire" under U.S. Copyright law. Employer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the employer on the site per the worker's Worker Legal Agreement).
2) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
3) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):
a) For web sites or other server-side deliverables intended to only ever exist in one place in the Employer's environment--Deliverables must be installed by the Worker in ready-to-run condition in the Employer's environment.
b) For all others including desktop software or software the employer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this project.
* * *
This broadcast message was sent to all bidders on Wednesday Nov 24, 2010 10:23:31 PM:
The second website link was not added correctly. It should be [login to view URL]
## Platform
Apache, PHP 4, MySQL