Looking for a website copier that has the following features:
1) Exclude certain paths/extensions from being crawled.
2) Filter out JavaScript/certain text strings.
3) Dynamic renaming of pages
4) Ability to set the referer and browser type (user agent) on each page crawled.
5) Maximum level to crawl
6) Limit the crawl to the starting domain only, or allow it to visit and crawl outside domains.
7) Ability to rename file extensions, .html to .php for instance.
8) I will consider hacks of existing 3rd party solutions for this project.
9) Ability to set a start and end text to search for and only get the content between that start and end text.
10) Ability to specify new colors/formatting of content.
11) Basically, this project is for getting public domain content/articles from remote websites and making it my own.
12) Ability to create a new sitemap with just the content I am looking for.
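
To make a few of the requirements above less ambiguous for bidders, here is a rough sketch of one possible reading of features 1, 5, 6, 7 and 9, written in Python purely for illustration. Every function name, parameter, and default value below is a hypothetical placeholder, not a required design or language; bidders are free to propose something different.

```python
from urllib.parse import urlparse
import posixpath


def should_crawl(url, start_url, depth, max_depth=3,
                 excluded_paths=("/admin/",),
                 excluded_exts=(".pdf", ".zip"),
                 same_domain_only=True):
    """Features 1, 5, 6: decide whether a discovered URL should be fetched.

    All defaults are made-up examples; real values would come from config.
    """
    # Feature 5: stop once the maximum crawl level is exceeded.
    if depth > max_depth:
        return False
    parsed = urlparse(url)
    # Feature 6: optionally stay on the starting domain.
    if same_domain_only:
        if parsed.netloc.lower() != urlparse(start_url).netloc.lower():
            return False
    # Feature 1: skip excluded path prefixes and file extensions.
    path = parsed.path or "/"
    if any(path.startswith(prefix) for prefix in excluded_paths):
        return False
    ext = posixpath.splitext(path)[1].lower()
    return ext not in excluded_exts


def rename_extension(path, old=".html", new=".php"):
    """Feature 7: swap one file extension for another when saving a page."""
    root, ext = posixpath.splitext(path)
    return root + new if ext.lower() == old else path


def extract_between(html, start_marker, end_marker):
    """Feature 9: return only the text between the start and end markers.

    Returns None when either marker is missing.
    """
    start = html.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = html.find(end_marker, start)
    if end == -1:
        return None
    return html[start:end]
```

For example, `rename_extension("articles/a.html")` would save the page as `articles/a.php`, and `extract_between(page, "<!--start-->", "<!--end-->")` would keep only the article body between those two (assumed) markers.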
## Deliverables
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):
a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment--Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.
b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.
3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).
## Platform
The program must install all needed libraries/3rd-party programs on a Unix system, mainly Linux.