Here is an example of a page from which I need to collect the data: <[login to view URL]> (there should be no spaces in this link; if there are, remove them by hand). I need to collect a record for each individual listed there: the top-level information (name, certification, experience) and the information from the "personal information" drop-down. Note that the server returns only 10 individual records per page, so I also need the information from the remaining pages (see the link "2" at the bottom of that page, which leads to two more individuals).
All fields for each individual should be parsed and recorded in a well-formatted CSV file (one line per individual).
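To illustrate what "one line per individual" could look like, here is a minimal Python sketch using the standard `csv` module. The column names in `FIELDNAMES` (including `record_id`) and the helper names are assumptions made for illustration only; the real columns depend on the fields actually present on the pages.

```python
import csv
import io

# Hypothetical column names; the real ones depend on the fields found on each page.
FIELDNAMES = ["record_id", "name", "certification", "experience", "personal_information"]


def write_records(records, out):
    """Write a header row and then one CSV row per individual to the text stream `out`."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    for record in records:
        # Fields missing from a record become empty cells instead of raising an error.
        writer.writerow({key: record.get(key, "") for key in FIELDNAMES})


def records_to_csv_text(records):
    """Return the CSV output as a string (convenient for a quick inspection)."""
    buffer = io.StringIO(newline="")
    write_records(records, buffer)
    return buffer.getvalue()
```

Writing through `csv.DictWriter` means commas and quotes inside a field are escaped automatically, so a name such as "Doe, Jane" stays in a single column. When writing to a real file, opening it with `newline=""` avoids doubled line endings on some platforms.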
The crawler should behave in a human-like fashion, with a delay of a few seconds between page requests.
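One possible way to pace the requests is sketched below in Python. The function name `crawl_politely`, the 2 to 5 second range, and the injected `fetch` callable are all assumptions for illustration; any HTTP client could be plugged in as `fetch`.

```python
import random
import time


def crawl_politely(urls, fetch, min_delay=2.0, max_delay=5.0, sleep=time.sleep):
    """Yield (url, fetch(url)) pairs, pausing a random interval between requests.

    `fetch` is any callable that takes a URL and returns the page content.
    `sleep` is injectable so the pacing can be checked without actually waiting.
    """
    for index, url in enumerate(urls):
        if index > 0:
            # Randomized pause (assumed range: 2 to 5 seconds) before every request but the first.
            sleep(random.uniform(min_delay, max_delay))
        yield url, fetch(url)
```

Randomizing the pause, rather than sleeping a fixed interval, makes the request pattern less mechanical. At a few seconds per request, the full set of pages would take many hours to crawl, so the job should be able to resume from where it stopped.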
In addition, I need to collect the picture of each individual, each in a separate file, with a name that clearly connects the individual to a record in the CSV file.
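A simple way to tie each picture to its CSV line is to name the file after a unique record identifier that is also stored as a column in the CSV. The Python sketch below assumes such an identifier exists (the `record_id` values and the `image_filename` helper are hypothetical) and that `.jpg` is an acceptable fallback when the image URL has no extension.

```python
import os
import re
from urllib.parse import urlparse


def image_filename(record_id, image_url, default_ext=".jpg"):
    """Build a file name from the record's identifier and the image URL's extension."""
    # Take the extension from the URL path only, ignoring any query string.
    path = urlparse(image_url).path
    ext = os.path.splitext(path)[1].lower() or default_ext
    # Replace characters that are unsafe in file names with underscores.
    safe_id = re.sub(r"[^A-Za-z0-9_-]+", "_", str(record_id)).strip("_")
    return f"{safe_id}{ext}"
```

For example, a record with the hypothetical identifier `p0001-r03` and an image at `https://example.com/photos/abc.PNG?size=large` would be saved as `p0001-r03.png`, and the same identifier in the CSV links the two.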
There are 11499 pages that are very similar to the sample page I reference here. I will provide the list of pages.
The successful project will deliver:
1. The well-formatted CSV file with a line for every individual record on every page.
2. A folder with image files, each corresponding to a line-item record in the CSV file, via the image file name.
I would also like to retain the code for the crawler/parser and the rights to it, but I do not particularly care which language it is written in.