Need someone to write a script to pull data out of here: [login to view URL]. This is a public database, but it has a really crappy design, probably because they don't want people data mining it. For example, 545,309 labor condition applications were filed in FY 2015, but the system will not display more than 1,000 results per search. This means the script that scrapes the database through this web interface must:
1. Use a key of employer names that I can provide from a separate Excel spreadsheet that the Department of Labor already publishes (but that is missing the data I want)
2. Enter the employer name in the form, along with the start and end dates for certification, then press Search
3. The results page includes a small JPEG that says "HTML"; clicking it generates an HTML page. I need the script to click one of these "HTML" images to generate that page
4. I need the script to scrape the signatory's name, phone number and e-mail address and save them in a data file (XML, Excel, MDB, doesn't matter), in the same record as the employer's name
5. I need the script to do this probably 150,000 times or so, but it would be nice if I could customize it to run again in the future with different "certificate date" ranges.
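A minimal Python sketch of steps 1 to 4, using only the standard library. It assumes the DOL spreadsheet has been saved as CSV with an `EMPLOYER_NAME` column, that the search form can be submitted as ordinary form fields (the names `employer_name`, `cert_start_date` and `cert_end_date` are hypothetical), that each "HTML" image sits inside a plain `<a href>` link, and that the generated page lists the signatory's name, phone and e-mail after a "Signatory" label. All of these need to be checked against the real pages; if the "HTML" image runs JavaScript instead of following a link, a browser-automation tool such as Selenium would be needed for step 3.

```python
import csv
import io
import re
from html import unescape
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


def load_employer_names(csv_text: str, column: str = "EMPLOYER_NAME") -> list[str]:
    """Step 1: read the employer-name key from the DOL spreadsheet saved as CSV."""
    seen, names = set(), []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = (row.get(column) or "").strip()
        if name and name.upper() not in seen:  # same employer can appear on many rows
            seen.add(name.upper())
            names.append(name)
    return names


def build_search_query(employer: str, cert_start: str, cert_end: str) -> str:
    """Step 2: URL-encode one search. All three field names are hypothetical."""
    return urlencode({
        "employer_name": employer,
        "cert_start_date": cert_start,  # date range is a parameter, per step 5
        "cert_end_date": cert_end,
    })


class HtmlLinkFinder(HTMLParser):
    """Step 3: collect the hrefs of <a> tags that wrap an <img> mentioning "html"."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self._href = None  # href of the <a> we are currently inside, if any
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._href = attrs.get("href")
        elif tag == "img" and self._href:
            label = f"{attrs.get('src') or ''} {attrs.get('alt') or ''}".lower()
            if "html" in label:
                self.links.append(urljoin(self.base_url, self._href))

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None


EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")


def page_text(detail_html: str) -> str:
    """Drop tags, decode entities and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", detail_html)
    return re.sub(r"\s+", " ", unescape(text)).strip()


def extract_signatory(detail_html: str, section_label: str = "Signatory") -> dict:
    """Step 4: find name, phone and e-mail after a (hypothetical) section label."""
    text = page_text(detail_html)
    start = text.find(section_label)
    section = text[start:] if start >= 0 else text
    name = re.search(r"Name\s*:?\s*([A-Za-z.,' -]+?)\s*(?=Phone|E-?mail|$)", section)
    phone = PHONE_RE.search(section)
    email = EMAIL_RE.search(section)
    return {
        "signatory_name": name.group(1) if name else "",
        "signatory_phone": phone.group(0) if phone else "",
        "signatory_email": email.group(0) if email else "",
    }
```

The HTTP requests are deliberately left out so the parsing can be tested offline: each lookup would send `build_search_query(...)` to the form's endpoint, feed the results page to `HtmlLinkFinder`, fetch the first collected link and pass that page to `extract_signatory`.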
What I want is a data file with the company name, signatory name, signatory phone number and signatory e-mail address. What I have is a data file of company names and this crap web interface, which makes it almost impossible to get the data out unless you sit there and enter one company name at a time by hand, which would take 500 years.
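The desired output is a flat file with one row per company and four columns. A minimal sketch, assuming CSV (which Excel opens directly) is acceptable among the "XML, Excel, MDB, doesn't matter" options; the column names are illustrative, not taken from any DOL file:

```python
import csv
from typing import Iterable, TextIO

# Illustrative column names for the four requested fields.
FIELDS = ["company_name", "signatory_name", "signatory_phone", "signatory_email"]


def write_records(records: Iterable[dict], out: TextIO) -> int:
    """Write one row per company; missing fields become empty cells. Returns rows written."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for rec in records:
        writer.writerow({key: rec.get(key, "") for key in FIELDS})
        count += 1
    return count
```

To write a real file, open it with `open("signatories.csv", "w", newline="", encoding="utf-8")` and pass the handle as `out`; `newline=""` is what the `csv` module expects so rows are not separated by blank lines on Windows.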
The developer will probably have to test the script, because I don't know how fast the DOL UI will respond to queries, whether the script can run multiple queries at the same time, etc. I don't think DOL is sophisticated enough to block an IP address that makes thousands of requests, but I'm not sure. The script will have to be tuned to send queries fast enough to finish the scrape before I am an old man, but slowly enough not to freak out DOL's server.
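One way to balance "fast enough" against "slow enough" is a small pool of worker threads sharing a single throttle, so that however many threads run, lookups start at most once per `min_interval` seconds, plus a few retries with exponential backoff when a lookup fails. A minimal sketch, where `fetch_one` is a hypothetical callable that performs steps 2 to 4 for one employer, and the default worker count, interval and retry count are placeholders to be tuned by testing against the live server:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


class Throttle:
    """Allow at most one lookup start per `min_interval` seconds across all threads."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_slot = 0.0  # monotonic time at which the next lookup may start

    def wait(self) -> None:
        with self._lock:
            slot = max(time.monotonic(), self._next_slot)
            self._next_slot = slot + self.min_interval  # reserve this slot
        # Sleep outside the lock so other threads can reserve later slots meanwhile.
        time.sleep(max(0.0, slot - time.monotonic()))


def scrape_all(employers: Iterable[str],
               fetch_one: Callable[[str], dict],
               workers: int = 4,
               min_interval: float = 1.0,
               retries: int = 3) -> list[dict]:
    """Run fetch_one for every employer with bounded concurrency and a global rate cap."""
    throttle = Throttle(min_interval)

    def task(employer: str) -> dict:
        for attempt in range(retries):
            throttle.wait()
            try:
                return fetch_one(employer)
            except Exception as exc:  # timeout, connection reset, bad page, etc.
                if attempt == retries - 1:
                    return {"employer": employer, "error": repr(exc)}
                time.sleep(2 ** attempt * min_interval)  # back off before retrying
        return {"employer": employer, "error": "no attempts made"}

    # pool.map keeps results in the same order as the input employers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, employers))
```

At one lookup per second, 150,000 lookups take about 150,000 s, or roughly 42 hours; halving `min_interval` halves that, if the server tolerates it. The throttle gates each lookup, not each HTTP request, so if one lookup makes two requests (search plus HTML page), the server sees about twice that rate.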
Hi, I have done many scraping projects in C# and Python. I have also worked with APIs for scraping. I have read the description and would like to discuss further.