We need to gather information on attorneys from 10 different websites. This will involve crawling the sites with a multi-threaded program that you develop, collecting all the records, and storing them in a MySQL database.
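As a rough illustration of the multi-threaded crawl, here is a minimal Python sketch built on the standard library's thread pool. The function names (fetch_html, crawl), the worker count, and the User-Agent string are assumptions made for illustration, not part of the spec, and a real crawler would also need rate limiting and retries.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable
from urllib.request import Request, urlopen


def fetch_html(url: str, timeout: float = 15.0) -> str:
    """Download one page. The User-Agent value is a placeholder (assumption)."""
    req = Request(url, headers={"User-Agent": "attorney-crawler-sketch/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(urls: Iterable[str],
          fetch: Callable[[str], str] = fetch_html,
          workers: int = 8) -> Dict[str, str]:
    """Fetch URLs concurrently. A URL whose fetch raises maps to ""."""
    pages: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch, u): u for u in urls}
        for fut in as_completed(futures):
            url = futures[fut]
            try:
                pages[url] = fut.result()
            except Exception:
                # Record the failure and keep going; a real job would log it.
                pages[url] = ""
    return pages
```

Passing the fetch function as a parameter keeps the pool logic testable without network access and lets each site plug in its own fetcher.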
Some of the fields we'd like collected and normalized include:
State Bar + Bar Admission Date (could be multiple)
Attorney ID (if present)
County (if present)
Email (if present)
Picture (if present)
Website (if present)
Description (if present)
Phone(s) (if present)
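One possible shape for a normalized record, sketched in Python. The class and field names are assumptions, as is the name field (not in the list above but presumably needed), and the phone normalizer assumes 10-digit US numbers.

```python
import re
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class BarAdmission:
    state: str                          # state bar, e.g. a two-letter code
    admitted_on: Optional[date] = None  # admission date, if the site lists it


@dataclass
class AttorneyRecord:
    name: str  # assumption: not in the field list, but needed to identify a record
    admissions: List[BarAdmission] = field(default_factory=list)  # could be multiple
    attorney_id: Optional[str] = None
    county: Optional[str] = None
    email: Optional[str] = None
    picture_url: Optional[str] = None
    website: Optional[str] = None
    description: Optional[str] = None
    phones: List[str] = field(default_factory=list)


def normalize_phone(raw: str) -> Optional[str]:
    """Return NNN-NNN-NNNN, or None if the input is not a 10-digit US number."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    if len(digits) != 10:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def normalize_email(raw: str) -> Optional[str]:
    """Trim and lowercase; return None if it does not look like an address."""
    cleaned = raw.strip().lower()
    if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", cleaned):
        return cleaned
    return None
```

Optional fields default to None or an empty list, which maps naturally onto nullable MySQL columns and child tables for admissions and phones.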
1) When a website URL is present but no email is listed, crawl the website and its subpages to find an email address. This could be done as phase 2, or preferably in phase 1 if you have a way to do it well.
2) Very few of the sites use a CAPTCHA. It would be helpful if you have worked with Depatcha before, but it is not a requirement.
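For item 1, a minimal Python sketch of the email lookup: pull addresses out of a page's HTML and collect same-site links to visit next. The function names and the regular expression are illustrative assumptions. The pattern is deliberately loose, so it can also match things like image file names containing "@", and it will not catch obfuscated addresses.

```python
import re
from html import unescape
from html.parser import HTMLParser
from typing import List, Set
from urllib.parse import urljoin, urlparse

# Loose pattern (assumption): good enough for a first pass, not RFC-complete.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(html: str) -> List[str]:
    """Return unique lowercased addresses in order of first appearance."""
    seen: Set[str] = set()
    found: List[str] = []
    for match in EMAIL_RE.findall(unescape(html)):
        email = match.lower()
        if email not in seen:
            seen.add(email)
            found.append(email)
    return found


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.hrefs.append(value)


def same_site_links(base_url: str, html: str) -> List[str]:
    """Return absolute http(s) links on the same host as base_url."""
    parser = _LinkCollector()
    parser.feed(html)
    host = urlparse(base_url).netloc
    links: List[str] = []
    for href in parser.hrefs:
        absolute = urljoin(base_url, href).split("#")[0]  # drop fragments
        parsed = urlparse(absolute)
        if (parsed.scheme in ("http", "https")
                and parsed.netloc == host
                and absolute not in links):
            links.append(absolute)
    return links
```

A crawler would call find_emails on the home page first, then walk the pages returned by same_site_links (contact and about pages are the usual candidates) with a depth or page-count limit.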
Part of this will also include crawling [login to view URL], and we'd like the following mapped out into separate tables as well:
* Nearby Cities - separate table mapping by city
* Related Practice Areas - separate table mapping by practice area
* Common Legal Issue? True or False
* Top County? True or False (find on home page)
* Top City? True or False (find on home page)
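One way the tables above might be laid out, sketched in Python with the standard library's sqlite3 module as a stand-in so the example runs without a server. Every table and column name here is an assumption, as is attaching the "Common Legal Issue" flag to practice areas. A MySQL version would need AUTO_INCREMENT ids, MySQL's own upsert syntax, and a MySQL driver.

```python
import sqlite3

# Assumed layout: one table per entity, plus link tables for the mappings.
SCHEMA = """
CREATE TABLE city (
    id INTEGER PRIMARY KEY,
    name VARCHAR(255) NOT NULL UNIQUE,  -- real data may need state in the key
    is_top_city BOOLEAN NOT NULL DEFAULT 0
);
CREATE TABLE county (
    id INTEGER PRIMARY KEY,
    name VARCHAR(255) NOT NULL UNIQUE,
    is_top_county BOOLEAN NOT NULL DEFAULT 0
);
CREATE TABLE practice_area (
    id INTEGER PRIMARY KEY,
    name VARCHAR(255) NOT NULL UNIQUE,
    is_common_legal_issue BOOLEAN NOT NULL DEFAULT 0
);
CREATE TABLE nearby_city (
    city_id INTEGER NOT NULL REFERENCES city(id),
    nearby_city_id INTEGER NOT NULL REFERENCES city(id),
    PRIMARY KEY (city_id, nearby_city_id)
);
CREATE TABLE related_practice_area (
    practice_area_id INTEGER NOT NULL REFERENCES practice_area(id),
    related_id INTEGER NOT NULL REFERENCES practice_area(id),
    PRIMARY KEY (practice_area_id, related_id)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)


def upsert_city(conn: sqlite3.Connection, name: str, is_top: bool = False) -> int:
    """Insert the city if it is new and return its id either way."""
    conn.execute(
        "INSERT OR IGNORE INTO city (name, is_top_city) VALUES (?, ?)",
        (name, int(is_top)),
    )
    row = conn.execute("SELECT id FROM city WHERE name = ?", (name,)).fetchone()
    return row[0]


def link_nearby(conn: sqlite3.Connection, city: str, nearby: str) -> None:
    """Record that `nearby` is listed as a nearby city on `city`'s page."""
    a = upsert_city(conn, city)
    b = upsert_city(conn, nearby)
    conn.execute("INSERT OR IGNORE INTO nearby_city VALUES (?, ?)", (a, b))
```

The composite primary keys on the link tables make repeated crawls idempotent: re-inserting the same pair is ignored rather than duplicated.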
I will relay the exact websites we need crawled in private so you can review and bid accordingly. I have already mapped out how to do it for 95% of them. We just need someone to write the code to scrape and parse the sites programmatically.
Hi, dear client. I am an expert data miner and web developer. I have experience with data mining, site crawling, and automated uploading. I can complete your job quickly and accurately. Thanks for your time.
Hello, I can crawl 10 state bars. Please see my reviews. I can deliver the work with 100% accuracy. Please send me a message for further discussion. Thanks.