Hi Guys,
I'm looking for someone to work on a Scrapy project for me. I need a simple generic crawler that will start at a given domain and, on each page (only within that domain), extract:
-> the anchor text (anything between the <a> and </a> tags)
-> and the corresponding link
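To make the extraction step concrete, here is a minimal sketch that uses only Python's standard library (`html.parser`) instead of Scrapy, so it runs standalone. `AnchorParser`, `extract_links` and `is_in_domain` are hypothetical names. `extract_links` returns every link on a page with its anchor text, and `is_in_domain` decides which links stay inside the starting domain and should be followed. Subdomains count as inside, which is an assumption that mirrors how Scrapy's `allowed_domains` behaves. In a real Scrapy spider, `scrapy.linkextractors.LinkExtractor` (with `allow_domains`) covers similar ground.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AnchorParser(HTMLParser):
    """Collect (absolute_url, anchor_text) pairs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None  # absolute href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") if tag == "a" else None
        if href:
            self._href = urljoin(self.base_url, href)
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace so multi-line anchors become one string.
            anchor = " ".join("".join(self._text).split())
            self.links.append((self._href, anchor))
            self._href = None


def extract_links(html, page_url):
    """Return every (url, anchor_text) pair found on one page."""
    parser = AnchorParser(page_url)
    parser.feed(html)
    parser.close()
    return parser.links


def is_in_domain(url, domain):
    """True if the URL's host is `domain` itself or a subdomain of it."""
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)
```

Text inside nested tags (e.g. `<a href="/x">About <b>us</b></a>`) is included in the anchor, which matches "anything between" the opening and closing `<a>` tags.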
The crawler needs to be able to pull the domain to start crawling from out of a MySQL database, along with one other variable (the domain-id), which must be passed back as a value when the results are saved to a database.
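A minimal sketch of the "pull the start domain plus its domain-id" step, written against the generic Python DB-API (`cursor()`, `execute()`, `fetchall()`), so the same function should work with a MySQL driver such as MySQLdb/mysqlclient. The `domains(id, url)` table and the `load_crawl_jobs` name are assumptions for illustration; the demo uses an in-memory `sqlite3` database as a stand-in for MySQL.

```python
import sqlite3
from urllib.parse import urlparse


def load_crawl_jobs(conn):
    """Read start URLs plus their domain-id from a hypothetical
    `domains(id, url)` table.

    Returns (domain_id, start_url, host) tuples: host is what the crawl is
    restricted to, and domain_id is carried through to every saved result.
    """
    cur = conn.cursor()
    cur.execute("SELECT id, url FROM domains ORDER BY id")
    return [(domain_id, url, urlparse(url).hostname or "")
            for domain_id, url in cur.fetchall()]


if __name__ == "__main__":
    # Demo only: an in-memory sqlite3 database stands in for MySQL.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE domains (id INTEGER PRIMARY KEY, url TEXT)")
    demo.execute("INSERT INTO domains (id, url) VALUES (1, 'http://example.com/')")
    print(load_crawl_jobs(demo))  # [(1, 'http://example.com/', 'example.com')]
```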
It must allow more than one spider to run at the same time, since I'll have it on a cron job. It should work something like this:
START SCRIPT
CONNECT TO DB
SELECT FROM TABLE
WHILE(TRUE)
GET URL PLUS CORRESPONDING DOMAIN-ID VARIABLE FROM TABLE
START NEW SPIDER
LOAD URLS
EXTRACT ALL URLS AND ANCHORS FOUND ON EACH PAGE
SAVE RESULT TO DB (INSERT INTO %s (myurl, myanchor, urlid) VALUES (url, anchor, domain-id))
LOOP
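The loop above can be sketched in plain Python as a breadth-first crawl per domain, with a thread pool so several domains are crawled at once. This is a hedged illustration, not Scrapy code: `crawl_domain`, `crawl_all`, `max_pages` and `max_workers` are hypothetical, and `fetch(url)` and `extract(html, url)` are passed in (in practice an HTTP fetch plus a link extractor) so the logic can be checked without network access. With Scrapy itself, scheduling, URL de-duplication and concurrency are handled by the framework, and `CrawlerProcess` can run several spiders in one process by calling `crawl()` once per spider (keyword arguments such as a start URL and domain-id are passed to the spider) before `start()`.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


def in_domain(url, domain):
    """True if the URL's host is `domain` or a subdomain of it."""
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)


def crawl_domain(domain_id, start_url, domain, fetch, extract, max_pages=500):
    """Breadth-first crawl of one domain.

    Returns (url, anchor, domain_id) rows for every link found on every
    fetched page; only in-domain links are queued for fetching.
    """
    rows = []
    seen = {start_url}
    queue = deque([start_url])
    pages = 0
    while queue and pages < max_pages:  # max_pages: hypothetical safety cap
        page_url = queue.popleft()
        pages += 1
        html = fetch(page_url)
        if html is None:  # treat a failed fetch as an empty page
            continue
        for url, anchor in extract(html, page_url):
            rows.append((url, anchor, domain_id))
            # Follow only links that stay inside the starting domain.
            if in_domain(url, domain) and url not in seen:
                seen.add(url)
                queue.append(url)
    return rows


def crawl_all(jobs, fetch, extract, max_workers=4):
    """Crawl each (domain_id, start_url, domain) job concurrently.

    Returns {domain_id: rows}.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(crawl_domain, domain_id, url, domain, fetch, extract): domain_id
            for domain_id, url, domain in jobs
        }
        for future, domain_id in futures.items():
            results[domain_id] = future.result()
    return results
```

Recording every link but following only in-domain ones is one reading of "on each page (only within that domain)": pages outside the domain are never fetched, but links that point out of it are still saved.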
When each spider is done crawling, I need it to update another table to say it's finished:
UPDATE crawldone WHERE id = %s (domain-id)
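The two database writes (saving results, then flagging the domain as finished) could look like the sketch below. `sqlite3` again stands in for MySQL; MySQLdb/mysqlclient uses `%s` placeholders where sqlite3 uses `?`. The `myurl`, `myanchor` and `urlid` columns come from the post, but the `results` table name, the `crawldone(id, finished)` layout and both function names are assumptions. A table name, like the first `%s` in the pseudocode, cannot be passed as a query parameter; it has to be fixed in the SQL string or checked against a whitelist. In a Scrapy spider, `mark_crawl_done` would naturally be called from the spider's `closed(reason)` method, which runs when that spider's crawl ends.

```python
import sqlite3


def save_results(conn, rows):
    """Insert (url, anchor, domain_id) rows into a hypothetical `results` table."""
    cur = conn.cursor()
    # sqlite3 uses "?" placeholders; MySQLdb/mysqlclient expects "%s" instead.
    cur.executemany(
        "INSERT INTO results (myurl, myanchor, urlid) VALUES (?, ?, ?)", rows
    )
    conn.commit()


def mark_crawl_done(conn, domain_id):
    """Flag one domain as finished in a hypothetical `crawldone(id, finished)` table.

    Returns the number of rows the UPDATE affected.
    """
    cur = conn.cursor()
    cur.execute("UPDATE crawldone SET finished = 1 WHERE id = ?", (domain_id,))
    conn.commit()
    return cur.rowcount
```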
If you already have a Scrapy spider running and can modify it to do something similar, that's fine as well.
Hi,
Would you like me to develop a fully working demo for you in VB.NET 2010?
Before you ask me for that, you might like to check out one of my scrapers, attached herewith.
Eagerly waiting to hear from you.
Shamim Hossain
Hi,
I have successfully delivered many web scraping + MySQL projects using Scrapy in Python. I would build this as a reusable utility that you can also use for future data extraction. I can deliver this quickly; please let me know when to start.
Thanks