01. Script should use Scrapy so that it is fast.
02. I will use this on my Ubuntu PC.
03. Script MUST include logic so that if it is interrupted, it can resume from the point where it was interrupted.
04. Script should save content as .html files containing exactly one <p> </p> tag, with all content placed inside that tag.
05. Script should save each and every scraped link.
06. On rerun, the script should skip files that have already been scraped.
07. Once the script starts, it should begin scraping from the first link.
08. Once the script finds 10 consecutive already-scraped links, it should jump to the last link that was already scraped and resume scraping from the next one.
09. Script should create a folder named 1, and inside it two folders named Question and Answer.
10. Both folders should contain identically named HTML files with the same content, e.g. [login to view URL] should be present in both folders with the same content.
11. If there are any readable attachments, the script should also save their content into the same HTML file.
12. HTML file names would be [login to view URL], [login to view URL], [login to view URL], etc.
13. If a folder named 1 already exists with the Question and Answer folders inside, a new folder named 2 should be created and the files saved there.
Link to the target:
[login to view URL]
Budget is AUD 10 max.
I have experience building web crawlers with BeautifulSoup and have handled millions of records. I can also do this with Scrapy, but in my opinion BeautifulSoup is best for scraping and handling HTML content and placing it at the specified location.
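As a rough illustration of the BeautifulSoup approach, the single-`<p>` output format from point 04 could be produced like this; the input fragment and output markup here are placeholders, not the actual target pages:

```python
from bs4 import BeautifulSoup

def to_single_p_html(fragment):
    # Collapse all visible text in the scraped fragment into one string,
    # then emit a minimal page with exactly one <p> tag (point 04).
    text = BeautifulSoup(fragment, "html.parser").get_text(" ", strip=True)
    page = BeautifulSoup("<html><body><p></p></body></html>", "html.parser")
    page.p.string = text
    return str(page)
```

The same function could serve both the Question and Answer files, and attachment text (point 11) could simply be appended to `text` before it is placed in the tag.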
Hi,
I would love to work on this script for you. I have over 4 years of experience with Python and scripting in general, as well as experience with web scraping in Python. I've also done a lot of work with web applications, so I am well versed in HTML.
I've thought about your problem and I know how I would approach it.
Thanks,
Joe