We are looking for a crawler that visits every page of a website and flags external links pointing to expired domains.
The user should define the list of sites to crawl in a text file. The crawler should discover pages by following internal links, not depend on a sitemap. Only unique external domains should be logged, to avoid duplicate domain-availability lookups.
The user should also be able to define a list of domains to skip when checking availability, e.g. [login to view URL]; this blacklist should be a user-defined text file.
Results should be written to a CSV file listing the linking domain and the available domain.
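To make the deduplication, blacklist, and output requirements concrete, here is a minimal Python sketch of that part of the pipeline. It assumes link extraction happens elsewhere (e.g. with an HTML parser); the function names and the DNS-based availability check are illustrative assumptions, not part of the spec — a production version should confirm availability via WHOIS/RDAP rather than a failed DNS lookup.

```python
import csv
import socket
from urllib.parse import urlparse


def domain_of(url):
    """Extract the bare host from a URL, dropping any port and a leading 'www.'."""
    host = urlparse(url).netloc.lower().split(":")[0]
    return host[4:] if host.startswith("www.") else host


def unique_external_domains(page_domain, links, blacklist):
    """Yield each external domain at most once, skipping blacklisted domains."""
    seen = set()
    for url in links:
        d = domain_of(url)
        if d and d != page_domain and d not in blacklist and d not in seen:
            seen.add(d)
            yield d


def looks_unregistered(domain):
    """Crude availability proxy: the domain resolves to no DNS record at all.

    Registered-but-unresolving domains produce false positives here, so a
    real implementation should verify candidates with a WHOIS/RDAP query.
    """
    try:
        socket.gethostbyname(domain)
        return False
    except socket.gaierror:
        return True


def write_results(path, rows):
    """Write (linking_domain, available_domain) pairs to the result CSV."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["linking_domain", "available_domain"])
        writer.writerows(rows)
```

Keeping the `seen` set per run (rather than per page) is what prevents the duplicate availability lookups the spec mentions.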
5 freelancers are bidding on average $65 for this job
Hi sir, I'm a Python developer. I read the project description, and I've actually built a similar site crawler before, including a broken-link finder, so this shouldn't be a problem. I can do it for you. Thanks.
Hey, I think I may be able to help you with this project, since I have worked on a similar one at work. I'm new to Freelancer, though I have years of experience in Python software development.