A PHP script reads the domains to be crawled from the database table t_domain. The script must honor each domain's robots.txt. We want to recursively collect all links (HTML <a> elements) from the domain, up to a depth of 5 from the entry point. Only local links should be followed. Only links to text/html should be followed (via a header check). Follow at most 100 links per page. Do not wait longer than 10 seconds for a page to load. Every link found (whether local or pointing to a different domain) will be stored in the table t_links. The following should be stored: the timestamp of the crawl, the full URL, the t_domain ID of the domain the link was found on, and the t_domain ID of the domain the link points to. If the destination domain does not exist yet, it must be added to t_domain.
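The link-collection rules above could be sketched roughly as follows. This is a minimal illustration, not a full crawler: the function names extractLinks and linksToFollow and the returned array shape are assumptions, relative URLs are resolved in a simplified way ("../" segments are not normalized), and robots.txt handling, the Content-Type header check, the 10-second timeout and the depth limit are left out.

```php
<?php
// Sketch only: function names and the result shape are illustrative assumptions.

/**
 * Collect every http(s) link from <a href> elements in $html.
 * Each result holds the absolute URL and whether it stays on the base host.
 */
function extractLinks(string $html, string $baseUrl): array
{
    if (trim($html) === '') {
        return [];
    }
    $base   = parse_url($baseUrl);
    $port   = isset($base['port']) ? ':' . $base['port'] : '';
    $origin = $base['scheme'] . '://' . $base['host'] . $port;
    // Directory of the current page, used for path-relative hrefs.
    $path = $base['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);

    // Real-world HTML is often malformed, so collect parser errors silently.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href === '' || $href[0] === '#') {
            continue; // empty or same-page fragment
        }
        if (strpos($href, '//') === 0) {
            $url = $base['scheme'] . ':' . $href;   // protocol-relative
        } elseif (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
            $url = $href;                           // already has a scheme
        } elseif ($href[0] === '/') {
            $url = $origin . $href;                 // root-relative
        } else {
            $url = $origin . $dir . $href;          // path-relative (simplified)
        }
        $parts = parse_url($url);
        if ($parts === false || !isset($parts['host'], $parts['scheme'])) {
            continue; // e.g. mailto: or javascript: links
        }
        if (!in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
            continue;
        }
        $links[] = [
            'url'   => $url,
            'local' => strcasecmp($parts['host'], $base['host']) === 0,
        ];
    }
    return $links;
}

/** Pick the unique local links to follow, capped at $limit per page. */
function linksToFollow(array $links, int $limit = 100): array
{
    $local = array_filter($links, fn($l) => $l['local']);
    return array_slice(array_unique(array_column($local, 'url')), 0, $limit);
}
```

In this sketch every link returned by extractLinks would be written to t_links, while only the URLs returned by linksToFollow would be queued for the next depth level.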
Once a domain has been completely crawled, a timestamp is added to the domain in t_domain.
Then the next domain to be crawled is selected from t_domain. The next domain is defined as the one with no timestamp and the lowest ID.
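The selection rule could look something like the sketch below, using PDO. The posting names the table t_domain but not its columns, so the column names id, domain and crawled_at, as well as the function names nextDomain and markCrawled, are assumptions for illustration.

```php
<?php
// Sketch only: column names `domain` and `crawled_at` are assumed, not given.

/** Return the uncrawled domain with the lowest ID, or null if none is left. */
function nextDomain(PDO $pdo): ?array
{
    $stmt = $pdo->query(
        'SELECT id, domain FROM t_domain
         WHERE crawled_at IS NULL
         ORDER BY id ASC
         LIMIT 1'
    );
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;
}

/** Stamp a domain as completely crawled. */
function markCrawled(PDO $pdo, int $domainId): void
{
    $stmt = $pdo->prepare('UPDATE t_domain SET crawled_at = ? WHERE id = ?');
    $stmt->execute([date('Y-m-d H:i:s'), $domainId]);
}
```

A driver loop would then call nextDomain, crawl the returned domain, call markCrawled with its ID, and repeat until nextDomain returns null.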
This does not have to be written completely from scratch. We recommend using an existing framework such as [login to view URL] or [login to view URL], or another project of your choosing. The important part for us is collecting the links and the domains.
We will provide a server with PHP installed and a database, preferably MySQL. This server can be used for testing.
Hi there.....
Warm Greetings
We came across your request for Collecting Domain Names via Web Crawler and have reviewed your project description. We'd like to help you and are confident we can deliver satisfying results...
We have professionals working here with 100% results and creative, innovative ideas for our clients!
We have worked on several similar projects before!
We have been offering our services for more than 5 years in the fields of MySQL, PHP, Software Architecture, and Web Scraping.
We have worked on 300+ projects. Please check the profile reviews.
Feel free to message us to discuss your project briefly!
Hi there, I'm a London-based developer with a lot of experience in developing complex projects. I can create a project for you in PHP or Python (multithreaded). Please drop me a message if you would like to discuss your requirements.