Creating a Python crawler
$10-30 USD
Paid on delivery
For a new project, I am looking for a developer who can program a Python crawler.
The crawler shall crawl the web and save all domain names with certain top-level domain (TLD) endings.
There shall be an option to easily define the TLDs (e.g. "de" / "at" / "ch").
The second-level domains found are stored together with the TLD in a database.
Domains with the TLD "de" are saved in table 1.
Domains with the TLD "at" are saved in table 2.
Domains with the TLD "ch" are saved in table 3.
(and so on)
If a domain name doesn't match one of the configured TLDs, it is not stored at all.
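The storage rules above could be sketched roughly as follows. This is only a minimal illustration, assuming SQLite as the database (the posting does not name one). The table names `domains_de`, `domains_at`, ... and the helpers `open_db` and `store_domain` are hypothetical, chosen for this example.

```python
import sqlite3

# Assumption: the TLDs to keep are configured in one place, here a tuple.
ALLOWED_TLDS = ("de", "at", "ch")


def open_db(path=":memory:", tlds=ALLOWED_TLDS):
    """Open the database and create one table per configured TLD."""
    conn = sqlite3.connect(path)
    for tld in tlds:
        # Table names cannot be bound as SQL parameters, so validate them first.
        if not tld.isalnum():
            raise ValueError(f"unsupported TLD for a table name: {tld!r}")
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS domains_{tld} (domain TEXT PRIMARY KEY)"
        )
    conn.commit()
    return conn


def store_domain(conn, domain, tlds=ALLOWED_TLDS):
    """Store the domain in the table for its TLD; skip it if the TLD is not configured."""
    domain = domain.lower().rstrip(".")
    tld = domain.rpartition(".")[2]
    if tld not in tlds:
        return False  # TLD not in the settings: nothing is stored
    # PRIMARY KEY plus INSERT OR IGNORE keeps each domain only once.
    conn.execute(
        f"INSERT OR IGNORE INTO domains_{tld} (domain) VALUES (?)", (domain,)
    )
    conn.commit()
    return True
```

Adding another TLD would then only mean extending `ALLOWED_TLDS`; the matching table is created the next time `open_db` runs.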
ATTENTION please:
Only the second-level domain with the TLD (example: "[url removed, login to view]") has to be stored, NOT every single URL that can be found on a given second-level domain (e.g. [url removed, login to view]; [url removed, login to view], ...).
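One way to reduce any URL to just its second-level domain plus TLD is sketched below. It is a simplified illustration, not a complete solution: the function name `registered_domain` and the `example.*` hosts are made up for this example, and simply keeping the last two labels of the host name gives the wrong answer for registrations under multi-label suffixes such as "co.at". A real crawler would probably need the Public Suffix List for those cases.

```python
from urllib.parse import urlsplit


def registered_domain(url, allowed_tlds=("de", "at", "ch")):
    """Return 'second-level.tld' for a URL, or None if the TLD is not allowed."""
    try:
        host = urlsplit(url).hostname  # already lower-cased, port stripped
    except ValueError:
        return None  # malformed URL
    if not host:
        return None  # no host part, e.g. a relative link
    labels = host.rstrip(".").split(".")
    if len(labels) < 2 or not labels[-2]:
        return None
    if labels[-1] not in allowed_tlds:
        return None
    # Naive assumption: the last two labels form the registered domain.
    return ".".join(labels[-2:])
```

With this helper, every page and subdomain found under the same domain collapses to one value, so only that single entry would be stored.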
Project ID: #5560761
About the project
2 freelancers are bidding on average $38 for this job
Hello sir, I propose a Python script that takes a URL as input, fetches it, discovers every href in it and, if the TLD is in your list, saves the domain in the database. Then, second step, ...
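The href discovery step mentioned in this bid might look roughly like the sketch below, using only the standard library. It works on HTML that has already been downloaded; how the page is fetched is left open here. `LinkCollector` and `extract_links` are hypothetical names for this example and are not taken from the bidder's script.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all absolute link targets found in the given HTML text."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Each returned link could then be checked against the configured TLD list before anything is written to the database.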
I have previously implemented web crawlers using Python. Adding the extra features required will not be an issue. My code is guaranteed to be well commented and simple to extend in future development.