Creating a python crawler

In Progress Posted Mar 15, 2014 Paid on delivery

For a new project, I am looking for someone who can program a Python crawler.

The Python crawler shall crawl the web and save all domain names that have certain top-level domain (TLD) endings.
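As a rough illustration of the link-discovery step, the following sketch collects the targets of all anchor tags on an already-fetched page, using only the Python standard library. The names `LinkCollector` and `extract_links` are hypothetical, and fetching pages, queueing and politeness rules (robots.txt, rate limits) are deliberately left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href targets of <a> tags as absolute URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page itself.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Each returned URL would then be both a candidate for the domain check and a page to visit next.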

It must be easy to configure which top-level endings are collected (e.g. "de" / "at" / "ch").

The second-level domains that are found are stored together with their TLD in a database.

Domains with the top-level ending "de" are saved in table 1.

Domains with the top-level ending "at" are saved in table 2.

Domains with the top-level ending "ch" are saved in table 3.

(and so on)
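The one-table-per-TLD layout could be sketched as follows. This is only an illustration under stated assumptions: it uses the standard-library `sqlite3` module as a stand-in for MySQL, and the table names (`domains_de`, `domains_at`, ...), the `ALLOWED_TLDS` setting and both function names are hypothetical.

```python
import sqlite3

# Hypothetical setting: the TLD endings to collect, without the leading dot.
ALLOWED_TLDS = ("de", "at", "ch")


def open_store(path=":memory:", tlds=ALLOWED_TLDS):
    """Open the database and create one table per configured TLD."""
    conn = sqlite3.connect(path)
    for tld in tlds:
        # Table names cannot be bound as SQL parameters, so only plain
        # ASCII alphanumeric TLDs from our own configuration are accepted.
        if not (tld.isascii() and tld.isalnum()):
            raise ValueError(f"unsupported TLD: {tld!r}")
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS domains_{tld} (domain TEXT PRIMARY KEY)"
        )
    conn.commit()
    return conn


def save_domain(conn, domain, tlds=ALLOWED_TLDS):
    """Store 'name.tld' in the table for its TLD.

    Returns True if a new row was written, False if the TLD is not
    configured or the domain was already stored.
    """
    tld = domain.rsplit(".", 1)[-1]
    if tld not in tlds:
        return False
    cur = conn.execute(
        f"INSERT OR IGNORE INTO domains_{tld} (domain) VALUES (?)", (domain,)
    )
    conn.commit()
    return cur.rowcount == 1
```

The primary key keeps each domain unique within its table, so a domain seen again on a later page is not stored twice.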

If a domain name doesn't match the configured TLDs, it is not stored at all.

ATTENTION please:

Only the second-level domain with its TLD (example: "[url removed, login to view]") has to be stored, NOT every single URL that can be found on a given second-level domain (e.g. [url removed, login to view]; [url removed, login to view], ...).
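A minimal sketch of this reduction step, assuming a simple rule of "last two labels of the host name": it turns any URL into its second-level domain plus TLD and drops everything whose TLD is not configured. The names `ALLOWED_TLDS` and `registered_domain` are hypothetical. The rule does not handle multi-label suffixes such as "co.at"; a public suffix list would be needed for those.

```python
from urllib.parse import urlsplit

# Hypothetical setting: the TLD endings to keep, without the leading dot.
ALLOWED_TLDS = {"de", "at", "ch"}


def registered_domain(url, allowed_tlds=ALLOWED_TLDS):
    """Return 'name.tld' for a URL whose TLD is allowed, otherwise None."""
    # hostname is lower-cased and has any port or credentials stripped.
    host = urlsplit(url).hostname
    if not host:
        return None
    labels = host.rstrip(".").split(".")
    if len(labels) < 2 or labels[-1] not in allowed_tlds:
        return None
    # Keep only the second-level label and the TLD; drop subdomains and path.
    return ".".join(labels[-2:])
```

With this rule, different pages and subdomains of one site all collapse to the same stored value.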

MySQL PHP Python Software Architecture

Project ID: #5560761

About the project

2 proposals Remote project Active Mar 15, 2014

2 freelancers are bidding on average $38 for this job

pythonpower

Hello sir, I propose a Python script that takes a URL as input, fetches it, discovers every href in it and, if the TLD is in your list, saves it in the DB. Then, second step, ...

$45 USD in 3 days
(6 Reviews)
3.4
rboshra

I have previously implemented web crawlers using python. Adding the extra features required will not be an issue. My code is guaranteed to be well commented and simple for possible future developments.

$30 USD in 3 days
(0 Reviews)
0.0