Distributed web page scraper (preferably on EC2)

Closed Posted Aug 26, 2010 Paid on delivery

As input to your script, I have a list of about 1M URLs. I want these URLs retrieved and the results inserted into a database. You do NOT need to recursively crawl the URLs. You just need to retrieve them.
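
As a rough illustration of the "retrieve and insert, no crawling" requirement, here is a minimal single-machine sketch. The table layout, the function names `fetch` and `scrape_to_db`, and the choice of SQLite are all assumptions made for the example; the posting does not specify a database or schema.

```python
import sqlite3
import urllib.request


def fetch(url, timeout=10):
    """Retrieve one URL and return (status, body). No links are followed."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read()


def scrape_to_db(urls, conn, fetcher=fetch):
    """Fetch each URL once and store the result (or the error) in `conn`."""
    # Hypothetical schema: one row per URL, keyed on the URL itself.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "url TEXT PRIMARY KEY, status INTEGER, body BLOB, error TEXT)"
    )
    for url in urls:
        try:
            status, body = fetcher(url)
            row = (url, status, body, None)
        except Exception as exc:
            # Record the failure so one bad URL does not stop the batch.
            row = (url, None, None, str(exc))
        conn.execute("INSERT OR REPLACE INTO pages VALUES (?, ?, ?, ?)", row)
    conn.commit()
```

Passing the connection and the fetcher in as arguments keeps the sketch easy to point at a different database or to wrap around an existing scraper.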

I want a distributed scraper. In particular, I want to give a parameter N, and have the script automatically provision N scrapers, maybe N different Amazon EC2 instances, or some other cloud service. The N instances should avoid doing the same work.
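
One possible way to make N instances avoid duplicate work, sketched under assumptions: each instance is given its own index and the same value of N, and keeps only the URLs whose hash falls in its shard. The names `shard_for` and `urls_for_worker` are hypothetical, and provisioning the instances themselves (EC2 or otherwise) is not shown.

```python
import hashlib


def shard_for(url, n_workers):
    """Map a URL to a shard number in [0, n_workers) using a stable hash."""
    digest = hashlib.sha1(url.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer; the result is the same on every machine.
    return int.from_bytes(digest[:8], "big") % n_workers


def urls_for_worker(urls, worker_index, n_workers):
    """Return the subset of `urls` that this worker is responsible for."""
    return [u for u in urls if shard_for(u, n_workers) == worker_index]
```

Because the assignment depends only on the URL and N, the workers need no coordination at run time: every URL lands in exactly one shard.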

I don't care if you write a wrapper script around Scrapy or another existing web scraper implementation. You can do this if you already know Scrapy or Bixo and want to use it.

The script should really require very little configuration. It should be convenient and one-click if possible. That way, the next time I have a batch of 1M URLs, I can easily run your script.
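
To illustrate what "very little configuration" could look like, here is a sketch of a command-line entry point that needs only the URL file and, optionally, N. The argument names (`url_file`, `-n/--workers`) and the default of 4 are assumptions for the example, not part of the requirements.

```python
import argparse


def parse_args(argv=None):
    """Parse the two inputs the job needs: a URL list and the scraper count N."""
    parser = argparse.ArgumentParser(
        description="Scrape a list of URLs with N workers (hypothetical CLI)."
    )
    parser.add_argument("url_file", help="text file with one URL per line")
    parser.add_argument(
        "-n", "--workers", type=int, default=4,
        help="number of scraper instances to provision (default: 4)",
    )
    return parser.parse_args(argv)
```

A run for a new batch would then look something like `python scrape.py urls.txt -n 20`, where `scrape.py` is a placeholder script name.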

Amazon Web Services, Engineering, Java, Linux, Project Management, Python, Script Install, Shell Script, Software Architecture, Software Testing

Project ID: #3680209

About the project

14 proposals Remote project Active Dec 16, 2010

14 freelancers are bidding on average $218 for this job

ddemidenko

See private message.

$255 USD in 14 days
(72 Reviews)
6.1
johnweavervw

See private message.

$170 USD in 14 days
(62 Reviews)
5.4
mlys

See private message.

$254.15 USD in 14 days
(36 Reviews)
5.7
alexferechin

See private message.

$233.75 USD in 14 days
(17 Reviews)
5.3
happytron

See private message.

$212.5 USD in 14 days
(9 Reviews)
4.8
happydotnet

See private message.

$235.45 USD in 14 days
(20 Reviews)
4.3
app2technologies

See private message.

$255 USD in 14 days
(16 Reviews)
3.9
readyfacts

See private message.

$212.5 USD in 14 days
(37 Reviews)
4.4
kwovw

See private message.

$254.15 USD in 14 days
(2 Reviews)
3.9
quintonwebz

See private message.

$204 USD in 14 days
(6 Reviews)
3.6
napoleonmr

See private message.

$255 USD in 14 days
(2 Reviews)
2.8
richmondcd

See private message.

$127.5 USD in 14 days
(2 Reviews)
0.7
woolee

See private message.

$170 USD in 14 days
(0 Reviews)
0.0
bryano

See private message.

$212.5 USD in 14 days
(0 Reviews)
0.0