Linux Automated Image Manipulation - clean up scanned pages from books
$5000-25000 USD
Closed
Posted about 13 years ago
Paid on delivery
I require a command-line-only executable or script, running on CentOS 5.5 x86_64, that automatically cleans up scanned pages of old books to produce new, print-quality books. The output must be strictly black and white: the text black, the background white, and the images 2-bit. It must look as professional as possible and must not look as though it was produced from scans, which will be of varying quality.
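To illustrate the "black text, white background" requirement, here is a minimal sketch of global thresholding using Otsu's method. It is only an assumption about one possible first step, not the required approach: the function names `otsu_threshold` and `binarize` are hypothetical, the page is assumed to be already decoded into rows of 8-bit gray values, and the 2-bit image handling is not covered. Scans of varying quality would probably need an adaptive (per-region) threshold instead.

```python
def otsu_threshold(pixels):
    """Return the gray level that best separates dark ink from light paper.

    pixels: list of rows, each a list of ints in 0..255 (assumed input format).
    """
    hist = [0] * 256
    for row in pixels:
        for value in row:
            hist[value] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0      # number of pixels at or below the candidate threshold
    sum_bg = 0    # sum of gray values of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue          # nothing on the dark side yet
        w_fg = total - w_bg
        if w_fg == 0:
            break             # nothing left on the light side
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor).
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(pixels):
    """Map every pixel to 0 (black) or 255 (white) using the Otsu threshold."""
    t = otsu_threshold(pixels)
    return [[0 if value <= t else 255 for value in row] for row in pixels]
```

For example, `binarize([[30, 40, 200, 210], [35, 220, 215, 45]])` turns the four dark values black and the four light values white.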
The server is very fast and has 12GB of RAM.
This is a slightly larger project than it first looks. It is a reasonably advanced coding job: you can't just put it together with a few calls to Ghostscript or ImageMagick. I want the highest quality possible, and most of the project will likely be spent tweaking settings and perhaps coding some clever routines to clean up the images that little bit more. It must be perfect, which is why I have given an estimated bid range of $5000+. I expect 110% on this project and am prepared to pay this much even though I know I could get the same done for around $1,000. I'm paying extra for dedication, and I expect the project to be worked on full-time until it is complete, not as one project among several others.
The coder must be experienced with the PDF specification, image manipulation, and cleaning up scanned documents.
Any dependencies must be supplied with installation instructions where required.
The full spec will be given if you reply to this bid request. The following is a rundown of what is required:
Input will be a PDF or DjVu file. You must be able to extract images and text from the PDF object streams, which may require rebuilding the headers.
Prediction of title page, title, author, last page, etc. will be required.
Removal of blemishes - which is more difficult than it sounds.
General image manipulation - trim, resize, etc. (possibly using GIMP2 in batch mode - not difficult.)
Basically, you start with a low-quality scan of a book and make it look like new.
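As a rough illustration of the blemish-removal and trim items above, the sketch below deletes small isolated ink specks by connected-component size and then crops the page to the remaining ink. This is an assumed, simplified approach, not the full spec: `remove_specks`, `trim`, and the `min_size` cutoff are hypothetical names and values, and the page is assumed to be already binarized (0 = black, 255 = white). A size cutoff alone would also delete punctuation and diacritics on a real page, which is part of why blemish removal is harder than it sounds.

```python
from collections import deque


def remove_specks(page, min_size=3):
    """Return a copy of page with black blobs smaller than min_size erased.

    page: list of rows of 0 (black) / 255 (white); min_size is in pixels.
    """
    height = len(page)
    width = len(page[0]) if height else 0
    cleaned = [row[:] for row in page]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if page[y][x] != 0 or seen[y][x]:
                continue
            # Breadth-first flood fill over 8-connected black pixels.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and not seen[ny][nx] and page[ny][nx] == 0):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(component) < min_size:
                for cy, cx in component:
                    cleaned[cy][cx] = 255   # paint the speck white
    return cleaned


def trim(page):
    """Crop page to the bounding box of its black pixels."""
    ink_rows = [y for y, row in enumerate(page) if 0 in row]
    if not ink_rows:
        return page   # blank page: nothing to crop to
    ink_cols = [x for x in range(len(page[0]))
                if any(row[x] == 0 for row in page)]
    top, bottom = ink_rows[0], ink_rows[-1]
    left, right = ink_cols[0], ink_cols[-1]
    return [row[left:right + 1] for row in page[top:bottom + 1]]
```

Calling `trim(remove_specks(page))` on a binarized page first erases the specks and then crops the margins, so stray dots near the edge no longer widen the crop box.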