I have a text file (or a searchable PDF, if preferred) containing directory entries for a series of firms in an industry (company name, address, employees, client types, etc.). The entries are listed consecutively in page style, rather than in a table or database. I am looking for someone to turn this text file into a spreadsheet/database (CSV or Excel), where the columns are the information headings (company name, address, city, state, etc.) and the rows are the individual companies.
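To illustrate the kind of conversion I have in mind, here is a minimal Python sketch. It is not based on the actual files: it assumes, purely for illustration, that entries are separated by blank lines, that the first line of each entry is the company name, and that the remaining lines look like `Heading: value`. The function names `parse_entries` and `to_csv` are made up; the real layout is in the sample pages and will likely need different rules.

```python
import csv
import io
import re


def parse_entries(text):
    """Split directory text into one dict per company.

    Assumed layout (hypothetical): entries separated by blank lines,
    first line is the company name, other lines are "Heading: value".
    """
    records = []
    for block in re.split(r"\r?\n\s*\r?\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        record = {"Company Name": lines[0]}
        for line in lines[1:]:
            heading, sep, value = line.partition(":")
            if sep:  # ignore lines that carry no heading
                record[heading.strip()] = value.strip()
        records.append(record)
    return records


def to_csv(records):
    """Return CSV text whose columns are the union of all headings seen."""
    columns = []
    for record in records:
        for heading in record:
            if heading not in columns:
                columns.append(heading)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()
```

Companies that lack a given heading simply get an empty cell in that column, so every row has the same set of columns and the result opens directly in Excel, SAS, or Stata.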
There are 7 separate files, each approximately 800-900 pages, representing approximately 750 companies each year. Sample pages are available in the "other files" section. Please note that headings are not consistent across all entries. This is most obvious in the job titles, but also appears in the company information. These different headings need to be extracted as separate columns. For example: "Broker-Dealer RRs: 300" is 1 column, while "Broker-Dealer RRs: 20 Institutional, 30 retail" would be two additional columns, even though they start from the same heading "Broker-Dealer RRs". The most critical information is the company information. The job titles are important but not critical.
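The "Broker-Dealer RRs" example could be handled along the following lines. This is only a sketch under assumptions: the function name `expand_heading` and the column naming scheme (`Heading (Label)`) are hypothetical, and the rule that a breakdown is a comma-separated list of "number label" parts is a guess from the single example above. It would probably need to be restricted to numeric headings so that free-text values such as addresses are not split by mistake.

```python
import re


def expand_heading(heading, value):
    """Return {column_name: value} for one "Heading: value" line.

    A plain value stays in a single column named after the heading.
    A comma-separated breakdown such as "20 Institutional, 30 retail"
    yields one separate column per labelled part instead.
    """
    parts = [p.strip() for p in value.split(",")]
    # Each part must look like "<number> <label>" to count as a breakdown.
    labelled = [re.fullmatch(r"([\d.]+)\s+(.+)", p) for p in parts]
    if all(labelled):
        return {
            f"{heading} ({m.group(2).strip().title()})": m.group(1)
            for m in labelled
        }
    return {heading: value.strip()}
```

With this rule, "Broker-Dealer RRs: 300" fills the single column `Broker-Dealer RRs`, while "Broker-Dealer RRs: 20 Institutional, 30 retail" fills `Broker-Dealer RRs (Institutional)` and `Broker-Dealer RRs (Retail)`.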
## Deliverables
1) Complete and fully-functional working program(s) in executable form (i.e. a macro I can use in the future, or a way to re-run the program on the files on my own machine), as well as complete source code of all work done. A final CSV or Excel document for each of the 7 volumes.
2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):
a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment--Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.
b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.
3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).
## Platform
I use Windows XP and Microsoft Office (Excel), as well as SAS and Stata.