Convert Word 2007 file to Clean HTML Using jQuery/JavaScript
$30-100 USD
Cancelled
Posted over 14 years ago
Paid on delivery
I have a partially working solution for this project; however, it does not handle bulleted/numbered lists properly, and I would like some functionality added.
Word 2007 allows you to save files as Filtered HTML, which is intended to produce clean HTML files, but it does not. Instead, this HTML contains many CSS styles and markup tags that I want removed. To see what I mean, save a Word 2007 file in Filtered HTML format, then open the new file in Notepad.
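As a rough illustration of trimming the inline CSS, the sketch below keeps only a short whitelist of style properties and drops the rest. The function name `filterInlineStyles` and the whitelist (`color`, `margin-left`) are assumptions made for this example, chosen because font colors and indenting are meant to survive; they are not taken from the attached files.

```javascript
// Hypothetical whitelist of CSS properties to keep (an assumption for this sketch).
const KEEP = new Set(["color", "margin-left"]);

// Rewrite every quoted style="..." attribute so that only whitelisted
// declarations remain; drop the attribute entirely if nothing is left.
function filterInlineStyles(html) {
  return html.replace(/\sstyle=(["'])([\s\S]*?)\1/gi, (whole, quote, css) => {
    const kept = css
      .split(";")
      .map((decl) => decl.trim())
      .filter((decl) => KEEP.has(decl.split(":")[0].trim().toLowerCase()));
    return kept.length ? ` style=${quote}${kept.join(";")}${quote}` : "";
  });
}
```

Unquoted style values are not handled here; Word's Filtered HTML normally quotes them, but that should be checked against a real export.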
The partially completed files attached attempt to save a Word 2007 file as clean HTML by:
1) Opening the [login to view URL] file
2) Enabling macros in the security options
3) Pasting the text to be converted (already done as part of the file, or you could paste new test text)
4) Running the macro titled "CleanWordtoHTML"
The macro:
1) saves the Word file as a [login to view URL] file of Filtered HTML format
2) opens an IE browser window
3) loads the "[login to view URL]" webpage into the browser
4) populates the first textarea of the webpage with the contents of [login to view URL]
5) executes the webpage's JavaScript function, which runs various jQuery-based find/replace regular expressions
6) populates the second textarea with this clean HTML
7) copies the contents of the second textarea into the Windows Clipboard
8) closes the IE browser window
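The find/replace step of the macro's web page can be pictured as an ordered table of regular-expression rules applied to the pasted HTML. This is only a minimal sketch: the rule table `CLEANUP_RULES` and the function `applyRules` are hypothetical names, and the rules themselves are illustrative guesses rather than the ones in the attached files.

```javascript
// Hypothetical rule table: each entry is [pattern, replacement].
// These rules are illustrative assumptions, not the rules in the attached files.
const CLEANUP_RULES = [
  [/<!--[\s\S]*?-->/g, ""],                       // HTML comments
  [/<style\b[\s\S]*?<\/style>/gi, ""],            // embedded style sheets
  [/\s+lang=(?:"[^"]*"|'[^']*'|[^\s>]+)/gi, ""],  // lang attributes
  [/<\/?span\b[^>]*>/gi, ""],                     // span wrappers (text inside is kept)
  [/&nbsp;/g, " "],                               // non-breaking space entities
];

// Apply every rule in order and return the cleaned string.
function applyRules(html, rules) {
  return rules.reduce(
    (text, [pattern, replacement]) => text.replace(pattern, replacement),
    html
  );
}
```

On the page itself this could be wired up with jQuery along the lines of `$("#output").val(applyRules($("#input").val(), CLEANUP_RULES))`, where `#input` and `#output` are assumed ids for the two textareas.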
If you run the macro and then paste the clipboard contents into a Notepad file, you will see the following problems:
- the bulleted lists are saved as <p> elements, not as <ul> list items
- the numbered lists completely lose their list format
- all indenting is lost
- all font colors are lost
- all heading formatting is lost (Word's heading1, heading2 styles should be converted to HTML CSS styles head1, head2)
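The list and heading problems above could be approached roughly as follows. This sketch assumes that span wrappers have already been stripped, that Word marks list paragraphs with a class beginning with `MsoListParagraph`, that each list paragraph starts with a literal marker (`·` for bullets, `1.` for numbers), and that `head1`, `head2` are meant as class names. The function names `convertLists` and `mapHeadings` are hypothetical, and nested lists are not handled.

```javascript
// Turn runs of Word list paragraphs into <ul>/<ol> lists.
function convertLists(html) {
  // Pass 1: tag each list paragraph as a list item, remembering its list type.
  const listParagraph =
    /<p\b[^>]*\bclass=["']?MsoListParagraph\w*["']?[^>]*>([\s\S]*?)<\/p>/gi;
  const tagged = html.replace(listParagraph, (whole, body) => {
    const text = body.trim();
    const numbered = text.match(/^\d+[.)]\s*/);
    const bulleted = text.match(/^[·•§]\s*/);
    if (numbered) return `<li data-list="ol">${text.slice(numbered[0].length)}</li>`;
    if (bulleted) return `<li data-list="ul">${text.slice(bulleted[0].length)}</li>`;
    return whole; // unknown marker: leave the paragraph alone
  });

  // Pass 2: wrap each run of same-type items in a single <ul> or <ol>.
  return ["ul", "ol"].reduce((text, tag) => {
    const run = new RegExp(`(?:<li data-list="${tag}">[\\s\\S]*?</li>\\s*)+`, "g");
    return text.replace(run, (items) =>
      `<${tag}>` + items.trim().split(` data-list="${tag}"`).join("") + `</${tag}>\n`);
  }, tagged);
}

// Assumption: Word emits <h1>..<h6> for its Heading styles, and the target
// markup wants class names head1..head6 (names taken from the posting).
function mapHeadings(html) {
  return html.replace(/<h([1-6])\b[^>]*>/gi, '<h$1 class="head$1">');
}
```

Tagging the items first and wrapping them second keeps everything outside the lists untouched, which a paragraph-by-paragraph rebuild would not.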
I need the above problems corrected, as well as the items below:
- all of the text before "(Substituted)" should be removed from the final HTML file/text
- the final HTML text should have the text after the "Overview" heading but before the "Before you begin" heading moved right below the Title line
- the final HTML should not contain any <html>, <head>, <meta>, or <body> tags
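The three items above might be sketched as plain string operations. The function names are hypothetical, and the sketch assumes that the "(Substituted)" marker sits inside a <p> element, that the Title line is the first <p> or heading element in the remaining HTML, and that the "Overview" and "Before you begin" headings are <h1> to <h6> elements; the Overview heading itself is left where it is.

```javascript
// Remove <head>...</head> plus any <html>, <body>, and <meta> tags.
function stripDocumentWrapper(html) {
  return html
    .replace(/<head\b[\s\S]*?<\/head>/gi, "")        // head and everything in it
    .replace(/<\/?(?:html|body|meta)\b[^>]*>/gi, "") // remaining wrapper tags
    .trim();
}

// Drop everything before the paragraph that contains the marker text.
function dropBefore(html, marker) {
  const at = html.indexOf(marker);
  if (at === -1) return html;                 // marker absent: change nothing
  const start = html.lastIndexOf("<p", at);   // assumed enclosing <p>
  return html.slice(start === -1 ? at : start);
}

// Move the content between the two headings to just after the title element.
function moveOverviewBelowTitle(html) {
  const section =
    /(<h\d\b[^>]*>\s*Overview\s*<\/h\d>)([\s\S]*?)(?=<h\d\b[^>]*>\s*Before you begin)/i;
  const found = section.exec(html);
  if (!found) return html;
  const [whole, heading, body] = found;
  // Remove the section body but keep the Overview heading in place.
  const without =
    html.slice(0, found.index) + heading + html.slice(found.index + whole.length);
  // Assumption: the title is the first <p> or heading element.
  const titleClose = /<\/(?:p|h\d)>/i.exec(without);
  if (!titleClose) return html;
  const insertAt = titleClose.index + titleClose[0].length;
  return without.slice(0, insertAt) + body + without.slice(insertAt);
}
```

Running `stripDocumentWrapper` first, then `dropBefore(html, "(Substituted)")`, then `moveOverviewBelowTitle` would be one plausible order, since the title lookup depends on the leading text already being gone.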
## Deliverables
Right now the attached files use a Word 2007 VBA macro along with jQuery to do all of the text manipulation. You can modify this code or create new VBA code to do the same; either method is fine.
If you have any questions, please ask. I am hoping to get this work done over the weekend if possible, so please be sure to mention in your bid how long it will take. Thank you.