Cleaning Word/Excel HTML


If you've ever looked at the HTML that Word or Excel produces, you've noticed that it's ugly and often only works in IE. Here is a simple trick to clean it up using regexes (you can use it in FrontPage, Visual Studio, Notepad2, etc.).

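For Excel, replace "<td [^>]+>" with "<td>", and for Word replace "<p [^>]+>" with "<p>". Each pattern matches an opening tag together with all of its attributes, so the replacement leaves just the bare tag.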
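If you'd rather script this than run it through an editor's find-and-replace, the same two substitutions can be sketched in Python. This is only a minimal illustration: the function name `strip_office_attrs` and the sample markup are made up for the example, and the patterns are the ones given above, unchanged.

```python
import re

# The two patterns from the post; the names here are hypothetical.
TD_WITH_ATTRS = re.compile(r"<td [^>]+>")
P_WITH_ATTRS = re.compile(r"<p [^>]+>")


def strip_office_attrs(html: str) -> str:
    """Replace <td ...> and <p ...> opening tags with bare <td> and <p>."""
    html = TD_WITH_ATTRS.sub("<td>", html)
    html = P_WITH_ATTRS.sub("<p>", html)
    return html


# Made-up sample resembling Excel/Word output.
sample = '<td class=xl65 width=64 style="width:48pt">42</td><p class=MsoNormal>Hello</p>'
print(strip_office_attrs(sample))  # prints: <td>42</td><p>Hello</p>
```

Because each pattern requires a space right after the tag name, tags such as `<pre ...>` are left alone, and tags that already have no attributes don't match at all.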

It's the easiest way I know to clean up large documents.