MIA: Admin: Janitor

How to Convert Word to HTML

Suggestions from 2006

1. Use the program HTML Tidy with its word-2000 option.

2. Find & Replace in Word: use the Find function to find italicized text, then replace it with <em>^&</em>. Then use a Word macro, or a script in your favorite text editor, to replace all the 'smart' quotation marks, character encodings, etc.

3. Save as an RTF file in Word, then use DocFrac to convert it to HTML.
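The "script in your favorite text editor" from suggestion 2 could look something like the Python sketch below. It is only an illustration: the function name and the list of characters are assumptions, not something from the original discussion, so extend the table to whatever your documents actually contain.

```python
# Word's "smart" characters mapped to HTML entities.
# This table is an assumed starting set; add more entries as needed.
SMART_CHARS = {
    "\u2018": "&lsquo;",   # left single quotation mark
    "\u2019": "&rsquo;",   # right single quotation mark / apostrophe
    "\u201c": "&ldquo;",   # left double quotation mark
    "\u201d": "&rdquo;",   # right double quotation mark
    "\u2013": "&ndash;",   # en dash
    "\u2014": "&mdash;",   # em dash
    "\u2026": "&hellip;",  # horizontal ellipsis
}

# str.maketrans accepts a dict of single characters to replacement strings.
_TABLE = str.maketrans(SMART_CHARS)


def replace_smart_chars(text):
    """Return text with each smart character replaced by its HTML entity."""
    return text.translate(_TABLE)
```

Run it over the text after the <em> replacement has been done in Word; characters not in the table pass through unchanged.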

Suggestions from 2002

I don't know if this was in the discussion, but it should be archived. An Office HTML Filter can be downloaded from:


To filter a document in Word 2000

1. Install the HTML Filter software.

2. Open the document you want to save as HTML and filter.

3. On the File menu (in Word) point to Export To and click Compact HTML (you can also select a CSS file).

There are still some MS tags after filtering, but not nearly as many. Macromedia Dreamweaver also filters doc tags, but is very expensive software.
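For the MS tags that survive filtering, a rough clean-up pass can be scripted. The Python sketch below is an assumption about what typically remains (namespaced elements such as <o:p> and class names starting with "Mso"); the function name and the prefix list are hypothetical, and it removes only the tags themselves, not any text between them.

```python
import re

# Namespaced Office elements such as <o:p>, </o:p>, <st1:place>.
# The prefix list (o, w, v, st1) is an assumption, not an exhaustive set.
OFFICE_TAG = re.compile(r"</?(?:o|w|v|st1):[^>]*>", re.IGNORECASE)

# class="MsoNormal", class='MsoBodyText', class=MsoNormal, and so on.
MSO_CLASS = re.compile(r"""\s+class=(["']?)Mso\w*\1""", re.IGNORECASE)


def strip_office_markup(html):
    """Remove Office-namespaced tags and Mso* class attributes."""
    html = OFFICE_TAG.sub("", html)
    return MSO_CLASS.sub("", html)
```

This is plain pattern matching rather than real HTML parsing, so check the result by eye before publishing.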


Tidy does a good job of fixing bad MS HTML.
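If you have many files, Tidy can be driven from a script. Below is a minimal Python sketch, assuming the tidy executable is on your PATH; the function names are hypothetical, and the choice of the word-2000, bare and clean options is a suggestion rather than anything prescribed in this discussion.

```python
import subprocess


def tidy_command(src, dest):
    """Build a Tidy command line for cleaning Word-generated HTML."""
    return [
        "tidy",
        "--word-2000", "yes",  # strip Word 2000 specific markup
        "--bare", "yes",       # replace smart quotes, dashes, etc. with plain ones
        "--clean", "yes",      # turn presentational markup into style rules
        "-o", dest,            # write the result to dest
        src,
    ]


def run_tidy(src, dest):
    """Run Tidy and return its exit code (0 ok, 1 warnings, 2 errors)."""
    # Tidy exits with 1 when it only has warnings, so do not treat a
    # non-zero exit code as a failure automatically.
    return subprocess.run(tidy_command(src, dest)).returncode
```

For example, run_tidy("article.htm", "article-clean.htm") would write the cleaned file next to the original (both file names are made up here).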

Also, OpenOffice seems to be able to open many MS documents, and although its 'save as HTML' is also fairly crap, at least it's free.


[1] http://tidy.sf.net/

[2] http://www.openoffice.org/

I never use Word anymore: too many of these kinds of headaches. But AbiWord appears to give good, clean HTML output from a DOC or RTF file.



HoTMetaL Pro is a great choice too. Unfortunately, it is quite an expensive tool.

Its current version is 6.0.3, but even 4.0, which I received as a freebie with a Norwegian computer magazine, works fine (I've tried OpenOffice too, but I like HoTMetaL better). A 30-day trial version can be downloaded from www.hotmetalpro.com, though.


I always do it in quite a simple way.

1. Run M$ Word and copy the whole text to the clipboard.

2. Run M$ WordPad, paste the text, and then cut it again.

3. Run M$ FrontPage and paste.

That's all. This allows me to keep any italics and bold that were in the original text. Besides that, I like FrontPage because the transformation to MIA CSS goes smoothly there and it doesn't add any rubbish HTML tags, does it?

Greetings, Wojtek

Oh yes. I had a hell of a time with FrontPage a while back. Everything looks OK if you don't venture outside of Microsoft reality, but once you do, your world falls apart. The problem is that if you are not careful when creating a page with FrontPage, viewing it properly becomes contingent on having Word installed. I would get all the Word crap out and FrontPage would automatically insert it again.

A few big problems:

1) mso fonts being a pain or impossible to get rid of.

2) Changing the CSS (MIA) does not get rid of the Microsoft CSS.

3) Microsoft CSS used ActiveX to display properly arrrrghhhhhhhhhhh
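One way to attack the mso problem outside of FrontPage is to strip the mso- declarations from the saved file with a script. The Python sketch below is a rough, assumed approach (the function name is hypothetical): it uses regular expressions rather than a real CSS parser, so values that contain quotes or semicolons may need fixing by hand, and it will not stop FrontPage from reinserting the styles if you open the file there again.

```python
import re

# A single declaration such as "mso-bidi-font-size:10.0pt;".
# The lookbehind avoids matching in the middle of another property name.
MSO_DECL = re.compile(r"(?<![\w-])mso-[\w-]+\s*:[^;\"']*;?\s*", re.IGNORECASE)

# A style attribute left empty after the declarations were removed.
EMPTY_STYLE = re.compile(r"""\s+style=(["'])\s*\1""", re.IGNORECASE)


def strip_mso_styles(html):
    """Remove mso-* CSS declarations and any style attributes left empty."""
    html = MSO_DECL.sub("", html)
    return EMPTY_STYLE.sub("", html)
```

Ordinary declarations in the same style attribute, and the MIA class names, are left alone.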

:-) nate

====>True, but that applies to DIRECT transformation from Word to FrontPage. If you want to do this properly, then just be sure that you don't omit the cut-and-paste step in WordPad. If something is not OK with the font or its size afterwards, then just select "default font" and "normal" respectively from their menus. If you want to know whether this is possible or not, then just look at the code of the things published in the Polish Section of MIA: they were all done in FrontPage. I like this programme so much because of the easy way in which MIA CSS can be inserted: just point to the paragraph and, for example, select "quote" from the list and all is done! Although it is made by one of the most vicious capitalist firms, this is a relatively good product. :)

Greetings, Wojtek