High-res batch convert EMZ / WMZ graphics to PNG a.k.a. "liberating your graphics"

Posted by hkwint on May 24, 2013 4:21 AM EDT
LXer.com; By Hans Kwint - The Netherlands

LXer Feature: 24-May-2013

Recently I've been trying to 'minimize' some big 'legacy' MS Word files. When using MS Office to export to HTML (unfiltered), I found out they're so big because they have EMZ (gzipped EMF) and WMZ (gzipped WMF) files in them, accompanied by large "msoledata-thingies". These WMZs / EMZs are (gzipped) "container" formats, and some of them contain vector graphics, a bit like SVG - but sadly Microsoft doesn't seem to support SVG in MS Office. These EMF / WMF graphics may contain vector images: that's text, fonts and lines, which have to remain readable of course. I found out that replacing these EMZ / WMZ files with PNG files in the Word documents dramatically reduces size.
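
Since an EMZ / WMZ file is just a gzipped EMF / WMF, a single file can be unpacked by hand to see what is inside. The following is a minimal sketch, assuming `gzip`, `head`, `od` and `tr` are available in your Bash environment; the function names `is_gzip` and `unpack_metafile` are made up for this illustration.

```shell
#!/usr/bin/env bash
# Sketch: unpack one .emz / .wmz file by hand.
# Assumption: gzip, head, od and tr are available in the Bash environment.

# Succeeds if the file starts with the gzip magic bytes 1f 8b.
is_gzip() {
    local magic
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Write the decompressed metafile next to the original:
# image001.emz -> image001.emf, image002.wmz -> image002.wmf
unpack_metafile() {
    local src=$1 dst
    case "$src" in
        *.emz) dst="${src%.emz}.emf" ;;
        *.wmz) dst="${src%.wmz}.wmf" ;;
        *) echo "unknown extension: $src" >&2; return 1 ;;
    esac
    is_gzip "$src" || { echo "not gzip data: $src" >&2; return 1; }
    # Read via stdin so gzip does not care about the unusual suffix
    gzip -dc < "$src" > "$dst"
}
```

For example, `unpack_metafile image001.emz` would leave `image001.emf` next to the original, which can then be opened in a viewer or in GIMP to check whether it holds vector or raster content.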

Though EMF is a pretty open file format (documentation is available here), EMZ / WMZ files cannot be used cross-platform without hassle, because basically they contain programming instructions for Windows' Graphics Device Interface (GDI). There is a Linux port of GDI made in Mono, which might help. However, 'transcoding' the GDI instructions to a more broadly supported graphics format such as PNG or SVG, which doesn't need Microsoft-specific code to render, may be a good thing: it's a way of "liberating" your graphics from being dependent on the Windows GDI. Whether it's a good idea depends on what you need: you have to manually check whether your "converted" files are what you want. Please remember that converting from a "vector format" like EMF to a "raster format" like PNG will not be lossless.

Because this has to be done at my current job, I have to use Windows 7, so I preferably want to use portable free software. I want to batch-convert these files to PNG, preferably with a resolution of 300 DPI, as that will probably lead to readable results - even if printed.
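
To get a feeling for the difference between 72 and 300 DPI, the pixel width of a rendered graphic is simply its width in inches times the DPI. A small sketch follows; the 4-inch width is an arbitrary example value and `pixels_at_dpi` is a made-up helper name.

```shell
# Pixel count for a graphic of a given size (whole inches) at a given DPI.
pixels_at_dpi() {
    local inches=$1 dpi=$2
    echo $(( inches * dpi ))
}

pixels_at_dpi 4 72     # prints 288
pixels_at_dpi 4 300    # prints 1200
```

So the same drawing gets roughly four times as many pixels in each direction at 300 DPI, which is why small text survives there and turns to mush at 72 DPI.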

Sadly, after trying a lot, it seems no solution can handle all EMF / WMF files with a batch-script: Some applications can do the raster-files but not the vector files, while some others can do the vector files but not the raster files. Some do all, but everything with low resolution leading to bad quality and unreadable texts.

Lacking solutions:

MS Office

Using MSO 2007, export .doc or .docx to filtered HTML. This also converts all WMZ files to GIF files, but it's not free software and the resolution is 72 DPI, which means some WMZ vector graphics turn unreadable.

MS Paint

Copy-paste from MS image viewer to MS Paint. This is not free software, it has to be done manually (no batch), and again the resolution seems to be 72 DPI.

GIMP

Using GIMP 2.6, this works great manually: At import of a WMF / EMF file I can tell the resolution has to be 300DPI, and save as PNG. However, if I want a batch method, I'm afraid I have to write Scheme, and I'm not sure how to pass the DPI-setting to the import filter. So it can be done, but I don't know how and it's probably not very simple. I also tried David's Batch Processor-script, again no success because GIMP can't open all EMF/WMF files.

ImageMagick

This works great in a Bash shell script (bat is too fiddly for me) using convert 6.7.9.2 (portable), but only at 72 DPI. Using arguments such as "density" and "resample" / "resize" doesn't lead to results. I think ImageMagick uses emfplus.exe to first convert the EMF / WMF to some other format and then does the editing. emfplus.exe doesn't have a switch for resolution, so I guess emfplus.exe uses 72 DPI (standard for IM). I'm not a programmer, but a quick glance at the C++ source code of emfplus.exe didn't reveal anything related to 'import resolution'. I was not really sure about IM's license; it turns out to be Apache 2.0.

UNOConv

While reading up on this I found Unoconv. UNOConv is a Python script which uses the UNO bridge of Libre/OpenOffice. After downloading the script from GitHub and putting it in the right place, I could run it with Python and it seemed to work. However, sadly it doesn't have an option to set the resolution for the "import" filter. Which is weird: EMFs are really sharp when displayed on the screen, but as soon as you save them as PNG, even with high resolution, the images still become 'pixelated'. Maybe one can use options for the EMF import filter. However, as Dag Wieers also notes himself: it's really hard to find the right documentation / settings; he claims it's more work to find certain options than to implement them in UNOConv.

Inkscape

Using Inkscape 0.48 provided by my employer (yay!), I was able to convert the files ImageMagick only did in low resolution and GIMP only did manually! Inkscape can work from the command line, exactly what is needed for batch conversion. Sadly though, only three out of twenty-four EMF files were read, and they were also read incorrectly, leading to "data loss". For the EMF files which cannot be read, Inkscape exports "blank" PNGs. So Inkscape's CLI was everything I wished for and worked really well, but because of the lacking import filter, this is not a solution.

VeryPDF

You can get it here. Haven't tried it yet. If it works, and if it does batch, I still have to convert BMP to PNG. Reading the license, this seems to be freeware, not free software.

The script I tried

Let's say you have the file "legacyfile.doc". In MS Office 2007, you go to "Save As", and choose "html". Choose the normal one, not the filtered one. MS Office will make two important "linked" things: the file "legacyfile.htm" and the directory "legacyfile_files". In Bash for Windows, go to this directory. If there were emz / wmz files, you will see them in this directory. This assumes Bash version 2 or higher (for parameter expansion), and working on the C:/-drive (seen by Bash as root). This also assumes Inkscape is in your path. In this *_files directory, run this Bash script:

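# Work on copies in a separate directory, leaving the originals untouched
mkdir convert
cp *.emz convert
cp *.wmz convert
cd convert
# EMZ / WMZ are gzipped EMF / WMF: rename them so gunzip recognizes them
for i in *.wmz; do mv "${i}" "${i%.wmz}.wmf.gz"; done
for i in *.emz; do mv "${i}" "${i%.emz}.emf.gz"; done
gunzip *.gz
# Export at 300 DPI (Inkscape 0.48: -f input file, -d DPI, -e PNG output file)
for i in *.emf; do Inkscape.exe -f "C:/$(pwd)/${i}" -d 300 -e "C:/$(pwd)/${i%.emf}.png"; done
for i in *.wmf; do Inkscape.exe -f "C:/$(pwd)/${i}" -d 300 -e "C:/$(pwd)/${i%.wmf}.png"; done
# Remove the unpacked EMF / WMF files so only the PNGs remain
rm *.emf *.wmf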


Now you should have a directory "convert" in the *_files directory, and in this directory you will find all the freshly created PNG-files.
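
Because some files may silently fail to convert, it can help to list the metafiles that did not get a PNG. The following is a rough sketch with a made-up helper name (`report_unconverted`); it has to run before the unpacked .emf / .wmf files are deleted, and it only catches missing or zero-byte PNGs, not the "blank" PNGs Inkscape may write for files it cannot read.

```shell
#!/usr/bin/env bash
# Sketch: list metafiles in a directory that have no usable PNG next to them.
# Hypothetical helper; it only detects missing or zero-byte PNGs,
# not PNGs that were written but are blank.
report_unconverted() {
    local dir=$1 f png
    shopt -s nullglob   # a pattern without matches expands to nothing
    for f in "$dir"/*.emf "$dir"/*.wmf; do
        png="${f%.*}.png"
        # -s: the file exists and is larger than zero bytes
        [ -s "$png" ] || echo "${f##*/}"
    done
    shopt -u nullglob   # note: this also switches nullglob off if it was already on
}
```

Running `report_unconverted .` inside the "convert" directory prints one file name per line; an empty result means every metafile has at least a non-empty PNG, which still has to be checked by eye.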

Instead of the "Inkscape" command, you could also try ImageMagick's convert inside the same loops (shown here for EMF; use wmf instead of emf in the WMF loop):
$ convert "C:/$(pwd)/${i}" "C:/$(pwd)/${i%.emf}.png"
The .png extension tells ImageMagick which output format you want.

Related StackOverflow questions

An unclear converted image wmf to png: Here I found the solution.

Need script to batch convert vector eps to png images

Converting EMF format to PNG or JPG

Need to convert emf to jpeg png file formats using Java: I'm not limited to Java, and I don't have a web(server), just doing desktop work.

Special thanks

*StackOverflow user turutosiya for coming up with the Inkscape idea.
*ImageMagick Studio LLC for their great page on converting graphic files to other formats, and tons more, and of course for making IM.
*Dag Wieers for writing / maintaining UNOConv.

Also special thanks to those who make / made Inkscape, GIMP and Bash / Bash for Windows, Python and LibreOffice.

Conclusion

The first conclusion is: avoid EMF / WMF where you can. I really hate it that MS Word files contain them in the first place. PNG / JPG / BMP and SVG should be enough in my opinion, especially in a claimed "open" file format like OOXML (colloquially: docx).

The second conclusion: it's really hard to convert those pesky files with a batch file. With my skills and time - or lack thereof - I found it impossible. For small non-vector E/WMFs, ImageMagick does the job pretty well. For vector E/WMFs, GIMP performs best, but a batch to do so would have to be written in Scheme, I guess. Also, to do this reliably, one should come up with a way to distinguish between vector and raster W/EMFs.
