Hardcore Linux

Anything about Ubuntu, CentOS, openSUSE and Fedora

Document Converter for Multiple Files

I’ve found a Python script that uses OpenOffice.org’s UNO library. It can convert OpenDocument formats (.odt, .ods, .odp) to Microsoft Office formats (.doc, .xls, .ppt), to comma-separated values (.csv), to PDF, or even to plain text (.txt).
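
The conversions listed above can be summarized as a small lookup. The sketch below is a hypothetical helper (`is_supported_pair` is my own name, not part of DocumentConverter.py). It accepts only the OpenDocument-to-other pairs named in this paragraph, and it assumes .csv applies to spreadsheets and .txt to text documents. The Python script itself may support more combinations.

```bash
#!/bin/bash
# Hypothetical helper: returns 0 if the source/target extension pair is one of
# the conversions listed above (text, spreadsheet, presentation), 1 otherwise.
is_supported_pair() {
    local src=${1,,} tgt=${2,,}   # lower-case both extensions (bash 4+)
    case "$src:$tgt" in
        odt:doc|odt:pdf|odt:txt) return 0 ;;   # text documents
        ods:xls|ods:csv|ods:pdf) return 0 ;;   # spreadsheets
        odp:ppt|odp:pdf)         return 0 ;;   # presentations
        *)                       return 1 ;;
    esac
}

# Example: validate the requested extensions before starting any conversion
if is_supported_pair odt pdf; then echo "odt -> pdf: supported"; fi
if ! is_supported_pair odp csv; then echo "odp -> csv: not in the list"; fi
```

Checking the pair up front avoids starting the OpenOffice.org service only to discover that the requested conversion is not one of these.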

It works great, but I needed something more practical for my situation: converting multiple files located in different folders under a common path. So I wrote a bash script that converts them all in one pass, which is faster and more efficient than converting them one by one by hand.
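
Each converted file is meant to land in the same folder as its source, with only the extension changed. A minimal sketch of that naming rule, written as a hypothetical `target_path` function (not part of the original script):

```bash
#!/bin/bash
# Hypothetical helper: given a source file and a target extension, print the
# output path in the same folder with only the file's extension replaced.
target_path() {
    local src=$1 ext=$2
    local dir name
    dir=$(dirname -- "$src")
    name=$(basename -- "$src")
    # Strip the extension from the file name only, never from the folder part
    printf '%s/%s.%s\n' "$dir" "${name%.*}" "$ext"
}

target_path "/home/user/docs/2011/report.odt" doc
# prints /home/user/docs/2011/report.doc
```

Stripping the extension from the file name rather than the full path means a dotted folder name such as `backup.2011` is left untouched.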

The Python script needs OpenOffice.org running as a service on port 8100, and Python 2.4 or 2.6. My script starts a headless OpenOffice.org service automatically and terminates it when it is done.
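
The script below gives OpenOffice.org a fixed `sleep 2` to open port 8100, which may be too short on a slow machine. One alternative is to poll the port until it accepts a connection. This sketch assumes bash was built with `/dev/tcp` network redirections (the default on common Linux distributions); `wait_for_port` is a hypothetical name, not part of DocumentConverter.py:

```bash
#!/bin/bash
# Hypothetical helper: poll host:port until a TCP connection is accepted,
# trying up to N times (default 10) with one second between attempts.
# Assumes bash supports /dev/tcp redirections.
wait_for_port() {
    local host=$1 port=$2 tries=${3:-10}
    local i
    for ((i = 1; i <= tries; i++)); do
        # Opening /dev/tcp/host/port succeeds only if the connection is accepted;
        # the subshell closes it again immediately
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            return 0
        fi
        # Wait before the next attempt, but not after the last one
        ((i < tries)) && sleep 1
    done
    return 1
}
```

For example, `wait_for_port 127.0.0.1 8100 10 || exit 1` could take the place of `sleep 2`, giving up after about ten seconds if nothing is listening.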

You can get the Python script here: DocumentConverter.py.

Here’s my script:


#!/bin/bash
# Document Converter Main
# Converts documents between types (.doc, .xls, .odt, .ods, .ppt, .odp, .pdf)
# using DocumentConverter.py and a headless OpenOffice.org service

DOCUMENTCONVERT="$HOME/.scripts/documentconverter/DocumentConverter.py"

# START OPENOFFICE HEADLESS MODE, listening on port 8100
/usr/bin/soffice -headless -nologo -nofirststartwizard \
    -accept="socket,host=127.0.0.1,port=8100;urp" > /dev/null 2>&1 &
sleep 2

# soffice is a wrapper script, so look up the PID of the real soffice.bin process
SOFFICE_PID=$(pgrep -f "soffice\.bin.*8100")
# Stop the service however the script exits (normal end or cancelled dialog)
trap '[ -n "$SOFFICE_PID" ] && kill $SOFFICE_PID' EXIT

TARGET_DIR=$(zenity --file-selection --directory --title="Select Target Directory") || exit 1

DOCTYPE=$(zenity --entry --text="Enter Source File Extension" --title="Source Document Type" --width=250) || exit 1
SOURCE_DOCTYPE=$(echo "$DOCTYPE" | sed 's/[^a-zA-Z0-9]//g')

DOCTYPE=$(zenity --entry --text="Enter Target File Extension" --title="Target Document Type" --width=250) || exit 1
TARGET_DOCTYPE=$(echo "$DOCTYPE" | sed 's/[^a-zA-Z0-9]//g')

# Convert every matching file; the output is written next to its source file.
# -print0 with read -d '' keeps file names containing spaces intact.
find "$TARGET_DIR" -type f -iname "*.$SOURCE_DOCTYPE" -print0 |
while IFS= read -r -d '' FILE; do
    FILENAME=$(basename "$FILE")
    GETFILENAME=${FILENAME%.*}
    PATHNAME=$(readlink -f "$(dirname "$FILE")")
    python "$DOCUMENTCONVERT" "$PATHNAME/$FILENAME" "$PATHNAME/$GETFILENAME.$TARGET_DOCTYPE"
done

exit 0
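
Before converting a large tree, it can help to preview which files will be touched. Below is a hypothetical dry-run helper (`list_conversions` is not part of the original script) that uses the same case-insensitive `find -iname` match as the script above and prints each source path with the output path it would produce, without starting OpenOffice.org.

```bash
#!/bin/bash
# Hypothetical dry-run helper (not part of the original script): prints
# "source -> target" for every file the converter loop would process,
# without starting OpenOffice.org or converting anything.
list_conversions() {
    local dir=$1 src_ext=$2 tgt_ext=$3
    # Same case-insensitive match as the script; NUL-separated for odd names
    find "$dir" -type f -iname "*.$src_ext" -print0 | sort -z |
    while IFS= read -r -d '' file; do
        # The match guarantees a ".ext" suffix, so ${file%.*} drops only it
        printf '%s -> %s\n' "$file" "${file%.*}.$tgt_ext"
    done
}

# Usage: list_conversions "$HOME/Documents" odt pdf
```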

One response to “Document Converter for Multiple Files”

  1. AC July 14, 2011 at 4:54 pm

    Do you have a Perl script version instead of Python?
