Added by Arjé Cahn, last edited by Johan Stuyts on Mar 27, 2008

How it works

Hippo CMS submits the file to the File Import Service (FIS). The FIS is a Cocoon instance that has a connection to OpenOffice, which runs in daemon mode. The FIS sends the document to OpenOffice, which transforms the office document into XML files. The FIS receives the XML files from OpenOffice and makes them available to Hippo CMS, which reads them and stores them in the Hippo Repository.

OpenOffice.org JARs

To control OpenOffice.org from Java, the JAR files for UNO and the OpenOffice.org components are needed. These JAR files are distributed with the OpenOffice.org SDK, which can be found in the directory 'stable/<version number>' on an extended OpenOffice.org mirror site.

Please note that not every OpenOffice.org version has a corresponding SDK, so check other version numbers besides the version of OpenOffice.org that you have installed.

The JAR files in the SDK have to be uploaded to a Maven 1 repository that you have access to. Hippo has uploaded (some versions of) the JAR files to http://repository.hippocms.org/maven/openoffice/jars/.

Build instructions

  1. Check out the sources from SVN: http://svn.hippocms.org/repos/hippo/file-import-service/trunk
  2. Create a new instance or change an existing one in 'src/instances' or in a location external to the project. Please note that the value 'localhost' for the property 'maven.ooocomponentloader.uno.hostname' does not work on Linux. Use '127.0.0.1' instead.
  3. Open a command line interface in the project directory.
  4. Execute 'BuildInstance <path to instance directory>'.
  5. Rename directory 'hippo-cocoon-*' to 'file-import-service-<instance name>-<version>'.
  6. Zip directory 'file-import-service-<instance name>-<version>' in 'target' to 'file-import-service-<instance name>-<version>.zip'.
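
Steps 5 and 6 can be scripted. The following is a minimal sketch in Python, not part of the FIS sources. It assumes that the build leaves exactly one 'hippo-cocoon-*' directory in 'target'; the function name package_instance and its arguments are hypothetical.

```python
import shutil
from pathlib import Path


def package_instance(target_dir, instance_name, version):
    """Rename the 'hippo-cocoon-*' directory and zip it (steps 5 and 6)."""
    target = Path(target_dir)
    # Assumption: the build produced exactly one matching directory.
    candidates = [p for p in target.glob("hippo-cocoon-*") if p.is_dir()]
    if len(candidates) != 1:
        raise RuntimeError(
            f"expected one hippo-cocoon-* directory, found {len(candidates)}"
        )
    new_name = f"file-import-service-{instance_name}-{version}"
    candidates[0].rename(target / new_name)
    # make_archive appends '.zip' to the base name itself and stores
    # the renamed directory as the top-level entry of the archive.
    archive = shutil.make_archive(
        str(target / new_name), "zip", root_dir=str(target), base_dir=new_name
    )
    return Path(archive)
```

For example, package_instance("target", "myinstance", "1.0") would produce 'target/file-import-service-myinstance-1.0.zip'.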

Installation on Linux

Java Image I/O Tools

Install the Java Image I/O Tools. The libraries can be downloaded from Sun.

Xvnc

OpenOffice needs an X server to run on Linux. Xvnc can be used as a small dummy X server, so there is no need to install a full X environment. Xvnc is part of all the main Linux distributions. Use the packaging system of your Linux distribution to install Xvnc, or download the source from TightVNC or RealVNC.

OpenOffice

The FIS needs OpenOffice version 2.0.2 or later. Download OpenOffice from http://www.openoffice.org and install it on the server.
You need the base, core and writer packages.
See the OpenOffice.org wiki on how to start OpenOffice in accept mode so that it can be driven remotely.

File Import Service

Unpack the zip created during building on the server. It is advisable to install the FIS as the same user as the CMS.

Start-stop Scripts

Check out the scripts from SVN: http://svn.hippocms.org/repos/hippo/file-import-service/trunk/scripts. Install the scripts on the server as the same user that the FIS is installed under. Adjust the paths and ports in fileimport.conf according to the server settings.

Installation on Windows

Java Image I/O Tools

Install the Java Image I/O Tools. The libraries can be downloaded from Sun.

OpenOffice

The FIS needs OpenOffice version 2.0.2 or later. Download OpenOffice from http://www.openoffice.org and install it on the server.
See the OpenOffice.org wiki on how to start OpenOffice in accept mode so that it can be driven remotely.

Only do this if you truly know what you are doing

Running OpenOffice.org as a service:

Install the Windows Server 2003 Resource Kit Tools.
Create the service as described here: http://support.microsoft.com/kb/137890
Add another 'String Value' to the Parameters key, call it AppParameters, and give it the following value:

-accept=socket,host=127.0.0.1,port=8100;urp; -invisible  -server -headless -nologo -nofirststartwizard

You should now be able to start the service from Administrative Tools/Services.
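
Once the service has started, a quick way to see whether OpenOffice is accepting connections is to open a TCP connection to the host and port given in the -accept parameter. The following Python sketch assumes the values 127.0.0.1 and 8100 from the parameter string above; the function name is_listening is hypothetical. It only shows that something is listening on the port, not that the listener actually speaks the UNO protocol.

```python
import socket


def is_listening(host="127.0.0.1", port=8100, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    print("OpenOffice port reachable:", is_listening())
```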

File Import Service

Unpack the zip created during building on the server.

Suppressing the firststartwizard when running OpenOffice.org as a service

The first time OpenOffice starts, it asks the user to click "Accept" for the license. With later versions (2.1+) this can usually be prevented by supplying -nofirststartwizard as a parameter.

In older versions, or in cases where this doesn't work, there are three ways to solve this:
1. Start OpenOffice once (as the user that will run the FIS) and accept the license agreement.
2. Unpack OO_user.tgz in the home directory of the user that will run the FIS.
3. Edit share\registry\data\org\openoffice\Setup.xcu and replace

<prop oor:name="ooSetupInstCompleted">
  <value>false</value>
</prop>
<prop oor:name="ooSetupShowIntro">
  <value>true</value>
</prop>

with (make sure the date/time below is later than the installation):

<prop oor:name="ooSetupInstCompleted" oor:type="xs:boolean">
 <value>true</value>
</prop>
<prop oor:name="LicenseAcceptDate" oor:type="xs:string">
 <value>2007-11-10T11:04:16</value>
</prop>
<prop oor:name="FirstStartWizardCompleted" oor:type="xs:boolean">
 <value>true</value>
</prop>
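
The LicenseAcceptDate value must be later than the installation time. As a small sketch, a timestamp in the same format as the example above can be produced in Python as follows. The function name license_accept_date is hypothetical, and the format string is inferred from the single example value shown above.

```python
from datetime import datetime


def license_accept_date(moment=None):
    """Format a timestamp like the LicenseAcceptDate value shown above."""
    # Default to the current local time, which is later than any
    # installation that has already finished.
    moment = moment or datetime.now()
    return moment.strftime("%Y-%m-%dT%H:%M:%S")


if __name__ == "__main__":
    print(license_accept_date())
```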

Configure the CMS

Add the build property

cms.fileimportservice.url=http://localhost:60001

and rebuild the CMS (adjust the URL to match your configuration).

Using the service from other applications

API

To make use of the File Import Service the following steps must be followed:

  1. Execute an HTTP PUT method on a URL containing a unique 32-character hexadecimal identifier (you can reuse the identifier if you process files sequentially and delete each imported file as described in step 5 before processing the next):
    http://localhost:60001/0123456789ABCDEF0123456789ABCDEF

    The upload can take a number of seconds because importing is an expensive operation.

    The URL to which you upload the file becomes the root directory of the OpenDocument archive that the imported file is converted to.
  2. You can determine the type of file that was imported by reading the file document-type.ascii.txt from the root directory:
    http://localhost:60001/0123456789ABCDEF0123456789ABCDEF/document-type.ascii.txt

    As the filename indicates, this file is ASCII-encoded.

    The file will contain one of these values:
    • word processing document
    • spreadsheet document
    • presentation document
    • drawing document
  3. To determine which files are present, you can request a directory listing by executing a GET method on the URL (without a trailing slash) of a directory:
    http://localhost:60001/0123456789ABCDEF0123456789ABCDEF
    http://localhost:60001/0123456789ABCDEF0123456789ABCDEF/Configurations2/images
    

    You will get an XML file back containing the directory listing. The XML file is generated by the Cocoon directory generator. A description of the structure of the XML file can be found in the documentation of the directory generator.
  4. Read the XML, graphics, etc. files that you need using GET methods.
  5. Delete the imported file by executing a DELETE method on the URL used to upload the file.
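
The five steps above can be sketched as a small client. This is a minimal Python example using only the standard library, not an official client. It assumes the example address http://localhost:60001 used on this page, and that the directory listing contains 'file' and 'directory' elements with a 'name' attribute, as produced by the Cocoon directory generator. All function names are hypothetical, and the uppercase identifier simply mirrors the example URL.

```python
import urllib.request
import uuid
import xml.etree.ElementTree as ET

BASE_URL = "http://localhost:60001"  # example address used on this page


def new_identifier():
    """Return a unique 32-character hexadecimal identifier."""
    return uuid.uuid4().hex.upper()


def document_url(identifier, path="", base_url=BASE_URL):
    """URL of the imported document root, or of an entry below it."""
    url = f"{base_url}/{identifier}"
    return f"{url}/{path.strip('/')}" if path else url


def upload_request(identifier, data, base_url=BASE_URL):
    """Step 1: PUT the file to import."""
    url = document_url(identifier, base_url=base_url)
    return urllib.request.Request(url, data=data, method="PUT")


def delete_request(identifier, base_url=BASE_URL):
    """Step 5: DELETE the imported file."""
    url = document_url(identifier, base_url=base_url)
    return urllib.request.Request(url, method="DELETE")


def listing_names(listing_xml):
    """Step 3: names of the direct entries in a directory listing.

    Assumption: entries are child elements whose local name is
    'file' or 'directory' and that carry a 'name' attribute.
    """
    root = ET.fromstring(listing_xml)
    names = []
    for element in root:
        local_name = element.tag.rsplit("}", 1)[-1]  # drop any namespace
        if local_name in ("file", "directory"):
            names.append(element.get("name"))
    return names


def import_file(path, base_url=BASE_URL):
    """Run steps 1, 2, 3 and 5; return (document type, root entries)."""
    identifier = new_identifier()
    with open(path, "rb") as source:
        data = source.read()
    with urllib.request.urlopen(upload_request(identifier, data, base_url)):
        pass  # the upload can take several seconds
    try:
        type_url = document_url(identifier, "document-type.ascii.txt", base_url)
        with urllib.request.urlopen(type_url) as response:
            document_type = response.read().decode("ascii").strip()
        with urllib.request.urlopen(document_url(identifier, base_url=base_url)) as response:
            names = listing_names(response.read())
        return document_type, names
    finally:
        # Always clean up so the identifier can be reused.
        urllib.request.urlopen(delete_request(identifier, base_url)).close()
```

Step 4 (reading the individual XML and graphics files) would use further GET requests on URLs built with document_url, issued before the DELETE request.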

Additional options

When you upload a document, you can specify three additional parameters by appending them to the URL using standard URL parameter syntax:

  • rescaleBitmaps of type boolean. This parameter defaults to false.

    Normally bitmaps are extracted as is. This could lead to bitmaps that are smaller or larger than they appear in the document. By setting this parameter to true, the bitmaps will be rescaled to the size specified in the document using the dots per inch (DPI) specified by dpiForBitmaps.
  • convertWmfsToBitmaps of type boolean. This parameter defaults to false.

    Some files contain images in the Windows Metafile (WMF) format. These images cannot be used as is for web pages. By setting this parameter to true, the Windows Meta Files will be converted to bitmaps. The DPI specified by dpiForBitmaps is used to determine the size of the bitmap to produce.
  • dpiForBitmaps of type integer. This parameter defaults to 96.

    The DPI to use for rescaling bitmaps and converting Windows Meta Files to bitmaps. This parameter is only applicable if one of the options above is enabled.

Note that no error will be returned if invalid values are passed for the parameters. Instead the default values will be used.
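
Because invalid values silently fall back to the defaults, it can help to validate the parameters on the client before uploading. The following Python sketch builds an upload URL with the three parameters described above. The function name upload_url is hypothetical, and the lowercase literals 'true' and 'false' for boolean values are an assumption.

```python
from urllib.parse import urlencode


def upload_url(base, rescale_bitmaps=False, convert_wmfs_to_bitmaps=False,
               dpi_for_bitmaps=96):
    """Append the optional import parameters to an upload URL."""
    # Reject values the service would silently replace with defaults.
    if (isinstance(dpi_for_bitmaps, bool)
            or not isinstance(dpi_for_bitmaps, int)
            or dpi_for_bitmaps <= 0):
        raise ValueError("dpi_for_bitmaps must be a positive integer")
    params = {
        # Assumption: booleans are passed as lowercase 'true' / 'false'.
        "rescaleBitmaps": "true" if rescale_bitmaps else "false",
        "convertWmfsToBitmaps": "true" if convert_wmfs_to_bitmaps else "false",
        "dpiForBitmaps": dpi_for_bitmaps,
    }
    return f"{base}?{urlencode(params)}"
```

For example, upload_url("http://localhost:60001/0123456789ABCDEF0123456789ABCDEF", rescale_bitmaps=True, dpi_for_bitmaps=150) yields a URL that asks for bitmaps to be rescaled at 150 DPI.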