Processing Outlook MSG Files - tsgrp/HPI GitHub Wiki
HPI provides out-of-the-box components that let the repository process MSG files when they are added to the system. The general goals and requirements are to:
- Parse the MSG file for any attachments and allow the user to specify an object type and individual properties for each attachment.
- Store any attachments in the repository and relate them back to the original email. The email is the parent and the attachments are its children. If a nested MSG file has an attachment, that attachment is related to the top-level email, not to the nested MSG file.
- Generate a PDF rendition for the uploaded MSG file. The PDF is generated by OpenContent if configured in the bulk import action config. Otherwise, the repository is responsible for adding PDF renditions to the uploaded MSG file.
- Documentum - in order for the repository to rendition MSG to PDF, ADTS must be installed.
- Alfresco - renditioning MSG to PDF is available OOTB if your object type has the tsg:renditioned aspect applied to it. To take advantage of Alfresco's renditioning engine, ensure the settings below are in place:
  1. Bulk Upload is configured to let the repository generate the PDF rendition for MSG files.
  2. In module-context.xml, the bean in the code snippet below exists.
  3. In alfresco-defaults.properties, the below property exists: content.transformer.complex.OcMsg2PdfTransformer.pipeline=OutlookMsg|txt|*|pdf|Pdf2swf
module-context.xml bean
<bean id="transformer.complex.OcMsg2PdfTransformer" class="org.alfresco.repo.content.transform.ComplexContentTransformer" parent="baseContentTransformer">
    <property name="transformers">
        <list>
            <ref bean="transformer.OutlookMsg"/>
            <ref bean="transformer.PdfBox.TextToPdf"/>
            <ref bean="transformer.Pdf2swf"/>
        </list>
    </property>
    <property name="intermediateMimetypes">
        <list>
            <value>text/plain</value>
            <value>application/pdf</value>
        </list>
    </property>
</bean>
When an MSG file is uploaded through Bulk Upload, it is parsed for attachments. The attachments are displayed to the user as documents to upload, so the user can set individual properties for each attachment, including a specific document object type. When the user finishes editing properties and chooses to upload the files, the attachments are created as individual repository objects with the properties the user specified. All attachments are placed in the same folder as the email, and each attachment is related to the email. A folder tag may additionally be added to each attachment (or to the original email) to use the folder tags related objects functionality, assuming the content / object model has a folder tag property specified.
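The parent/child rule described earlier (every attachment, including one found inside a nested MSG file, is related to the top-level email) can be illustrated with a minimal Java sketch. This is not HPI or OpenContent code: the `MsgAttachmentFlattener`, `Attachment`, and `Relation` names are hypothetical, and the sketch assumes that a nested MSG file is itself treated as a child of the top-level email.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; all types here are hypothetical, not HPI/OpenContent APIs. */
public class MsgAttachmentFlattener {

    /** Hypothetical stand-in for a parsed attachment; a nested MSG carries its own attachments. */
    static final class Attachment {
        final String name;
        final List<Attachment> nested;

        Attachment(String name, Attachment... nested) {
            this.name = name;
            this.nested = Arrays.asList(nested);
        }
    }

    /** Hypothetical parent/child relation between two repository objects. */
    static final class Relation {
        final String parent;
        final String child;

        Relation(String parent, String child) {
            this.parent = parent;
            this.child = child;
        }

        @Override
        public String toString() {
            return parent + " -> " + child;
        }
    }

    /** Relates every attachment, at any nesting depth, to the top-level email. */
    static List<Relation> relateToTopLevel(String emailName, List<Attachment> attachments) {
        List<Relation> relations = new ArrayList<>();
        collect(emailName, attachments, relations);
        return relations;
    }

    private static void collect(String emailName, List<Attachment> attachments, List<Relation> relations) {
        for (Attachment attachment : attachments) {
            // The parent is always the top-level email, never the nested MSG file.
            relations.add(new Relation(emailName, attachment.name));
            collect(emailName, attachment.nested, relations);
        }
    }

    public static void main(String[] args) {
        List<Attachment> attachments = Arrays.asList(
            new Attachment("report.pdf"),
            new Attachment("forwarded.msg", new Attachment("photo.png")));
        for (Relation relation : relateToTopLevel("original.msg", attachments)) {
            System.out.println(relation);
        }
    }
}
```

In this sketch, `photo.png` sits inside `forwarded.msg` but is still related to `original.msg`, which mirrors the flattening behavior described in the requirements.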
The default renditioning behavior is to allow the repository to add a PDF rendition to the uploaded MSG file(s). It's also possible to generate a PDF rendition using iText (not recommended). This is configured in the Bulk Import action config in the HPI admin.
NOTE: A current limitation of the library used to parse the MSG files is that nested MSG attachments are not available as a byte array, which means their native content is not available. However, OpenContent (OC) generates a PDF rendition for any nested MSG attachment.