Merging Word Documents on the Server Side with SharePoint 2010

Business application developers must often create solutions that automate day-to-day activities for their organizations. These activities typically involve processing and manipulating data in various documents—for example, extracting and consolidating data from multiple source documents, merging data into e-mail messages, searching and replacing content in documents, recalculating data in workbooks, extracting images from presentations … and the list goes on and on.

Microsoft Office makes these kinds of repetitive tasks simpler by providing a rich API that developers can use to automate them. Because such solutions work seamlessly for normal desktop users, developers have taken them to the next level: deploying the solutions to servers that provide a central point where all of this repetitive work can be addressed for multiple users without any human intervention.

Although moving solutions that complete repetitive Office tasks from the desktop to a server seems straightforward, it’s not quite as simple as it sounds.

Microsoft designed the Office application suite for desktop computer scenarios where a user is logged on to a machine and is sitting in front of it. For reasons of security, performance and reliability, Office applications are not the right tools for server-side scenarios. Office applications in a server environment may require manual intervention, and that’s not optimal for a server-side solution. Microsoft recommends avoiding this kind of solution, as explained in the Microsoft Support article, “Considerations for server-side Automation of Office”.

Since the release of Office 2007, however, the Office automation story has changed a great deal. With Office 2007 Microsoft introduced Office OpenXML and Excel Services for developers who would like to develop Office-based solutions on the server.

With Office 2010 and SharePoint 2010, Microsoft has come up with a new set of components called Application Services. These put a rich set of tools in a developer’s bag for Office automation solutions. Application Services include Excel Services, Word Automation Services, InfoPath Forms Services, PerformancePoint Services and Visio Services. You can learn more about the details of these services at msdn.microsoft.com/library/ee559367(v=office.14).

In this article, we will show you how to use Office OpenXML, Word Automation Services and SharePoint to build a simple application that merges separate status reports into a single document.

Status Report Workflow

Let’s say you’re a developer working at a services-oriented company in which many projects are managed by different teams. Every week, each project manager uses a common template to create a weekly status report and upload it to an internal SharePoint repository. Now your Group Manger wants to get a consolidated report that will contain all of these of weekly status reports and, guess what, you are the chosen one who has to implement this requirement.

You’re lucky, though. As we mentioned earlier, your life is easier today because you can implement this requirement with much less effort using OpenXML and Word Automation Services. You’ll be able to produce a more robust and stable solution than you could have without these technologies.

Let’s start by visualizing the solution. Figure 1 shows a proposed workflow. The process kicks off with individual project managers filling out status reports and uploading them to SharePoint on the server. The Group Manager can then initiate the process of merging any reports stored on the server and generating a combined report.

Figure 1 Workflow for Generating a Status Report

Figure 1 Workflow for Generating a Status Report

Building a Template

To implement this solution, the first step is to provide a common template to all the project managers for filling out the weekly status reports. When they finish filling in the data, they’ll upload the reports to a SharePoint repository. On Monday morning, the Group Manager can then log into the SharePoint site and fire up the logic that performs the following tasks:

  1. Reads all of the individual status report documents.
  2. Merges them into a single report.
  3. Saves the report in the repository for users to access.

Figure 2 shows what the status report template will look like (let’s call it WeeklyStatusReport.dotx). As you can see, the template includes fields to capture a title, dates, the project manager’s name, milestones and associated data, and text fields for entering details about accomplishments, future plans and problems. In this case we’ve used text fields and the date picker control for simplicity, but you could easily use drop-down lists, check boxes or a variety of other controls to streamline data entry.

Figure 2 Weekly Status Report Template

Figure 2 Weekly Status Report Template

The Document Library

The next step is to create a custom document library that hosts the weekly status reports based on this template.

In the SharePoint navigation pane, click Libraries and then Create to create a new library. In the Create dialog, filter by Library, select Document Library and type a name for the library (we used WSR Library). Now click Create.

Now you need to create a content type for the new library. Click Site Actions, then Site Settings, and under the Galleries section, click Site content types. Click Create and then type a name for the content type (we used Weekly Status Report).

In the Select Parent Content Type From list, select Document Content Types. In the Parent Content type list, select Document and click OK.

Under Settings, select Advanced Settings, then choose the “Upload a new document template” radio button and click Browse. Find the report template (WeeklyStatusReport.dotx) and upload it to the library.

Next, go to WSR Library and select Library Settings. Under General Settings, select Advanced Settings. Select Yes for “Allow management of content types,” then click OK.

You’ll see a list of Content types shown on the library setting page. Select the “Add from Existing Site Content Types” link. Select the content type you created earlier in the available site content types list. In my example, this is Weekly Status Report. Click Add, and click OK.

Again from the content types list, click on Document and select “Delete this content type.” Select OK in the warning message box.

Now you should see your content type when you select New Document in your WSR Library, as shown in Figure 3.

Figure 3 Selecting the Custom Content Type
Figure 3 Selecting the Custom Content Type

At this point you can go ahead and add a couple of status reports to the document library.

Creating the Web Part

Next, you need to enable a Group Manager to kick off the consolidation logic. You can do this via a button at the bottom of the default view of the document library.

There are two steps involved here. First, you’ll create a Visual Web Part using Visual Studio 2010. Second, you’ll add the Web Part to the document library using SharePoint Designer 2010.

To create a custom Web Part, start a new project in Visual Studio 2010 using the Visual Web Part project template. Give the project a name such as DocumentMerge, then click OK.

In the SharePoint Customization Wizard page, select your Web application (the URL to the SharePoint site hosting your document library), then click Finish.

Once the project is created, open the VisualWebPart1.cs file and modify the CreateChildControls method with the following code:

protected override void CreateChildControls() {
  Control control = Page.LoadControl(_ascxPath);
  Controls.Add(control);
  base.CreateChildControls();
  Button btnSubmit = new Button();
  btnSubmit.Text = "Merge Reports";
  btnSubmit.Click += new EventHandler(OnSubmitClick);
  Controls.Add(btnSubmit);
}

Also add an event handler for the button click:

void OnSubmitClick(object sender, EventArgs e) {
  // TODO : Put code to merge documents here
}

At this point you can build and deploy your project. We will add the implementation to our OnSubmitClick handler a bit later in this article.

The next step is to add the Web Part to the document library. In SharePoint Designer 2010, open the SharePoint site. Click All Files | WSR Library | Forms, then click on AllItems.aspx to edit it.

Click the bottom of the page. Click Insert | Web Part, and then select More Web Parts. In the search box, type VisualWebPart (the name of the Web Part you just created and deployed), and click OK (see Figure 4). Figure 5 shows the page with the Web Part in place. Save the page and close SharePoint Designer.

Figure 4 Inserting the Web Part

Figure 4 Inserting the Web Part

Figure 5 The Web Part in Place on the Page

Figure 5 The Web Part in Place on the Page

Merging the Reports

Now, let’s add the logic to merge the uploaded documents in the document library. For simplicity, this code will merge all the documents uploaded to this folder into a single file. A more realistic approach would be to merge only selected items or only items uploaded in a specified time period. You could also save the merged document to a different location or different library. This is when we’ll add the implementation to our OnSubmitClick handler of our VisualWebPart project in Visual Studio 2010.

In the OnSubmitClick handler of the Web Part, you need to provide logic for reading the reports that were uploaded to the document library, generating an empty OpenXML document, and merging the reports into the new document.

First, you need to read any documents in the current library. You can loop through the SPListItemCollection of the current SPContext, reading each file into a byte array using the SPFile.OpenBinary API:

 

SPListItemCollection files = SPContext.Current.List.Items;
  foreach (SPListItem item in files) {
    SPFile inputFile = item.File;
    byte[] byteArray = 
      inputFile.OpenBinary();

    // process each byte array 
  }

Next, generate the empty OpenXML document. This requires generating the document in memory using a MemoryStream because the OpenXML SDK does not let you save documents to a URI. Instead, the MemoryStream object can dump the document into the library as a new file. The code for creating the file is shown in Figure 6.

Figure 6 Creating a New File for the Merged Report

// String containing the blank document part for our new DOCX
string strEmptyMainPart = 
  "<?xml version='1.0' encoding='UTF-8' standalone='yes'?>" +
  "<w:document xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>" +
  "<w:body><w:p><w:r><w:t></w:t></w:r></w:p></w:body></w:document>";

// In-memory stream for our consolidated DOCX.
MemoryStream memOut = new MemoryStream();

// Output document's OpenXML object
WordprocessingDocument outputDoc = 
  WordprocessingDocument.Create(memOut, 
  DocumentFormat.OpenXml.WordprocessingDocumentType.Document);

MainDocumentPart mainPart = outputDoc.AddMainDocumentPart();

Stream partStream = mainPart.GetStream();
UTF8Encoding encoder = new UTF8Encoding();

// Add blank main part string to the newly created document
Byte[] buffer = encoder.GetBytes(strEmptyMainPart);
partStream.Write(buffer, 0, buffer.Length);

// Save the document in memory
mainPart.Document.Save();

Note that you need to add DocumentFormat.OpenXml.dll and WindowsBase.dll in the references and the corresponding using statements to the code:

using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

The next step is to implement the logic for saving the merged document to the library as a new document. This requires a bit of effort, but you can make it easier by using the SharePoint Managed Client Object Model. You’ll need to add two references to the project, Microsoft.SharePoint.Client.dll and Microsoft.SharePoint.Client.Runtime.dll, which are found in the following folder:

%ProgramFiles%\Common Files\Microsoft Shared\web server extensions\14\ISAPI

Create a new document in the SharePoint library with this code:

ClientContext clientContext = 
  new ClientContext(SPContext.Current.Site.Url);
ClientOM.File.SaveBinaryDirect(clientContext, 
  outputPath, memOut, true);

For these instructions to work, you’ll need the following using statements in the source file:

using Microsoft.SharePoint.Client;
using ClientOM = Microsoft.SharePoint.Client;

Making the Document Searchable

At this point you have the logic in place to generate fully functional consolidated documents on the server when a user clicks the Merge Reports button.

However, there’s one small catch: the generated document is not compatible with the SharePoint crawling mechanism because it contains OpenXML altChunk markup. This is a by-product of merging the reports into the blank document using the code we showed you earlier. The altChunks get replaced with original content when a document is opened in Word.

With the new Word Automation Services in SharePoint 2010, this task can be performed programmatically using ConversionJob class. This class is part of the Microsoft.Office.Word.Server.dll assembly, so add the reference to this assembly to the project manually. Once you’ve added this reference, you can use the code in Figure 7 to perform conversion of the altChunks.

Figure 7 Converting altChunks in the Merged Document

string docPath = string.Format(@"{0}{1}", 
  SPContext.Current.Site.Url.Replace(@"\\", ""), 
  outputPath);
            
ConversionJobSettings JobSettings = 
  new ConversionJobSettings();
JobSettings.OutputFormat = SaveFormat.Document;
JobSettings.OutputSaveBehavior = 
  SaveBehavior.AlwaysOverwrite;

ConversionJob ConvJob = new ConversionJob(
  "Word Automation Services", JobSettings);
ConvJob.UserToken = SPContext.Current.Site.UserToken;
ConvJob.AddFile(docPath, docPath);
ConvJob.Start();

See the code download for this article for additional details of the solution, which you can use as the basis of your own reporting system.

Final Steps

In order to test this code, we modified our SharePoint server’s configuration to run the Automation Service after one minute of getting a run request. By default, this interval is set to five minutes, and we didn’t want to wait that long for our conversion to happen.

If you’d like to change this setting, you can set it in SharePoint Central Administration under Application Management | Manage Service Applications | Word Automation Services, and set the Frequency to start conversions under Conversion Throughput to one minute.

The final generated report contains all the weekly status reports you created, merged into a single new document with each of the individual reports stacked one after the other.

And that’s it. In a future article we’ll take the concept of server-side merging of document contents to the next level. We’ll show you how to implement a mail-merge type of scenario on the server side, again using Office 2010, SharePoint 2010 and Visual Studio 2010. Until then, happy coding.

For more information on Office 2010 and SharePoint 2010, see the Office and SharePoint developer centers. Information about Office OpenXML can be found at msdn.microsoft.com/library/bb448854, and you can read about Word Automation Services at msdn.microsoft.com/library/ee558278(v=office.14).

Manvir Singh and Ankush Bhatia are part of the Visual Studio Developer Support Team in Microsoft Product Support Services (PSS), helping customers on programming issues involving Office client applications. You can reach Singh at manvir.singh@microsoft.com or manvirsingh.net. You can reach Bhatia at ankush.bhatia@microsoft.com or abhatia.wordpress.com.

Thanks to the following technical expert for reviewing this article: Eric White