Microsoft Sync Framework is a comprehensive platform for synchronizing offline and online data, and facilitates collaboration and offline access for applications, services and devices alike. It is protocol- and database-independent and includes technologies and tools that enable device roaming, sharing and the ability to take networked data offline before synchronizing it back at a later point in time.
Sync Framework can be used to build applications that synchronize data from any data store using any protocol over a network. It has an extensible provider model and can be used with both managed and unmanaged code to synchronize data between two data sources.
This article takes a look at the concepts of synchronization and how Sync Framework can be integrated into your own projects. Specifically, I’ll be discussing the fundamentals of data synchronization, the architectural components of Sync Framework and how you use sync providers.
To work with Sync Framework and the code examples in this article, you’ll need to install Visual Studio 2010 and the Sync Framework runtime 2.0 or later. You can download the runtime with the Microsoft Sync Framework 2.0 Redistributable Package from the Sync Framework Developer Center.
Sync Framework Basics
Sync Framework comprises four primary components: a runtime, metadata services, synchronization providers and participants.
The Sync Framework runtime provides the infrastructure for synchronizing data between data sources. It also provides an SDK that developers can extend to implement custom providers.
Metadata services provide the infrastructure to store sync metadata, which contains information used during a synchronization session. Sync metadata includes versions, anchors and change detection information. You’ll also use sync metadata in the design and development of custom providers.
Synchronization providers are used to synchronize data between replicas or endpoints. A replica is a unit of synchronization and denotes the actual data store. For example, if you’re synchronizing data between two databases, each database is a replica. A replica is identified by a unique identifier called a replica ID. An endpoint, as used here, also refers to a data store. I’ll discuss providers in more depth later in the article.
A participant is a location from which the data to be synchronized can be retrieved. There are three kinds of participants: full participants, partial participants and simple participants.
Full participants are devices that can create new data stores, store sync metadata information and run sync applications on the devices themselves. Examples of full participants include desktop computers, laptops and tablets. A full participant can synchronize data with another participant.
Partial participants are devices that can create new data stores and store sync metadata information, but cannot run applications on their own. A USB storage device or smartphone could be a partial participant. Note that a partial participant can synchronize data with a full participant, but not with another partial participant.
Simple participants are devices or services that cannot store new data or execute applications; they can only provide the requested information. Examples of simple participants include RSS feeds and Amazon and Google Web services.
A synchronization provider is a component that can participate in a synchronization process and enables a replica to sync data with other replicas. You should have one synchronization provider per replica.
To synchronize data, a synchronization session is started. The application connects the source and destination synchronization providers in the session to facilitate data synchronization between the replicas.
When a synchronization session is in progress, the destination provider sends information about its data store to the source provider. The source provider determines which changes in the source replica the destination replica doesn’t know about yet, and then sends the list of those changes to the destination provider. The destination provider detects any conflicts between its own items and those in the list, and then applies the changes to its data store. The Sync Framework runtime coordinates this entire process.
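The exchange described above hinges on knowledge: a summary of which versions a replica has already seen. The following Python sketch models that idea under simplifying assumptions. The Replica class, the (replica ID, tick) version tuples and the sync function are hypothetical and are not Sync Framework types. The source enumerates only the versions that the destination’s knowledge doesn’t cover, and the destination flags a conflict when it holds a version the source has never seen.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# A version identifies one change: (ID of the replica that made it, that replica's tick count).
Version = Tuple[str, int]


@dataclass
class Replica:
    replica_id: str
    items: Dict[str, Tuple[str, Version]] = field(default_factory=dict)  # item ID -> (value, version)
    knowledge: Dict[str, int] = field(default_factory=dict)  # replica ID -> highest tick seen
    tick: int = 0

    def write(self, item_id: str, value: str) -> None:
        """Record a local change, stamping it with a new version."""
        self.tick += 1
        self.items[item_id] = (value, (self.replica_id, self.tick))
        self.knowledge[self.replica_id] = self.tick


def knows(knowledge: Dict[str, int], version: Version) -> bool:
    replica_id, tick = version
    return tick <= knowledge.get(replica_id, 0)


def sync(source: Replica, dest: Replica) -> List[str]:
    """One-way sync from source to dest; returns the IDs of conflicting items."""
    # The source enumerates only the changes the destination's knowledge does not contain.
    changes = [(item_id, value, version)
               for item_id, (value, version) in source.items.items()
               if not knows(dest.knowledge, version)]
    conflicts = []
    for item_id, value, version in changes:
        local = dest.items.get(item_id)
        # Conflict: the destination holds a version that the source has never seen.
        if local is not None and not knows(source.knowledge, local[1]):
            conflicts.append(item_id)
            continue  # leave the destination copy untouched in this sketch
        dest.items[item_id] = (value, version)
    if not conflicts:
        # The destination now knows everything the source knew.
        for replica_id, tick in source.knowledge.items():
            dest.knowledge[replica_id] = max(dest.knowledge.get(replica_id, 0), tick)
    return conflicts
```

In this sketch the destination merges the source’s knowledge only when no conflicts were found, so unresolved items keep showing up in later sessions; a real provider has to track such items more precisely rather than simply refusing to learn.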
Sync Framework includes three built-in providers, for database, file system and feed synchronization:
- Synchronization provider for ADO.NET-enabled data sources
- Synchronization provider for RSS and Atom feeds
- Synchronization provider for files and folders
You can also extend Sync Framework to create your own custom sync provider to exchange information between devices and applications.
The database synchronization provider (previously called Sync Services for ADO.NET in Sync Framework 1.0) supports synchronization of ADO.NET-enabled data sources. You can build disconnected data applications that facilitate synchronization between ADO.NET-enabled data sources such as SQL Server. It enables roaming, sharing and taking data offline. Any database that makes use of the database provider can participate in the synchronization process with other data sources that are supported by Sync Framework including file systems, Web services or even custom data stores.
The Web synchronization provider (formerly Sync Services for FeedSync) supports synchronization of RSS and Atom feeds. Before FeedSync, this technology was known as Simple Sharing Extensions and was originally designed by Ray Ozzie. Note that the Web synchronization provider doesn’t replace existing technologies like RSS or Atom feeds. Rather, it provides a simple way to add synchronization capabilities to existing RSS or Atom feeds so that they can be consumed by other applications or services, independent of the platform or device in use.
The file synchronization provider (formerly Sync Services for File Systems) supports synchronization of files and folders. It can synchronize files and folders on the same computer or across computers on a network, on NTFS, FAT or SMB file systems. The provider uses the Sync Framework metadata model to enable peer-to-peer synchronization of file data with support for arbitrary topologies (client/server, full mesh and peer-to-peer), including support for removable media. The file synchronization provider also enables incremental synchronization, conflict and change detection, synchronization in both preview and non-preview modes of operation, and filtering and skipping files during synchronization.
Working with Built-In Sync Providers
In this section I’ll demonstrate how to work with the built-in synchronization providers to implement a simple application that synchronizes the content of two folders in your system.
The FileSyncProvider class can be used to create a file synchronization provider. This class extends the UnmanagedSyncProviderWrapper class and implements the IDisposable interface. The FileSyncScopeFilter class is used to include or exclude the files and folders that participate in the synchronization process.
FileSyncProvider detects changes in a replica by using sync metadata. Sync metadata contains information about all the files and folders that participate in the synchronization process, and it comes in two kinds: replica metadata and item metadata. The file synchronization provider stores this metadata for every participating file and folder, and in later sessions it compares each item’s current size, attributes and last-modified time against the stored values to detect changes.
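To make metadata-based change detection concrete, here is a minimal Python sketch. It is a stand-in under stated assumptions, not FileSyncProvider’s actual algorithm: take_snapshot and detect_changes are hypothetical helpers, and the POSIX-style st_mode value substitutes for Windows file attributes. A snapshot records size, last-modified time and mode for each file; comparing an old snapshot with a new one yields created, updated and deleted files.

```python
import os
from typing import Dict, Tuple

# Relative path -> (size in bytes, last-modified time in ns, mode bits standing in for attributes)
Snapshot = Dict[str, Tuple[int, int, int]]


def take_snapshot(root: str) -> Snapshot:
    """Record the metadata used for change detection for every file under root."""
    snap: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            snap[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns, st.st_mode)
    return snap


def detect_changes(old: Snapshot, new: Snapshot) -> Dict[str, str]:
    """Classify each path as created, updated or deleted by comparing stored metadata."""
    changes: Dict[str, str] = {}
    for path, meta in new.items():
        if path not in old:
            changes[path] = "created"
        elif old[path] != meta:
            changes[path] = "updated"  # size, mtime or mode differs from the stored value
    for path in old.keys() - new.keys():
        changes[path] = "deleted"
    return changes
```

Comparing metadata rather than file contents keeps change detection cheap; the trade-off is that an edit which preserves size and timestamp goes unnoticed unless contents are also compared.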
Open Visual Studio 2010 and create a new Windows Presentation Foundation (WPF) project. Save the project with the name SyncFiles. Open the MainWindow.xaml file and create a WPF form similar to what is shown in Figure 1.
Figure 1 The Sample Sync App
As you can see, you have controls to pick the source and destination folders. You also have controls to display the synchronization statistics and content of the source and the destination folders.
Right-click the project in Solution Explorer, click Add Reference and add the Microsoft.Synchronization and Microsoft.Synchronization.Files assemblies.
Now add a GetReplicaID method to the MainWindow.xaml.cs file that returns a GUID, as shown in Figure 2. When the Synchronize method is called on a SyncOrchestrator instance, it creates a metadata file called filesync.metadata in each of the folders (replicas) using the replica’s unique GUID. GetReplicaID persists this GUID in a file so that subsequent calls don’t generate a new GUID for the same folder. The method first checks whether the file containing a replica ID exists. If it doesn’t, a new replica ID is created and stored in the file. If it does (because a replica ID for that folder was generated previously), the method returns the replica ID from the file.
Figure 2 GetReplicaID
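The logic of GetReplicaID (reuse a persisted ID if one exists, otherwise generate and save a new one) can be sketched in Python. The function name, the plain-text file format and the file location are assumptions made for illustration; the C# method stores a GUID in the same spirit.

```python
import os
import uuid


def get_replica_id(id_file_path: str) -> uuid.UUID:
    """Return the replica ID saved in id_file_path, creating and saving one on first use."""
    if os.path.exists(id_file_path):
        # An ID was generated for this replica earlier: reuse it so it stays stable across runs.
        with open(id_file_path, "r", encoding="ascii") as f:
            return uuid.UUID(f.read().strip())
    replica_id = uuid.uuid4()  # new random GUID
    with open(id_file_path, "w", encoding="ascii") as f:
        f.write(str(replica_id))
    return replica_id
```

Calling the function twice with the same path returns the same ID, which is exactly the property the article relies on.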
Next, add a method called GetFilesAndDirectories to return a list of the files and folders under the replica location (see Figure 3). The folder name should be passed to it as a parameter.
Figure 3 Getting Replica Files and Folders
This method is used to display the files and folders in the source and destination folders both before and after synchronization. The PopulateSourceFileList and PopulateDestinationFileList methods call GetFilesAndDirectories to populate the list boxes that display the contents of the source and destination folders (see the code download for details).
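A rough Python counterpart of GetFilesAndDirectories might look like this. It is a hypothetical helper: the name, the relative-path output and the trailing separator that marks folders are choices made for this sketch, and the C# method in Figure 3 may differ.

```python
import os
from typing import List


def get_files_and_directories(root: str) -> List[str]:
    """List folders (with a trailing separator) and files under root as relative paths."""
    entries: List[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place makes os.walk visit subfolders in a stable order
        for d in dirnames:
            entries.append(os.path.relpath(os.path.join(dirpath, d), root) + os.sep)
        for name in sorted(filenames):
            entries.append(os.path.relpath(os.path.join(dirpath, name), root))
    return entries
```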
The btnSource_Click and btnDestination_Click event handlers are used to select the source and destination folders. Both use the FolderBrowser class to display a dialog box from which the user can pick a folder. The complete source code of the FolderBrowser class is included in the code download for this article.
Next, I write the Click event handler for the Synchronize button. It disables the button, calls the Synchronize method with the source and destination paths as parameters, catches any errors and re-enables the button when synchronization completes:
The Synchronize method accepts the source and destination paths and synchronizes the content of the two replicas. In it, I declare a SyncOperationStatistics variable to hold the statistical information returned by the synchronization session:
I also create the source and destination sync providers, create a SyncOrchestrator instance named synchronizationAgent, assign the GUIDs to the source and destination replicas and attach the two providers to it. The SyncOrchestrator is responsible for coordinating the synchronization session:
Finally, I start the synchronization process, catch any errors and release resources, as shown in Figure 4. The code download for this article includes the complete source project with error handling and other implementation details.
Figure 4 Synchronizing Replicas
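The overall shape of Figure 4 (run the sync, collect statistics, count failures) can be illustrated with a deliberately simplified one-way folder sync in Python. This is not how FileSyncProvider works: the sketch compares the two folders directly by size and modification time instead of keeping sync metadata, and SyncStats is a hypothetical class loosely modeled on SyncOperationStatistics.

```python
import os
import shutil
from dataclasses import dataclass


@dataclass
class SyncStats:
    """Hypothetical counterpart of SyncOperationStatistics for a one-way sync."""
    changes_applied: int = 0
    changes_failed: int = 0


def synchronize(source: str, destination: str) -> SyncStats:
    """Copy files that are missing or different (by size or mtime) from source to destination."""
    stats = SyncStats()
    for dirpath, _dirnames, filenames in os.walk(source):
        target_dir = os.path.normpath(os.path.join(destination, os.path.relpath(dirpath, source)))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            src_stat = os.stat(src)
            if os.path.exists(dst):
                dst_stat = os.stat(dst)
                if (dst_stat.st_size == src_stat.st_size
                        and int(dst_stat.st_mtime) == int(src_stat.st_mtime)):
                    continue  # treat as unchanged
            try:
                shutil.copy2(src, dst)  # copy2 also copies the modification time
                stats.changes_applied += 1
            except OSError:
                stats.changes_failed += 1  # keep going; report the failure in the statistics
    return stats
```

Running it a second time on unchanged folders applies nothing, because copy2 preserved the timestamps on the first pass.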
You can also report progress during a synchronization session. To implement this, follow these steps:
- Register an event handler for the ApplyingChange event.
- Enable preview mode by setting the PreviewMode property of FileSyncProvider to true.
- Keep an integer counter and increment it each time the ApplyingChange event is raised.
- Start the synchronization process. In preview mode no changes are applied, so when it finishes the counter holds the total number of changes.
- Set the PreviewMode property of FileSyncProvider to false to disable preview mode.
- Start the synchronization process again, reporting progress against that total each time the ApplyingChange event is raised.
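The two-pass technique in these steps can be sketched in Python. run_session and synchronize_with_progress are hypothetical stand-ins: the on_applying callback plays the role of the ApplyingChange event and the preview flag plays the role of PreviewMode. The first pass only counts changes; the second applies them and reports progress against that count.

```python
from typing import Callable, List


def run_session(changes: List[str], apply: Callable[[str], None],
                on_applying: Callable[[str], None], preview: bool) -> None:
    """Raise the 'applying change' callback for each change; write only when not previewing."""
    for change in changes:
        on_applying(change)
        if not preview:
            apply(change)


def synchronize_with_progress(changes: List[str], apply: Callable[[str], None]) -> List[str]:
    # Pass 1 (preview mode): count how many changes would be applied.
    total = 0

    def count(_change: str) -> None:
        nonlocal total
        total += 1

    run_session(changes, apply, count, preview=True)

    # Pass 2 (real run): report progress against the total from pass 1.
    progress: List[str] = []

    def report(change: str) -> None:
        progress.append(f"{len(progress) + 1}/{total} {change}")

    run_session(changes, apply, report, preview=False)
    return progress
```

A real second pass detects changes again, so the preview total is only an estimate if files change in between; the sketch sidesteps this by reusing a fixed list.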
Filtering and Skipping Files
When synchronizing with Sync Framework, some files are skipped automatically, including Desktop.ini and Thumbs.db, files with the system and hidden attributes, and metadata files. You can apply static filters to control which files and folders are synchronized, for example to exclude files you don’t want to be part of the synchronization process.
To use static filters, create an instance of the FileSyncScopeFilter class and pass the inclusion and exclusion filters as parameters to its constructor. You can also use the FileNameExcludes.Add method on your FileSyncScopeFilter instance to filter out one or more files from the synchronization session. You can then pass in this FileSyncScopeFilter instance when creating your FileSyncProvider instance. Here’s an example:
Similarly, you can exclude all .lnk files from the synchronization process:
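The effect of static filters can be approximated in Python with wildcard matching. In this sketch apply_static_filter is a hypothetical function, only the file-name rules are modeled (hidden and system attributes are ignored), and FileSyncScopeFilter’s own matching rules may differ in detail. Names are compared case-insensitively, as on Windows file systems.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Names skipped regardless of filters (per the text above); compared in lowercase.
ALWAYS_SKIPPED = {"desktop.ini", "thumbs.db", "filesync.metadata"}


def matches_any(name: str, patterns: Iterable[str]) -> bool:
    return any(fnmatchcase(name.lower(), p.lower()) for p in patterns)


def apply_static_filter(names: Iterable[str], includes: List[str], excludes: List[str]) -> List[str]:
    """Keep names that pass the include patterns (if any) and match no exclude pattern."""
    kept = []
    for name in names:
        if name.lower() in ALWAYS_SKIPPED:
            continue
        if includes and not matches_any(name, includes):
            continue
        if matches_any(name, excludes):
            continue
        kept.append(name)
    return kept
```

Passing `["*.lnk"]` as the exclusion list mirrors the .lnk example above.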
You can even use FileSyncOptions to explicitly set options for the synchronization session:
To skip one or more files during the synchronization process, register an event handler on the ApplyingChange event and set the SkipChange property to true:
Now I can implement the OnAppliedChange event handler to show which changes were applied:
Note that this example is simplified for clarity. A more robust implementation is included in the code download.
To understand why a particular file has been skipped during the synchronization session, you can implement the OnSkippedChange event handler:
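The interplay of these three events can be modeled in Python with callbacks. Everything here is hypothetical (ApplyingChangeArgs, apply_changes and the write callback are not Sync Framework types): a handler vetoes a change by setting skip_change, successful writes are reported through on_applied, and a change that fails with an OSError is reported through on_skipped together with the reason.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ApplyingChangeArgs:
    path: str
    skip_change: bool = False  # a handler sets this to True to veto the change


def apply_changes(paths: List[str],
                  write: Callable[[str], None],
                  on_applying: Callable[[ApplyingChangeArgs], None],
                  on_applied: Callable[[str], None],
                  on_skipped: Callable[[str, str], None]) -> None:
    for path in paths:
        args = ApplyingChangeArgs(path)
        on_applying(args)  # like ApplyingChange: handlers may set skip_change
        if args.skip_change:
            continue  # vetoed by a handler: nothing is written
        try:
            write(path)
        except OSError as exc:
            on_skipped(path, str(exc))  # like SkippedChange: report why it was not applied
            continue
        on_applied(path)  # like AppliedChange
```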
Build and execute the application. Click the Source Folder button to select the source folder, and use the Destination Folder button to select the destination folder. Before synchronization, the files in each folder are listed in the respective list boxes (see Figure 1). The Synchronization Statistics list box is empty because synchronization hasn’t started yet.
Now click the Synchronize button to start the synchronization process. Once the source and destination folders have been synchronized, you’ll see the content of both folders after synchronization in the respective list boxes. The Synchronization Statistics list box now displays information about the tasks that were completed (see Figure 5).
Figure 5 Synchronization Finished
Sync Framework manages the complexities involved in synchronization, including deferred conflicts, failures, interruptions and loops. To handle data conflicts while a synchronization session is in progress, Sync Framework can apply one of the following strategies:
- Source Wins: In this strategy, the changes that have been made in the source data store in the event of a conflict always win.
- Destination Wins: In this strategy, the changes that have been made in the destination data store in the event of a conflict always win.
- Merge: In this strategy, the changes in the event of a conflict are merged together.
- Log conflict: This is a strategy in which the conflict is deferred or logged.
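To show what each strategy decides for a single conflicting item, here is a small Python sketch. The Strategy enum and resolve function are illustrative assumptions, and the merge rule (a union of comma-separated tags) is just one possible application-specific merge; a real merge is necessarily specific to the data being synchronized.

```python
from enum import Enum
from typing import List, Tuple


class Strategy(Enum):
    SOURCE_WINS = "source_wins"
    DESTINATION_WINS = "destination_wins"
    MERGE = "merge"
    LOG = "log"


def resolve(source_value: str, dest_value: str, strategy: Strategy,
            conflict_log: List[Tuple[str, str]]) -> str:
    """Return the value the destination should keep for a conflicting item."""
    if strategy is Strategy.SOURCE_WINS:
        return source_value
    if strategy is Strategy.DESTINATION_WINS:
        return dest_value
    if strategy is Strategy.MERGE:
        # Illustrative merge rule: union of comma-separated tags, in sorted order.
        return ",".join(sorted(set(source_value.split(",")) | set(dest_value.split(","))))
    # LOG: defer the decision by recording the conflict and keeping the destination value for now.
    conflict_log.append((source_value, dest_value))
    return dest_value
```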
Understanding the Synchronization Flow
A SyncOrchestrator instance controls a synchronization session and the flow of data during the session. The synchronization flow is always unidirectional and you have a source provider attached to the source replica and a destination provider attached to the destination replica. The first step is to create your source and destination providers, assign unique replica IDs to them and attach the two providers to the source and destination replicas:
Next, create an instance of SyncOrchestrator and attach the two providers to it. A call to the Synchronize method on the SyncOrchestrator instance creates a link between the source and the destination providers:
From that point, Sync Framework makes a series of calls on the providers while the synchronization session is in progress. Let’s walk through them.
BeginSession is called on both the source and destination providers to indicate the synchronization provider is about to join a synchronization session. Note that the BeginSession method throws InvalidOperationException if the session cannot be started or the provider is not initialized properly:
Sync Framework calls GetSyncBatchParameters on the instance of the destination provider. The destination provider returns its knowledge (a compact representation of versions or changes that a particular replica is aware of) and the requested batch size. This method accepts two out parameters, namely, batchSize and knowledge:
Sync Framework invokes GetChangeBatch on the source provider. This method accepts two input parameters, the batch size and the knowledge of the destination:
The source synchronization provider now returns a change batch that summarizes the changed versions and the source knowledge, along with a changeDataRetriever object that the destination provider uses to retrieve the actual item data.
The ProcessChangeBatch method is called on the destination provider to process the changes:
SaveItemChange is called on the destination synchronization provider for each of the changes in the batch. If you’re implementing your own custom provider, you should update the destination replica with the changes sent by the source replica and then update the metadata in the metadata store with the source knowledge:
StoreKnowledgeForScope is called on the destination synchronization provider to save knowledge in the metadata store:
EndSession is called on both the source and destination providers to indicate that the synchronization provider is about to leave the synchronization session it joined earlier:
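The call sequence above can be traced with a toy Python model. ToyProvider and synchronize are hypothetical: the method names mirror the Sync Framework calls, but the signatures are simplified and knowledge is reduced to a set of (item ID, value) pairs. One liberty is taken: the orchestrator requests fresh destination knowledge before every batch, so the toy source doesn’t have to remember where its enumeration left off.

```python
from typing import Dict, List, Set, Tuple

Change = Tuple[str, str]  # (item ID, value)


class ToyProvider:
    """Hypothetical provider whose methods mirror the calls described above, in simplified form."""

    def __init__(self, name: str, items: Dict[str, str], trace: List[str]):
        self.name, self.items, self.trace = name, dict(items), trace
        self.knowledge: Set[Change] = set(self.items.items())  # crude stand-in for sync knowledge

    def begin_session(self) -> None:
        self.trace.append(f"{self.name}.BeginSession")

    def get_sync_batch_parameters(self) -> Tuple[int, Set[Change]]:
        self.trace.append(f"{self.name}.GetSyncBatchParameters")
        return 2, set(self.knowledge)  # (batch size, destination knowledge)

    def get_change_batch(self, batch_size: int, dest_knowledge: Set[Change]) -> List[Change]:
        self.trace.append(f"{self.name}.GetChangeBatch")
        unknown = sorted(c for c in self.items.items() if c not in dest_knowledge)
        return unknown[:batch_size]

    def process_change_batch(self, batch: List[Change]) -> None:
        self.trace.append(f"{self.name}.ProcessChangeBatch")
        for change in batch:
            self.save_item_change(change)
        self.store_knowledge_for_scope(batch)

    def save_item_change(self, change: Change) -> None:
        item_id, value = change
        self.items[item_id] = value  # update the replica with the source's change
        self.trace.append(f"{self.name}.SaveItemChange({item_id})")

    def store_knowledge_for_scope(self, batch: List[Change]) -> None:
        self.knowledge.update(batch)  # remember what has now been learned
        self.trace.append(f"{self.name}.StoreKnowledgeForScope")

    def end_session(self) -> None:
        self.trace.append(f"{self.name}.EndSession")


def synchronize(source: ToyProvider, dest: ToyProvider) -> None:
    """Orchestrator loop: one-way, source to destination, batch by batch."""
    source.begin_session()
    dest.begin_session()
    try:
        while True:
            batch_size, knowledge = dest.get_sync_batch_parameters()
            batch = source.get_change_batch(batch_size, knowledge)
            if not batch:
                break
            dest.process_change_batch(batch)
    finally:
        source.end_session()
        dest.end_session()
```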
Custom Synchronization Providers
Now you’ve seen how the built-in synchronization providers work. As I mentioned before, you can also implement custom synchronization providers, which extend Sync Framework to data stores that the built-in providers don’t support. You may need one if there’s no provider for the data stores you want to synchronize. You can also create a custom synchronization provider that implements change units for finer control over change tracking and to reduce the number of conflicts.
To design your own synchronization provider, create a class that extends the KnowledgeSyncProvider abstract class and implements the IChangeDataRetriever and INotifyingChangeApplierTarget interfaces. Note that these classes and interfaces are part of the Microsoft.Synchronization namespace.
As an example of a custom provider, say you wanted to implement a synchronization provider for synchronizing data between databases. This is just an overview of a simple example, and it could be extended to accommodate much more complicated scenarios.
Start by creating three databases in SQL Server 2008 (I named them ReplicaA, ReplicaB and ReplicaC) and create a table in each database called Student. The custom provider will sync records between these three Student tables. Next, create an entity called Student for performing CRUD operations on the Student table.
Create a class called Student with StudentID, FirstName, LastName as fields, and the necessary helper methods to execute CRUD operations in the database:
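A Student entity with CRUD helpers might look like the following Python sketch. It substitutes the standard-library sqlite3 module for SQL Server so that it runs anywhere; the StudentRepository class and its method names are assumptions, while the column names follow the StudentID, FirstName and LastName fields described above.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Student:
    student_id: int
    first_name: str
    last_name: str


class StudentRepository:
    """CRUD helpers for a Student table; sqlite3 stands in for SQL Server in this sketch."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS Student ("
                     "StudentID INTEGER PRIMARY KEY, FirstName TEXT NOT NULL, LastName TEXT NOT NULL)")

    def create(self, s: Student) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute("INSERT INTO Student VALUES (?, ?, ?)",
                              (s.student_id, s.first_name, s.last_name))

    def read(self, student_id: int) -> Optional[Student]:
        row = self.conn.execute("SELECT StudentID, FirstName, LastName FROM Student "
                                "WHERE StudentID = ?", (student_id,)).fetchone()
        return Student(*row) if row else None

    def read_all(self) -> List[Student]:
        rows = self.conn.execute("SELECT StudentID, FirstName, LastName FROM Student "
                                 "ORDER BY StudentID").fetchall()
        return [Student(*r) for r in rows]

    def update(self, s: Student) -> None:
        with self.conn:
            self.conn.execute("UPDATE Student SET FirstName = ?, LastName = ? WHERE StudentID = ?",
                              (s.first_name, s.last_name, s.student_id))

    def delete(self, student_id: int) -> None:
        with self.conn:
            self.conn.execute("DELETE FROM Student WHERE StudentID = ?", (student_id,))
```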
Create a class called CustomDBSyncProvider that derives from the KnowledgeSyncProvider abstract class and implements the IChangeDataRetriever, INotifyingChangeApplierTarget and IDisposable interfaces:
Implement the necessary methods in your custom database synchronization provider and create the UI to display the content of each of the Student tables (see the code download for this article for details).
Now, create three instances of the custom synchronization provider and attach them to each of the Student database tables. Finally, synchronize the content of one replica with another with the help of the custom synchronization provider:
As you’ve seen, Sync Framework is a simple yet comprehensive platform for seamless synchronization between offline and online data. It can synchronize data independent of the protocol and the data store in use. It can be used for simple file backup or easily extended for collaboration-based networks. You can also create custom synchronization providers to support data sources that aren’t accommodated out of the box.
Joydip Kanjilal is an independent software consultant as well as a Microsoft MVP in ASP.NET since 2007. He’s also a speaker and author of several books and articles and blogs at aspadvice.com/blogs/joydip.
Thanks to the following technical expert for reviewing this article: Liam Cavanagh