Configuring PowerPivot Workbook Data Refresh from SharePoint

This article is not about page refresh with slicers etc.  but rather Workbook DATA refresh

MSDN Articles

How to: Configure Stored Credentials for PowerPivot Data Refresh

How to: Enable and Configure Data Refresh (PowerPivot for SharePoint)

How to: Schedule PowerPivot Data Refresh

How to: Configure the PowerPivot Unattended Data Refresh Account

(The info below Posted by D.W. 20-Nov-2009 at http://blogs.msdn.com/powerpivot/archive/2009/11/20/powerpivot-data-refresh.aspx)

In summary, use the data refresh facility to keep your data up-to-date. Just setup your schedule and the system runs in the background automatically. Your data just arrives when asked for. The facility is designed as a simple, straightforward way to have end users schedule their own data refresh requests. The system runs the schedule automatically (typically ‘after business hours’) and then posts the updated workbooks back into SharePoint in the same document library where the workbooks came from.

Finally, we wouldn’t be honest if we said that the data refresh facility was the right solution for everyone. Clearly it isn’t. It is oriented to you, the end user; not to your IT colleagues. As a scheduler, it lacks many of characteristics that IT systems typically have. For example, there is no real-time or on-demand access to data; or the ability to issue frequent updates, such as hourly or every 5-10 minutes. These capabilities are important in their own way, and some systems demand them, but in the managed self-service world they come with a management overhead and complexity that is just not appropriate for our users.

What is data refresh? And why should I care? Data refresh is all about keeping your data current. It ensures that your users are making decisions based on the most current and up-to-date information possible. This is done with as little investment of your time as possible. We want to make it easy and straightforward to have your data be as current as it can be. Set it up once and it runs until you tell it to stop. Sure you could copy down the whole workbook and update the data on your local machine and re-publish, but there should be a way for you to automatically request that the data be updated for you. After all, all of the data is on the server. Why not? What about the fact that some of the corporate databases are updated weekly, some are monthly – do you have to manage that process yourself also? Why couldn’t the system update the data and then it is just there when you came into work the next day?

There are only 2 ways to update the embedded data contained in a PowerPivot workbook:

  1. Bring the workbook down to the client machine, use the PowerPivot Excel add-in and click on “refresh”, then republish the workbook back to SharePoint (which implies certain SharePoint rights to enable)
  2. Use the build-in PowerPivot data refresh facility (which also implies certain SharePoint rights)

Before we get too deep into it, let’s talk a bit about what the data refresh facility isn’t. The word ‘refresh’ and ‘data’ is used a lot in SharePoint; all over Excel and all over most IT systems. In our case, we are talking about a specific kind of data refresh. We are talking about the data that is stored inside the PowerPivot workbook. When you have the workbook open, it is the data stored in the PowerPivot Client Window. It is the thousands and thousands (possibly even millions) rows of data that you have imported into the workbook.

It is that data. . . To update that data on the server, the following two restrictions have to hold:

  1. The data has to be available to the server. This seems obvious, but sometimes its not. Files located on your local hard drive, or within the workbook itself (such as the local table) cannot be refreshed on the server. The data has to be located somewhere the server can get to it.
  2. You have to ask that the data be refreshed. You have to setup a refresh schedule. You have to indicate what data sources to be included. The system does not do the data refresh by itself – well, it does – but only if you request it.

 

Successful Configuration depends on how the user has defined a data source: using a trusted connection or a SQL login connection. So let’s take them one by one.

If using trusted Windows authentication against the data source, let’s look at this sequence of events:

  1. User A creates a PowerPivot document that access a relational DB using user A’s interactive credentials.
  2. User A publishes to SharePoint.
  3. As User A has contributor rights against the document, she can set up a data refresh schedule. As part of the schedule, she specifies what Windows credentials to use for the data refresh. The credentials are stored and encrypted in SharePoint Secure Store. This is specified once for the entire job (as this is the Windows environment that the job runs in).
  4. When data refresh runs at 2am, the data is updated using the schedule’s credentials. The Windows credentials must have contributor rights to the file because we read and write the file using the job’s Windows credentials SharePoint rights.
  5. If User B is allowed to copy down the document (which he can do if he has READ rights, but he cannot do if he has VIEW ONLY rights) and he refreshes the connection using the PowerPivot Excel add-in the DB accessed with User B’s interactive credentials.

If using SQL logins against the data source, the sequence is slightly different:

  1. User A creates a PowerPivot document that access a relational DB using the data source credentials specified in the connection – it would likely NOT be User A’s Windows uid/pwd.
  2. User A publishes to SharePoint.
  3. As User A has contributor rights against the document, she can set up a data refresh schedule. As part of the schedule, she can specify the non-Windows data source credentials to use for the data refresh. The data source credentials are stored encrypted in SharePoint Secure Store. This must be specified on a data source by data source basis. The Windows credentials specified for the schedule must have contributor rights to the file because we read and write the file using the job’s Windows cred’s SharePoint rights.
  4. When data refresh runs, the data is updated using the schedule credentials by modify the connect string to include uid/pwd.
  5. If User B is allowed to copy down the document and he refreshes the contents using the PowerPivot Excel add-in, then he may or he may not be prompted for the password. The default setting is to re-prompt for the password, but if User A wanted to, she could have embedded the non-Windows auth password in the connection and User B would re-use it also.

 

Technical Details FYI

Just before Excel Services calls PowerPivot (at the OLEDB provider level), it attempts to translate its claims token to a Windows token for establishing a “Windows environment” for PowerPivot. With CTP3, this translation requires access to the domain controller for where the account lies – and obviously that is not available as you are disconnected from the network. Ultimately the system should use cached credentials if they are available, but that does not happen in CTP3. To solve the problem, we can either: (1) install a local domain controller on the machine – this is surprising easy but a bit of an overkill for this situation. An alternate approach is to setup Excel Services (and some of the PowerPivot components) to use the unattended execution account (aka, the “NONE” authorization setting for the connection in Excel). In the rest of this posting, I’ll talk about how to do that.

 

 

How to setup a data refresh schedule

There are two ways of setting up a data refresh schedule. First, you can use the standard document library option menu: the item is called “Manage data refresh”. The other option is to use the Gallery Silverlight control and click on the “Manage data refresh” icon.

To setup the schedule, click on “Enable” and provide the following information:

  • When to run the schedule? Typically you would click on daily or weekly, but there are lots of options here. Many of the options are only displayed when the relevant time period is selected. For example if you click on “Monthly” then the following options appear.

    When should the refresh schedule run during the day? Normally you should enter “During after business hours” – which the administrator establishes as properties of the PowerPivot service application. If you don’t want to run the schedule then, you can enter in the starting time for when the job will run. This has two interesting side-effects: (1) the time specified is the ‘starting’ time – meaning that given system usage and available resources, the actual job may run later, this is just the earliest that it could run; and (2) that after a successful run, the schedule is automatically disabled. To be totally honest, the second side-effect is done on purpose to discourage users from not using ‘after business hours’. We think that that the ‘after business hours’ approach makes the most effective use of resources.

  • To whom should failure email notifications be sent to? You can enter several email addresses/users if you wish. To be notified when workbooks are refreshed successfully, use the normal SharePoint alerting facility to send emails out when new content on a document library arrives. This is normal SharePoint. This option is for failures.

Now let’s take a look at the Windows credentials to run the schedule under:

image

  • Who should the user be? There is one Windows user regardless of how many schedules are specified in this job. In this field you specify “Who” the user is that will run at 2am in the morning. It must be a Windows user. You have several options here, the end result of each one is an NT account:
    • “Connect using the credentials already stored inside the workbook” – in this case, the system will use the unattended execution account that is specified for the PowerPivot for SharePoint service application. Typically this is a low permissions account that has no trusted access to the various data sources. In this case, the data refresh facility will use the non-Windows authentication (i.e. sql logins) that have been specified on the connection string in the data sources.
    • “Connect using the following Windows user credentials” – The user can enter in a Windows username and password. We will store these credentials in the SharePoint Secure Store facility for the user. At 2am the data refresh facility will pull the credentials out and do a Windows logon using them – and the data sources can then make trusted connections.
    • “Connect using the credentials saved in Secure Store Service” – if the user has the ability to enter in their own Secure Credentials the data refresh facility will use them at 2am for the logon. If using this approach, the application ID must be a Secure Store group that includes the service account used by the PowerPivot service application (so we can read the credentials).

NOTE: These options are important because they establish the Windows environment for the job. If there are any trusted connections used in the workbook’s data sources, then this is the Windows user that will be logged on for the job; the credentials in the data source are used for non-Windows authentication..

Finally let’s take a look at the data sources contained in the workbook.

   image

Lastly you will see a list of data sources. You can select all data sources, or data source by data source within this job. For each data source you specify if you want it included in the data refresh schedule or if you would like to schedule it separately. For a data source you can specify:

  1. If you want this data source to be included in the default schedule, or you can specify a custom schedule. All of the same options for the workbook can be ran on each data source, e g. the third Tuesday of the month.
  2. For each data source you can specify if you want the data source included in the default schedule with the workbook or separately. If separately you can control which tables are updated and when. The system knows what tables have been imported from what data sources. When you select a specific data source, then only those tables that are built upon it are updated using this schedule. If you have two tables that you would like refreshed at different periods but they use the same data source, then you have to enter the same data source twice so you can schedule the two tables independently.
  3. The username and passwords that are specified for the data source can either be (1) from the embedded workbook (i.e. you entered the credentials in the client when the workbook was created), (2) custom username and password that you enter in by-hand (and we will store for you in Secure Store), or (3) they come from a Secure Store Application ID that you have created. In any case, these credentials are for non-Windows authentication. They are not used to establish a trusted connection – that is what the other Windows credentials are for up above in the dialog box.
  4. If you do not want the data source to be updated, such as if the data source is contained on your C:\ drive, then uncheck the data source and it will be ignored when the schedule runs.

Once you have established a data refresh schedule then you will find that the “Manage data refresh” option first points to the history page of the workbook. The history page shows you when the data refresh schedule(s) were run and what the outcome of the job was (i.e. success or failure).

image

Advertisements