Monday, August 9, 2010

Best Practices: Common Coding Issues When Using the SharePoint Object Model Overview

Best Practices: Common Coding Issues When Using the SharePoint Object Model
Overview
Developers who write custom code with the SharePoint object model will encounter common issues related to performance, extensibility, and scalability. This article is a resource for developers who are working to troubleshoot and improve the performance of existing SharePoint applications or who are writing new SharePoint applications. In both cases, it is important to understand how to make the SharePoint object model work efficiently, and how to apply general programming techniques (such as caching and threading) to the SharePoint platform specifically. This information can make it easier to design your applications correctly, find and fix problem areas in your code, and avoid known pitfalls of using the SharePoint object model.
The following areas reflect the most common general concerns encountered by SharePoint developers:
  • Using SharePoint data and objects efficiently
  • Performance concerns related to folders, lists, and SPQuery objects
  • Writing applications that scale to large numbers of users
  • Using Web controls and timer jobs
  • Disposing of SharePoint objects
This article addresses all but the last of these issues. For guidance on disposing of SharePoint objects, see Best Practices: Using Disposable Windows SharePoint Services Objects.
Additionally, we recommend that you read the following orientation topics as good starting points for using the SharePoint object model:
Using SharePoint Data and Objects Efficiently
Caching is one good way to improve system performance. However, you must weigh the benefits of caching against the need for thread safety. Additionally, you should not create certain SharePoint objects within event receivers because this will cause performance problems related to excessive database calls.
This section describes these common areas of concern so that you can learn ways to optimize your use of SharePoint objects and data.
Caching Data and Objects
Many developers use the Microsoft .NET Framework caching objects (for example, System.Web.Caching.Cache) to help take better advantage of memory and increase overall system performance. But many objects are not "thread safe" and caching those objects can cause applications to fail and unexpected or unrelated user errors.
Bb687949.note(en-us,office.12).gifNote:
The caching techniques discussed in this section are not the same as the custom caching options for Web content management that are discussed in Custom Caching Overview.
Caching SharePoint Objects That Are Not Thread Safe
You might try to increase performance and memory usage by caching SPListItemCollection objects that are returned from queries. In general, this is a good practice; however, the SPListItemCollection object contains an embedded SPWeb object that is not thread safe and should not be cached.
For example, assume the SPListItemCollection object is cached in a thread. As other threads try to read this object, the application can fail or behave strangely because the embedded SPWeb object is not thread safe. For more information about the SPWeb object and thread safety, see the Microsoft.SharePoint.SPWeb class.
The guidance in the following section describes how you can prevent multiple threads from attempting to read the same cached object.
Understanding the Potential Pitfalls of Thread Synchronization
You might not be aware that your code is running in a multithreaded environment (by default, Internet Information Services, or IIS, is multithreaded) or how to manage that environment. The following example shows the code some developers use to cache Microsoft.SharePoint.SPListItemCollection objects.
Bad Coding Practice
Caching an Object That Multiple Threads Might Read
C#
public void CacheData()
{
   SPListItemCollection oListItems;

   oListItems = (SPListItemCollection)Cache["ListItemCacheName"];
   if(oListItems == null)
   {
      oListItems = DoQueryToReturnItems();
      Cache.Add("ListItemCacheName", oListItems, ..);
   }
}
Although the use of the cache in this example is functionally correct, because the ASP.NET cache object is thread safe, it introduces potential performance problems. (For more information about ASP.NET caching, see the Cache class.) If the query in the preceding example takes 10 seconds to complete, many users could try to access that page simultaneously during that amount of time. In this case, all of the users would run the same query, which would attempt to update the same cache object. If that same query runs 10, 50, or 100 times, with multiple threads trying to update the same object at the same time—especially on multiprocess, hyperthreaded computers—performance problems would become especially severe.
To prevent multiple queries from accessing the same objects simultaneously, you must change the code as follows.
Applying a Lock
Checking for null
C#
private static object _lock =  new object();

public void CacheData()
{
   SPListItemCollection oListItems;

   lock(_lock)
   {
      oListItems = (SPListItemCollection)Cache["ListItemCacheName"];
      if(oListItems == null)
      {
         oListItems = DoQueryToReturnItems();
         Cache.Add("ListItemCacheName", oListItems, ..);
     }
   }
}
You can increase performance slightly by placing the lock inside the if(oListItems == null) code block. When you do this, you do not need to suspend all threads while checking to see if the data is already cached. Depending on the time it takes the query to return the data, it is still possible that more than one user might be running the query at the same time. This is especially true if you are running on multiprocessor computers. Remember that the more processors that are running and the longer the query takes to run, the more likely putting the lock in the if() code block will cause problems. If you want to make absolutely sure that another thread has not created oListItems before the current thread has a chance to work on it, you could use the following pattern.
Applying a Lock
Rechecking for null
C#
private static object _lock =  new object();

public void CacheData()
{
   SPListItemCollection oListItems;
       oListItems = (SPListItemCollection)Cache["ListItemCacheName"];
      if(oListItems == null)
      {
         lock (_lock)
         {
              //Ensure that the data was not loaded by a concurrent thread while waiting for lock.
              oListItems = (SPListItemCollection)Cache[“ListItemCacheName”];
              if (oListItems == null)
              {
                   oListItems = DoQueryToReturnItems();
                   Cache.Add("ListItemCacheName", oListItems, ..);
              }
         }
     }
}

If the cache is already populated, this last example performs as well as the initial implementation. If the cache is not populated and the system is under a light load, acquiring the lock will cause a slight performance penalty. This approach should significantly improve performance when the system is under a heavy load, because the query will be executed only once instead of multiple times, and queries are usually expensive in comparison with the cost of synchronization.
The code in these examples suspends all other threads in a critical section running in IIS, and prevents other threads from accessing the cached object until it is completely built. This addresses the thread synchronization issue; however, the code is still not correct because it is caching an object that is not thread safe.
To address thread safety, you can cache a DataTable object that is created from the SPListItemCollection object. You would modify the previous example as follows so that your code gets the data from the DataTable object.
Good Coding Practice
Caching a DataTable Object
C#
private static object _lock =  new object();

public void CacheData()
{
   DataTable oDataTable;
   SPListItemCollection oListItems;
   lock(_lock)
   {
           oDataTable = (DataTable)Cache["ListItemCacheName"];
           if(oDataTable == null)
           {
              oListItems = DoQueryToReturnItems();
              oDataTable = oListItems.GetDataTable();
              Cache.Add("ListItemCacheName", oDataTable, ..);
           }
   }
}
For more information and examples of using the DataTable object, and other good ideas for developing SharePoint applications, see the reference topic for the DataTable class.
Using Objects in Event Receivers
Do not instantiate SPWeb, SPSite, SPList, or SPListItem objects within an event receiver. Event receivers that instantiate SPSite, SPWeb, SPList, or SPListItem objects instead of using the instances passed via the event properties can cause the following problems:
  • They incur significant additional roundtrips to the database. (One write operation can result in up to five additional roundtrips in each event receiver.)
  • Calling the Update method on these instances can cause subsequent Update calls in other registered event receivers to fail.
Bad Coding Practice
Instantiating an SPSite Object Inside an Event Receiver
C#
public override void ItemDeleting(SPItemEventProperties properties)
{
    using (SPSite site = new SPSite(properties.WebUrl))

    using (SPWeb web = site.OpenWeb())
        {
        SPList list = web.Lists[properties.ListId];
        SPListItem item = list.GetItemByUniqueId(properties.ListItemId);
        // Operate on item.
        }
    }
}

Good Coding Practice
Using SPItemEventProperties
C#
// Retrieve SPWeb and SPListItem from SPItemEventProperties instead of
// from a new instance of SPSite.
SPWeb web = properties.OpenWeb();
// Operate on SPWeb object.
SPListItem item = properties.ListItem;
// Operate on item.
If you do not apply this fix in your code, when you call Update on the new instance, you must invalidate it with the Invalidate method in the appropriate child class of SPEventPropertiesBase (for example, SPItemEventProperties.InvalidateListItem or SPItemEventProperties.InvalidateWeb).
Working with Folders and Lists
When folders and lists grow in size, custom code that works with them needs to be designed in ways that optimize performance. Otherwise, your applications will run slowly and even cause timeouts to occur. The following recommendations for addressing performance concerns while working with large folders and lists are based on the test results reported in Steve Peschka's white paper, Working with Large Lists in Office SharePoint Server 2007.
  1. Do not use SPList.Items.

    SPList.Items selects all items from all subfolders, including all fields in the list. Use the following alternatives for each use case.
    • Retrieving all items in a list

      Use
      SPList.GetItems(SPQuery query) instead. Apply filters, if appropriate, and specify only the fields you need to make the query more efficient. If the list contains more than 2,000 items, you will need to paginate the list in increments of no more than 2,000 items. The following code example shows how to paginate a large list.

      Good Coding Practice

      Retrieving Items with SPList.GetItems
C#
SPQuery query = new SPQuery();
SPListItemCollection spListItems ; 
string lastItemIdOnPage = null; // Page position.
int itemCount = 2000

while (itemCount == 2000)
{
    // Include only the fields you will use.
    query.ViewFields = "";  
    query.RowLimit = 2000; // Only select the top 2000.
    // Include items in subfolder (if necessary).
    query.ViewAttributes = "Scope=\"Recursive\"";
    StringBuilder sb = new StringBuilder();
    // To make it order by ID and stop scanning the table, specify the OrderBy override attribute.
    sb.Append("");
    //.. Append more text as necessary ..
    query.Query = sb.ToString();
    // Get 2,000 more items.

    SPListItemCollectionPosition pos = new SPListItemCollectionPosition(lastItemIdOnPage);
    query.ListItemCollectionPosition = pos; //page info
    spListItems = spList.GetItems(query);
    lastItemIdOnPage = spListItems.ListItemCollectionPosition.PagingInfo;
    // Code to enumerate the spListItems.
    // If itemCount <2000, we finish the enumeration.
    itemCount = spListItems.Count;

}

The following example shows how to enumerate and paginate a large list.
C#
SPWeb oWebsite = SPContext.Current.Web;
SPList oList = oWebsite.Lists["Announcements"];

SPQuery oQuery = new SPQuery();
oQuery.RowLimit = 10;
int intIndex = 1;

do
{
    Response.Write("
Page: "
+ intIndex + "
"
);
    SPListItemCollection collListItems = oList.GetItems(oQuery);

    foreach (SPListItem oListItem in collListItems)
    {
        Response.Write(SPEncode.HtmlEncode(oListItem["Title"].ToString()) +"
"
);
    }

    oQuery.ListItemCollectionPosition = collListItems.ListItemCollectionPosition;
    intIndex++;
} while (oQuery.ListItemCollectionPosition != null);

    • Getting items by identifier

      Instead of using
      SPList.Items.GetItemById, use SPList.GetItemById(int id, string field1, params string[] fields). Specify the item identifier and the field that you want.
  1. Do not enumerate entire SPList.Items collections or SPFolder.Files collections.

    Accessing the methods and properties that are listed in the left column of the following table will enumerate the entire
    SPList.Items collection, and cause poor performance and throttling for large lists. Instead, use the alternatives listed in the right column.
Table 1. Alternatives to SPList.Items
Poor Performing Methods and Properties
SPList.Items.Count
SPList.Items.XmlDataSchema
SPList.Items.NumberOfFields
SPList.Items[System.Guid]
SPList.Items[System.Int32]
SPList.Items.GetItemById(System.Int32)
SPList.Items.ReorderItems(System.Boolean[],System.Int32[],System.Int32)
SPFolder.Files.Count

Better Performing Alternatives
SPList.ItemCount
Create an SPQuery object to retrieve only the items you want.
Create an SPQuery object (specifying the ViewFields) to retrieve only the items you want.
SPList.GetItemByUniqueId(System.Guid)
SPList.GetItemById(System.Int32)
SPList.GetItemById(System.Int32)
Perform a paged query by using SPQuery and reorder the items within each page.
SPFolder.ItemCount

Bb687949.note(en-us,office.12).gifNote:
The SPList.ItemCount property is the recommended way to retrieve the number of items in a list. As a side effect of tuning this property for performance, however, the property can occasionally return unexpected results. If the precise number is required, you should use the poorer performing GetItems(SPQuery query), as shown in the preceding code example.
  1. Use PortalSiteMapProvider (Microsoft Office SharePoint Server 2007 only).

    Steve Peschka's white paper Working with Large Lists in Office SharePoint Server 2007 describes an efficient approach to retrieving list data in Office SharePoint Server 2007 by using the
    PortalSiteMapProvider class. PortalSiteMapProvider provides an automatic caching infrastructure for retrieving list data. The GetCachedListItemsByQuery method of PortalSiteMapProvider takes an SPQuery object as a parameter, and then checks its cache to determine whether the items already exist. If they do, the method returns the cached results. If not, it queries the list and stores the results in a cache. This approach works especially well when you are retrieving list data that does not change significantly over time. When data sets change frequently, the class incurs the performance cost of continually writing to the cache in addition to the costs of reading from the database. Consider that the PortalSiteMapProvider class uses the site collection object cache to store data. This cache has a default size of 100 MB. You can increase the size of this cache for each site collection on the object cache settings page for the site collection. But this memory is taken from the shared memory available to the application pool and can therefore affect the performance of other applications. Another significant limitation is that you cannot use the PortalSiteMapProvider class in applications based on Windows Forms. The following code example shows how to use this method.

    Good Coding Practice

    Using PortalSiteMap Provider
C#
// Get the current SPWeb object.
SPWeb curWeb = SPControl.GetContextWeb(HttpContext.Current);

// Create the query.
SPQuery curQry = new SPQuery();
curQry.Query = "'Expense_x0020_Category'/>
'Text'>Hotel";

// Create an instance of PortalSiteMapProvider.
PortalSiteMapProvider ps = PortalSiteMapProvider.WebSiteMapProvider;
PortalWebSiteMapNode pNode = ps.FindSiteMapNode(curWeb.ServerRelativeUrl) as PortalWebSiteMapNode;

// Retrieve the items.

SiteMapNodeCollection pItems = ps.GetCachedListItemsByQuery(pNode, "myListName_NotID", curQry, curWeb);

// Enumerate through all of the matches.
foreach (PortalListItemSiteMapNode pItem in pItems)
   {
   // Do something with each match.
   }

  1. Whenever possible, acquire a reference to a list by using the list's GUID or URL as a key.

    You can retrieve an
    SPList object from the SPWeb.Lists property by using the list's GUID or display name as an indexer. Using SPWeb.Lists[GUID] and SPWeb.GetList(strURL) is always preferable to using SPWeb.Lists[strDisplayName]. Using the GUID is preferable because it is unique, permanent, and requires only a single database lookup. The display name indexer retrieves the names of all the lists in the site and then does a string comparison with them. If you have a list URL instead of a GUID, you can use the
GetList
method to look up the list's GUID in the content database before retrieving the list.
Writing Applications That Scale to Large Numbers of Users
You might not be aware that you need to write your code to be scalable so that it can handle multiple users simultaneously. A good example is creating custom navigation information for all sites and subsites on each page or as part of a master page. If you have a SharePoint site on a corporate intranet and each department has its own site with many subsites, your code might resemble the following.
C#
public void GetNavigationInfoForAllSitesAndWebs()
{
   foreach(SPSite oSPSite in SPContext.Current.Site.WebApplication.Sites)
   {
      try
      {
         SPWeb oSPWeb  = oSPSite.RootWeb;
         AddAllWebs(oSPWeb );
      }
      finally
      {
         oSPSite.Dispose();
      }
   }
}
C#

public void AddAllWebs(SPWeb oSPWeb)
{
   foreach(SPWeb oSubWeb in oSPWeb.Webs)
   {
       try
       {
           //.. Code to add items ..
           AddAllWebs(oSubWeb);
       }
       finally
       {
            if (oSubWeb != null)
            oSubWeb.Dispose();
       }
   }
}
While the previous code disposes of objects properly, it still causes problems because the code iterates through the same lists over and over. For example, if you have 10 site collections and an average of 20 sites or subsites per site collection, you would iterate through the same code 200 times. For a small number of users this might not cause bad performance. But, as you add more users to the system, the problem gets worse, as shown in Table 2.
Table 2. Iterations increase as number of users increases
Users
Iterations
10
2000
50
10000
100
200000
250
500000
Although the code executes for each user that hits the system, the data remains the same for each user. The impact of this can vary depending on what the code is doing. In some cases, repeating code might not cause a performance problem; however, in the previous example the system must create a COM object (SPSite or SPWeb objects are created when retrieved from their collections), retrieve data from the object, and then dispose of the object for each item in the collection. This can have a significant impact on performance.
How to make this code more scalable or fine-tune it for a multiple user environment can be a hard question to answer. It depends on what the application is designed to do.
You should take the following into consideration when asking how to make code more scalable:
  • Is the data static (seldom changes), somewhat static (changes occasionally), or dynamic (changes constantly)?
  • Is the data the same for all users, or does it change? For example, does it change depending on the user who is logged on, the part of the site being accessed, or the time of year (seasonal information)?
  • Is the data easily accessible or does it require a long time to return the data? For example, is it returning from a long-running database query or from remote databases that can have some network latency in the data transfers?
  • Is the data public or does it require a higher level of security?
  • What is the size of the data?
  • Is the SharePoint site on a single server or on a server farm?
How you answer the previous questions will determine in which of several ways you can make your code more scalable and handle multiple users. The intent of this article is not to provide answers for all of the questions or scenarios but to provide a few ideas that you can apply to your specific needs. The following sections offer areas for your consideration.
Caching Raw Data
You can cache your data by using the System.Web.Caching.Cache object. This object requires that you query the data one time and store it in the cache for access by other users.
If your data is static, you can set up the cache to load the data only one time and not expire until the application is restarted, or to load the data once per day to ensure data freshness. You can create the cache item when the application starts, when the first user session starts, or when the first user tries to access that data.
If your data is somewhat static, you can set up the cached items to expire within a certain number of seconds, minutes, or hours after the data is created. This enables you to refresh your data within a timeframe that is acceptable to your users. Even if the data is cached for only 30 seconds, under heavy loads you will still see improved performance because you are running the code only one time every 30 seconds instead of multiple times per second for each user who hits the system.
Security trimming is another issue to consider whenever you cache data. For example, if you cache items as you iterate through a list, you may get only a subset of the data (the data that the current user can see), or if you use a DataTable object to cache all of the items in a list, you have no easy way of applying security trimming to users who belong to groups that can see only a subset of the data. For more information about storing security trimmed data in caches, see the CrossListQueryCache class.
In addition, ensure you consider the issues described earlier in Caching Data and Objects.
Building Data Before Displaying It
Think about how your cached data will be used. If this data is used to make run-time decisions, putting it into a DataSet or DataTable object might be the best way to store it. You can then query those objects for the data to make run-time decisions. If the data is being used to display a list, table, or formatted page to the user, consider building a display object and storing that object in the cache. At run time, you need only retrieve the object from the cache and call its render function to display its contents. You could also store the rendered output; however, this can lead to security issues and the cached item could be quite large, causing increased page swapping or memory fragmentation.
Caching For a Single Server or Server Farm
Depending on how you set up your SharePoint site, you might have to address certain caching issues differently. If your data must be the same on all servers at all times, then you must ensure that the same data is cached on each server.
One way to ensure this is to create the cached data and store it on a common server or in a Microsoft SQL Server database. Again, you must consider how much time it takes to access the data and what security issues can arise from storing the data on a common server.
You can also create business-layer objects that cache data on a common sever, and then access that data by using various interprocess communications that are available in networking objects or APIs.
Using SPQuery Objects
SPQuery objects can cause performance problems whenever they return large result sets. The following suggestions will help you optimize your code so that performance will not suffer greatly whenever your searches return large numbers of items.
  • Do not use an unbounded SPQuery object.

    An
    SPQuery object without a value for RowLimit will perform poorly and fail on large lists. Specify a RowLimit between 1 and 2000 and, if necessary, page through the list.
  • Use indexed fields.

    If you query on a field that is not indexed, the query will be blocked whenever it would result in a scan of more items than the query threshold (as soon as there are more items in the list than are specified in the query threshold). Set
    SPQuery.RowLimit to a value that is less than the query threshold.
  • If you know the URL of your list item and want to query by FileRef, use SPWeb.GetListItem(string strUrl, string field1, params string[] fields) instead.

No comments:

Post a Comment

Sharepoint