Friday, April 25, 2014

Database performance optimization (through Indexing)


Non-Clustered Indexes

In non-clustered index:
  1. The physical order of the rows is not the same as the index order. 
  2. Typically created on column used in JOIN, WHERE, and ORDER BY clauses. 
  3. Good for tables whose values may be modified frequently. 
  4. Microsoft SQL Server creates non-clustered indices by default when CREATE INDEX command is given. Maximum number of non-clustered indexes is 249 per table in SQL Server 2005 and 999 non-clustered indexes per table in SQL Server 2008. 
  5. It is appropriate to create non-clustered index on columns which are: 
    • Frequently used in the search criteria. 
    • Used to join other tables. 
    • Used as foreign key fields. 
    • Of having high selectivity (column which returns a low percentage (0-5%) of rows from a total number of rows on a particular value). 
    • Used in the ORDER BY clause. 
    • Of type XML (primary and secondary indexes need to be created; more on this in the coming articles). 

Clustered Indexes

Clustering alters the data block into a certain distinct order to match the index, resulting in the row data being stored in order. Therefore, only one clustered index can be created on a given database table. Clustered indices can greatly increase overall speed of retrieval, but usually only where the data is accessed sequentially in the same or reverse order of the clustered index, or when a range of items is selected.

Best practices for creating indexes

Each index you define can improve performance of a SELECT query, but on the other hand can decrease performance of INSERT and UPDATE queries. The reason for this is that SQL Server automatically maintains index keys. Therefore, each time you issue a data modification statement, not only data modification SQL Server provides, but also index it updates each index defined on affected table. The performance degradation is noticeable in case of large tables with many indexes., or few indexes with long keys. In some cases it is suitable to drop index before updating or inserting new records and than recreate index. This is in case of large tables where you need to update or insert a large amount of data. You will find out that insert into table without indexes will be much more faster than into table with indexes.

There are several tips you should keep in mind when implementing indexing.

Keep indexes lean. Try to build indexes on one or few columns at most. Wide indexes take longer to scan than narrow indexes.

Create the clustered index on every table. However, choose the column(s) for the clustered index judiciously. Try to create the clustered index on the column which is used most frequently for retrieving data.

Try to create the clustered index on the column with high selectivity; that is the column that does not have many duplicate values.

Try to create the clustered index on the column that will never be updated or will be updated infrequently. Every time the clustered index key is updated, SQL Server must maintain not just the clustered index but also the non-clustered indexes since non-clustered indexes contain a pointer to the clustered index. This is yet another reason why you shouldn't create the clustered index on multiple columns.

By default, SQL Server creates the clustered index on the PRIMARY KEY column(s) unless the clustered index is created prior to creating the primary key. It is often beneficial to have the clustered index on the primary key, but sometimes you're better off saving the clustered index for other column(s). Feel free to override the default behaviour if your testing shows that clustered index on a non-key column will help your queries perform better.

SQL Server has to perform some maintenance when index keys are updated, so each additional index could impose a slight performance penalty on the performance of INSERT, UPDATE and DELETE statements. Nevertheless the cost of index maintenance could be far less than the performance improvement you'll reap for your SELECT statements. Therefore you can be fairly liberal with the number of non-clustered indexes.

Be sure to eliminate duplicate indexes, that is, multiple indexes created on the same set of columns. Such indexes provide no additional benefit but use up extra disk space and might hinder performance of INSERT, UPDATE and DELETE statements.

Check the default fill factor level configured at the server level. If you don't specify the FILLFACTOR option with CREATE INDEX statement, every new index will be created using the default fill factor. This may or may not be what you intend.

Non-clustered indexes can be created in different file groups which can reside on separate disk drives to improve the data access i.e. I/O operations.

In a couple of articles I have found that the identity column is the best choice for clustered index, but you must keep in mind that this way shouldn't be the best way because it force users to enter new data into the last data page of the table. This condition is sometimes referred to as a “hotspot” since there may be multiple users competing for the last available spot on a page and therefore making INSERT statements slow.

In some cases you can find tables that are never queried based on one column. In this case some developers prefer to create clustered index on set of columns that are most frequently used for data retrieving and uniquely identifies each record. This types of indexes are called composite clustered indexes. However, from a performance view you should avoid composite clustered indexes. Generally speaking, the leaner index, the faster SQL Server can scan or seek through it. For small tables (or data sets) composite indexes perform relatively well, but as number of records grows, performance decreases.

How to choose appropriate order of fields on each index

It is very important to choose appropriate order of fields on each index. Bad order causes that index could be unused. First rule is that the most selective columns should go first. This rule sometimes can lead to misunderstanding that every index should contains most selective column as a leading column. Let’s consider you have table with 3 columns: ID, fname and lname. ID is most selective column with clustered index. Now you want to create non-clustered index for fname and lname. lname has higher selectivity than fname. If you want to create non-clustered index for the rest two columns, place lname on the first and fname on the second. Don’t place ID column on index. Be very careful which column you place on the first. This is because of the SQL server keeps histogram only for first column of an index. That means, that SQL Server knows only the actual distribution of values of the first column. If the first column is not selective, the index may not be used.

Example

Table with no indexes

Select * from Emp where LastName=’Ahmad’
Estimated time =

























Friday, April 11, 2014

Yield statement

Yield keyword is used in an iterator block to calculate the value for the enumerator object.


Can be understand by an example


1. class Program
2. {
3.        static void Main(string[] args)
4.        {
5.            foreach (int i in GetInt())
6.                Console.WriteLine("Got " + i.ToString());
7.        }
8.        public static IEnumerable<int> GetInt()
9.        {
10.            for (int i = 0; i < 5; i++)
{
11.               yield return i;

if(i==3)
yield break;

}
12.        }
}



As soon as control reach at line no 11 for i=0. It returns the value of i to caller.
And control reached to line no 5 again thereafter print its value.


Again call the same function GetInt but this time loop does not start from starting it start from where we have leave it i.e for i=1 and again when it reach to line no 11 it returns the value of i to caller. This process repeat until the GetInt() loop ends.


So output will be


0
1
2
3
4


This output we can without using yield statement but main point is that while using yield no need to iterate the whole loop if not required.



Friday, April 4, 2014

Difference between IEnumerable and IEnumerator

IEnumerable is an interface that has the definition one method named GetEnumerator which return an object of type IEnumerator.


public class OldEnumerable : IEnumerable
{
  public IEnumerator GetEnumerator()
  {
      return new OldEnumerator();
  }
}



Now let’s look at OldEnumerator


public class OldEnumerator : IEnumerator
{
  int x = 0;


  public object Current
  {
      get { return x; }
  }


  public bool MoveNext()
  {
      return x++ < 100;
  }


  public void Reset()
  {
      x = 0;
  }
}

Followers

Link