Architecture

SQL Server Architecture


Components of the SQL Server Engine

Figure 1-1 shows the general architecture of SQL Server, which has four major components (three of whose subcomponents are listed): protocols, the relational engine (also called the Query Processor), the storage engine and the SQLOS. Every batch submitted to SQL Server for execution, from any client application, must interact with these four components. (For simplicity, I’ve made some minor omissions and simplifications and ignored certain “helper” modules among the subcomponents.)




Figure 1-1: The major components of the SQL Server database engine
The protocol layer receives the request and translates it into a form that the relational engine can work with, and it also takes the final results of any queries, status messages, or error messages and translates them into a form the client can understand before sending them back to the client. The relational engine layer accepts SQL batches and determines what to do with them. For Transact-SQL queries and programming constructs, it parses, compiles, and optimizes the request and oversees the process of executing the batch. As the batch is executed, if data is needed, a request for that data is passed to the storage engine. The storage engine manages all data access, both through transaction-based commands and bulk operations such as backup, bulk insert, and certain DBCC (Database Consistency Checker) commands. The SQLOS layer handles activities that are normally considered to be operating system responsibilities, such as thread management (scheduling), synchronization primitives, deadlock detection, and memory management, including the buffer pool.

Protocols

When an application communicates with the SQL Server Database Engine, the application programming interfaces (APIs) exposed by the protocol layer format the communication using a Microsoft-defined format called a tabular data stream (TDS) packet. There are Net-Libraries on both the server and client computers that encapsulate the TDS packet inside a standard communication protocol, such as TCP/IP or Named Pipes. On the server side of the communication, the Net-Libraries are part of the Database Engine, and that protocol layer is illustrated in Figure 1-1. On the client side, the Net-Libraries are part of the SQL Native Client. The configuration of the client and the instance of SQL Server determine which protocol is used.
SQL Server can be configured to support multiple protocols simultaneously, coming from different clients. Each client connects to SQL Server with a single protocol. If the client program does not know which protocols SQL Server is listening on, you can configure the client to attempt multiple protocols sequentially. 

The following four protocols are available for establishing communication between the client and the server:
1.  Shared Memory
2.  Named Pipes
3.  TCP/IP
4.  VIA
·    Shared Memory   The simplest protocol to use, with no configurable settings. Clients using the Shared Memory protocol can connect only to a SQL Server instance running on the same computer, so this protocol is not useful for most database activity. Use this protocol for troubleshooting when you suspect that the other protocols are configured incorrectly. Clients using MDAC 2.8 or earlier cannot use the Shared Memory protocol. If such a connection is attempted, the client is switched to the Named Pipes protocol.
·    Named Pipes   A protocol developed for local area networks (LANs). A portion of memory is used by one process to pass information to another process, so that the output of one is the input of the other. The second process can be local (on the same computer as the first) or remote (on a networked computer).
·    TCP/IP   The most widely used protocol over the Internet. TCP/IP can communicate across interconnected networks of computers with diverse hardware architectures and operating systems. It includes standards for routing network traffic and offers advanced security features. Enabling SQL Server to use TCP/IP requires the most configuration effort, but most networked computers are already properly configured.
·    Virtual Interface Adapter (VIA)   A protocol that works with VIA hardware. This is a specialized protocol; configuration details are available from your hardware vendor.

Tabular Data Stream Endpoints

SQL Server 2005 also introduces a new concept for defining SQL Server connections: the connection is represented on the server end by a TDS endpoint. During setup, SQL Server creates an endpoint for each of the four Net-Library protocols supported by SQL Server, and if the protocol is enabled, all users have access to it. For disabled protocols, the endpoint still exists but cannot be used. An additional endpoint is created for the dedicated administrator connection (DAC), which can be used only by members of the sysadmin fixed server role.
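
To see the endpoints defined on an instance, you can query the sys.endpoints catalog view (available from SQL Server 2005 onward). A minimal sketch:

SELECT name, protocol_desc, type_desc, state_desc
FROM sys.endpoints;
-- The DAC endpoint appears here too, typically named 'Dedicated Admin Connection'.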

The Relational Engine

As mentioned earlier, the relational engine is also called the query processor. It includes the components of SQL Server that determine exactly what your query needs to do and the best way to do it. By far the most complex component of the query processor, and maybe even of the entire SQL Server product, is the query optimizer, which determines the best execution plan for the queries in the batch.
The relational engine also manages the execution of queries as it requests data from the storage engine and processes the results returned. Communication between the relational engine and the storage engine is generally in terms of OLE DB row sets. (Row set is the OLE DB term for a result set.) The storage engine comprises the components needed to actually access and modify data on disk.

The Command Parser

The command parser handles Transact-SQL language events sent to SQL Server. It checks for proper syntax and translates Transact-SQL commands into an internal format that can be operated on. This internal format is known as a query tree. If the parser doesn’t recognize the syntax, a syntax error is immediately raised that identifies where the error occurred. However, non-syntax error messages cannot be explicit about the exact source line that caused the error. Because only the command parser can access the source of the statement, the statement is no longer available in source format when the command is actually executed.
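
For example, a malformed statement is rejected at parse time with an error pointing at the offending token (the exact message text can vary by version):

SELECT FROM sys.objects;
-- Msg 156, Level 15, State 1, Line 1
-- Incorrect syntax near the keyword 'FROM'.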

The Query Optimizer

The query optimizer takes the query tree from the command parser and prepares it for execution. Statements that can’t be optimized, such as flow-of-control and DDL commands, are compiled into an internal form. The statements that are optimizable are marked as such and then passed to the optimizer. The optimizer is mainly concerned with the DML statements SELECT, INSERT, UPDATE, and DELETE, which can be processed in more than one way, and it is the optimizer’s job to determine which of the many possible ways is the best. It compiles an entire command batch, optimizes queries that are optimizable, and checks security. The query optimization and compilation result in an execution plan.
The first step in producing such a plan is to normalize each query, which potentially breaks down a single query into multiple, fine-grained queries. After the optimizer normalizes a query, it optimizes it, which means it determines a plan for executing that query. Query optimization is cost based; the optimizer chooses the plan that it determines would cost the least based on internal metrics that include estimated memory requirements, CPU utilization, and number of required I/Os. The optimizer considers the type of statement requested, checks the amount of data in the various tables affected, looks at the indexes available for each table, and then looks at a sampling of the data values kept for each index or column referenced in the query. The sampling of the data values is called distribution statistics. Based on the available information, the optimizer considers the various access methods and processing strategies it could use to resolve a query and chooses the most cost-effective plan.
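
You can inspect the distribution statistics the optimizer works from with DBCC SHOW_STATISTICS. A sketch, assuming an AdventureWorks-style table and index (the names are just examples):

DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', 'IX_SalesOrderDetail_ProductID');
-- Returns the statistics header, density information, and the histogram
-- of sampled data values that the optimizer uses to cost candidate plans.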

The SQL Manager

The SQL manager is responsible for everything related to managing stored procedures and their plans. It determines when a stored procedure needs recompilation, and it manages the caching of procedure plans so that other processes can reuse them.
The SQL manager also handles auto parameterization of queries. In SQL Server 2008, certain kinds of ad-hoc queries are treated as if they were parameterized stored procedures, and query plans are generated and saved for them. SQL Server can save and reuse plans in several other ways, but in some situations using a saved plan might not be a good idea.
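
One way to observe plan caching and reuse, including plans saved for auto-parameterized queries, is through the plan cache DMVs introduced in SQL Server 2005. A minimal sketch:

SELECT cp.objtype, cp.usecounts, st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE cp.objtype IN ('Prepared', 'Proc')   -- parameterized and procedure plans
ORDER BY cp.usecounts DESC;                -- usecounts shows how often a plan is reused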

The Database Manager

The database manager handles access to the metadata needed for query compilation and optimization, making it clear that none of these separate modules can be run completely separately from the others. The metadata is stored as data and is managed by the storage engine, but metadata elements such as the data types of columns and the available indexes on a table must be available during the query compilation and optimization phase, before actual query execution starts.

The Query Executor

The query executor runs the execution plan that the optimizer produced, acting as a dispatcher for all the commands in the execution plan. This module steps through each command of the execution plan until the batch is complete. Most of the commands require interaction with the storage engine to modify or retrieve data and to manage transactions and locking.

The Storage Engine

The SQL Server storage engine has traditionally been considered to include all the components involved with the actual processing of data in your database. SQL Server 2005 separates out some of these components into a module called the SQLOS. In fact, the SQL Server storage engine team at Microsoft actually encompasses three areas: access methods, transaction management, and the SQLOS.

Transaction Services

A core feature of SQL Server is its ability to ensure that transactions are atomic–that is, all or nothing. In addition, transactions must be durable, which means that if a transaction has been committed, it must be recoverable by SQL Server no matter what–even if a total system failure occurs 1 millisecond after the commit was acknowledged. There are actually four properties that transactions must adhere to, called the ACID properties: atomicity, consistency, isolation, and durability.
Locking Operations   Locking is a crucial function of a multi-user database system such as SQL Server, even if you are operating primarily in the snapshot isolation level with optimistic concurrency. SQL Server lets you manage multiple users simultaneously and ensures that the transactions observe the properties of the chosen isolation level. Even though readers will not block writers and writers will not block readers in snapshot isolation, writers do acquire locks and can still block other writers, and if two writers try to change the same data concurrently, a conflict will occur that must be resolved. The locking code acquires and releases various types of locks, such as share locks for reading, exclusive locks for writing, intent locks taken at a higher granularity to signal a potential “plan” to perform some operation, and extent locks for space allocation. It manages compatibility between the lock types, resolves deadlocks, and escalates locks if needed. The locking code controls table, page, and row locks as well as system data locks.
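
The locks currently held or requested on an instance can be observed through the sys.dm_tran_locks DMV (SQL Server 2005 onward). A minimal sketch:

SELECT request_session_id, resource_type, request_mode, request_status
FROM sys.dm_tran_locks
ORDER BY request_session_id;
-- request_mode shows the lock types described above: S (share), X (exclusive),
-- IX (intent exclusive), and so on.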

The SQLOS

Earlier versions of SQL Server have a thin layer of interfaces between the storage engine and the actual operating system through which SQL Server makes calls to the OS for memory allocation, scheduler resources, thread and worker management, and synchronization objects. However, the services in SQL Server that needed to access these interfaces can be in any part of the engine. SQL Server requirements for managing memory, schedulers, synchronization objects, and so forth have become more complex. Rather than each part of the engine growing to support the increased functionality, all services in SQL Server that need this OS access have been grouped together into a single functional unit called the SQLOS. In general, the SQLOS is like an operating system inside SQL Server. It provides memory management, scheduling, I/O management, a framework for locking and transaction management, deadlock detection, and general utilities for dumping, exception handling, and so on.
Another member of the product team described the SQLOS to me as a set of data structures and APIs that could potentially be needed by operations running at any layer of the engine. For example, consider various operations that require use of memory. SQL Server doesn’t just need memory when it reads in data pages through the storage engine; it also needs memory to hold query plans developed in the query processor layer. Figure 1-1 (shown earlier) depicts the SQLOS layer in several parts, but this is just a way of showing that many SQL Server components use SQLOS functionality.
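
Much of the SQLOS's internal state is visible through the sys.dm_os_* DMVs. As a sketch of how you might look at its schedulers and memory consumers (the column names shown are the SQL Server 2005/2008 ones):

-- One scheduler per CPU that the SQLOS manages (ids of 255 and above are internal):
SELECT scheduler_id, current_tasks_count, runnable_tasks_count
FROM sys.dm_os_schedulers
WHERE scheduler_id < 255;

-- Memory clerks show which components are consuming SQLOS-managed memory:
SELECT TOP 10 type, SUM(single_pages_kb + multi_pages_kb) AS memory_kb
FROM sys.dm_os_memory_clerks
GROUP BY type
ORDER BY memory_kb DESC;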


A Basic SELECT Statement Life Cycle Summary

Figure 1-5 shows the whole life cycle of a SELECT query, described here:
1. The SQL Server Network Interface (SNI) on the client established a connection to the SNI on the SQL Server using a network protocol such as TCP/IP. It then created a connection to a TDS endpoint over the TCP/IP connection and sent the SELECT statement to SQL Server as a TDS message.
2. The SNI on the SQL Server unpacked the TDS message, read the SELECT statement, and passed a “SQL Command” to the Command Parser.
3. The Command Parser checked the plan cache in the buffer pool for an existing, usable query plan. When it didn’t find one, it created a query tree based on the SELECT statement and passed it to the Optimizer to generate a query plan.
4. The Optimizer generated a “zero cost” or “trivial” plan in the pre-optimization phase because the statement was so simple. The query plan created was then passed to the Query Executor for execution.
5. At execution time, the Query Executor determined that data needed to be read to complete the query plan so it passed the request to the Access Methods in the Storage Engine via an OLE DB interface.
6. The Access Methods needed to read a page from the database to complete the request from the Query Executor and asked the Buffer Manager to provision the data page.
7. The Buffer Manager checked the data cache to see if it already had the page in cache. It wasn’t in cache so it pulled the page from disk, put it in cache, and passed it back to the Access Methods.

8. Finally, the Access Methods passed the result set back to the Relational Engine to send to the client.




Database Architecture
SQL Server
Microsoft® SQL Server™ data is stored in databases. The data in a database is organized into logical components, such as tables, views, procedures, and users, that are visible to users. A database is also physically implemented as two or more files on disk.

The essential part of SQL Server is the Database Engine, which has two parts:
1. Relational Engine
2. Storage Engine

Relational Engine:
It prepares the query execution plan; once the plan is prepared, it is handed to the storage engine for execution.
In the relational engine we have the query parser, which checks for syntax errors. After that, the query passes to the compiler, where it is converted into a lower-level internal form, and then it goes to the query governor, which prepares the plan that determines the order in which the query's operations are performed. We call this the execution plan.
                  Query Parser
                        ↓
                    Compiler
                        ↓
                Query Governor

Storage Engine:
This is where the actual execution takes place.

When using a database, you work primarily with the logical components such as tables, views, procedures, and users. The physical implementation of files is largely transparent. Typically, only the database administrator needs to work with the physical implementation.



Each instance of SQL Server has four system databases (master, model, tempdb, and msdb) and one or more user databases. Some organizations have only one user database, containing all the data for their organization. Some organizations have different databases for each group in their organization, and sometimes a dedicated database for a single application. For example, an organization could have one database for sales, one for payroll, one for a document management application, and so on. Some applications use only one database; other applications may access several databases.



It is not necessary to run multiple copies of the SQL Server database engine to allow multiple users to access the databases on a server. An instance of the SQL Server Standard or Enterprise Edition is capable of handling thousands of users working in multiple databases at the same time. Each instance of SQL Server makes all databases in the instance available to all users that connect to the instance, subject to the defined security permissions.
When connecting to an instance of SQL Server, your connection is associated with a particular database on the server. This database is called the current database. You are usually connected to a database defined as your default database by the system administrator, although you can use connection options in the database APIs to specify another database. You can switch from one database to another using either the Transact-SQL USE database_name statement, or an API function that changes your current database context.
SQL Server allows you to detach databases from an instance of SQL Server, then reattach them to another instance, or even attach the database back to the same instance. If you have a SQL Server database file, you can tell SQL Server when you connect to attach that database file with a specific database name.
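
A sketch of both techniques; the database name and file path here are only examples:

USE AdventureWorks;    -- switch the current database context
GO
USE master;            -- move off the database before detaching it
GO
EXEC sp_detach_db @dbname = N'AdventureWorks';
-- Reattach the file, on this or another instance:
CREATE DATABASE AdventureWorks
ON (FILENAME = N'C:\Data\AdventureWorks_Data.mdf')
FOR ATTACH;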

Describe in brief databases and the SQL Server database architecture.

A database is a structured collection of data.
It can be thought of as a simple data file, but it is more powerful than a data file because it stores data in an organized way.
A database organizes data into tables; it is made up of several tables with rows and columns.

SQL Database Architecture
A SQL Server database is divided into logical and physical components.
Every database is logically a collection of objects and physically a collection of files.

Logical Component
The logical components comprise the following:
Database objects,
Collation,
Logins, users, roles, and groups.

Physical component
Physically, a database exists as two or more files on disk.
These files can be of three types.

Primary data files
This is the starting point of the database and contains pointers to the other files in the database. The extension is typically .mdf.

Secondary data files
These files contain data other than what is in the primary data file. The extension is typically .ndf.

Log files
These contain the log information that is used to recover the database. The extension is typically .ldf.
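
A sketch of creating a database with all three file types; the names and paths are illustrative only:

CREATE DATABASE SalesDB
ON PRIMARY
    (NAME = SalesDB_data,  FILENAME = N'C:\Data\SalesDB.mdf'),   -- primary data file
    (NAME = SalesDB_data2, FILENAME = N'C:\Data\SalesDB.ndf')    -- secondary data file
LOG ON
    (NAME = SalesDB_log,   FILENAME = N'C:\Data\SalesDB.ldf');   -- log file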

Log file architecture in SQL Server.

In the log file, each and every transaction is recorded with a unique number called the LSN (Log Sequence Number).
 
The log file is split into multiple parts called VLFs (Virtual Log Files):
1. Active VLF
2. Recoverable VLF
3. Reusable VLF
4. Unused VLF
 
Active VLF: A VLF whose transactions are still waiting to be delivered to another server by log shipping or replication is called an active VLF.
Recoverable VLF: Once the transactions have been delivered to the other server, that VLF becomes a recoverable VLF.
Reusable VLF: Once we take a log backup, that VLF becomes a reusable VLF.
Unused VLF: A VLF that has never been used is called an unused VLF.
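
You can list the VLFs in the current database's log with the undocumented (but widely used) DBCC LOGINFO command:

DBCC LOGINFO;
-- Returns one row per VLF. A Status of 2 means the VLF is active/in use;
-- a Status of 0 means it is reusable or unused.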
 
The default port number for TCP/IP is 1433.
The client and server port numbers must match; only then can a connection be established.

Storage Structure of SQL Server:
- By default, all data is stored in the form of records, which we call data rows.
- In SQL Server, the page is the basic storage unit. It looks like a normal book page and is 8 KB in size.
- The page header (a 96-byte header) is used to store system information about the page. This information includes the page number, page type, and object ID.

Pages:
In SQL Server, the page size is 8 KB. This means SQL Server databases have 128 pages per megabyte. Each page begins with a 96-byte header that is used to store system information about the page. This information includes the page number, page type, the amount of free space on the page, and the allocation unit ID of the object that owns the page.

Types of Pages:
1. Data Page
2. Index Page
3. IAM Page
4. DCM (Differential Change Map)
5. BCM (Bulk Changed Map)
6. Page Free Space
7. Text/Image Page
8. GAM (Global Allocation Map)


Page Architecture

Types of pages:

1. Data page
          - Consists of actual data.
2. IAM Page
          - Consists of the page numbers that are allocated to a table or index.
3. Index Page
          - Consists of index entries
          - It consists of
                   Column Name, Row Reference
4. DCM (Differential Change Map)
          - Information about extents that have changed since the last BACKUP DATABASE 
5. BCM (Bulk Changed Map)
          - Information about extents modified by bulk operations since the last BACKUP LOG statement.
6. Page Free Space
          - Consists of page numbers and the free space available on each page.
7. Text/Image Page
          - Consists of data whose data type is
                   - Text
                   - Varbinary(max)
                   - Image
          - Can contain up to 2 GB of data.
8. GAM (Global Allocation Map)
          - Consists of information about extents: how many are free and how many are used.

Note:
Log files do not contain pages; they contain a series of log records.
Data rows are put on the page serially, starting immediately after the header. A row offset table starts at the end of the page, and each row offset table contains one entry for each row on the page. Each entry records how far the first byte of the row is from the start of the page. The entries in the row offset table are in reverse sequence from the sequence of the rows on the page.
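
The page header and the row offset table can be inspected with the undocumented DBCC PAGE command. A sketch; the database name, file id 1, and page id 100 are placeholders:

DBCC TRACEON (3604);                      -- send DBCC PAGE output to the client
DBCC PAGE ('AdventureWorks', 1, 100, 1);  -- (database, file id, page id, print option)
-- Print option 1 dumps the 96-byte header, each row, and the row offset table.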



Large Row Support

Rows cannot span pages in SQL Server 2005; however, portions of the row may be moved off the row's page so that the row can actually be very large. The maximum amount of data and overhead that is contained in a single row on a page is 8,060 bytes (8 KB). However, this does not include the data stored in the Text/Image page type. In SQL Server 2005, this restriction is relaxed for tables that contain varchar, nvarchar, varbinary, or sql_variant columns. When the total row size of all fixed and variable columns in a table exceeds the 8,060-byte limitation, SQL Server dynamically moves one or more variable length columns to pages in the ROW_OVERFLOW_DATA allocation unit, starting with the column with the largest width. This is done whenever an insert or update operation increases the total size of the row beyond the 8,060-byte limit. When a column is moved to a page in the ROW_OVERFLOW_DATA allocation unit, a 24-byte pointer on the original page in the IN_ROW_DATA allocation unit is maintained. If a subsequent operation reduces the row size, SQL Server dynamically moves the columns back to the original data page. For more information, see Row-Overflow Data Exceeding 8 KB.
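
A sketch that forces row-overflow, using a hypothetical table. Each value fits in-row on its own, but together they exceed the 8,060-byte limit, so one column is pushed to the ROW_OVERFLOW_DATA allocation unit:

CREATE TABLE dbo.OverflowDemo
(
    id   int IDENTITY PRIMARY KEY,
    col1 varchar(5000),
    col2 varchar(5000)
);
INSERT INTO dbo.OverflowDemo (col1, col2)
VALUES (REPLICATE('a', 5000), REPLICATE('b', 5000));

-- The table's allocation units now include ROW_OVERFLOW_DATA:
SELECT au.type_desc, au.total_pages
FROM sys.allocation_units AS au
JOIN sys.partitions AS p ON au.container_id = p.hobt_id
WHERE p.object_id = OBJECT_ID('dbo.OverflowDemo');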

Extents
Extents are the basic unit in which space is managed. An extent is 8 physically contiguous pages, or 64 KB. This means SQL Server databases have 16 extents per megabyte.

Extents group pages of the same object together in order to improve performance.
To make its space allocation efficient, SQL Server does not allocate whole extents to tables with small amounts of data. SQL Server has two types of extents:

Uniform extents are owned by a single object; all eight pages in the extent can be used only by the owning object (consists of pages from the same object).

Mixed extents are shared by up to eight objects. Each of the eight pages in the extent can be owned by a different object (consists of pages from different objects).

A new table or index is generally allocated pages from mixed extents. When the table or index grows to the point that it has eight pages, it then switches to use uniform extents for subsequent allocations. If you create an index on an existing table that has enough rows to generate eight pages in the index, all allocations to the index are in uniform extents.



Files:
All extents are stored in files on disk; the data files (.mdf and .ndf) hold the pages and extents, while the log file (.ldf) holds log records.

Database:
Together, all the files make up a database.

File Group
A filegroup logically divides the database's files into groups.
Ex:
If there are 12 files, some storing system data and some storing user data, we can group them using filegroups.
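
A sketch of creating a filegroup and adding a file to it, continuing the illustrative SalesDB example from earlier:

ALTER DATABASE SalesDB ADD FILEGROUP Archive;
ALTER DATABASE SalesDB
ADD FILE (NAME = SalesDB_archive, FILENAME = N'D:\Data\SalesDB_archive.ndf')
TO FILEGROUP Archive;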

Storage Structure:

Records
   ↓
Pages
   ↓
Extents
   ↓
Files
   ↓
Database

In SQL Server, data is stored in either the .mdf or the .ldf file. The .mdf file stores permanent data; changes are not made to it directly.
The .ldf file stores modified data.

Updates are made in the buffer, and every action is recorded in the transaction log.

Transaction:
A transaction is a set of T-SQL statements with dependencies: whenever a change happens, it must take effect across all dependent data. A transaction must satisfy all of the ACID properties; only then do changes move from the .ldf file to the .mdf file.

Atomicity:
SQL Server executes all statements inside a transaction as a unit. There is no partial execution of a transaction: either every statement is executed in full, or none of them is.

Consistency:
Whenever any change (modification) happens on a parent table, it is automatically reflected in the dependent child tables.

Isolation:
Isolation means independence: one transaction does not depend on another, and multiple users can run transactions at the same time without interfering with one another.

Durability:
Once a transaction is committed, the change is permanent in the .mdf file.
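
A minimal sketch of these properties in action, using a hypothetical dbo.Accounts table; the two updates either both become permanent or neither does:

BEGIN TRY
    BEGIN TRANSACTION;
    UPDATE dbo.Accounts SET balance = balance - 100 WHERE id = 1;
    UPDATE dbo.Accounts SET balance = balance + 100 WHERE id = 2;
    COMMIT TRANSACTION;    -- durability: the change is hardened via the log
END TRY
BEGIN CATCH
    ROLLBACK TRANSACTION;  -- atomicity: neither update survives
END CATCH;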

Write-Ahead Logging (WAL):
Before any change goes to the .mdf file, it must be logged in the .ldf file to ensure consistency. This requirement that statements be logged in the transaction log file first is called write-ahead logging.
Once consistency is confirmed in the .ldf file, the change moves to the .mdf file. Transactions that are not committed in the .ldf file are rolled back.

Checkpoint Process in SQL Server:
Checkpoint is one of the core functions of the SQL Server engine. A checkpoint is an internal process that writes all dirty pages (modified pages) from the buffer cache to physical disk; apart from this, it also writes the log records from the log buffer to the physical file. Writing dirty pages from the buffer cache to the data file is also known as hardening the dirty pages.

The implementation of checkpoints is completely automatic, and typically the administrator does not have to worry about customizing it. Automatic execution of checkpoints largely depends on the recovery model and usage of the database.

If the database is using either full or bulk-logged recovery model, an automatic checkpoint is generated whenever the number of log records reaches the number the Database Engine estimates it can process during the time specified in the recovery interval option. 

Whenever a checkpoint runs, it scans the transactions and checks the state of the log file. If there is any committed data, it is moved to the .mdf file, and failed transactions are flushed out of the log file.
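
You can also issue a checkpoint manually, and influence how often automatic checkpoints occur through the recovery interval option (an advanced option). A sketch; the value is just an example:

CHECKPOINT;   -- force a checkpoint in the current database

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'recovery interval', 5;   -- target recovery time, in minutes
RECONFIGURE;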

Recovery Process:
Whenever SQL Server starts, it scans the log file and checks its state. This involves two processes:
1.  Roll Back
2.  Roll Forward

Committed transactions are rolled forward from the log file into the .mdf file; failed transactions are rolled back out of the .ldf file.

If checkpoints run at regular intervals, there will be little pending work, so the recovery process will be very fast. A checkpoint does not run on a fixed time interval; when it runs depends on the transactions being processed.

Lazy writer:
Most of the time it is asleep. Whenever free RAM runs low, it becomes active and clears buffer pages.

Dirty Pages:
A page that has been modified in the buffer but not yet written to the .mdf file is called a dirty page.

Whenever the lazy writer runs, it checks the buffer and the log file for committed pages. It verifies how many times each page has been used; these are called referenced pages, and the count is tracked by a reference counter.

Using this reference counter, the least-referenced pages are removed from the buffer.

Transaction Recoverability:
The transaction log file is divided into two parts: active and inactive.

The active part contains the transactions that are still being processed.
The inactive part contains the transactions that have been committed; they are moved to the inactive part of the .ldf file before they move to the .mdf file.


Even though committed transactions move to the .mdf file, a copy of those transactions is kept in the inactive portion.


/3GB, /PAE, and AWE
1) With default settings:
Each process will be assigned memory split between:
Physical Memory - user-mode - therefore max 2GB (max depending on phy mem or other apps using the same space) + Disk (Paged)

2) The same (nearly) happens with /3GB.
Each process will be assigned memory split between:
Physical Memory - user-mode - therefore max 3GB (max depending on phy mem or other apps using the same space) + Disk (Paged)

3) With /PAE and AWE
PAE/AWE-aware processes will be assigned up to 64GB of memory (depends on Windows Edition) split between:
Physical Memory - user-mode - therefore max 2GB or 3GB depending on the /3GB option (see rule of thumb below) + Physical Memory (Additional) + Disk (Paged)

Rule of Thumb

If your system has < 4 GB - use only /3GB
If your system has > 4 GB and < 16 GB - use /3GB + /PAE + AWE
If your system has > 16 GB - use /PAE + AWE

Now then, why consider the x64 architecture if we have AWE?

The use of SQL Server (32-bit) with AWE has several important limitations. The additional memory addressability is available only to the relational database engine’s management of database page buffers. It is not available to other memory consuming database operations such as caching query plans, sorting, indexing, joins, or for storing user connection information. It is also not available on other engines such as Analysis Services.

What about max_server_memory and min_server_memory?

By default, SQL Server can change its memory requirements dynamically based on available system resources. This means it can use any amount of memory between min_server_memory and max_server_memory.

Use min server memory to guarantee a minimum amount of memory available to the buffer pool of an instance of SQL Server. SQL Server will not immediately allocate the amount of memory specified in min server memory on startup. However, after memory usage has reached this value due to client load, SQL Server cannot free memory from the allocated buffer pool unless the value of min server memory is reduced.
Use max server memory to prevent the SQL Server buffer pool from using more than the specified amount of memory, thus leaving remaining memory available to start other applications quickly.

Usually (and this deserves emphasis) you should leave these settings at the default value, BUT the following considerations/exceptions apply:

·    As there is a short delay between the start of a new application and the time SQL Server releases memory, using max server memory prevents this delay and may give better performance to the other application.
·    With SQL Server 2000 AWE, or SQL Server 2005/2008 AWE on Windows 2000, memory is not managed dynamically. This means min server memory is ignored and max server memory is never released, so it must be set (again, this deserves emphasis).

Min server memory and max server memory are advanced options. If you are using the sp_configure system stored procedure to change these settings, you can change them only when show advanced options is set to 1. These settings take effect immediately (without a server stop and restart).
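
For example, to fix the buffer pool between 1 GB and 6 GB (the values are purely illustrative):

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'min server memory (MB)', 1024;
EXEC sp_configure 'max server memory (MB)', 6144;
RECONFIGURE;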
