SQL Azure provides a large-scale multi-tenant database service on shared resources. In order to provide a good experience to all SQL Azure customers, your connection to the service may be closed due to several conditions. This article introduces SQL Azure and its network topology. Then, it lists the reasons for connection-losses or throttling cases and provides guidelines and best coding practices on how to manage the connection life cycle in SQL Azure.

Last Reviewed: 8/19/2011

 Note
If you wish to contribute to this page, use the Edit tab at the top (sign-in required). If you wish to provide feedback for this documentation please either send e-mail to azuredocs@microsoft.com or use the Comment field at the bottom of this page (sign-in required).

Note: If you are already familiar with this information and are looking for example code for connection retry, please see Retry Logic for Transient Failures in SQL Azure.

Introduction

Microsoft SQL Azure Database is a cloud-based relational database service that is built on SQL Server technologies and runs in Microsoft data centers on hardware that is owned, hosted, and maintained by Microsoft.

SQL Azure is built on a scalable platform involving numerous physical servers, and manages all the connections routing between your application and the physical servers where your data resides. Each SQL Azure server can be associated with one or more databases. A SQL Azure server is a logical group of databases.

SQL Azure provides a large-scale multi-tenant database service on shared resources. To enable load balancing and high availability, databases associated with your SQL Azure server may reside on separate physical computers in a Microsoft data center. Each physical computer can service many databases. When there is enough capacity, all the sessions can take full advantage of the available resources in SQL Azure. However, if resources run low at peak times, SQL Azure terminates the sessions using excessive resources to provide a stable system and prevents sessions from monopolizing all the resources till resources become available.

In addition to this, SQL Azure employs a built-in load-balancing technology to ensure the optimal usage of the physical servers and services in the data centers. Load balancing in SQL Azure is achieved by moving the databases within the physical machines and services based on periodic audit of system usage.

The following diagram illustrates the overall SQL Azure network topology.

As seen in the diagram, SQL Azure provides the same tabular data stream (TDS) interface as SQL Server. Therefore, you can use familiar tools and libraries to build client applications for data that is in the cloud.

The load balancers ensure the optimal usage of the physical servers and services in the data centers.

The TDS Gateway functions as a gateway between your application and the underlying platform, where your data resides. It performs the functions of data center isolation, provisioning, billing and metering, and connection routing.

The underlying platform consists of many instances of SQL Server, each of which is managed by the SQL Azure fabric. The SQL Azure fabric is a distributed computing system composed of tightly integrated networks, servers, and storage. It enables automatic failover, load balancing, and automatic replication between physical servers.

Causes of Connection Termination

Connection-loss is not uncommon when databases encounter resource shortages. A unique feature of SQL Azure is its ability to monitor and rebalance active and online user databases in the Microsoft data centers automatically. To achieve this, SQL Azure continuously gathers and analyzes database usage statistics and will terminate connections when necessary.

The following list describes the reasons and thresholds that trigger connection-loss. All connection terminations happen asynchronously from the event causing it.

  • Lock consumption: Sessions consuming greater than one million locks are terminated. When this happens, you will receive error code 40550. You can query the sys.dm_tran_locks dynamic management view (DMV) to obtain information about the current state of locking in SQL Azure.

    In SQL Azure, you can use the following DMVs to monitor your transactions. For more information, seeSystem Views (SQL Azure Database)  .

    • sys.dm_tran_active_transactions
    • sys.dm_tran_database_transactions
    • sys.dm_tran_locks
    • sys.dm_tran_session_transactions
  • Uncommitted transactions: Uncommitted transactions can block the truncation of log files. To prevent this, the distance from the first or oldest active transaction log sequence number (LSN) to the tail of the log (current LSN) cannot exceed 20% of the size of the log file. When violated, the offending transaction is terminated and rolled back so that the log can be truncated. The error code is 40552.
  • Transactions blocking system calls: All transactions request locks of different types on the resources, such as rows, pages, or tables, on which the transaction is dependent. Each transaction frees its locks when it no longer has a dependency on the locked resources. Due to these locks, some transactions might block resources required by system sessions. In SQL Azure, if a transaction locks a resource required by an underlying system operation for more than 20 seconds, it is terminated. The error code is 40549. The system has to perform certain operations to maintain the availability and consistency of databases. Some examples of system operations include, but not limited to:
    • Failover configuration management
    • Replica reconfiguration
    • Configuration when a database or replica goes offline
  • Log file size: Transactions consuming excessive log resources are terminated. The maximum permitted log size for a single transaction is 1-gigabyte (GB). The error code is 40552.
  • TempDB usage and size: Large transactions, transactions executed in large batches, or large sorts might consume a significant tempdb space. When a session uses more than 5 GB of tempdb space (= 655,360 pages), the session is terminated. In SQL Server, you can use the sys.dm_db_file_space_usage DMV to learn the size of the tempdb. However, SQL Azure does not provide any public solution to monitor tempdb currently. When the connection is lost due to this reason, you will receive error code 40551. If you get error code 40551, try modifying your query to reduce the temporary table space usage. If you are using temporary objects, conserve space in the tempdb database by dropping temporary objects after they are no longer needed by the session. If possible, truncate your tables or remove any unused tables.
  • Excessive memory usage: When there is memory contention, sessions consuming greater than 16-megabyte (MB) for more than 20 seconds are terminated in the descending order of time the resource has been held, such as the oldest session is terminated first. Termination of sessions stops as soon as the required memory is available. When the connection is lost due to this reason, you will receive error code 40553.
  • Maximum database size: A database will be read-only when it reaches its maximum database size. Transactions attempting to updates or inserts will be terminated when this happens. When the connection is lost due to this reason, you will receive error code 40544 . To resolve this problem, you can partition or delete data, drop indexes, or upgrade to a larger database by using the ALTER DATABASE  statement.
  • Idle connections: Connections to your SQL Azure database that are idle for 30 minutes or longer will be terminated. Since there is no active request, SQL Azure does not return any error.
  • Transaction termination: SQL Azure kills all transactions after they run for 24 hours. If you lose a connection due to this reason, you will receive error code 40549.
  • Denial of Service attacks: When there are a high number of login failures from a particular source internet protocol (IP) address, SQL Azure will block the connections from that IP address for a period of time. In this case, SQL Azure just terminates the connection and does not return any error. In addition, you can configure SQL Azure Firewall  to protect against DoS attacks by allowing access to only the client IP addresses that are defined by the firewall rules.
  • Network problems: Your connection might be disconnected due to various network problems. Hosting your application code in Windows Azure is beneficial to the performance of your application because it minimizes the network latency associated with your application's data requests to SQL Azure. For example, create a Web-based user interface in Windows Azure for your database application that is hosted in SQL Azure. In this scenario, the client code and data is hosted in the same physical data center, so network latency is minimal between the corresponding servers. Clients will still have to traverse the Internet to reach your database application, but the network traffic between the client code and data is much faster because those numerous data requests have a much shorter distance to travel. Note that if a network problem causes a connection-loss, SQL Azure cannot return an error message because the session has already been terminated. However, if you reuse the same connection just like when you use connection pooling, the following error message is returned:
    10053: A transport-level error has occurred when receiving results from the server. (Provider: TCP Provider, Error: 0 - An established connection was aborted by the software in your host machine).
  • Failover problems: SQL Azure is built on proven Windows Server and SQL Server technologies, and is flexible enough to cope with any variations in usage and load. The service replicates multiple redundant copies of your data to multiple physical servers to maintain data availability and business continuity. In the case of a hardware failure, SQL Azure provides automatic failover to optimize availability for your application. Currently, some failover actions result in an abrupt termination of a session and the client application receives generic errors, such as:
    10053: A transport-level error has occurred when receiving results from the server. (Provider: TCP Provider, error: 0 - An established connection was aborted by the software in your host machine)

    or

    40197: The service has encountered an error processing your request. Please try again. Error code %d.

    When this happens, reconnecting to your SQL Azure server will automatically connect you to a healthy copy of your database.

  • Other reasons for connection-loss: Load balancer in SQL Azure ensures the optimal usage of the physical servers and services in the data centers. When the CPU utilization, input/output (I/O) latency, or the number of busy workers for a machine exceeds thresholds, SQL Azure might terminate the transactions and disconnect the sessions. When this happens, you will receive error code 40501. Currently the number of busy workers threshold is set to 300.

Connection-loss Errors

The following table lists the SQL Azure-specific transient errors that might occur when your connection to the service is closed.

Error

Description (message text)

40197

The service has encountered an error processing your request. Please try again. Error code %d.

Note: You will receive this error, when the service is down due to software or hardware upgrades, hardware failures, or any other failover problems. Reconnecting to your SQL Azure server will automatically connect you to a healthy copy of your database.

Note: You may see error codes 40143 and 40166 embedded within the message of error 40197. The error codes 40143 and 40166 provide additional information about the kind of failover that occurred. Do not modify your application to catch error codes 40143 and 40166. Your application should catch 40197 and try reconnecting to SQL Azure until the resources are available and your connection is established again.

40501

The service is currently busy. Retry the request after 10 seconds. Code: %d. For more information on how to decode the returned codes, see SQL Azure Throttling and Reason Codes  .

Note: When error code 40501 is returned, retry to connect to SQL Azure in intervals of 10 seconds until the resources are available and your connection is established again.

40544

The database has reached its size quota. Partition or delete data, drop indexes, or consult the documentation for possible resolutions.

40549

Session is terminated because you have a long-running transaction. Try shortening your transaction.

40550

The session has been terminated because it has acquired too many locks. Try reading or modifying fewer rows in a single transaction.

40551

The session has been terminated because of excessive TEMPDB usage. Try modifying your query to reduce the temporary table space usage.

40552

The session has been terminated because of excessive transaction log space usage. Try modifying fewer rows in a single transaction.

40553

The session has been terminated because of excessive memory usage. Try modifying your query to process fewer rows.

40613

Database '%.*ls' on server '%.*ls' is not currently available. Please retry the connection later. If the problem persists, contact customer support, and provide them the session tracing ID of '%.*ls'

In addition, the following table lists some additional transient errors that might occur when using SQL Azure:

Error

Description (message text)

20

The instance of SQL Server you attempted to connect to does not support encryption.

64

A connection was successfully established with the server, but then an error occurred during the login process. (provider: TCP Provider, error: 0 - The specified network name is no longer available.)

233

The client was unable to establish a connection because of an error during connection initialization process before login. Possible causes include the following: the client tried to connect to an unsupported version of SQL Server; the server was too busy to accept new connections; or there was a resource limitation (insufficient memory or maximum allowed connections) on the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)

10053

A transport-level error has occurred when receiving results from the server. An established connection was aborted by the software in your host machine.

10054

A transport-level error has occurred when sending the request to the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)

10060

A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 0 - A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.)

For a complete list of errors that can occur only when using Microsoft SQL Azure Database, see Error Messages (SQL Azure Database)  .

Troubleshooting Connection-loss Errors

  • Check the status of the SQL Azure service: To check the status of the SQL Azure service, use theAzure Health Status  page. You can also click the Service Dashboard button on the Windows Azure Support  page to get the status. The status page contains current service status information and historical service data.
  • Trace connections: When the client application connects to SQL Azure, CONTEXT_INFO (Transact-SQL) is set with a unique session specific GUID value automatically. You may retrieve this GUID value and use it in your application to trace the connectivity problems when necessary. For example code, seeDevelopment Considerations in SQL Azure  .
  • Check DMV: SQL Azure enables a subset of DMVs to diagnose the performance problems, which might be caused by blocked or long-running queries, resource bottlenecks, poor query plans, and so on. For a list of supported and unsupported DMVs in SQL Azure, see System Views (SQL Azure Database)  . TheMonitoring SQL Azure Using Dynamic Management Views  topic also provides examples on how to detect common performance problems by using the DMV in SQL Azure.
  • Stress test: To reduce connection-losses from excessive resource usage, you can stress test your database application by slowly increasing the test workload until the desired level of concurrency is reached. When stress testing triggers a connection-loss, identify the resource limits by analyzing the returned reason codes and tune your application appropriately. For more information on reason codes, see Reason Codes  in the SQL Azure Database documentation.
  • Contact customer support: If you think that your connection is terminated when it should not be, contact the Microsoft Customer Support  . Please be prepared to provide the following information when contacting the Microsoft Customer Support:
    • Your subscription ID(s), your server name, and the error message,
    • The time your connection was terminated, and
    • Your session ID, which is returned by the CONTEXT_INFO (Transact-SQL) function. For more information, see Trace Connections section above.

For additional troubleshooting articles, see:

General Best Practices to Prevent Connection-Losses

To prevent the connection-losses, you must manage the application resources properly in your SQL Azure applications. The established connections are the connections that are returning data or the open connections in the connection pool, or the connections being cached in the client side variables. To provide a seamless user experience when a connection is closed, re-establish the connection and then re-execute the failed commands or the query.

The following list provides a list of recommended coding practices when connecting to SQL Azure. These recommended coding practices are not significantly different than the coding practices that apply to on-premise SQL Server.

  • Minimize network latency
    • When creating a SQL Azure server, choose the Microsoft data center location that is closest to you (or users) to avoid extra bandwidth costs and achieve better performance.
    • (use windows azure)
  • Reduce network usage
    • Implement an application-level caching to avoid rendering large data in every database call and also to minimize round trips between your application and SQL Azure.
  • Manage network connection
    • Keep the connection open as short as possible.
    • Set connection timeout to 30 seconds in connection string. If your application cannot establish a connection to SQL Azure longer than 30 seconds and if you want to report a failure, contactMicrosoft Customer Support  .
    • Use connection pooling. The connection pooling increases code efficiency by reducing the number of times that new connections must be opened and closing the idle connections automatically. SQL Azure terminates connections that have been idle for 30 minutes or longer. Implementing connection pooling in your application is beneficial as the connection pooler automatically removes a connection from the pool after it has been idle for a long time. Consider designing your application in a way that connections are opened late and closed early. For more information, see SQL Server Connection Pooling  .
    • Perform all database operations in transactions within a TRY...CATCH block including all exceptions. If an error prevents the successful completion of a transaction, SQL Azure automatically rolls back the transaction and frees all resources held by the transaction. If an exception occurs, retry the connection.
  • Tune T-SQL
    • The following example returns information about the top five queries ranked by average CPU time. This example aggregates the queries according to their query hash, so that logically equivalent queries are grouped by their cumulative resource consumption.
      -- Find top 5 queries
      SELECT TOP 5 query_stats.query_hash AS "Query Hash", 
          SUM(query_stats.total_worker_time) / SUM(query_stats.execution_count) AS "Avg CPU Time",
          MIN(query_stats.statement_text) AS "Statement Text"
      FROM 
          (SELECT QS.*, 
          SUBSTRING(ST.text, (QS.statement_start_offset/2) + 1,
          ((CASE statement_end_offset 
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE QS.statement_end_offset END 
                  - QS.statement_start_offset)/2) + 1) AS statement_text
           FROM sys.dm_exec_query_stats AS QS
           CROSS APPLY sys.dm_exec_sql_text(QS.sql_handle) as ST) as query_stats
      GROUP BY query_stats.query_hash
      ORDER BY 2 DESC;
      GO
      
    • Query parameterization limits compiles: Declare the type and length of each parameter as if not specified, a parameter size might cause a cache bloat.
       // if the length is not specified, a cache bloat happens.
      cmd.CommandText = "SELECT c1 FROM dbo.tbl WHERE c2 = @1";
      cmd.Parameters.Add("@1", SqlDbType.NVarChar).Value = "1";
      cmd.Parameters.Add("@1", SqlDbType.NVarChar).Value = "22";
      
      (@1 nvarchar(1)) SELECT  c1 FROM dbo.tbl WHERE c2 = @1
      (@1 nvarchar(2)) SELECT  c1 FROM dbo.tbl WHERE c2 = @1
      
      // if the length is specified, there won’t be any cache bloat. 
      cmd.CommandText = "SELECT c1 FROM dbo.tbl WHERE c2 = @1";
      cmd.Parameters.Add("@1", SqlDbType.NVarChar, 128).Value = "1";
      
      (@1 nvarchar(128)) SELECT  c1 FROM dbo.tbl WHERE c2 = @1
      
    • Transactions and Query Batching:
      • Push query / business / application logic to SQL Azure.
      • Use stored procedures and batching.
      • Limit the number of round-trips to the server. For example, batch 10 statements in one round-trip instead of using ten round-trips.
  • Implement re-try logic in your application.  The SQL Azure TDS Gateway retries connections for about 30 seconds before reporting a failure. If you expect a high volume of application traffic, build re-try logic in your application. If a connection fails, do the followings:
    • Handle idleness and transient disconnections.
    • Re-try to connect to SQL Azure in intervals of 10 seconds until the resources are available and your connection is established again. Depending on your application, database, and network workload, increase the delay time as necessary.
    • Change your workload if a connection is terminated again.

      If a connection is terminated again, look at the error code, find out the real problem, and then try to change your workload. You can implement a queue or a delay mechanism in your client application to reduce your workload.

      One other solution could be to re-design your application and database to remove resource bottlenecks. Ensure that your application do not overload tempdb through excessive DDL or DML operations. In addition, ensure that transactions do not block any resource. When appropriate, consider partitioning your database into multiple databases. For more information on partitioning, see Scaling out with SQL Azure.

    • Implement the following recommended coding practices in your application:
      • Execute transactions in a continuous loop.
      • Catch transient connection termination errors.
      • Pause for a few seconds and then reconnect.

    Windows Server AppFabric Customer Advisory Team developed a helper library to show the best practices for handling transient conditions in SQL Azure client applications. For more information, seeBest Practices for Handling Transient Conditions in SQL Azure Client Applications  . You can download the helper library  from MSDN Code Gallery.

Migration Best Practices to Prevent Connection-Losses

To move data into and out of SQL Azure, you can use SQL Server Integration Services (SSIS), the bulk copy utility (bcp.exe), Microsoft Sync Framework 2.1 (or later), or custom Transact-SQL scripts. For more information on SQL Azure data migration, see Overview of Options for Migrating Data and Schema to SQL Azure.

When uploading data by using custom Transact-SQL scripts (INSERT statements), try to break your upload into multiple transactions to avoid connection-losses. This will ensure to give breaks at some intervals and therefore allows you to control the upload process. If there is any unexpected connection-loss, you can resume importing data starting after the last successful batch. Note that uploading data by using Transact-SQL scripts is recommended only for small data uploads.

To transfer data to SQL Azure, you can use one of the following tools:

  • Bulk copy utility (bcp.exe)  : The bcp utility bulk copies data between SQL Azure (or SQL Server) and a data file in a user-specified format. The bcp utility can be used to import large numbers of new rows into SQL Azure tables or to export data out of tables into data files.
  • SQL Server Integration Services (SSIS)  : Integration Services can extract and transform data from a wide variety of sources such as XML data files, flat files, and relational data sources, and then load the data into one or more destinations.
  • Microsoft Sync Framework 2.1  : Provides synchronization capabilities between on-premise and SQL Azure servers, as well as between two or more SQL Azure databases in the same or different data centers.

When using the bcp.exe utility, all the rows in the data file are imported as one batch by default. Importing a large data file as a single batch can be problematic and might cause connection-losses. To distribute the rows among multiple batches, specify a batch_size that is smaller than the number of rows in the data file. The -b batch_size option allows you to specify the number of rows per batch of imported data. Each batch is imported and logged as a separate transaction that imports the whole batch before being committed. If the transaction for any batch fails, only insertions from the current batch are rolled back. Therefore, batches already imported by committed transactions are unaffected by a later failure. In addition, when uploading very large data, consider dividing your data into multiple sections and upload each section by executing multiple bcp commands simultaneously. To specify the number of bytes per network packet, use –a packet_size option. Increased packet size can enhance performance of bulk-copy operations. To avoid splitting data physically, you can use –F and –L options to specify the first and last rows for the upload.

When using SQL Server Integration Services (SSIS) to upload very large data, consider dividing your data into multiple files on the file system and upload each file by executing multiple streams simultaneously. For an example analysis of SSIS and BCP, see Loading Data to SQL Azure  blog post at SQL CAT Blogs.

Important: Before uploading large data to SQL Azure:

  • Perform pilot testing and determine the best bcp –b batch_size and –a packet_size values for your application.
  • Consider reducing the number of indexes on the server side.
  • If possible, disable the indexes and database constraints on the server side. After upload is done, you can enable these indexes and database constraints.

For more information about migration options for SQL Azure, see Migrating Databases to SQL Azure  .