Note on <Beginning Microsoft SQL Server 2012 Programming> - 02

41 篇文章 0 订阅
14 篇文章 0 订阅



Chapter 5: Creating and Altering Tables


Object Names in SQL Server

There are actually four levels in the naming convention for any SQL Server table (and any other SQL Server object for that matter). A fully qualified name is as follows:

[ServerName.[DatabaseName.[SchemaName.]]]ObjectName


  • In addition to naming other databases on the server you're connected to, you can also "link" to another server. Linked servers give you the capability to perform a JOIN across multiple servers — even different types of servers (SQL Server, Oracle, DB2, Access — just about anything with an OLE DB provider).
  • The current database (as determined by the USE command or in the dropdown box if you're using the SQL Server Management Studio) is always the default, so, if you want data from only the current database, you do not need to include the database name in your fully qualified name.
  • If you want to access an object that is not in your default schema (set on a login-by-login basis), you'll need to specifically state the schema name of your object.
  • Reviewing the Defaults:
    1. Object Name: There isn't a default — you must supply an object name.
    2. Ownership: You can leave this off, in which case it will resolve first using the current user's name, and then, if the object name in question doesn't exist with the current user as owner, it will try the dbo as the owner.
    3. Database Name: This can also be left off unless you are providing a Server Name — in which case you must provide the Database Name for SQL Servers (other server types vary depending on the specific kind of server).
    4. Server Name: You can provide the name of a linked server here, but most of the time you'll just leave this off, which will cause SQL Server to default to the server you are logged in to.


Options for Creating Database



CONTAINMENT

ON
Have something to do with filegroup.
NAME
For the logical file name.
FILENAME

SIZE

MAXSIZE
The slight twist is that there is no firm default. If you don't supply a value for this parameter, there is considered to be no maximum — the practical maximum becomes when your disk drive is full.
FILEGROWTH
You provide a value that indicates by how many bytes (in KB, MB, GB, or TB) at a time you want the file to be enlarged, or alternatively, a percentage value.
LOG ON

COLLATE
 
FOR ATTACH
 
WITH DB CHAINING ON|OFF
 
TRUSTWORTHY
 
C
 
OE
 
CE
 
CE
 


What the author said:
If your log reaches its maximum size, you will not be able to perform any logged activity (which is most activities) in the database. Personally, I recommend setting up what is called an alert. You can use alerts to tell you when certain conditions exist (such as a database or log that's almost full).

SQL Server provides a command especially tailored for database structure information, and often provides better information if you're more interested in the database itself than in the objects it contains. sp_helpdb takes one parameter — the database name:

EXEC sp_helpdb '<db_name>'



Options for Creating Table



Data Types
DEFAULT
IDENTITY
NOT FOR REPLICATIONIt determines whether a new identity value for the new database is assigned when the column is published to another database (via replication), or whether it keeps its existing value.
ROWGUIDCOLGUID is considered to be unique across both space & time. It comes in play when you want to keep the values in identity columns from overlapping on separate database during replication.
COLLATE
NULL/NOT NULLThere is, however, a very large number of settings that can affect this default and change its behavior. For example, setting a value by using the sp_dbcmptlevel stored procedure or setting ANSI-compliance options can change this value.
ON
TEXTIMAGE_ONThis clause is valid only if your table definition has large column(s) in it, including the following:
  • text or ntext
  • image
  • xml
  • varchar(max) or nvarchar(max)
  • varbinary(max)
  • Any CLR user-defi ned type columns (including geometry and geography)

When you use the TEXTIMAGE_ON clause, you move only the BLOB information into the separate filegroup — the rest of the table stays either on the default filegroup or with the filegroup chosen in the ON clause. There can be some serious performance increases to be had by splitting your database into multiple fi les, and then storing those fi les on separate physical disks. When you do this, it means you get the I/O from both drives.


IDENTITY Column

  • when you make a column an identity column, SQL Server automatically assigns a sequenced number to this column with every row you insert. The number that SQL Server starts counting from is called the seed value, and the amount that the value increases or decreases by with each row is called the increment.
  • An identity column must be numeric, and, in practice, it is almost always implemented with an integer or bigint data type. 
  • An identity option cannot be used in conjunction with a default constraint.
  • An IDENTITY column and a PRIMARY KEY are completely separate notions — that is, just because you have an IDENTITY column doesn't mean that the value is unique (for example, you can reset the seed value and count back up through values you've used before). IDENTITY values are typically used as the PRIMARY KEY column, but they don't have to be used that way.
  • If you want to fill in blank spaces like that, you need to use SET IDENTITY_INSERT ON, which allows you to turn off the identity process for inserts from the current connection.


Computed Column

<column name> AS <computed column expression>

The first item is a alias that you're going to use to refer to the value that is computed. The expression can be any normal expression that uses either literals or column values from the same tables.


  • You cannot use a subquery, and the values cannot come from a different table.
  • You cannot directly specify the data type of a computed column; it is implicitly of whatever type the expression produces. That said, you can use CAST or CONVERT as part of your expression to explicitly impose a type on the result.
  • In SQL Server 2000 and earlier, you could not use a computed column as any part of any key (primary, foreign, or unique) or with a default constraint. For SQL Server 2005 and later, you can now use a computed column in constraints (you must flag the computed column as persisted if you do this, however).
  • Special steps must be taken if you want to create indexes on computed columns.


To verify that the table was indeed created, and that it has all the columns and types that you expect. you can execute:

EXEC sp_help <object name>


VARCHAR vs. CHAR:


  • To recapture the space of a column that is defined somewhat longer than the actual data usually is (retrieve blank space)
  • To simplify searches in the WHERE clause — fixed-length columns are padded with spaces, which requires extra planning when performing comparisons against fields of this type


For the code that you write directly in T-SQL, SQL Server automatically adjusts to the padded spaces issue, say you have:

@A CHAR(5) = 'XX'
@B VARCHAR(5) = 'XX'


@A is treated as being equal (if compared) to @B, this is not, however, true in your client APIs such as SqlNativeClient and ADO.NET. But VARCHAR will automatically have any trailing spaces trimmed, and comparing it to 'xx' in ADO.NET will evaluate to True.


ALTER TABLE

  • If you want to add a NOT NULL column after the fact, you have the issue of what to do with rows that already have NULL values. You can provide a default value in such case.
  • If you want to move a column to the middle, you need to create a completely new table (with a different name), copy the data over to the new table, DROP the existing table, and then rename the new one.
  • You should be aware that even if your view is built using a SELECT * as its base statement, your new column will not appear in your view until you rebuild the view. Column names in views are resolved at the time the view is created for performance reasons. That means any views that have already been created when you add your columns have already resolved using the previous column list — you must either DROP and re-create the view or use an ALTER VIEW statement to rebuild it.

GUI

Database Diagrams Node


What the author said:
If you're going to use the schema features at all, I highly recommend using two-part naming (schema and table name) in all of your queries. If you're not utilizing different schemas in your database design, it's fine to leave them off (and make your code a fair amount more readable in the process), but keep in mind there may be a price to pay if later you start using schemas and suddenly have to update all your old code to two-part naming.


What is Schema?

  • What you see referred to in SQL Server 2012 and other databases such as Oracle as "schema" was usually referred to as "owner" in SQL Server 2000 and prior.
  • What is now "schema" is something that overlaps with an older concept called "ownership".
  • The schema feature in SQL Server 2012 is now purely organizational and shouldn't be confused with ownership any longer.
  • Users can still own schemas or have a default schema, but the schema is not the same thing as the owner.
  • By default, only users who are members of the sysadmin system role, or the db_owner or db_ddladmin database roles, can create objects in a database.


It used to be:



And now it is:



  • Whoever creates the database is considered to be the "database owner", or dbo.
  • There is a role named "sysadmin", members of the sysadmin role always alias to the dbo. That is, no matter who actually owns the database, a member of sysadmin will always have full access as if it were the dbo, and any objects created by a member of sysadmin will, unless explicitly defined otherwise, show ownership belonging to the dbo.
  • Besides sysadmin role, there is a role named "db_owner" (it is not dbo, different concept completely), objects created by members of the db_owner database role do not default to dbo as the default schema — they will be assigned to whatever that particular user has set as the default schema.
  • By default, only users who are members of the sysadmin system role, or the db_owner or db_ddladmin database roles, can create objects in a database.
  • Any objects that a dbo creates within that database shall be listed with a schema of dbo rather than that person's individual username.
  • Individual users can also be given the right to create certain types of database and system objects. If such individuals do indeed create an object, then, by default, that object will be assigned to whatever schema is listed as default for that login.

Consider the scenarios:

  • Scenario 1: An individual with login name "MySchema", has been granted CREATE TABLE authority to a given database. SQL Server needs you to provide schema-qualified name to resolve the table name, which is MySchema.MyTable.
  • Scenario 2: A user belonging to a Windows domain group called Group1, with name "Group1Member" and a default schema of Group1, creates a new table called "MyTable", it must be referenced as Group1.MyTable.
  • Scenario 3: The database owner, or any member of sysadmin role (as opposed to just any member of db_owner), with login name "Fred", creates a new table called "MyTable", it can be referenced as dbo.MyTable, and in addition, as dbo also happens to be the default owner, any user could just refer to Fred's table as MyTable.





































Chapter 7: Adding more to your queries


Building a Nested Subquery

A nested subquery is one that goes in only one direction — returning either a single value for use in the outer query, or perhaps a full list of values to be used with the IN operator.

SELECT <SELECT list>
FROM <SomeTable>
WHERE <SomeColumn> = (
	SELECT <single column>
	FROM <SomeTable>
	WHERE <condition that results in only one row returned>)
SELECT <SELECT list>
FROM <SomeTable>
WHERE <SomeColumn> IN (
	SELECT <single column>
	FROM <SomeTable>
	[WHERE <condition>)]


For performance reasons, you want to use the join method as your default solution if you don’t have a specific reason for using the nested SELECT, however, SQL Server is actually smart enough to resolve the nested subquery solution to the same query plan it would use on the join. So the truth is that most of the time, there really isn’t that much diff erence.


The following two queries are working equivalently:

SELECT ProductID, Name
FROM Production.Product
WHERE ProductID IN (
	SELECT ProductID FROM Sales.SpecialOfferProduct);


SELECT DISTINCT pp.ProductID, Name
FROM Production.Product pp
JOIN Sales.SpecialOfferProduct ssop
ON pp.ProductID = ssop.ProductID;


NOT IN is a more intuitive way to find orphaned records.


Building Correlated Subqueries

Under correlated subqueries, the information travels in two directions rather than one, the inner query runs on information provided by the outer query, and vice versa. And it can show up in WHERE and SELECT clause.

Process:

  1. The outer query obtains a record and passes it into the inner query.
  2. The inner query executes based on the passed-in value(s).
  3. The inner query then passes the values from its results back to the outer query, which uses them to finish its processing.

Correlated Sub-queries in the WHERE Clause

SELECT <select list>
FROM (<query that returns a regular resultset>) AS <outer alias name>
WHERE col = (<query that return a single value, and refers to <outer alias name> in an equvalent comparision expression>)

Correlated Sub-queries in the SELECT List

SELECT <select list>,
(<query that return a single value, and refers to <outer alias name> in an equvalent comparision expression>) AS <col alias name>
FROM (<query that returns a regular resultset>) AS <outer alias name>


Derived Tables

Put it simple, when you use a JOIN operator on your subquery, you are there. So it usually shows up in the FROM clause, led by a JOIN operator.

To create a derived table, you need to do two things:

  • Enclose the query that generates the result set in parentheses
  • Alias the results of the query, so the alias can be referenced as a table


Its syntax looks like:

SELECT <select list>
FROM (<query that returns a regular resultset>) AS <alias name>
JOIN <some other base or derived table>


Common Table Expressions (CTES)

In its simplest use, a CTE is like a derived table that’s declared up front rather than inline. Rather than trying to arrange your derived table’s indenting and parentheses so it’s clearly set apart from the rest of your joins (or WHERE conditions, or however you’re using it), the CTE way of doing this puts the derived table’s definition at the top of your query. Once you've defined a CTE, you can use it as many times as you like within the current statement.


The syntax looks like:

WITH <expression_name> [ ( column_name [ ,...n ] ) ]
	AS
	( CTE_query_definition )
	[, <another_expression>] 
<query>


And as that expression shows, you can define multiple CTEs with one 'with', what is more, each one can use (as a part of its definition) any CTE defined earlier in the statement. 

WITH CustomerTerritory AS (
	
), MyCTE AS ( 
	SELECT ...
	FROM ...
	JOIN CustomerTerritory
	ON ...
) 
SELECT DISTINCT Rear.SomeField, Rear.SomeField 
FROM MyCTE Rear
JOIN MyCTE Front
ON Rear.SomeField = Front.SomeField 
WHERE Rear.SomeField = '' AND Front.SomeField = '';


How is recursive CTE like?


EXISTS Operator

Two ways to use it:

  • Filtering in JOIN:
    SELECT Field1, Field2
    FROM Table1 t1 
    WHERE [NOT] EXISTS
    	(
    		SELECT Field3
    		FROM Table2 t2
    		WHERE t2.SomeField = t1.SomeField
    	
    	);
    It is usually faster than JOIN.
  • Expression in condition control:
    IF [NOT] EXISTS
    	(
    	...
    	)
    BEGIN
    	..
    END
    GO


CAST And CONVERT

Both CAST and CONVERT perform data type conversions for you. In most respects, they both do the same thing, with the exception that CONVERT also does some date-formatting conversions that CAST doesn't offer.

CAST (expression AS data_type) 
CONVERT(data_type, expression[, style])


Notice that CAST can still do date conversion; you just don’t have any control over the formatting as you do with CONVERT.


MERGE Command

(skip)


Windowing Function


Desc
ROW_NUMBERIt simply outputs a unique, incrementing value for each returned row.
RANKAllows multiple rows to have the same value if their order value is the same, but resumes counting at the ROW_NUMBER value; for example, where the first rank is an n-way tie, the second rank starts with a RANK value of n+1.
DENSE_RANKAlso keeps a matching value for matching order values, but the ranks always increment; for example, the second rank is always 2 regardless of an n-way tie for first.
NTILEDivides the total result into x categories, and ranks them from 1-x; thus NTILE(4) gives the fi rst quarter of the results a 1, the second quarter a 2, and so on.




Syntax:

SELECT a.SomeField,
	ROW_NUMBER() OVER (ORDER BY a.FieldA) AS 'Row Number',
	RANK() OVER (ORDER BY a.FieldA) AS 'Rank',
	DENSE_RANK() OVER (ORDER BY a.FieldA) AS 'Dense Rank',
	NTILE(4) OVER (ORDER BY a.FieldA) AS 'Quartile'
FROM TableA a
WHERE <condition>


PARTITION BY tells the function to reset its count when a column (or combination of columns) changes value.

SELECT SomeField,
	ROW_NUMBER() OVER (PARTITION BY FieldA ORDER BY FieldB DESC) AS 'Row Number',
	OtherField
FROM
WHERE


In the case where you want to use the result ROW_NUMBER() output as a criteria to filter your query result, namely use its value as a condition, you may want to compare it with a value in WHERE clause, like:

... AND ROW_NUMBER() OVER (PARTITION BY PostalCode ORDER BY s.SalesYTD DESC) = 1;

However, that won't work, what you should do is place the windowed output within a CTE:

WITH Ranked AS (
	SELECT SomeField,
		ROW_NUMBER() OVER (PARTITION BY FieldA ORDER BY FieldB DESC) AS 'Row Number',
		OtherFields
	FROM 
	WHERE 
)
SELECT SomeField
FROM Ranked
WHERE [Row Number] = 1;


ONLY AVAILABLE AFTER SQL SERVER 2012 
SELECT SomeField
FROM TableA
ORDER BY FieldA
OFFSET 20 ROWS
FETCH NEXT 20 ROWS ONLY
Restrictions:
  • ORDER BY is required to use an OFFSET ... FETCH clause.
  • Although OFFSET can be used without FETCH, FETCH cannot be used without OFFSET.
  • You can’t use SELECT TOP with OFFSET ... FETCH.
  • Although you can use arithmetic or variables to determine how many rows to offset or fetch, you cannot use a scalar subquery.


Performance Considerations

How to measure?

  • For rough numbers, just run the query.
  • Have a brief look at a query plan.
  • Using SET STATISTICS TIME ON and/or SET STATISTICS IO ON, you can get numeric ➤ values for time, disk reads, and other actual values.

Remember this when comparing two versions of a query: make sure you’re testing with either the data in cache for both or empty for both, or you may fool yourself into using the wrong version

SITUATIONFAVORS
The value returned from a subquery is going to be the same for every row in the outer query.Pre-query. Declaring a variable and then selecting the needed value into that variable will allow the would-be subquery to be executed just once, rather than once for every record in the outer table.
Both tables are relatively small (say 10,000 records or fewer).
Subqueries. I don't know the exact reasons, but I've run several tests on this and it held up pretty much every time. I suspect that the issue is the lower overhead of a lookup versus a join.
The match, after considering all criteria, is going to return only one value.
Subqueries. Again, there is much less overhead in going and finding just one record and substituting it than having to join the entire table.
The match, after considering all criteria, is going to return relatively few values, and there is no index on the lookup column.
Subqueries. A single lookup or even a few lookups will usually take less overhead than a hash join.
The lookup table is relatively small, but the base table is large.
Nested subqueries, if applicable; joins or CTEs if versus a correlated subquery. With subqueries, the lookup will happen only once and is relatively low overhead. With correlated subqueries, however, you will be cycling the lookup many times — in this case, the join is a better choice.
Correlated subquery versus join versus CTE
CTE or join, based on readability. Internally, a correlated subquery creates a nested-loop situation. This can create quite a bit of overhead. It is substantially faster than cursors in most instances, but slower than other options that might be available. Use a join if it's simple, or a CTE if the join obfuscates the query's meaning.
Derived tables versus whatever
Derived tables typically carry a fair amount of overhead to them, so proceed with caution. The thing to remember is that they are run (derived, if you will) once, and then they are in memory, so most of the overhead is in the initial creation and the lack of indexes (in larger result sets). They can be fast or slow — it just depends. Think before coding on these.
EXISTS versus whatever
EXISTS. EXIST does not have to deal with multiple lookups for the same match. Once it finds one match for that particular row, it is free to move on to the next lookup — this can seriously cut down on overhead.





  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值