T-SQL编程10大陋习（Top 10 T-SQL Code Smells）

最新推荐文章于 2024-09-08 08:52:05 发布

武哥博文

最新推荐文章于 2024-09-08 08:52:05 发布

阅读量1k

点赞数

文章标签：编程 database transactions refactoring sql server insert

Top 10 T-SQL Code Smells

I enjoy forums, and lately I've been really enjoying Stack Overflow. The few, but important enhancements the Stack Overflow team has made to the forum notion, such as easily ranking solutions, separating solutions and commentary into separate channels, and so on, make it much more usable and interesting than a traditional forum. In an environment like SO it's interesting to see all the posters' varying skill sets all mashed up together, from people you can see are adapting what they know to a new problem, to obvious newbies, to those who clearly are not "getting" some concept. I always learn a lot both by reading and by posting the odd solution.

Then again, there are multiple solutions to every problem. Many, maybe hundreds, of solutions are correct in the narrow sense of producing the right result. Of those many correct solutions, though, some stand out as clearer, or more elegant, or more flexible, and hopefully one wants to surface those even above the ones that are just correct. This is something I learned when I was in Architecture school [too many] years ago. A complex problem might have a million solutions, a thousand good solutions, and a handful of really exceptional solutions.

"Code Smell" has become a fashionable term for this, but you can call it whatever you like -- best practice, symptoms of design problems, refactoring opportunities. So, what is it that makes the difference? In SQL, code smells seem to emanate mainly from code that:

Defeats the optimizer. The query optimizer does a good job most of the time, though it can make errors occasionally. However, un-optimizable code -- code that defeats the optimizer or prevents it from working at all -- seems like a bad idea right across the board.
Defeats the transactional properties of the server. Same idea: transactions and ACID are good, and you already paid for them if you have SQL Server, so why just turn around and circumvent them?
Defeats other properties of the SQL language that would be useful, such as closure.
Assumes every operation will always succeed.
Bends SQL to some task for which it is not at all suited.
Works around or optimizes for a problem that doesn't actually exist.

In any case, these are some things I watch out for, and when I find them, I try look deeper and see if a problem is lurking about. There could be a problem, or perhaps not. In no particular order:

Loops. Loops of any kind, cursors or otherwise, are definitely code smell in SQL. Sometimes they are genuinely needed, sometimes they provide a legitimate workaround for a problem that really can't be done with sets, but very often they are just a nice Update or Select statement waiting to be written. In C# iterating over objects is normal. In SQL iterating over rows probably means you are doing something wrong. (There are some exceptions, like the well-known Running Totals problem, etc.)

Loops basically defeat the optimizer. If you have a nice 32 core box with 128 GB of RAM (whatever) the optimizer would happily, automatically rewrite and parallelize many queries for you, at no charge. Explicitly looping means you've probably just blown the opportunity to use that. WHILE is not really better than a cursor; that's a myth. This gets filed under Defeating the Optimizer.
CREATE PROCEDURE Foo AS BEGIN INSERT INTO #myTemp EXEC Bar ... CREATE PROCEDURE Bar AS BEGIN INSERT INTO #myTemp2 EXEC Grok ... CREATE PROCEDURE Grok ... (see a pattern here?).

INSERT ... EXEC is a workaround that indicates the stored procedure probably ought to have been a view (or just a plain ol' select statement, or derived table) and not a stored procedure. File under both Closure and Defeating the Optimizer. By chunking a process up into nested procedures and temp tables, you limit the opportunity to optimize across the whole operation. Unless you have some ultra-performant system and you need all the esoteric features of forced plans around stored procs for your select statement, and most people don't, delivering results from views gives a lot of benefit. Views are "expanded" by the optimizer, and the thing that they return is defined as a predictable, single result set. They can easily be nested, while procs can't. It's what views were designed for right from the beginning.
CREATE PROCEDURE mySproc AS BEGIN SELECT ... FROM ... WHERE ... END.

Procs that have no other purpose than to return a result set should probably be views. Closure again. The reason is that the "shape" of what is returned from a stored procedure is undefined. It could be any result set, or several, of any form. I have a hunch this is why the old system procs like "sp_help" are being turned into Dynamic Management Views as we move to new versions of SQL Server. And thank goodness for that. Views still allow you to encapsulate logic and refactor later, and have essentially the same security advantages as Procs as long as your app parameterizes queries to protect from SQL injection. Elaborate chains of IF statements inside a proc are a further indication that something is amiss.
Three-part object names. Four-part object names. For that matter, One-part object names. (Hmm, maybe that's three items.) If your app uses five databases instead of one, and executes cross-database queries, that's a design smell. Are those various databases transactionally consistent? Do they need to be? How would database mirroring or log shipping work for you, when one DB fails over to another box? How yould you change the name of one of the databases, if that name is sprinkled about in your code? Or a linked server name? Multiple databases like this, in my experience, usually want to be either schemas inside one database, if they need to be consistent, or completely separate databases that the app connects to independently, for the purpose of sharding or scale-out, if they don't. A connection string from an app specifies server and database, and it's best to stick to that model, and not hop from one database to another at the SQL Server end of the pipe.

One-part names, with no schema qualifier, will get to be a problem later, after you've combined all those databases into one database with different schemas :-).
Fishing around inside a column for values, as in ... WHERE SUBSTRING( t.myColumn, 1, 3 ) = 'ABC'. Sometimes you need to peek into a column, but it should be the exception and not the rule. Are the values you're storing atomic?
EXEC xp_cmdshell. Even the FBI says this is a bad idea. No kidding. SQL Server is not good at OS operations. If you must go there, at least go with CLR.
Roll-your-own locking. I've seen a few apps that use values within tables to "flag" items as locked. All I can say is, please get to a phone and call the American Society of Wheel Reinventors if you think you're going to write and then maintain a better locking mechanism than the one that's already baked into SQL Server. Seriously.

[Here I mean short-duration, normal, transactional SQL locking, not other types of longer-term "locks" that might be needed for an app to flag items as in progress, or for workflow, etc. Those situations need values in tables AND, incidentally, short SQL transactions just at the moment the data is actually modified. No, I wouldn't want to leave a SQL transaction open for two days while an invoice gets approved...]
SELECT * FROM <whatever>. INSERT INTO myTable VALUES ( 1, 2, 3 ). Depending on the order or number of columns is always a bad idea. Put in a column list; it's worth it. Corollary: figuring out how to re-order the columns in an existing table. The column order should not matter, ever.
Code having no error handling flow control or transactions. If you don't see BEGIN TRANSACTION or BEGIN TRY or even @@ERROR, then you definitely have some issues to investigate.
Code that drops and creates objects in the database as part of the normal operation of an app. A database schema should be pretty static, with DDL (e.g. Drop Table, Create Table, etc.) happening only in installers and patches. If you find a stored proc dropping and recreating tables it's definitely something to check over. Are you losing indexes and statistics when those objects disappear? Permissions? What if the DBA has has to change the indexing to tune for performance?

I also dislike hungarian notation (dbo.tbl_MyTable, dbo.vw_trans), but if I am honest I have to admit that is really a personal stylistic preference and can't be Smell Number 11. Plus, it's a Top 10 list, right? Anyway, if you think I'm right or wrong with these, let me know. I'm always eager to learn from someone else.

["Code Smell," originally from Refactoring: Improving the Design of Existing Code by Martin Fowler, et. all, has become a fashionable term for correct-but-not-optimal OO code.]

Edit: #7 I intended to refer to roll-your-own transactional locking, distinct from longer-timespan "business" locks or flags that apps legitimately use to mark items as in progress, or for control of workflows, etc. Thanks to commenters for pointing that out.

武哥博文

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
T-SQL编程10大陋习（Top 10 T-SQL Code Smells）

Top 10 T-SQL Code SmellsI enjoy forums, and lately Ive been really enjoying Stack Overflow. The few, but important enhancements the Stack Overflow team has made to the forum notion, such as easil
复制链接

扫一扫