The MySQL SET Datatype

 Description

One of the least commonly used of the MySQL datatypes has to be the SET datatype. The SET datatype is a string type, but is often referred to as a complex type due to the increased complexity involved in implementing them. A SET datatype can hold any number of strings from a predefined list of strings specified during table creation. The SET datatype is similar to the ENUM datatype in that they both work with predefined sets of strings, but where the ENUM datatype restricts you to a single member of the set of predefined strings, the SET datatype allows you to store any of the values together, from none to all of them.
Storage

The MySQL SET datatype is stored as an integer value within the MySQL tables, and occupies from one to eight bytes, depending on the number of elements available.
Storage Required for MySQL SET Datatype
Number of Elements     Number of Bytes
1 - 8     1
9 - 16     2
17 - 24     3
25 - 32     4
33 - 64     8

The SET elements are stored in the MySQL table as a bitmap: each element is represented by a single bit. To demonstrate this, lets look at the following sample table, which will be the basis for the rest of this tutorial. (PLEASE NOTE: In the examples that follow, there are never whitespaces between elements and their comma seperators. Also note that only during the definition of a set are the elements individually wrapped in single quotes, otherwise the set as a whole gets only one set of quotes. Please do likewise.)

CREATE TABLE set_test(
  rowid INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  myset SET('Travel','Sports','Dancing','Fine Dining')
  );

Each element in our set is assigned a single bit in our bitmap (note that in spite of there only being 4 elements our set will still occupy 1 byte). If our row incorporates a given element, the associated bit will be a one. Because of this storage approach, each element also has an associated decimal value.
Element     SET Value     Decimal Value
Travel     00000001     1
Sports     00000010     2
Dancing     00000100     4
Fine Dining     00001000     8

Multiple elements can be represented by adding their decimal values (for example, a person’s interests in Travel and Fine Dining can be represented as 00001001 or the decimal value of 8+ 1 = 9).
Why You Shouldn’t Use SET

The MySQL SET datatype is not commonly used for a few reasons; First, using the MySQL SET datatype limits you to 64 elements. While you could get around this by using multiple SETs, this still represents a loss of versatility. Second, you cannot include commas in your set elements, as the comma is an element separator. Third, using a set means your data is not normalized. In our above example, we are tracking a person’s interests for a hypothetical dating site. In a normalized schema, there should be three tables: one for the person, one for all possible interests, and one that links a person to their particular interests. Fourth, an INDEX on a set datatype is going to refer to the set as a whole and will not be used for searching individual elements (this may or may not be a problem for certain applications).

So why do we use the MySQL SET datatype? Well there are a few reasons; The MySQL SET datatype allows us to handle multiple values with a lot less trouble than if we had our table fully normalized. Our schema is simplified because we only need one column instead of three tables to store our list of interests. The MySQL SET datatype allows us to compare multiple values without using complex JOIN operations. We can manipulate the set with binary functions to do complex comparisons by comparing bit values on a single column instead of comparing multiple rows of multiple tables.

You may find that the MySQL SET datatype is something that you never need to use. If you find that it’s advantages outweigh the disadvantages, read on to discover how to use it.
INSERTing SET Data

There are two ways to INSERT data into MySQL SET columns: by decimal value or by using strings. To INSERT the combination of Travel and Sports into our sample tables, we can use the following:

INSERT INTO set_test(myset) VALUES(3);
  INSERT INTO set_test(myset) VALUES('Travel,Sports');

Because the decimal value of Travel is 1 and the decimal value of Sports is 2, the combination of the two can be inserted at the sum of the decimal values, or 3. Another example of matching INSERT statements:

INSERT INTO set_test(myset) VALUES(7);
  INSERT INTO set_test(myset) VALUES('Travel,Sports,Dancing');

UPDATEing SET Data

There are three kinds of UPDATE that we need to deal with: complete replacement of the SET data, addition of set members, and removal of set members. We will deal with these in the order listed above.

To completely replace the data in a SET, we simply specify the new values:

UPDATE set_test SET myset = 'Travel,Dining' WHERE rowid = 1;
  UPDATE set_test SET myset = 2 WHERE rowid = 2;

To add an element to an existing SET we use the CONCAT() function to add the new element to our comma separated list. To work with decimal values, we can use the bitwise OR operator | .

UPDATE set_test SET myset = CONCAT(myset,",Travel")
  WHERE rowid = 3;
  UPDATE set_test SET myset = myset | 1 WHERE rowid = 3;

We can also use the CONCAT_WS() function, which handles list separators for us:

UPDATE set_test SET myset = CONCAT_WS(",",myset,"Dancing")
  WHERE rowid = 6;

To remove set elements from an existing set, we use the REPLACE function to remove the element. If using decimal values, we use a combination of the bitwise AND operator & with the bitwise NOT operator ~ .

UPDATE set_test SET myset = REPLACE(myset,"Dancing","")
  WHERE rowid = 6;
  UPDATE set_test SET myset = myset & ~2 WHERE rowid = 6;

With the bitwise operators we can add and remove multiple items at once by using the sum of the decimal values of the elements we wish to remove.
SELECTing SET Data

There are several ways we can select data from a SET column, but they come down to two distinct types: SELECT queries that have a specific WHERE clause, and SELECT queries that have pattern matching in the WHERE clause.

Before discussing the different WHERE clauses, it would probably be good to cover what the data will look like as it is returned from the SELECT query. If the column is merely SELECTed, you will get a comma separated list of the elements involved in your SET. If you add a +0 to the column name, you will see the integer values in the columns.

mysql> SELECT rowid, myset, myset+0 FROM set_test;
+-------+-----------------------+---------+
| rowid | myset                | myset+0 |
+-------+-----------------------+---------+
|    1 | Sports                |      2 |
|    2 | Travel,Sports        |      3 |
|    3 | Travel,Dancing        |      5 |
|    4 | Travel,Sports        |      3 |
|    5 | Travel,Sports,Dancing |      7 |
|    6 | Travel,Dancing        |      5 |
|    7 | Sports                |      2 |
|    8 | Travel,Dancing        |      5 |
+-------+-----------------------+---------+
8 rows in set (0.00 sec)

All of the SELECTs we will be doing will be based on one of these two values.

Lets look at WHERE clauses with specific values. When searching against a specific set of values, we can specify them in our where clause:

SELECT * FROM set_test WHERE myset = 'Travel,Sports';
  SELECT * FROM set_test WHERE myset = 3;

To check for items with a specific element, we can use a couple of methods:

SELECT * FROM set_test WHERE myset LIKE '%Sports%';
  SELECT * FROM set_test WHERE myset & 2;

It should be noted that the bitwise AND exhibits an OR-like behavior when used with combinations of values. Take for example the following query, which involves Travel (1) and Sports (2).

SELECT * FROM set_test WHERE myset & 3;

This query actually returns rows that contain Travel OR Sports, and not just rows that contain both. For rows with have both Travel AND Sports, we need to also check the result of our bitwise AND (&)operation against the value we are performing the bitwise AND (&) against, or use an AND clause in our query:

SELECT * FROM set_test WHERE myset & 3 = 3;
SELECT * FROM set_test WHERE myset & 1 AND myset
  & 2;

This returns all rows that contain Travel AND contain Sports, and also returns the combination of the two. The & 3 = 3 syntax is preferable as it does not require an increasing number of AND clauses.

We can also use multiple wildcard queries like the following:

SELECT * FROM set_test WHERE myset LIKE '%Travel%Sports%';

It should be noted that the elements must be in order (that is, in the order they are listed in the initial table creation) as order is important in the multiple wildcard queries. For example %Travel%Sports% will return rows, but %Sports%Travel% will not.

Another way to retrieve rows is to use the FIND_IN_SET() function as follows:

SELECT * FROM set_test WHERE FIND_IN_SET('Sports',myset) > 0;

FIND_IN_SET() will return 0 if there is no match and a number greater than zero if there is a match. NULL will be returned if either the set or the search string is NULL.
Clever Bit Tricks

We can take advantage of the bitmap approach to storing SET data to rank our users in order of matching interests to help our users find people who best match their own interests, thus allowing you to find users who also like Sports, Travel, and Dancing (or at least like some of the things you do).

Here how it works: each of our users has their interests stored in the SET datatype defined above. We want to rank their fellow users by how well their interests overlap. First lets look at the interests in binary form using the BIN() function:

mysql> SELECT rowid, myset, BIN(myset+0) FROM set_test;
+-------+-----------------------------------------------+--------------+
| rowid | myset                                                  | BIN(myset+0) |
+-------+-----------------------------------------------+--------------+
|    1    | Travel,Sports,Dancing                      | 111          |
|    2    | Travel,Dancing                                    | 101          |
|    3    | Travel                                                    | 1              |
|    4    | Dancing                                                | 100          |
|    5    | Dancing                                                | 100          |
|    6    | Sports,Dancing                                  | 110          |
|    7    | Travel,Sports,Dancing,Fine Dining | 1111        |
|    8    | Travel,Fine Dining                              | 1001        |
|    9    | Sports,Fine Dining                            | 1010        |
|    10  |  Travel,Dancing,Fine Dining            | 1101        |
+-------+-----------------------------------------------+-------------+
10 rows in set (0.00 sec)

What we want to do is order our table by the number of matching interests. For our example we will order our users by how well they match with user 10, who is interested in Travel, Dancing, and Fine Dining

mysql> SELECT A.rowid, A.myset, BIN(A.myset+0 & B.myset+0)
    -> FROM set_test AS A, set_test AS B
    -> WHERE b.rowid = 8
    -> AND A.rowid != 8
    -> ORDER BY BIT_COUNT(A.myset & B.myset) DESC;
+-------+-----------------------------------+----------------------------+
| rowid | myset                            | BIN(A.myset+0 & B.myset+0) |
+-------+-----------------------------------+----------------------------+
|    7 | Travel,Sports,Dancing,Fine Dining | 1001                      |
|    10 | Travel,Dancing,Fine Dining        | 1001                      |
|    1 | Travel,Sports,Dancing            | 1                          |
|    2 | Travel,Dancing                    | 1                          |
|    3 | Travel                            | 1                          |
|    9 | Sports,Fine Dining                | 1000                      |
|    4 | Dancing                          | 0                          |
|    5 | Dancing                          | 0                          |
|    6 | Sports,Dancing                    | 0                          |
+-------+-----------------------------------+----------------------------+
9 rows in set (0.00 sec)

This results in our users being organized by how well they match with user 8. The BIT_COUNT() function in the above example will return the number of set bits in the expression passed to it. Our first two matches (users 7 and 10) have two overlapping interests, the next four have one match, and the remaining three users have no overlapping interests (they could be trimmed out by adding the following to the WHERE clause:

AND BIT_COUNT(A.myset & B.myset)

Because of the functions being performed on the rows, this type of query cannot benefit from an INDEX. This means that the entire table will have to be scanned to find matches to user 8. However, as this is a bitwise operation, it should be quite efficient in spite of the full table scan (I was able to perform this operation on a randomly generated table of 500,000 rows and have 250,000 rows returned in 1.89 seconds on my 1.6 Ghz PC).

By comparison, performing this operation on the normalized three table schema I described above would require a three table JOIN with N AND clauses in the WHERE section of the query, one for every interest user 8 possessed.
Retrieving Set Members

To retrieve a list of the set members, use the DESCRIBE syntax as follows:

mysql> DESCRIBE set_test myset;
  +-------+-------------------------------------------+------+-----+---------+-------+
  | Field | Type                                      | Null | Key | Default | Extra |
  +-------+-------------------------------------------+------+-----+---------+-------+
  | myset | set('Travel','Sports','Dancing','Dining') | YES  |    | NULL    |      |
  +-------+-------------------------------------------+------+-----+---------+-------+
  1 row in set (0.00 sec)

If working programmically, you need to use the second column and strip the ’set(’ and ‘)’ to have a comma separated list, which most programming languages can break into an array automatically (in Visual Basic you would use the split function).
The SET Datatype and Visual Basic

Because the MySQL SET datatype is returned as a string in most situations, it can be handled as one by ADO. You can also use the myset+0 syntax to return it as in integer and handle it accordingly.

The MySQL SET datatype is a natural choice for handling checkboxes in Visual Basic. The checkboxes can be generated dynamically be reading the SET definition and building an array of checkboxes. Because we can handle all modifications programmically, we really do not need to handle the modification of individual elements as we can build the integer to define the entire set quite easily.
Building The Array

By using dynamic controls and by parsing the output of DESCRIBE to get a list of elements, we can build at array of checkboxes to handle our SET data:

  rs.Open "DESCRIBE set_test myset", conn, adOpenStatic, adLockReadOnly
  temp = rs.fields("Type")
  temp = Replace(temp, "set(", "")
  temp = Replace(temp, ")", "")
  temp = Replace(temp, "'", "")
  elements = Split(temp, ",")

  Check1(0).Caption = elements(0)

  For counter = 1 To UBound(elements)
      Load Check1(counter)
      Check1(counter).Caption = elements(counter)
      Check1(counter).Top = Check1(counter - 1).Top + Check1(counter).Height + 20
      Check1(counter).Visible = True
  Next counter

Populating Checkboxes

We can recycle some of the code above to handle the assignment of the checkboxes when loading SET data from the MySQL database:

        rs.Open "SELECT * FROM set_test LIMIT 1", conn, adOpenStatic, adLockOptimistic
    temp = rs.fields("myset")
    elements = Split(temp, ",")

    For counter1 = LBound(elements) To UBound(elements)
        For counter2 = LBound(Check1) To UBound(Check1)
            If Check1(counter2) = elements(counter1) Then Check1(counter2).Value = 1
        Next counter2
    Next counter1

This represents one of the simplest approaches, iterating through each checkbox for each element and assigning matches.
Updating Values

Changing the SET to reflect our changes is then as simple as assigning the element in the array to the SET:

        If Check1(Index).Value Then
        conn.Execute "UPDATE set_test SET myset = CONCAT_WS(',',myset,'" &    Check1(Index).Caption & "') WHERE rowid = " & rs.fields("rowid")
  Else
        conn.Execute "UPDATE set_test SET myset = REPLACE(myset,'" & Check1(Index).Caption    & "','') WHERE rowid = " & rs.fields("rowid")
  End If

With this approach all changes are made as soon as the checkbox is checked. You could also assign the whole SET at once, by iterating through the checkbox array and adding up the checked items, using 2^counter to get the integer value of each checkbox.
Conclusion

The MySQL SET datatype is not the perfect solution for all MySQL databases, but can be quite powerful when appropriate. IF you need to track less than 64 attributes for a given entity and make comparisons between different entities, the MySQL SET datatype may be ideal for your needs.
3 Responses to “The MySQL SET Datatype”

  1. martin hoffmann Says:
      March 27th, 2007 at 7:30 pm

      finally someone who spend some time to explain this interesting datatype SET I was long time looking for, thanks a lot!
  2. Doug Dosdall Says:
      June 16th, 2007 at 10:52 pm

      Hi–Thanks for the article. Good to understand this! There’s one thing I’m having a little bit of trouble with is that I would like to create my table initially in MS Excel, save it as CSV and then import it into MySQL. But when I have a column like Interests with a cell value like “travel, sports, dancing”, MySQL just sets the value of Interests to “travel”, ignoring the rest of the list.

      Any ideas on how to format the CSV file so that MySQL will import it properly (without, hopefully, having to type in the numeric value for the interests combo).

      This is what the first 2 rows (header + 1st data row) of my CSV look like now (with the set column actually named features and the value I want being “waterfront, pool, titled”):

      listing_id,listing_type,general_area,land_size,asking_price,primary_photo,num_photos,descrip_en,descrip_es,map_id,features,agent,agent_listing_ref
      1,Hotel,Cahuita,2598.53,725000,1,13,Unique impeccable hotel for sale in Cahuita. This place has been nurtured with lots of love and now the owners would like to sell it to the special people who would like to carry on with the gentle spirit of the place. Rare: titled property 100 meters from the beach! Excellent reviews from all major guidebooks. Manicured gardens; outstanding construction; excellent maintenance; good revenues. ,,,”waterfront, pool, titled”,patricia,TDCA001
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值