Column AND Row Posted by David Dobrin

Let me start with stuff that we all know. HANA is a database. In the database, there are tables. The tables are essentially blank spaces where you can load collections of fields called records. In a row-oriented database, each row in the table contains a record, and each field in the record occupies space, even if there is no actual data there.

Here’s an example. Imagine a table of home addresses, which contains places to put the last name, street, city, and telephone number for a number of people. (Each of these “places” is called a field.) In a row-oriented table, the filled-in table would contain many lines (or “rows”), one for each address. Each line would contain all the values you had for that person, but if, for instance, you didn’t have a telephone number, the space for a telephone number would be left blank.

In a column-oriented table, each field occupies a separate column. When you load an address in, you don’t put everything on the same row. You just add any actual value you have to the end of the column that it belongs to. So you add the last name to the “Last Name” column, the city to the “City” column, etc. And if you don’t have a value, you don’t add it.

So how does the column-oriented table know that data which is now in different rows actually “belongs” to certain other values that you loaded in? (The technical term is “record.”) Well, essentially [handwaving alert!!] each value is tagged, and there’s another table where you use the tags to look up which record the value belongs to.

You can see that a row-oriented table is much easier to understand conceptually (it’s like rows in a spreadsheet), but a column-oriented table is much more compressed and, as it turns out, much faster if you want to get data back out again and work on it.

In a really interesting paper on HANA (Sikka, et al., “Efficient Transaction Processing in SAP HANA Database – The End of a Column Store Myth,”), Sikka characterizes row-oriented tables as “write-optimized,” and column-oriented tables as “read-optimized,” and this is certainly a fair description of the two kinds of tables’ essential strengths. Use rows for transactions, where you’re trying to enter data. Use columns for analytics, where you’re trying to look at data.

It would seem that this choice (row <it>or</it> column) is just built into the nature of the world. It’s either in rows or it’s in columns. And perhaps for that reason, most databases make a choice about how they’re going to operate. Either all the tables in the database are going to be row-oriented, or all the tables in the database are going to be column-oriented.

But not HANA. In HANA, a table can be row-oriented or column-oriented; the programmer (or data designer) simply decides which it is to be.

But that’s not all there is to it. Because, for a programmer, inside HANA, the choice of row <it>or</it> column is not the same as it would be elsewhere. In HANA, if you decide that a table is to be column-oriented, you don’t necessarily give up on write-optimization (what row tables are good at). HANA has set up an internal system for column-oriented tables that essentially [hand-waving alert] allows you to enter data into a buffer that is row-oriented, then, in a series of steps, push the data into the column-oriented table, without impairing the performance of the column-oriented table.

So, in two quite different senses, HANA is both row-oriented <it>and</it> column-oriented. A. It allows you to create either row-oriented tables or column-oriented tables. B. If you create column-oriented tables, the column-oriented table has a row-oriented buffer for data entry which is write-optimized, so effectively [hand-waving alert] the column table behave as if it were a row table.

Write-Optimizing the Column

It’s worthwhile to understand how this is done, because it gives you some insight into how HANA is using in-memory.

The following diagram is drawn from the aforementioned paper.

/wp-content/uploads/2012/06/pastedgraphic-1.jpg

Figure 4. Overview of the unified table concept

Source: “Efficient Transaction Processing in SAP HANA Database – The End of a Column Store Myth” Vishal Sikka, et al., pp. 731-741.


The columnar table in HANA can act like a row store for one simple reason. It is a row store. As you can see from the diagram above, there is a buffer store (L1) which is row-oriented. Data coming in goes into L1.

Every so often, an L1 buffer is effectively closed, and the data in it is inserted into a column-oriented buffer (L2). This involves the typical column-oriented things, where the values are added to their respective columns and the fact that they belong to a particular record is recorded in another table.

Every so often, the L2 buffer is closed and the data is pushed into the primary columnar store.

To get the data into the columnar table without hurting performance, it first puts the data into a “write-optimized” [row] store (L1). It then takes that data and restructures it, pushing it into a second store (L2), which is column-oriented. Finally, it pushes the data in L2 into the main column-oriented table which was the intended destination all along.

This process would be weirdly byzantine and slow if it weren’t done in-memory (and in parallel). But since L1 and L2 and the final column store are in memory and all the processing needed to make this work is done in memory, it can all be done in the background. And because L1 and L2 are relatively small, the data in them can be made available to other database operations.

So, for instance, if you run a query on the columnar database while some data is being added, the query just checks L1 and L2, as well as the primary store Since L1 and L2 are always relatively small and since the query is done in-memory, the performance cost of doing this is small.

When Do You Use Row, When Column: One Man’s Opinion

You can see, now, why I chose to do some hand-waving when I was explaining the design idea behind HANA. L1, L2, write-optimized, blah, blah, Important to know if you want to know what it does. But hard to grasp if you’re just trying to get the idea.

Besides, a little learning is a dangerous thing. When you know things, you start asking questions. Why provide that much flexibility? What is it good for? How will people use it? What does it enable? If we were working at the hand-waving level, you just wouldn’t be asking questions like that.

But now that I’ve given you the detail, I have some responsibility for explaining what this row-and-column stuff is doing. I myself can’t answer from my own experience. So I did the next best thing. I asked people who have built products in HANA

I started with Adam Thier, whose job title at SAP used to include the words “database” and “architect.”

Why or when do you use row-oriented tables and when do you use columnnar tables?

“I am a financial software lifer [he worked on Essbase at Hyperion and is now working on EPM at SAP] in a long term platonic relationship with boring algorithms like “depreciation” and “allocations”; and I’ve ground my teeth to nubs trying to make basic algebra fly in a relational database. That’s why I jumped ship to Columnar back in the 1990s and never looked back. I typically always start there with the exception of long text or Blobs (for things like descriptions).

“However, even in the world of HANA, that’s not a constant because of the wonders of text analysis.”

So, if you have to do calculations on the data, you use columns, and if you have to do other things, like store text, then you use rows, except when you don’t.

That seems clear enough.

When Do You Use Column, When Row: Another Man’s Opinion

Not content with that, I also asked Vijay Vijayasankar, who has done a fair amount of HANA development and has served as an InnoJam judge with me. He’s the sort of guy who would probably prefer to post on this himself, so I’ll just pick and choose from his answer.

“HANA can have column or row stores; there is no technical limitation. My simple rule of thumb is ‘use a column table until proven wrong.’

“All data is going to be stored in more than one table, so you also need to think about how you will join the tables. The worst performing apps are the ones that try to join columnar and row stores. The reason for the drop in performance is that the result of such a join needs to be temporarily materialized, which kind of defeats the purpose of HANA.’

“There are other constraints. You cannot build a column view if the underlying table is a row table. A row store table is loaded into memory the moment the HANA engine starts, whereas a column gets loaded into memory when a query needs it.

So again, you use columns, except when you use rows, except for when columns are better.

So Does It Matter?

I went back to Adam with a bit of a puzzled look on my face, and said, “?”

“The beauty is that with HANA it doesn’t matter. I can start columnar, and if I stumble I can spend a few hours and reconfigure certain tables over to rows, with a smile on my face the whole time.”

So, in effect, not all that much has changed in HANA? You did columns in the past, and now you’re still doing columns?

“No. I’m doing columns in a different, more transactional way. Remember, I — and a whole bunch of people much smarter than me — who go back to the original MOLAP engines (essentially column stores) never cracked the code on a hybrid database — which meant we never cracked the code on how to do an insert in a MOLAP structure without then taking it off-line to re-index. This isn’t a big deal — it’s a huge deal.”

Simple and Fruitful

In my earlier blog, I said there were too tests to apply to the design of HANA. Is it simplifying? And is it fruitful?

This post might seem to suggest that there’s a problem with the “simplification” criterion. But I think that’s a deep misreading of the situation.

To begin with, it’s clear that even in the simple case, where columnar tables are being created because what you want is ultrafast analysis, the underlying design that I described in the first blog post makes a material difference.

* In this design, for instance, the columnar tables can be updated continuously, whereas they can’t be in most analytic databases. (You need to take time out to reindex.)

* In this design, too, the creation of views–like the multi-dimensional view you see in most analytic databases–is logically separate from the table structure, so you can do lots of different views, including views that do algebra, without rebuilding the tables, and you can change them as much as you want.

* In this design, as well, you can embed things like statistical functions right in the database, whereas in most analytic databases, you have to pull the data out to do a lot of statistical analysis.

And the more complex cases? I’m not sure we know the answer yet. What we do know is that at least two sharp database guys are pretty much saying, “It depends,” when you ask then how to use the “row and column” structure. That could be because the structure isn’t all that useful, but it could also be that the flexibility that Adam describes gives you a lot of options.

At this point, should one have an absolutely clear idea of how people should use this flexibility?

I don’t think so. Certainly, you saw a similar fluidity in original electric car example. The basic design idea was there in the battery-powered ignition system for the 1912 Cadillac. But at the time, no one was quite sure about where it was going to lead. The original designers were mostly just trying to get rid of one limitation in automotive design, the fact that you had to turn a crank to start the car. (According to legend, the inventor came up with the idea after a wayward crank had broken the jaw of a friend of his, who later died of gangrene.) They certainly saw that there was more to the idea than that. But it would take at least ten years of experimentation before the standard modern automotive electrical system would emerge full-blown.

About the Author

I run a small analyst firm in Cambridge, Massachusetts that does strategy consulting in most areas of enterprise applications. I am not a database expert, but for the past year, I have been doing a lot of work with SAP related to HANA, so I’m reasonably familiar with it.

SAP is a customer, but they did not commission this piece, and they did not edit or offer to edit it in any way.


  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
东南亚位于我国倡导推进的“一带一路”海陆交汇地带,作为当今全球发展最为迅速的地区之一,近年来区域内生产总值实现了显著且稳定的增长。根据东盟主要经济体公布的最新数据,印度尼西亚2023年国内生产总值(GDP)增长5.05%;越南2023年经济增长5.05%;马来西亚2023年经济增速为3.7%;泰国2023年经济增长1.9%;新加坡2023年经济增长1.1%;柬埔寨2023年经济增速预计为5.6%。 东盟国家在“一带一路”沿线国家中的总体GDP经济规模、贸易总额与国外直接投资均为最大,因此有着举足轻重的地位和作用。当前,东盟与中国已互相成为双方最大的交易伙伴。中国-东盟贸易总额已从2013年的443亿元增长至 2023年合计超逾6.4万亿元,占中国外贸总值的15.4%。在过去20余年中,东盟国家不断在全球多变的格局里面临挑战并寻求机遇。2023东盟国家主要经济体受到国内消费、国外投资、货币政策、旅游业复苏、和大宗商品出口价企稳等方面的提振,经济显现出稳步增长态势和强韧性的潜能。 本调研报告旨在深度挖掘东南亚市场的增长潜力与发展机会,分析东南亚市场竞争态势、销售模式、客户偏好、整体市场营商环境,为国内企业出海开展业务提供客观参考意见。 本文核心内容: 市场空间:全球行业市场空间、东南亚市场发展空间。 竞争态势:全球份额,东南亚市场企业份额。 销售模式:东南亚市场销售模式、本地代理商 客户情况:东南亚本地客户及偏好分析 营商环境:东南亚营商环境分析 本文纳入的企业包括国外及印尼本土企业,以及相关上下游企业等,部分名单 QYResearch是全球知名的大型咨询公司,行业涵盖各高科技行业产业链细分市场,横跨如半导体产业链(半导体设备及零部件、半导体材料、集成电路、制造、封测、分立器件、传感器、光电器件)、光伏产业链(设备、硅料/硅片、电池片、组件、辅料支架、逆变器、电站终端)、新能源汽车产业链(动力电池及材料、电驱电控、汽车半导体/电子、整车、充电桩)、通信产业链(通信系统设备、终端设备、电子元器件、射频前端、光模块、4G/5G/6G、宽带、IoT、数字经济、AI)、先进材料产业链(金属材料、高分子材料、陶瓷材料、纳米材料等)、机械制造产业链(数控机床、工程机械、电气机械、3C自动化、工业机器人、激光、工控、无人机)、食品药品、医疗器械、农业等。邮箱:market@qyresearch.com

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值