mtb13: Performance – extracts, blending, super/candidate (primary, alternate)/unique (can hold NULL)/foreign keys, indexing, granularity


This chapter will address various aspects of performance with the intent of empowering you with techniques to create workbooks that load quickly and respond snappily to end user interaction.

In this chapter, we will discuss the following topics:

  • Understanding the performance-recording dashboard
  • Hardware and on-the-fly techniques
  • Connecting to data sources
  • Working with extracts
  • Using filters wisely
  • Efficient calculations
  • Other ways to improve performance

     As you can see, there are many topics to cover with regard to performance improvement. Therefore, let's not lose any time and dive right in. Our first topic will be performance recording, which is the first thing you should look at when experiencing a drop in performance because it helps you identify the source of the slowness.

Understanding the performance-recording dashboard

     Tableau includes a performance-recording feature as part of the installation package; it ships as a dashboard named PerformanceRecording.twb. The dashboard gives the Tableau author an easy way to understand and troubleshoot performance problems. The following exercises and associated discussion points will review various aspects of the performance-recording dashboard, including how to generate it, how to use it to improve performance, and how it's constructed.

Perform the following steps:

  • 1. Navigate to https://public.tableau.com/profile/marleen.meier to locate and download the workbook associated with this chapter.

    Note: if the U.S. regions do not show up on the map (see the Tableau Community Forums thread "U.S. Regions not showing up"), the solution is to assign geographic roles. Click each of the fields (Country, Region, State, City), select Geographic Role, and choose the appropriate role from the list (Country/Region, Create from | State, State/Province, City). Then create a hierarchy (Region, State) to resolve the issue.
  • 2. Navigate to the Types of Events worksheet.
  • 3. Select Help | Settings and Performance | Start Performance Recording.
  • 4. Press F5 on Windows or Command + R on macOS to refresh the view.
  • 5. Select Help | Settings and Performance | Stop Performance Recording. A
    new dashboard will open:

There are four main parts to look at:

  • A. Performance Summary shows the amount of time the dashboard needed to execute all the steps that happened between step 3 (start performance recording) and step 5 (stop performance recording). The slider can be used to filter.
  • B. Timeline shows which step happened when and how long it took.
  • C. The steps are also color-coded and can be found in the Event bar chart as well.
  • D. Query only shows details once you click on an event in (B) or (C).

1. Drag the Events timeline slider in (A) to the far left to show all events. 

2. Within Events Sorted by Time (C), click on any green bar entitled Executing Query. Note that the Query section is now populated by the VizQL belonging to the highlighted bar: 

Command

SELECT "Extract"."Region" AS "Region",
       SUM(CAST("Extract"."Quota" AS BIGINT OR NULL)) AS "sum:Quota:ok"
FROM "Extract"."Extract" "Extract"
GROUP BY 1

Command

SELECT "Quota.csv"."Region" AS "Region",
       SUM("Quota.csv"."Quota") AS "sum:Quota:ok"
FROM "TableauTemp"."Quota#csv" "Quota.csv"
GROUP BY 1

3. To see the query in its entirety, navigate to the Query worksheet and set the fit to Entire View (by Query, Tableau is referring to a VizQL query). Note that an especially long query may not be recorded in its entirety. In order to see every line of such a query, reference the log files located in My Tableau Repository (for example, C:\Users\liqin\OneDrive\Documents\My Tableau Repository).

     The preceding steps were intended to provide you with an initial look at what the performance-recording dashboard can be used for. I usually use it to check whether a particular query took significantly longer than others. I then look at the query on the Query tab and check which part is affected by reading the fields used, try to change it, and then run the performance dashboard again. Now, let's look at some more details. 

     If you access the Events worksheet from the performance-recording dashboard, you will see the different events. We will now discuss those events and their impact on performance in more detail:

  • Connecting to Data Source
         Poor performance when connecting to the data source could indicate network latency or database issues, or even outdated drivers.
  • Generating Extract
         Aggregating data before you extract can increase performance because it will decrease the total row count.
  • Compile Query
         Compile Query performance problems could indicate database issues.
  • Executing Query
         If a query takes too long, you can improve performance by filtering data you don't need or hiding fields you don't use. Refer to the Using filters wisely section in this chapter for more information.
  • Sorting Data
         Performance issues related to sorting may indicate too many marks in the view. Sorting slowdowns can also be caused by table calculations that depend on sorting the data in the view.
  • Geocoding
         Geocoding performance issues may indicate too many marks in the view, internet latency issues, poor hardware, or a poor graphics card.
  • Blending Data
         Blending Data performance may be improved by reducing the amount of underlying data or by filtering.
  • Computing Table Calculations
         Since table calculations are typically performed locally, complex calculations may tax the end user's computer.
  • Computing Totals
         The Computing Totals performance may be improved by reducing the amount of underlying data or by filtering.
  • Computing Layout
         Computing Layout performance issues may be indicative of a dashboard with too many worksheets or elements such as images.

     After this overview, which you can always come back to, we will dive a little deeper and research what happens when performance recording is activated. Note that the following assumes that the author is working on Tableau Desktop, and not Tableau Server. In Chapter 14, Interacting with Tableau Server/Online, we will cover the performance recording on Tableau Server. 

     When recording performance, Tableau initially creates a file named performance_[timestamp].tab in My Tableau Repository\Logs (for example, C:\Users\LlQ\OneDrive\Documents\My Tableau Repository\Logs). Additionally, there is a file named PerformanceRecording_new.twb located in the Tableau program directory, for example, C:\Program Files\Tableau\Tableau 2022.1\Performance. That file is automatically opened once the recording stops, thereby allowing the author to peruse the results.

Hardware and on-the-fly techniques 

     The number-one performance inhibitor for Tableau Desktop that I have observed while training in many different companies is underpowered machines. Developers almost invariably have excellent computers. Analysts and other business users, regrettably, often do not. In many cases, a few modest upgrades can make a significant improvement. Unfortunately, upgrading a computer may be impossible at many organizations due to a variety of factors, and procuring a new machine may also be quite difficult. Therefore, in this section, we will consider both optimal computer specifications and techniques for working with Tableau on underpowered machines.

     The currently published minimum requirements for Tableau Desktop are as follows (the original table is not reproduced here). The specifications listed for macOS are adequate assuming sufficient RAM (a minimum of 8 GB). Those for Windows, however, are insufficient for many use cases. Instead, consider the following recommendations:

  • Processor: the larger cache and faster processing of an i7 processor enable better multitasking and improve performance overall.

  • Hard drive: an SSD outperforms an HDD by a wide margin. Part of the reason is simply better I/O (input/output) performance; also, fragmentation is irrelevant for an SSD due to its technical design.

     You may have noticed that the preceding recommendations mostly do not address specific brands, nor are there considerations of different brands of RAM or hard drives. Despite the proliferation of computer and component types (or perhaps as a result of this proliferation), computers and components have become commodities. In short, any brand name equipment should work fine. In fact, off-brand equipment will often perform just as well, although the relatively small price savings may not justify additional risks. A little research combined with common sense should lead to satisfactory results when considering which brand of computer or component to purchase.

     Perhaps, as a result of the preceding section, you have ordered a new, more optimal computer. Or, more likely, you may keep it in mind should you have the opportunity to obtain a new machine in the near future, but for now, you have to make do with what you have. This section discusses tips that will help the Tableau author work more efficiently irrespective of the equipment used.

Configuring auto updates 

     Auto updates can be accessed either via the icon located on the toolbar, via Worksheet | Auto Updates, or by using the shortcut key F10 on Windows or Option + Command + 0 on Mac. Auto updates give the author the option of pausing/resuming auto updates for the worksheet and/or for filters, and can come in very handy if you want to make multiple changes to your layout but want to avoid Tableau loading after every change. You can simply pause the update, make your changes, and then run the update once.

The following exercise demonstrates how this works:

  • 1. Open the workbook associated with this chapter and navigate to the Auto Updates worksheet.
  • 2. In the Data pane, select the Superstore dataset.
  • 3. Place State on the Rows shelf.
  • 4. Deselect Auto Update Worksheet via the toolbar:
    Figure 13.3: Auto Update Worksheet
  • 5. Place City on the Rows shelf to the right of State. Note that the view does not update.
  • 6. Enable Auto Update Worksheet via the toolbar. The view now updates.
  • 7. Right-click on State on the Rows shelf and select Show Filter.
  • 8. Right-click on City on the Rows shelf and select Show Filter.
  • 9. On the City filter, click the drop-down menu and select Multiple Values (list) and Only Relevant Values:
  • 10. Deselect Auto Update Filters as shown in Figure 13.3(step 4).
  • 11. In the State filter, select only Alabama. Note that the City filter does not update.
  • 12. Enable the Auto Update Filters via the toolbar. The City filter now updates.

     Auto updates can be very helpful. The author may pause auto updates, make multiple changes, and then resume auto updates, thereby saving time and increasing performance indirectly.

     As a side note, cascading filters, such as those set to Only Relevant Values or All Values in Context (the City filter here is a dimension filter of this kind), may not populate while auto updates are paused, as they are reliant on a query being passed first.

The Run Update feature 

     The Run Update icon to the right of the pause/resume auto updates icon triggers a single refresh while leaving auto updates disabled. The following brief example should help clarify this option:

  • 1. Duplicate the previous worksheet called Auto Updates and name the duplicate Run Updates.
  • 2. Pause all updates by clicking on the Pause Auto Updates icon (or deselect Auto Update Worksheet).
  • 3. Select several states at random in the State filter.
  • 4. Click on the Run Update icon as shown in Figure 13.5 and select either Update Dashboard or Update Worksheet. The shortcut key for running an update is F9 on Windows. The shortcut on macOS is Shift + Command + 0:
  • 5. Select several more states at random in the State filter. Note that auto updating is still paused.
  • 6. Click on the Run Update icon again.

     To repeat, Run Update allows the Tableau author to intermittently refresh the view while still keeping auto updates paused. The two update features that we just discussed will make your life as a dashboard developer easier, but if you want to tackle performance issues at their core, you need some more tools at hand. The following section will introduce extracts, a really good feature for speeding up calculations and rendering in general.

Small extracts

     Although extracts will be discussed in more detail in the Working with extracts section, it seems fitting to mention extracts in the context of performance considerations while authoring. Even under optimal conditions, working with large data sources can be slow. If constant access to the entire dataset while authoring is not necessary, consider creating a small, local extract. Author as much of the workbook as possible and then when all the underlying data is truly needed, point to the original data source. 

The following steps show a brief example of this technique in action:

  • 1. In the workbook associated with this chapter, navigate to the Small Local Extract worksheet.
  • 2. Select Data | New Data Source to choose a desired data source. This exercise assumes Sample - Superstore.xls , which installs with Tableau, but you can take any dataset you like.
  • 3. Drag any field to the Text shelf. In my example, I used Number of Records.
  • 4. Right-click on the data source (for me: Sample - Superstore) and select Extract Data…:
  • 5. At the bottom of the Extract Data… dialog box, select Top, your data source, and choose 1000. In the following figure, Orders has been chosen, which is one of the two parts of the superstore join:
  • 6. Click the Extract button and note that Number of Records now displays 1000 rows.
  • 7. In the Data pane, right-click on Sample - Superstore and deselect Use Extract. Note that Number of Records has reverted to its original value.

     By creating a small, local extract, the Tableau author alleviates two performance inhibitors: network latency and dataset size. 

     This section gave you the knowledge of how to reduce your dataset temporarily and the option to get back to the original size, both without leaving the Tableau interface. This is a great way to speed up your dashboard building process and avoid long waits. In the next section, we will talk about how to connect to data sources to achieve the best possible performance. 

Connecting to data sources 

     One of the beauties of Tableau is the ease with which you can connect to many different data sources in various ways. As mentioned earlier in this book, there are many connectors defined in Tableau for interfacing with a variety of data sources. Furthermore, this flexibility extends beyond simply connecting to single tables or files. 

     Although Tableau makes it easy to connect to various data sources, it should be stressed that Tableau is not an Extract, Transform, and Load (ETL) tool. If complex joins and complex data blending are required to generate useful results, it may be advisable to perform ETL work outside of Tableau, for example, in Tableau Prep Builder (see Chapter 3, Tableau Prep Builder, https://blog.csdn.net/Linli522362242/article/details/124977491, for more information). ETL work will ideally lead to better data modeling and thus easier authoring and quicker performance in Tableau.

The four ways in which Tableau connects to data are as follows:

  • Tableau may connect to a single table. This is ideal as it allows the most functionality and easiest troubleshooting, while enabling Tableau to send the simplest queries and thus perform optimally. However, it is not always possible to connect to a single table and, although ideal, it is not reasonable to have such a strict limitation. The relationship between data sources and reporting tools is constantly changing. A reporting tool that is inflexible in the way it can connect to data will likely not be successful, no matter how elegant and beautiful the end results.
  • The second option is relationships, a newer, more flexible way to combine two or more datasets. The level of detail is defined per worksheet rather than in the data source tab, which makes this feature very powerful.
  • The third way in which Tableau may connect to data is via joining. One table may not supply all the necessary data, but by joining two or more tables, all the needed data may be accessible. As the joins become more and more complex, performance may be impacted, and troubleshooting may become difficult. Fortunately, Tableau can assume referential integrity (https://blog.csdn.net/Linli522362242/article/details/124335986) and thus work quite efficiently with even complex joins.
  • Finally, Tableau may utilize data blending. Data blending often performs admirably, provided no more than one of the blended data sources is large and the dimensions used for blending have relatively few members (https://blog.csdn.net/Linli522362242/article/details/124335986).

     Relationships should be chosen over joining, and joining should be chosen instead of blending, whenever possible. When blending multiple large data sources, performance can be seriously impacted. The problem is further compounded when blending on high-cardinality dimensions. Also, data blending limits some functionality, such as the ability to use dimensions, row-level calculations, or LOD expressions from a secondary data source.

 However, there are exceptions, two of which are discussed here:

  • First, data blending is advantageous (and usually necessary) when there is no common key shared between two tables.
  • Secondly, in some instances, cross-joining will not work, and a data blend is required. For example, use data blending when you work with two datasets that have different granularities and relationships are not possible, or when a cross-database join is not possible (for example, to cubes or extract-only connections), or when you have big datasets for which a blend will improve performance.

    Suppose you have two tables whose linking fields are User ID and Patron ID. Not all values can be part of the resulting table because of the following:
    • A row in the left table does not have a corresponding row match in the right table, as indicated by the null value in the results.

    • There are multiple corresponding rows in the right table, as indicated by the asterisk (*) in the results. An asterisk in a view with blended data indicates multiple dimension values in a single mark, because data blending takes aggregated results and combines them in the view.

    • When measures are involved, they are also aggregated.

More info can be found here: https://help.tableau.com/current/pro/desktop/en-us/multiple_connections.htm

     For these reasons, consider as a guideline that data blending should normally be avoided if a joining option exists.

     Chapter 4, All About Data – Joins, Blends, and Data Structures (https://blog.csdn.net/Linli522362242/article/details/124335986), provides detailed information about joining and blending. For the purposes of this chapter, joining and blending discussions will be limited to performance considerations.

Working efficiently with large data sources 

     This section will cover some basics of database tuning and ways to work efficiently with large data sources. Since the topic is more focused on data sources than on Tableau, no exercises are included. 

     If you are connecting to large data sources and are experiencing performance problems, a conversation with a database administrator (DBA) may be beneficial. 

     Clear communication coupled with a small amount of database work could dramatically improve performance. The conversation should include database-tuning points, such as explicitly defining primary and foreign keys, defining columns as not null, and indexing. Each point will be discussed here.

Defining keys: super, candidate (primary and alternate), unique (can hold NULL), foreign, and composite

     Primary and foreign keys are essential for joining tables. A primary key is composed of one or more columns in a table. The primary key should be unique for every row. Joining on a non-unique, row-level key may lead to erroneous results, as explored in Chapter 4, All About Data – Joins, Blends, and Data Structures. Explicitly defining primary keys in the database helps to ensure that each key value is unique:

https://www.educba.com/sql-keys/ 

  • Primary Key
         Primary Key is a field that can be used to identify all the tuples in the database uniquely. Only one primary key can be declared per table, and a primary key cannot have a NULL value.

    Example: In the relational table given above, "cust_id" is the Primary Key, as it can identify every row uniquely in the table.
  • Unique Key
         Unique Key can be a field or set of fields that can be used to uniquely identify a tuple in the database. One or more fields can be declared as a unique key. A unique key column can also hold a NULL value. Use of a unique key improves the performance of data retrieval, making searching for records much faster and more efficient.

    Example: In the relational table given above, "cust_aadhaar_number" and "cust_pan_number" are Unique Keys, as each can allow one NULL value in the column.
  • Candidate Key : Primary Key and Alternate Key
         Candidate Key can be a column or group of columns that qualifies as a unique key. Every table has at least one candidate key, and a table may have more than one. Each candidate key can serve as the primary key if required.

    Example: In the relational table given above, "cust_id", "cust_aadhaar_number", and "cust_pan_number" are Candidate Keys, as each can identify every row uniquely in the table. These columns also qualify as a Primary Key.
  • Alternate Key
         Alternate Key is a key that can be used as a primary key if required. An alternate key also qualifies to be a primary key but, for the time being, is not the primary key.

    Example: In the relational table given above, "cust_aadhaar_number" and "cust_pan_number" are Alternate Keys, as both columns could serve as the Primary Key but have not been selected for that role.
  • Composite Key
         Composite Key is also known as a Compound Key or Concatenated Key. It refers to a group of two or more columns that can be used together to identify a tuple in the table uniquely. The columns in combination identify a row uniquely, but a single column of that group is not guaranteed to do so.

    Example: In the Order table given above, the "cust_id" and "order_month_year" columns are used in combination to identify a tuple uniquely. No individual column of this group can identify a tuple uniquely in the Order table.
  • Super Key
         Super Key is any combination of columns that uniquely identifies rows in the table. A super key may include additional columns that are not necessary to identify a tuple uniquely. A candidate key is a subset of a super key; a candidate key is also known as a minimal super key.

    Example: In the relational table given above, the Primary Key, Candidate Keys, and Unique Keys are all Super Keys. Since the single 'cust_id' column of the Customer table is sufficient to identify tuples uniquely, any set of columns that contains 'cust_id', 'cust_aadhaar_number', or 'cust_pan_number' is a Super Key.
  • Foreign Key
         A foreign key is a column that is a primary key in another table; that is, a primary key in one table can be referred to as a foreign key in another. A foreign key may have duplicate and NULL values if it is defined to accept NULL values.

    Example: In the relational tables given above, 'cust_id' is the Primary Key in the Customer table, while 'cust_id' in the Order table is known as a Foreign Key. A foreign key in one table always refers to the primary key of another table.

     A foreign key is composed of one or more columns in a table that uniquely identify rows in another table. This unique identification occurs as a result of the foreign key in one table referencing the primary key in another table. Explicitly defining foreign keys in the database enables Tableau to bypass many integrity checks, thereby improving performance. 

     For instance, deleting a record (say, artist_id = 4 from the artist table) whose value is referred to by a foreign key in another table (the album table, with artist_id, album_id, and album_name) would break referential integrity: the album "Eat the Rich" refers to the artist with an artist_id of 4. With referential integrity enforced, this deletion would not be possible.
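     To make this concrete, here is a minimal SQL sketch of the artist and album tables from the example above. This is an assumption-laden illustration, not the book's code: the column types and constraint syntax vary by database, and the point is simply the explicit PRIMARY KEY, FOREIGN KEY, and NOT NULL declarations discussed in this section:

    -- Hypothetical tables illustrating explicitly defined keys
    CREATE TABLE artist (
        artist_id   INTEGER      NOT NULL,   -- primary key: unique and NOT NULL
        artist_name VARCHAR(100) NOT NULL,
        PRIMARY KEY (artist_id)
    );

    CREATE TABLE album (
        album_id    INTEGER      NOT NULL,
        artist_id   INTEGER      NOT NULL,   -- foreign key referencing artist
        album_name  VARCHAR(100) NOT NULL,
        PRIMARY KEY (album_id),
        FOREIGN KEY (artist_id) REFERENCES artist (artist_id)
    );

    -- With the FOREIGN KEY constraint enforced, the database rejects
    --   DELETE FROM artist WHERE artist_id = 4;
    -- for as long as an album (such as 'Eat the Rich') still references artist 4.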

Defining columns as NOT NULL

     Tableau has published multiple white papers on performance improvement tips and tricks (https://help.tableau.com/current/pro/desktop/en-us/performance_tips.htm) that state that programmers and Tableau Desktop do not like NULL data. Define each column in your tables with an explicit NOT NULL if possible.

     In practice, database admins debate when it is and isn't appropriate to define columns as NOT NULL; however, two things are clear:

  • A primary key should be defined as NOT NULL. This is self-evident since primary keys must be unique and non-null by definition; foreign keys used to enforce referential integrity should likewise be defined as NOT NULL where possible.
  • Also, any column that is to be indexed should be defined as NOT NULL since otherwise the index may be unusable.

Indexing is discussed more fully in the next section. 

Indexing 

Let's consider the following two questions regarding indexing: 

  • What is an index?
  • What should be indexed?

     The first of our two questions may be easily answered by a DBA but is likely uncharted waters for the Tableau author. So, to clarify, an index is a copy of selected columns from a database table that has been optimized for efficient searching. Since these copied columns include pointers to the original rows, they can be accessed to quickly find given rows and return the required data.

     A small example may prove helpful. According to The Boeing Company, the 787 Dreamliner has about 2.3 million parts. Imagine a table that lists all of these parts in the Part_Name column. Your task is to search this column for every part starting with the "fuse" string. On a non-indexed column, this would require the examination of every row of data in the database. Such a search could be quite slow. Fortunately, indexes can be used to reduce the number of rows searched, thus making the process much faster. One data structure used for indexing is the B-tree. A B-tree data structure is sorted; thus, when accessing an index that uses a B-tree to search for all parts starting with fuse, not every row has to be considered. Instead, the database can skip straight to the entries beginning with fuse and quickly return the desired rows.
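     As a hedged sketch of this scenario (the parts table name and the idx_parts_name index name are hypothetical, and index syntax varies by database), the index might be created and used as follows:

    -- Create a B-tree index on the column used for searching
    CREATE INDEX idx_parts_name ON parts (Part_Name);

    -- A prefix search can now walk the sorted index straight to the
    -- 'fuse' entries instead of scanning all ~2.3 million rows:
    SELECT Part_Name
    FROM parts
    WHERE Part_Name LIKE 'fuse%';

     Note that a search with a leading wildcard, such as LIKE '%fuse%', cannot exploit the sorted index in this way and falls back to scanning.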

     Now let's move on to the second question on indexing. What should be indexed? This question can be answered fairly succinctly: ideally, all columns used for joining or filtering should be indexed in the data source.

     Although there are some basic performance considerations for creating more efficient joins in Tableau (for example, avoid an outer join when a left join will suffice), join performance is largely determined outside of Tableau. Therefore, it is typically more important to index columns used in joins than those used for filtering.

     To continue with our discussion of manipulating data sources, the next section will cover how Tableau can be used to create summarized datasets through extracting.

Working with extracts

     This section will discuss what a Tableau data extract is as well as how to efficiently construct an extract. A colleague of mine recently consulted with a relatively small mobile phone service provider. Even though the company was small, the volume could be in excess of 1,000,000 calls per day. Management at the company insisted on the ability to interface with detailed visualizations of individual calls in Tableau workbooks. The performance of the workbooks was, understandably, a problem. Was such low-level detail necessary? Might less detail and snappier workbooks have led to better business decisions?

     In order to balance business needs with practical performance requirements, businesses often need to ascertain what level of detail is genuinely helpful for reporting. Often, detailed granularity is not necessary. When such is the case, a summary table may provide sufficient business insight while enabling quick performance. In the case of the mobile phone service provider, a daily snapshot of call volumes may have sufficed. Even an hourly snapshot would have greatly reduced the table size and improved Tableau's performance.

     To address this common business need, an extract is a proprietary compressed data source created by Tableau Desktop. Since its original release, the file extension for an extract has changed from .tde to .hyper. The new format makes use of the Hyper engine, which was discussed earlier (https://blog.csdn.net/Linli522362242/article/details/123020380). An extract can be stored locally and accessed by Tableau to render visualizations.

     Consider the following points that make an extract file an excellent choice for improved performance:

  • Extracts can be quickly generated at an aggregate level
  • Extracts are a columnar store, which records data as sequences of columns
  • Relational databases typically store data using a row store methodology

     In the following example, note that Row Store is excellent for returning individual rows, whereas Column Store is much better for returning aggregated data.

Consider an example table with the columns Instrument, Store, and Price, stored first as a row store and then as a column store (the original example tables are not reproduced here):
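     As a hypothetical illustration (the instruments, stores, and prices below are made-up values):

    Example table:
        Instrument | Store    | Price
        Guitar     | Downtown | 499
        Piano      | Uptown   | 3999

    Row store layout (values of one entry stored consecutively):
        Guitar, Downtown, 499; Piano, Uptown, 3999

    Column store layout (values of one attribute stored consecutively):
        Instrument: Guitar, Piano; Store: Downtown, Uptown; Price: 499, 3999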

     In a column store table, the values of each attribute are stored together, so the nth value of Instrument, the nth value of Store, and the nth value of Price together make up the nth entry. In a row store table, by contrast, all values that belong to the same entry are stored in consecutive order.

     To sum up what we have learned so far in this section: extracts use compression techniques to reduce file size while maintaining performance, and they utilize RAM and hard drive space for optimal performance. Neither of those two advantages is guaranteed when using a live connection to a database; therefore, extracts can improve the performance of your dashboard whenever the database can't.

Constructing an extract

     This section will discuss extracts from a performance perspective. Other aspects of extracts, such as scheduling and incremental refreshes, will not be considered here.

     As we discussed in the Small extracts section earlier, an extract is created via Data | [Data Source] | Extract Data. From the resulting dialog box, we can take the following actions:

  • Filter the extract as needed: Sometimes, an extract that precisely reflects a data source is warranted, but often filtering various fields will still populate the extract with the required data while shrinking the size and improving performance. To add a filter, simply click Add... to access a dialog box identical to the filter dialog box used within a worksheet:
  • Aggregate to the level of granularity represented in the view: Aggregation not only reduces the file size, but can also be helpful from a security standpoint. Without aggregation, an extract is constructed using row-level data. Therefore, the Tableau author should note that if the extract is built without choosing to aggregate, any sensitive row-level data is accessible:
  • Reduce the number of rows: As shown in the Small extracts section, reducing the number of rows can allow the author to create a small, local extract for quick workbook building, after which the original data source can be accessed for complete analysis:
  • Hide all unused fields: This option excludes all columns that are not used in the workbook from the extract. This can significantly reduce the extract size and increase performance:

     By taking these four measures, your performance should improve immediately. Feel free to test it yourself by using the performance-recording tool and creating different extracts of the same data source. Using aggregation and performance-recording actions will be discussed next. 

Aggregation

     The following exercise will use two aggregates from a single data source, one at the State level and the other at the City level. These aggregated data sources will be used to create two worksheets. Each of these worksheets will be placed on a dashboard along with a third worksheet with row-level information. Finally, filter actions will be created to tie the three worksheets together. The purpose of the exercise is to demonstrate how small extracts might be used in conjunction with a larger dataset to create a more performant dashboard: 

  • 1. Open the workbook associated with this chapter and navigate to the State Agg worksheet.
  • 2. In the Data pane, select the SS - State Agg data source.
  • 3. Create a filled map using state by placing State on the Detail shelf and selecting Filled Map from the Marks card.
  • 4. Right-click on the SS - State Agg data source and select Extract Data.
  • 5. Note that Tableau displays an error stating that it cannot find the referenced file. You can either point to the instance of Sample - Superstore that ships with Tableau or you can use the instance provided via the GitHub link: https://github.com/PacktPublishing/Mastering-Tableau-2021/tree/main/Chapter%2013
  • 6. After connecting to the data source, Tableau will display the Extract Data dialog box. Within the dialog box, select Aggregate data for visible dimensions and All rows. Click the Hide All Unused Fields button and then click on Extract:
    Note that the resulting extract only contains State. Also note that the data has been aggregated so that no underlying data is available.
  • 7. Navigate to the City Agg worksheet.
  • 8. In the Data pane, select the SS - City Agg data source. Note that this data source has already been extracted and so only contains State, City, and Sales. Also note that the data has been aggregated so that no underlying data is available.
  • 9. Place City on the Rows shelf, Sales on the Text shelf, and State on the Detail shelf. Don't forget to include State even though it does not display on the view. It must be used so that the dashboard created at the end of the exercise works correctly.
  • 10. Navigate to the Row Detail worksheet and select the Superstore dataset.
  • 11. Create a crosstab view that displays Customer Name, Order ID, Row ID, Profit, and Sales. One quick way to create this view is to double-click on each field.
  • 12. Navigate to the Agg Dash dashboard and place each of the three worksheets on the dashboard.
  • 13. Create the following actions via Dashboard | Actions | Add Action | Filter:
    City to Detail

    State to City
  • 14. After creating these two actions, in the dashboard, click on the State field.
  • 15. Then, click on the City field.
  • 16. Click on a blank portion of the City Agg worksheet to exclude all values on Row Detail worksheet.
  • 17. Click on a blank portion of the State Agg worksheet to exclude all values on City Agg worksheet.
  • 18. Format the dashboard as desired:

Having completed this exercise, note that the resulting dashboard is quite performant: 

  • When the user first opens the dashboard, only the State Agg worksheet displays. This is performant for two reasons.
    • First, displaying a single worksheet as opposed to every worksheet when opening the dashboard causes fewer initial queries and less rendering.
    • Second, accessing a small extract is quicker than accessing a larger data source.
  • Since the City Agg worksheet is also accessing a small extract, when the user clicks on a state, the City Agg worksheet will appear quickly.
  • When the user clicks on City, a call is made to the data source that only includes the information for that particular city. A relatively small amount of data is pulled, and performance should be good for even larger datasets.

     Another aspect of good performance practice, apart from using aggregate extracts, should be considered for this exercise. The dashboard contains no quick filters. Often, using quick filters on a dashboard is unnecessary. If the worksheets on the dashboard can be used to filter, those worksheets can essentially do double duty. That is to say,

  • worksheets can provide valuable analysis
  • while simultaneously acting as filters for other worksheets on the dashboard.

This represents a performance improvement over using quick filters, since adding quick filters would cause additional queries to be sent to the underlying data source.

     In the preceding dashboard, each worksheet references a different data source. Therefore, you may ask, how are the action filters able to function across the different data sources? The answer can be found in the filter action dialog box. As shown in Figure 13.14 and Figure 13.15, All Fields are considered Target Filters. Tableau simply matches any fields of the same name across each data source. Extracts can be optimized for even better performance results. We'll cover this in the next section. 

Optimizing extracts

     Optimization accelerates performance by materializing calculated fields when possible. This means that Tableau actually generates values for calculated fields in the extract so that those values can be looked up instead of calculated. If you were to use table calculations, Tableau would have to calculate the values each time you change the view.

     Note that not all calculated fields are materialized. Fields that are not materialized include table calculations, changeable or unstable functions such as NOW() and TODAY(), and calculated fields that use parameters.

     When an extract is first created, it is automatically optimized. In other words, calculated fields are automatically materialized when possible. However, over the course of time, calculated fields may be altered in ways that cause the extract to drop materialized fields. At such times, open Data | [Data Source] | Extract and click on Compute Calculations Now in order to regenerate the materialized fields. If an extract is set to refresh on Tableau Server, the extract is automatically optimized with each refresh.

     The following example demonstrates calculated fields that are materialized and those that are not.

Perform the following steps:

  • 1. In Tableau, select File | New to create a new workbook.
  • 2. In the new workbook, select Connect to Data in the upper-left corner and connect to the Sample Superstore dataset located in My Tableau Repository/Datasources or in your saved data sources.
  • 3. Connect to the Orders table.
  • 4. Create the following parameter: Select Sufficient Profit Ratio.
    The Select Sufficient Profit Ratio parameter shown in the preceding screenshot allows us to select a value between 0.25 and 0.5 in steps of 0.01. This configuration is just an example; you can select any range of values that you define as a sufficient ratio.
  • 5. Create the following calculated fields:
    • Profit Ratio
      SUM([Profit])/SUM([Sales])
    • This Year's Profit
      IF [Profit Ratio] > [Select Sufficient Profit Ratio]
          THEN "Sufficient Profit"
      END
    • Window Sum
      WINDOW_SUM( SUM([Sales]) )
    • Profitable?
      [Profit] > 0



  • 6. Right-click on the data source and select Extract Data
  • 7. Click the Extract button.
  • 8. When prompted, save the resulting extract to a location of your choosing.
  • 9. Open a new Tableau workbook, select File | Open, and then select the extract created in the preceding step.
  • 10. Note the following in the data source:

     

    • Profit Ratio, an aggregate calculation, was materialized.
    • This Year's Profit, which references a parameter, has a value of null and was not materialized, since calculated fields that use parameters are not materialized.
    • Window Sum, a table calculation, has an "undefined" value and was not materialized.
    • Profitable?, a row-level calculation, was materialized.

     This exercise demonstrated that once an extract has been created and you add new calculated fields to your dashboard, they might not be included in your extract. Next time you miss a field in your extract, think back to this exercise and remember that certain fields will not be materialized.

     Finally, if you make use of parameters in your dashboard,

  • check whether you can eliminate those and use calculations instead to improve performance.
  • Also, split calculations if they can't be materialized as a whole. Put the part that can be materialized in one calculated field and the non-materialized part in another. If parts of the calculation can be computed during extract creation, you will gain performance, as sketched below.
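     As a hedged sketch of such a split (the field and parameter names below are hypothetical, not taken from the chapter's workbook):

    Discounted Sales (row-level, no parameter; can be materialized in the extract):
        [Sales] * (1 - [Discount])

    Adjusted Sales (references a parameter; cannot be materialized):
        [Discounted Sales] * [Select Adjustment Factor]

     Only the second field must be computed at query time; the first is simply looked up from the extract.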

     The advantage of using extracts has now been discussed in great detail, so let's move on and see how we can make the most out of filters. 

Using filters wisely

     Filters generally improve performance in Tableau. For example, when using a dimension filter to view only the West region, a query is passed to the underlying data source, resulting in information returned for just that region. By reducing the amount of data returned, performance improves. This is because less data means reduced network bandwidth load, reduced database processing requirements, and reduced processing requirements for the local computer.

     Filters can also negatively impact Tableau's performance. For example, using only relevant values causes additional queries to be sent to the underlying data source, thereby slowing down the response time. Also, creating quick filters from high-cardinality dimensions can impair performance. (High cardinality refers to a column that can have many possible values. For an online shopping system, fields like userId, shoppingCartId, and orderId are often high-cardinality columns that can take hundreds of thousands of distinct values.)

Tableau's filters are executed in a specific order, so keep this in mind when using them. Refer to the Tableau help pages here: https://help.tableau.com/current/pro/desktop/en-us/order_of_operations.htm. The flowchart accessible via that link, along with the hour-long presentation linked there, may help.

     The rest of this section follows, step by step, the order of operations. By the end of it, you will be able to know which filters to use in which situation so as to achieve the best performance for your dashboard. 

Extract filters

     Extract filters remove data from the extracted data source. Simply put, the data isn't there. Thus, performance is enhanced by reducing the overall amount of data. Performance may also be improved because extracted data uses Tableau's proprietary columnar format; as discussed in the Working with extracts section, a column store keeps the values of each attribute together, which favors aggregation.

     Furthermore, extracts are always flattened, which will have performance advantages over connecting to datasets using joins. To create an extract filter, begin by selecting Data | [Data Source] | Extract Data. In the resulting dialog box, choose to add a filter.

Data source filters

     Data source filters are applied throughout the workbook. For example, if you create a data source filter that removes all members of the Country dimension except the USA, the Country dimension will only include the USA for all worksheets in the workbook.

     Data source filters improve performance in the same way as dimension and measure filters; that is, data source filters cause Tableau to generate a query to the underlying data source, which will limit the data that is returned. Less returned data generally results in quicker processing and rendering. A further advantage that data source filters offer is ease of authoring. For example, if the Tableau author knows in advance that an entire workbook is going to be USA-centric, creating a data source filter saves you the trouble of applying a dimension filter to every worksheet in the workbook using that data source.

     Also note that data source filters occur quite early in the process flow. All calculations (including calculations using fixed LOD expressions that are rendered before dimension and measure filters are triggered) respect data source filters.

     To create a data source filter, click the Data Source tab located in the bottom-left corner of Tableau Desktop. Then, click on the Add link located in the top-right corner of the page:

Context filters

     A context filter is created simply by right-clicking on a field in the Filter shelf and selecting Add to Context:

     Dimension and measure filters are independent. Each filter queries the data source independently and returns results. A context filter, on the other hand, forces dimension and measure filters to depend on it. This behavior can be helpful (and necessary) for getting the right answer in some circumstances. For instance, if a Tableau author accesses the Superstore dataset and uses a filter on Product Names to return the top-10 selling product names in a single category, it is necessary that Category be defined as a context filter. Otherwise, the Product Names filter will return the top 10 overall. Because dependent filters then operate only on the subset defined by the context, context filters can also improve performance.

Dimension and measure filters 

     Dimension and measure filters can improve performance. Since either a dimension filter or a measure filter will cause Tableau to generate a query to the underlying data source, which will limit the data that is returned, performance is improved. Simply put, the smaller the returned dataset, the better the performance.

     However, dimension and measure filters can degrade performance. Since Tableau not only generates queries to the underlying data source in order to display visualizations, but also generates queries to display filters, more displayed filters will slow performance. Furthermore, displayed filters on high-cardinality dimensions can inhibit performance. (A dimension with many members is referred to as having high cardinality.) Consider the example of a filter that displays every customer in a dataset. Performance for such a filter might be slow because every customer in the underlying dataset must be located and returned, and then Tableau has to render and display each of these customers in the filter.

     When using two or more dimension or measure filters on a view, a relevant filter may be used to limit the choices that display. For example, if a view includes a filter for city and postal code, the latter might be set to show Only relevant values:
   

     This is advantageous to the end user in that it adjusts the number of postal codes that display to reflect only those pertinent to the cities selected in the first filter. However, using relevant-value filters will cause additional queries to be sent to the data source and thus may degrade performance.

Table calculation filters (consider LOD include/exclude instead)

     Using table calculations as filters does not have the same corresponding performance enhancements as dimension or measure filters. As discussed above, dimension and measure filters reduce the returned dataset, while table calculation filters do not. In the Tableau process flow, table calculations are not rendered until after the data is returned from the data source. This means that table calculations cannot be used to generate queries to limit returned data. Or, to put it another way, table calculation filters cause all data related to a given dimension or measure to be returned, after which Tableau executes the filter on the returned dataset.

To demonstrate this, perform the following steps:

  • 1. Open the workbook associated with this chapter and navigate to the Late Filter worksheet.
  • 2. In the Data pane, select the Superstore data source.
  • 3. Create a calculated field named Cust Name Tbl Calc with the following code:
    LOOKUP(MAX([Customer Name]), 0)
  • 4. Place Customer Name on the Rows shelf.
  • 5. Place Cust Name Tbl Calc on the Filters shelf and constrain to show only Aaron Bergman.
  • 6. Place Sales on the Text shelf.
  • 7. Right-click on Sales and select Quick Table Calculation | Rank.

     In this exercise, the entire list of customers is returned to Tableau, after which Tableau deploys the filter. Essentially, using Cust Name Tbl Calc as a filter merely hides the underlying data. This is useful because the rank returned for Aaron Bergman is correct. Merely filtering on Customer Name would return a rank of 1 for Aaron Bergman. Unfortunately, the correct results come with a performance hit. Running the performance recorder on this exercise will show that the table calculation negatively impacts performance.

     Fortunately, with the advent of LOD calculations, using table calculations as filters is often not necessary. LODs are calculated fields that include or exclude data independent of the current view. For more information, please refer to the Table Calculations chapter (https://blog.csdn.net/Linli522362242/article/details/124550022).

Using actions instead of filters 

     Another way to improve performance might be to use actions instead of filters. You can develop a dashboard that shows a high-level overview first and goes into detail only once the user selects something. The mechanics are similar to the ones we showed in the Aggregation section; however, aggregation happens per worksheet and not on the data source itself. By selecting a mark in the high-level overview, an action will be triggered. The user can dive deeper into details, but the level of detail will only be increased step by step. Hence, less data has to be loaded at once.

A very nice presentation regarding this topic can be found at
https://www.youtube.com/watch?v=veLlZ1btoms

     The next topic we will be discussing involves calculations. How can we write a calculation in the most efficient and performant way?

Efficient calculations

     Calculations may be constructed differently and yet accomplish the same thing. Look, for instance, at the following example, which shows that an IF statement can be replaced by simpler code:
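     The original figure is not reproduced here; a typical example of this simplification (using the Superstore Profit field) looks like this:

    Instead of:
        IF [Profit] > 0 THEN TRUE ELSE FALSE END
    simply write:
        [Profit] > 0

     Both versions return a Boolean, but the second is shorter and generally evaluates faster because it avoids the conditional branching.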


     Tableau does not permit re-aliasing measures, as this would involve modifying the data values themselves. If you have a field in your data that contains values such as 0 and 1 but actually encodes information such as no and yes, Tableau will interpret it as a continuous measure. You can convert it to a discrete dimension and then re-alias the values. For more information about discrete vs continuous and dimensions vs measures, see Create Aliases to Rename Members in the View on the Tableau help site.
     Since either of these scenarios will return the desired results, which should be used? The deciding factor is performance. This section will explore what to do and what to avoid when creating calculated fields in order to maximize performance.

Choosing between basic calculations, Level of Detail Expressions, and table calculations

     In some instances, an LOD calculation might be faster than a table calculation and vice versa. If you are not sure, try both to see which one performs better. Also, if they're not really needed, use neither. Refer to the following diagram, which explains when to choose which:
This diagram, along with more advice about selecting a calculation type, can be accessed at https://www.tableau.com/about/blog/2016/4/guide-choosing-right-calculation-your-question-53667. 



     Basic calculations: These calculations are written as part of the query created by Tableau and are therefore executed in the underlying data source. Basic expressions allow you to transform values or members at the data source level of detail (a row-level calculation) or at the visualization level of detail (an aggregate calculation). They can be performed

  • either at the granularity of the data source (a row-level calculation)

         Understanding the level of detail of the source (often referred to as granularity) is vital. Every time you connect to a data source, the very first question you should ask and answer is: what does a single record represent? If, for example, you were to drag and drop the Number of Records field into the view and observed 1,000 records, then you should be able to complete the statement, I have 1,000 _____. It could be 1,000 students, 1,000 test scores, or 1,000 schools. Having a good grasp of the granularity of the data will help you avoid poor analysis and allow you to determine if you even have the data that's necessary for your analysis (https://blog.csdn.net/Linli522362242/article/details/123767628).
         A quick way to find the level of detail of your data is to put the Number of Records on the Text shelf, and then try different dimensions on the Rows shelf. When all of the rows display a 1 in the view and the total that's displayed in the lower left status bar equals the number of records in the data, then that dimension (or combination of dimensions) uniquely identifies a record and defines the lowest level of detail of your data.

  • or at the level of detail of the visualisation (an aggregate calculation).
    An aggregation is used when you want to change the granularity of your data. 
    https://blog.csdn.net/Linli522362242/article/details/124977491
  • The top half of this dashboard is easily achieved using a table calculation. Simply duplicate the Sales field and apply a difference quick table calculation running across the Order Date dimension. However, if you try to then convert that computation structure into a table, you end up with the following:
    You will realise that it’s not possible to achieve the specified layout with a table calculation as you need the Year dimension with the Measure Names dimension nested inside. Tableau cannot suppress the “Difference in Sales” row for 2013. So in this example, your only option is to use basic calculations:
    [2013 Sales]
    IF YEAR([Order Date]) = 2013 THEN [Sales] END
    
    [2014 Sales]
    IF YEAR([Order Date]) = 2014 THEN [Sales] END
    
    [Difference]
    SUM([2014 Sales]) - SUM([2013 Sales])

Level of Detail Expressions: Like basic calculations, LOD Expressions are also written as part of the query created by Tableau and are therefore executed in the data source. Basic calculations can be performed either as row-level calculations or as aggregate calculations, so they can only answer questions at the granularity of the data source or at the level of detail of the visualisation. Level of Detail Expressions, on the other hand, can answer questions at any granularity.

     The difference is that LOD Expressions can operate at a granularity other than that of the data source or the visualisation. They can be performed at

  • a more granular level (via INCLUDE),
  • a less granular level (via EXCLUDE)
  • or an entirely independent level (via FIXED).
  • Consider the following example in which we ask: What is the 90th percentile of sales at the order-detail level compared to the order-total level?
         If you are familiar with the superstore data set (which ships with Tableau), you will know that it is one row of data per line item of each order. So if we consider the question above, we determine:
         So for the left chart, we can solve this with a basic calculation, PCT90([Sales]). However, for the right chart, we must first total the order details to the order level and then perform the percentile aggregation. So we must use a Level of Detail Expression:
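          The code from the original figure is not shown here; a sketch consistent with the text, using the Superstore fields, is:

              Order-detail level (left chart):
                  PCT90([Sales])

              Order-total level (right chart):
                  PCT90({ FIXED [Order ID] : SUM([Sales]) })

          The FIXED expression first sums Sales to one value per Order ID, after which PCT90 computes the 90th percentile across those order totals.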

 

Table calculations: Table calculations are performed after the query returns and therefore can only operate over values that are in the query result set.

There are several categories of problems that can only be solved using table calculations:

  • Ranking
  • Recursion (e.g. cumulative totals)
  • Moving calculations (e.g. rolling averages)
  • Inter-row calculations (e.g. period vs. period calculations)

So how do you choose the right calculation? Let's compare the different calculations types. 

https://www.tableau.com/about/blog/2016/4/guide-choosing-right-calculation-your-question-53667

https://help.tableau.com/current/pro/desktop/en-us/calculations_calculatedfields_understand_types.htm

Other ways to improve performance 

     To conclude this chapter, let's consider a few other possibilities for improving performance. 

Avoid overcrowding a dashboard

     Often, end users want to see everything at once on a dashboard. Although this may be perceived as beneficial, it often is not. Consider the inclusion of a large crosstab on a dashboard. Does scrolling through pages of details add to the analytical value of the dashboard? Perhaps the answer is "no." Furthermore, an excess of information on a dashboard may obscure important insights. Diplomatically arguing for leaner dashboards may lead to better decision making as well as better performance.

Fixing dashboard sizing

     Dashboards can be set to an exact size or to Range or Automatic. An exact size results in quicker performance because once Tableau Server has rendered a view for one end user, that render stays in cache and can be reused for the next end user who accesses that dashboard. Automatic and Range, on the other hand, cause Tableau Server to attempt to determine the resolution size used by each end user and render the dashboard accordingly. This means that Tableau Server does not use the instance of the dashboard stored in cache for the next end user. This, in turn, impacts performance.

Setting expectations

     If an end user is expecting near-instantaneous performance, then, of course, anything less is disappointing. Explaining in advance that a complicated, detailed-oriented dashboard may not be performant can help in at least two ways.

  • First, upon explaining the likely performance problems, a compromise may be reached that results in the creation of a less complicated dashboard that still delivers valuable information.
  • Second, if it is absolutely necessary for the dashboard to be complicated and detail-oriented, at least the end user has been warned that patience may be needed when interacting with it.

Summary

     We began this chapter with a discussion of the performance-recording dashboard. This was important because many of the subsequent exercises utilized the performance-recording dashboard to examine underlying queries. Next, we discussed hardware and on-the-fly techniques, where the intent was to communicate hardware considerations for good Tableau performance and, in the absence of optimal hardware, techniques for squeezing the best possible performance out of any computer.

     Then we covered working with data sources, including joining, blending, and efficiently working with data sources. This was followed by a discussion on generating and using extracts as efficiently as possible. By focusing on data sources for these three sections, we learned best practices and what to avoid when working with either remote datasets or extracts. The next sections explored performance implications for various types of filters and calculations. Lastly, we looked at additional performance considerations, where we explored a few more thoughts regarding dashboard performance as well as setting expectations.

     In the next chapter, we will turn our attention to Tableau Server. Tableau Server is a dense topic worthy of a book. Thus, our exploration will be truncated to focus on Tableau Server from the desktop author's perspective.
