mtb3_Tableau Prep + python(tabpy)_pivot_add flow_incremental_script_Latitude Longitude km房源_红灯和测速相机数

本文链接：https://blog.csdn.net/Linli522362242/article/details/124977491

Tableau Prep Builder was introduced with version 2018.1 of Tableau Desktop, but what can we use Tableau Prep Builder (henceforth referred to in this chapter as Prep) for? The core purpose of the tool is data preparation. The good news is, Prep is fully compatible with Tableau Desktop, and also with Tableau Server. That means you can execute jobs in Prep to clean your data with the click of a button. Additionally, Prep is as visual as its big brother, Tableau Desktop, meaning that you can see every step of data preparation in a fully visual format.

Therefore, let's dive into the Graphical User Interface (GUI) and be amazed by another high-end product, which will allow you to get initial data insights, enabling you to decide faster if your dataset is worth analysis. Prep will pave the way for an even smoother Tableau Desktop experience.

In this chapter, the following topics will be discussed:

• Connecting to data
• The Tableau Prep Builder GUI
• Prepping data
• Additional options with Tableau Prep Builder
• Exporting data

In order to get started, we need to load data. How to do so in Prep will be described in the following section.

Connecting to data

If you are familiar with Tableau Desktop, Tableau Prep Builder will be an easy game for you. The handling and interfaces are very similar, and connecting to data, if the connector is available in Prep, works all the same whether it's a text file, a database, or an extract. At first sight, you might not even notice a difference between the Tableau Prep Builder the Tableau Desktop GUIs, which provides the handy advantage that you can start prepping right away.

To get started, begin by opening Tableau Prep Builder:

From here, click on the + in order to open a file. After doing so, the following screen will appear:

From the preceding screenshot, we can see that you can choose the type of data you want to connect to in the search bar. Just as in Tableau, the repertoire[ˈrepərtwɑːr]全部曲目,全部技能(或才能) of Tableau Prep Builder includes multiple databases.

Now let's connect to a dataset with a practical exercise. For this exercise, we need the following dataset: https://www.kaggle.com/datasets/airbnb/boston. Please download calendar.csv, listings.csv, and reviews.csv. Alternatively, download them from the workbook associated with this chapter on Tableau Public: https://public.tableau.com/profile/marleen.meier.

First, we are going to start with the calendar.csv file. Add it to the empty Prep canvas by making a connection with a text file, followed by the selection of your .csv file. You will now see the following screen:

Congratulations—you've just made your first Tableau Prep Builder connection. Here, you can manipulate and visualize your connected dataset as required!

The Tableau Prep Builder GUI

The Tableau Prep Builder GUI has two important canvases to look at. Right after you have connected data to Tableau Prep Builder, the workspace will split into several parts

Let's look at what we can see from the preceding screenshot:

• A: The connection pane, showing you the input files available at the location selected.
• B: The flow pane, which shows your current Prep flow. This always starts with an input step. The Flow Pane, where you will logically build the flow of data with steps that will do anything from cleaning to calculation, to transformation and reshaping. Selecting any single step will reveal the rest of the interface that is specific to that step.
• C: The input pane settings, which give you several options to configure your input.
• D: The input pane samples, showing the fields you moved to the connection pane, including sample values.

In the input pane (the section marked with C), you can use the wildcard union (Multiple Files) function to add multiple files from the same directory. Also, you can limit the sample set that Tableau Prep Builder will print in order to increase performance.

In the input pane samples (the section marked with D) you can select and deselect the fields you want to import and change their data types. The data type options are, for example, strings, dates, or numbers.https://blog.csdn.net/Linli522362242/article/details/123873227

The second GUI is the profile pane(3). Once you've selected the input data needed, click on the + in the flow pane(1) and select Add: Clean Step. Now the profile pane will appear:

In the preceding screenshot, the profile pane shows every column from the data source in two sections. The upper sections show aggregates. For example, column 2, date, shows the number of rows per date in a small histogram. The columns can all be sorted by clicking on the sort icon (a mini bar-chart that appears when your mouse is hovering over a column) next to the column name and by selecting one item. Let's take, for example, True, in available (column 3). All related features will be highlighted: This gives you the chance to get some insights into the data before we even start to clean it up. In the following screenshot, each row is shown as it is in the data source in the lower part of the profile pane(4:The Data Grid shows individual records of data as they exist in that step):

So far we have seen that, after loading data in Prep, visual filters can be applied by clicking on a field or bar in one of the columns. The lower pane(Data Grid) will always show the data source of the selection made at the row level (OR Selecting a value in the profile pane will filter the Data Grid to only show records containing that value.). Next, we will continue by adding more data sources.

Getting to know Tableau Prep Builder

Let's start with a practical example.

Next to the calendar.csv file, connect to the following files:

• listings.csv
• reviews.csv

Now, drag them onto the flow pane(click : Add connection |Text File, then choose listings.csv and reviews.csv):

Can you answer the following questions?

• How many listings use the word "beach" in their description?
• What is the percentage of condominiums[,kɔndə'miniəmz]公寓,托管公寓 that are marked with "exact location"?
add to Contexthttps://blog.csdn.net/Linli522362242/article/details/124730062
and
• On which day were the most reviews entered?

Without a tool like Tableau Prep Builder, it is much more difficult to find the solution to these types of questions. Prep makes our data analytics journey much faster and easier and that is exactly the reason why I encourage you to spend the additional time and learn Prep as well as Tableau Desktop!

Here you see, as an example, I used the sort function on date in order to answer the third question: On which day were the most reviews entered?
(Dates are sorted because each review has a date and the date histogram shows the number of reviews for each date)

As you can see, sorting this field ordered the dates by number of entries, thus making it very simple to answer the question, on which day were the most reviews entered? After the first few clicks, it already starts to feel natural, doesn't it?

All the changes you made to the dataset can be traced back on the left side of the profile pane. But don't forget to add a proper name to each step: this will make it much easier for others, and yourself, to understand your process.
There is an icon above the Clean 1 step in the flow, indicating changes that were made within this step. In this case, the change is to conver the data type of listing_id from number to string type.

So, let's continue on to the cleaning features of Tableau Prep Builder.

Prepping data

Tableau Prep Builder comes with lots of different features. Sometimes you might use many different tools to prepare your dataset in order to get it in a shape you desire. Other times you might just run an aggregation (one feature) and be done. It really depends on the dataset itself and the expected output. The fact is, the closer your Prep output data is to what you need for your Tableau Desktop visualization, the more efficiently VizQL will run on Tableau Desktop. Fewer queries in Tableau Desktop means faster generation of dashboards.

To me, the best part about Tableau Prep Builder is that it can handle a huge amount of data. Sometimes I even use it for datasets I don't want to visualize in Tableau Desktop, just to get a quick overview of, for example, how many rows contain a specific word, how many columns are needed, what happens to the date range if I filter a particular value, and so on! Within a few minutes I have insights that would have taken me much more time to get with database queries or Excel functions. I hope that by the end of this chapter you will be able to cut your time spent data prepping in half (at least). We will divide the prepping features into five subcategories: cleaning, unions and joins, aggregating, pivoting, and scripting. Let's start with cleaning data!

Cleaning data

We have seen the following canvas before in the The Tableau Prep Builder GUI section. To create the cleaning step, the user can simply click on + next to the input and select Add: Clean Step. During the cleaning step, multiple operations can be performed, such as filtering or creating a calculated field. Also note the recommendations Tableau Prep Builder gives you:
and

Tableau Prep Builder analyzes the column content and proposes changes that might fit the data.

The column listing_url for example is being recognized as a webpage and therefore Prep recommends you change it to the data role URL.
The second, third, and several more recommendations after listing_url are to remove certain columns. This is probably the case because the column does not contain any data or contains only a small amount of data.
The list goes on.

This feature can be very useful, especially for unfamiliar datasets. My way of working would be, look at the recommendations, check if they make sense, and execute the change—or not. Don't blindly trust these recommendations, but they can point out data flaws[flɔː]缺陷,瑕疵 you might have missed otherwise.

Data is often messy, involving null values, typos打字错误 from manual entries, different formatting, changes in another system, and so on. As a consequence, you will have to sort out the mess before you can get reliable results from an analysis or a dashboard. This section will show you how to clean data on a column level.

Once a value is selected within your clean step, you have the option to Keep Only, Exclude, Edit Value, or Replace with Null:Figure 3.11: Quick access

None of these changes will change the data source itself. Prep is like an in-between step, or a filter between the original data source and your Tableau Desktop. Excluding a value, as highlighted in Figure 3.11, will only remove it from Tableau Prep Builder. However, if used later on as input for Tableau Desktop, there won't be an option to add that specific value back in. This option will remain in Tableau Prep Builder only.

Another slightly hidden option is to click on the ellipses (...) next to the column headers (as shown in Figure 3.12) and select Clean:
vs

This Clean functionality operates based on the data type of the column.

In the preceding screenshot, the data type is a string (indicated by the Abc icon). The option allows you to make use of eight more cleaning features
For other data types, the option will be grayed out since Clean is only available for strings.

The data type can be changed just above each column header; you will find a symbol above the column name, which can be changed by clicking on it, just like in Tableau Desktop:

This comes in handy in case Prep misinterprets the data type of a column. A wrongly assigned data type can have effects on the calculation you perform on them and how Tableau Desktop would visualize the column.

In the upcoming section, we will practice combining multiple datasets. You might want to combine for example order data with shipment data (using join) or append 2020 sales data to your 2019 sales dataset (using union). To find out how, read on!
Cleaning the data :https://blog.csdn.net/Linli522362242/article/details/123873227

Unions and joins

We will now join the three datasets, so that we may analyze and visualize them together. The nature of the three datasets ( listings.csv(房源) , reviews.csv(房源评论) , and calendar.csv ) require a join but not a union. However, I will still walk you through the steps of a union in order for you to know when you need it!

Union step: A union step allows you to bring together 2 or more branches representing sets of data to be unioned together. You will have options for merging or removing mismatched fields.

unions increase the number of rows, to perform a union, both datasets need to have almost the same layout/header
Join step: A join step allows you to bring together 2 branches of the flow representing sets of data that can be joined on one or more key fields. You will have options for selecting the kind of join as well as the join fields.

joins change the number of columns
merging mismatched fields, and removing unnecessary fields:https://blog.csdn.net/Linli522362242/article/details/123873227

Adding unions

A union in Tableau is the appending of data to an existing dataset. Imagine you have two Excel files. Both have the exact same header, but one contains data from 2019, the other data from 2020. If you union the 2020 data to the 2019 data, you append the rows of the 2020 files to add them underneath the rows of the 2019 files. To perform a union, both datasets need to have almost the same layout/header. Why almost? You will see in the following exercise:

1. To begin this exercise, your Prep flow pane should look like this:
==>==>You can change the color of each step according to your preferences by right-clicking on a step and selecting Edit Color.
2. Add a clean step by clicking the + next to the listings dataset.
3. Select the column host_identity_verified and filter on True:
4. Now, create a calculated field, called DatasetA_ID, containing just a string, 'A'
5. Create a second cleaning step from the same listings data and filter this time on False in the host_identity_verified column:
==>then the editing color is the same as the listing color for both Clean steps
6. Create a calculated field DatasetB_ID containing just a string 'B' , using the same process shown in Figure 3.17.
Renaming the steps by right-clicking and hitting Rename helps to keep track of the applied changes. Your flow should look like the following:
According to the value of the host_identity_verified field (True and False), the listings are divided into DatasetA and DatasetB, and the label fields(DatasetA_ID and DatasetB_ID) are added respectively.
7. You have now created two datasets that can be combined using a union, without changing the original data. Union by dragging one step over the other (DatasetB over DatasetA or vice versa):
8. Check whether all fields match by clicking on the step Union 1 and looking at the input details in the bottom left of the screen:
9. In the preceding screenshot, you can see that on the left, you have information about the union. From top to bottom: the Inputs and their respective colors (these are the same as in the flow pane), the Resulting Fields, showing 2 mismatched fields from 98 resulting fields, and finally, the Mismatched Fields themselves: DatasetA_ID and DatasetB_ID. The colors next to each mismatched field show that our DatasetB_ID field does not exist in DatasetA but exists in the Dataset B,
and the DatasetA_ID field does not exist in DatasetB but exists in the Dataset A.
10. In this case, we know that those columns can be appended even despite the different naming conventions. A better name for both fields would have been Dataset_ID (you can go a few steps back and try this out). Prep will not show any mismatches then. But for us here, we can go ahead and drag and drop one field on top of the other to merge them and thereby give Prep the permission to view them as one column:
11. See how the Resulting Fields now show 0 mismatches, and how both dataset colors are now represented in the bar underneath the column header:
vs
Then rename the field name
==>
12. Rename by selecting and right-clicking the union step from Union 1 to listings2, and add a description by selecting and right-clicking the union step and hit Add Description if you like—all the extra effort you make now will help you later with replication, documentation, and explanation.

Congratulations, your first union was a success. We first split the original listings dataset in two and then combined them again to one. Stacking data with the same layout on top of each other, as we've done here, is a typical task for unions.

Adding joins

Now, we will continue to work in the same flow pane and focus on a join. As a quick refresher, a join appends data horizontally. As a rule of thumb, unions increase the number of rows, joins change the number of columns.

1. Drag listings2 onto reviews until the Join option appears.==>
2. Select listing_id from reviews and id from listings2 for your join.
Because some reviews do not involve listings, you cannot use the review id to join with listings. listing_id is the FOREIGN KEY to listings2
Please note that you can change the join type by clicking on the Venn diagram under Join Type:What the preceding overview tells us is that all 68,275 rows of the reviews dataset were matched to 2,829 rows in the listing2 dataset. However, we do have 756 mismatches from the listing2 dataset. Our total Join Result is therefore 68,275 rows. But it is likely that not all listings have reviews and therefore
we can safely assume that our join worked as expected.
VS
3. Another check could be to change the join to a full outer join and dive deeper into the data later in Tableau Desktop. In order to get to a full outer join, simply click on the outer edge of the two circles:
4. If you want to check immediate results, simply right-click on, for example, the join symbol (represented by the Venn diagram icon, as shown in the following screenshot), and you'll be able to check the data in Tableau Desktop by selecting Preview in Tableau Desktop:
5. We can also rename the step; let's change our join's name to lis&rev.

By now, we've seen how to clean data, and how to use unions and joins.

The union part taught us how to append data vertically or underneath each other.
The join part allowed us to combine data horizontally or next to each other. More specifically, we

More specifically, we were able to combine the listings with their respective reviews. Instead of having two datasets, we have created one that allows us to look up a listing review, and in the same row, view all the data regarding the listing itself, like type of listing房源类型, number of bedrooms, whether it has an ocean view, and so on.

Let's continue next with the aggregation step.

Aggregating

An aggregation is used when you want to change the granularity of your data. In our dataset, we have one row per review. However, we want to see whether hosts that have been in the Airbnb business in Boston longer have more reviews compared to hosts that started more recently. In order to do so we need to get an aggregated number of reviews per year that the host started offering listings. The field host_since will give us helpful information as well as the field reviewer_id. For the latter we will count the distinct reviewers that left reviews. Let's do it!

Our current flow pane looks like this:

To aggregate the amount of reviews, please take the following steps:

1. First, click on the + sign next to the lis&rev and choose Aggregate:
2. This will open a window entitled Aggregate 1. Our goal is to see how many people reviewed accommodation[əˌkɑːməˈdeɪʃn]住宿 per year from when the host started offering listings; therefore, take the following actions.
- a. Using the search field in the Additional Fields pane, as indicated by the arrow on the left in Figure 3.30, add the host_since field to Grouped Fields by clicking it then dragging and dropping.
- b. Add the reviewer_id to Aggregated Fields, also by clicking it then dragging and dropping. Note that the default aggregation is SUM:
  then
  SUM : How many reviews are there each year
  
  Let's have a look at the Tableau Desktop preview by right-clicking on the Aggregate step on our flow pane and selecting Preview in Tableau Desktop:
3. Change the reviewer_id aggregation by clicking on SUM under Aggregated Fields, and change it to Count Distinct:
count the distinct reviewers that left reviews
4. This will change the default aggregation, Sum (where every review is counted), to Count Distinct, which counts the distinct reviewer ID aggregation. This will allow us to focus on how many different reviewers left reviews. This is just an arbitrary choice; feel free to try out the other aggregations and see how the data changes.
Let's have a look at the Tableau Desktop preview by right-clicking on the Aggregate step on our flow pane and selecting Preview in Tableau Desktop:

In the preceding screenshot, we can see the distinct count of reviews per host_since date. Of course, in this specific example, Tableau Desktop could also count the reviews per host_since date if those were the only fields dragged onto the view without the help of Tableau Prep Builder. But a benefit of aggregating prior to importing the data into Tableau Desktop is that you have less data to import; hence, it's possible to achieve better performance. If you want to continue analyzing the dataset in either Tableau Prep Builder or Tableau Desktop, you can now ask further questions, such as the following:

• Which type of accommodation has the highest discount for staying a week rather than a day?
([Price]*7-[Weekly Price])/([Price]*7)
• Is there a correlation between the size of the accommodation and the monthly price?
• How many hosts (as a percentage) have more than one listing?
- Total number of distinct Host Id : COUNTD([Host Id])
- ==>FIXED [Host Id] : [Host Id] is partioned and Id (Listings.Csv) is addressed==>
  
  Count distinct list_id for each Host : { FIXED [Host Id]:COUNTD([Id (Listings.Csv)]) }
- ```
IF SUM([count  list_id for each Host]) > 1
    THEN 1
ELSE
    0
END
```
- ```
SUM(
    IF [count  list_id for each Host] > 1
        THEN 1
    ELSE
        0
    END
)
```
  ==>351
  How many hosts (as a percentage) have more than one listing?
```
SUM( IF [count  list_id for each Host] >1 
        THEN 1
     ELSE 
        0 
     END
)/COUNTD([Host Id])
```

Good luck and happy dashboard building!

We finish this section here and look back at a graph consisting of only 9 datapoints, which we created by aggregating the almost 64,000 rows of review_id data by year that the host started offering the listing. The graph tells a story; 64,000 individual rows of data don't. That is why it is so important to always reduce your data, and aggregate it if possible. In the next section we will talk about another important feature: pivoting.

Pivoting

Do you recognize a situation where you drag and drop your fields onto your Tableau Desktop row and column shelves but somehow the visualization doesn't do what you want? Chances are that you have to turn your data, or better, pivot it. If data that you expect to be in one column, spread over multiple rows, appears in multiple columns instead, it's a case for pivoting. The following example will showcase a need for pivoting.

For our pivot example, we will make use of another dataset. You can find it at https://public.tableau.com/profile/marleen.meier . The dataset is very simple and looks like this:
It has three different colors that were observed on five different days, x amount of times. This data is a typical example of when pivoting is helpful to tidy up the data because multiple columns have the same purpose—B, C, and D are all observations. If you wanted to visualize this table, it could look like the following:

But you might be wondering, since the color scheme already indicates which color has been observed, wouldn't it be better to put all three lines in one graph? Separate lines per color, within one line-chart? Let's accomplish this!

Begin by connecting the Pivot DataSet to Tableau Prep Builder, then follow these steps to pivot the data:

1. From Prep, connect to the just created Excel file and add the Pivot DataSet to the flow pane, click on the + next to the input data step, and select Pivot. A new step called Pivot 2 will appear:
==>
2. Click on the Pivot 2 step in the flow pane, and Figure 3.36 will appear at the bottom of your Prep window. Select all three colors from the Fields pane shown in Step I, and drag them onto the Pivoted Fields pane, as shown in Step II:
3. Now, have a look at the Pivot Results pane, as shown in Step III of the preceding screenshot, and rename Pivot1 Names to Colors and Pivot1 Values to Observations by right-clicking and selecting Rename.And, we now have a clean table, called Clean 1 by default:
vs
==>
4. Remember our problem from the beginning? Our line chart can now be created in a different way, as shown in Figure 3.38. Note that Colors is a dimension now and the Observations are their own field, compared to before when they were located under the column header of each color:
This is a simple use case for pivoting, but the technique here can be transferred to any other dataset.

As a quick tip: Tableau Prep Builder has a built-in feature recommendation. Whenever you see a light-bulb at the upper-right corner of a column, check it out and see if this recommendation applies to your needs:

In this case, Tableau recommends us to remove all leading and trailing spaces, which is exactly what we need! As demonstrated here, it's always worth it to check the recommendations.

another pivot example:(Transforming the data for analysis) https://blog.csdn.net/Linli522362242/article/details/123873227

Perfect, you are all set with the main Prep functionality. However, you might be wondering what you are supposed to do if you require any further preparation. No problem, Prep has you covered. Scripting is the newest addition to Prep and the most powerful of all; it is virtually limitless! Continue reading to explore Prep's scripting feature.

Scripting

The script functionality that we will discuss next is one of the more recently added features to Tableau. We will discuss it next because a programming language like Python or R gives you endless flexibility. If you ever reach the boundaries of Prep's offerings you can fall back on scripting and write your own code to do exactly what you want.

In order to make use of it, you have to connect to Python (or R, if using), outside of Tableau first. The following steps are based on the instructions from https://help.tableau.com/current/prep/en-us/prep_scripts_TabPy.htm for Python or https://help.tableau.com/current/prep/en-us/prep_scripts._R.htm for R.

For this exercise we will be using the Boston Airbnb dataset, more specifically the calendar table. Download this dataset from https://www.kaggle.com/datasets/airbnb/boston and connect to the calendar table in Tableau Prep Builder:

Now, let's take a step back and connect Python to Tableau Prep Builder:

1. Download and install Python from python.org (or download and install R from https://www.r-project.org/ ).
2. Download and install tabpy by executing the following command in your terminal or on the command line:https://blog.csdn.net/Linli522362242/article/details/108037567
```
pip install tabpy
```

3. Alternatively, if using R, open R and execute:

install.packages("Rserve", , "http://rforge.net")

4. Open tabpy on the command line/terminal by entering the following command:
```
tabpy
```
Or, in the R GUI, type:
```
library(Rserve)
Rserve()
```
5. Back in Tableau Prep Builder, click on the + on the right-hand side of the calendar table and add a Clean Step.==>
6. Add a calculated field called PythonTest (use RTest instead if you've chosen
to experiment with R) with a string value "Test" :==>
7. Now, click on the + on the right-hand side of the Clean 1 step and add Script.
By selecting the Script step, the Settings pane at the bottom (see Figure 3.41) will give you an option to connect to Rserve or TabPy. Rserve is the Tableau server for R and TabPy is the Tableau server for Python.
8. Choose your connection type by selecting Rserve or Tableau Python (TabPy) Server. Connect Tableau Prep Builder to tabpy by using localhost and Port 9004 (if using R, connect to Rserve by selecting it under Connection type and using localhost and Port 6311 in the same popup):
Note: If using an SSL-encrypted Rserve server, Port 4912 is the default port.

==>

9. Next, if experimenting with Python, create a .py file containing the following code:
PrepTableau.py

def ChangeString(df):
    df['PythonTest'] = 'Python'
    return df

Alternatively, create a .R file containing the following code:

Get ChangeString<- add_column(
.data,
.before = 'Test'
.after = 'R') {
Return (data.frame ())
}

10. The script we just created is written to change an existing column in Prep and rename every row from Test to Python or R , depending on which language you've chosen.
11. We'll continue with the Python script. Back in Tableau Prep Builder, browse for the .py file we just created and add the function name to the Tableau Prep Builder interface. I called the file PrepTableau.py and the function name is ChangeString, as defined in the preceding step:
12. Run the flow and observe that the PythonTest column has changed from Test to Python:

Our little experiment worked! And even though we used a very simplistic example, it shows that the scripting feature works, and you can just follow your creativity with what else you might be able to do by using scripting. Examples include everything from calculating an inverse normal distribution to machine learning with sentiment analysis or clustering.

You might have noticed that we did not import pandas in the script itself, but still used a pandas DataFrame. This is due to the fact that the pandas import comes with tabpy . You can see this in your command line/terminal after opening tabpy:

If you want to use other libraries in your script, you can install them by using pip install on your command line/terminal. You only have to restart tabpy afterward and you will be able to use them too. Remember, always open tabpy on the terminal/command line first or else Tableau Prep Builder can't execute the script.

At the time of writing, Tableau Prep Builder does not support a script as input yet—you could however use a dummy .csv file as a workaround and add a script to the flow immediately after it.

TabPy(python)+tableau desktop : https://blog.csdn.net/Linli522362242/article/details/123606731

Additional options with Tableau Prep Builder

This section will discuss some smaller stand-alone but very powerful features. Some of you might have been using Prep already since the very first release. With each release and each additional user and given feedback, Tableau was able to create a priority list of features that stakeholders—like you and I—really wished to see. The upcoming topics are some of the best examples. None of these features were present in the initial release but have made the product so much better, more valuable, and more complete. You are able now to save part of your flow and reuse it, write back your data to an external database, refresh parts of your dataset, and schedule flows with the Tableau Data Management add-on. Let's take a closer look and start with the 'insert flow' in flow function.

Insert flow

An option in Tableau Prep Builder is to add an existing flow to your own flow. For example, say someone already worked on the Boston Airbnb dataset and created a flow of the highlighted steps only:

==>
You could save the steps you wish to insert into your flow by right-clicking and selecting Save to File, as shown in Figure 3.46. These steps can be inserted back into your flow by right-clicking on the canvas, selecting Insert Flow, and browsing your .tfl files for the previously downloaded file:

==>

After inserting the file containing the steps you previously downloaded, the result will look like this:

==> rename listings2(1) to listings2
==> drag listings 2 to the Add option of lis&rev

==>

rename the necessary steps' name back
==>==>

This functionality is very convenient to share work and reuse already created steps. Next we will look into another functionality that will allow you to refresh the data used in your Prep flow: the incremental refresh.

Incremental refresh

Tableau Prep Builder 2020.2 saw the introduction of the incremental refresh feature. After completing a flow and adding an output you can decide if you want a full or incremental refresh, and which field you want to use as an indicator. Imagine you have a dataset of infections per day of a particular disease. By every end of the day, you want to add the new data, which comes from a health ministry database, to your dataset. So, what are you going to do?

Typically, you would load the whole dataset, all of it, after the newest datapoints were added. But nothing changes about the datapoints older than that day; they will remain the same and you loaded them before already. Therefore, it is a waste of resources to reload the whole dataset. An incremental refresh will only load new datapoints and add them to the existing data, which has been built up over many days. You can imagine how much time and effort this will save. It is a much more sustainable way to keep your data up to date than a full refresh.

We are going to look at an example in the following workflow. It is important that the Output is there:==>

To add the incremental refresh, take the following steps:

1. Click on the calendar input table and scroll down the Settings tab until you see the Set up Incremental Refresh option. Then, click the Enable toggle:
2. After you click on the Enable box, a dropdown will appear. This dropdown contains all fields that Tableau Prep Builder finds useful as an indicator for a data refresh. In our case, date is the perfect field to use since as soon as a new date is added to the original data source, the refresh will start:
==>
3. Tableau Prep Builder will also ask you for the output field that will represent the date column in the Output step. In our case it is still date.
4. After everything is set up, click on the Output step again and note how you can now select the type of refresh, full or incremental, you prefer for this flow by clicking on the drop-down menu on the Run Flow option:

You just learned how to set up your data refresh—full as well as incremental. This will help you with the sustainable usage of compute resources and to always have your data up to date. Next we will introduce another Tableau feature that unfortunately does not ship附带 with the normal license but is definitely worth investigating—Tableau Prep Conductor.

Tableau Prep Conductor

With Tableau Prep Conductor you can publish your Prep flows to Tableau Server, schedule them to run at a certain time, and always have refreshed data. The conductor is available in the Tableau Data Management add-on and doesn't ship with Tableau Desktop nor Tableau Prep Builder.

For more information, please check https://www.tableau.com/products/add-ons/data-management.

Exporting data

Last, but not least, we may want to export our data. We have seen before that a right-click on a symbol in the flow pane offers the possibility to preview the data in Tableau Desktop:

1. If you want a flow to run according to a schedule or save it to a location, it's best to use Add | Output:
2. After you do this, the following screen will appear:
3. The user now has the option to save the data to a file, and also to save it in Hyper file format as .hyper , as .xlsx (new in Tableau 2021.1), or as a .csv file.
4. Another option is to publish the newly generated data source directly to Tableau Server and make it available for other users. For this option, select Publish data source instead of Save to File from the dropdown.==>
5. A third option to save Prep output was been added in Tableau 2020.3: writing the resulting data to an external database. If you want to do so, select the option Database table from the dropdown. At the time of writing, 11 databases are compatible with the write back functionality, but more will likely be added based on user demand:

This section has shown us how to save our work or even more so, the output of our Prep flows. To me, the ability to save output to an external database is the best new feature, because it is the easiest way to share my output in a structured and organized way with others. So not only I, but also other users, can work with the prepped data, saving additional work. The flow can then be reused not only by me, but by anyone.

Summary

We started this chapter with an introduction to Tableau Prep Builder. We looked at the GUI and how we can connect data to it. After that, we did some exercises regarding data preparation. This can be divided into five parts: data cleaning, unions and joins, aggregating, pivoting, and scripting. We also considered several additional options to improve the usefulness of your data, including inserting steps from other flows, and incremental refreshes. To round out this chapter on Tableau Prep Builder, we looked at exporting data. Here, we saw that the new dataset can be saved as a file or data extract, written back to an external database, or pushed directly to Tableau Server.

Just like Tableau Desktop, Tableau Prep Builder is very much self-explanatory and highly visual. Colors, symbols, and highlights make it easy to get used to this extract, transformhttps://blog.csdn.net/Linli522362242/article/details/123873227, and load tool, which is invaluable for preparing your data before manipulating it on the main Tableau interface!

Chicago Red Light and Speed Camera Data

Exercise: connecting data to Tableau Prep

For this exercise, we need the following dataset Chicago Red Light and Speed Camera Data: https://www.kaggle.com/datasets/chicago/chicago-red-light-and-speed-camera-data. Please download it by clicking on the DownloadAll button on the right site of the page. Alternatively, download it from
the workbook associated with this chapter on Tableau Public: https://github.com/PacktPublishing/Mastering-Tableau-2019.1-Second-Edition/tree/master/Chapter03 First, we going to start with the red-light-camera-locations.csv file. Add it to the empty Prep canvas by making a connection with a text file, followed by the selection of your .csv file. You will now see the following:

==>==>

==>please click on the + in the flow pane and select Add Clean Step==>

==> In the preceding screenshot, the profile pane shows every column from the data source in two sections. The upper sections show aggregates. For example, column 5, GO LIVE DATE, shows the number of rows per date in a small histogram. The columns can all be sorted by clicking on the sort icon next to the column name and by selecting one item. Let's take, for example, EB, in FIRST APPROACH (column 2). All related features will be highlighted:

This gives you the chance to get some insights into the data before we even start to clean it up.

Exercise: getting to know Tableau Prep

==>==>

Can you answer the following questions?

How many year's difference are there between the go-live date of the first red-light and the first speed-light camera?
-
On which intersection[ˌɪntərˈsekʃn]交叉, [交] 十字路口 did the most red-light-camera violations happen闯红灯次数最多的路口发生在哪个路口?
On which day did the first speed-light-violation happen for camera CHI040?

Here you see, as an example that I used to filter on CAMERA ID in order to answer the third question: On which day did the first speed-light-violation happen for camera CHI040?==>

Cleaning data

During the cleaning step, multiple things can be done, such as filtering and creating a calculated field:

1. Once a value is selected, you have the options to keep it, exclude it, edit it, or replace it with Null:
2. Another slightly hidden option is to click on the ellipses (...) next to the column header and select Clean.
The following is the screenshot:
This clean functionality is based on the datatype of the column. In the preceding screenshot, the datatype is a string. For other datatypes, it will be greyed out.
The datatype can be changed just above each column header, you will find a symbol on top of the column name which can be changed by clicking on it. Just as in Tableau Desktop.
During the cleaning step, fields can be removed even if they were already used in a calculation, unlike in Tableau Desktop.

Exercise: cleaning data

Please execute the following tasks:

Remove the field Location from the red-light-camera-locations dataset using Tableau PrepRemove the field Location from the speed-camera-locations dataset using Tableau Prep

Remove the field Location from the red-light-camera-violations dataset using Tableau Prep

Remove the field Location from the speed-camera-violations dataset using Tableau Prep

Unions and joins

In our example, we look at speed- and red-light camera data. However, we currently have four different datasets, which ideally need to be combined. In order to do this, we will create two unions and a join. Let's start!

Next, connect Tableau Prep with this dataset.

1. Use a Union on the two location and the two violation datasets by dragging them on top of each other:
vs
2. Check whether all fields match:
In the preceding screenshot, you can see that on the left, you have information about the union, the Inputs, and the associated colors (which are the same in the flow pane), the Resulting Fields with one mismatch fields from 11 resulting fields, and finally, the Mismatched Fields. The colors next to it show that green, our red-light-camera-violations intersection field does not exist in red, our speed-camera-violations.csv . The same can be seen again in the profile pane.

There might be a situation where you see a mismatch due to different column headers. Let's view this in the following screenshot:
In this example, GO LIVE DATE and GO-LIVE DATE are not exactly the same. Tableau Prep does recognize similarities and marks it yellow. To union those two fields, you simply select one of the two and then the mouse pointer points to another, click on the + icon that appears.

Now, remove the null values from LATITUDE and LONGITUDE, we need this later for the join.==>

Also, as in the following screenshot, change the LATITUDE and LONGITUDE in both unions to a string, and crop them to 8 characters only by adding two calculated fields (Longitude Join and Latitude Join):
Longitude Join

LEFT([LONGITUDE],8)

Latitude Join

LEFT([LATITUDE],8)

Tableau created a new column, Table Names. Please change the values to red-light and speed. Do this for both unions.

3. Rename our unions, as shown here:
You can also add a description – all the extra effort you make now will help you later with documentation and explanation.
4. Drag Locations union step onto Violations union step and join the two.
5. Select the following three fields (Table Names, Latitude Join, and Longitude Join) for an inner join and check the results:
What this overview tells us is that from 557,000 rows of violations, we were only able to match 142K. From 310 rows of locations, we were only able to match 132. Our Join Result is 142,266 rows. This could be due to slightly different Longitudes and Latitudes – one file might have a NAD83 source and the other WGS84; however, we will continue for now. Change the join to a full outer join and dive deeper into the data later in Tableau Desktop.

Always wanted to know what the relation between Latitude, Longitude and actual km are?
- 1 degree Latitude is approximately 110.5 km
- 1 degree Longitude is approximately 111.3*cos(latitude) km
- WGS84 has 0.21mm longer ellipsoid than NAD83

In order to get to a full outer join, simply click on the outer edge of the two circles: If you want to check immediate results, simply right-click on, for example, and the Join symbol (represented by the Venn diagram icon, as shown in the following screenshot), and you'll be able to check the data in Tableau Desktop: By now, we've seen how to clean data, and how to use unions and joins, so let's continue with the aggregation step.

Introduction to aggregating

An aggregation is used when you want to change the granularity of your data. In our dataset, we have one row per violation. However, we want to group them by camera in order to see where violations happen the most. Let's do it!

Exercise: aggregating

1. Our current flow pane looks like this:
2. Now click on the + sign next to the Join 1 and choose Add Aggregate:
3. Our goal is to see violations per camera; therefore, we will add the sum of violations to Aggregated Fields, and other fields such as LATITUDE, LONGITUDE, CAMERA ID will be added to Grouped Fields:
4. Let's have a look at the Tableau Desktop Preview by right-clicking on the Aggregate symbol and select Preview in Tableau Desktop:

In the preceding screenshot, we can see the amount of violations per camera. Of course, Tableau Desktop will also sum up the violations per camera if those were the only fields dragged onto the view. But another benefit of aggregating it upfront is that you have less data to import; hence, it's possible to achieve better performance. If you want to continue analyzing the dataset, you can now ask further questions, such as the following:

Do new cameras have an effect on the number of violations from older cameras?
Is there a correlation between the number of violations from red-light and speeding cameras?
In which months do the most violations happen and is it always the same location?

Exporting data

Last, but not least, we want to export our data. We have seen before that a right-click on a symbol in the flow pane offers the possibility to preview the data in Tableau Desktop:

1. If you want a flow to run according to a schedule or save it to a location, it's best to use Add Output:
2. After you did this, the following screen will appear: