Star Charts in Python

Original article: https://towardsdatascience.com/stars-charts-in-python-9c20d02fb6c0

Diamonds are a data scientist’s best friend. More specifically, the diamonds dataset found on Kaggle. In this article, I will walk through a simple workflow to create a Star Chart (also known as a Spider Chart or Radar Chart). This tutorial was adapted from the wonderful workflow of Alex at Python Charts. All of the code in this article and the needed dataset are available on GitHub.

Diamonds are a Data Scientist’s Best Friend

To begin you will need a few libraries. I am running Python 3. I created this workflow using a Jupyter notebook, pandas, matplotlib, and numpy. These packages can be installed via pip or conda if they are not already on your system.

pip install jupyterlab
pip install pandas
pip install matplotlib 
pip install numpy

The dataset can be downloaded from Kaggle and is about 3.2 MB. A copy is also included in the GitHub repository. I keep the dataset in the data folder. Load the dataset with pandas, drop the extra index column, and we are off!

import pandas as pd

# Load the dataset and drop the extra index column written by the CSV export
df = pd.read_csv("data/diamonds.csv")
df.drop("Unnamed: 0", axis=1, inplace=True)

Levels of the 3 C’s

The 4 C’s of diamonds are Cut, Color, Clarity, and Carat. Cut, Color, and Clarity are categorical variables used in the diamond industry. Carat is a numeric value representing the weight of a stone.

Photo by Hao Zhang on Unsplash

To create the Star Chart, we need to represent the diamond industry terms as numbers. To do this, we need to gather information about the levels that are in our dataset. Cut is composed of five levels, with Ideal being the highest [4] and Fair being the lowest [0]. Among the seven levels of Color, D is the highest [6] and J is the lowest [0]. Finally, Clarity is composed of eight levels, with IF (internally flawless) as the highest [7] and I1 (inclusions level 1) as the lowest [0].

Cutting and Polishing Data for Display

In our dataset, we cut 3 outliers in carat size that skew the downstream column scaling.

## Cut diamonds that skew carat range
indices_to_remove = [27415, 27630, 27130]
df = df.drop(indices_to_remove)

Next, we create new columns in our dataframe to hold the rankings, which we produce by mapping a dictionary onto each C’s column. An example of the mapping is below.

cut={'Ideal':4,'Premium':3,'Very Good':2,'Good': 1,'Fair':0}
df['Cut'] = df['cut'].map(cut) #Note: 'Cut' is a different column
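
Only the Cut mapping is shown above. The other two can be sketched the same way, assuming the usual grading orders for the dataset's levels and the ranks stated earlier (J = 0 up to D = 6, I1 = 0 up to IF = 7). The helper name `rank_levels` is hypothetical and not part of the original code:

```python
def rank_levels(levels_worst_to_best):
    """Map each level name to its rank, starting at 0 for the worst level."""
    return {level: rank for rank, level in enumerate(levels_worst_to_best)}

# Color grades run from J (lowest) to D (highest).
color = rank_levels(["J", "I", "H", "G", "F", "E", "D"])

# Clarity grades run from I1 (lowest) to IF (highest).
clarity = rank_levels(["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"])

print(color["D"], color["J"])        # 6 0
print(clarity["IF"], clarity["I1"])  # 7 0
```

These dictionaries could then be applied like `cut`, for example `df['Color'] = df['color'].map(color)` and `df['Clarity'] = df['clarity'].map(clarity)`.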

Finally, we need to scale the columns that we will use in our Star Chart to represent the data fairly.

## Convert all rankings and continuous data to scale between 0-100
factors = ['Cut', 'Color', "Clarity", "carat", "price"]
new_max = 100
new_min = 0
new_range = new_max - new_min

## Create Scaled Columns
for factor in factors:
    max_val = df[factor].max()
    min_val = df[factor].min()
    val_range = max_val - min_val
    df[factor + '_Adj'] = df[factor].apply(lambda x: (((x - min_val) * new_range) / val_range) + new_min)
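
To see what the scaling formula does, here is a small pure-Python sketch of the same min-max calculation applied to a plain list. The function name `rescale` is made up for illustration; the arithmetic mirrors the lambda above:

```python
def rescale(values, new_min=0, new_max=100):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    min_val, max_val = min(values), max(values)
    val_range = max_val - min_val
    new_range = new_max - new_min
    return [((x - min_val) * new_range) / val_range + new_min for x in values]

# Ranks 0..4, as in the Cut column
print(rescale([0, 1, 2, 3, 4]))  # [0.0, 25.0, 50.0, 75.0, 100.0]
```

Note that the formula divides by the range, so a column where every value is identical would need special handling.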

We then subset the scaled columns for downstream plotting. Notice how we are creating a new dataframe (df2) with only the columns we intend to use in the Star Chart.

## Subset scaled columns 
df2 = df[['Cut_Adj', "Color_Adj", "Clarity_Adj", "carat_Adj", "price_Adj"]]
df2.columns = ['Cut', "Color", "Clarity", "Carat", "Price"]

The Star of the Show

To create the Star Chart, we must specify which columns to use and create the circular plot object using numpy.

import numpy as np

labels = ['Cut', "Color", "Clarity", "Carat", "Price"]
points = len(labels)

# One evenly spaced angle per label, then repeat the first to close the shape
angles = np.linspace(0, 2 * np.pi, points, endpoint=False).tolist()
angles += angles[:1]
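
As a rough illustration of what these angles look like, the sketch below rebuilds them with the standard `math` module instead of numpy (the spacing should match `np.linspace` with `endpoint=False`, up to floating-point rounding) and prints them in degrees:

```python
import math

labels = ["Cut", "Color", "Clarity", "Carat", "Price"]
points = len(labels)

# Evenly spaced angles around the circle, excluding the full turn at 2*pi
angles = [2 * math.pi * i / points for i in range(points)]
# Repeat the first angle so the plotted outline closes
angles += angles[:1]

print([round(math.degrees(a)) for a in angles])  # [0, 72, 144, 216, 288, 0]
```

Five labels give five spokes 72 degrees apart, and the sixth entry simply returns to the starting spoke.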

We then create a helper function to plot a diamond solely by the index number.

def add_to_star(diamond, color, label=None):
    # Look up the scaled values for this diamond by its index label
    values = df2.loc[diamond].tolist()
    # Repeat the first value to close the outline
    values += values[:1]
    # Fall back to the index number when no label is given
    if label is None:
        label = diamond
    ax.plot(angles, values, color=color, linewidth=1, label=label)
    ax.fill(angles, values, color=color, alpha=0.25)

Now the magic begins! We can begin populating our Star Chart with any diamonds we want. How about the most expensive and the two least expensive:

import matplotlib.pyplot as plt

## Create plot object
fig, ax = plt.subplots(figsize=(6, 6), subplot_kw=dict(polar=True))

## Plot a new diamond with the add_to_star function
add_to_star(27749, '#1aaf6c', "Most Expensive Diamond")
add_to_star(0, '#429bf4', "Least Expensive A")
add_to_star(1, '#d42cea', "Least Expensive B")

This is enough to create a Star Chart; however, there are no x labels, no orientation, and no custom flair. Let’s change that!

## Fix axis to start from the top and run clockwise
ax.set_theta_offset(np.pi / 2)
ax.set_theta_direction(-1)

## Draw axis lines for each angle and label
# angles repeats its first entry to close the shape, so drop the last one
# to keep one tick per label
ax.set_thetagrids(np.degrees(angles[:-1]), labels)

## Edit x axis labels
for label, angle in zip(ax.get_xticklabels(), angles):
    if angle in (0, np.pi):
        label.set_horizontalalignment('center')
    elif 0 < angle < np.pi:
        label.set_horizontalalignment('left')
    else:
        label.set_horizontalalignment('right')

## Customize your graphic
# Change the location of the gridlines or remove them
ax.set_rgrids([20, 40, 60, 80])
#ax.set_rgrids([]) # This removes grid lines
# Change the color of the ticks
ax.tick_params(colors='#222222')
# Make the y-axis labels larger, smaller, or remove by setting fontsize
ax.tick_params(axis='y', labelsize=0)
# Make the x-axis labels larger or smaller.
ax.tick_params(axis='x', labelsize=13)
# Change the color of the circular gridlines.
ax.grid(color='#AAAAAA')
# Change the color of the outer circle
ax.spines['polar'].set_color('#222222')
# Change the circle background color
ax.set_facecolor('#FAFAFA')
# Add title and legend
ax.set_title('Comparing Diamonds Across Dimensions', y=1.08)
ax.legend(loc='upper right', bbox_to_anchor=(1.3, 1.1))
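
The label-alignment rule in the loop can be read on its own: with 0 at the top and angles running clockwise, labels on the right half are left-aligned and labels on the left half are right-aligned, so the text extends away from the plot. A small sketch of that rule as a standalone function follows; the name `label_alignment` is hypothetical, and it uses a tolerance-based comparison rather than the exact equality in the loop:

```python
import math

def label_alignment(angle):
    """Pick a horizontal alignment for a label at `angle` (radians, 0 at top, clockwise)."""
    if math.isclose(angle, 0.0, abs_tol=1e-9) or math.isclose(angle, math.pi):
        return "center"  # top or bottom of the circle
    if 0 < angle < math.pi:
        return "left"    # right half of the circle
    return "right"       # left half of the circle

angles = [2 * math.pi * i / 5 for i in range(5)]
print([label_alignment(a) for a in angles])
# ['center', 'left', 'left', 'right', 'right']
```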

So what is the output?

Best Bling for Your Buck

Which diamond offers the highest rating across the 4 C’s for the lowest price? To find out, we take the total value across the 4 C’s and divide it by the raw (unscaled) price. This section operates on the original dataframe, which holds all the raw columns. To find the total, we sum the four scaled columns.

df['Total'] = df['Cut_Adj'] + df['Color_Adj'] + df['Clarity_Adj'] + df['carat_Adj']

## Divide Value total by Price
df['4C_by_Price'] = df['Total'] / df['price']
df = df.sort_values(by="4C_by_Price", ascending=False)
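
The same ranking idea can be tried on a toy example without pandas. The three diamonds below are invented for illustration only; each has four scaled scores (0 to 100) and a raw price:

```python
# Hypothetical diamonds: scaled 4 C scores (0-100 each) and raw price
diamonds = {
    "A": {"scores": [100, 50, 75, 10], "price": 500},
    "B": {"scores": [75, 100, 100, 40], "price": 9000},
    "C": {"scores": [50, 50, 25, 5], "price": 400},
}

# Total of the four scaled scores divided by the raw price
value_per_dollar = {
    name: sum(d["scores"]) / d["price"] for name, d in diamonds.items()
}

# Highest value per dollar first
ranked = sorted(value_per_dollar, key=value_per_dollar.get, reverse=True)
print(ranked)  # ['A', 'C', 'B']
```

In this toy data, diamond B has the highest total score but ranks last, because its price is so much larger than the others.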

The diamond with the most bling for our buck is #31597 and the diamond with the least bling for our buck is #26320. How do these diamonds compare on a Star Chart? Let’s explore below:

Conclusions:

Thank you for exploring a few diamond characteristics in a Star Chart format using matplotlib. If you have any questions, post them below or on the GitHub repository, where the full code lives. My name is Cody Glickman and I can be found on LinkedIn. Be sure to check out some other articles about fun data science projects!
