dataprep library local parameters

To help users manage parameters quickly, four global parameters are currently defined. A global parameter applies to every plot that has that parameter.

| Global Parameter | Description |
| --- | --- |
| width | Change the plots' width in plot(df, col1), plot(df, col1, col2), plot(df, col1, col2, col3), plot_correlation() and plot_missing(). |
| height | Change the plots' height in plot(df, col1), plot(df, col1, col2), plot(df, col1, col2, col3), plot_correlation() and plot_missing(). |
| bins | Apply to bins for Histogram, KDE Plot, Box Plot, Word Length, Line Chart, Spectrum. |
| ngroups | Apply to bars and slices for the Bar Chart and Pie Chart. |

Local parameters are plot-specific, and their names are separated by dots (.). The portion before the first . is the plot name and the portion after the first . is the parameter name. The . is also used when the parameter name contains more than one word. When a user enters both a global parameter and a local parameter in config, the local parameter overrides the global parameter for that specific plot.
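The naming and precedence rules above can be sketched in a few lines of plain Python. This is only an illustration, not DataPrep's actual implementation: the GLOBAL_TO_LOCAL mapping and the helper names split_name and resolve are hypothetical, and the mapping covers only a few of the plots listed in this post.

```python
# Hypothetical mapping from (plot, local parameter) to the global parameter
# that can supply its value; only a few plots are listed, for illustration.
GLOBAL_TO_LOCAL = {
    ("hist", "bins"): "bins",
    ("box", "bins"): "bins",
    ("bar", "bars"): "ngroups",
    ("pie", "slices"): "ngroups",
}


def split_name(name):
    """Split a local parameter name at the first dot into (plot, parameter)."""
    plot, _, param = name.partition(".")
    return plot, param


def resolve(config, name, default=None):
    """Return the value for a local parameter: local first, then global, then default."""
    if name in config:  # an explicit local setting always wins
        return config[name]
    global_name = GLOBAL_TO_LOCAL.get(split_name(name))
    if global_name in config:  # fall back to the matching global parameter
        return config[global_name]
    return default


config = {"bins": 20, "hist.bins": 100}
print(split_name("insight.missing.threshold"))   # ('insight', 'missing.threshold')
print(resolve(config, "hist.bins", default=50))  # 100: the local value overrides the global one
print(resolve(config, "box.bins", default=50))   # 20: only the global value is set
```

With this config, the Histogram gets 100 bins while every other plot that has a bins parameter falls back to the global value of 20.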

In the following tables we summarize the parameters for each API. You can also find the parameters for each plot in the Config API reference.

plot()

| Local Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| hist.bins | int | 50 | Maximum number of bins to display in the Histogram |
| hist.yscale | str | "linear" | Y-axis scale ("linear" or "log") for the Histogram |
| bar.bars | int | 10 | Maximum number of bars to display in the Bar Chart |
| bar.sort_descending | bool | True | Whether to sort the bars in descending order in the Bar Chart |
| bar.color | str | "#1f77b4" | Color of the bars in the Bar Chart |
| insight.duplicates.threshold | int | 1 | Warn if the percent of duplicated values is above this threshold in the Insights |
| insight.uniform.threshold | float | 0.999 | The p-value threshold for the chi-square test in the Insights |
| insight.missing.threshold | int | 1 | Warn if the percent of missing values is above this threshold in the Insights |
| insight.skewed.threshold | float | 1e-5 | The p-value threshold for scipy.stats.skewtest, which tests whether the skew differs from that of a normal distribution, in the Insights |
| insight.infinity.threshold | int | 1 | Warn if the percent of infinite values is above this threshold in the Insights |
| insight.zeros.threshold | int | 5 | Warn if the percent of zeros is above this threshold in the Insights |
| insight.negatives.threshold | int | 1 | Warn if the percent of negatives is above this threshold in the Insights |
| insight.normal.threshold | float | 0.99 | The p-value threshold for the normality test in the Insights, based on D'Agostino and Pearson's test, which combines skew and kurtosis to produce an omnibus test of normality |
| insight.high_cardinality.threshold | int | 50 | The threshold for the count of unique values; a count larger than the threshold yields high cardinality in the Insights |
| insight.constant.threshold | int | 1 | The threshold for the count of unique values; a count equal to the threshold yields a constant value in the Insights |
| insight.outstanding_no1.threshold | float | 1.5 | The threshold for the outstanding no1 insight, which measures the ratio of the largest category count to the second-largest category count in the Insights |
| insight.attribution.threshold | float | 0.5 | The threshold for the attribution insight, which measures the percentage of the top 2 categories in the Insights |
| insight.high_word_cardinality.threshold | int | 1000 | The threshold for the high word cardinality insight, which measures the number of words of that category in the Insights |
| insight.outstanding_no1_word.threshold | int | 0 | The threshold for the outstanding no1 word insight, which measures the ratio of the most frequent word count to the second most frequent word count in the Insights |
| insight.outlier.threshold | int | 0 | The threshold for the outlier count for the Insights in the Box Plot |
| kde.yscale | str | "linear" | Y-axis scale ("linear" or "log") for the KDE Plot |
| kde.hist_color | str | "#aec7e8" | Color of the histogram in the KDE Plot |
| kde.line_color | str | "#d62728" | Color of the line in the KDE Plot |
| box.ngroups | int | 15 | Maximum number of groups for a categorical column to display in the Box Plot |
| box.bins | int | 50 | Maximum number of bins for a numerical column to display in the Box Plot |
| box.unit | str | "auto" | Defines the time unit to group values over for a datetime column. It can be "year", "quarter", "month", "week", "day", "hour", "minute", "second". With the default value "auto", it uses the time unit for which the resulting number of groups is closest to 15 in the Box Plot |
| box.sort_descending | bool | True | Whether to sort the boxes in descending order of frequency in the Box Plot |
| box.color | str | "#d62728" | Color of the Box Plot |
| value_table.ngroups | int | 10 | |
| pie.slices | int | 10 | Maximum number of pie slices to display in the Pie Chart |
| wordcloud.top_words | int | 30 | Maximum number of most frequent words to display in the Word Cloud |
| wordlen.bins | int | 50 | Maximum number of bins in the Word Length |
| line.bins | int | 50 | Maximum number of bins to display in the Line Chart |
| line.sort_descending | bool | True | Whether to sort the groups in descending order of frequency in the Line Chart |
| line.yscale | str | "linear" | Y-axis scale ("linear" or "log") for the Line Chart |
| line.unit | str | "auto" | Defines the time unit to group values over for a datetime column. It can be "year", "quarter", "month", "week", "day", "hour", "minute", "second". With the default value "auto", it uses the time unit for which the resulting number of groups is closest to 15 in the Line Chart |
| line.agg | str | "mean" | Specifies the aggregate to use when aggregating over a numeric column in the Line Chart |
| scatter.sample_size | int | 1000 | Number of points to randomly sample per partition in the Scatter Plot |
| scatter.sample_rate | float | None | Defines the sample rate per partition in the Scatter Plot. Cannot be used with sample_size. Set it to 1.0 for no sampling |
| hexbin.tile_size | float | "auto" | The size of a tile in the Hexbin Plot, measured from the middle of a hexagon to its left or right corner |
| nested.nsubgroups | int | 5 | Maximum number of most frequent values from the second column to display (computed on the filtered data consisting of the most frequent values from the first column) in the Nested Bar Chart |
| stacked.ngroups | int | 10 | Maximum number of most frequent values from the first column to display in the Stacked Bar Chart |
| stacked.nsubgroups | int | 5 | Maximum number of most frequent values from the second column to display (computed on the filtered data consisting of the most frequent values from the first column) in the Stacked Bar Chart |
| stacked.unit | str | "auto" | Defines the time unit to group values over for a datetime column. It can be "year", "quarter", "month", "week", "day", "hour", "minute", "second". With the default value "auto", it uses the time unit for which the resulting number of groups is closest to 15 in the Stacked Bar Chart |
| stacked.sort_descending | bool | True | Whether to sort the groups in descending order of frequency in the Stacked Bar Chart |
| heatmap.ngroups | int | 10 | Maximum number of most frequent values from the first column to display in the Heat Map |
| heatmap.nsubgroups | int | 5 | Maximum number of most frequent values from the second column to display (computed on the filtered data consisting of the most frequent values from the first column) in the Heat Map |
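The "auto" setting of box.unit, line.unit and stacked.unit can be pictured with a small sketch: estimate how many groups each candidate unit would produce over the column's time span, then keep the unit whose count is closest to 15. The function name pick_unit and the approximate unit lengths are assumptions made for illustration; DataPrep's own selection logic may differ in detail.

```python
from datetime import datetime

# Approximate unit lengths in seconds (assumed values; months and years vary).
UNIT_SECONDS = {
    "year": 365 * 86400,
    "quarter": 91 * 86400,
    "month": 30 * 86400,
    "week": 7 * 86400,
    "day": 86400,
    "hour": 3600,
    "minute": 60,
    "second": 1,
}


def pick_unit(start, end, target=15):
    """Return the unit whose estimated group count over [start, end] is closest to target."""
    span = (end - start).total_seconds()
    return min(UNIT_SECONDS, key=lambda unit: abs(span / UNIT_SECONDS[unit] - target))


print(pick_unit(datetime(2020, 1, 1), datetime(2020, 1, 16)))  # day
print(pick_unit(datetime(2005, 1, 1), datetime(2020, 1, 1)))   # year
```

A column spanning about two weeks is therefore grouped by day, while one spanning fifteen years is grouped by year.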

plot_missing()

| Local Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| spectrum.bins | int | 20 | Maximum number of bins to display in the Spectrum |
| PDF.sample_size | int | 100 | Number of evenly spaced samples between the minimum and maximum values to compute the PDF at |
| CDF.sample_size | int | 100 | Number of evenly spaced samples between the minimum and maximum values to compute the CDF at |
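PDF.sample_size and CDF.sample_size both describe a grid of evenly spaced points between a column's minimum and maximum. A minimal pure-Python sketch of that idea for the CDF follows. The helper names evenly_spaced and empirical_cdf are hypothetical, and this is not necessarily how DataPrep computes the curve internally.

```python
from bisect import bisect_right


def evenly_spaced(lo, hi, sample_size):
    """Return sample_size evenly spaced points from lo to hi, both ends included."""
    if sample_size < 2:
        return [lo]
    step = (hi - lo) / (sample_size - 1)
    # Append hi explicitly so rounding error cannot leave the last point short.
    return [lo + i * step for i in range(sample_size - 1)] + [hi]


def empirical_cdf(values, sample_size=100):
    """Evaluate the empirical CDF of values on an evenly spaced grid."""
    data = sorted(values)
    grid = evenly_spaced(data[0], data[-1], sample_size)
    # The fraction of observations <= x is the empirical CDF at x.
    return [(x, bisect_right(data, x) / len(data)) for x in grid]


for x, p in empirical_cdf([1, 2, 2, 3, 5], sample_size=5):
    print(x, p)
```

A larger sample_size gives a smoother curve at the cost of more evaluations.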

plot_correlation()

| Local Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| scatter.sample_size | int | 1000 | Number of points to randomly sample per partition in the Scatter Plot in plot_correlation(df, x, y) |
| scatter.sample_rate | float | None | Defines the sample rate per partition in the Scatter Plot. Cannot be used with sample_size. Set it to 1.0 for no sampling |
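Because scatter.sample_rate cannot be combined with scatter.sample_size, it can help to check a config dictionary before passing it on. The check below is a hypothetical helper written for this post, not part of DataPrep, and it only inspects the keys the user supplied.

```python
def check_scatter_sampling(config):
    """Raise ValueError if both scatter sampling options are set in config."""
    has_size = config.get("scatter.sample_size") is not None
    has_rate = config.get("scatter.sample_rate") is not None
    if has_size and has_rate:
        raise ValueError(
            "Set either scatter.sample_size or scatter.sample_rate, not both"
        )
    return config


check_scatter_sampling({"scatter.sample_rate": 1.0})  # accepted: rate only (no sampling)
check_scatter_sampling({"scatter.sample_size": 500})  # accepted: size only
```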

create_report()

| Local Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| bar.bars | int | 10 | Maximum number of bars to display in the Bar Chart |
| bar.sort_descending | bool | True | Whether to sort the bars in descending order in the Bar Chart |
| bar.yscale | str | "linear" | Y-axis scale ("linear" or "log") for the Bar Chart |
| pie.slices | int | 10 | Maximum number of pie slices to display in the Pie Chart |
| pie.sort_descending | bool | True | Whether to sort the slices in descending order of frequency in the Pie Chart |
| wordcloud.top_words | int | 30 | Maximum number of most frequent words to display in the Word Cloud |
| wordcloud.stopword | bool | True | Whether to remove stopwords in the Word Cloud |
| wordcloud.lemmatize | bool | False | Whether to lemmatize the words in the Word Cloud |
| wordcloud.stem | bool | False | Whether to apply Porter stemming to the words in the Word Cloud |
| wordfreq.top_words | int | 30 | Maximum number of most frequent words to display in the Word Frequency |
| wordfreq.stopword | bool | True | Whether to remove stopwords in the Word Frequency |
| wordfreq.lemmatize | bool | False | Whether to lemmatize the words in the Word Frequency |
| wordfreq.stem | bool | False | Whether to apply Porter stemming to the words in the Word Frequency |
| wordlen.bins | int | 50 | Maximum number of bins in the Word Length |
| wordlen.yscale | str | "linear" | Y-axis scale ("linear" or "log") for the Word Length |
| line.unit | str | "auto" | Defines the time unit to group values over for a datetime column. It can be "year", "quarter", "month", "week", "day", "hour", "minute", "second". With the default value "auto", it uses the time unit for which the resulting number of groups is closest to 15 in the Line Chart |
| kde.bins | int | 50 | Maximum number of bins in the KDE Plot |
| kde.yscale | str | "linear" | Y-axis scale ("linear" or "log") for the KDE Plot |
| box.sort_descending | bool | True | Whether to sort the boxes in descending order of frequency in the Box Plot |
| spectrum.bins | int | 20 | Maximum number of bins to display in the Spectrum |
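As a usage sketch, local parameters for create_report() can be collected in a dictionary and passed through the config argument. This assumes DataPrep is installed and that create_report accepts a config dictionary, as the Config API reference mentioned earlier describes. The helper name build_report and the chosen values are illustrative only.

```python
# Local parameters taken from the create_report() table; the values are examples.
report_config = {
    "bar.bars": 15,
    "pie.slices": 5,
    "wordcloud.top_words": 50,
    "wordcloud.stopword": True,
    "kde.yscale": "log",
}


def build_report(df, config=report_config):
    """Create a DataPrep report for df (assumes the dataprep package is installed)."""
    # Imported lazily so the config dictionary can be inspected without dataprep.
    from dataprep.eda import create_report

    return create_report(df, config=config)
```

Any parameter left out of the dictionary keeps the default listed in the table.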