Python: the wordcloud.WordCloud class, its parameters parsed and explained

一个处女座的程序猿

Article link: https://blog.csdn.net/qq_41185868/article/details/107703213

Parameters of wordcloud.WordCloud() and what they mean

class WordCloud found at: wordcloud.wordcloud

class WordCloud(object):
"""Word cloud object for generating and drawing.

Parameters
----------
font_path: string
Font path to the font that will be used (OTF or TTF).
Defaults to DroidSansMono path on a Linux machine. If you are on another OS or don't have this font, you need to adjust this path.

width : int (default=400)
Width of the canvas.

height : int (default=200)
Height of the canvas.

prefer_horizontal : float (default=0.90)
The ratio of times to try horizontal fitting as opposed to vertical. If prefer_horizontal < 1, the algorithm will try rotating the word if it doesn't fit. (There is currently no built-in way to get only vertical words.)

mask : nd-array or None (default=None)
If not None, gives a binary mask on where to draw words. If mask is not None, width and height will be ignored and the shape of mask will be used instead. All white (#FF or #FFFFFF) entries will be considered "masked out" while other entries will be free to draw on. [This changed in the most recent version!]

scale : float (default=1)
Scaling between computation and drawing. For large word-cloud images, using scale instead of larger canvas size is significantly faster, but might lead to a coarser fit for the words.

min_font_size : int (default=4)
Smallest font size to use. Will stop when there is no more room in this size.

font_step : int (default=1)
Step size for the font. font_step > 1 might speed up computation but give a worse fit.

max_words : number (default=200)
The maximum number of words.

stopwords : set of strings or None
The words that will be eliminated. If None, the built-in STOPWORDS list will be used.

background_color : color value (default="black")
Background color for the word cloud image.

max_font_size : int or None (default=None)
Maximum font size for the largest word. If None, height of the image is used.

mode : string (default="RGB")
Transparent background will be generated when mode is "RGBA" and background_color is None.

relative_scaling : float (default=.5)
Importance of relative word frequencies for font-size. With relative_scaling=0, only word-ranks are considered. With relative_scaling=1, a word that is twice as frequent will have twice the size. If you want to consider the word frequencies and not only their rank, relative_scaling around .5 often looks good.

.. versionchanged: 2.0
Default is now 0.5.

color_func : callable, default=None
Callable with parameters word, font_size, position, orientation, font_path, random_state that returns a PIL color for each word. Overwrites "colormap". See colormap for specifying a matplotlib colormap instead.

regexp : string or None (optional)
Regular expression to split the input text into tokens in process_text. If None is specified, ``r"\w[\w']+"`` is used.

collocations : bool, default=True
Whether to include collocations (bigrams) of two words.

.. versionadded: 2.0

colormap : string or matplotlib colormap, default="viridis"
Matplotlib colormap to randomly draw colors from for each word. Ignored if "color_func" is specified.

.. versionadded: 2.0

normalize_plurals : bool, default=True
Whether to remove trailing 's' from words. If True and a word appears with and without a trailing 's', the one with trailing 's' is removed and its counts are added to the version without trailing 's' -- unless the word ends with 'ss'.
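
The plural rule described for normalize_plurals can be pictured with a small standalone sketch. This is not the library's own implementation: the helper name merge_plurals is hypothetical, and case handling is left out to keep the idea visible.

```python
from collections import Counter


def merge_plurals(counts):
    """Fold the count of 'words' into 'word' when both forms occur.

    Hypothetical helper for illustration only. Words ending in 'ss'
    are left untouched, as the docstring describes.
    """
    merged = dict(counts)
    for word in list(counts):
        if word.endswith("s") and not word.endswith("ss"):
            singular = word[:-1]
            if singular in counts:
                # move the plural's count onto the singular form
                merged[singular] += merged.pop(word)
    return merged


counts = Counter(["cloud", "clouds", "clouds", "glass", "word"])
print(merge_plurals(counts))
```

A plural whose singular never appears (for example "clouds" on its own) is kept as it is, which matches the "appears with and without a trailing 's'" condition in the docstring.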

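Class WordCloud, found in wordcloud.wordcloud: class WordCloud(object)
An object for generating and drawing word clouds.

Parameters (translation)
----------
font_path : string
Path to the OTF or TTF font file to draw with. The default points at DroidSansMono on a Linux machine; on another OS, or if that font is not installed, set this path yourself.

width : int (default=400)
Width of the canvas.

height : int (default=200)
Height of the canvas.

prefer_horizontal : float (default=0.90)
Share of placement attempts that try a horizontal fit rather than a vertical one. Below 1, a word that does not fit is retried rotated. There is currently no built-in way to get vertical words only.

mask : nd-array or None (default=None)
Optional binary mask marking where words may be drawn. When a mask is given, width and height are ignored and the mask's shape is used. Pure white entries (#FF or #FFFFFF) count as masked out; every other entry can be drawn on. [This changed in the most recent version.]

scale : float (default=1)
Ratio between the computed layout and the drawn image. For a large word cloud, raising scale is much faster than enlarging the canvas, at the cost of a coarser fit.

min_font_size : int (default=4)
Smallest font size to use. Layout stops once nothing more fits at this size.

font_step : int (default=1)
Step by which the font size is reduced. A step above 1 may speed up the computation but gives a worse fit.

max_words : number (default=200)
Maximum number of words.

stopwords : set of strings or None
Words to drop. If None, the built-in STOPWORDS list is used.

background_color : color value (default="black")
Background color of the word cloud image.

max_font_size : int or None (default=None)
Font size cap for the largest word. If None, the image height is used.

mode : string (default="RGB")
With mode "RGBA" and background_color None, the background is transparent.

relative_scaling : float (default=0.5)
How strongly relative word frequency drives font size. At 0 only the rank of a word matters. At 1 a word that is twice as frequent is drawn twice as large. To take frequency into account and not only rank, a value around 0.5 usually looks good.

.. versionchanged: 2.0
The default is now 0.5.

color_func : callable, default=None
Callable taking word, font_size, position, orientation, font_path and random_state, and returning a PIL color for each word. Overrides "colormap". See colormap for passing a matplotlib colormap instead.

regexp : string or None (optional)
Regular expression used in process_text to split the input text into tokens. If None, ``r"\w[\w']+"`` is used.

collocations : bool, default=True
Whether to include collocations (bigrams, that is, pairs of adjacent words).

.. versionadded: 2.0

colormap : string or matplotlib colormap, default="viridis"
Matplotlib colormap from which a color is drawn at random for each word. Ignored if "color_func" is given.

.. versionadded: 2.0

normalize_plurals : bool, default=True
Whether to strip a trailing 's' from words. If True and a word occurs both with and without the trailing 's', the form with 's' is dropped and its count is added to the form without it, unless the word ends in 'ss'.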
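
To make the effect of relative_scaling concrete, here is a minimal sketch of the size update that generate_from_frequencies applies between consecutive words. The function name next_font_size is an assumption made for this example; the formula itself is copied from the source listed further down.

```python
def next_font_size(font_size, freq, last_freq, relative_scaling):
    """Return the starting font size for the next word.

    freq and last_freq are normalized frequencies of the current and the
    previous word. Hypothetical helper wrapping the formula from the source.
    """
    rs = relative_scaling
    if rs == 0:
        # rank-only mode: the size shrinks only when a word fails to fit
        return font_size
    return int(round((rs * (freq / float(last_freq)) + (1 - rs)) * font_size))


# a word half as frequent as the previous one, starting from size 100
for rs in (0, 0.5, 1):
    print(rs, next_font_size(100, 0.5, 1.0, rs))
```

With relative_scaling=1 the size follows the frequency ratio exactly, and with 0.5 it lands halfway between the old size and the ratio-scaled size.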

Attributes
----------
``words_`` : dict of string to float
Word tokens with associated frequency.

.. versionchanged: 2.0
``words_`` is now a dictionary

``layout_`` : list of tuples (string, int, (int, int), int, color)
Encodes the fitted word cloud. Encodes for each word the string, font size, position, orientation and color.

Notes
-----
Larger canvases will make the code significantly slower. If you need a large word cloud, try a lower canvas size, and set the scale parameter.

The algorithm might give more weight to the ranking of the words than their actual frequencies, depending on the ``max_font_size`` and the scaling heuristic.
"""

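Attributes (translation)
----------
``words_`` : dict mapping string to float
Each word token with its frequency.

.. versionchanged: 2.0
``words_`` is now a dictionary.

``layout_`` : list of tuples (string, int, (int, int), int, color)
The fitted word cloud: for each word, its string, font size, position, orientation and color.

Notes (translation)
-----
A larger canvas makes the code noticeably slower. For a large word cloud, use a smaller canvas and set the scale parameter.

Depending on ``max_font_size`` and the scaling heuristic, the algorithm may weight the rank of a word more heavily than its actual frequency.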
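
As an illustration of the color_func parameter, the sketch below defines a callable with the parameter names given in the docstring and returns a color as an "hsl(...)" string, one of the string forms PIL accepts. The function name grey_color_func and the chosen lightness range are assumptions for this example only.

```python
from random import Random


def grey_color_func(word, font_size, position, orientation,
                    font_path=None, random_state=None):
    """Return a random light-grey shade as an "hsl(...)" color string."""
    # fall back to an unseeded generator when no random_state is supplied
    rng = random_state if random_state is not None else Random()
    lightness = rng.randint(60, 100)
    return "hsl(0, 0%%, %d%%)" % lightness


print(grey_color_func("python", 40, (0, 0), None, random_state=Random(42)))
```

Such a function would be passed as WordCloud(color_func=grey_color_func), or applied to an existing layout through recolor(color_func=grey_color_func).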

    def __init__(self, font_path=None, width=400, height=200, margin=2,
                 ranks_only=None, prefer_horizontal=.9, mask=None, scale=1,
                 color_func=None, max_words=200, min_font_size=4,
                 stopwords=None, random_state=None, background_color='black',
                 max_font_size=None, font_step=1, mode="RGB",
                 relative_scaling=.5, regexp=None, collocations=True,
                 colormap=None, normalize_plurals=True):
        if font_path is None:
            font_path = FONT_PATH
        if color_func is None and colormap is None:
            # we need a color map
            import matplotlib
            version = matplotlib.__version__
            if version[0] < "2" and version[2] < "5":
                colormap = "hsv"
            else:
                colormap = "viridis"
        self.colormap = colormap
        self.collocations = collocations
        self.font_path = font_path
        self.width = width
        self.height = height
        self.margin = margin
        self.prefer_horizontal = prefer_horizontal
        self.mask = mask
        self.scale = scale
        self.color_func = color_func or colormap_color_func(colormap)
        self.max_words = max_words
        self.stopwords = stopwords if stopwords is not None else STOPWORDS
        self.min_font_size = min_font_size
        self.font_step = font_step
        self.regexp = regexp
        if isinstance(random_state, int):
            random_state = Random(random_state)
        self.random_state = random_state
        self.background_color = background_color
        self.max_font_size = max_font_size
        self.mode = mode
        if relative_scaling < 0 or relative_scaling > 1:
            raise ValueError("relative_scaling needs to be "
                             "between 0 and 1, got %f." % relative_scaling)
        self.relative_scaling = relative_scaling
        if ranks_only is not None:
            warnings.warn("ranks_only is deprecated and will be removed as"
                          " it had no effect. Look into relative_scaling.",
                          DeprecationWarning)
        self.normalize_plurals = normalize_plurals

    def fit_words(self, frequencies):
        """Create a word_cloud from words and frequencies.

        Alias to generate_from_frequencies.

        Parameters
        ----------
        frequencies : dict from string to float
            Contains words and associated frequency.

        Returns
        -------
        self
        """
        return self.generate_from_frequencies(frequencies)

    def generate_from_frequencies(self, frequencies, max_font_size=None):
        """Create a word_cloud from words and frequencies.

        Parameters
        ----------
        frequencies : dict from string to float
            Contains words and associated frequency.

        max_font_size : int
            Use this font-size instead of self.max_font_size

        Returns
        -------
        self

        """
        # make sure frequencies are sorted and normalized
        frequencies = sorted(frequencies.items(), key=itemgetter(1),
                             reverse=True)
        if len(frequencies) <= 0:
            raise ValueError("We need at least 1 word to plot a word cloud, "
                             "got %d." % len(frequencies))
        frequencies = frequencies[:self.max_words]

        # largest entry will be 1
        max_frequency = float(frequencies[0][1])

        frequencies = [(word, freq / max_frequency)
                       for word, freq in frequencies]

        if self.random_state is not None:
            random_state = self.random_state
        else:
            random_state = Random()

        if self.mask is not None:
            mask = self.mask
            width = mask.shape[1]
            height = mask.shape[0]
            if mask.dtype.kind == 'f':
                warnings.warn("mask image should be unsigned byte between 0"
                              " and 255. Got a float array")
            if mask.ndim == 2:
                boolean_mask = mask == 255
            elif mask.ndim == 3:
                # if all channels are white, mask out
                boolean_mask = np.all(mask[:, :, :3] == 255, axis=-1)
            else:
                raise ValueError("Got mask of invalid shape: %s"
                                 % str(mask.shape))
        else:
            boolean_mask = None
            height, width = self.height, self.width
        occupancy = IntegralOccupancyMap(height, width, boolean_mask)
        # create image
        img_grey = Image.new("L", (width, height))
        draw = ImageDraw.Draw(img_grey)
        img_array = np.asarray(img_grey)
        font_sizes, positions, orientations, colors = [], [], [], []

        last_freq = 1.

        if max_font_size is None:
            # if not provided use default font_size
            max_font_size = self.max_font_size

        if max_font_size is None:
            # figure out a good font size by trying to draw with
            # just the first two words
            if len(frequencies) == 1:
                # we only have one word. We make it big!
                font_size = self.height
            else:
                self.generate_from_frequencies(dict(frequencies[:2]),
                                               max_font_size=self.height)
                # find font sizes
                sizes = [x[1] for x in self.layout_]
                try:
                    font_size = int(2 * sizes[0] * sizes[1] /
                                    (sizes[0] + sizes[1]))
                # quick fix for if self.layout_ contains less than 2 values
                # on very small images it can be empty
                except IndexError:
                    try:
                        font_size = sizes[0]
                    except IndexError:
                        raise ValueError('canvas size is too small')
        else:
            font_size = max_font_size

        # we set self.words_ here because we called generate_from_frequencies
        # above... hurray for good design?
        self.words_ = dict(frequencies)
        # start drawing grey image
        for word, freq in frequencies:
            # select the font size
            rs = self.relative_scaling
            if rs != 0:
                font_size = int(round((rs * (freq / float(last_freq)) +
                                       (1 - rs)) * font_size))
            if random_state.random() < self.prefer_horizontal:
                orientation = None
            else:
                orientation = Image.ROTATE_90
            tried_other_orientation = False
            while True:
                # try to find a position
                font = ImageFont.truetype(self.font_path, font_size)
                # transpose font optionally
                transposed_font = ImageFont.TransposedFont(
                    font, orientation=orientation)
                # get size of resulting text
                box_size = draw.textsize(word, font=transposed_font)
                # find possible places using integral image:
                result = occupancy.sample_position(box_size[1] + self.margin,
                                                   box_size[0] + self.margin,
                                                   random_state)
                if result is not None or font_size < self.min_font_size:
                    # either we found a place or font-size went too small
                    break
                # if we didn't find a place, make font smaller
                # but first try to rotate!
                if not tried_other_orientation and self.prefer_horizontal < 1:
                    orientation = (Image.ROTATE_90 if orientation is None
                                   else Image.ROTATE_90)
                    tried_other_orientation = True
                else:
                    font_size -= self.font_step
                    orientation = None

            if font_size < self.min_font_size:
                # we were unable to draw any more
                break

            x, y = np.array(result) + self.margin // 2
            # actually draw the text
            draw.text((y, x), word, fill="white", font=transposed_font)
            positions.append((x, y))
            orientations.append(orientation)
            font_sizes.append(font_size)
            colors.append(self.color_func(word, font_size=font_size,
                                          position=(x, y),
                                          orientation=orientation,
                                          random_state=random_state,
                                          font_path=self.font_path))
            # recompute integral image
            if self.mask is None:
                img_array = np.asarray(img_grey)
            else:
                img_array = np.asarray(img_grey) + boolean_mask
            # recompute bottom right
            # the order of the cumsum's is important for speed ?!
            occupancy.update(img_array, x, y)
            last_freq = freq

        self.layout_ = list(zip(frequencies, font_sizes, positions,
                                orientations, colors))
        return self
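
The first steps of generate_from_frequencies (sorting, truncating to max_words, normalizing so the largest entry is 1) and the starting font size derived from the two top words can be tried in isolation. The sketch below uses only the standard library; the function names normalize_frequencies and initial_font_size are assumptions for this example, while the expressions are taken from the method above.

```python
from operator import itemgetter


def normalize_frequencies(frequencies, max_words=200):
    """Sort by frequency, keep the top max_words, scale the largest to 1."""
    ranked = sorted(frequencies.items(), key=itemgetter(1), reverse=True)
    if not ranked:
        raise ValueError("We need at least 1 word to plot a word cloud.")
    ranked = ranked[:max_words]
    max_frequency = float(ranked[0][1])
    return [(word, freq / max_frequency) for word, freq in ranked]


def initial_font_size(size_a, size_b):
    """Harmonic mean of the sizes at which the two top words were fitted."""
    return int(2 * size_a * size_b / (size_a + size_b))


print(normalize_frequencies({"python": 4, "cloud": 2, "word": 1}, max_words=2))
print(initial_font_size(100, 50))
```

The harmonic mean leans toward the smaller of the two sizes, so the starting size is set by what the second word could fit at, which leaves room for the remaining words.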

    def process_text(self, text):
        """Splits a long text into words, eliminates the stopwords.

        Parameters
        ----------
        text : string
            The text to be processed.

        Returns
        -------
        words : dict (string, int)
            Word tokens with associated frequency.

        ..versionchanged:: 1.2.2
            Changed return type from list of tuples to dict.

        Notes
        -----
        There are better ways to do word tokenization, but I don't want to
        include all those things.
        """
        stopwords = set([i.lower() for i in self.stopwords])

        flags = (re.UNICODE if sys.version < '3' and type(text) is unicode
                 else 0)
        regexp = self.regexp if self.regexp is not None else r"\w[\w']+"

        words = re.findall(regexp, text, flags)
        # remove stopwords
        words = [word for word in words if word.lower() not in stopwords]
        # remove 's
        words = [word[:-2] if word.lower().endswith("'s") else word
                 for word in words]
        # remove numbers
        words = [word for word in words if not word.isdigit()]

        if self.collocations:
            word_counts = unigrams_and_bigrams(words, self.normalize_plurals)
        else:
            word_counts, _ = process_tokens(words, self.normalize_plurals)

        return word_counts
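
The token filtering in process_text can be reproduced with the standard library alone. The sketch below follows the same four steps (regex split, stopword removal, stripping 's, dropping pure numbers) and then simply counts the tokens; it does not reproduce unigrams_and_bigrams or process_tokens, so bigrams, case merging and plural handling are left out. The name simple_process_text is an assumption for this example.

```python
import re
from collections import Counter


def simple_process_text(text, stopwords=frozenset(), regexp=r"\w[\w']+"):
    """Tokenize text and count tokens, mirroring the filters in process_text."""
    stop = {w.lower() for w in stopwords}
    # the default pattern only matches tokens of two or more characters
    words = re.findall(regexp, text)
    # remove stopwords
    words = [w for w in words if w.lower() not in stop]
    # remove a trailing 's
    words = [w[:-2] if w.lower().endswith("'s") else w for w in words]
    # remove tokens made up of digits only
    words = [w for w in words if not w.isdigit()]
    return Counter(words)


counts = simple_process_text("The cat's hat and the 2020 cat",
                             stopwords={"the", "and"})
print(counts)
```

Because the default pattern requires at least two characters, single-letter words such as "a" never become tokens, even when they are not in the stopword list.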

    def generate_from_text(self, text):
        """Generate wordcloud from text.

        The input "text" is expected to be a natural text. If you pass a sorted
        list of words, words will appear in your output twice. To remove this
        duplication, set ``collocations=False``.

        Calls process_text and generate_from_frequencies.

        ..versionchanged:: 1.2.2
            Argument of generate_from_frequencies() is not return of
            process_text() any more.

        Returns
        -------
        self
        """
        words = self.process_text(text)
        self.generate_from_frequencies(words)
        return self

    def generate(self, text):
        """Generate wordcloud from text.

        The input "text" is expected to be a natural text. If you pass a sorted
        list of words, words will appear in your output twice. To remove this
        duplication, set ``collocations=False``.

        Alias to generate_from_text.

        Calls process_text and generate_from_frequencies.

        Returns
        -------
        self
        """
        return self.generate_from_text(text)

    def _check_generated(self):
        """Check if ``layout_`` was computed, otherwise raise error."""
        if not hasattr(self, "layout_"):
            raise ValueError("WordCloud has not been calculated, call generate"
                             " first.")

    def to_image(self):
        self._check_generated()
        if self.mask is not None:
            width = self.mask.shape[1]
            height = self.mask.shape[0]
        else:
            height, width = self.height, self.width

        img = Image.new(self.mode, (int(width * self.scale),
                                    int(height * self.scale)),
                        self.background_color)
        draw = ImageDraw.Draw(img)
        for (word, count), font_size, position, orientation, color in \
                self.layout_:
            font = ImageFont.truetype(self.font_path,
                                      int(font_size * self.scale))
            transposed_font = ImageFont.TransposedFont(
                font, orientation=orientation)
            pos = (int(position[1] * self.scale),
                   int(position[0] * self.scale))
            draw.text(pos, word, fill=color, font=transposed_font)

        return img
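
How the scale parameter enters to_image can be seen from the arithmetic alone: the layout is computed on the unscaled canvas, and only at drawing time are the font size and the position multiplied by scale. The helper below is a hypothetical restatement of those two expressions, not part of the library.

```python
def scale_layout_entry(font_size, position, scale):
    """Map one layout entry to output pixels.

    position is stored as (row, column) in the layout, while the drawing
    call expects (x, y), hence the swapped indices.
    """
    scaled_font = int(font_size * scale)
    x_pixel = int(position[1] * scale)
    y_pixel = int(position[0] * scale)
    return scaled_font, (x_pixel, y_pixel)


# a word fitted at size 20 in row 10, column 30, drawn at twice the size
print(scale_layout_entry(20, (10, 30), 2))
```

This is why a small canvas combined with a larger scale is cheaper than a large canvas: the expensive position search runs on the small grid, and the enlargement is only a multiplication.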

    def recolor(self, random_state=None, color_func=None, colormap=None):
        """Recolor existing layout.

        Applying a new coloring is much faster than generating the whole
        wordcloud.

        Parameters
        ----------
        random_state : RandomState, int, or None, default=None
            If not None, a fixed random state is used. If an int is given, this
            is used as seed for a random.Random state.

        color_func : function or None, default=None
            Function to generate new color from word count, font size, position
            and orientation. If None, self.color_func is used.

        colormap : string or matplotlib colormap, default=None
            Use this colormap to generate new colors. Ignored if color_func
            is specified. If None, self.color_func (or self.color_map) is used.

        Returns
        -------
        self
        """
        if isinstance(random_state, int):
            random_state = Random(random_state)
        self._check_generated()

        if color_func is None:
            if colormap is None:
                color_func = self.color_func
            else:
                color_func = colormap_color_func(colormap)
        self.layout_ = [(word_freq, font_size, position, orientation,
                         color_func(word=word_freq[0], font_size=font_size,
                                    position=position, orientation=orientation,
                                    random_state=random_state,
                                    font_path=self.font_path))
                        for word_freq, font_size, position, orientation, _
                        in self.layout_]
        return self

    def to_file(self, filename):
        """Export to image file.

        Parameters
        ----------
        filename : string
            Location to write to.

        Returns
        -------
        self
        """
        img = self.to_image()
        img.save(filename, optimize=True)
        return self

    def to_array(self):
        """Convert to numpy array.

        Returns
        -------
        image : nd-array size (width, height, 3)
            Word cloud image as numpy matrix.
        """
        return np.array(self.to_image())

    def __array__(self):
        """Convert to numpy array.

        Returns
        -------
        image : nd-array size (width, height, 3)
            Word cloud image as numpy matrix.
        """
        return self.to_array()

    def to_html(self):
        raise NotImplementedError("FIXME!!!")