class WordCloud
Found at: wordcloud.wordcloud

class WordCloud(object):
    """Word cloud object for generating and drawing.

    Parameters
    ----------
    font_path : string
        Font path to the font that will be used (OTF or TTF).
        Defaults to DroidSansMono path on a Linux machine. If you are on
        another OS or don't have this font, you need to adjust this path.

    width : int (default=400)
        Width of the canvas.

    height : int (default=200)
        Height of the canvas.

    prefer_horizontal : float (default=0.90)
        The ratio of times to try horizontal fitting as opposed to vertical.
        If prefer_horizontal < 1, the algorithm will try rotating the word
        if it doesn't fit. (There is currently no built-in way to get only
        vertical words.)

    mask : nd-array or None (default=None)
        If not None, gives a binary mask on where to draw words. If mask is
        not None, width and height will be ignored and the shape of mask
        will be used instead. All white (#FF or #FFFFFF) entries will be
        considered "masked out" while other entries will be free to draw on.
        [This changed in the most recent version!]

    scale : float (default=1)
        Scaling between computation and drawing. For large word-cloud
        images, using scale instead of larger canvas size is significantly
        faster, but might lead to a coarser fit for the words.

    min_font_size : int (default=4)
        Smallest font size to use. Will stop when there is no more room in
        this size.

    font_step : int (default=1)
        Step size for the font. font_step > 1 might speed up computation
        but give a worse fit.

    max_words : number (default=200)
        The maximum number of words.

    stopwords : set of strings or None
        The words that will be eliminated. If None, the built-in STOPWORDS
        list will be used.

    background_color : color value (default="black")
        Background color for the word cloud image.

    max_font_size : int or None (default=None)
        Maximum font size for the largest word. If None, height of the
        image is used.

    mode : string (default="RGB")
        Transparent background will be generated when mode is "RGBA" and
        background_color is None.

    relative_scaling : float (default=.5)
        Importance of relative word frequencies for font-size. With
        relative_scaling=0, only word-ranks are considered. With
        relative_scaling=1, a word that is twice as frequent will have
        twice the size. If you want to consider the word frequencies and
        not only their rank, relative_scaling around .5 often looks good.

        .. versionchanged:: 2.0
            Default is now 0.5.

    color_func : callable, default=None
        Callable with parameters word, font_size, position, orientation,
        font_path, random_state that returns a PIL color for each word.
        Overwrites "colormap".
        See colormap for specifying a matplotlib colormap instead.

    regexp : string or None (optional)
        Regular expression to split the input text into tokens in
        process_text. If None is specified, ``r"\w[\w']+"`` is used.

    collocations : bool, default=True
        Whether to include collocations (bigrams) of two words.

        .. versionadded:: 2.0

    colormap : string or matplotlib colormap, default="viridis"
        Matplotlib colormap to randomly draw colors from for each word.
        Ignored if "color_func" is specified.

        .. versionadded:: 2.0

    normalize_plurals : bool, default=True
        Whether to remove trailing 's' from words. If True and a word
        appears with and without a trailing 's', the one with trailing 's'
        is removed and its counts are added to the version without
        trailing 's' -- unless the word ends with 'ss'.
    """
    def __init__(self, font_path=None, width=400, height=200, margin=2,
                 ranks_only=None, prefer_horizontal=.9, mask=None, scale=1,
                 color_func=None, max_words=200, min_font_size=4,
                 stopwords=None, random_state=None, background_color='black',
                 max_font_size=None, font_step=1, mode="RGB",
                 relative_scaling=.5, regexp=None, collocations=True,
                 colormap=None, normalize_plurals=True):
        if font_path is None:
            font_path = FONT_PATH
        if color_func is None and colormap is None:
            # we need a color map
            import matplotlib
            version = matplotlib.__version__
            if version[0] < "2" and version[2] < "5":
                colormap = "hsv"
            else:
                colormap = "viridis"
        self.colormap = colormap
        self.collocations = collocations
        self.font_path = font_path
        self.width = width
        self.height = height
        self.margin = margin
        self.prefer_horizontal = prefer_horizontal
        self.mask = mask
        self.scale = scale
        self.color_func = color_func or colormap_color_func(colormap)
        self.max_words = max_words
        self.stopwords = stopwords if stopwords is not None else STOPWORDS
        self.min_font_size = min_font_size
        self.font_step = font_step
        self.regexp = regexp
        if isinstance(random_state, int):
            random_state = Random(random_state)
        self.random_state = random_state
        self.background_color = background_color
        self.max_font_size = max_font_size
        self.mode = mode
        if relative_scaling < 0 or relative_scaling > 1:
            raise ValueError("relative_scaling needs to be "
                             "between 0 and 1, got %f." % relative_scaling)
        self.relative_scaling = relative_scaling
        if ranks_only is not None:
            warnings.warn("ranks_only is deprecated and will be removed as"
                          " it had no effect. Look into relative_scaling.",
                          DeprecationWarning)
        self.normalize_plurals = normalize_plurals

    def fit_words(self, frequencies):
        """Create a word_cloud from words and frequencies.

        Alias to generate_from_frequencies.

        Parameters
        ----------
        frequencies : dict from string to float
            Contains words and associated frequencies.

        Returns
        -------
        self
        """
        return self.generate_from_frequencies(frequencies)

    def generate_from_frequencies(self, frequencies, max_font_size=None):
        """Create a word_cloud from words and frequencies.

        Parameters
        ----------
        frequencies : dict from string to float
            Contains words and associated frequencies.

        max_font_size : int
            Use this font-size instead of self.max_font_size

        Returns
        -------
        self
        """
        # make sure frequencies are sorted and normalized
        frequencies = sorted(frequencies.items(), key=itemgetter(1),
                             reverse=True)
        if len(frequencies) <= 0:
            raise ValueError("We need at least 1 word to plot a word cloud, "
                             "got %d." % len(frequencies))
        frequencies = frequencies[:self.max_words]

        # largest entry will be 1
        max_frequency = float(frequencies[0][1])

        frequencies = [(word, freq / max_frequency)
                       for word, freq in frequencies]

        if self.random_state is not None:
            random_state = self.random_state
        else:
            random_state = Random()

        if self.mask is not None:
            mask = self.mask
            width = mask.shape[1]
            height = mask.shape[0]
            if mask.dtype.kind == 'f':
                warnings.warn("mask image should be unsigned byte between 0"
                              " and 255. Got a float array")
            if mask.ndim == 2:
                boolean_mask = mask == 255
            elif mask.ndim == 3:
                # if all channels are white, mask out
                boolean_mask = np.all(mask[:, :, :3] == 255, axis=-1)
            else:
                raise ValueError("Got mask of invalid shape: %s"
                                 % str(mask.shape))
        else:
            boolean_mask = None
            height, width = self.height, self.width
        occupancy = IntegralOccupancyMap(height, width, boolean_mask)

        # create image
        img_grey = Image.new("L", (width, height))
        draw = ImageDraw.Draw(img_grey)
        img_array = np.asarray(img_grey)
        font_sizes, positions, orientations, colors = [], [], [], []

        last_freq = 1.

        if max_font_size is None:
            # if not provided use default font_size
            max_font_size = self.max_font_size

        if max_font_size is None:
            # figure out a good font size by trying to draw with
            # just the first two words
            if len(frequencies) == 1:
                # we only have one word. We make it big!
                font_size = self.height
            else:
                self.generate_from_frequencies(dict(frequencies[:2]),
                                               max_font_size=self.height)
                # find font sizes
                sizes = [x[1] for x in self.layout_]
                try:
                    font_size = int(2 * sizes[0] * sizes[1]
                                    / (sizes[0] + sizes[1]))
                # quick fix for if self.layout_ contains less than 2 values
                # on very small images it can be empty
                except IndexError:
                    try:
                        font_size = sizes[0]
                    except IndexError:
                        raise ValueError('canvas size is too small')
        else:
            font_size = max_font_size

        # we set self.words_ here because we called generate_from_frequencies
        # above... hurray for good design?
        self.words_ = dict(frequencies)

        # start drawing grey image
        for word, freq in frequencies:
            # select the font size
            rs = self.relative_scaling
            if rs != 0:
                font_size = int(round((rs * (freq / float(last_freq))
                                       + (1 - rs)) * font_size))
            if random_state.random() < self.prefer_horizontal:
                orientation = None
            else:
                orientation = Image.ROTATE_90
            tried_other_orientation = False
            while True:
                # try to find a position
                font = ImageFont.truetype(self.font_path, font_size)
                # transpose font optionally
                transposed_font = ImageFont.TransposedFont(
                    font, orientation=orientation)
                # get size of resulting text
                box_size = draw.textsize(word, font=transposed_font)
                # find possible places using integral image:
                result = occupancy.sample_position(box_size[1] + self.margin,
                                                   box_size[0] + self.margin,
                                                   random_state)
                if result is not None or font_size < self.min_font_size:
                    # either we found a place or font-size went too small
                    break
                # if we didn't find a place, make font smaller
                # but first try to rotate!
                if not tried_other_orientation and self.prefer_horizontal < 1:
                    orientation = (Image.ROTATE_90 if orientation is None
                                   else Image.ROTATE_90)
                    tried_other_orientation = True
                else:
                    font_size -= self.font_step
                    orientation = None

            if font_size < self.min_font_size:
                # we were unable to draw any more
                break

            x, y = np.array(result) + self.margin // 2
            # actually draw the text
            draw.text((y, x), word, fill="white", font=transposed_font)
            positions.append((x, y))
            orientations.append(orientation)
            font_sizes.append(font_size)
            colors.append(self.color_func(word, font_size=font_size,
                                          position=(x, y),
                                          orientation=orientation,
                                          random_state=random_state,
                                          font_path=self.font_path))
            # recompute integral image
            if self.mask is None:
                img_array = np.asarray(img_grey)
            else:
                img_array = np.asarray(img_grey) + boolean_mask
            # recompute bottom right
            # the order of the cumsum's is important for speed ?!
            occupancy.update(img_array, x, y)
            last_freq = freq

        self.layout_ = list(zip(frequencies, font_sizes, positions,
                                orientations, colors))
        return self

    def process_text(self, text):
        """Splits a long text into words, eliminates the stopwords.

        Parameters
        ----------
        text : string
            The text to be processed.

        Returns
        -------
        words : dict (string, int)
            Word tokens with associated frequency.

        .. versionchanged:: 1.2.2
            Changed return type from list of tuples to dict.

        Notes
        -----
        There are better ways to do word tokenization, but I don't want to
        include all those things.
        """
        stopwords = set([i.lower() for i in self.stopwords])

        flags = (re.UNICODE if sys.version < '3' and type(text) is unicode
                 else 0)
        regexp = self.regexp if self.regexp is not None else r"\w[\w']+"

        words = re.findall(regexp, text, flags)
        # remove stopwords
        words = [word for word in words if word.lower() not in stopwords]
        # remove 's
        words = [word[:-2] if word.lower().endswith("'s") else word
                 for word in words]
        # remove numbers
        words = [word for word in words if not word.isdigit()]

        if self.collocations:
            word_counts = unigrams_and_bigrams(words, self.normalize_plurals)
        else:
            word_counts, _ = process_tokens(words, self.normalize_plurals)

        return word_counts

    def generate_from_text(self, text):
        """Generate wordcloud from text.

        The input "text" is expected to be a natural text. If you pass a
        sorted list of words, words will appear in your output twice. To
        remove this duplication, set ``collocations=False``.

        Calls process_text and generate_from_frequencies.

        .. versionchanged:: 1.2.2
            Argument of generate_from_frequencies() is not return of
            process_text() any more.

        Returns
        -------
        self
        """
        words = self.process_text(text)
        self.generate_from_frequencies(words)
        return self

    def generate(self, text):
        """Generate wordcloud from text.

        The input "text" is expected to be a natural text. If you pass a
        sorted list of words, words will appear in your output twice. To
        remove this duplication, set ``collocations=False``.

        Alias to generate_from_text.

        Calls process_text and generate_from_frequencies.

        Returns
        -------
        self
        """
        return self.generate_from_text(text)

    def _check_generated(self):
        """Check if ``layout_`` was computed, otherwise raise error."""
        if not hasattr(self, "layout_"):
            raise ValueError("WordCloud has not been calculated, call generate"
                             " first.")

    def to_image(self):
        self._check_generated()
        if self.mask is not None:
            width = self.mask.shape[1]
            height = self.mask.shape[0]
        else:
            height, width = self.height, self.width

        img = Image.new(self.mode, (int(width * self.scale),
                                    int(height * self.scale)),
                        self.background_color)
        draw = ImageDraw.Draw(img)
        for (word, count), font_size, position, orientation, color in \
                self.layout_:
            font = ImageFont.truetype(self.font_path,
                                      int(font_size * self.scale))
            transposed_font = ImageFont.TransposedFont(
                font, orientation=orientation)
            pos = (int(position[1] * self.scale),
                   int(position[0] * self.scale))
            draw.text(pos, word, fill=color, font=transposed_font)
        return img

    def recolor(self, random_state=None, color_func=None, colormap=None):
        """Recolor existing layout.

        Applying a new coloring is much faster than generating the whole
        wordcloud.

        Parameters
        ----------
        random_state : RandomState, int, or None, default=None
            If not None, a fixed random state is used. If an int is given,
            this is used as seed for a random.Random state.

        color_func : function or None, default=None
            Function to generate new color from word count, font size,
            position and orientation. If None, self.color_func is used.

        colormap : string or matplotlib colormap, default=None
            Use this colormap to generate new colors. Ignored if color_func
            is specified. If None, self.color_func (or self.color_map) is
            used.

        Returns
        -------
        self
        """
        if isinstance(random_state, int):
            random_state = Random(random_state)
        self._check_generated()

        if color_func is None:
            if colormap is None:
                color_func = self.color_func
            else:
                color_func = colormap_color_func(colormap)
        self.layout_ = [(word_freq, font_size, position, orientation,
                         color_func(word=word_freq[0], font_size=font_size,
                                    position=position,
                                    orientation=orientation,
                                    random_state=random_state,
                                    font_path=self.font_path))
                        for word_freq, font_size, position, orientation, _
                        in self.layout_]
        return self

    def to_file(self, filename):
        """Export to image file.

        Parameters
        ----------
        filename : string
            Location to write to.

        Returns
        -------
        self
        """
        img = self.to_image()
        img.save(filename, optimize=True)
        return self

    def to_array(self):
        """Convert to numpy array.

        Returns
        -------
        image : nd-array size (width, height, 3)
            Word cloud image as numpy matrix.
        """
        return np.array(self.to_image())

    def __array__(self):
        """Convert to numpy array.

        Returns
        -------
        image : nd-array size (width, height, 3)
            Word cloud image as numpy matrix.
        """
        return self.to_array()

    def to_html(self):
        raise NotImplementedError("FIXME!!!")
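
The effect of relative_scaling on font sizes can be hard to picture from the listing alone. The following is a minimal standalone sketch of just the size-update rule used inside the drawing loop of generate_from_frequencies. The function name scaled_font_sizes and its start_size argument are hypothetical and not part of the wordcloud library, and the sketch assumes every word is placed on the first attempt, so the extra shrinking by font_step that the real loop applies when a word does not fit is left out.

```python
def scaled_font_sizes(frequencies, start_size, relative_scaling=0.5):
    """Return the font size requested for each word, largest frequency first.

    Hypothetical helper for illustration only. It mirrors the update
    font_size = int(round((rs * (freq / last_freq) + (1 - rs)) * font_size))
    from the listing and ignores placement failures.
    """
    # Sort by frequency (descending) and normalize so the top word is 1.0,
    # as generate_from_frequencies does before drawing.
    items = sorted(frequencies.items(), key=lambda kv: kv[1], reverse=True)
    max_frequency = float(items[0][1])
    items = [(word, freq / max_frequency) for word, freq in items]

    sizes = {}
    font_size = start_size
    last_freq = 1.0
    for word, freq in items:
        rs = relative_scaling
        if rs != 0:
            # Blend the frequency ratio to the previous word with "no change".
            font_size = int(round((rs * (freq / last_freq) + (1 - rs))
                                  * font_size))
        sizes[word] = font_size
        last_freq = freq
    return sizes


if __name__ == "__main__":
    counts = {"python": 4, "cloud": 2, "word": 1}
    for rs in (0, 0.5, 1):
        print(rs, scaled_font_sizes(counts, start_size=100,
                                    relative_scaling=rs))
```

Under these assumptions, relative_scaling=1 halves the size whenever the frequency halves, while relative_scaling=0 leaves the requested size unchanged, so in the real class only failed placements (and therefore rank order) shrink later words.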