Data Analysis with Python: Ted Talks Data Analysis


Ted Talks

Environment: Python 2.7, Anaconda, Jupyter Notebook

Dataset: https://www.kaggle.com/rounakbanik/ted-talks

Import the required libraries

%matplotlib inline

import pandas as pd

import numpy as np

from scipy import stats

import matplotlib.pyplot as plt

import seaborn as sns # seaborn < 0.8 replaces matplotlib's default style on import; newer versions need sns.set()

import json

from pandas.io.json import json_normalize

from wordcloud import WordCloud, STOPWORDS # word cloud

df = pd.read_csv('ted_main.csv')

df.columns # column names, taken from the header row of the dataset

Index([u'comments', u'description', u'duration', u'event', u'film_date',

u'languages', u'main_speaker', u'name', u'num_speaker',

u'published_date', u'ratings', u'related_talks', u'speaker_occupation',

u'tags', u'title', u'url', u'views'],

dtype='object')

# Reorder the columns

df = df[['name', 'title', 'description', 'main_speaker', 'speaker_occupation', 'num_speaker', 'duration', 'event', 'film_date', 'published_date', 'comments', 'tags', 'languages', 'ratings', 'related_talks', 'url', 'views']]
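The reordering above relies on the fact that indexing a DataFrame with a list of labels returns the columns in the listed order. A minimal sketch on a made-up frame (the `toy` data and the `missing` check are illustrative assumptions, not part of the original notebook) shows the idea, along with a quick guard against accidentally dropping a column when the list is typed by hand:

```python
import pandas as pd

# Hypothetical three-column frame standing in for ted_main.csv
toy = pd.DataFrame({
    'views': [100, 200],
    'name': ['Speaker A: Talk A', 'Speaker B: Talk B'],
    'title': ['Talk A', 'Talk B'],
})

# Selecting with a list of labels returns the columns in that order
wanted = ['name', 'title', 'views']
reordered = toy[wanted]

# Columns present in the frame but absent from the hand-written list
missing = set(toy.columns) - set(wanted)

print(list(reordered.columns))
print(missing)
```

If `missing` is not empty, the selection would silently discard those columns, so it is worth checking before overwriting `df`.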

Features Available

name: The official name of the TED Talk. Includes the title and the speaker.

title: The title of the talk.

description: A blurb of what the talk is about.

main_speaker: The first named speaker of the talk.

speaker_occupation: The occupation of the main speaker.

num_speaker: The number of speakers in the talk.

duration: The duration of the talk in seconds.

event: The TED/TEDx event where the talk took place.

film_date: The Unix timestamp of the filming.

published_date: The Unix timestamp for the publication of the talk on TED.com.

comments: The number of first level
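
Since film_date and published_date are stored as Unix timestamps and duration is in seconds, they usually need converting before analysis. The following is a rough sketch of one way to do that with pandas; the `sample` rows are invented for illustration, and the `duration_min` and `film_year` columns are hypothetical names, not columns of the dataset:

```python
import pandas as pd

# Made-up rows; the column names follow the feature list above
sample = pd.DataFrame({
    'film_date': [1140825600, 1372118400],
    'published_date': [1151367060, 1372345200],
    'duration': [1164, 977],
})

# unit='s' tells pandas the integers are seconds since 1970-01-01 UTC
for col in ['film_date', 'published_date']:
    sample[col] = pd.to_datetime(sample[col], unit='s')

# Derived helper columns (hypothetical names, for illustration only)
sample['duration_min'] = sample['duration'] / 60.0
sample['film_year'] = sample['film_date'].dt.year

print(sample[['film_date', 'film_year', 'duration_min']])
```

The same two `pd.to_datetime` calls should apply to the full `df` loaded from ted_main.csv, assuming both columns contain whole seconds as described.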
