How to Get AI to Do a Data Analyst's Work: Task 1

Latest recommended article published on 2023-11-14 00:45:00

远洋之帆

Link to this article: https://blog.csdn.net/liangwqi/article/details/129311521

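Background:

OpenAI has opened up its API, and a few days ago Peking University released a tool called ChatExcel. The two events are unrelated, but engineers always find something to keep themselves busy. In a tech chat group I bragged that, with the OpenAI API open, I could build a ChatExcel of my own, and even a better one than theirs.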

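1. It must take requests in natural language

2. It must understand the user's request precisely

3. It must return accurate analysis results

4. It must present a visual report

5. Ideally, it can present the result as a PPT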

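And so began another round of filling in a hole I had dug myself. To get a proof of concept (POC) at the lowest possible cost, I used the OpenAI API together with the ChatGPT web interface. In a real product all of this would be wrapped up and driven through the OpenAI API alone; the user would see only one input box for the request and one place to upload a CSV table. Here I am only probing the upper and lower bounds of the product, so please allow me this harmless bit of rule-bending.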

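The approach is as follows:

1. After the user uploads a table, parse its header to extract meta information, to be used later when analysing the user's request

2. Feed a structured description to the OpenAI API so that it generates the code for the automated data analysis (in a product, OpenAI itself could turn the user's loosely worded request into this structured input)

3. Extract the generated Python code and save it as a .py file

4. Run the Python script with Python's os package, rendering the visualization as HTML so that it is easy to click and view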
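Steps 1 and 2 can be sketched roughly as follows. This is a minimal illustration under my own assumptions, not the code used in this POC: the helper names `extract_meta` and `build_prompt`, the crude type inference, and the prompt wording are all hypothetical, and the sketch stops before actually calling the OpenAI API.

```python
import csv
import io


def _infer_type(values):
    """Classify sampled cell values as 'integer', 'float' or 'text'."""
    if not values:
        return "text"
    for caster, label in ((int, "integer"), (float, "float")):
        try:
            for value in values:
                caster(value)
            return label
        except ValueError:
            continue
    return "text"


def extract_meta(csv_text, sample_rows=20):
    """Return one dict per column: the header name and a roughly inferred type."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, samples = rows[0], rows[1:1 + sample_rows]
    meta = []
    for idx, name in enumerate(header):
        values = [row[idx] for row in samples if idx < len(row) and row[idx] != ""]
        meta.append({"name": name, "type": _infer_type(values)})
    return meta


def build_prompt(meta, task, csv_path):
    """Combine column metadata and the user's request into a structured prompt."""
    columns = "\n".join(f"- {col['name']} ({col['type']})" for col in meta)
    return (
        f"The CSV file '{csv_path}' has these columns:\n{columns}\n\n"
        f"Task: {task}\n"
        "Write a complete Python script that performs this analysis."
    )


if __name__ == "__main__":
    sample = (
        "Product Name,Product Review,Number of Exposures,Click Count\n"
        "Product A,This is a great Product A!,57,12\n"
    )
    task = "Count word frequencies in 'Product Review' and draw a word cloud."
    print(build_prompt(extract_meta(sample), task, "product_reviews.csv"))
```

Sending only the column names and types, rather than the full table, keeps the prompt short; whether that is enough context for the model to write correct code is something the POC would have to confirm.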

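Here is how it worked in practice:

Calling the API with a natural-language description returns a reply that contains code, but the code comes back jumbled together with the rest of the output, so it has to be parsed out.

For the code extraction I took a shortcut and had ChatGPT pull the code out of the result.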
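The extraction could also be done locally instead of by ChatGPT. The sketch below assumes the reply wraps its code in Markdown-style triple-backtick fences, which is my assumption about the reply format rather than something verified here; `extract_code` and `save_and_run` are hypothetical helper names. It also covers steps 3 and 4, swapping in `subprocess.run` where the approach above mentions the os package.

```python
import re
import subprocess
import sys
from pathlib import Path

FENCE = "`" * 3  # the three-backtick Markdown fence marker
CODE_BLOCK = re.compile(FENCE + r"[\w+-]*[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(reply):
    """Return the code from all fenced blocks in a reply, or the whole reply if none."""
    blocks = CODE_BLOCK.findall(reply)
    if not blocks:
        return reply.strip() + "\n"
    return "\n".join(block.strip("\n") for block in blocks) + "\n"


def save_and_run(code, script_path, timeout=60):
    """Write the code to script_path and run it with the current interpreter."""
    path = Path(script_path)
    path.write_text(code, encoding="utf-8")
    return subprocess.run(
        [sys.executable, str(path)],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


if __name__ == "__main__":
    reply = "Here is the script:\n" + FENCE + "python\nprint('hello')\n" + FENCE
    result = save_and_run(extract_code(reply), "generated_analysis.py")
    print(result.returncode, result.stdout, result.stderr)
```

Capturing stdout and stderr makes it possible to feed an error message back to the model for another attempt, and the timeout keeps a generated script that hangs from blocking the pipeline.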

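The code extracted from the OpenAI output is shown below. Because of package version issues it ran into version conflicts; to validate the idea quickly I gave up on resolving them and had ChatGPT regenerate the code for the task.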

import pandas as pd
import jieba
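# NOTE: this top-level import is the pyecharts 0.5.x style; pyecharts 1.x and later use `from pyecharts.charts import WordCloud`
from pyecharts import WordCloud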

# read in the data from the CSV file
data = pd.read_csv('product_reviews.csv')

# split product reviews into individual words
reviews = data['Product Review']
word_freq = {}
for review in reviews:
    words = jieba.cut(review)
    for word in words:
        if word in word_freq:
            word_freq[word] += 1
        else:
            word_freq[word] = 1

# sort the words by frequency
sorted_word_freq = sorted(word_freq.items(), key=lambda x: x[1], reverse=True)

# print the top 10 most frequent words
print('Top 10 most frequent words:')
for word, freq in sorted_word_freq[:10]:
    print(f'{word}: {freq}')

# create a word cloud of the top 50 most frequent words
wordcloud = WordCloud(width=800, height=620)
wordcloud.add("", sorted_word_freq[:50], word_size_range=[20, 100])
wordcloud.render('wordcloud.html')

ChatGPT parses the task and generates code

The code above has a few small bugs.

So in the end I tested generating the code directly with ChatGPT:

import pandas as pd
import jieba
from wordcloud import WordCloud
import matplotlib.pyplot as plt
from collections import Counter

# read the data from the table
df = pd.read_csv('product_reviews.csv')

# create a list of stop words
stop_words = ['的', '了', '是', '我', '你', '他', '她', '我们', '你们', '他们']

# tokenize the product reviews and count the frequency of each word
words_list = []
for review in df['Product Review']:
    words = jieba.lcut(review)
    words_list.extend(words)
words_freq = Counter(words_list)

# remove stop words from the word frequency dictionary
for stop_word in stop_words:
    words_freq.pop(stop_word, None)

# sort the word frequency dictionary by descending order of frequency
sorted_words_freq = sorted(words_freq.items(), key=lambda x: x[1], reverse=True)

# print the top 10 most frequent words
print('Top 10 most frequent words in product reviews:')
for word, freq in sorted_words_freq[:10]:
    print(f'{word}: {freq}')

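# create a word cloud using the top 50 most frequent words
# generate_from_frequencies expects a dict, so convert the (word, count) pairs;
# for Chinese text, also pass font_path pointing to a font with CJK glyphs
wordcloud = WordCloud(background_color='white', width=800, height=400).generate_from_frequencies(dict(words_freq.most_common(50)))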

# plot the word cloud
plt.figure(figsize=(12, 6))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()

# save the word cloud as an image (to_file writes image formats such as PNG, not HTML)
wordcloud.to_file('wordcloud.png')

It even listed which packages need to be installed.
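Before running a generated script, it can help to check whether the packages it imports are actually present. The following is a small sketch of such a check; the `REQUIRED` list is simply read off the imports in the ChatGPT script above, and `missing_packages` is a hypothetical helper name.

```python
import importlib.util

# Import names used by the generated script; pip package names can differ in general.
REQUIRED = ["pandas", "jieba", "wordcloud", "matplotlib"]


def missing_packages(names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Install first: pip install " + " ".join(missing))
    else:
        print("All required packages are available.")
```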

I then had ChatGPT generate some data for testing:

import csv
import random

# list of product names
product_names = ['Product A', 'Product B', 'Product C', 'Product D', 'Product E']

# generate the product review data
product_reviews = []
for i in range(1000):
    # pick a product name at random
    product_name = random.choice(product_names)
    # build a review (a fixed template filled with the product name)
    product_review = f"This is a great {product_name}!"
    # randomly generate the exposure count and the click count (clicks never exceed exposures)
    num_exposures = random.randint(1, 100)
    click_count = random.randint(0, num_exposures)
    # append the row to the review list
    product_reviews.append([product_name, product_review, num_exposures, click_count])

# write the product review data to a CSV file
with open('product_reviews.csv', mode='w', newline='', encoding='utf-8') as csv_file:
    writer = csv.writer(csv_file)
    # write the header row
    writer.writerow(['Product Name', 'Product Review', 'Number of Exposures', 'Click Count'])
    # write the data rows
    writer.writerows(product_reviews)