update the database content

More than 20,000 records have already been scraped, only to find that the format is wrong. What can be done?

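The scraped result for one recipe looks like this (the titles are cooking technique, flavor, cuisine, health effects, main ingredients, secondary ingredients and seasonings):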

[{"title": "工艺:", "content": ["油爆"]}, {"title": "口味:", "content": ["咸鲜味"]}, {"title": "菜系:", "content": ["福建菜"]}, {"title": "功效:", "content": ["福建菜", "通乳调理", "气血双补调理", "营养不良调理"]}, {"title": "主料:", "content": ["河虾250克"]}, {"title": "辅料:", "content": ["竹笋35克", "香菇(鲜)10克", "青椒20克", "红萝卜25克"]}, {"title": "调料:", "content": ["大葱10克 鸡蛋清10克 大蒜5克 淀粉(豌豆)12克 白砂糖5克 盐3克 味精1克 料酒3克 胡椒1克 植物油75克 各适量"]}]

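Viewed in an online JSON parser, the record looks like this:
[Figure: parsed tree view of the JSON record (image not available)]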

Looking closely, we find entries like this one:

{"title": "功效:", "content": ["福建菜", "通乳调理", "气血双补调理", "营养不良调理"]}

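The values of content are still wrapped in a list, and we need to pull them out into a plain string. There are two ways to fix this: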
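A minimal sketch of the flattening step on its own, assuming each stored value is a JSON array of {"title", "content"} objects like the record above. The function name flatten_content is hypothetical; it joins each content list with single spaces, the same rule the script below applies, and skips entries that are already strings so it is safe to run twice.

```python
import json


def flatten_content(raw: str) -> str:
    """Turn each entry's "content" list into one space-separated string.

    Entries whose content is already a string are left unchanged,
    so applying the function twice gives the same result.
    """
    entries = json.loads(raw)
    for entry in entries:
        content = entry["content"]
        if isinstance(content, list):
            entry["content"] = " ".join(content)
    # ensure_ascii=False keeps the Chinese characters readable in the stored JSON
    return json.dumps(entries, ensure_ascii=False)


record = '[{"title": "功效:", "content": ["福建菜", "通乳调理", "气血双补调理", "营养不良调理"]}]'
print(flatten_content(record))
# [{"title": "功效:", "content": "福建菜 通乳调理 气血双补调理 营养不良调理"}]
```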

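(1) Clean up the data in the crawler code itself. When the dataset is small, you can change the code and run the crawl again, but with this much data a re-run is not practical.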
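For future crawls, the cleanup can happen before anything reaches the database. A minimal sketch as a Scrapy item pipeline, assuming the spider yields items with a gongyi field holding the list of {"title", "content"} dicts. The class name FlattenContentPipeline and that item layout are assumptions, not taken from the original crawler.

```python
class FlattenContentPipeline:
    """Hypothetical Scrapy item pipeline: flattens each content list before storage."""

    def process_item(self, item, spider):
        # assumed layout: item["gongyi"] is a list of {"title": ..., "content": [...]}
        entries = item.get("gongyi") or []
        for entry in entries:
            content = entry.get("content")
            if isinstance(content, list):
                entry["content"] = " ".join(content)
        return item  # pass the cleaned item on to the next pipeline


# Usage without a running crawler (plain dict items also work in Scrapy)
item = {"gongyi": [{"title": "主料:", "content": ["河虾250克"]}]}
FlattenContentPipeline().process_item(item, spider=None)
print(item["gongyi"])  # [{'title': '主料:', 'content': '河虾250克'}]
```

To activate it, the class would be registered in the project's settings.py, e.g. ITEM_PIPELINES = {"myproject.pipelines.FlattenContentPipeline": 300}, where myproject is a placeholder for the real project name.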

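(2) Tidy up the data in a separate processing step (a pipeline), a technique that is common in data cleaning. The approach is to read every row back from the database, turn each content list into a single string, and write the row back: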

# The following script can be run from any directory, as long as the environment is set up correctly
import json

import pymysql.cursors

# Connect to the database
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='123456',
                             db='baikemy.com',
                             charset='utf8mb4',  # the data contains Chinese text
                             cursorclass=pymysql.cursors.DictCursor,
                             autocommit=True)
try:
    with connection.cursor() as cursor:
        sql = "SELECT `id`, `gongyi` FROM `total_copy1`"
        cursor.execute(sql)          # run the query
        result = cursor.fetchall()   # fetch all rows at once
        for r in result:
            if not r['gongyi']:      # skip NULL or empty values
                continue
            g = json.loads(r['gongyi'])  # parse the JSON string into a list of dicts
            # each element g2 is a dict, so its fields are read as g2['title'] and g2['content']
            d = []
            for g2 in g:
                a = g2['title']
                b = g2['content']    # normally a list of strings
                # join the list into one space-separated string;
                # leave it unchanged if it is already a string (e.g. when re-running the script)
                strx = ' '.join(b) if isinstance(b, list) else b
                d.append({           # new gongyi object
                    "title": a,
                    "content": strx
                })
            # serialize back to valid JSON; ensure_ascii=False keeps the Chinese text readable
            d2 = json.dumps(d, ensure_ascii=False)
            print(d2)
            sql = "UPDATE `total_copy1` SET `gongyi`=%s WHERE `id`=%s"  # update the gongyi column
            cursor.execute(sql, (d2, r['id']))  # update the row by its id
            print(r['id'])
finally:
    connection.close()
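With 20,000+ rows, committing one UPDATE at a time (autocommit=True) works but is slow, and a failure halfway leaves the table partly converted. A sketch of a batched alternative: build all updates first, send them with executemany inside a single transaction, and roll back on error. To keep it runnable without a MySQL server, it uses Python's built-in sqlite3 with an in-memory table as a stand-in. With PyMySQL the idea would be the same, but with autocommit=False, %s placeholders instead of ?, and explicit connection.commit() / connection.rollback().

```python
import json
import sqlite3


def flatten(raw):
    """Join every "content" list in the stored JSON into one space-separated string."""
    entries = json.loads(raw)
    for entry in entries:
        if isinstance(entry["content"], list):
            entry["content"] = " ".join(entry["content"])
    return json.dumps(entries, ensure_ascii=False)


def clean_table(conn):
    """Rewrite every non-empty gongyi value in one transaction; return the row count."""
    rows = conn.execute("SELECT id, gongyi FROM total_copy1").fetchall()
    updates = [(flatten(raw), row_id) for row_id, raw in rows if raw]
    # "with conn" commits if all updates succeed and rolls back on any exception
    with conn:
        conn.executemany("UPDATE total_copy1 SET gongyi = ? WHERE id = ?", updates)
    return len(updates)


# In-memory stand-in for the MySQL table (sample data only)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE total_copy1 (id INTEGER PRIMARY KEY, gongyi TEXT)")
conn.execute(
    "INSERT INTO total_copy1 (id, gongyi) VALUES (?, ?)",
    (1, '[{"title": "口味:", "content": ["咸鲜味"]}]'),
)
conn.commit()
print(clean_table(conn))  # 1
print(conn.execute("SELECT gongyi FROM total_copy1").fetchone()[0])
# [{"title": "口味:", "content": "咸鲜味"}]
```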