Second-Level Classification of Product Attributes in Python, with Multi-Level Nested Dict Output

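The title is a bit long and probably doesn't explain things well, but the idea is roughly this. When you browse a site such as Tmall, you see categories like "women's clothing", "men's shoes", and "mobile phones". Click into one and you find the brands under it: women's clothing contains brands such as "snidle" and "伊芙丽", and men's shoes contains "nike" and "adidas". If a user searches for nike, the matching label should include "men's shoes"; put simply, a hint such as "search for nike in men's shoes" pops up below the search box. This post is about predicting, for a brand typed into the search box, the probability of each first-level category.
Unfortunately I don't have any Tmall data, only my company's data, which of course can't be leaked, and making up data is tedious. So I won't cover how to compute these probabilities with a machine learning algorithm. It isn't hard, though, and I'll write it up once I have time to write a crawler and collect the data.
In any case, the finished predictions should look like this:
Figure: predicted values for the second-level classification of Tmall products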

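How do you read this table? The first row holds the first-level categories and the first column holds the second-level categories (the brands). Take the third row as an example: the brand "scofield" is classified as "女装/内衣" (women's clothing / underwear) with probability 0.87473829, as "女鞋/男鞋/箱包" (women's shoes / men's shoes / bags) with probability 0.03394293, and as "化妆品/个人护理" (cosmetics / personal care) with probability 0.21392374. So if you search for "scofield" in the Tmall search box, the hint most likely to pop up is "search for scofield in 女装/内衣".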
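Reading the table this way amounts to picking, for each brand, the first-level category with the largest predicted value. A minimal sketch, using the three scofield values quoted above; the function name `suggest` and the English wording of the hint are illustrative assumptions, not part of the original script:

```python
# -*- coding: utf-8 -*-
# Predicted values for one brand, copied from the "scofield" row described above.
scofield = {
    u"女装/内衣": 0.87473829,
    u"女鞋/男鞋/箱包": 0.03394293,
    u"化妆品/个人护理": 0.21392374,
}


def suggest(brand, probs):
    """Return a search-box hint for the most probable first-level category."""
    # max() with key=probs.get picks the key whose value is largest.
    best = max(probs, key=probs.get)
    return u"Search for %s in %s" % (brand, best)


hint = suggest(u"scofield", scofield)
print(hint)
```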
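The table has a drawback, though: it contains too many zeros and is not sorted, so it looks messy. We therefore use Python dicts to drop the zeros and sort it.
Enough talk, here is the code: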

#coding:utf-8
import numpy as np
import pandas as pd
from odo import odo
from odo import convert
import json
from operator import itemgetter
import collections
from collections import OrderedDict
import sys  

reload(sys)  
sys.setdefaultencoding('utf8')

# Load the dataset (first row: first-level category names; first column: brands)
result = pd.read_table('tmalltest.txt',header =None)
listall = odo(result,list)
result1 = pd.read_table('tmalltest.txt')
result2 = result1.drop('class',axis = 1)
listvalue = odo(result2,list)
count = len(range(result.shape[1]))
id = result.iloc[0,1:16]
listvalue
out = [result.iloc[y,0] for y in range(1,result.shape[0])] 
outid  =tuple(out)
d = [dict(zip(id,tuple(listvalue[i]))) for i in range(0,len(listvalue))]
# Invert each dict, swapping keys and values
func = lambda b:dict([(x,y) for y,x in b.items()])
dd = [func(d[i]) for i in range(len(d)) ]
# Drop the pair whose key is 0; the None default avoids a KeyError when a row has no zero
delete = [dd[i].pop(0.0, None) for i in range(len(d))]
# Invert the dicts back
ddvalue = [func(dd[i]) for i in range(len(d))]
# Combine the two lists into one dict
dictall = dict(zip(out,ddvalue))

# Make the console show Chinese characters instead of \u escapes
print json.dumps(dictall).decode("unicode-escape")

# Sort each brand's (category, value) pairs by value, descending, and collect them in a new list
lista = []
for k in dictall.keys():
    sorted_d =sorted(dictall[k].iteritems(),key = itemgetter(1),reverse = True)
    print sorted_d
    lista.append(sorted_d)
# Keep only the three categories with the highest predicted values
listb = [lista[i][0:3] for i in range(len(lista))]
listc = [json.dumps(listb[i]).decode("unicode-escape") for i in range(len(listb))]

# Keep each brand's sorted categories in order with an OrderedDict
dictorder = [OrderedDict(lista[i]) for i in range(0,len(lista))] 
print json.dumps(dictorder).decode("unicode-escape")

# Recombine the sorted lists into a dict
dictall_sort= dict(zip(dictall.keys(),listc))


# A helper that renders a nested dict more readably
def pretty_dict(obj, indent=' '):
    def _pretty(obj, indent):
        for i, tup in enumerate(obj.items()):
            k, v = tup
            # Wrap strings in double quotes
            if isinstance(k, basestring): k = '"%s"'% k
            if isinstance(v, basestring): v = '"%s"'% v
            # Recurse into nested dicts
            if isinstance(v, dict):
                v = ''.join(_pretty(v, indent + ' '* len(str(k) + ': {')))  # compute the indent of the next level
            # Choose what to emit based on the position of the (k, v) pair
            if i == 0:  # first pair: open the brace
                if len(obj) == 1:
                    yield '{%s: %s}'% (k, v)
                else:
                    yield '{%s: %s,\n'% (k, v)
            elif i == len(obj) - 1:  # last pair: close the brace
                yield '%s%s: %s}'% (indent, k, v)
            else:  # middle pairs
                yield '%s%s: %s,\n'% (indent, k, v)
    # Return the string; printing here would make `print pretty_dict(...)` also print None
    return ''.join(_pretty(obj, indent))

# Print the original, unsorted dict, prettified
print pretty_dict(dictall)
# Print the sorted dict, before prettifying
print json.dumps(dictall_sort).decode("unicode-escape")
# Print the sorted dict, prettified
print pretty_dict(dictall_sort)
Output:
# The original, unsorted dict, prettified
{"太平鸟": {"男装/户外运动/": 0.847823719,
               "家纺/家饰/鲜花": "0",
               "化妆品/个人护理": 0.11242904,
               "腕表/珠宝饰品/眼镜": 0.05923729},
 "博士伦": {"家纺/家饰/鲜花": "0",
               "化妆品/个人护理": 0.11323213,
               "医药保健": 0.89348974},
 "a02": {"家纺/家饰/鲜花": "0",
         "女装/内衣": 0.984447322,
         "女鞋/男鞋/箱包": 0.12493492},
 "周黑鸭": {"零食/进口食品/茶酒": 0.87323123,
               "家纺/家饰/鲜花": "0",
               "厨具/收纳/宠物": 0.12432232},
 "3M": {"家纺/家饰/鲜花": "0",
        "厨具/收纳/宠物": 0.32344534,
        "家居建材": 0.68213814},
 "博士": {"家纺/家饰/鲜花": "0"},
 "sk-II": {"家纺/家饰/鲜花": "0",
           "化妆品/个人护理": 0.98843487,
           "腕表/珠宝饰品/眼镜": 0.02324442},
 "洗洁精": {"图书音像": 0.02124194,
               "家纺/家饰/鲜花": "0"},
 "finity": {"家纺/家饰/鲜花": "0",
            "女装/内衣": 0.93392424,
            "女鞋/男鞋/箱包": 0.07323483},
 "selected": {"男装/户外运动/": 0.934439842,
              "家纺/家饰/鲜花": "0",
              "女鞋/男鞋/箱包": 0.07438472},
 "scofield": {"家纺/家饰/鲜花": "0",
              "化妆品/个人护理": 0.21392374,
              "女鞋/男鞋/箱包": 0.03394293,
              "女装/内衣": 0.87473829},
 "米其林": {"家纺/家饰/鲜花": "0.02432412",
               "汽车/配件/用品": 0.98233342},
 "好奇": {"零食/进口食品/茶酒": 0.11321412,
            "母婴玩具": 0.89472934,
            "家纺/家饰/鲜花": "0"},
 "佐卡伊": {"家纺/家饰/鲜花": "0",
               "化妆品/个人护理": 0.13232944,
               "腕表/珠宝饰品/眼镜": 0.87342324},
 "波司登": {"母婴玩具": 0.02134243,
               "家纺/家饰/鲜花": "0",
               "女装/内衣": 0.78765673,
               "化妆品/个人护理": 0.20183924},
 "breadbutter": {"零食/进口食品/茶酒": 0.29434974,
                 "家纺/家饰/鲜花": "0",
                 "女鞋/男鞋/箱包": 0.03329473,
                 "女装/内衣": 0.684728232},
 "北极绒": {"家纺/家饰/鲜花": "0.84932498",
               "大家电/生活电器": 0.05213923,
               "家居建材": 0.11321321},
 "Adidas": {"男装/户外运动/": 0.829743434,
            "家纺/家饰/鲜花": "0",
            "女鞋/男鞋/箱包": 0.14974892,
            "手机/数码/电脑办公": 0.04232553},
 "当当网": {"图书音像": 0.78947234,
               "家纺/家饰/鲜花": "0"},
 "snidle": {"家纺/家饰/鲜花": "0",
            "女装/内衣": 0.83927289,
            "女鞋/男鞋/箱包": 0.15237234,
            "腕表/珠宝饰品/眼镜": 0.02432324},
 "TISSOT": {"家纺/家饰/鲜花": "0",
            "大家电/生活电器": 0.13942309,
            "腕表/珠宝饰品/眼镜": 0.87545234},
 "曼妮芬": {"家纺/家饰/鲜花": "0",
               "化妆品/个人护理": 0.07239742,
               "女装/内衣": 0.93837427},
 "New Balance": {"母婴玩具": 0.43237442,
                 "家纺/家饰/鲜花": "0",
                 "女鞋/男鞋/箱包": 0.57823432},
 "Jackjones": {"男装/户外运动/": 0.883293743,
               "家纺/家饰/鲜花": "0",
               "女鞋/男鞋/箱包": 0.10343298,
               "手机/数码/电脑办公": 0.02234927},
 "ZARA": {"女鞋/男鞋/箱包": 0.12429483,
          "家纺/家饰/鲜花": "0",
          "女装/内衣": 0.78283128,
          "腕表/珠宝饰品/眼镜": 0.10213943},
 "海尔": {"家纺/家饰/鲜花": "0。1323243",
            "厨具/收纳/宠物": 0.09354832,
            "大家电/生活电器": 0.79103821},
 "nike": {"男装/户外运动/": 0.891232313,
          "家纺/家饰/鲜花": "0",
          "化妆品/个人护理": 0.06163211,
          "手机/数码/电脑办公": 0.04293713},
 "双立人": {"家纺/家饰/鲜花": "0",
               "厨具/收纳/宠物": 0.98943242,
               "医药保健": 0.01943242},
 "苹果": {"手机/数码/电脑办公": 0.89232342,
            "家纺/家饰/鲜花": "0",
            "汽车/配件/用品": 0.05293713,
            "腕表/珠宝饰品/眼镜": 0.05230971},
 "兰芝": {"家纺/家饰/鲜花": "0",
            "女装/内衣": 0.09238374,
            "化妆品/个人护理": 0.78423234,
            "腕表/珠宝饰品/眼镜": 0.13213232}}

# The sorted dict, before prettifying
{"太平鸟": "[["家纺/家饰/鲜花", "0"], ["男装/户外运动/", 0.8478237190000001], ["化妆品/个人护理", 0.11242904]]", "博士伦": "[["家纺/家饰/鲜花", "0"], ["医药保健", 0.89348974], ["化妆品/个人护理", 0.11323213]]", "a02": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.9844473220000001], ["女鞋/男鞋/箱包", 0.12493492]]", "周黑鸭": "[["家纺/家饰/鲜花", "0"], ["零食/进口食品/茶酒", 0.87323123], ["厨具/收纳/宠物", 0.12432232]]", "3M": "[["家纺/家饰/鲜花", "0"], ["家居建材", 0.68213814], ["厨具/收纳/宠物", 0.32344534]]", "博士": "[["家纺/家饰/鲜花", "0"]]", "sk-II": "[["家纺/家饰/鲜花", "0"], ["化妆品/个人护理", 0.98843487], ["腕表/珠宝饰品/眼镜", 0.02324442]]", "洗洁精": "[["家纺/家饰/鲜花", "0"], ["图书音像", 0.02124194]]", "finity": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.93392424], ["女鞋/男鞋/箱包", 0.07323483]]", "selected": "[["家纺/家饰/鲜花", "0"], ["男装/户外运动/", 0.9344398420000001], ["女鞋/男鞋/箱包", 0.07438472]]", "scofield": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.87473829], ["化妆品/个人护理", 0.21392374]]", "米其林": "[["家纺/家饰/鲜花", "0.02432412"], ["汽车/配件/用品", 0.98233342]]", "好奇": "[["家纺/家饰/鲜花", "0"], ["母婴玩具", 0.89472934], ["零食/进口食品/茶酒", 0.11321412]]", "佐卡伊": "[["家纺/家饰/鲜花", "0"], ["腕表/珠宝饰品/眼镜", 0.87342324], ["化妆品/个人护理", 0.13232944]]", "波司登": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.78765673], ["化妆品/个人护理", 0.20183924]]", "breadbutter": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.684728232], ["零食/进口食品/茶酒", 0.29434974]]", "北极绒": "[["家纺/家饰/鲜花", "0.84932498"], ["家居建材", 0.11321321], ["大家电/生活电器", 0.05213923]]", "Adidas": "[["家纺/家饰/鲜花", "0"], ["男装/户外运动/", 0.8297434340000001], ["女鞋/男鞋/箱包", 0.14974892]]", "当当网": "[["家纺/家饰/鲜花", "0"], ["图书音像", 0.78947234]]", "snidle": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.83927289], ["女鞋/男鞋/箱包", 0.15237234]]", "TISSOT": "[["家纺/家饰/鲜花", "0"], ["腕表/珠宝饰品/眼镜", 0.87545234], ["大家电/生活电器", 0.13942309]]", "曼妮芬": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.93837427], ["化妆品/个人护理", 0.07239742]]", "New Balance": "[["家纺/家饰/鲜花", "0"], ["女鞋/男鞋/箱包", 0.57823432], ["母婴玩具", 0.43237442]]", "Jackjones": "[["家纺/家饰/鲜花", "0"], ["男装/户外运动/", 0.883293743], ["女鞋/男鞋/箱包", 0.10343298]]", "ZARA": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.78283128], ["女鞋/男鞋/箱包", 0.12429483]]", "海尔": "[["家纺/家饰/鲜花", 
"0。1323243"], ["大家电/生活电器", 0.79103821], ["厨具/收纳/宠物", 0.09354832]]", "nike": "[["家纺/家饰/鲜花", "0"], ["男装/户外运动/", 0.8912323129999999], ["化妆品/个人护理", 0.06163211]]", "双立人": "[["家纺/家饰/鲜花", "0"], ["厨具/收纳/宠物", 0.98943242], ["医药保健", 0.01943242]]", "苹果": "[["家纺/家饰/鲜花", "0"], ["手机/数码/电脑办公", 0.89232342], ["汽车/配件/用品", 0.05293713]]", "兰芝": "[["家纺/家饰/鲜花", "0"], ["化妆品/个人护理", 0.78423234], ["腕表/珠宝饰品/眼镜", 0.13213232]]"}

# The sorted dict, prettified
{"太平鸟": "[["家纺/家饰/鲜花", "0"], ["男装/户外运动/", 0.8478237190000001], ["化妆品/个人护理", 0.11242904]]",
 "博士伦": "[["家纺/家饰/鲜花", "0"], ["医药保健", 0.89348974], ["化妆品/个人护理", 0.11323213]]",
 "a02": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.9844473220000001], ["女鞋/男鞋/箱包", 0.12493492]]",
 "周黑鸭": "[["家纺/家饰/鲜花", "0"], ["零食/进口食品/茶酒", 0.87323123], ["厨具/收纳/宠物", 0.12432232]]",
 "3M": "[["家纺/家饰/鲜花", "0"], ["家居建材", 0.68213814], ["厨具/收纳/宠物", 0.32344534]]",
 "博士": "[["家纺/家饰/鲜花", "0"]]",
 "sk-II": "[["家纺/家饰/鲜花", "0"], ["化妆品/个人护理", 0.98843487], ["腕表/珠宝饰品/眼镜", 0.02324442]]",
 "洗洁精": "[["家纺/家饰/鲜花", "0"], ["图书音像", 0.02124194]]",
 "finity": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.93392424], ["女鞋/男鞋/箱包", 0.07323483]]",
 "selected": "[["家纺/家饰/鲜花", "0"], ["男装/户外运动/", 0.9344398420000001], ["女鞋/男鞋/箱包", 0.07438472]]",
 "scofield": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.87473829], ["化妆品/个人护理", 0.21392374]]",
 "米其林": "[["家纺/家饰/鲜花", "0.02432412"], ["汽车/配件/用品", 0.98233342]]",
 "好奇": "[["家纺/家饰/鲜花", "0"], ["母婴玩具", 0.89472934], ["零食/进口食品/茶酒", 0.11321412]]",
 "佐卡伊": "[["家纺/家饰/鲜花", "0"], ["腕表/珠宝饰品/眼镜", 0.87342324], ["化妆品/个人护理", 0.13232944]]",
 "波司登": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.78765673], ["化妆品/个人护理", 0.20183924]]",
 "breadbutter": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.684728232], ["零食/进口食品/茶酒", 0.29434974]]",
 "北极绒": "[["家纺/家饰/鲜花", "0.84932498"], ["家居建材", 0.11321321], ["大家电/生活电器", 0.05213923]]",
 "Adidas": "[["家纺/家饰/鲜花", "0"], ["男装/户外运动/", 0.8297434340000001], ["女鞋/男鞋/箱包", 0.14974892]]",
 "当当网": "[["家纺/家饰/鲜花", "0"], ["图书音像", 0.78947234]]",
 "snidle": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.83927289], ["女鞋/男鞋/箱包", 0.15237234]]",
 "TISSOT": "[["家纺/家饰/鲜花", "0"], ["腕表/珠宝饰品/眼镜", 0.87545234], ["大家电/生活电器", 0.13942309]]",
 "曼妮芬": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.93837427], ["化妆品/个人护理", 0.07239742]]",
 "New Balance": "[["家纺/家饰/鲜花", "0"], ["女鞋/男鞋/箱包", 0.57823432], ["母婴玩具", 0.43237442]]",
 "Jackjones": "[["家纺/家饰/鲜花", "0"], ["男装/户外运动/", 0.883293743], ["女鞋/男鞋/箱包", 0.10343298]]",
 "ZARA": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.78283128], ["女鞋/男鞋/箱包", 0.12429483]]",
 "海尔": "[["家纺/家饰/鲜花", "01323243"], ["大家电/生活电器", 0.79103821], ["厨具/收纳/宠物", 0.09354832]]",
 "nike": "[["家纺/家饰/鲜花", "0"], ["男装/户外运动/", 0.8912323129999999], ["化妆品/个人护理", 0.06163211]]",
 "双立人": "[["家纺/家饰/鲜花", "0"], ["厨具/收纳/宠物", 0.98943242], ["医药保健", 0.01943242]]",
 "苹果": "[["家纺/家饰/鲜花", "0"], ["手机/数码/电脑办公", 0.89232342], ["汽车/配件/用品", 0.05293713]]",
 "兰芝": "[["家纺/家饰/鲜花", "0"], ["化妆品/个人护理", 0.78423234], ["腕表/珠宝饰品/眼镜", 0.13213232]]"}
The results don't display very well here; on Linux the output is much clearer, see the screenshot:

Figure: sorted results of the predicted second-level classification of Tmall products
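For a quick look at a nested dict, the standard `json` module can also produce indented output that keeps Chinese characters readable. A minimal sketch, assuming Python 3 (the script in this post targets Python 2); the `sample` dict is hand-made from two entries of the output above:

```python
import json

# Hand-made sample in the same shape as the nested dict printed above.
sample = {
    "当当网": {"图书音像": 0.78947234},
    "3M": {"厨具/收纳/宠物": 0.32344534, "家居建材": 0.68213814},
}

# ensure_ascii=False keeps non-ASCII characters as-is instead of \uXXXX escapes;
# indent=4 puts every key on its own line, indented one level per nested dict.
text = json.dumps(sample, ensure_ascii=False, indent=4)
print(text)
```

Unlike the hand-written `pretty_dict`, this does not align nested keys under their parent key, but it needs no custom code.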

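The difficulty here lies in printing Python's multi-level nested dicts and in deleting values from a dict, in this case the entries whose value is 0. At first I extracted the values into a list and deleted from there, but at least one 0 was always left behind. Then it occurred to me to swap the dict's keys and values and use dict.pop to remove the pair whose key is 0. The second difficulty is sorting a nested dict. Dicts are unordered (in the Python 2 used here), so the only option is to sort each dict's items by value, store the sorted result in a list, and then zip that list with the list of corresponding keys to build a dict again, which makes things much easier.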
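Both steps can also be done without inverting the dict, which avoids a side effect of inversion: categories that share the same predicted value collapse into a single entry. A minimal sketch, assuming Python 3 (the script in this post targets Python 2) and a small hand-made `dictall`; `top_categories` is a hypothetical helper, not part of the original script. Values are passed through `float()` because the output above shows some of them as strings such as "0":

```python
from collections import OrderedDict

# Hand-made sample in the same shape as dictall: {brand: {category: value}}.
dictall = {
    "nike": {"男装/户外运动/": 0.891232313, "家纺/家饰/鲜花": "0",
             "化妆品/个人护理": 0.06163211, "手机/数码/电脑办公": 0.04293713,
             "图书音像": 0.0},
    "当当网": {"图书音像": 0.78947234, "家纺/家饰/鲜花": "0"},
}


def top_categories(probs, n=3):
    """Drop zero predictions, then keep the n largest, in descending order."""
    # float() normalises values that were read as text; it raises ValueError
    # on malformed strings, which this sketch leaves unhandled.
    nonzero = {k: float(v) for k, v in probs.items() if float(v) != 0.0}
    ranked = sorted(nonzero.items(), key=lambda kv: kv[1], reverse=True)
    return OrderedDict(ranked[:n])


dictall_sort = {brand: top_categories(probs) for brand, probs in dictall.items()}
```

Because each result is built from the sorted pairs, iterating over `dictall_sort["nike"]` yields the categories from most to least likely.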
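This is a record of last week's work so I can come back to it when I forget. If you have a better approach, I'd be glad to hear it.
PS: I made up this Tmall data; I can share it if anyone needs it.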
