Stanford CS246 homework of NTHU-CS-MDA lecture ( K-means )

Gravitychen

Category: python. Tags: kmeans

Article link: https://blog.csdn.net/weixin_38699659/article/details/85072699

concept

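c1: starting points for the 10 clusters, chosen at random

c2: starting points for the 10 clusters, chosen far apart from each other

data: the full dataset; the longest row has length 233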

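Running on a Mac.

Note that Hadoop only worked with the operating system's own Python here. I had Anaconda installed, so I had to move it out of the way first and pip-install numpy into the OS Python before anything would run.
Remember to put #!/usr/bin/env python at the top of each script.

To use the name of the input file, add: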

try:
    input_file = os.environ['mapreduce_map_input_file']
except KeyError:
    input_file = os.environ['map_input_file']
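
The fallback above can also be wrapped in a small helper, which makes it easier to test the mapper outside Hadoop. This is only a sketch: the function name `get_input_file` and the `default` argument are hypothetical and not part of the original scripts. It relies on Hadoop Streaming passing job configuration to the script as environment variables, with the variable name depending on the Hadoop version.

```python
import os


def get_input_file(environ=os.environ, default=""):
    """Return the path of the file the current map task is reading.

    Tries the newer variable name first, then the older one.
    `default` is an assumption added for local runs, where neither
    variable is set; the original snippet raises KeyError instead.
    """
    for key in ("mapreduce_map_input_file", "map_input_file"):
        if key in environ:
            return environ[key]
    return default
```

Passing a plain dict as `environ` lets you simulate a Hadoop task locally, e.g. `get_input_file({"map_input_file": "data.txt"})`.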

I wrote up the debugging process on Zhihu:
https://zhuanlan.zhihu.com/p/49264405

implement

mapper.py

#!/usr/bin/env python
from __future__ import print_function
import os
import fileinput

# Hadoop Streaming exposes the path of the current input file as an
# environment variable; the name differs between Hadoop versions.
try:
    input_file = os.environ['mapreduce_map_input_file']
except KeyError:
    input_file = os.environ['map_input_file']

# Tag appended to every line so the reducer can tell the source files apart.
TAGS = [("c1.txt", "111"), ("c2.txt", "222"), ("data.txt", "333"), ("vocab.txt", "444")]

for line in fileinput.input():
    # Strip the trailing newline, otherwise the tag ends up on its own line.
    for name, tag in TAGS:
        if name in input_file:
            print("%s\t%s" % (line.strip(), tag))
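
On the reducer side, the tags written by the mapper have to be split off again before any clustering can happen. The following is a minimal sketch of that step, not the original reducer: the names `TAG_TO_NAME` and `group_by_tag` are hypothetical, and it assumes each incoming line still has the form `content<TAB>tag`.

```python
from collections import defaultdict

# Tags appended by the mapper, mapped back to the file they came from.
TAG_TO_NAME = {"111": "c1", "222": "c2", "333": "data", "444": "vocab"}


def group_by_tag(lines):
    """Group tagged lines by source file, dropping the tag itself."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip()  # remove the newline and any trailing tab
        if not line:
            continue
        # Split on the LAST tab so tabs inside the content are preserved.
        content, _, tag = line.rpartition("\t")
        name = TAG_TO_NAME.get(tag)
        if name is None:
            continue  # untagged or unknown line: skip it
        groups[name].append(content)
    return groups
```

In a streaming job this could be called as `group_by_tag(fileinput.input())`, after which `groups["data"]` holds the raw data lines and `groups["c1"]` the initial centroids.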

reducer.py

#!/usr/bin/env python
# coding:utf-8
from __future__ import print_function
import fileinput
import ast
import numpy as np
# np.set_printoptions(threshold=np.inf,linewidth=200)
from tqdm import tqdm

def get_loss(init, dist="euclidean"):

    # c1, c2: random starting points, far-apart starting points
    costs_ = []
    MAX_ITER = 20

    for itr in range(MAX_ITER):
        ...
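
The reducer listing stops here, so what follows is not the author's `get_loss`. It is a generic sketch of a k-means loop that records one cost value per iteration, under these assumptions: the points and centroids are already parsed into NumPy arrays of equal width, the distance is Euclidean only, and the cost is the sum of squared distances to the nearest centroid. The name `kmeans_costs` is hypothetical.

```python
import numpy as np


def kmeans_costs(data, centroids, max_iter=20):
    """Run k-means and record the cost at the start of each iteration.

    data:      (n, d) array of points
    centroids: (k, d) array of initial centroids (for example c1 or c2)
    Returns (costs, final_centroids).
    """
    data = np.asarray(data, dtype=float)
    centroids = np.array(centroids, dtype=float)  # copy; caller's array is untouched
    costs = []
    for _ in range(max_iter):
        # Squared distance from every point to every centroid: shape (n, k).
        diff = data[:, None, :] - centroids[None, :, :]
        sq_dist = (diff ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        costs.append(float(sq_dist[np.arange(len(data)), labels].sum()))
        # Move each centroid to the mean of its assigned points;
        # a centroid with no points keeps its previous position.
        for j in range(len(centroids)):
            members = data[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return costs, centroids
```

Running it once with the c1 centroids and once with the c2 centroids gives two cost curves that can be compared to see how the choice of starting points affects convergence.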

Finding the length of the longest 1-D array inside a 2-D array:

# max 1D array length in a 2D array
len(max(data, key=len)) == 233