Udacity Deep Learning in Practice (Part 1)

Udacity's Deep Learning is an online course created by Google in which all assignments are completed with TensorFlow. It is short and to the point, consisting of 4 chapters (intro to ML/DL, DNNs, CNNs, RNNs), 6 small assignments (delivered as ipynb notebooks, which is very convenient and friendly), and 1 capstone project (building a real-time camera application).

If you already have an ML/DL background, you can get through the videos quickly, so the essence of the course lies in its hands-on projects, which are a lot of fun. Being a Google course, it is about as authoritative a TensorFlow tutorial as you will find.

Problem 1

Use IPython.display to visualize some sample data:

import os
import numpy as np
from IPython.display import display, Image

def visualize(folders):
    # Show one randomly chosen image from each class folder.
    for folder_path in folders:
        fnames = os.listdir(folder_path)
        random_index = np.random.randint(len(fnames))
        fname = fnames[random_index]
        display(Image(filename=os.path.join(folder_path, fname)))

print("train_folders")
visualize(train_folders)
print("test_folders")
visualize(test_folders)
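Here `train_folders` and `test_folders` are assumed to be the lists of per-class directories produced by the notebook's earlier download-and-extract step.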

Problem 2

Use matplotlib.pyplot to visualize samples from the pickled datasets:

import pickle
import numpy as np
import matplotlib.pyplot as plt

def visualize_datasets(datasets):
    # Show one randomly chosen sample from each per-class pickle file.
    for dataset in datasets:
        with open(dataset, 'rb') as f:
            letter = pickle.load(f)
        sample_idx = np.random.randint(len(letter))
        sample_image = letter[sample_idx, :, :]
        plt.figure()
        plt.imshow(sample_image)

visualize_datasets(train_datasets)
visualize_datasets(test_datasets)

Problem 3

Check whether the dataset is balanced (i.e., each class has roughly the same number of samples):

import pickle

def check_dataset_is_balanced(datasets, title=None):
    # Print the number of samples in each per-class pickle file;
    # a balanced dataset should show roughly equal counts.
    print(title)
    for label in datasets:
        with open(label, 'rb') as f:
            ds = pickle.load(f)
        print("label {} has {} samples".format(label, len(ds)))

check_dataset_is_balanced(train_datasets, "training set")
check_dataset_is_balanced(test_datasets, "test set")
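To put a single number on the balance rather than eyeballing the printed counts, here is a minimal sketch (assuming the same per-class pickle layout as above) that reports the mean and spread of the class sizes:

import pickle
import numpy as np

def class_counts(datasets):
    # Collect the number of samples in each per-class pickle file.
    counts = []
    for path in datasets:
        with open(path, 'rb') as f:
            counts.append(len(pickle.load(f)))
    return np.array(counts)

counts = class_counts(train_datasets)
# A standard deviation that is tiny relative to the mean means balanced classes.
print("mean={:.0f}, std={:.1f}".format(counts.mean(), counts.std()))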

Problem 5

Count the duplicate samples that appear across the training, test, and validation sets:

import hashlib
import pickle

def count_duplicates(dataset1, dataset2):
    # Hash every image in dataset1, then count how many images in
    # dataset2 have an identical hash (i.e., are exact duplicates).
    hashes = set(hashlib.sha1(x).hexdigest() for x in dataset1)
    return sum(1 for x in dataset2
               if hashlib.sha1(x).hexdigest() in hashes)

with open('notMNIST.pickle', 'rb') as f:
    data = pickle.load(f)

print(count_duplicates(data['test_dataset'], data['valid_dataset']))
print(count_duplicates(data['valid_dataset'], data['train_dataset']))
print(count_duplicates(data['test_dataset'], data['train_dataset']))
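The same hashing trick can be used to build "sanitized" validation and test sets with the overlap removed. A minimal sketch, reusing the `data` dict from above; note that exact hashing misses near-duplicates that differ by even a single pixel:

def sanitize(dataset, labels, reference):
    # Keep only the samples whose hash does not appear in the reference set.
    ref_hashes = set(hashlib.sha1(x).hexdigest() for x in reference)
    keep = [i for i, x in enumerate(dataset)
            if hashlib.sha1(x).hexdigest() not in ref_hashes]
    return dataset[keep], labels[keep]

valid_clean, valid_labels_clean = sanitize(
    data['valid_dataset'], data['valid_labels'], data['train_dataset'])
print(len(valid_clean))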

Problem 6

Train an off-the-shelf model on 50, 100, 1000, and 5000 training samples, and then on the full training set; the LogisticRegression class from sklearn.linear_model works well here.

from sklearn.linear_model import LogisticRegression

def train_and_predict(X_train, y_train, X_test, y_test):
    lr = LogisticRegression()

    # Flatten each 28x28 image into a 784-dimensional feature vector.
    X_train = X_train.reshape(X_train.shape[0], 28 * 28)
    lr.fit(X_train, y_train)

    X_test = X_test.reshape(X_test.shape[0], 28 * 28)
    print(lr.score(X_test, y_test))

def main():
    X_train = data["train_dataset"]
    y_train = data["train_labels"]

    X_test = data["test_dataset"]
    y_test = data["test_labels"]
    # size=None slices the whole array, i.e. trains on the full training set.
    # Always evaluate on the full test set, regardless of training size.
    for size in [50, 100, 1000, 5000, None]:
        train_and_predict(X_train[:size], y_train[:size], X_test, y_test)

main()
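The images are flattened before fitting because scikit-learn estimators expect 2-D (n_samples, n_features) input. The expected pattern is that test accuracy climbs as the training-set size grows from 50 toward the full set, which is the point of the exercise.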

Reposted from: http://blog.csdn.net/Draco_mystack/article/details/77341715 — brilliantly written.
