Java: transforming data with a TransformProcess when using a DataSetIterator

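I have a CSV dataset that contains both numeric and nominal attributes. I defined a Schema for the dataset that lists every possible value of each nominal attribute. I then created a TransformProcess that uses CategoricalToOneHotTransform to convert the nominal values to numeric ones. How do I use this TransformProcess with a RecordReaderDataSetIterator to prepare the data for my neural network?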

Schema schema = new Schema.Builder()
    .addColumnInteger("age")
    .addColumnCategorical("workclass", "Private", "Self-emp-not-inc", "Self-emp-inc", "Federal-gov", "Local-gov", "State-gov", "Without-pay", "Never-worked")
    .addColumnInteger("fnlwgt")
    .addColumnCategorical("education", "Bachelors", "Some-college", "11th", "HS-grad", "Prof-school", "Assoc-acdm", "Assoc-voc", "9th", "7th-8th", "12th", "Masters", "1st-4th", "10th", "Doctorate", "5th-6th", "Preschool")
    .addColumnInteger("education-num")
    .addColumnCategorical("marital-status", "Married-civ-spouse", "Divorced", "Never-married", "Separated", "Widowed", "Married-spouse-absent", "Married-AF-spouse")
    .addColumnCategorical("occupation", "Tech-support", "Craft-repair", "Other-service", "Sales", "Exec-managerial", "Prof-specialty", "Handlers-cleaners", "Machine-op-inspct", "Adm-clerical", "Farming-fishing", "Transport-moving", "Priv-house-serv", "Protective-serv", "Armed-Forces")
    .addColumnCategorical("relationship", "Wife", "Own-child", "Husband", "Not-in-family", "Other-relative", "Unmarried")
    .addColumnCategorical("race", "White", "Asian-Pac-Islander", "Amer-Indian-Eskimo", "Other", "Black")
    .addColumnCategorical("sex", "Female", "Male")
    .addColumnInteger("capital-gain")
    .addColumnInteger("capital-loss")
    .addColumnInteger("hours-per-week")
    .addColumnCategorical("native-country", "United-States", "Cambodia", "England", "Puerto-Rico", "Canada", "Germany", "Outlying-US(Guam-USVI-etc)", "India", "Japan", "Greece", "South", "China", "Cuba", "Iran", "Honduras", "Philippines", "Italy", "Poland", "Jamaica", "Vietnam", "Mexico", "Portugal", "Ireland", "France", "Dominican-Republic", "Laos", "Ecuador", "Taiwan", "Haiti", "Columbia", "Hungary", "Guatemala", "Nicaragua", "Scotland", "Thailand", "Yugoslavia", "El-Salvador", "Trinadad&Tobago", "Peru", "Hong", "Holand-Netherlands")
    .addColumnCategorical("class", ">50K", "<=50K")
    .build();

TransformProcess tp = new TransformProcess.Builder(schema)
    .transform(new CategoricalToOneHotTransform("workclass"))
    .transform(new CategoricalToOneHotTransform("education"))
    .transform(new CategoricalToOneHotTransform("marital-status"))
    .transform(new CategoricalToOneHotTransform("occupation"))
    .transform(new CategoricalToOneHotTransform("relationship"))
    .transform(new CategoricalToOneHotTransform("race"))
    .transform(new CategoricalToOneHotTransform("sex"))
    .transform(new CategoricalToOneHotTransform("native-country"))
    .transform(new CategoricalToIntegerTransform("class"))
    .build();

// Schema of the records after all transforms have been applied
Schema outputSchema = tp.getFinalSchema();

int numLinesToSkip = 0;
String delimiter = ",";
CSVRecordReader recordReader = new CSVRecordReader(numLinesToSkip, delimiter);
recordReader.initialize(new FileSplit(Paths.get("..\\adult.data").toFile()));

// The label ("class") is the last column of the transformed schema
int labelIndex = outputSchema.getColumnNames().size() - 1;
int numClasses = 2;
int batchSize = 2000;

// Note: recordReader still yields raw CSV records here; tp is never applied to it
RecordReaderDataSetIterator iterator = new RecordReaderDataSetIterator(recordReader, batchSize, labelIndex, numClasses);

DataSet allData = iterator.next();
allData.shuffle();
SplitTestAndTrain testAndTrain = allData.splitTestAndTrain(0.65);
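
As background for the labelIndex line above, here is a plain-Java sketch of what one-hot encoding does to the column count. It is not DataVec's implementation: the names OneHotSketch, oneHot and widthAfterOneHot are made up for illustration, and it assumes each categorical column is replaced by one 0/1 column per listed category.

```java
import java.util.Arrays;
import java.util.List;

public class OneHotSketch {

    /** Returns a 0/1 vector with a single 1 at the position of value in categories. */
    static int[] oneHot(List<String> categories, String value) {
        int idx = categories.indexOf(value);
        if (idx < 0) {
            throw new IllegalArgumentException("Unknown category: " + value);
        }
        int[] out = new int[categories.size()];
        out[idx] = 1;
        return out;
    }

    /**
     * Total number of columns once every one-hot column has been expanded.
     * otherColumns counts the columns that keep a width of 1.
     */
    static int widthAfterOneHot(int otherColumns, int... categorySizes) {
        int width = otherColumns;
        for (int size : categorySizes) {
            width += size;
        }
        return width;
    }

    public static void main(String[] args) {
        List<String> sex = Arrays.asList("Female", "Male");
        System.out.println(Arrays.toString(oneHot(sex, "Male"))); // prints [0, 1]

        // The question's schema: 6 integer columns plus the label keep width 1,
        // and the 8 one-hot columns list 8, 16, 7, 14, 6, 5, 2 and 41 categories.
        System.out.println(widthAfterOneHot(7, 8, 16, 7, 14, 6, 5, 2, 41)); // prints 106
    }
}
```

Under that assumption the transformed records have 106 columns, so the label sits at index 105 rather than at index 14 as in the raw CSV. That index only matches what the iterator reads if the transformed records, not the raw ones, reach it.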

Solution:

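RecordReaderDataSetIterator takes a record reader and handles the vectorization step; it does not apply a TransformProcess itself. To apply one, wrap the CSVRecordReader in a TransformProcessRecordReader. The wrapper outputs the transformed records, which are then fed into the RecordReaderDataSetIterator.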
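
The wrapping described in this answer might look like the following sketch. It assumes the datavec-api, deeplearning4j and nd4j artifacts are on the classpath. The class name TransformedIteratorSketch, the helper load, the reduced three-column schema and the temporary CSV file are all hypothetical, chosen to keep the example short; the question's full schema and TransformProcess would be passed in the same way.

```java
import java.io.File;
import java.nio.file.Files;
import java.util.Arrays;

import org.datavec.api.records.reader.RecordReader;
import org.datavec.api.records.reader.impl.csv.CSVRecordReader;
import org.datavec.api.records.reader.impl.transform.TransformProcessRecordReader;
import org.datavec.api.split.FileSplit;
import org.datavec.api.transform.TransformProcess;
import org.datavec.api.transform.schema.Schema;
import org.datavec.api.transform.transform.categorical.CategoricalToIntegerTransform;
import org.datavec.api.transform.transform.categorical.CategoricalToOneHotTransform;
import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
import org.nd4j.linalg.dataset.DataSet;

public class TransformedIteratorSketch {

    /** Reduced, hypothetical schema; the question's schema has 15 columns. */
    static Schema schema() {
        return new Schema.Builder()
                .addColumnInteger("age")
                .addColumnCategorical("sex", "Female", "Male")
                .addColumnCategorical("class", ">50K", "<=50K")
                .build();
    }

    static TransformProcess transformProcess(Schema schema) {
        return new TransformProcess.Builder(schema)
                .transform(new CategoricalToOneHotTransform("sex"))
                .transform(new CategoricalToIntegerTransform("class"))
                .build();
    }

    /** Reads one batch from csv, applying tp to every record before vectorization. */
    static DataSet load(File csv, TransformProcess tp, int batchSize, int numClasses) throws Exception {
        // The no-argument constructor skips no lines and splits on commas
        RecordReader raw = new CSVRecordReader();
        raw.initialize(new FileSplit(csv));

        // Wrap the raw reader so that each record passes through the TransformProcess
        RecordReader transformed = new TransformProcessRecordReader(raw, tp);

        // The label is the last column of the transformed schema
        int labelIndex = tp.getFinalSchema().getColumnNames().size() - 1;

        RecordReaderDataSetIterator iterator =
                new RecordReaderDataSetIterator(transformed, batchSize, labelIndex, numClasses);
        return iterator.next();
    }

    public static void main(String[] args) throws Exception {
        // Tiny stand-in for adult.data, written to a temporary file
        File csv = File.createTempFile("adult-sketch", ".csv");
        csv.deleteOnExit();
        Files.write(csv.toPath(), Arrays.asList("39,Male,<=50K", "52,Female,>50K"));

        DataSet data = load(csv, transformProcess(schema()), 2, 2);
        System.out.println("features shape: " + Arrays.toString(data.getFeatures().shape()));
        System.out.println("labels shape:   " + Arrays.toString(data.getLabels().shape()));
    }
}
```

With the wrapper in place, the labelIndex computed from tp.getFinalSchema() refers to the same column layout that the iterator actually receives, which is not the case when the raw CSVRecordReader is passed in directly.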

Tags: csv, deeplearning4j, java

Source: https://codeday.me/bug/20191111/2018933.html
