How to Load Data from CSV (Data Preparation Part)

  • How to load a CSV file.
  • How to convert strings from a file to floating point numbers.
  • How to convert class values from a file to integers.

1.2 Tutorial

  1. Load a file
  2. Load a file and convert Strings to Floats
  3. Load a file and convert Strings to Integers

# Function for loading a CSV
# Load a CSV file
from csv import reader

def load_csv(filename):
    # Open the file in a with block so it is closed once reading finishes
    with open(filename, 'r') as file:
        lines = reader(file)
        dataset = list(lines)
    return dataset

load_csv('pima-indians-diabetes.data.csv')
# Example of Loading the Pima Indians Diabetes Dataset CSV File
# Example of loading Pima Indians CSV dataset
from csv import reader

# Load a csv file
def load_csv(filename):
    # Open the file in a with block so it is closed once reading finishes
    with open(filename, 'r') as file:
        lines = reader(file)
        dataset = list(lines)
    return dataset

# Load dataset
filename = 'pima-indians-diabetes.data.csv'
dataset = load_csv(filename)

print('Loaded data file {0} with {1} rows and {2} columns'.format(filename,len(dataset),len(dataset[0])))

Sample output from loading the Pima Indians Diabetes dataset CSV file.

A limitation of this function is that it loads empty lines from the data file and adds them to our list of rows as empty lists. Below is the updated example with an improved version of the load_csv() function that skips empty rows.

# Improved Example of Loading the Pima Indians Diabetes Dataset CSV File
# Example of loading Pima Indians CSV dataset
from csv import reader

# Load a CSV file
def load_csv(filename):
    dataset = list()
    with open(filename, 'r') as file:
        csv_reader = reader(file)
        for row in csv_reader:
            if not row:
                continue
            dataset.append(row)
    return dataset
                
# Load dataset
filename = 'pima-indians-diabetes.data.csv'
dataset = load_csv(filename)
print('Loaded data file {0} with {1} rows and {2} columns'.format(filename,len(dataset),len(dataset[0])))

Sample Output From Loading the Pima Indians Diabetes Dataset CSV File
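
To see why the empty-row check matters, the short sketch below feeds csv.reader a small in-memory file containing a blank line. The sample values and the variable names (sample, rows_all, rows_kept) are made up for illustration and are not part of the Pima Indians file.

```python
from csv import reader
from io import StringIO

# Hypothetical in-memory CSV with a blank line in the middle
sample = StringIO("6,148,72\n\n1,85,66\n")

# csv.reader yields an empty list for the blank line
rows_all = list(reader(sample))

# Same test as the `if not row: continue` check in load_csv()
rows_kept = [row for row in rows_all if row]

print(rows_all)   # [['6', '148', '72'], [], ['1', '85', '66']]
print(rows_kept)  # [['6', '148', '72'], ['1', '85', '66']]
```

Without the check, the empty list would end up in the dataset, and any later code that indexes a column of that row would fail with an IndexError.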

1.2 Convert String to Floats

Most, if not all, machine learning algorithms prefer to work with numbers. Specifically, floating point numbers are preferred. Our code for loading a CSV file returns a dataset as a list of lists, but each value is a string. We can see this if we print out one record from the dataset:

print(dataset[0])
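
As a quick illustration of why string values are a problem, the sketch below uses a made-up row (not taken from the dataset) to show that adding two string values concatenates them instead of summing them.

```python
# Hypothetical row, in the form csv.reader returns it: every value is a string
row = ['6', '148', '72']

concatenated = row[0] + row[1]           # string concatenation
summed = float(row[0]) + float(row[1])   # numeric addition after conversion

print(concatenated)  # 6148
print(summed)        # 154.0
```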

We can write a small function to convert specific columns of our loaded dataset to floating point values. Below is this function, called str_column_to_float(). It converts a given column in the dataset to floating point values, taking care to strip any whitespace from each value before making the conversion.

def str_column_to_float(dataset, column):
    for row in dataset:
        row[column] = float(row[column].strip())
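
A minimal usage sketch, assuming a tiny hand-written dataset with stray whitespace (the rows are hypothetical). Note that the function modifies the rows in place and returns nothing.

```python
def str_column_to_float(dataset, column):
    for row in dataset:
        row[column] = float(row[column].strip())

# Hypothetical rows with stray whitespace around the values
dataset = [[' 6', '148 '], ['1 ', ' 85']]

# Convert every column, one at a time
for column in range(len(dataset[0])):
    str_column_to_float(dataset, column)

print(dataset)  # [[6.0, 148.0], [1.0, 85.0]]
```

Python's float() already tolerates leading and trailing whitespace, so the strip() call is a defensive step; it does no harm.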

We can test this function by combining it with our load CSV function above, and convert all of the numeric data in the Pima Indians dataset to floating point values. The complete example is below.

# Example of converting string variables to float
from csv import reader

# Load a CSV file
def load_csv(filename):
    dataset = list()
    with open(filename, 'r') as file:
        csv_reader = reader(file)
        for row in csv_reader:
            if not row:
                continue
            dataset.append(row)
    return dataset

# Convert string column to float
def str_column_to_float(dataset, column):
    for row in dataset:
        row[column] = float(row[column].strip())
        
# Load pima-indians-diabetes dataset
filename = 'pima-indians-diabetes.data.csv'
dataset = load_csv(filename)
print('Loaded data file {0} with {1} rows and {2} columns'.format(filename,len(dataset),len(dataset[0])))

print(dataset[0])

# convert string columns to float
for i in range(len(dataset[0])):
    str_column_to_float(dataset,i)
print(dataset[0])

Running this example, we see the first row of the dataset printed both before and after the conversion. We can see that the values in each column have been converted from strings to numbers.
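
One caveat worth knowing: float() raises a ValueError when a value is not numeric, so str_column_to_float() should only be applied to columns that hold numbers. The sketch below shows this with a hypothetical row that mixes numbers with a text label.

```python
def str_column_to_float(dataset, column):
    for row in dataset:
        row[column] = float(row[column].strip())

# Hypothetical row: four numeric strings followed by a text label
dataset = [['5.1', '3.5', '1.4', '0.2', 'Iris-setosa']]

try:
    str_column_to_float(dataset, 4)   # column 4 holds text, not a number
    error = None
except ValueError as exc:
    error = exc

print(type(error).__name__)  # ValueError
print(dataset[0][4])         # Iris-setosa (left unchanged)
```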

Some machine learning algorithms prefer all values to be numeric, including the outcome or predicted value. We can convert the class value in the Iris flowers dataset to an integer by creating a map.

  • First, we locate all of the unique class values, which happen to be: Iris-setosa, Iris-versicolor and Iris-virginica.
  • Next, we assign an integer value to each, such as: 0, 1 and 2.
  • Finally, we replace all occurrences of class string values with their corresponding integer values.
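
The three steps above can be sketched on a plain list of class values. The list is hypothetical, and sorting the unique values is an assumption added here so that the mapping is the same on every run; a plain set has no guaranteed order.

```python
# Hypothetical class column values
class_values = ['Iris-setosa', 'Iris-virginica', 'Iris-setosa', 'Iris-versicolor']

# Step 1: locate the unique class values (sorted for a reproducible order)
unique = sorted(set(class_values))

# Step 2: assign an integer to each unique value
lookup = {value: i for i, value in enumerate(unique)}

# Step 3: replace each class string with its integer
encoded = [lookup[value] for value in class_values]

print(lookup)   # {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
print(encoded)  # [0, 2, 0, 1]
```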

Below is a function to do just that, called str_column_to_int(). Like the previously introduced str_column_to_float(), it operates on a single column in the dataset. It also returns the lookup dictionary so that the mapping can be inspected.

# Example of integer encoding string class values
from csv import reader

# Load a CSV file
def load_csv(filename):
    dataset = list()
    with open(filename, 'r') as file:
        csv_reader = reader(file)
        for row in csv_reader:
            if not row:
                continue
            dataset.append(row)
    return dataset

# Convert string column to float
def str_column_to_float(dataset, column):
    for row in dataset:
        row[column] = float(row[column].strip())

# Convert string column to integer
def str_column_to_int(dataset, column):
    class_values = [row[column] for row in dataset]
    unique = set(class_values)
    lookup = dict()
    for i, value in enumerate(unique):
        lookup[value] = i
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup

# Load iris dataset
filename = 'iris.csv'
dataset = load_csv(filename)
print('Loaded data file {0} with {1} rows and {2} columns'.format(filename,len(dataset),len(dataset[0])))
print(dataset[0])

# convert the four numeric columns to float
for i in range(4):
    str_column_to_float(dataset, i)
# convert class column to int
lookup = str_column_to_int(dataset, 4)
print(dataset[0])
print(lookup)
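
Because str_column_to_int() returns the lookup dictionary, the mapping can be reversed later, for example to turn integer predictions back into class names. The sketch below is one way to do that; the lookup contents, the predictions list and the name inverse are hypothetical, since the actual integers assigned depend on the order in which the set is iterated.

```python
# Hypothetical lookup, in the shape returned by str_column_to_int()
lookup = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}

# Invert the mapping: integer -> class name
inverse = {i: name for name, i in lookup.items()}

# Hypothetical integer predictions to decode
predictions = [2, 0, 1]
labels = [inverse[p] for p in predictions]

print(labels)  # ['Iris-virginica', 'Iris-setosa', 'Iris-versicolor']
```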
