# Python has a nice csv reader, which reads a file one line at a time. You can read in each row and append it
# to a list. From there, you can quickly turn it into an array. The first thing to do is to import the relevant
# packages that I will need for my script. These include numpy (for maths and arrays) and csv (for reading and
# writing csv files). If I want to use something from these, I need to call csv.[function] or np.[function].
# First, the imports:
import csv
import numpy as np
# Open up the csv file into a Python object
train_file = open(r'D:\udacity P2/train.csv', newline='')
csv_file_object = csv.reader(train_file)
header = next(csv_file_object)  # next() just skips the first line, which is a header
data = []  # Create a variable called 'data'
for row in csv_file_object:  # Run through each row in the csv file,
    data.append(row)  # adding each row to the data variable
train_file.close()
data = np.array(data)  # Then convert from a list to an array. Be aware that each item is currently a string in this format
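# A minimal, self-contained sketch of the same read-then-convert steps. It assumes a tiny hypothetical
# sample held in memory with io.StringIO instead of the real train.csv, keeping only that file's first five
# columns. The names and values below (sample_csv, sample_data, "Doe, Mr. John", ...) are made up.
import csv
import io
import numpy as np

sample_csv = io.StringIO(
    "PassengerId,Survived,Pclass,Name,Sex\n"
    "1,0,3,\"Doe, Mr. John\",male\n"
    "2,1,1,\"Roe, Mrs. Jane\",female\n"
    "3,1,3,\"Poe, Miss. Ann\",female\n"
)
sample_reader = csv.reader(sample_csv)
sample_header = next(sample_reader)  # skip (and keep) the header row
sample_rows = []
for sample_row in sample_reader:
    sample_rows.append(sample_row)  # the quoted name stays one field despite its comma
sample_data = np.array(sample_rows)
print(sample_data.shape)  # (3, 5)
print(sample_data.dtype.kind)  # U, i.e. every cell is still a string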
# Now if you want to call a specific column of data, say, the gender column, you can just type data[0::,4],
# remembering that "0::" means all rows (from start to end), and Python starts indices from 0 (not 1). You should
# be aware that the csv reader works with strings by default, so you will need to convert to floats in order to do
# numerical calculations. For example, you can turn the Pclass variable into floats by using
# data[0::,2].astype(float) (older code writes np.float, which was removed in NumPy 1.24). Using this, we can
# calculate the proportion of survivors on the Titanic. The size() function counts how many elements are in the
# array, and sum() (as you would expect) sums up the elements in the array.
number_passengers = np.size(data[0::, 1].astype(float))
number_survived = np.sum(data[0::, 1].astype(float))
proportion_survivors = number_survived / number_passengers
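# A worked example of the same arithmetic on a small hypothetical array, not the real training data. It slices
# column 1 (Survived), converts the strings to floats, and divides the sum by the size. np.mean() returns the same
# number in one call. Column 2 (Pclass) is converted the same way.
import numpy as np

example_data = np.array([
    ['1', '0', '3', 'Doe, Mr. John', 'male'],
    ['2', '1', '1', 'Roe, Mrs. Jane', 'female'],
    ['3', '1', '3', 'Poe, Miss. Ann', 'female'],
    ['4', '0', '2', 'Moe, Mr. Sam', 'male'],
])
example_survived = example_data[0::, 1].astype(float)  # column 1 = Survived
example_pclass = example_data[0::, 2].astype(float)  # column 2 = Pclass
example_proportion = np.sum(example_survived) / np.size(example_survived)
print(example_proportion)  # 0.5
print(np.mean(example_survived))  # 0.5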
# numpy has some lovely functions. For example, we can search the gender column, find where any elements equal
# female (and, for males, 'do not equal female'), and then use this to determine the number of females and males
# that survived:
women_only_stats = data[0::, 4] == "female"  # this finds where the elements in the gender column equal "female"
men_only_stats = data[0::, 4] != "female"  # this finds where the elements do not equal "female" (i.e. male)
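# A small sketch of what the two masks look like, using a hypothetical gender column. Comparing a NumPy array with
# == or != works element by element, so each mask is a boolean array of the same length. The second mask is exactly
# the negation (~) of the first.
import numpy as np

example_gender = np.array(['male', 'female', 'female', 'male'])
example_women_mask = example_gender == "female"
example_men_mask = example_gender != "female"
print(example_women_mask)  # [False  True  True False]
print(example_men_mask)  # [ True False False  True]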
# We use these two new variables as a "mask" on our original training data, so we can select only the women, and
# only the men, on board, then calculate the proportion of each group who survived.
# Using the index from above, we select the females and males separately:
women_onboard = data[women_only_stats, 1].astype(float)
men_onboard = data[men_only_stats, 1].astype(float)
# Then we find the proportion of each group that survived
proportion_women_survived = \
    np.sum(women_onboard) / np.size(women_onboard)
proportion_men_survived = \
    np.sum(men_onboard) / np.size(men_onboard)
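# A sketch that puts the masks and the proportions together on eight hypothetical rows. The rows use the same
# column layout as the first five columns of train.csv (PassengerId, Survived, Pclass, Name, Sex). The names are
# placeholder letters, and the survival values are made up.
import numpy as np

example_rows = np.array([
    ['1', '0', '3', 'A', 'male'],
    ['2', '1', '1', 'B', 'female'],
    ['3', '1', '3', 'C', 'female'],
    ['4', '0', '2', 'D', 'male'],
    ['5', '0', '3', 'E', 'female'],
    ['6', '1', '1', 'F', 'male'],
    ['7', '1', '2', 'G', 'female'],
    ['8', '0', '3', 'H', 'male'],
])
# the boolean mask picks the rows, and the 1 picks the Survived column from those rows
example_women = example_rows[example_rows[0::, 4] == "female", 1].astype(float)
example_men = example_rows[example_rows[0::, 4] != "female", 1].astype(float)
example_women_rate = np.sum(example_women) / np.size(example_women)
example_men_rate = np.sum(example_men) / np.size(example_men)
print(example_women_rate)  # 0.75
print(example_men_rate)  # 0.25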
# And then print it out
print('Proportion of women who survived is %s' % proportion_women_survived)
print('Proportion of men who survived is %s' % proportion_men_survived)
# Now that I have my indication that women were much more likely to survive, I am done with the training set.

# Reading the test data and writing the gender model as a csv
# As before, we need to open the test file as a Python object to read from, and open another one to write to.
# First, we read in the test.csv file and skip the header line:
test_file = open(r'D:\udacity P2/test.csv', newline='')
test_file_object = csv.reader(test_file)
header = next(test_file_object)
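# next() returns the header row as a list of strings, so it can also be used to look up a column's position by
# name instead of hard-coding it. This matters here because test.csv has no Survived column, so the gender column
# sits at index 3 rather than 4. A sketch on a hypothetical in-memory sample laid out like the first columns of
# test.csv:
import csv
import io

example_test_csv = io.StringIO(
    "PassengerId,Pclass,Name,Sex\n"
    "1,3,\"Doe, Mr. John\",male\n"
)
example_test_reader = csv.reader(example_test_csv)
example_test_header = next(example_test_reader)
example_sex_col = example_test_header.index("Sex")  # list.index finds the column position by name
example_first_row = next(example_test_reader)
print(example_sex_col)  # 3
print(example_first_row[example_sex_col])  # male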
# Now, let's open a pointer to a new file so we can write to it (this file does not exist yet). Call it something
# descriptive so that it is recognizable when we upload it:
prediction_file = open("genderbasedmodel.csv", "w", newline='')  # "w" opens for writing and creates the file
prediction_file_object = csv.writer(prediction_file)
# We now want to read in the test file row by row, see whether each passenger is female or male, and write our
# survival prediction to the new file:
prediction_file_object.writerow(["PassengerId", "Survived"])
for row in test_file_object:  # For each row in test.csv
    if row[3] == 'female':  # is it a female? If yes, then
        prediction_file_object.writerow([row[0], '1'])  # predict 1
    else:  # otherwise
        prediction_file_object.writerow([row[0], '0'])  # predict 0
test_file.close()
prediction_file.close()
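# A self-contained sketch of the same gender-based prediction loop. It runs on a hypothetical in-memory test
# sample and writes to an in-memory buffer instead of genderbasedmodel.csv. By default csv.writer ends each row
# with "\r\n". With a real file in Python 3, open it with open(name, 'w', newline='') so the csv module controls
# the line endings.
import csv
import io

example_test_input = io.StringIO(
    "PassengerId,Pclass,Name,Sex\n"
    "1,3,\"Doe, Mr. John\",male\n"
    "2,1,\"Roe, Mrs. Jane\",female\n"
)
example_output = io.StringIO()
example_reader = csv.reader(example_test_input)
example_writer = csv.writer(example_output)
next(example_reader)  # skip the header line
example_writer.writerow(["PassengerId", "Survived"])
for example_row in example_reader:
    # predict 1 (survived) for females, 0 for everyone else
    example_writer.writerow([example_row[0], '1' if example_row[3] == 'female' else '0'])
# the buffer now holds the header line plus one prediction per passenger
print(example_output.getvalue())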