The file from the URL in your post contains additional commas for some items in the GICS industry group column. The first occurs at line 31 in the file:
ABUNDANT PRODUCE LIMITED,ABT,Food, Beverage & Tobacco
Normally, the 3rd item should be surrounded by quotes so that the embedded comma is not treated as a delimiter, such as:
ABUNDANT PRODUCE LIMITED,ABT,"Food, Beverage & Tobacco"
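To see the difference, here is a minimal sketch that runs both forms of that line through Python's standard csv module with its default dialect. The `parse` helper is a hypothetical name introduced only for this illustration.

```python
import csv
import io

# The line as it appears in the file, and the properly quoted form.
unquoted = 'ABUNDANT PRODUCE LIMITED,ABT,Food, Beverage & Tobacco\n'
quoted = 'ABUNDANT PRODUCE LIMITED,ABT,"Food, Beverage & Tobacco"\n'

def parse(line):
    """Hypothetical helper: parse one CSV line with the default dialect."""
    return next(csv.reader(io.StringIO(line)))

unquoted_fields = parse(unquoted)
quoted_fields = parse(quoted)

# The unquoted line breaks into 4 fields, the quoted one into 3.
print(len(unquoted_fields), unquoted_fields)
print(len(quoted_fields), quoted_fields)
```

The unquoted line comes back as four fields, which is why a standard CSV reader trips over these rows.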
In this case, because the first 2 columns appear to be clean, you can merge everything after the second comma into the 3rd field, then load the cleaned rows into a data frame.
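The merge itself can be sketched on a single line before worrying about the whole file. This assumes, as stated above, that the first 2 fields never contain a comma; `merge_extra` is a hypothetical name used only for this illustration.

```python
def merge_extra(line):
    """Split on commas, keep the first 2 fields, re-join the rest."""
    # Starred unpacking collects every field after the second into `rest`.
    name, code, *rest = line.strip().split(',')
    return (name, code, ','.join(rest))

broken = merge_extra('ABUNDANT PRODUCE LIMITED,ABT,Food, Beverage & Tobacco\n')
clean = merge_extra('ACACIA COAL LIMITED,AJC,Energy\n')

print(broken)
print(clean)
```

Re-joining with the same `','` that was used for the split restores the original text of the 3rd field, and a row without extra commas passes through unchanged.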
You can do this with a generator that will pull out and clean each line one at a time. The pd.DataFrame constructor will read in the data and create a data frame.
import pandas as pd
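def merge_last(file_name, skip_lines=0):
    with open(file_name, 'r') as fp:
        for i, line in enumerate(fp):
            # skip the preamble lines that come before the header row
            if i < skip_lines:
                continue
            # the first 2 fields are clean; everything else belongs to the 3rd
            x, y, *z = line.strip().split(',')
            yield (x, y, ','.join(z))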
# create a generator to clean the lines, skipping the first 2
gen = merge_last('ASXListedCompanies.csv', 2)
# get the column names
header = next(gen)
# create the data frame
df = pd.DataFrame(gen, columns=header)
df.head()
returns:
          Company name  ASX code                 GICS industry group
0          MOQ LIMITED       MOQ                 Software & Services
1       1-PAGE LIMITED       1PG                 Software & Services
2  1300 SMILES LIMITED       ONT    Health Care Equipment & Services
3    1ST GROUP LIMITED       1ST    Health Care Equipment & Services
4         333D LIMITED       T3D  Commercial & Professional Services
And the rows with the extra commas are preserved:
df.loc[27:30]
# returns:
                           Company name  ASX code       GICS industry group
27             ABUNDANT PRODUCE LIMITED       ABT  Food, Beverage & Tobacco
28                  ACACIA COAL LIMITED       AJC                    Energy
29  ACADEMIES AUSTRALASIA GROUP LIMITED       AKG         Consumer Services
30         ACCELERATE RESOURCES LIMITED       AX8                Class Pend
Here is a more generalized generator that will merge after a given number of columns:
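def merge_last(file_name, merge_after_col=2, skip_lines=0):
    with open(file_name, 'r') as fp:
        for i, line in enumerate(fp):
            # skip the preamble lines that come before the header row
            if i < skip_lines:
                continue
            spl = line.strip().split(',')
            # keep the first merge_after_col fields, re-join the rest into one
            yield (*spl[:merge_after_col], ','.join(spl[merge_after_col:]))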
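As a quick way to try the generalized version without downloading anything, the sketch below writes a tiny made-up sample to a temporary file and runs the generator over it. The generator is restated so the block runs on its own, with `skip_lines` controlling how many leading lines are dropped; the sample text (two preamble lines, a header, two rows) is an assumption about the layout, not a copy of the real file.

```python
import os
import tempfile

def merge_last(file_name, merge_after_col=2, skip_lines=0):
    with open(file_name, 'r') as fp:
        for i, line in enumerate(fp):
            # drop the preamble lines before the header row
            if i < skip_lines:
                continue
            spl = line.strip().split(',')
            # keep the first merge_after_col fields, re-join the rest
            yield (*spl[:merge_after_col], ','.join(spl[merge_after_col:]))

# Made-up sample: 2 preamble lines, then the header, then data rows.
sample = (
    "Sample preamble line\n"
    "\n"
    "Company name,ASX code,GICS industry group\n"
    "ABUNDANT PRODUCE LIMITED,ABT,Food, Beverage & Tobacco\n"
    "ACACIA COAL LIMITED,AJC,Energy\n"
)

with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    rows = list(merge_last(path, merge_after_col=2, skip_lines=2))
finally:
    os.remove(path)

# The first yielded row is the header; the rest are data rows.
header, *records = rows
print(header)
for record in records:
    print(record)
```

The `header` and `records` values can be passed to `pd.DataFrame(records, columns=header)` in the same way as above.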