Reading Data into Python (Easy Mode)

Whether you’re just learning Python or have written dozens of scripts but still can’t remember how to read and parse files for a quick and dirty job, these handy snippets should help. All subsequent code was written for Python 2.7.

A simple example for .txt files

with open("your_filename_here.txt", "r") as f:  # "with" closes the file once the block ends
    for line in f:
        items = line.split("\t")  # e.g. splits the line (a string) into a list of items by tab

For .txt files in particular, it’s often a very good idea to print at least one line to see what data hygiene you might have to do. These files usually have a \n character at the end of each line to signify a new line in the file, and if you don’t strip it, it stays attached to the last item on every line and can seriously mess with your data! As always, when analyzing data, make sure you’re checking what you’re reading in before you go too far.
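
To make the newline issue concrete, here is a minimal sketch that prints each raw line with repr() so the trailing \n is visible, then strips it before splitting. The sample file, its contents, and the variable names are made up for illustration, and the snippet is written so it runs on both Python 2.7 and Python 3.

```python
import os
import tempfile

# Write a tiny tab-separated sample file so the example is self-contained.
# (The file name and contents are invented for this illustration.)
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("name\tscore\nada\t3\n")

rows = []
with open(path, "r") as f:
    for line in f:
        print(repr(line))           # repr() makes the trailing "\n" visible
        clean = line.rstrip("\n")   # drop the newline before splitting
        rows.append(clean.split("\t"))

print(rows)
```

Without the rstrip("\n") call, the last item on every line would carry the newline with it (for example "3\n" instead of "3").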

 

A simple example for .csv files

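import csv
with open("your_filename_here.csv", "rU") as f:  # "rU" copes with Windows, Mac, and Unix line endings in Python 2.7
    reader = csv.reader(f)  # handles quoted fields that contain commas
    for row in reader:
        n_items = len(row)  # e.g. row is already a list of items, split by comma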

One thing that can be tricky about .csv files is that they can be saved as Windows Comma Separated (.csv) or MS-DOS Comma Separated (.csv). When programming with Python 2.7 on my Mac, I sometimes run into problems when the csv I’m working with isn’t saved in the Windows Comma Separated format, though the above example for reading in data will work just fine for either format.
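
The differences between those export formats aren’t spelled out here, but one thing worth ruling out when a csv misbehaves is its line endings. The sketch below is only a diagnostic under that assumption: count_line_endings is a hypothetical helper (not part of any library) that counts the three common line-ending styles in raw bytes.

```python
def count_line_endings(data):
    """Count each line-ending style in a bytes string (hypothetical helper)."""
    crlf = data.count(b"\r\n")        # Windows-style "\r\n"
    cr = data.count(b"\r") - crlf     # lone "\r" (older Mac-style files)
    lf = data.count(b"\n") - crlf     # lone "\n" (Unix-style files)
    return {"crlf": crlf, "cr": cr, "lf": lf}

# Made-up sample bytes; for a real file, read it in binary mode instead:
#     with open("your_filename_here.csv", "rb") as f:
#         data = f.read()
sample = b"a,b\r\nc,d\r\ne,f\r\n"
print(count_line_endings(sample))

# str.splitlines() treats all three endings as line breaks.
print(sample.decode("ascii").splitlines())
```

If the counts show a style you didn’t expect, that is a hint to open the file in a mode that normalizes line endings before parsing it.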

 

Look at your data

The first mistake I made when I was starting to code was just printing the entire file to make sure I knew how the data looked. This is just not a good idea for many files, as they’re so big that printing them takes too much time and memory, and your program might freeze up. Test the waters with something like the following in your terminal (note that the following syntax is written for a Mac; terminal commands can be different on Windows):

your-computer:~ leeorg$ head -10 your_filename_here.csv

This will print the first ten lines of your file, whatever the file type may be. Want to look at the last 3 lines in the file instead?

your-computer:~ leeorg$ tail -3 your_filename_here.csv

No idea what a terminal is or how to use it? It’s not as hard as you think! I recommend familiarizing yourself with it as soon as you can, because it’s extremely useful.
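
If the terminal isn’t an option (on Windows, say), the same peek can be done from Python. This is a rough sketch, not a replacement for the real commands: head and tail here are hypothetical helper functions, and the sample file is generated just so the snippet runs on its own.

```python
import os
import tempfile
from collections import deque
from itertools import islice

def head(path, n=10):
    """Return the first n lines of a text file without reading the rest."""
    with open(path, "r") as f:
        return [line.rstrip("\n") for line in islice(f, n)]

def tail(path, n=3):
    """Return the last n lines; reads the whole file but keeps only n in memory."""
    with open(path, "r") as f:
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]

# Build a made-up 20-line csv so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "w") as f:
    for i in range(20):
        f.write("row%d,%d\n" % (i, i * i))

print(head(path, 2))
print(tail(path, 3))
```

Because islice stops after n lines, head stays fast even on very large files; tail still has to scan to the end, but it never holds more than n lines at once.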

 

No really, look at your data

The second mistake I found I made early on was not looking at my data carefully enough. Really make sure you know what’s going on in your data file before you get too far along. It might only take one NaN (Not a Number) entry to throw off your entire analysis. I’ve been there before, so don’t do what I did.
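
One way to catch that kind of entry early is to scan a column before analyzing it. The sketch below assumes the column is supposed to be numeric; find_bad_cells is a hypothetical helper, and the sample rows are invented for illustration.

```python
import csv
import math

def find_bad_cells(lines, column):
    """Return (row_number, raw_value) pairs for cells in `column` that are
    missing, empty, NaN, or not parseable as a number (hypothetical helper)."""
    bad = []
    for row_number, row in enumerate(csv.reader(lines), 1):
        # Treat a row that is too short as having an empty cell.
        raw = row[column].strip() if column < len(row) else ""
        try:
            value = float(raw)
        except ValueError:
            bad.append((row_number, raw))   # empty or non-numeric text
            continue
        if math.isnan(value):
            bad.append((row_number, raw))   # parses, but is NaN
    return bad

# Made-up rows; an open file object can be passed in place of this list.
sample = ["1.5,a", "NaN,b", ",c", "2.0,d", "oops,e"]
print(find_bad_cells(sample, 0))
```

Row numbers start at 1 so they line up with what head and a text editor show.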