Introduction to Using Python for Data Analysis.

First, open up Spyder. Spyder has three main parts: the main window (we'll be typing our program in here), the console in the lower-right (by default) and this amazing thing called the Object Inspector in the upper-right (by default). The Object Inspector will look at what you're typing and instantly pull up the documentation for it, which is invaluable for beginners. As time goes on you may want to turn it off, but it helps a bunch in the beginning.

For now, we're going to do two mini-projects just to get your feet wet with Python: we're going to construct and plot $\sin(x)$ and we're going to make a histogram from normal (bell-shaped) data. If you're not sure of what the normal distribution is, you should check it out before starting. The two main packages we'll be using are **numpy** which you will see a lot (it is *the* scientific computing package for Python) and **matplotlib** which is useful for, among other things, graphing. Let's get going!

For both files, we will need to import *numpy* and the *pyplot* part of *matplotlib*. We can do this by typing the following commands at the top of the file:

    import numpy as np
    import matplotlib.pyplot as plt

These things might look strange, but they're not bad: we're simply importing the packages, and when we want to use something in them we will "call" them by using *np* for numpy and *plt* for matplotlib.pyplot. You'll see what we mean by this in a second.

First, let's try something out. Python, by default, doesn't know anything about the sine function; luckily, numpy knows a lot about it, and we've imported it. Hence, the line

    print(np.sin(np.pi / 2))

will print out `1.0` if everything is correct (to try this, click the little green running man at the top, or click on "Run" and "Run" in the menu). What did we do in this line? We've *called* numpy by typing `np` first, then we said, "I want to use the sine function from the numpy package," so we typed `np.sin()`. For kicks, I evaluated this at $\frac{\pi}{2}$, so we put `np.pi / 2` inside the ()'s; plain Python doesn't know about $\pi$ either, so we take it from numpy as `np.pi`. In general, this is how using commands from numpy or matplotlib.pyplot works: you put `np` or `plt` followed by a dot followed by a function. Spyder has some sweet auto-complete stuff, so if you have an idea of what you want to do you may just try typing it in and see if there's a related function. If not, google.
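To make the "prefix, dot, function" pattern concrete, here is a small sketch that calls a few other things from numpy in exactly the same way. The particular functions (`np.cos`, `np.sqrt`) are just ones I picked for illustration; they aren't needed for the mini-projects.

```python
import numpy as np

# Everything that lives in numpy is reached through the "np." prefix.
sine_at_half_pi = np.sin(np.pi / 2)   # the example from the text
cosine_at_zero = np.cos(0.0)          # another trig function, same pattern
root_of_sixteen = np.sqrt(16.0)       # square root, same pattern

print(sine_at_half_pi)
print(cosine_at_zero)
print(root_of_sixteen)

# Computers store pi only approximately, so sin(pi) comes out as a
# tiny number near zero rather than exactly 0.
sine_at_pi = np.sin(np.pi)
print(sine_at_pi)
```

If any of these lines complains that `np` is not defined, the import at the top of the file is missing or misspelled.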

That one value of sine was great, but we want to plot sine so we need a lot of values. Let's take 100 values between 0 and $2\pi$. We could think about this like this: "Well, $2\pi = 6.28$..., so if I divide this into a hundred pieces each piece will be about 0.0628..., so I'll have a list like..." Luckily, numpy has a function which automatically hands back however many evenly spaced points in an interval we want! How nice of it. Let's take 100 evenly spaced points from $0$ to $2\pi$ and save them as the variable `x` as follows:

    x = np.linspace(0, 2 * np.pi, 100)

Again, this calls numpy with `np` and uses the `linspace` function, which takes the interval $0$ to $2\pi$ and returns 100 evenly spaced points in it, including both endpoints. Cool.
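If you'd like to see exactly what `linspace` produced, the following sketch pokes at the result. One detail worth noticing: 100 points have 99 gaps between them, so the spacing is $2\pi / 99$ rather than $2\pi / 100$. The names `n_points` and `step` are just ones made up for this check.

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

n_points = len(x)      # how many values we got back
step = x[1] - x[0]     # distance between neighbouring points

print(n_points)        # number of points
print(x[0], x[-1])     # first and last point: the two ends of the interval
print(step)            # spacing between points
print(2 * np.pi / 99)  # the same spacing, worked out by hand
```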

Once we have this partition, we'd like to evaluate `sin` at each point. Luckily, this is easy! Recalling that `x` is the 100 points equally spaced between 0 and $2\pi$ that we just made, we apply `np.sin` to our `x`. Here's what we type:

    y = np.sin(x)

This saves our sine values (as an array) to the variable `y`.
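What `np.sin(x)` does here is take the sine of every entry of `x` in one go. As a rough illustration, the sketch below compares it with doing the same thing one value at a time using Python's built-in `math` module. The name `y_loop` is made up for this comparison, and the loop is only here to show the idea; the one-line numpy version is the one to use.

```python
import math

import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# numpy applies sine to every entry of x at once.
y = np.sin(x)

# The same thing done by hand, one value at a time.
y_loop = [math.sin(value) for value in x]

# y has one sine value for each point in x ...
print(y.shape)

# ... and the two approaches agree (up to rounding).
same = np.allclose(y, y_loop)
print(same)
```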

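Now, we will use matplotlib.pyplot to plot `y` against `x`. We use the command `plt.plot(x, y)`. In total, your entire program should look like this:

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(0, 2 * np.pi, 100)  # 100 evenly spaced points from 0 to 2*pi
    y = np.sin(x)                       # sine of every point in x
    plt.plot(x, y)                      # draw y against x
    plt.show()                          # open the figure if it hasn't appeared already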

and this will return a delightful picture showing off the sine curve. Good job!
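If you want to convince yourself the picture is right, a few numbers can back it up. This sketch (with made-up names `i_max` and `i_min`) finds where the curve is highest and lowest; we'd expect the peak near $\frac{\pi}{2}$ and the trough near $\frac{3\pi}{2}$, with the curve starting and ending at roughly zero.

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

i_max = np.argmax(y)   # position of the largest value in y
i_min = np.argmin(y)   # position of the smallest value in y

print(x[i_max], y[i_max])   # should be close to pi/2 and 1
print(x[i_min], y[i_min])   # should be close to 3*pi/2 and -1
print(y[0], y[-1])          # both ends of the curve, close to 0
```

The peak won't land exactly on $\frac{\pi}{2}$ because that value isn't one of our 100 points; we get the nearest point instead.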

Save your previous program and start a new file. We'll start by importing the same two things as before.

    import numpy as np
    import matplotlib.pyplot as plt

Getting normal-shaped (bell-shaped) data is easy in numpy! To get, say, 100 random numbers from the standard normal distribution (mean 0, standard deviation 1) we have the `random.randn` command. The first "random" is because numpy has a lot of random commands that we can use; the "randn" tells it which particular random command we'd like to use. Let's use this command and save the result as a variable `N`:

    N = np.random.randn(100)
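For a quick look at what came back, here is a small sketch that checks the size, average and spread of the sample. The call to `np.random.seed` is an addition of mine so that the same "random" numbers come out on every run; leave it out if you want fresh numbers each time.

```python
import numpy as np

np.random.seed(0)          # optional: makes the run repeatable
N = np.random.randn(100)   # 100 draws from the standard normal distribution

print(len(N))     # how many numbers we got
print(N.mean())   # sample average, should be somewhere near 0
print(N.std())    # sample spread, should be somewhere near 1
```

With only 100 numbers, the average and spread will be near 0 and 1 but not right on them.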

The 100 tells numpy that we want 100 of these random numbers. At this point, we can use the histogram command from matplotlib.pyplot:

    plt.hist(N, bins=10)

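In total, your entire program should look like this:

    import numpy as np
    import matplotlib.pyplot as plt

    N = np.random.randn(100)  # 100 draws from the standard normal distribution
    plt.hist(N, bins=10)      # histogram with 10 bins
    plt.show()                # open the figure if it hasn't appeared already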

So run it and check out that histogram. If we want it to look a bit nicer, we can try to generate 10,000 random numbers (what would we change?).
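If you're curious about the numbers behind the bars, numpy can do the counting on its own with `np.histogram`. The sketch below uses it on the same kind of data; as far as I know this is the same binning that `plt.hist` performs before drawing, but treat that as an assumption rather than a promise. The seed is only there so the run is repeatable.

```python
import numpy as np

np.random.seed(0)          # optional: makes the run repeatable
N = np.random.randn(100)

# counts[i] is how many numbers fell between edges[i] and edges[i + 1].
counts, edges = np.histogram(N, bins=10)

print(counts)         # height of each of the 10 bars
print(edges)          # the 11 boundaries between the bars
print(counts.sum())   # every number lands in exactly one bin
```

Ten bins need eleven edges, and by default the first and last edge sit at the smallest and largest number in the data.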

Okay, well, that's not *everything*, but this is a good start: we know that numpy and matplotlib.pyplot are neat and they have a lot to offer. They're also relatively simple to use. This section was essentially to move you gently towards making bigger and more powerful programs — and, just as important, to see if Python and friends have been set up correctly.
