User Tools

Site Tools


chapter4

Chapter4

How to start python at chimera server

Please enter 'python2' to start python version 2.

python2

4.1. NumPy ndarray | 多次元配列オブジェクト

call numpy module

import numpy as np

4.1.1 Create ndarry

data1 = [6,7.5,8,0,1]
arr1 = np.array(data1)

Difference between ndarry and normal array

data1 * 2   # [6, 7.5, 8, 0, 1, 6, 7.5, 8, 0, 1]
arr1 * 2    # array([ 12.,  15.,  16.,   0.,   2.])

data1 + data1   # [6, 7.5, 8, 0, 1, 6, 7.5, 8, 0, 1]
arr1 + arr1     # array([ 12.,  15.,  16.,   0.,   2.])

Information of ndarry

data2 = [[1,2,3,4],[5,6,7,8]]
arr2 = np.array(data2)
arr2.ndim   # Number of array dimensions
arr2.shape  # Tuple of array dimensions
arr2.dtype  # Create a data type object
arr1.dtype  # Data type is automatically created suitable type

Other ways

np.zeros(10)
np.zeros((3,6))
np.empty
np.arange(15)

4.1.2 Data type of elements of ndarry | ndarry要素のデータ型

A data type object (an instance of numpy.dtype class) describes how the bytes in the fixed-size block of memory corresponding to an array item should be interpreted.

arr1 = np.array([1,2,3], dtype=np.float64)
arr1.dtype   # dtype('float64')
arr2 = np.array([1,2,3], dtype=np.int32)
arr2.dtype   # dtype('int32')

Change (or cast) dtype of array

float_arr = arr2.astype(np.float64)
float_arr.dtype   # dtype('float64')

Change float to integer

float_arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
float_arr.astype(np.int32)   # array([ 3, -1, -2,  0, 12, 10], dtype=int32)

Use dtype of another ndarray

int_arr = np.arange(10)
float_arr = np.array([.22, .270, .357], dtype=np.float64)
int_arr.astype(float_arr.dtype)   # array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])

# astype: copy of the array, cast to a specified type.

4.1.3 Calculation of ndarry | ndarryとスカラーの計算

For two ndarray with same size, it calculate element of same position.

arr = np.array([[1.,2.,3.],[4.,5.,6.]])
arr * arr
arr - arr
1 / arr
arr ** 0.5

4.1.4 Indexing and basics of slicing | インデックス参照とスライシングの基礎

NumPy array indexing is a rich topic, as there are many ways you may want to select a subset of your data or individual elements. One-dimensional arrays are simple; on the surface they act similarly to Python lists

arr = np.arange(10)
arr[5]   # 5
arr[5:8]  # array([5, 6, 7])

Broad cast (Substitution of values)

arr[5:8] = 12
arr    # array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9]) Three elements are replaced to '12'

Slice is not copy. Change of slice same as change of original ndarry.

arr_slice = arr[5:8]
arr_slice[1] = 12345
arr   # array([    0,     1,     2,     3,     4,    12, 12345,    12,     8,     9])
arr_slice[:]=64
arr   # array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])

To copy slice of narry

arr_slice=arr[5:8].copy()

For multi-dimension

arr2d = np.array([[0,1,2,3,4,5],[10,11,12,13,14,15],[20,21,22,23,24,25],[30,31,32,33,34,35],[40,41,42,43,44,45],[50,51,52,53,54,55]])
arr2d[1]   # array([10, 11, 12, 13, 14, 15])
arr2d[0][2]   # 2
arr2d[0,2]   # 2
arr3d = np.array([[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]])   # 2 x 3
arr3d[0]   # array([[1, 2, 3],[4, 5, 6]])
arr3d[1,0]   # array([7, 8, 9])

Slice of multi-dimension

arr2d[0,3:5]
arr2d[4:,4:]
arr2d[:,2]
arr2d[2::2,::2]

# Comment left means inclusive

right means exclusive

4.1.5 Boolean Indexing | プールインデックス参照

random: to generate some random normally distributed data

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data_random = randn(7,4)   !!! NOT WORKING !!!
data_random = np.random.randn(7,4)

Check whether elements of names is 'Bob'

names == 'Bob'   # array([ True, False, False,  True, False, False, False], dtype=bool)

Extract whether rows of data matched to position of 'Bob' (0 and 3)

data_int = np.array([[00,01,02,03],[10,11,12,13],[20,21,22,23],[30,31,32,33],[40,41,42,43],[50,51,52,53],[60,61,62,63]])
data_int[names == 'Bob']   # array([[ 0,  1,  2,  3],[30, 31, 32, 33]])
data_int[names == 'Bob', 2:]   # array([[ 2,  3],[32, 33]])
data_int[names == 'Bob', 3]   # array([ 3, 33])

Check whether elements of names is not 'Bob'

names != 'Bob'   # array([False,  True,  True, False,  True,  True,  True], dtype=bool)
data_int[-(names == 'Bob')]   # array([[10, 11, 12, 13],[20, 21, 22, 23],[40, 41, 42, 43],[50, 51, 52, 53],[60, 61, 62, 63]])

Select two elements from names

mask = (names == 'Bob') | (names == 'Will')
mask   # array([ True, False,  True,  True,  True, False, False], dtype=bool)
data_int[mask]   # array([[ 0,  1,  2,  3],[20, 21, 22, 23],[30, 31, 32, 33],[40, 41, 42, 43]])

Substitution of values using true/false array

data_random
data_random[data_random < 0] = 0
data_random

data_int
data_int [names != 'Joe'] = 7
data_int  # array([[ 7,  7,  7,  7],[10, 11, 12, 13],[ 7,  7,  7,  7],[ 7,  7,  7,  7],[ 7,  7,  7,  7],[50, 51, 52, 53],[60, 61, 62, 63]])

4.1.6 Fancy indexing | ファンシーインデックス参照

arr = np.empty((8,4))
for i in range(8):
 arr[i] = i
arr
arr[[4,3,9,6]]
arr[[-3,-5,-7]]

arr = np.arange(32).reshape((8,4))
arr
arr[[1,5,7,2],[0,3,1,2]]   # array([ 4, 23, 29, 10])  = Position of (1,0),(5,3),(7,1),(2,2)

To get elements of row 1,5,7,2 in order of 0,3,1,2

arr[[1,5,7,2]][:,[0,3,1,2]]
arr[np.ix_([1,5,7,2],[0,3,1,2])]

4.1.7 Transpose of matrix | 転置行列、行と列の入れ替え

Transpose of matrix

arr.T

Inner product | 内積 

np.dot(arr.T, arr) # 2240 = 0+4*4+8*8+12*12+16*16+20*20+24*24+28*28

Transpose: Permute the dimensions of an array.

arr = np.arange(16).reshape((2,2,4))
arr.transpose((1,0,2))

#other example
x = np.arange(4).reshape((2,2))
np.transpose(x)

swapaxes(a, axis1, axis2): Interchange two axes of an array.

arr.swapaxes(1,2)

4.2 Universal function | ユニバーサル関数

Example of monomial ufnc. Using one ndarray | 単項ufnc

arr = np.arange(10)
arr   # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

sqrt: square | 平方根

np.sqrt(arr)

exp: exponential | 指数

np.exp(arr)

Example of binomial ufunc. Using two ndarry | 二項ufnc

maximum: get maximum element by comparing value of each elements

x = np.arange(10)   # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([6,7,5,8,0,4,1,2,9,3])   # array([6, 7, 5, 8, 0, 4, 1, 2, 9, 3])
np.maximum(x,y)   # array([6, 7, 5, 8, 4, 5, 6, 7, 9, 9])

subtract: subtract between 1st ndarray and 2nd ndarray

np.subtract(x,y)  # array([-6, -6, -3, -5,  4,  1,  5,  5, -1,  6])

4.3 Data processing using ndarray | ndarrayを用いたデータ処理

This practice of replacing explicit loops with array expressions is commonly referred to as vectorization. In general, vectorized array operations will often be one or two (or more) orders of magnitude faster than their pure Python equivalents.

Let's evaluate the function sqrt(x^2 + y^2) across a regular grid of values (Example from http://d.hatena.ne.jp/y_n_c/20091122/1258904025)

x = np.arange(5)
y = np.arange(5)
xs, ys = np.meshgrid(x, y)  # Convert to two dimension array
xs   # array([[0, 1, 2, 3, 4],[0, 1, 2, 3, 4],[0, 1, 2, 3, 4],[0, 1, 2, 3, 4],[0, 1, 2, 3, 4]])
ys   # array([[0, 0, 0, 0, 0],[1, 1, 1, 1, 1],[2, 2, 2, 2, 2],[3, 3, 3, 3, 3],[4, 4, 4, 4, 4]])
z = np.sqrt(xs ** 2 + ys ** 2)
z

Visualize values of sqrt(x^2 + y^2) using matplotlib

import matplotlib.pyplot as plt
plt.imshow(z, cmap=plt.cm.gray); plt.colorbar()
!!! _tkinter.TclError: no display name and no $DISPLAY environment variable !!!
plt.title("Image plot of $\sqrt{x^2 + y^2}$ for a grid of values")
!!! _tkinter.TclError: no display name and no $DISPLAY environment variable !!!

Save a graph as PDF file (commented by Nakagawa)

Please install matplotlib beforehand (not to Chimera):

e.g.: sudo easy_install -m matplotlib

Then, to show the plot, add the following command:

plt.show()

For PDF output:

import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
pp = PdfPages('test.pdf')  
plt.imshow(z, cmap=plt.cm.gray); plt.colorbar()
plt.title("Image plot of $\sqrt{x^2 + y^2}$ for a grid of values")    
pp.savefig()
pp.close()

Save a graph as PNG file

import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
plt.imshow(z, cmap=plt.cm.gray); plt.colorbar()
plt.title("Image plot of $\sqrt{x^2 + y^2}$ for a grid of values")
plt.savefig('myfig1.png')

Save a graph as PNG file (Original data from book)

points = np.arange(-5, 5, 0.01) 
xs, ys = np.meshgrid(points, points)
z = np.sqrt(xs ** 2 + ys ** 2)

import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
plt.imshow(z, cmap=plt.cm.gray); plt.colorbar()
plt.title("Image plot of $\sqrt{x^2 + y^2}$ for a grid of values")
plt.savefig('myfig2.png')

4.3.1 Expressing Conditional Logic as Array Operations | 条件制御のndarrayでの表現

Get a value from xarr whenever the corresponding value in cond is 'True' otherwise take the value from yarr.

xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

where(condition[, x, y]): Return elements, either from x or y, depending on condition.

result = np.where(cond, xarr, yarr)   # array([ 1.1,  2.2,  1.3,  1.4,  2.5])

Other example

arr = np.random.randn(4, 4)
np.where(arr > 0, 2, -2)  # set positive values to 2 and negative value to -2 
np.where(arr > 0, 2, arr)  # set only positive values to 2

Nested example

cond1 = np.array([True, False, True, True, False])
cond2 = np.array([True, True, True, False, False])      
np.where(cond1 & cond2, 0,  
       np.where(cond1, 1,   
                np.where(cond2, 2, 3))) 
# array([0, 2, 0, 1, 3])  
# cond2 => [2, 2, 2, 3, 3] => [1, 2, 1, 1, 3] => [0, 2, 0, 1, 3]

Mathematical and Statistical Methods

arr = np.arange(32).reshape((8,4))

Mean | 平均

arr.mean()
np.mean(arr)

Sum | 合計

arr.sum()

Optional axis argument

arr.mean(axis=1)  # mean of rows
arr.sum(0)  # sum of columns

cumsum: Cumulative sum of elements starting from 0 cumprod: Cumulative product of elements starting from 1

arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
arr.cumsum(0)
arr.cumprod(1)

Methods for Boolean Arrays

For boolean array, sum is often used as a means of counting True values in a boolean array

arr = np.random.randn(100)
(arr > 0).sum() # Number of positive values

any: tests whether one or more values in an array is True all: checks if every value is True

bools = np.array([False, False, True, False])
bools.any()
bools.all()

Sorting

sort: Return a sorted copy of an array.

arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
arr.sort()

Multidimensional arrays can have each 1D section of values sorted in-place along an axis by passing the axis number to sort

arr = np.random.randn(5, 3)
arr.sort(1)

Unique and Other Set Logic

unique: Find the unique elements of an array.

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
np.unique(names)

ints = np.array([3, 3, 3, 2, 2, 1, 1, 4, 4])
np.unique(ints)

in1d: Test whether each element of a 1-D array is also present in a second array.

values = np.array([6, 0, 0, 3, 2, 5, 6])
np.in1d(values, [2, 3, 6])

4.4 File Input and Output with Arrays

NumPy is able to save and load data to and from disk either in text or binary format. In later chapters you will learn about tools in pandas for reading tabular data into memory.

Storing Arrays on Disk in Binary Format

save: Save an array to a binary file in NumPy .npy format.

arr = np.arange(10)
np.save('some_array', arr)
np.load('some_array.npy')

savez: Save several arrays into a single file in uncompressed .npz format.

np.savez('array_archive.npz', a=arr, b=arr)
arch = np.load('array_archive.npz')
arch['b']

Saving and Loading Text Files

Create test csv file

exit()  # exit python
touch array_ex.txt
vi array_ex.txt

# copy following values
0.580052,0.186730,1.040717,1.134411
0.194163,-0.636917,-0.938659,0.124094
-0.126410,0.268607,-0.695724,0.047428
-1.484413,0.004176,-0.744203,0.005487
2.302869,0.200131,1.670238,-1.881090
-0.193230,1.047233,0.482803,0.960334

# start python2
python2 
import numpy as np

loadtext: Load data from a text file.

arr = np.loadtxt('array_ex.txt', delimiter=',')
arr

4.5 Linear Algebra

Linear algebra, like matrix multiplication, decompositions, determinants, and other square matrix math, is an important part of any array library. There is a function dot, both an array method, and a function in the numpy namespace, for matrix multiplication.

x = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[6., 23.], [-1, 7], [8, 9]])
x
y

dot: Dot product of two arrays. | 内積

x.dot(y)  # equivalently np.dot(x, y)
np.dot(x, np.ones(3))  # A matrix product between a 2D array and a suitably sized 1D array results in a 1D array

numpy.linalg has a standard set of matrix decompositions and things like inverse and determinant. These are implemented under the hood using the same industry-standard Fortran libraries used in other languages like MATLAB and R, such as like BLAS, LAPACK, or possibly (depending on your NumPy build) the Intel MKL. List of linalg

from numpy.linalg import inv, qr # import libraries
X = np.arange(25).reshape((5,5))
mat = X.T.dot(X)

inv: Compute the inverse of a square matrix

inv(mat)

qr: Compute the QR decomposition

q, r = qr(mat)

4.6 Random Number Generation

The numpy.random module supplements the built-in Python random with functions for efficiently generating whole arrays of sample values from many kinds of probability distributions. For example, you can get a 4 by 4 array of samples from the standard normal distribution using normal

samples = np.random.normal(size=(4, 4))

4.7 Example: Random Walks

An illustrative application of utilizing array operations is in the simulation of random walks. Let’s first consider a simple random walk starting at 0 with steps of 1 and -1 occurring with equal probability.

A pure Python way to implement a single random walk with 1,000 steps using the built-in random module:

import random
position = 0
walk = [position]
steps = 1000
for i in xrange(steps):
  step = 1 if random.randint(0, 1) else -1
  position += step
  walk.append(position)
  

To draw 1,000 coin flips at once, set these to 1 and -1, and compute the cumulative sum:

nsteps = 1000
draws = np.random.randint(0, 2, size=nsteps)
steps = np.where(draws > 0, 1, -1)
walk = steps.cumsum()

walk.min() # minimum and maximum value along the walk’s trajectory
walk.max()

A more complicated statistic is the first crossing time, the step at which the random walk reaches a particular value. Here we might want to know how long it took the random walk to get at least 10 steps away from the origin 0 in either direction. np.abs(walk) >= 10 gives us a boolean array indicating where the walk has reached or exceeded 10, but we want the index of the first 10 or -10. Turns out this can be computed using argmax, which returns the first index of the maximum value in the boolean array (True is the maximum value)

argmax: Indices of the maximum values along an axis.

(np.abs(walk) >= 10).argmax()

Simulating Many Random Walks at Once

If your goal was to simulate many random walks, say 5,000 of them, you can generate all of the random walks with minor modifications to the above code. The numpy.random functions if passed a 2-tuple will generate a 2D array of draws, and we can compute the cumulative sum across the rows to compute all 5,000 random walks in one shot:

nwalks = 5000
nsteps = 1000
draws = np.random.randint(0, 2, size=(nwalks, nsteps)) # 0 or 1
steps = np.where(draws > 0, 1, -1)
walks = steps.cumsum(1)
walks

walks.max() 
walks.min()

To compute the minimum crossing time to 30 or -30. (not all 5,000 of them reach 30.)

hits30 = (np.abs(walks) >= 30).any(1)
hits30
hits30.sum() # Number that hit 30 or -30

To select out the rows of walks that actually cross the absolute 30 level and call argmax across axis 1 to get the crossing times

crossing_times = (np.abs(walks[hits30]) >= 30).argmax(1)
crossing_times.mean()

To generate normally distributed steps with some mean and standard deviation

steps = np.random.normal(loc=0, scale=0.25,
      size=(nwalks, nsteps))
chapter4.txt · Last modified: 2014-06-09 08:00 by yoko_nagai