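6. Functions of the pandas Series

A pandas Series has many attributes and functions. Functions generally produce their result in one of two ways: in-place or copy. An in-place function modifies the Series itself, whereas a copy function returns a new Series and leaves the original untouched.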

6.1 The get and get_value functions

Start with get: it returns the value stored under the given key, and if the key does not exist it returns the value of default.

>>> import pandas as pd
>>> help(pd.Series.get)
get(self, key, default=None) unbound pandas.core.series.Series method
    Get item from object for given key (DataFrame column, Panel slice, etc.).
    Returns default value if not found

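The get_value function, by contrast, only returns the value stored under the key and raises an exception if the key does not exist, so get is the safer choice. Note that get_value was deprecated in pandas 0.21 and removed in pandas 1.0; on current versions, t.at[key] or t.loc[key] gives the same raise-on-missing behavior.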

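import pandas as pd
idx = "hello the cruel world".split()
val = [1, 21, 13, 104]
t = pd.Series(val, index=idx)

print(t.get("the"))
print(t.get_value("the"))     # needs pandas < 1.0, where get_value still exists

print(t.get("The", "None"))   # "The" is not a label, so the default "None" is returned
#print(t.get_value("The"))    # would raise an exception: the key does not exist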
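Because get_value is no longer available in current pandas releases, the same contrast can be sketched with get and the label accessor .at. This is a minimal illustration, assuming pandas 1.0 or later; the fallback value -1 is chosen only for the example.

```python
import pandas as pd

t = pd.Series([1, 21, 13, 104], index="hello the cruel world".split())

# get() returns the fallback instead of raising when the label is missing.
found = t.get("the")
missing = t.get("The", -1)   # labels are case-sensitive, so "The" is absent

# .at looks up a single label and raises KeyError when it is missing,
# which is the behaviour the text attributes to get_value.
try:
    t.at["The"]
    raised = False
except KeyError:
    raised = True

print(found, missing, raised)
```

When a missing key is an expected case, get with an explicit default avoids the try/except altogether.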

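6.2 The add and append functions

Both add and append combine two Series and return a new Series without modifying either input: add behaves like arithmetic addition, while append concatenates.

The add function adds another Series (the other argument) to a Series: values that share the same index label are summed. A scalar can be added in the same way.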

import pandas as pd
idx = "hello the cruel world".split()
val = [1, 21, 13, 104]
t = pd.Series(val, index=idx)
val = [4, 4, 4, 4]
s = pd.Series(val, index=idx)
print(t.add(s))   # element-wise sum, matched by index label
print(t + 4)      # adding a scalar gives the same result here

The add function is equivalent to the arithmetic + operator.
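One difference shows up when the two Series do not share every label. The sketch below uses two small Series invented for the illustration: with +, a label present on only one side yields NaN, while add with its fill_value parameter substitutes a value for the missing side first.

```python
import pandas as pd

# Two Series whose indexes only partly overlap (made up for this example).
t = pd.Series([1, 21, 13], index=["hello", "the", "cruel"])
s = pd.Series([4, 4, 4], index=["the", "cruel", "world"])

plain = t + s                     # "hello" and "world" become NaN
filled = t.add(s, fill_value=0)   # the missing side counts as 0

print(plain)
print(filled)
```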

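The append function resembles the append method of a list in that it attaches another Series to the end of a Series, but it returns a new Series rather than modifying the original. Series.append was deprecated in pandas 1.4 and removed in pandas 2.0; pd.concat is its replacement.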

import pandas as pd
idx = "hello the cruel world".split()
val = [1, 21, 13, 104]
t = pd.Series(val, index=idx)
val = [4, 4, 4, 4]
s = pd.Series(val, index=idx)
print(t, "<- t")
print(t.append(s), "<- append")   # needs pandas < 2.0, where Series.append still exists
print(t, "<- t")                  # unchanged: append returned a new Series
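On pandas 2.0 and later, where Series.append is gone, the same concatenation can be written with pd.concat. A minimal sketch using the same data; ignore_index is an optional extra shown only to illustrate renumbering.

```python
import pandas as pd

idx = "hello the cruel world".split()
t = pd.Series([1, 21, 13, 104], index=idx)
s = pd.Series([4, 4, 4, 4], index=idx)

joined = pd.concat([t, s])                          # 8 rows, labels repeat
renumbered = pd.concat([t, s], ignore_index=True)   # fresh integer index

print(joined)
print(renumbered)
```

As with append, neither input is modified. The result can hold duplicate labels, so a lookup such as joined["the"] returns two values.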

6.3 The count function

The count function returns the number of non-NaN values in a Series.

import pandas as pd
idx = "hello the cruel world".split()
val = [1, 21, None, 104]   # None is stored as NaN
t = pd.Series(val, index=idx)
print(t, "<- t")
print(t.count(), "<- t.count()")
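To make the distinction concrete, the following sketch compares count with the total length of the same Series. The variable names are chosen for the illustration.

```python
import pandas as pd

t = pd.Series([1, 21, None, 104], index="hello the cruel world".split())

total = t.size             # every slot, including the missing one
non_missing = t.count()    # NaN is skipped
missing = t.isna().sum()   # number of NaN entries

print(total, non_missing, missing)
```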

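6.4 The sort_index and sort_values functions

The sort_index function sorts a Series by its index. The inplace parameter defaults to False, so it returns a new Series and leaves the original unchanged.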

import pandas as pd
idx = "hello the cruel world".split()
val = [1, 21, None, 104]
t = pd.Series(val, index=idx)
print(t.sort_index(), "<- t.sort_index()")
print(t, "<- t")

Output:

cruel    NaN
hello      1
the       21
world    104
dtype: float64 <- t.sort_index()
hello      1
the       21
cruel    NaN
world    104
dtype: float64 <- t

The sort_values function, in turn, sorts by value. Its inplace parameter also defaults to False, so it returns the sorted Series and leaves the original unchanged.

import pandas as pd
idx = "hello the cruel world".split()
val = [1000, 201, None, 104]
t = pd.Series(val, index=idx)
print(t, "<- t")
print(t.sort_values(), "<- t.sort_values()")
print(t, "<- t")

To modify the original Series instead, set the function's inplace parameter to True.

import pandas as pd
idx = "hello the cruel world".split()
val = [1000, 201, None, 104]
t = pd.Series(val, index=idx)
print(t, "<- t")
t.sort_index(inplace=True)    # reorders t itself by label
print(t, "<- after t.sort_index(inplace=True)")
t.sort_values(inplace=True)   # reorders t itself by value, NaN last
print(t, "<- after t.sort_values(inplace=True)")

Program output:

hello    1000
the       201
cruel     NaN
world     104
dtype: float64 <- t
cruel     NaN
hello    1000
the       201
world     104
dtype: float64 <- after t.sort_index(inplace=True)
world     104
the       201
hello    1000
cruel     NaN
dtype: float64 <- after t.sort_values(inplace=True)
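The output above places NaN last. As a small sketch of two other sort_values parameters not used in the text, ascending reverses the order and na_position controls where NaN ends up.

```python
import pandas as pd

t = pd.Series([1000, 201, None, 104], index="hello the cruel world".split())

descending = t.sort_values(ascending=False)      # largest first, NaN still last
nan_first = t.sort_values(na_position="first")   # NaN moved to the front

print(descending)
print(nan_first)
```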

6.5 The reset_index function

Many Series operations depend on the index, so it is worth knowing how reset_index replaces a Series's index with the default integer index.

import pandas as pd
idx = "hello the cruel world".split()
val = [1000, 201, None, 104]
t = pd.Series(val, index=idx)
print(t, "<- t")
print(t.reset_index(), "<- reset_index")                      # old index kept as a column
print(t.reset_index(drop=True), "<- reset_index(drop=True)")  # old index discarded
print(t, "<- t")

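Pass drop=True to discard the old index; without it, the old index is kept as a column and the result is a DataFrame rather than a Series. Pass inplace=True to modify the original Series directly, which for a Series is only allowed together with drop=True.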
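The two return types can be checked directly. A minimal sketch using the same data; the names as_frame and as_series are chosen for the illustration.

```python
import pandas as pd

idx = "hello the cruel world".split()
t = pd.Series([1000, 201, None, 104], index=idx)

as_frame = t.reset_index()             # DataFrame: old labels become a column
as_series = t.reset_index(drop=True)   # Series with a fresh integer index

print(type(as_frame).__name__, as_frame.shape)
print(type(as_series).__name__, list(as_series.index))
```

For an unnamed Series the old labels land in a column called "index", so as_frame["index"] recovers them.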

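6.6 The reindex function

The reindex function conforms a Series to a new index. Labels that exist in the original Series keep their values; labels in the new index that are missing from the original are filled with NaN, or with the value given by the fill_value parameter.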

import pandas as pd
idx = "hello the cruel world".split()
val = [1000, 201, None, 104]
t = pd.Series(val, index=idx)
idn = "hello python nice world".split()
print(t, "<- t")
print(t.reindex(idn), "<- reindex")
print(t.reindex(idn, fill_value=-1), "<- reindex")
print(t, "<- t")

The output is as follows:

hello    1000
the       201
cruel     NaN
world     104
dtype: float64 <- t
hello     1000
python     NaN
nice       NaN
world      104
dtype: float64 <- reindex
hello     1000
python      -1
nice        -1
world      104
dtype: float64 <- reindex
hello    1000
the       201
cruel     NaN
world     104
dtype: float64 <- t
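A side effect worth noting is the dtype of the result. The sketch below swaps in an all-integer Series (an assumption for the illustration, since the data above already contains NaN): filling with NaN forces a float dtype, while an integer fill_value lets the result stay integer.

```python
import pandas as pd

idx = "hello the cruel world".split()
idn = "hello python nice world".split()
t = pd.Series([1000, 201, 13, 104], index=idx)   # integer values, no NaN

with_nan = t.reindex(idn)                 # new labels get NaN, dtype becomes float
with_zero = t.reindex(idn, fill_value=0)  # new labels get 0, dtype stays integer

print(with_nan.dtype, with_zero.dtype)
```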
