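25. Concatenating pandas data: the concat function
pandas provides the concat function, which joins the pandas objects in the list passed as its argument into one larger object.
- Concatenating two Series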
import pandas as pd
import numpy as np
s1 = pd.Series(np.arange(2,6))
s2 = pd.Series(np.arange(8,12))
ss = pd.concat([s1, s2])
print(ss)
Program output:
0     2
1     3
2     4
3     5
0     8
1     9
2    10
3    11
dtype: int64
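The output above keeps each input's original index, so the labels 0 to 3 appear twice. The following is a minimal sketch, assuming a fresh consecutive index is wanted, that uses concat's ignore_index parameter. The name ss_reset is only illustrative.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.arange(2, 6))
s2 = pd.Series(np.arange(8, 12))

# Default behaviour: the original index labels are kept, so they repeat.
ss = pd.concat([s1, s2])
print(ss.index.tolist())        # [0, 1, 2, 3, 0, 1, 2, 3]

# ignore_index=True discards the old labels and renumbers the result.
ss_reset = pd.concat([s1, s2], ignore_index=True)
print(ss_reset.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Renumbering is usually only appropriate when the original labels carry no meaning, because they are lost in the result.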
- Concatenating two DataFrames
1). When the row labels (index) and the columns are identical:
import pandas as pd
import numpy as np
col = "hello the cruel world".split()
idx = ["a", "b", "c", "d"]
val1 = np.arange(16).reshape(4, 4)
val2 = np.arange(20, 36).reshape(4, 4)
df1 = pd.DataFrame(val1, index = idx, columns = col)
print(df1)
df2 = pd.DataFrame(val2, index = idx, columns = col)
print(df2)
df12 = pd.concat([df1, df2])
print(df12)
Program output:
   hello  the  cruel  world   # print(df1)
a      0    1      2      3
b      4    5      6      7
c      8    9     10     11
d     12   13     14     15
   hello  the  cruel  world   # print(df2)
a     20   21     22     23
b     24   25     26     27
c     28   29     30     31
d     32   33     34     35
   hello  the  cruel  world   # print(df12)
a      0    1      2      3
b      4    5      6      7
c      8    9     10     11
d     12   13     14     15
a     20   21     22     23
b     24   25     26     27
c     28   29     30     31
d     32   33     34     35
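In the combined frame the row labels a to d occur twice, so a label alone no longer identifies the source of a row. One possible way to keep track of the source, sketched here, is concat's keys parameter, which adds an outer index level. The key names "first" and "second" are arbitrary choices for this illustration.

```python
import numpy as np
import pandas as pd

col = "hello the cruel world".split()
idx = ["a", "b", "c", "d"]
df1 = pd.DataFrame(np.arange(16).reshape(4, 4), index=idx, columns=col)
df2 = pd.DataFrame(np.arange(20, 36).reshape(4, 4), index=idx, columns=col)

# keys adds an outer index level naming the source of each block of rows.
df12 = pd.concat([df1, df2], keys=["first", "second"])

print(df12.shape)                       # (8, 4)
print(df12.loc["second"].equals(df2))   # True
print(df12.loc[("first", "b"), "the"])  # 5
```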
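2). Concatenating DataFrames is more complicated when the labels and columns do not correspond one to one. In that case, positions under a label or column that has no match in the other DataFrame are filled with NaN.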
import pandas as pd
import numpy as np
col1 = "hello the cruel world".split()
col2 = "hello the nice world".split()
idx1 = ["a", "b", "c", "d"]
idx2 = ["a", "b", "d", "e"]
val1 = np.arange(16).reshape(4, 4)
val2 = np.arange(20, 36).reshape(4, 4)
df1 = pd.DataFrame(val1, index = idx1, columns = col1)
print(df1)
df2 = pd.DataFrame(val2, index = idx2, columns = col2)
print(df2)
df12 = pd.concat([df1, df2])
print(df12)
Program output:
   hello  the  cruel  world   # print(df1)
a      0    1      2      3
b      4    5      6      7
c      8    9     10     11
d     12   13     14     15
   hello  the  nice  world   # print(df2)
a     20   21    22     23
b     24   25    26     27
d     28   29    30     31
e     32   33    34     35
   cruel  hello  nice  the  world   # print(df12)
a      2      0   NaN    1      3
b      6      4   NaN    5      7
c     10      8   NaN    9     11
d     14     12   NaN   13     15
a    NaN     20    22   21     23
b    NaN     24    26   25     27
d    NaN     28    30   29     31
e    NaN     32    34   33     35
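The combined output above lists the columns alphabetically and fills the unmatched ones with NaN. The sketch below, using concat's join and sort parameters, makes both choices explicit. The default column order has differed between pandas releases, so passing sort explicitly is one way to get a predictable result. The names outer and inner are illustrative only.

```python
import numpy as np
import pandas as pd

col1 = "hello the cruel world".split()
col2 = "hello the nice world".split()
df1 = pd.DataFrame(np.arange(16).reshape(4, 4), index=list("abcd"), columns=col1)
df2 = pd.DataFrame(np.arange(20, 36).reshape(4, 4), index=list("abde"), columns=col2)

# join="outer" (the default) keeps the union of the columns;
# sort=True orders them alphabetically.
outer = pd.concat([df1, df2], join="outer", sort=True)
print(outer.columns.tolist())    # ['cruel', 'hello', 'nice', 'the', 'world']

# A column that received NaN can no longer hold integers, so it becomes float.
print(outer["cruel"].dtype)      # float64

# join="inner" keeps only the columns present in both inputs, so no NaN appears.
inner = pd.concat([df1, df2], join="inner")
print(sorted(inner.columns))     # ['hello', 'the', 'world']
print(inner.isna().any().any())  # False
```

Because of the conversion to float, recent pandas versions print the values in the cruel and nice columns with a decimal point (for example 2.0).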
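- Choosing the concatenation axis: by default (axis=0) concat stacks the inputs vertically, appending rows and aligning on the column names. Set the axis parameter to 1 to place the inputs side by side instead, aligning on the row labels.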
import pandas as pd
import numpy as np
col1 = "hello the cruel world".split()
col2 = "hello the nice world".split()
idx1 = ["a", "b", "c", "d"]
idx2 = ["a", "b", "d", "e"]
val1 = np.arange(16).reshape(4, 4)
val2 = np.arange(20, 36).reshape(4, 4)
df1 = pd.DataFrame(val1, index = idx1, columns = col1)
print(df1)
df2 = pd.DataFrame(val2, index = idx2, columns = col2)
print(df2)
df12 = pd.concat([df1, df2], axis = 1)
print(df12)
Program output:
   hello  the  cruel  world   # print(df1)
a      0    1      2      3
b      4    5      6      7
c      8    9     10     11
d     12   13     14     15
   hello  the  nice  world   # print(df2)
a     20   21    22     23
b     24   25    26     27
d     28   29    30     31
e     32   33    34     35
   hello  the  cruel  world  hello  the  nice  world   # print(df12)
a      0    1      2      3     20   21    22     23
b      4    5      6      7     24   25    26     27
c      8    9     10     11    NaN  NaN   NaN    NaN
d     12   13     14     15     28   29    30     31
e    NaN  NaN    NaN    NaN     32   33    34     35
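With axis=1 the rows c and e each exist in only one input, which is why they contain NaN above. As a minimal sketch, assuming only the rows shared by both inputs are wanted, join="inner" can be combined with axis=1. The variable name side is illustrative.

```python
import numpy as np
import pandas as pd

col1 = "hello the cruel world".split()
col2 = "hello the nice world".split()
df1 = pd.DataFrame(np.arange(16).reshape(4, 4), index=list("abcd"), columns=col1)
df2 = pd.DataFrame(np.arange(20, 36).reshape(4, 4), index=list("abde"), columns=col2)

# axis=1 aligns on the row labels; join="inner" keeps only labels found in both.
side = pd.concat([df1, df2], axis=1, join="inner")
print(side.index.tolist())     # ['a', 'b', 'd']
print(side.shape)              # (3, 8)
print(side.loc["d"].tolist())  # [12, 13, 14, 15, 28, 29, 30, 31]
```

Note that the result contains repeated column names (hello, the, world), so selecting one of those names returns more than one column.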