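35. Time Series Data in Pandas: date_range Parameters in Detail
Earlier chapters gave a brief demonstration of the basic use of the date_range function. This chapter demonstrates its parameters, chiefly freq, in more detail.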
- freq = "T" generates the time series at one-minute intervals (frequency); it is equivalent to "min".
import numpy as np
import pandas as pd
# Five timestamps one minute apart, using the "T" alias
cur0 = pd.date_range('2018-12-16 18:30:34', periods=5, freq='T')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)
# The same index, using the equivalent "min" alias
cur0 = pd.date_range('2018-12-16 18:30:34', periods=5, freq='min')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)
Output:
2018-12-16 18:30:34 0.489893
2018-12-16 18:31:34 0.000442
2018-12-16 18:32:34 -0.465273
2018-12-16 18:33:34 -0.173814
2018-12-16 18:34:34 -0.603672
Freq: T, dtype: float64
2018-12-16 18:30:34 0.690540
2018-12-16 18:31:34 -0.815213
2018-12-16 18:32:34 0.460163
2018-12-16 18:33:34 1.515437
2018-12-16 18:34:34 -0.832920
Freq: T, dtype: float64
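The equivalence can also be checked programmatically. The following is a minimal sketch, not part of the original example: it builds the index once with the "min" alias and once with a pd.Timedelta step (the variable names are illustrative). Newer pandas releases deprecate the single-letter "T" alias in favor of "min", so the sketch avoids "T".

```python
import pandas as pd

start = '2018-12-16 18:30:34'

# One-minute step given as a string alias
by_alias = pd.date_range(start, periods=5, freq='min')
# The same step given as a Timedelta object
by_delta = pd.date_range(start, periods=5, freq=pd.Timedelta(minutes=1))

same_index = by_alias.equals(by_delta)   # compares the timestamps
step = by_alias[1] - by_alias[0]         # gap between neighbours
print(same_index, step)
```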
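- freq = "S" generates the time series at a per-second frequency. Units can also be combined, as in "3T10S" below.
import numpy as np
import pandas as pd
# Step of 3 minutes 10 seconds
cur0 = pd.date_range('2018-12-16 18:30:34', periods=5, freq='3T10S')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)
Output: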
2018-12-16 18:30:34 -1.078270
2018-12-16 18:33:44 -0.120087
2018-12-16 18:36:54 1.863152
2018-12-16 18:40:04 -0.601866
2018-12-16 18:43:14 0.881057
Freq: 190S, dtype: float64
The interval here is 3 minutes 10 seconds, which pandas reports as 190S.
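The arithmetic behind 190S can be sketched as follows. This example is an addition and expresses the step as a pd.Timedelta rather than the "3T10S" string, since the single-letter aliases are deprecated in newer pandas releases; the variable names are illustrative.

```python
import pandas as pd

# 3 minutes 10 seconds expressed as a Timedelta
step = pd.Timedelta(minutes=3, seconds=10)
step_seconds = step.total_seconds()      # 3 * 60 + 10

idx = pd.date_range('2018-12-16 18:30:34', periods=5, freq=step)
print(step_seconds)
print(idx[1], idx[-1])
```

The second and last timestamps should match the 18:33:44 and 18:43:14 entries in the output above.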
- freq = "H" generates the time series at an hourly frequency; a leading multiplier such as "2H" sets the step.
import numpy as np
import pandas as pd
# Step of 2 hours
cur0 = pd.date_range('2018-12-16 18:30:34', periods=5, freq='2H')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)
Output:
2018-12-16 18:30:34 -0.182473
2018-12-16 20:30:34 1.037907
2018-12-16 22:30:34 -0.175579
2018-12-17 00:30:34 -0.586400
2018-12-17 02:30:34 -0.334369
Freq: 2H, dtype: float64
The result shows that consecutive timestamps are 2 hours apart.
- freq = "B" generates the time series at a business-day frequency.
import numpy as np
import pandas as pd
# Ten consecutive business days (weekends are skipped)
cur0 = pd.date_range('2018-12-16 18:30:34', periods = 10, freq='B')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)
Output:
2018-12-17 18:30:34 0.011285
2018-12-18 18:30:34 0.972737
2018-12-19 18:30:34 0.109900
2018-12-20 18:30:34 -0.969465
2018-12-21 18:30:34 -0.885282
2018-12-24 18:30:34 -1.722596
2018-12-25 18:30:34 0.678189
2018-12-26 18:30:34 0.402022
2018-12-27 18:30:34 -0.740186
2018-12-28 18:30:34 1.302828
Freq: B, dtype: float64
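The start date, 2018-12-16, is a Sunday, so the series begins on Monday 2018-12-17. December 22 and 23 are a Saturday and a Sunday, so they are missing from the result.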
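A quick way to confirm that only weekdays appear is to inspect the day-of-week values of the index. The sketch below is an addition and assumes the default English locale for day_name(); the variable names are illustrative.

```python
import pandas as pd

idx = pd.date_range('2018-12-16 18:30:34', periods=10, freq='B')

# dayofweek: Monday=0 ... Sunday=6, so business days are 0..4
all_weekdays = bool((idx.dayofweek < 5).all())

# The two dates missing from the output above
weekend = pd.to_datetime(['2018-12-22', '2018-12-23'])
weekend_names = list(weekend.day_name())

print(idx[0], all_weekdays, weekend_names)
```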
- freq = "D" generates the time series at a daily frequency.
import numpy as np
import pandas as pd
# Step of 2 calendar days
cur0 = pd.date_range('2018-12-16 18:30:34', periods = 10, freq='2D')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)
Output:
2018-12-16 18:30:34 0.327716
2018-12-18 18:30:34 0.784813
2018-12-20 18:30:34 1.432993
2018-12-22 18:30:34 1.148707
2018-12-24 18:30:34 0.996547
2018-12-26 18:30:34 -0.210021
2018-12-28 18:30:34 -0.175977
2018-12-30 18:30:34 0.473569
2019-01-01 18:30:34 0.642001
2019-01-03 18:30:34 0.675140
Freq: 2D, dtype: float64
In the result it is the day that changes, with consecutive timestamps 2 days apart.
- freq = "W" generates the time series at a weekly frequency. By default the weeks are anchored on Sunday, i.e. "W" is the same as "W-SUN".
import numpy as np
import pandas as pd
# Weekly, default anchor
cur0 = pd.date_range('2018-12-15 18:30:34', periods = 10, freq='W')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)
# Weekly, explicitly anchored on Sunday
cur0 = pd.date_range('2018-12-15 18:30:34', periods = 10, freq='W-SUN')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)
Output:
2018-12-16 18:30:34 0.557365
2018-12-23 18:30:34 -0.306496
2018-12-30 18:30:34 -1.172465
2019-01-06 18:30:34 0.434073
2019-01-13 18:30:34 0.106500
2019-01-20 18:30:34 0.773861
2019-01-27 18:30:34 -0.236211
2019-02-03 18:30:34 -0.303260
2019-02-10 18:30:34 0.974439
2019-02-17 18:30:34 -0.356273
Freq: W-SUN, dtype: float64
2018-12-16 18:30:34 0.180012
2018-12-23 18:30:34 -0.977006
2018-12-30 18:30:34 0.095408
2019-01-06 18:30:34 -0.097709
2019-01-13 18:30:34 -0.401469
2019-01-20 18:30:34 -0.283461
2019-01-27 18:30:34 -1.138246
2019-02-03 18:30:34 -1.675089
2019-02-10 18:30:34 0.511324
2019-02-17 18:30:34 0.728807
Freq: W-SUN, dtype: float64
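The start time, 2018-12-15, is a Saturday; the first entry in the result is 2018-12-16, a Sunday. Consecutive entries are 7 days apart, and there are 10 records in total (periods = 10).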
import numpy as np
import pandas as pd
# Weekly, default anchor (Sunday)
cur0 = pd.date_range('2018-12-15 18:30:34', periods = 10, freq='W')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)
# Weekly, anchored on Friday
cur0 = pd.date_range('2018-12-15 18:30:34', periods = 10, freq='W-Fri')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)
Output:
2018-12-16 18:30:34 -1.133046
2018-12-23 18:30:34 -1.083898
2018-12-30 18:30:34 -1.503690
2019-01-06 18:30:34 -0.866094
2019-01-13 18:30:34 -0.945356
2019-01-20 18:30:34 0.021928
2019-01-27 18:30:34 -0.591696
2019-02-03 18:30:34 -1.710630
2019-02-10 18:30:34 2.121283
2019-02-17 18:30:34 0.739256
Freq: W-SUN, dtype: float64
2018-12-21 18:30:34 2.082080
2018-12-28 18:30:34 1.368807
2019-01-04 18:30:34 0.599276
2019-01-11 18:30:34 -0.149521
2019-01-18 18:30:34 1.134686
2019-01-25 18:30:34 -0.582935
2019-02-01 18:30:34 -0.470655
2019-02-08 18:30:34 0.983203
2019-02-15 18:30:34 -0.067618
2019-02-22 18:30:34 -0.736081
Freq: W-FRI, dtype: float64
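The statement cur0 = pd.date_range('2018-12-15 18:30:34', periods = 10, freq='W-Fri') generates 10 timestamps that all fall on a Friday, starting from 2018-12-15 (a Saturday). The first Friday after 2018-12-15 is 2018-12-21, and the second is 2018-12-28. "W-FRI" therefore produces a weekly series anchored on the given day of the week.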
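The anchoring behavior can be verified with day_name(). This sketch is an addition; it uses the upper-case spelling "W-FRI" (the form pandas reports in the Freq line) and assumes the default English locale for day names.

```python
import pandas as pd

start = '2018-12-15 18:30:34'   # a Saturday

fridays = pd.date_range(start, periods=10, freq='W-FRI')
default_w = pd.date_range(start, periods=10, freq='W')
sundays = pd.date_range(start, periods=10, freq='W-SUN')

# Every generated timestamp should fall on the anchor day
friday_names = set(fridays.day_name())
# "W" and "W-SUN" should yield the same timestamps
w_is_w_sun = default_w.equals(sundays)

print(friday_names, fridays[0], w_is_w_sun)
```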
- freq = "M" generates the time series at a monthly frequency, with each timestamp at month end; freq = "MS" places each timestamp at month start instead.
import numpy as np
import pandas as pd
# Month end
cur0 = pd.date_range('2018-12-15 18:30:34', periods = 10, freq='M')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)
# Month start
cur0 = pd.date_range('2018-12-15 18:30:34', periods = 10, freq='MS')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)
Output:
2018-12-31 18:30:34 2.844877
2019-01-31 18:30:34 -0.405763
2019-02-28 18:30:34 1.048116
2019-03-31 18:30:34 -0.353364
2019-04-30 18:30:34 1.146974
2019-05-31 18:30:34 -2.594504
2019-06-30 18:30:34 1.149964
2019-07-31 18:30:34 0.152655
2019-08-31 18:30:34 0.456799
2019-09-30 18:30:34 0.356193
Freq: M, dtype: float64
2019-01-01 18:30:34 -0.410882
2019-02-01 18:30:34 -1.349693
2019-03-01 18:30:34 0.363404
2019-04-01 18:30:34 0.352792
2019-05-01 18:30:34 0.334477
2019-06-01 18:30:34 0.181288
2019-07-01 18:30:34 -0.936703
2019-08-01 18:30:34 -0.512834
2019-09-01 18:30:34 -0.243987
2019-10-01 18:30:34 0.727383
Freq: MS, dtype: float64
The first month end after 2018-12-15 is 2018-12-31, and the first month start is 2019-01-01.
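The freq argument also accepts offset objects, which sidesteps alias spelling altogether. Newer pandas releases rename the month-end alias "M" to "ME", so a string that works on one version may warn or fail on another. The following is a minimal sketch, not the original author's code, using pd.offsets.MonthEnd and pd.offsets.MonthBegin.

```python
import pandas as pd

start = '2018-12-15 18:30:34'

# Month end and month start, given as offset objects
month_end = pd.date_range(start, periods=3, freq=pd.offsets.MonthEnd())
month_start = pd.date_range(start, periods=3, freq=pd.offsets.MonthBegin())

# Day-of-month for each generated timestamp
end_days = [int(d) for d in month_end.day]
start_days = [int(d) for d in month_start.day]

print(end_days, start_days)
```

The first three month ends are December 31, January 31 and February 28, matching the output above.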
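- freq = "BM" generates the time series at the last business day of each month, which is not necessarily the last calendar day of the month; freq = "BMS" uses the first business day of each month.
import numpy as np
import pandas as pd
# Last business day of each month
cur0 = pd.date_range('2018-12-15 18:30:34', periods = 10, freq='BM')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)
# First business day of each month
cur0 = pd.date_range('2018-12-15 18:30:34', periods = 10, freq='BMS')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)
Output: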
2018-12-31 18:30:34 0.338989
2019-01-31 18:30:34 -0.074689
2019-02-28 18:30:34 -1.309663
2019-03-29 18:30:34 0.139394
2019-04-30 18:30:34 -0.519024
2019-05-31 18:30:34 0.573932
2019-06-28 18:30:34 0.551329
2019-07-31 18:30:34 -0.849871
2019-08-30 18:30:34 -0.685058
2019-09-30 18:30:34 -0.160009
Freq: BM, dtype: float64
2019-01-01 18:30:34 0.499660
2019-02-01 18:30:34 -0.912324
2019-03-01 18:30:34 0.412629
2019-04-01 18:30:34 1.222422
2019-05-01 18:30:34 -0.618880
2019-06-03 18:30:34 0.132562
2019-07-01 18:30:34 0.721672
2019-08-01 18:30:34 -1.086498
2019-09-02 18:30:34 -1.670070
2019-10-01 18:30:34 -2.165835
Freq: BMS, dtype: float64
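Note that 2019-03-29 is not the last day of March: 2019-03-30 and 2019-03-31 are not business days. Likewise, 2019-06-03 is not the first day of June but is a business day, because 2019-06-01 and 2019-06-02 fall on a weekend.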
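The two dates called out above can be reproduced with offset objects. This sketch is an addition and uses pd.offsets.BMonthEnd and pd.offsets.BMonthBegin in place of the "BM" and "BMS" strings; it assumes the default English locale for day_name().

```python
import pandas as pd

start = '2018-12-15 18:30:34'

bme = pd.date_range(start, periods=10, freq=pd.offsets.BMonthEnd())
bms = pd.date_range(start, periods=10, freq=pd.offsets.BMonthBegin())

march_end = bme[3]     # last business day of March 2019
june_start = bms[5]    # first business day of June 2019

print(march_end, march_end.day_name())
print(june_start, june_start.day_name())
```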
- freq = "Q" generates the time series at a quarterly frequency, with each timestamp at quarter end; freq = "QS" uses the quarter start.
import numpy as np
import pandas as pd
# Quarter end
cur0 = pd.date_range('2018-12-15 18:30:34', periods = 10, freq='Q')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)
# Quarter start
cur0 = pd.date_range('2018-12-15 18:30:34', periods = 10, freq='QS')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)
Output:
2018-12-31 18:30:34 0.364439
2019-03-31 18:30:34 -0.295537
2019-06-30 18:30:34 0.562707
2019-09-30 18:30:34 -0.226738
2019-12-31 18:30:34 0.623051
2020-03-31 18:30:34 -0.675792
2020-06-30 18:30:34 -0.848371
2020-09-30 18:30:34 -0.805518
2020-12-31 18:30:34 -0.061498
2021-03-31 18:30:34 0.291014
Freq: Q-DEC, dtype: float64
2019-01-01 18:30:34 -0.236873
2019-04-01 18:30:34 -1.399436
2019-07-01 18:30:34 1.011018
2019-10-01 18:30:34 1.254754
2020-01-01 18:30:34 -0.569184
2020-04-01 18:30:34 -1.480181
2020-07-01 18:30:34 -0.396710
2020-10-01 18:30:34 1.157218
2021-01-01 18:30:34 -0.119259
2021-04-01 18:30:34 0.773836
Freq: QS-JAN, dtype: float64
Q can of course also be combined with B, just as M was above.
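As an illustration of the business-quarter combination, here is a minimal sketch using the pd.offsets.BQuarterEnd offset object. It is an addition rather than the original author's example, and it relies on the offset's default of calendar quarters ending in March, June, September and December.

```python
import pandas as pd

start = '2018-12-15 18:30:34'

# Last business day of each quarter
bq = pd.date_range(start, periods=4, freq=pd.offsets.BQuarterEnd())
bq_dates = list(bq.strftime('%Y-%m-%d'))

print(bq_dates)
```

These dates coincide with the December, March, June and September entries of the "BM" output shown earlier.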
- freq = "A" generates the time series at a yearly frequency, with each timestamp at year end; freq = "AS" uses the year start.
import numpy as np
import pandas as pd
# Year end
cur0 = pd.date_range('2018-12-15 18:30:34', periods = 10, freq='A')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)
# Year start
cur0 = pd.date_range('2018-12-15 18:30:34', periods = 10, freq='AS')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index = cur0)
print(ts)
Output:
2018-12-31 18:30:34 -0.058588
2019-12-31 18:30:34 -0.676757
2020-12-31 18:30:34 -0.368606
2021-12-31 18:30:34 -0.820318
2022-12-31 18:30:34 0.959945
2023-12-31 18:30:34 -0.144216
2024-12-31 18:30:34 0.827481
2025-12-31 18:30:34 1.812374
2026-12-31 18:30:34 -1.473202
2027-12-31 18:30:34 -1.633083
Freq: A-DEC, dtype: float64
2019-01-01 18:30:34 -0.037793
2020-01-01 18:30:34 1.067194
2021-01-01 18:30:34 -1.517820
2022-01-01 18:30:34 -0.101716
2023-01-01 18:30:34 0.413106
2024-01-01 18:30:34 -0.912453
2025-01-01 18:30:34 0.197084
2026-01-01 18:30:34 -0.513032
2027-01-01 18:30:34 -0.027010
2028-01-01 18:30:34 -0.263569
Freq: AS-JAN, dtype: float64
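The yearly frequencies can likewise be written with offset objects. Newer pandas releases deprecate the "A" and "AS" spellings in favor of "Y"-based aliases, so the sketch below uses pd.offsets.YearEnd and pd.offsets.YearBegin to stay independent of the alias names. It is an illustrative addition.

```python
import pandas as pd

start = '2018-12-15 18:30:34'

year_end = pd.date_range(start, periods=3, freq=pd.offsets.YearEnd())
year_begin = pd.date_range(start, periods=3, freq=pd.offsets.YearBegin())

end_years = [int(y) for y in year_end.year]
begin_years = [int(y) for y in year_begin.year]

print(year_end[0], end_years)
print(year_begin[0], begin_years)
```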