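35. Pandas Time Series Data: date_range Parameters in Detail

Earlier chapters briefly demonstrated the basic use of the date_range function. This chapter walks through its parameters in more detail, mainly freq, with an example for each frequency alias.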

  • freq = "T" generates a time series at one-minute intervals (frequency); it is equivalent to "min".
import numpy as np
import pandas as pd
# Minute frequency written as 'T'
cur0 = pd.date_range('2018-12-16 18:30:34', periods=5, freq='T')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
# The same frequency written as 'min'
cur0 = pd.date_range('2018-12-16 18:30:34', periods=5, freq='min')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)

Program output:

2018-12-16 18:30:34    0.489893
2018-12-16 18:31:34    0.000442
2018-12-16 18:32:34   -0.465273
2018-12-16 18:33:34   -0.173814
2018-12-16 18:34:34   -0.603672
Freq: T, dtype: float64
2018-12-16 18:30:34    0.690540
2018-12-16 18:31:34   -0.815213
2018-12-16 18:32:34    0.460163
2018-12-16 18:33:34    1.515437
2018-12-16 18:34:34   -0.832920
Freq: T, dtype: float64
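
As a quick check of the minute frequency, the sketch below (not part of the original example; the names idx and steps are chosen here for illustration) builds the same index with 'min' and confirms that consecutive timestamps are exactly one minute apart. It uses only 'min' because that spelling is accepted by both older and newer pandas releases, whereas 'T' is reported as deprecated in recent ones.

```python
import pandas as pd

# Build a minute-frequency index from the same start time as above.
idx = pd.date_range('2018-12-16 18:30:34', periods=5, freq='min')

# Subtract each timestamp from its successor to get the step sizes.
steps = idx[1:] - idx[:-1]

# A regular minute series should contain a single distinct step.
print(steps.unique())
```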
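  • freq = "S" generates a time series at one-second intervals. Aliases can also be combined with multipliers; the example below uses "3T10S".
import numpy as np
import pandas as pd
# '3T10S' combines 3 minutes and 10 seconds into a single step
cur0 = pd.date_range('2018-12-16 18:30:34', periods=5, freq='3T10S')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)

Program output: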

2018-12-16 18:30:34   -1.078270
2018-12-16 18:33:44   -0.120087
2018-12-16 18:36:54    1.863152
2018-12-16 18:40:04   -0.601866
2018-12-16 18:43:14    0.881057
Freq: 190S, dtype: float64

The interval here is 3 minutes 10 seconds, which pandas reports as 190S (190 seconds).
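
The arithmetic behind the combined interval can be sketched as follows. This is an illustrative variant, not the original code: instead of a string alias it passes a pd.Timedelta as freq (the variable names step and idx are assumptions made here), which avoids depending on how a given pandas version spells the minute and second aliases.

```python
import pandas as pd

# 3 minutes + 10 seconds = 190 seconds
step = pd.Timedelta(minutes=3, seconds=10)
print(step.total_seconds())

# date_range also accepts a Timedelta as its freq argument.
idx = pd.date_range('2018-12-16 18:30:34', periods=5, freq=step)
print(idx)
```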

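  • freq = "H" generates a time series at hourly intervals; a multiplier such as "2H" gives a step of several hours.
import numpy as np
import pandas as pd
# One timestamp every 2 hours
cur0 = pd.date_range('2018-12-16 18:30:34', periods=5, freq='2H')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)

Program output: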

2018-12-16 18:30:34   -0.182473
2018-12-16 20:30:34    1.037907
2018-12-16 22:30:34   -0.175579
2018-12-17 00:30:34   -0.586400
2018-12-17 02:30:34   -0.334369
Freq: 2H, dtype: float64

The result shows that consecutive timestamps are 2 hours apart.
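
A small sketch of the same two-hour step, written with the offset object pd.offsets.Hour(2) rather than the string alias (an illustrative choice made here, since the hour alias is spelled differently across pandas versions). It also shows that the series rolls over midnight into the next calendar day.

```python
import pandas as pd

# pd.offsets.Hour(2) is an offset object representing a two-hour step.
idx = pd.date_range('2018-12-16 18:30:34', periods=5, freq=pd.offsets.Hour(2))
print(idx[0], '->', idx[-1])

# Dropping the time part shows which calendar days the series touches.
days = idx.normalize().unique()
print(days)
```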

  • freq = "B" generates a time series at business-day intervals.
import numpy as np
import pandas as pd
# One timestamp per business day (Monday to Friday)
cur0 = pd.date_range('2018-12-16 18:30:34', periods=10, freq='B')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)

Program output:

2018-12-17 18:30:34    0.011285
2018-12-18 18:30:34    0.972737
2018-12-19 18:30:34    0.109900
2018-12-20 18:30:34   -0.969465
2018-12-21 18:30:34   -0.885282
2018-12-24 18:30:34   -1.722596
2018-12-25 18:30:34    0.678189
2018-12-26 18:30:34    0.402022
2018-12-27 18:30:34   -0.740186
2018-12-28 18:30:34    1.302828
Freq: B, dtype: float64

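December 22 and 23 are a Saturday and a Sunday, so they are missing from the result. The start date, 2018-12-16, is also a Sunday, which is why the series begins on 2018-12-17.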
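
One way to confirm that weekends are skipped is to look at the weekday number of each generated timestamp. The snippet below is a minimal sketch (the names idx and weekdays are chosen here for illustration); in pandas, dayofweek counts Monday as 0 and Sunday as 6.

```python
import pandas as pd

idx = pd.date_range('2018-12-16 18:30:34', periods=10, freq='B')

# Monday=0 ... Friday=4, Saturday=5, Sunday=6
weekdays = idx.dayofweek.tolist()
print(weekdays)

# The weekend of 22-23 December should not appear in the index.
print(pd.Timestamp('2018-12-22 18:30:34') in idx)
```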

  • freq = "D" generates a time series at daily intervals; "2D" gives a step of two days.
import numpy as np
import pandas as pd
# One timestamp every 2 days
cur0 = pd.date_range('2018-12-16 18:30:34', periods=10, freq='2D')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)

Program output:

2018-12-16 18:30:34    0.327716
2018-12-18 18:30:34    0.784813
2018-12-20 18:30:34    1.432993
2018-12-22 18:30:34    1.148707
2018-12-24 18:30:34    0.996547
2018-12-26 18:30:34   -0.210021
2018-12-28 18:30:34   -0.175977
2018-12-30 18:30:34    0.473569
2019-01-01 18:30:34    0.642001
2019-01-03 18:30:34    0.675140
Freq: 2D, dtype: float64

In the result only the date changes from one timestamp to the next, in steps of 2 days; the time of day stays the same.

  • freq = "W" generates a time series at weekly intervals. By default the weeks are anchored on Sunday, i.e. "W" is the same as "W-SUN".
import numpy as np
import pandas as pd
# Weekly frequency with the default anchor
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='W')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
# Weekly frequency with the Sunday anchor spelled out
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='W-SUN')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)

Program output:

2018-12-16 18:30:34    0.557365
2018-12-23 18:30:34   -0.306496
2018-12-30 18:30:34   -1.172465
2019-01-06 18:30:34    0.434073
2019-01-13 18:30:34    0.106500
2019-01-20 18:30:34    0.773861
2019-01-27 18:30:34   -0.236211
2019-02-03 18:30:34   -0.303260
2019-02-10 18:30:34    0.974439
2019-02-17 18:30:34   -0.356273
Freq: W-SUN, dtype: float64
2018-12-16 18:30:34    0.180012
2018-12-23 18:30:34   -0.977006
2018-12-30 18:30:34    0.095408
2019-01-06 18:30:34   -0.097709
2019-01-13 18:30:34   -0.401469
2019-01-20 18:30:34   -0.283461
2019-01-27 18:30:34   -1.138246
2019-02-03 18:30:34   -1.675089
2019-02-10 18:30:34    0.511324
2019-02-17 18:30:34    0.728807
Freq: W-SUN, dtype: float64

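The start time, 2018-12-15, is a Saturday, and the first entry in the result is Sunday 2018-12-16. Consecutive entries are 7 days apart, and there are 10 entries in total (periods = 10).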

import numpy as np
import pandas as pd
# Default weekly frequency (anchored on Sunday), for comparison
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='W')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
# Weekly frequency anchored on Friday
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='W-Fri')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)

Program output:

2018-12-16 18:30:34   -1.133046
2018-12-23 18:30:34   -1.083898
2018-12-30 18:30:34   -1.503690
2019-01-06 18:30:34   -0.866094
2019-01-13 18:30:34   -0.945356
2019-01-20 18:30:34    0.021928
2019-01-27 18:30:34   -0.591696
2019-02-03 18:30:34   -1.710630
2019-02-10 18:30:34    2.121283
2019-02-17 18:30:34    0.739256
Freq: W-SUN, dtype: float64
2018-12-21 18:30:34    2.082080
2018-12-28 18:30:34    1.368807
2019-01-04 18:30:34    0.599276
2019-01-11 18:30:34   -0.149521
2019-01-18 18:30:34    1.134686
2019-01-25 18:30:34   -0.582935
2019-02-01 18:30:34   -0.470655
2019-02-08 18:30:34    0.983203
2019-02-15 18:30:34   -0.067618
2019-02-22 18:30:34   -0.736081
Freq: W-FRI, dtype: float64

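The statement cur0 = pd.date_range('2018-12-15 18:30:34', periods = 10, freq='W-Fri') starts from 2018-12-15 (a Saturday) and generates a series of 10 timestamps that all fall on a Friday. The first Friday after 2018-12-15 is 2018-12-21 and the second is 2018-12-28. In other words, "W-FRI" produces a weekly series pinned to the given day of the week.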
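
The effect of the weekday anchor can be checked by asking each timestamp for its day name. This is a minimal sketch with shortened series (periods=3) and variable names chosen here for illustration.

```python
import pandas as pd

start = '2018-12-15 18:30:34'  # a Saturday

# Same start time, two different weekly anchors.
idx_sun = pd.date_range(start, periods=3, freq='W-SUN')
idx_fri = pd.date_range(start, periods=3, freq='W-FRI')

# day_name() returns the English weekday name of each timestamp.
print(idx_sun.day_name().tolist())
print(idx_fri.day_name().tolist())
```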

  • freq = "M" generates a time series at monthly intervals, using the last day of each month; freq = "MS" uses the first day of each month instead.
import numpy as np
import pandas as pd
# Month end
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='M')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
# Month start
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='MS')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)

Program output:

2018-12-31 18:30:34    2.844877
2019-01-31 18:30:34   -0.405763
2019-02-28 18:30:34    1.048116
2019-03-31 18:30:34   -0.353364
2019-04-30 18:30:34    1.146974
2019-05-31 18:30:34   -2.594504
2019-06-30 18:30:34    1.149964
2019-07-31 18:30:34    0.152655
2019-08-31 18:30:34    0.456799
2019-09-30 18:30:34    0.356193
Freq: M, dtype: float64
2019-01-01 18:30:34   -0.410882
2019-02-01 18:30:34   -1.349693
2019-03-01 18:30:34    0.363404
2019-04-01 18:30:34    0.352792
2019-05-01 18:30:34    0.334477
2019-06-01 18:30:34    0.181288
2019-07-01 18:30:34   -0.936703
2019-08-01 18:30:34   -0.512834
2019-09-01 18:30:34   -0.243987
2019-10-01 18:30:34    0.727383
Freq: MS, dtype: float64

The first month end after 2018-12-15 is 2018-12-31, and the first month start is 2019-01-01.
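
The month-end and month-start behaviour can also be reproduced with offset objects. The sketch below is an illustrative alternative to the string aliases, not the original code: it assumes pd.offsets.MonthEnd() and pd.offsets.MonthBegin() as stand-ins for "M" and "MS", which sidesteps the fact that recent pandas releases spell the month-end alias differently.

```python
import pandas as pd

start = '2018-12-15 18:30:34'

# Offset objects used in place of the 'M' and 'MS' string aliases.
month_end = pd.date_range(start, periods=3, freq=pd.offsets.MonthEnd())
month_start = pd.date_range(start, periods=3, freq=pd.offsets.MonthBegin())

# Day-of-month of each timestamp: last day vs. first day of the month.
print(month_end.day.tolist())
print(month_start.day.tolist())
```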

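  • freq = "BM" generates a time series at the last business day of each month, which is not necessarily the last calendar day; freq = "BMS" uses the first business day of each month.
import numpy as np
import pandas as pd
# Last business day of each month
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='BM')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
# First business day of each month
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='BMS')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)

Program output: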

2018-12-31 18:30:34    0.338989
2019-01-31 18:30:34   -0.074689
2019-02-28 18:30:34   -1.309663
2019-03-29 18:30:34    0.139394
2019-04-30 18:30:34   -0.519024
2019-05-31 18:30:34    0.573932
2019-06-28 18:30:34    0.551329
2019-07-31 18:30:34   -0.849871
2019-08-30 18:30:34   -0.685058
2019-09-30 18:30:34   -0.160009
Freq: BM, dtype: float64
2019-01-01 18:30:34    0.499660
2019-02-01 18:30:34   -0.912324
2019-03-01 18:30:34    0.412629
2019-04-01 18:30:34    1.222422
2019-05-01 18:30:34   -0.618880
2019-06-03 18:30:34    0.132562
2019-07-01 18:30:34    0.721672
2019-08-01 18:30:34   -1.086498
2019-09-02 18:30:34   -1.670070
2019-10-01 18:30:34   -2.165835
Freq: BMS, dtype: float64

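Note that 2019-03-29 is not the last day of March: 2019-03-30 and 2019-03-31 are not business days. Likewise, 2019-06-03 is not the first day of June but it is the first business day, because 2019-06-01 and 2019-06-02 fall on a weekend.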
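
The two special cases mentioned above can be isolated in a short sketch. It is an illustration rather than the original code, and it assumes the offset objects pd.offsets.BMonthEnd() and pd.offsets.BMonthBegin() as equivalents of the "BM" and "BMS" aliases.

```python
import pandas as pd

# Last business day of March 2019 (the 30th and 31st are a weekend).
march_end = pd.date_range('2019-03-01', periods=1,
                          freq=pd.offsets.BMonthEnd())[0]

# First business day of June 2019 (the 1st and 2nd are a weekend).
june_start = pd.date_range('2019-06-01', periods=1,
                           freq=pd.offsets.BMonthBegin())[0]

print(march_end.date(), march_end.day_name())
print(june_start.date(), june_start.day_name())
```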

  • freq = "Q" generates a time series at quarterly intervals, using the end of each quarter; freq = "QS" uses the start of each quarter.
import numpy as np
import pandas as pd
# Quarter end
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='Q')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
# Quarter start
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='QS')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)

Program output:

2018-12-31 18:30:34    0.364439
2019-03-31 18:30:34   -0.295537
2019-06-30 18:30:34    0.562707
2019-09-30 18:30:34   -0.226738
2019-12-31 18:30:34    0.623051
2020-03-31 18:30:34   -0.675792
2020-06-30 18:30:34   -0.848371
2020-09-30 18:30:34   -0.805518
2020-12-31 18:30:34   -0.061498
2021-03-31 18:30:34    0.291014
Freq: Q-DEC, dtype: float64
2019-01-01 18:30:34   -0.236873
2019-04-01 18:30:34   -1.399436
2019-07-01 18:30:34    1.011018
2019-10-01 18:30:34    1.254754
2020-01-01 18:30:34   -0.569184
2020-04-01 18:30:34   -1.480181
2020-07-01 18:30:34   -0.396710
2020-10-01 18:30:34    1.157218
2021-01-01 18:30:34   -0.119259
2021-04-01 18:30:34    0.773836
Freq: QS-JAN, dtype: float64

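Q can of course be combined with B as well (giving business quarter end and business quarter start), just as M was above.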
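
A minimal sketch of the business-quarter variant follows. It is not from the original text: it assumes pd.offsets.BQuarterEnd(startingMonth=12) and pd.offsets.BQuarterBegin(startingMonth=1) as the offset objects matching the Q-DEC and QS-JAN anchors shown in the output above, and the variable names are illustrative.

```python
import pandas as pd

start = '2018-12-15 18:30:34'

# Last business day of each quarter (quarters ending Mar/Jun/Sep/Dec).
bq_end = pd.date_range(start, periods=4,
                       freq=pd.offsets.BQuarterEnd(startingMonth=12))

# First business day of each quarter (quarters starting Jan/Apr/Jul/Oct).
bq_start = pd.date_range(start, periods=4,
                         freq=pd.offsets.BQuarterBegin(startingMonth=1))

print(bq_end.strftime('%Y-%m-%d').tolist())
print(bq_start.strftime('%Y-%m-%d').tolist())
```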

  • freq = "A" generates a time series at yearly intervals, using the end of each year; freq = "AS" uses the start of each year.
import numpy as np
import pandas as pd
# Year end
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='A')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
# Year start
cur0 = pd.date_range('2018-12-15 18:30:34', periods=10, freq='AS')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)

Program output:

2018-12-31 18:30:34   -0.058588
2019-12-31 18:30:34   -0.676757
2020-12-31 18:30:34   -0.368606
2021-12-31 18:30:34   -0.820318
2022-12-31 18:30:34    0.959945
2023-12-31 18:30:34   -0.144216
2024-12-31 18:30:34    0.827481
2025-12-31 18:30:34    1.812374
2026-12-31 18:30:34   -1.473202
2027-12-31 18:30:34   -1.633083
Freq: A-DEC, dtype: float64
2019-01-01 18:30:34   -0.037793
2020-01-01 18:30:34    1.067194
2021-01-01 18:30:34   -1.517820
2022-01-01 18:30:34   -0.101716
2023-01-01 18:30:34    0.413106
2024-01-01 18:30:34   -0.912453
2025-01-01 18:30:34    0.197084
2026-01-01 18:30:34   -0.513032
2027-01-01 18:30:34   -0.027010
2028-01-01 18:30:34   -0.263569
Freq: AS-JAN, dtype: float64
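
The first year end after 2018-12-15 is 2018-12-31 and the first year start is 2019-01-01. The sketch below reproduces this with offset objects instead of string aliases. That choice is an assumption made here for portability: as far as I am aware, recent pandas releases deprecate several of the alias spellings used in this chapter (for example "A" and "AS"), while pd.offsets.YearEnd() and pd.offsets.YearBegin() are available across versions.

```python
import pandas as pd

start = '2018-12-15 18:30:34'

# Offset objects used in place of the 'A' and 'AS' string aliases.
year_end = pd.date_range(start, periods=3, freq=pd.offsets.YearEnd())
year_start = pd.date_range(start, periods=3, freq=pd.offsets.YearBegin())

print(year_end.strftime('%Y-%m-%d').tolist())
print(year_start.strftime('%Y-%m-%d').tolist())
```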
