Inconsistencies in behaviour of DataFrame. #515

@weqopy

EDIT (@v0dro):
The following is a list of methods that should be implemented or corrected for more consistent behaviour:

  • Vector#last.
  • DataFrame#last.
  • Return type of DataFrame#[] must be consistent when using a timeseries. It currently returns either a numerical value or another Vector or DataFrame, depending on what you pass into #[].
  • Return nil when an element is not present in the DataFrame (currently raises an error).

Ideally these should be split into separate issues and tackled one at a time.


I'd like to use this data to show the situations that confused me:

[25] pry(main)> dates=["2018-03-30", "2018-04-02", "2018-04-27", "2018-05-31", "2018-06-29", "2018-07-31", "2018-08-31", "2018-09-28", "2018-10-31", "2018-11-30"]
=> ["2018-03-30",
 "2018-04-02",
 "2018-04-27",
 "2018-05-31",
 "2018-06-29",
 "2018-07-31",
 "2018-08-31",
 "2018-09-28",
 "2018-10-31",
 "2018-11-30"]
[26] pry(main)> val=[1.00000001, 0.9999, 0.9908, 1.0885, 1.0586, 1.0374, 0.9456, 0.9638, 0.8397, 0.8788]
=> [1.00000001, 0.9999, 0.9908, 1.0885, 1.0586, 1.0374, 0.9456, 0.9638, 0.8397, 0.8788]
[27] pry(main)> id=Daru::DateTimeIndex.new(dates)
=> #<Daru::DateTimeIndex(10) 2018-03-30T00:00:00+00:00...2018-11-30T00:00:00+00:00>
[28] pry(main)> df = Daru::DataFrame.new({val: val}, index: id)
=> #<Daru::DataFrame(10x1)>
                   val
 2018-03-30 1.00000001
 2018-04-02     0.9999
 2018-04-27     0.9908
 2018-05-31     1.0885
 2018-06-29     1.0586
 2018-07-31     1.0374
 2018-08-31     0.9456
 2018-09-28     0.9638
 2018-10-31     0.8397
 2018-11-30     0.8788
  • first & last
[29] pry(main)> df.val.first
=> 1.00000001
[30] pry(main)> df.val.last 
NoMethodError: undefined method `last' for #<Daru::Vector:0x00007f43dbc591f0>
from /usr/local/lib/ruby/gems/2.4.0/gems/daru-0.2.1/lib/daru/vector.rb:1420:in `method_missing'
# I expected this to return 0.8788
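
A possible explanation for the asymmetry, assuming Daru::Vector gets `first` from Ruby's Enumerable mixin rather than defining it itself: Enumerable supplies `first` but has no `last`, so the call falls through to `method_missing`. The sketch below reproduces that with a hypothetical `TinySeries` class (a stand-in, not Daru's code) and shows one way `last` could be added, mirroring `Array#last`.

```ruby
# Hypothetical stand-in for a vector-like container; not Daru's actual class.
class TinySeries
  include Enumerable

  def initialize(values)
    @values = values
  end

  # Enumerable only needs #each; it then supplies #first, #map, #select, ...
  def each(&block)
    @values.each(&block)
  end
end

series = TinySeries.new([1.00000001, 0.9999, 0.8788])
first_value = series.first                  # works, supplied by Enumerable
had_last    = series.respond_to?(:last)     # false: Enumerable has no #last

# One possible fix: define #last explicitly, mirroring Array#last.
class TinySeries
  def last(n = nil)
    n.nil? ? @values.last : @values.last(n)
  end
end

last_value = series.last      # 0.8788
last_two   = series.last(2)   # [0.9999, 0.8788]
```

Until something like this exists in the library, `df.val.to_a.last` should work as a stopgap, since it goes through a plain Array.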
  • The return type
[31] pry(main)> df.val['2018-03-30','2018-04-30']
=> #<Daru::Vector(3)>
                                       val
 2018-03-30T00:00:00+           1.00000001
 2018-04-02T00:00:00+               0.9999
 2018-04-27T00:00:00+               0.9908
[32] pry(main)> df.val['2018-04']
=> #<Daru::Vector(2)>
                                       val
 2018-04-02T00:00:00+               0.9999
 2018-04-27T00:00:00+               0.9908
[33] pry(main)> df.val['2018-03-30','2018-04-01']
=> 1.00000001
[34] pry(main)> df.val['2018-03']
=> 1.00000001
# I expected [33] and [34] to both return:
# => #<Daru::Vector(1)>
#                                        val
#  2018-03-30T00:00:00+           1.00000001
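
To make the expectation concrete, here is a minimal plain-Ruby sketch of the contract being asked for: a range lookup that returns the same kind of object whether zero, one, or many entries match. `slice_between` is a hypothetical helper over a Hash, not part of Daru, and it only illustrates the return-type rule, not how Daru's DateTimeIndex works internally.

```ruby
require 'date'

# Hypothetical helper, not part of Daru: select every entry whose date lies
# in the closed interval [from, to]. The result is always a Hash, whether
# zero, one, or many entries match.
def slice_between(series, from, to)
  range = Date.parse(from)..Date.parse(to)
  series.select { |date, _value| range.cover?(Date.parse(date)) }
end

series = {
  '2018-03-30' => 1.00000001,
  '2018-04-02' => 0.9999,
  '2018-04-27' => 0.9908,
  '2018-05-31' => 1.0885
}

many = slice_between(series, '2018-03-30', '2018-04-30')  # three entries
one  = slice_between(series, '2018-03-30', '2018-04-01')  # one entry, still a Hash
```

With this rule, callers never need to check whether they received a scalar or a collection before chaining further calls.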
  • errors and a non-error
[48] pry(main)> df.val['2018']
=> #<Daru::Vector(10)>
                                       val
 2018-03-30T00:00:00+           1.00000001
 2018-04-02T00:00:00+               0.9999
 2018-04-27T00:00:00+               0.9908
 2018-05-31T00:00:00+               1.0885
 2018-06-29T00:00:00+               1.0586
 2018-07-31T00:00:00+               1.0374
 2018-08-31T00:00:00+               0.9456
 2018-09-28T00:00:00+               0.9638
 2018-10-31T00:00:00+               0.8397
 2018-11-30T00:00:00+               0.8788
[49] pry(main)> df.val['2017']
ArgumentError: Key 2017 is out of bounds
from /usr/local/lib/ruby/gems/2.4.0/gems/daru-0.2.1/lib/daru/date_time/index.rb:362:in `[]'
[50] pry(main)> df.val['2019']
ArgumentError: Key 2019 is out of bounds
from /usr/local/lib/ruby/gems/2.4.0/gems/daru-0.2.1/lib/daru/date_time/index.rb:362:in `[]'
[52] pry(main)> df.val['2018-12']
ArgumentError: bad value for range
from /usr/local/lib/ruby/gems/2.4.0/gems/daru-0.2.1/lib/daru/date_time/index.rb:547:in `slice_between_dates'
[53] pry(main)> df.val['2018-02']
=> #<Daru::Vector(10)>
                                       val
 2018-03-30T00:00:00+           1.00000001
 2018-04-02T00:00:00+               0.9999
 2018-04-27T00:00:00+               0.9908
 2018-05-31T00:00:00+               1.0885
 2018-06-29T00:00:00+               1.0586
 2018-07-31T00:00:00+               1.0374
 2018-08-31T00:00:00+               0.9456
 2018-09-28T00:00:00+               0.9638
 2018-10-31T00:00:00+               0.8397
 2018-11-30T00:00:00+               0.8788
# I expected all of these errors, and [53], to return #<Daru::Vector(0)>
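
The behaviour proposed here can be sketched in plain Ruby: a partial date string acts as a filter, and a key with no matching entries yields an empty collection rather than an error or the whole series. `slice_by_prefix` is a hypothetical helper over a Hash with ISO 8601 date keys, not Daru's implementation.

```ruby
# Hypothetical helper, not part of Daru: treat a partial date string such as
# '2018' or '2018-04' as a filter over ISO 8601 date keys. A key matches when
# it equals the partial string or continues it at a '-' boundary.
def slice_by_prefix(series, prefix)
  series.select do |date, _value|
    date == prefix || date.start_with?("#{prefix}-")
  end
end

series = {
  '2018-03-30' => 1.00000001,
  '2018-04-02' => 0.9999,
  '2018-04-27' => 0.9908,
  '2018-11-30' => 0.8788
}

whole_year   = slice_by_prefix(series, '2018')     # all four entries
before_range = slice_by_prefix(series, '2017')     # empty, no error
after_range  = slice_by_prefix(series, '2018-12')  # empty, no error
gap_in_range = slice_by_prefix(series, '2018-02')  # empty, not the whole series
```

Returning an empty collection keeps all four cases uniform, so callers can test the result with `empty?` instead of rescuing `ArgumentError`.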
