Incorrect results when using daft.lit(1).over(window) (at least, they differ from duckdb / pyspark) #5685

@MarcoGorelli

Description

Describe the bug

The code below outputs

daft
╭───────┬───────┬────────╮
│ a     ┆ i     ┆ b      │
│ ---   ┆ ---   ┆ ---    │
│ Int64 ┆ Int64 ┆ UInt64 │
╞═══════╪═══════╪════════╡
│ 1     ┆ 0     ┆ 1      │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 1     ┆ 1     ┆ 1      │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2     ┆ 2     ┆ 1      │
╰───────┴───────┴────────╯

(Showing first 3 of 3 rows)
pyspark
+---+---+---+
|  a|  i|  b|
+---+---+---+
|  1|  0|  2|
|  1|  1|  2|
|  2|  2|  1|
+---+---+---+

To Reproduce

import daft
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
from pyspark.sql.window import Window as W

data = {"a": [1, 1, 2], "i": [0, 1, 2]}

print("daft")
print(
    daft.from_pydict(data)
    .with_column(
        "b", daft.functions.count(daft.lit(1)).over(daft.Window().partition_by("a"))
    )
    .collect()
)


session = SparkSession.builder.getOrCreate()
# Convert the column-oriented dict into a list of row dicts for Spark.
rows = [dict(zip(data, values)) for values in zip(*data.values())]
df = session.createDataFrame(rows)
df = df.withColumn("b", F.count(F.lit(1)).over(W.partitionBy("a")))
print("pyspark")
df.show()

Expected behavior

Same as pyspark: `b` should be `[2, 2, 1]`, since `count(lit(1))` over a partition counts every row in that partition (`lit(1)` is never null). Note that duckdb also matches pyspark here.
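
The expected semantics can be sketched in plain Python (a minimal sketch, assuming `count(lit(1)).over(partition_by("a"))` should return the size of each row's partition, which is what pyspark and duckdb do):

```python
from collections import Counter

data = {"a": [1, 1, 2], "i": [0, 1, 2]}

# count(lit(1)) over PARTITION BY "a" counts all rows in the partition,
# because the literal 1 is never null.
partition_sizes = Counter(data["a"])
b = [partition_sizes[a] for a in data["a"]]
print(b)  # [2, 2, 1] — what pyspark/duckdb return; daft returns [1, 1, 1]
```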

Component(s)

Built-in Functions

Additional context

Narwhals
