Describe the bug
A window count partitioned by a column returns different results in Daft and PySpark. The code below outputs
daft
╭───────┬───────┬────────╮
│ a ┆ i ┆ b │
│ --- ┆ --- ┆ --- │
│ Int64 ┆ Int64 ┆ UInt64 │
╞═══════╪═══════╪════════╡
│ 1 ┆ 0 ┆ 1 │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 1 ┆ 1 ┆ 1 │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2 ┆ 2 ┆ 1 │
╰───────┴───────┴────────╯
(Showing first 3 of 3 rows)
pyspark
+---+---+---+
| a| i| b|
+---+---+---+
| 1| 0| 2|
| 1| 1| 2|
| 2| 2| 1|
+---+---+---+
To Reproduce
import daft
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
from pyspark.sql.window import Window as W
data = {"a": [1, 1, 2], "i": [0, 1, 2]}
print("daft")
print(
daft.from_pydict(data)
.with_column(
"b", daft.functions.count(daft.lit(1)).over(daft.Window().partition_by("a"))
)
.collect()
)
session = SparkSession.builder.getOrCreate()
rows = [
{key: value[i] for key, value in data.items()}
for i in range(len(data[next(iter(data.keys()))]))
]
df = session.createDataFrame(rows)
df = df.withColumn("b", F.count(F.lit(1)).over(W.partitionBy("a")))
print("pyspark")
df.show()
Expected behavior
Same as pyspark: b should be the size of each row's partition, i.e. 2 for the rows with a=1 and 1 for the row with a=2. (Note: duckdb also matches pyspark here.)
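For reference, a minimal plain-Python sketch of the expected semantics (an assumption-free restatement, not Daft or PySpark code): a count window partitioned by "a" with no ordering or frame should assign every row the total row count of its partition, which reproduces the PySpark column above.

```python
from collections import Counter

data = {"a": [1, 1, 2], "i": [0, 1, 2]}

# With no ordering/frame on the window, each row's count should equal
# the size of its partition (the number of rows sharing its "a" value).
partition_sizes = Counter(data["a"])
expected_b = [partition_sizes[v] for v in data["a"]]

print(expected_b)  # [2, 2, 1]
```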
Component(s)
Built-in Functions
Additional context
Narwhals