What is the preferred way to assign/add a new column to a polars dataframe in .select() or .with_columns()?
Are there any differences between the below column assignments using .alias() or the = sign?

import polars as pl

df = pl.DataFrame({"A": [1, 2, 3],
                   "B": [1, 1, 7]})

df = df.with_columns(pl.col("A").sum().alias("a_sum"), 
                     another_sum=pl.col("A").sum()
                     )

I am not sure which one to use.

asked Nov 18, 2024 at 17:34, edited Nov 18, 2024 at 17:38 by mouwsy

2 Answers

The advantage of alias is that it lets you specify a column name that isn't a valid Python identifier, for example "a sum!".

Assignment with = cannot do this directly, because a keyword argument name must be a valid identifier (e.g., another_sum). You can get the same effect by creating a dictionary and unpacking it with **, which passes its items as keyword arguments.

df = df.with_columns(pl.col("A").sum().alias("a sum!"), 
                     another_sum=pl.col("A").sum(),
                     **{":) \u2014 also a sum": pl.col("A").sum()}
                     )

Output:

shape: (3, 5)
┌─────┬─────┬────────┬─────────────┬─────────────────┐
│ A   ┆ B   ┆ a sum! ┆ another_sum ┆ :) — also a sum │
│ --- ┆ --- ┆ ---    ┆ ---         ┆ ---             │
│ i64 ┆ i64 ┆ i64    ┆ i64         ┆ i64             │
╞═════╪═════╪════════╪═════════════╪═════════════════╡
│ 1   ┆ 1   ┆ 6      ┆ 6           ┆ 6               │
│ 2   ┆ 1   ┆ 6      ┆ 6           ┆ 6               │
│ 3   ┆ 7   ┆ 6      ┆ 6           ┆ 6               │
└─────┴─────┴────────┴─────────────┴─────────────────┘

The latter (keyword assignment with =) just calls alias for you under the hood:

https://github.com/pola-rs/polars/blob/a0ec630b25aa847699f9c2d7389fee84749a6491/py-polars/polars/_utils/parse/expr.py#L136-L140

So there's no functional advantage to either.

If you find = more readable, use that.
