I'll create a demo DataFrame to reproduce the error that I see in Databricks.
from pyspark.sql.types import StructType, StructField, TimestampType, StringType
from datetime import datetime
# Define the schema
schema = StructType([
    StructField("session_ts", TimestampType(), True),
    StructField("analysis_ts", TimestampType(), True)
])
# Define the data with datetime objects
data = [
    (datetime(2023, 9, 15, 17, 30, 41), datetime(2023, 9, 15, 17, 47, 3)),
    (datetime(2023, 10, 24, 18, 23, 37), datetime(2023, 10, 24, 18, 25, 16)),
    (datetime(2024, 1, 15, 6, 38, 52), datetime(2024, 1, 15, 6, 48, 15)),
    (datetime(2024, 2, 21, 13, 16, 37), datetime(2024, 2, 21, 13, 22, 35)),
    (datetime(2023, 10, 18, 17, 52, 28), datetime(2023, 10, 19, 17, 11, 3))
]
# Create a DataFrame
df = spark.createDataFrame(data, schema=schema)
When I try to convert the PySpark DataFrame to pandas I get the error: TypeError: Casting to unit-less dtype 'datetime64' is not supported. Pass e.g. 'datetime64[ns]' instead.
df.toPandas().head()
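For reference, this seems to be pandas behavior rather than anything specific to my data: since pandas 2.0, astype with the unit-less 'datetime64' dtype is rejected outright. A minimal pure-pandas sketch (assuming pandas 2.x) reproduces the same TypeError:

import pandas as pd
from datetime import datetime

# An object-dtype series of datetime values
s = pd.Series([datetime(2023, 9, 15, 17, 30, 41)], dtype=object)

try:
    s.astype("datetime64")  # unit-less dtype: raises under pandas 2.x
except TypeError as e:
    print(e)

print(s.astype("datetime64[ns]").dtype)  # explicit unit works: datetime64[ns]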
Casting the fields to TimestampType did not resolve the error (unsurprisingly, since the schema already declares them as TimestampType):
df = df.withColumn("session_ts", df["session_ts"].cast(TimestampType()))
df = df.withColumn("analysis_ts", df["analysis_ts"].cast(TimestampType()))
df.toPandas()
I was only able to proceed by casting to string, which seems an unnecessary workaround.
df = df.withColumn("session_ts", df["session_ts"].cast(StringType()))
df = df.withColumn("analysis_ts", df["analysis_ts"].cast(StringType()))
df.toPandas()
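For completeness, once the columns come across as strings they can be parsed back into datetimes on the pandas side; a short sketch of that follow-up step:

import pandas as pd

pdf = df.toPandas()
# Parse the stringified columns back into proper pandas datetimes
pdf["session_ts"] = pd.to_datetime(pdf["session_ts"])
pdf["analysis_ts"] = pd.to_datetime(pdf["analysis_ts"])
print(pdf.dtypes)  # both columns should now be datetime64[ns]

This works, but it underlines how roundabout the string detour is.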
1 Answer
1) Ensure datetime64[ns] During Conversion
import pyspark.sql.functions as F

# Explicitly cast timestamps to ensure compatibility
df = df.withColumn("session_ts", F.col("session_ts").cast("timestamp"))
df = df.withColumn("analysis_ts", F.col("analysis_ts").cast("timestamp"))

# Convert to pandas
pdf = df.toPandas()
print(pdf.head())
2) Disable PyArrow for Conversion (Fallback to Legacy Conversion)
# Disable PyArrow during the conversion
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "false")

# Convert to pandas
pdf = df.toPandas()
print(pdf.head())
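If neither option helps, this error usually comes down to a version mismatch: pandas 2.x removed the unit-less 'datetime64' cast, while older PySpark releases still rely on it inside toPandas(). A quick check of the versions in play (assuming both libraries are importable):

import pandas as pd
import pyspark

# The unit-less cast was removed in pandas 2.0; older PySpark releases
# still use it during toPandas(), so compare the two versions
print("pandas:", pd.__version__)
print("pyspark:", pyspark.__version__)

Upgrading PySpark, or pinning pandas below 2.0, should remove the need for the string workaround.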