
I wrote a DataFrame into a Delta table (e.g., demo_table) using the overwrite mode, which involves dropping the table beforehand. After the write operation was successful, I executed the OPTIMIZE command on the table. However, the OPTIMIZE operation took nearly an hour to complete. How can I improve this process?

Notes:

- The table is partitioned.
- Command: OPTIMIZE schema.demo_table ZORDER BY (custom_id, sales_date)
- custom_id is a newly generated column; the final DataFrame contains about 3 million records.
- The table is not wide, and the schema uses only basic data types (integer, string) with no complex types.
- Observation: when I use an existing column in the ZORDER clause instead, the command completes within 5 minutes.
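For context on what the command above is doing: ZORDER BY clusters rows so that values of the listed columns that are close together end up in the same data files, which Delta achieves with a space-filling Z-order (Morton) curve built by interleaving the bits of the columns. The sketch below is a minimal, illustrative Python implementation of a two-column Morton key (the function name and 16-bit width are my own choices, not part of Delta's internals); it shows why the cost of Z-ordering depends on the value distribution of the chosen columns.

```python
def z_order_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y into a single Morton (Z-order) key.

    Rows sorted by this key keep nearby (x, y) pairs close together,
    which is the clustering idea behind OPTIMIZE ... ZORDER BY.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # bit i of x goes to even position 2i
        key |= ((y >> i) & 1) << (2 * i + 1)  # bit i of y goes to odd position 2i+1
    return key

# Sorting points by their Morton key groups spatially close values together:
points = [(0, 0), (1, 0), (0, 1), (1, 1), (3, 3)]
ordered = sorted(points, key=lambda p: z_order_key(*p))
```

A freshly generated, high-cardinality column (like custom_id here) gives every row a nearly unique key, so the clustering step has far more rewriting to do than when Z-ordering by an existing, already well-distributed column, which is one plausible reason for the observed runtime difference.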


Tags: apache-spark, Databricks, OPTIMIZE ZORDER command is taking too long, Stack Overflow