
Reading empty DataPageV2 fails with snappy: corrupt input (empty) #7388

@EnricoMi

Describe the bug
Reading a Parquet file that contains an empty DataPage v2 fails with the error "snappy: corrupt input (empty)".
Such a page occurs when all values in the page are null, so there are no value bytes to decompress.

To Reproduce
Write a Spark dataset that contains only null values in one column, using the v2 Parquet writer:

./spark-3.5.5-bin-hadoop3/bin/spark-shell --conf spark.hadoop.parquet.writer.version="v2"
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.5.5
      /_/
         
Using Scala version 2.12.18 (OpenJDK 64-Bit Server VM, Java 11.0.26)
Type in expressions to have them evaluated.
Type :help for more information.

scala> Seq(Option.empty[Float]).toDS.write.parquet("parquet-v2-example.parquet")

Expected behavior
The Parquet file should be read without error.

Additional context
The issue is identical to this Apache Arrow issue: apache/arrow#22459
The fix is identical to the Apache Arrow fix: apache/arrow#45252
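
To illustrate the failure mode, here is a minimal Rust sketch of the kind of guard that avoids it. It is not the parquet crate's code and not necessarily how the Arrow fix is written. It assumes the DataPage v2 layout in which repetition and definition levels are stored uncompressed at the start of the page, followed by the compressed values. `read_page_v2`, `strict_decompress`, and the `Decompress` type are hypothetical names, and the "codec" is an identity function that only mimics a decoder rejecting zero-length input.

```rust
/// Hypothetical stand-in for a codec; NOT the parquet crate's API.
type Decompress = fn(&[u8]) -> Result<Vec<u8>, String>;

/// Mimics a Snappy-style decoder that rejects zero-length input.
/// The "decompression" is the identity, to keep the sketch dependency-free.
fn strict_decompress(input: &[u8]) -> Result<Vec<u8>, String> {
    if input.is_empty() {
        return Err("snappy: corrupt input (empty)".to_string());
    }
    Ok(input.to_vec())
}

/// Splits a DataPage v2 body into (levels, values).
/// Levels are copied as-is; only the values section goes through the codec,
/// and the codec is skipped entirely when that section has zero bytes.
fn read_page_v2(
    page: &[u8],
    rep_levels_len: usize,
    def_levels_len: usize,
    decompress: Decompress,
) -> Result<(Vec<u8>, Vec<u8>), String> {
    let levels_len = rep_levels_len + def_levels_len;
    if levels_len > page.len() {
        return Err("level byte lengths exceed page size".to_string());
    }
    let (levels, compressed_values) = page.split_at(levels_len);

    // All-null page: nothing was written after the levels, so there is
    // nothing to decompress. Calling the codec here is what would fail.
    let values = if compressed_values.is_empty() {
        Vec::new()
    } else {
        decompress(compressed_values)?
    };
    Ok((levels.to_vec(), values))
}
```

With a page made up only of level bytes (for example, two hypothetical definition-level bytes and no values), `read_page_v2(&page, 0, 2, strict_decompress)` returns the levels and an empty values buffer, whereas passing the empty values slice straight to `strict_decompress` returns the error from the title.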

Labels: bug, parquet (Changes to the parquet crate)