Category 'development' | Paolo Bietolini

development

2026.04.18 - A NaN where a Long should be
A PySpark TypeError that looked like a schema bug actually originated three steps upstream. Pandas can't represent what BigQuery hands it (nested structs, nullable integers, timezone-aware timestamps, arbitrary-precision numerics), so every downstream line is a patch against a loss that happened at the moment pandas entered the pipeline. A walk to a twelve-line Arrow replacement, and the rule it points at.
2026.04.18 - Arrow over the wire
Four lines of Spark config to read BigQuery. Underneath: a gRPC read session, one stream per Spark partition, Arrow end-to-end, server-side column and filter pushdown, and dynamic rebalancing when executors finish early. A walkthrough of what those lines actually trigger, and why the middle format isn't replaced with something better; it's removed.
2024.10.17 - Problems with the form listener for Elementor