Apache Arrow is 10 years old

178 points - yesterday at 1:13 PM

Comments

data_ders yesterday at 3:37 PM

If I could go back to 2015, when I had just found the feather library and was using it to power my unhinged topic-modeling-for-PowerPoint-slides work, and explain what feather would become (Arrow) and the impact it would have on the data ecosystem, I would have looked at 2026 me like he was a crazy person.

Yet today I feel it was 2016 dataders who was the crazy one lol

aynyc yesterday at 6:23 PM

What's the difference between feather and parquet in terms of usage? I get the design philosophy, but how would you use them differently?

pm90 yesterday at 5:16 PM

It's nice to see useful, impactful interchange formats getting the attention and resources they need, and ecosystems converging around them. Optimizing serialization/deserialization might seem like a "trivial" task at first, but when you're moving petabytes of data it quickly becomes the bottleneck. With common interchange formats, the benefits of these optimizations are shared across stacks. Love to see it.

HoldOnAMinute yesterday at 9:37 PM

I read that entire page and I could not tell you what Apache Arrow is, or what it does.

aerzen yesterday at 6:24 PM

I like Arrow for its type system. It's efficient, complete, and does not have "infinite precision decimals". Considering Postgres's decimal encoding, using i256 as the backing type is a much saner approach.
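
To make the fixed-width idea concrete, here is a rough sketch in plain Python (standard library only, not Arrow's actual API). It stores a decimal as a scaled integer in a 256-bit two's-complement buffer. The scale of 4, the little-endian byte order, and the `encode`/`decode` names are assumptions picked for illustration.

```python
from decimal import Decimal

SCALE = 4    # assumed: number of digits kept after the decimal point
WIDTH = 32   # 32 bytes = 256 bits, i.e. an "i256" backing integer


def encode(value: Decimal) -> bytes:
    """Store value as a scaled integer in a fixed 32-byte buffer."""
    # Shift the decimal point right by SCALE digits; int() drops any
    # digits beyond the chosen scale.
    unscaled = int(value.scaleb(SCALE))
    # Byte order is an assumption of this sketch.
    return unscaled.to_bytes(WIDTH, "little", signed=True)


def decode(raw: bytes) -> Decimal:
    """Recover the decimal value from its fixed-width encoding."""
    unscaled = int.from_bytes(raw, "little", signed=True)
    return Decimal(unscaled).scaleb(-SCALE)


original = Decimal("-12.3456")
raw = encode(original)
restored = decode(raw)
```

Every value takes exactly 32 bytes regardless of magnitude, so a column of them is a flat buffer you can index by offset, which a variable-length encoding can't give you.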

mempko yesterday at 5:25 PM

We use Apache Arrow at my company and it's fantastic. The performance is so good. We have terabytes of time-series financial data and use arrow to store it and process it.

actionfromafar yesterday at 3:26 PM

I had to look up what Arrow actually does, and I might have to run some performance comparisons vs sqlite.

It's very neat for some types of data to have columns contiguous in memory.
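
A toy illustration of what I mean, using only Python's standard-library `array` module rather than Arrow itself. The sample rows and variable names are made up for the example.

```python
from array import array

# Hypothetical sample data: (timestamp, price) records.
rows = [(1, 10.5), (2, 11.0), (3, 9.75)]

# Row-oriented: every record is its own tuple object, so summing one
# field means hopping from object to object.
row_total = sum(price for _, price in rows)

# Column-oriented: each field lives in one contiguous typed buffer.
timestamps = array("q", (t for t, _ in rows))  # signed 64-bit integers
prices = array("d", (p for _, p in rows))      # 64-bit floats

# Summing a column is a single pass over one block of memory.
col_total = sum(prices)

# The whole price column is one flat run of bytes.
price_bytes = prices.tobytes()
```

Same answer either way, but the column version scans one dense buffer, which is the layout that caches and vectorized code tend to like.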
