Data Lake – aviyehuda.com

Data Engineering: Strategies for data retrieval on multi-dimensional data

Posted on 20/11/2023

You’ve likely heard about the benefits of partitioning data by a single dimension to boost retrieval performance. It’s a common practice in relational databases, NoSQL databases, and, notably, data lakes. For example,…

Spark and Small Files

Posted on 12/03/2022

In my previous post I showed a short code example and asked what the problem with that code might be, assuming that the input (my_website_visits) is very big…

Quick tip: Easily find data on the data lake when using AWS Glue Catalog

Posted on 15/01/2021

Finding data on the data lake can sometimes be a challenge. At my current workplace (ZipRecruiter) we have hundreds of tables on the data lake, and the number is growing every day. We store…