Finding data on the data lake can sometimes be a challenge. At my current workplace (ZipRecruiter) we have hundreds of tables on the data lake and it’s growing each day. We store…
The right way to use Spark and JDBC
A while ago I had to read data from a MySQL table, do a bit of manipulations on that data and store the results on the disk.The obvious choice was to use…
How to properly collect AWS EMR metrics?
Working with AWS EMR has a lot of benefits. But when it comes to metrics, AWS currently does not supply a proper solution for collecting cluster metrics from EMRs. Well, there is…
Best code convention syndrome
Developers often tend to think that one coding convention is better than another in terms of readability. Some people think that adding a break before the curly braces is more coherent. Some…