The right way to use Spark and JDBC

A while ago I had to read data from a MySQL table, do a bit of manipulation on that data, and store the results on disk. The obvious choice was Spark: I was already using it for other things, and it seemed super easy to implement. This is more or less what I had to do (I removed the part which does the manipulation for the sake of simplicity):

Looks good, only it didn't quite work. Depending on the size of the table, it was either super slow or it crashed entirely. Tuning Spark and the cluster properties helped a bit, but it didn't solve the problems. Since I was using AWS EMR, it made sense to give Sqoop a try, as it is one of the applications supported on EMR. Sqoop performed much better almost instantly: all I needed to do was set the number of mappers according to the size of the data, and it worked perfectly. Since both Spark and Sqoop are distributed frameworks from the Hadoop ecosystem, it was clear that Spark could work at least as well as Sqoop; I only needed to find out how to do

Continue reading The right way to use Spark and JDBC

Common Unix Commands

Files/Directories

cp source destination
Copies the source file to destination.

ls
Lists the contents of the current directory.
ls -l – also shows extra info.
ls -R – shows the contents of the directory and its subdirectories.
ls -m – displays the contents as a comma-separated list.

mv source destination
Moves source into destination if destination is a directory; otherwise renames source to destination.

cd directory
Changes the current directory.
cd ~ – changes to the user's home directory.
cd / – changes to the root of the filesystem.
cd .. – changes to the parent directory.

pwd
Displays the current directory.

mkdir directory
Creates a new directory.

rm -R directory
Deletes a directory including its subdirectories. The user will be prompted for each read-only file.

rm -Rf directory
Deletes a directory including its subdirectories. The user will NOT be prompted for read-only files.

rm file
Deletes a file.

chmod entity+mode+permission file/directory
Changes the permissions (mode) of a file or directory.
Entity = u (for user), g (for group), o (for other), a (for all).
Mode = + (for adding permissions) or - (for removing permissions).
Permissions = r (for read), w (for write), x (for execute).
Example:

Continue reading Common Unix Commands
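To see how the commands above fit together, here is a small sketch that runs them inside a temporary directory. The names demo, sub, a.txt, b.txt and c.txt are made up for this illustration and are not from the original post. Note that in practice the three parts of the chmod argument are written together without a separator, for example u+x.

```shell
# Work inside a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

mkdir demo                   # create a new directory
echo "hello" > demo/a.txt    # create a small file to work with
cp demo/a.txt demo/b.txt     # copy a.txt to b.txt
mv demo/b.txt demo/c.txt     # destination is not a directory, so b.txt is renamed
mkdir demo/sub
mv demo/c.txt demo/sub       # destination is a directory, so c.txt moves into it
ls -R demo                   # list demo and its subdirectories

chmod u+x demo/a.txt         # u (user) + (add) x (execute)
chmod a-w demo/sub/c.txt     # a (all) - (remove) w (write)
ls -l demo                   # the long listing shows the permission bits

rm -Rf demo/sub              # delete sub and its contents without prompting
pwd                          # print the current (temporary) directory
```

Because everything happens under the directory created by mktemp -d, the whole experiment can be thrown away afterwards with rm -Rf "$workdir".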