Development Setup - aseldawy/bdtutorials GitHub Wiki
This tutorial explains how to set up the development environment for big data. This involves installing Oracle JDK 8, Apache Maven, and IntelliJ. Check a video walkthrough for Windows and Mac.
Oracle JDK 8
The first step is to install Oracle JDK 8. While there are newer versions of the JDK, we will use Oracle JDK 8 since it is the version best supported by most big data systems such as Hadoop and Spark. First, go to the Oracle JDK download page, scroll down to your operating system, and click download. You will need to create a free Oracle account if you don't have one and log in to access the download. Once the download completes, follow the installation wizard and accept all the default options.
- Windows: Note where the JDK is installed since we will need this path shortly.
At this point, the JDK has been successfully installed, but we're not quite done yet. We need to set up some environment variables to make the JDK accessible to the applications that we are going to run.
Set JAVA_HOME
The JAVA_HOME environment variable should point to the directory in which the JDK is installed (not the JRE).
- Windows: Press the Windows key, type "Environment variable" in the search, and choose "Edit the environment variables for your account". Click "New" and create a variable named JAVA_HOME whose value is the JDK installation directory.
- Ubuntu: Edit the file ~/.profile and add the line export JAVA_HOME=... and replace the ... with the absolute path of the folder where the JDK is installed.
- MacOS: Edit the file ~/.zprofile and add the line export JAVA_HOME=... and replace the ... with the absolute path of the folder where the JDK is installed.
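One quick way to confirm that JAVA_HOME points at a JDK rather than a JRE is to look for the javac compiler, which only the JDK ships. The following is a minimal sketch for Ubuntu and MacOS shells; the function name looks_like_jdk is made up for this example and is not part of the JDK.

```shell
# Hypothetical helper (not part of the JDK): checks whether a folder looks
# like a JDK installation by testing for an executable bin/javac.
looks_like_jdk() {
  dir="$1"
  if [ -x "$dir/bin/javac" ]; then
    echo "JDK found at $dir"
  else
    echo "No javac under $dir/bin"
    return 1
  fi
}
```

After opening a new terminal, running looks_like_jdk "$JAVA_HOME" should report the folder you set above. If it reports that javac is missing, the variable most likely points at a JRE or at the wrong folder.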
Edit PATH
For Windows users only: the JDK might not be added to your executable path by default, in which case you need to add it yourself. Edit the PATH environment variable as shown above and add a new entry with the value %JAVA_HOME%\bin.
After editing the environment variables, you will need to open a new terminal for the changes to take effect.
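On Ubuntu or MacOS, a small check like the one below can confirm that a tool is reachable from the new terminal. This is only a sketch; on_path is a hypothetical helper name, and it relies on the standard shell builtin command -v.

```shell
# Hypothetical helper: reports whether a command can be found on the PATH.
on_path() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: not found on PATH"
    return 1
  fi
}
```

For example, on_path java should report that java was found once the JDK is set up. The same check works for any other command-line tool.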
Apache Maven
The next step is to install Apache Maven, which is a widely used project management tool for Java. Go to [https://maven.apache.org] and choose download. Choose the binary package of the latest version. Once the download is complete, you just need to extract it to any folder. I like to create an Applications folder under my home directory to hold all the application binaries. As we did with the JDK, we will add the bin directory of Maven to the PATH environment variable to make it accessible from the command line.
To add Maven to your executable path, add the following line to your ~/.profile (Ubuntu) or ~/.zprofile (MacOS), adjusting the folder name to match the Maven version you downloaded:
export PATH=$PATH:$HOME/Applications/apache-maven-3.6.3/bin
On Windows, you will need to use the graphical interface as explained earlier.
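If the profile file is loaded more than once, the plain export line above appends the Maven folder to PATH each time. A slightly more defensive variant is sketched below for Ubuntu and MacOS; path_with is a hypothetical helper name, and the Maven folder is the one used in this tutorial, so yours may differ.

```shell
# Hypothetical helper: prints the current PATH with a directory appended,
# unless that directory is already one of its entries.
path_with() {
  dir="$1"
  case ":$PATH:" in
    *":$dir:"*) printf '%s\n' "$PATH" ;;        # already present, leave PATH as is
    *)          printf '%s\n' "$PATH:$dir" ;;   # not present, append it
  esac
}

# Assumption: Maven 3.6.3 was extracted under ~/Applications, as in this tutorial.
PATH=$(path_with "$HOME/Applications/apache-maven-3.6.3/bin")
export PATH
```

Wrapping PATH in colons before matching lets the check compare whole entries, so a folder is not mistaken for present just because it is a prefix of another entry.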
With the JDK and Maven installed, you can now create your first project from the command line. For that, I like to create a Workspace directory under my home directory to hold all projects. Open a command window in your workspace directory and run this Maven command.
mvn archetype:generate -DgroupId=edu.ucr.cs.cs226.ucrnetid -DartifactId=testproject -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false
Let's look closely at this command. archetype:generate means that we are generating a new project. The groupId is like your personal homepage where you place all your projects, and it follows a reversed domain name, which is the standard Java naming convention for packages. Make sure to replace ucrnetid with your actual UCRNetID to ensure that you have a unique groupId. The artifactId is the name of this project. The archetypeArtifactId is the template from which the project is generated. The quickstart template contains a simple Hello World! project. Finally, interactiveMode=false means that Maven will not prompt for any information and will take all values from the command line. Once you hit enter, Maven will download all the required packages and the code template and create a project for you. This download happens only once and is cached on your machine, so the next run will be faster.
Now let's take a look at the generated project. The pom.xml file contains the project configuration. The src directory contains all your source code. Inside src, there are two subdirectories, one for the main code and the other for test code if you use unit testing. Under main, you will find a java directory which includes all Java source code. There, you will find a folder structure that follows your groupId and the generated App class, which is a simple Hello World! program.
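The mapping from groupId to folder structure can be sketched as below. This assumes the package name defaults to the groupId, which is what happens when no separate package is given on the command line; app_source_path is a hypothetical helper written for this illustration.

```shell
# Hypothetical helper: maps a groupId to the location of the generated App.java,
# assuming the Java package name is the same as the groupId.
app_source_path() {
  group_id="$1"
  # Java packages map to nested folders: each dot becomes a slash.
  package_dir=$(printf '%s' "$group_id" | tr '.' '/')
  echo "src/main/java/$package_dir/App.java"
}

# Prints: src/main/java/edu/ucr/cs/cs226/ucrnetid/App.java
app_source_path "edu.ucr.cs.cs226.ucrnetid"
```

The path is relative to the testproject directory that Maven created.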
To compile and run the code, go back to the command line window, change into the testproject directory, and type mvn package. This will compile and test your code and then package it into a JAR file under the target directory. The first run takes some time because Maven downloads all the required packages, but these are again cached on your machine so that the next run is faster. Once the build succeeds, you can run the generated JAR file using the following command.
java -cp target/testproject-1.0-SNAPSHOT.jar edu.ucr.cs.cs226.ucrnetid.App
Make sure to replace ucrnetid with your actual UCRNetID.
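The pieces of the run command come straight from the values passed to archetype:generate. The sketch below assembles it from those values; it assumes the quickstart archetype's default version of 1.0-SNAPSHOT (no version was given on the command line), and run_command is a hypothetical helper that only prints the command rather than executing it.

```shell
# Values from the mvn archetype:generate command; replace ucrnetid with your own.
GROUP_ID="edu.ucr.cs.cs226.ucrnetid"
ARTIFACT_ID="testproject"
# Assumption: the archetype's default version, since no -Dversion was given.
VERSION="1.0-SNAPSHOT"

# Hypothetical helper: prints the command that runs the packaged App class.
# The JAR is named <artifactId>-<version>.jar and the class is <groupId>.App.
run_command() {
  echo "java -cp target/${ARTIFACT_ID}-${VERSION}.jar ${GROUP_ID}.App"
}

# Prints: java -cp target/testproject-1.0-SNAPSHOT.jar edu.ucr.cs.cs226.ucrnetid.App
run_command
```

If the JAR in your target directory has a different name, check the version element in pom.xml and adjust VERSION accordingly.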
IntelliJ
We will not work from the command line all the time, so this part shows you how to install IntelliJ, which is an IDE for Java. Download the free community version as a compressed package. Once it is downloaded, extract it into your Applications directory. Run bin/idea64, which will guide you through the setup process the first time. There are many plugins that can be installed, but we will go with the minimum that we need throughout this course to support git, Maven, JUnit testing, and Scala. Once IntelliJ starts, you can import the Maven project that we created earlier. IntelliJ is well integrated with Maven and will automatically figure out the project structure. Now, navigate to the App class. If you see an error message at the top, click "setup SDK" and add the JDK that you installed. It might take some time the first time you add the SDK, but this is a one-time job. Now, you will see a green arrow that allows you to run the main class.
Congratulations, you now have the development setup ready.