
Launch a benchmarking cluster

Dan Crankshaw edited this page Nov 25, 2013 · 1 revision

Steps to launch a GraphX cluster:

This creates a cluster of 16 m2.4xlarge slaves (plus a master node) for benchmarking, using spot instances in us-east-1d.

Create a set of AWS security credentials (see this page for instructions). Then put your access key ID and secret access key into a file that can be sourced from the shell (e.g. ~/aws_creds.sh).
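A minimal sketch of what such a file might contain, assuming spark-ec2 picks up the credentials from the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables. The values shown are placeholders, not real keys.

```shell
# Hypothetical contents of ~/aws_creds.sh (placeholder values; substitute your own)
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_ACCESS_KEY"
```

Keep this file out of version control and readable only by you (for example, chmod 600 ~/aws_creds.sh).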

On your local machine:

# LOCAL REPO
# Launch the cluster
cd $GRAPHX_HOME
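# Load the AWS credentials into the environment
source "$PATH_TO_KEYPAIR_FILE"
# 16 slaves, m2.4xlarge, spot instances with a maximum bid of $1/hour
ec2/spark-ec2 -s 16 -k "$KEYPAIR_NAME" -i "$PATH_TO_SSH_CREDENTIALS" -t m2.4xlarge -z us-east-1d --spot-price=1 launch benchmarking
# Log in to the master as root, using the master's public address
ssh -i "$PATH_TO_SSH_CREDENTIALS" root@"$MASTER_IP_ADDRESS"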
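The launch commands above rely on several shell variables that this page does not define. One way to catch a forgotten variable before spending money on instances is a small pre-flight check. The helper below is a hypothetical sketch (check_launch_vars is not part of spark-ec2) that reports every named variable that is unset or empty.

```shell
# Hypothetical helper: report each named variable that is unset or empty.
# Returns 0 if all named variables are set, 1 otherwise.
check_launch_vars() {
  missing=0
  for name in "$@"; do
    # Indirectly read the variable whose name is stored in $name
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

It could be called just before the launch command, for example as check_launch_vars GRAPHX_HOME PATH_TO_KEYPAIR_FILE KEYPAIR_NAME PATH_TO_SSH_CREDENTIALS || exit 1.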

Once you are logged in to the master:

# NOW ON MASTER
# Set up the cluster
# download and unpack the LiveJournal graph dataset
wget https://snap.stanford.edu/data/soc-LiveJournal1.txt.gz
gunzip soc-LiveJournal1.txt.gz
# put the data into HDFS
~/ephemeral-hdfs/bin/hadoop dfs -copyFromLocal soc-LiveJournal1.txt /
# git should already be installed; use yum if it is not or if you need other software
yum install -y git
git clone https://github.com/amplab/graphx.git
cd graphx
sbt/sbt assembly
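# copy configuration files from the stock Spark install
cp ~/spark/conf/core-site.xml ~/spark/conf/slaves ~/spark/conf/spark-env.sh ~/graphx/conf/
# stop the stock Spark cluster, copy the GraphX build to every slave, then start GraphX
~/spark/bin/stop-all.sh
~/spark-ec2/copy-dir ~/graphx
~/graphx/bin/start-all.sh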

# Example: running Analytics
# load the cluster variables (including $MASTERS)
source ~/spark-ec2/ec2-variables.sh
# PageRank on the LiveJournal graph; the first run should finish without errors
~/graphx/run-example org.apache.spark.graph.Analytics spark://$MASTERS:7077 pagerank hdfs://$MASTERS:9000/soc-LiveJournal1.txt --numIter=20 --numEPart=128

# Example restarting the cluster
~/graphx/bin/stop-all.sh
~/graphx/bin/start-all.sh

To shut down the cluster, close the ssh session and then, back on your local machine, run:

cd $GRAPHX_HOME
# if this is a new shell, source $PATH_TO_KEYPAIR_FILE again first
ec2/spark-ec2 destroy benchmarking

I always double-check in the AWS web console that the nodes were actually terminated.
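The same check can be sketched from the command line. This assumes the AWS CLI is installed and configured, which this page does not otherwise require, and it assumes spark-ec2 placed the instances in security groups named benchmarking-master and benchmarking-slaves. The function name list_cluster_instances is hypothetical.

```shell
# Hypothetical helper: list instances in the cluster's security groups
# that are not yet terminated. Empty output means nothing is left running.
# Assumes security groups named <cluster>-master and <cluster>-slaves.
list_cluster_instances() {
  cluster="$1"
  aws ec2 describe-instances \
    --region us-east-1 \
    --filters "Name=instance.group-name,Values=${cluster}-master,${cluster}-slaves" \
              "Name=instance-state-name,Values=pending,running,shutting-down,stopping,stopped" \
    --query 'Reservations[].Instances[].[InstanceId,State.Name]' \
    --output text
}
```

Running list_cluster_instances benchmarking after the destroy command should eventually print nothing. The web console remains the authoritative check, since open spot requests do not show up in this listing.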
