
benchspark

An extensible toolset for Spark performance benchmarking.

Currently available Spark jobs (including dataset generators):

Data Type   Algorithm
Vector      KMeans
Vector      LinearRegression
Vector      LogisticRegression
Tabular     GroupByCount
Tabular     Join
Tabular     SelectWhereOrderBy
Text        Grep
Text        Sort
Text        WordCount
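
To give an idea of what such a job looks like, here is a minimal Scala sketch of a WordCount job. The object name, argument handling, and output format are illustrative assumptions, not the repository's actual implementation.

import org.apache.spark.sql.SparkSession

// Hypothetical sketch of a WordCount job; class name and argument
// conventions are assumptions, not the repository's actual code.
object WordCount {
  def main(args: Array[String]): Unit = {
    // args(0): input text path, args(1): output path (assumed convention)
    val spark = SparkSession.builder().appName("WordCount").getOrCreate()

    spark.sparkContext
      .textFile(args(0))
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1L))
      .reduceByKey(_ + _)
      .saveAsTextFile(args(1))

    spark.stop()
  }
}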

Compilation

To compile the jobs to a jar file:

cd spark
sbt package

Getting Started

  1. Adjust run_scripts/submit_local_job to your local setup and execute it; the sketch after this list illustrates what such a submission can look like.
  2. Later, you can extend the script to submit jobs to any cluster available to you, whether in a public cloud or in an on-premises setup.
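
The submit script itself is not reproduced here. As a rough illustration of what a local submission amounts to, the Scala sketch below uses Spark's SparkLauncher API; the jar path, main class, and application arguments are assumptions and will differ from the actual script.

import org.apache.spark.launcher.SparkLauncher

// Hypothetical local submission via SparkLauncher; jar path, main class,
// and arguments are assumptions, not taken from submit_local_job.
// Assumes SPARK_HOME points at a local Spark installation.
object SubmitLocalJob {
  def main(args: Array[String]): Unit = {
    val job = new SparkLauncher()
      .setMaster("local[*]")                                      // run on the local machine
      .setAppResource("spark/target/scala-2.12/benchspark.jar")   // jar built by sbt package (assumed path)
      .setMainClass("WordCount")                                  // assumed main class of one job
      .addAppArgs("input.txt", "output")                          // assumed job arguments
      .launch()
    job.waitFor()                                                 // block until the job finishes
  }
}

Whether submission happens through a script or programmatically, moving from local mode to a cluster mainly means changing the master URL and the resource settings.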
