Quick Start

What is Apache Toree

Apache Toree has one main goal: provide the foundation for interactive applications to connect to and use Apache Spark.

The project aims to let applications send both packaged jars and code snippets to Spark. Because it implements the Jupyter message protocol, Apache Toree can easily plug into the Jupyter ecosystem for quick, interactive data exploration.

Installing as a kernel in Jupyter

This requires a distribution of Apache Spark downloaded to the system where Apache Toree will run. Pass its location to the installer with --spark_home. The following commands install Apache Toree:

pip install --upgrade toree
jupyter toree install --spark_home=/usr/local/bin/apache-spark/

Your Hello World example

One of the most common ways to use Apache Toree is for interactive data exploration in a Jupyter Notebook. First, install the notebook and start the notebook server:

pip install notebook
jupyter notebook

The following clip shows a simple notebook running Scala code that prints "Hello, World!". You can run each code cell by pressing Shift-Enter.

[Clip: a notebook running Scala code that prints "Hello, World!"]
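The clip's exact code is not reproduced here. The cell below is a minimal sketch of an equivalent Hello World, assuming a notebook whose kernel is the Apache Toree Scala interpreter. The name greeting is illustrative only. The cell is plain Scala and does not touch Spark yet.

```scala
// Paste into a notebook cell backed by the Apache Toree kernel, then press Shift-Enter.
// Plain Scala: no Spark objects are used in this cell.
val greeting = "Hello, World!"
println(greeting) // prints: Hello, World!
```

Later cells in the same notebook can refer to greeting, because the kernel keeps definitions alive across cells.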

A key feature of Apache Toree is that it automatically creates a SparkContext binding for you. You can access it through the variable sc. The following clip shows code that accesses the SparkContext and returns a value.

[Clip: notebook code accessing the SparkContext through sc and returning a value]
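The clip's exact code is not reproduced here. The cell below is a minimal sketch of using the sc binding, assuming it runs inside a Toree Scala notebook cell where sc already exists. In a plain Scala REPL without Toree, sc is undefined and the cell will not compile. The names numbers and total are illustrative only.

```scala
// `sc` is the SparkContext that Toree binds automatically, so this cell
// does not construct a SparkConf or SparkContext itself.
println(s"Spark version: ${sc.version}")  // version string of the running Spark

val numbers = sc.parallelize(1 to 100)    // distribute the integers 1..100 as an RDD
val total = numbers.sum()                 // action: runs a Spark job and returns a Double
println(total)                            // prints: 5050.0
```

sum() is an action, so it triggers a job on the cluster (or local master) that Toree's SparkContext is connected to. The result comes back to the notebook as an ordinary Scala value.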

Where to try Apache Toree?