distributed systems, real-time data stream mining, stream mining, data stream, big data stream,

View the Project on GitHub ambodi/sentinel

Sentinel is a project written in Java that performs real-time stream mining on the Twitter Public Stream using SAMOA and Apache Storm. It is a distributed system that aims to use new distributed algorithms. Currently, Sentinel only runs as a real-time distributed system. See the Tasks section for details on how to work with Sentinel.


Twitter Stream Instance

This component implements SAMOA's InstanceStreamClass. It receives the stream from StreamAPIReader and performs sketching, filtering, and related preprocessing.

Twitter Stream API Reader

This component connects to the Twitter Public Stream API, reads instances, and keeps them in an adaptive sliding window.
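As a rough illustration of the windowing idea, here is a minimal sketch of a bounded sliding window whose capacity can be changed at run time. The class name `SlidingWindowSketch` and its methods are hypothetical and not taken from Sentinel, and the policy that decides when to resize (the "adaptive" part) is deliberately left out because the source does not describe it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch, not Sentinel's reader: keeps only the most recent items. */
final class SlidingWindowSketch<T> {
    private final Deque<T> window = new ArrayDeque<>();
    private int capacity;

    SlidingWindowSketch(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Appends the newest item and evicts the oldest ones beyond the capacity. */
    void add(T item) {
        window.addLast(item);
        trim();
    }

    /** Changes the window size; an adaptive policy (not shown) would call this. */
    void resize(int newCapacity) {
        if (newCapacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = newCapacity;
        trim();
    }

    /** Returns a copy of the current window, oldest item first. */
    List<T> snapshot() {
        return new ArrayList<>(window);
    }

    private void trim() {
        while (window.size() > capacity) {
            window.removeFirst();
        }
    }
}
```

Shrinking the window drops the oldest instances first, which is the usual choice when recent data is assumed to be more representative of the current stream.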


Represents an MVC-style model: a set of core attributes with their setters and getters.


Processors perform text normalization on tweets, such as removing emoticons, URLs, and Twitter-specific characters.
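The kind of normalization described here could look roughly like the following regex-based sketch. The class name `TweetNormalizerSketch`, the patterns, and the choice to keep hashtag words while dropping the `#` marker are all assumptions made for illustration. Sentinel's own processors may work differently.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch, not Sentinel's processor: strips URLs, mentions, RT, '#', and emoticons. */
final class TweetNormalizerSketch {
    // http/https links up to the next whitespace.
    private static final Pattern URL = Pattern.compile("https?://\\S+");
    // @mentions, including a trailing colon as in "RT @user:".
    private static final Pattern MENTION = Pattern.compile("@\\w+:?");
    // The retweet marker as a standalone word.
    private static final Pattern RETWEET = Pattern.compile("\\bRT\\b");
    // A few common ASCII emoticons (":)", ";-(", "=D", ":P") as standalone tokens.
    private static final Pattern EMOTICON =
            Pattern.compile("(?<!\\S)[:;=][-']?[)(DPp](?!\\S)");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    private TweetNormalizerSketch() { }

    static String normalize(String tweet) {
        String text = URL.matcher(tweet).replaceAll(" ");
        text = MENTION.matcher(text).replaceAll(" ");
        text = RETWEET.matcher(text).replaceAll(" ");
        text = EMOTICON.matcher(text).replaceAll(" ");
        // Keep the hashtag word itself and drop only the marker.
        text = text.replace("#", "");
        // Collapse the gaps left by the removals.
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }
}
```

URLs are removed before emoticons so that the `:/` inside `http://` is never mistaken for a face.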


Sketching algorithms such as SpaceSavings keep a summary of the text in memory so that real-time stream mining becomes possible. They also enable online approaches to data stream mining, which are more adaptive than hold-out approaches such as batch analysis of stream data.
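To make the idea of a bounded in-memory summary concrete, here is a minimal sketch of the Space-Saving counting scheme. It tracks at most a fixed number of items, and when a new item arrives while the table is full, it replaces the item with the smallest count and inherits that count plus one. The class name `SpaceSavingSketch` is hypothetical. The linear scan for the minimum is a simplification chosen for brevity and is not how an optimized implementation would do it.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of Space-Saving; not Sentinel's SpaceSavings class. */
final class SpaceSavingSketch {
    private final int capacity;
    private final Map<String, Long> counts = new HashMap<>();

    SpaceSavingSketch(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Records one occurrence of the item using at most `capacity` counters. */
    void offer(String item) {
        Long current = counts.get(item);
        if (current != null) {
            counts.put(item, current + 1);
        } else if (counts.size() < capacity) {
            counts.put(item, 1L);
        } else {
            // Table is full: evict the least frequent item (simple O(capacity) scan).
            String minItem = null;
            long minCount = Long.MAX_VALUE;
            for (Map.Entry<String, Long> e : counts.entrySet()) {
                if (e.getValue() < minCount) {
                    minCount = e.getValue();
                    minItem = e.getKey();
                }
            }
            counts.remove(minItem);
            // The newcomer inherits the evicted count, so estimates never undercount.
            counts.put(item, minCount + 1);
        }
    }

    /** Estimated count; 0 means the item is not currently tracked. */
    long estimate(String item) {
        return counts.getOrDefault(item, 0L);
    }

    int size() {
        return counts.size();
    }
}
```

Memory stays proportional to the chosen capacity no matter how many distinct tokens the stream contains, at the cost of possibly overestimating the counts of tracked items.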

Feature Reducer

This component transforms tweet texts into sparse feature vectors and keeps only frequent features in memory.
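One way to picture this transformation is the sketch below, which maps each tweet to a sparse vector (feature index to term frequency) and assigns an index only to tokens that have been seen at least `minCount` times. The class name `FeatureReducerSketch`, the whitespace tokenizer, and the threshold parameter are assumptions for illustration. For simplicity it counts every token in a plain map, whereas a memory-bounded version would presumably take its counts from a sketching algorithm.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch, not Sentinel's Feature Reducer. */
final class FeatureReducerSketch {
    private final int minCount;
    // token -> occurrences seen so far (unbounded here for simplicity)
    private final Map<String, Integer> seen = new HashMap<>();
    // frequent token -> feature index
    private final Map<String, Integer> index = new HashMap<>();

    FeatureReducerSketch(int minCount) {
        this.minCount = minCount;
    }

    /** Updates token counts, then returns index -> term frequency for frequent tokens only. */
    Map<Integer, Integer> toSparseVector(String text) {
        Map<Integer, Integer> vector = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            int count = seen.merge(token, 1, Integer::sum);
            if (count < minCount) {
                continue; // rare so far: no feature is emitted for it
            }
            Integer featureIndex = index.get(token);
            if (featureIndex == null) {
                featureIndex = index.size();
                index.put(token, featureIndex);
            }
            vector.merge(featureIndex, 1, Integer::sum);
        }
        return vector;
    }
}
```

Only non-zero entries are stored, so the size of each vector depends on the tweet's length and not on the size of the vocabulary.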

Language Detector

This component detects the language of a tweet using the classification approach of the Language Detection Library for Java.


Sentinel is a module of a bigger project. In order to use Sentinel, you need to run it with Apache Storm and SAMOA. Read the information at


Clone SAMOA fork for Sentinel

git clone

Clone Sentinel

git clone

Put Sentinel under


Add file in the root of the project. More information is available in Twitter4J's documentation on generic properties.


mvn clean install

Local Cluster:

mvn package 

Apache Storm Cluster:

mvn -Pstorm package


Real-time Sentiment Analysis on Twitter Public Stream

Run via Bash

Using the Vertical Hoeffding Tree as a distributed parallel classification algorithm, you can perform sentiment analysis on the Twitter Public Stream with the Prequential Evaluation task.

To perform sentiment analysis in real time on up to 1000000 tweets, sampling the learner's performance every 100000 instances, with 4 parallel nodes in your local cluster, run

bin/samoa local target/SAMOA-Local-0.2.0-SNAPSHOT.jar "PrequentialEvaluation -d /tmp/dump.csv -i 1000000 -f 100000 -l (classifiers.trees.VerticalHoeffdingTree -p 4) -s"

Or if you run it in Apache Storm, run

bin/samoa storm target/SAMOA-Storm-0.2.0-SNAPSHOT.jar "PrequentialEvaluation -d /tmp/dump.csv -i 1000000 -f 100000 -l (classifiers.trees.VerticalHoeffdingTree -p 4) -s"
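For readers unfamiliar with prequential evaluation, the task used in the commands above, the core idea is "test, then train": each incoming instance is first used to score the current model and only afterwards used to update it. The following is a toy sketch of that loop, assuming a hypothetical majority-class learner that ignores features entirely. It is not SAMOA's PrequentialEvaluation implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a test-then-train loop; not SAMOA code. */
final class PrequentialSketch {

    /** Toy learner: always predicts the most frequent label seen so far. */
    static final class MajorityClassLearner {
        private final Map<String, Integer> labelCounts = new HashMap<>();

        /** Returns null until at least one instance has been used for training. */
        String predict() {
            String best = null;
            int bestCount = -1;
            for (Map.Entry<String, Integer> e : labelCounts.entrySet()) {
                if (e.getValue() > bestCount) {
                    bestCount = e.getValue();
                    best = e.getKey();
                }
            }
            return best;
        }

        void train(String label) {
            labelCounts.merge(label, 1, Integer::sum);
        }
    }

    private PrequentialSketch() { }

    /** Accuracy over the stream, where every label is tested before it is learned. */
    static double evaluate(List<String> labels) {
        MajorityClassLearner learner = new MajorityClassLearner();
        int correct = 0;
        for (String label : labels) {
            if (label.equals(learner.predict())) { // 1. test on the unseen instance
                correct++;
            }
            learner.train(label);                  // 2. then train on it
        }
        return labels.isEmpty() ? 0.0 : (double) correct / labels.size();
    }
}
```

Because every instance is scored before the model has seen it, no separate hold-out set is needed, which suits an unbounded stream.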

Configuration by Code

Put the following code under samoa-local (or samoa-storm)/src/main/java/com/yahoo/labs/samoa/:

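public static void main(String[] args) {
    // Prequential (test-then-train) evaluation task.
    PrequentialEvaluation pe = new PrequentialEvaluation();
    // Component factory used to build the task's topology.
    pe.setFactory(new SimpleComponentFactory());

    // Same learner string as on the command line, here with a parallelism hint of 1.
    pe.learnerOption.setValueViaCLIString("classifiers.trees.VerticalHoeffdingTree -p 1");
}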


Run mvn -X exec:java

This is preferred if you are developing and want to make use of debug mode.


The use and distribution terms for this software are covered by the Apache License, Version 2.0.