streaming-cluster-example-application

Inspiration

The main inspiration for this project was the Flink Forward conference we attended this year. We are very interested in streaming applications and distributed stream processing pipelines.

What it does

This project is an example of what one could do with such a streaming pipeline. Right now, all it does is count the processed events per window of a given size. Because we were also interested in learning Go, we built a Go client that connects to the Twitter streaming API and fetches tweets mentioning a given topic. Those tweets are published to a Kafka cluster and then consumed by a simple Flink job, which does the processing.

How we built it

Go client to fetch tweets and publish them to Kafka
Kafka cluster on AWS with 3 nodes
Flink master on AWS (1 node, could be scaled for heavier processing if needed)
Flink job (can run on a local machine or on a Flink node)

Challenges we ran into

Practically everything regarding the setup of the cluster. AWS was a pain at first. The Kafka cluster didn't do what we wanted. And sadly, we also didn't manage to add Elasticsearch with Kibana to show a little dashboard of the streaming results :(