Inspiration
The main inspiration for this project was the Flink Forward conference we attended this year. We are very interested in streaming applications and distributed stream-processing pipelines.
What it does
This project is more of an example of what one can do with such a streaming pipeline. Right now, all it does is count the processed events per given window size. Because we were also interested in learning Go, we built a Go client that connects to the Twitter streaming API and fetches tweets mentioning a given topic. Those tweets are published to a Kafka cluster and then consumed by a simple Flink job, which does the processing.
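The first hop of the pipeline can be sketched in Go. This is a minimal, hypothetical sketch of the filtering step only (the `Tweet` type, field names, and sample data are illustrative); the real client decodes the Twitter streaming API's JSON and hands matching tweets to a Kafka producer:

```go
package main

import (
	"fmt"
	"strings"
)

// Tweet holds the only field the filter needs; the real client
// decodes the full JSON payload from the Twitter streaming API.
type Tweet struct {
	Text string
}

// mentions reports whether a tweet mentions the tracked topic,
// ignoring case.
func mentions(t Tweet, topic string) bool {
	return strings.Contains(strings.ToLower(t.Text), strings.ToLower(topic))
}

// filterTweets keeps only the tweets that mention the topic.
// Tweets that pass would then be published to Kafka.
func filterTweets(tweets []Tweet, topic string) []Tweet {
	var out []Tweet
	for _, t := range tweets {
		if mentions(t, topic) {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	tweets := []Tweet{
		{Text: "Learning streaming at Flink Forward"},
		{Text: "Nothing to see here"},
	}
	for _, t := range filterTweets(tweets, "flink") {
		fmt.Println(t.Text) // real client: publish to a Kafka topic instead
	}
}
```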
How We built it
- Go client to fetch tweets and publish them to Kafka
- Kafka cluster on AWS with 3 nodes
- Flink master on AWS (1 node; could scale out for heavier processing if needed)
- Flink job (can run on a local machine or on the Flink node)
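The counting done by the Flink job is conceptually simple: events are grouped into fixed (tumbling) windows and counted per window. The actual job uses Flink's windowing API, but the core idea can be illustrated in Go as a batch sketch (window size and timestamps are illustrative):

```go
package main

import "fmt"

// countPerWindow assigns each event timestamp (in seconds) to a
// tumbling window of windowSize seconds and counts events per window.
// The map key is the window's start time. Flink does this continuously
// over an unbounded stream; this sketch works on a finite slice.
func countPerWindow(timestamps []int64, windowSize int64) map[int64]int {
	counts := make(map[int64]int)
	for _, ts := range timestamps {
		windowStart := ts - ts%windowSize
		counts[windowStart]++
	}
	return counts
}

func main() {
	// Events at t=1s, 5s, and 12s with a 10-second window:
	// window [0,10) holds 2 events, window [10,20) holds 1.
	fmt.Println(countPerWindow([]int64{1, 5, 12}, 10))
}
```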
Challenges We ran into
Practically everything regarding the cluster setup. AWS was a pain at first, and the Kafka cluster didn't do what we wanted. Sadly, we also didn't manage to add Elasticsearch with Kibana to show a little dashboard of the streaming results :(
Accomplishments that We are proud of
Even though the programming part was quite small, we are proud to have a small cluster for simple streaming applications.
What We learned
A little Go, and how to get started with AWS, Kafka, and Flink.
What's next for streaming-cluster-example-application
One could build more Flink processing jobs, but the AWS credit will run out quite quickly, so the cluster will be gone by then.