Building a Serverless Clickstream Analytics Pipeline on AWS
Inspiration
I wanted to build something beyond just a tutorial—something real, scalable, and meaningful. Clickstream analytics caught my attention because of how widely it's used in the industry. The idea of designing a fully serverless pipeline that could process and visualize real-time data excited me, and I saw this as an opportunity to think like an architect.
What it does
This project collects user interaction data from a web application, processes it in real time, and visualizes it in Amazon QuickSight. The backend is containerized and runs on AWS ECS Fargate, while the frontend is a static site hosted on S3 + CloudFront. The data flows through a custom API, Lambda, DynamoDB, and EventBridge, eventually landing in S3 in Parquet format for efficient analysis.
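To make the data flow concrete, here is a minimal sketch of what a click event and its validation step might look like before the event is written to DynamoDB. The field names (`eventId`, `userId`, `elementId`, `page`, `timestamp`) and the `parseClickEvent` helper are assumptions for illustration, not the project's actual schema.

```typescript
// Hypothetical click event shape; the field names are illustrative assumptions.
interface ClickEvent {
  eventId: string;
  userId: string;
  elementId: string; // e.g. the id of the button that was clicked
  page: string;
  timestamp: string; // ISO 8601, normalized to UTC
}

type ParseResult =
  | { status: "ok"; event: ClickEvent }
  | { status: "error"; error: string };

const REQUIRED_FIELDS = ["eventId", "userId", "elementId", "page", "timestamp"] as const;

// Validate an untrusted request body and normalize it into a ClickEvent.
function parseClickEvent(body: unknown): ParseResult {
  if (typeof body !== "object" || body === null) {
    return { status: "error", error: "body must be a JSON object" };
  }
  const record = body as Record<string, unknown>;

  // Every required field must be a non-empty string.
  for (const field of REQUIRED_FIELDS) {
    const value = record[field];
    if (typeof value !== "string" || value.length === 0) {
      return { status: "error", error: `missing or invalid field: ${field}` };
    }
  }

  // Reject timestamps that cannot be parsed as a date.
  const rawTimestamp = record.timestamp as string;
  if (Number.isNaN(Date.parse(rawTimestamp))) {
    return { status: "error", error: "timestamp is not a valid date" };
  }

  return {
    status: "ok",
    event: {
      eventId: record.eventId as string,
      userId: record.userId as string,
      elementId: record.elementId as string,
      page: record.page as string,
      timestamp: new Date(rawTimestamp).toISOString(),
    },
  };
}
```

Validating at the API boundary like this keeps malformed events out of DynamoDB, so the later Parquet conversion can rely on a consistent shape.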
How I built it
- Frontend: Hosted as a static site on S3 + CloudFront.
- Backend: An Express.js server containerized with Docker and deployed on ECS Fargate.
- Data Ingestion: A custom API triggers a Lambda function that stores click events in DynamoDB.
- Processing & Transformation: EventBridge triggers a Lambda function to process data and convert it into Parquet format.
- Storage: Processed data is stored in S3 for further analysis.
- Visualization: Amazon QuickSight connects to S3 to generate analytics dashboards.
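One detail the processing and storage steps above leave open is how the Parquet files are laid out in S3. The sketch below shows one common option: grouping events by UTC day and writing each group under a Hive-style `year=/month=/day=` prefix. This layout, the `clicks` prefix, and the helper names are assumptions for illustration; the project may organize its bucket differently.

```typescript
// Minimal stored-event shape for this sketch (an assumption, not the real table schema).
interface StoredClick {
  eventId: string;
  timestamp: string; // ISO 8601
}

// Return the UTC calendar day of a timestamp as "YYYY-MM-DD".
// Date.prototype.toISOString throws a RangeError for unparseable input.
function dayOf(timestamp: string): string {
  return new Date(timestamp).toISOString().slice(0, 10);
}

// Group events by UTC day so each day can be written as its own Parquet file.
function groupByDay(events: StoredClick[]): Record<string, StoredClick[]> {
  const groups: Record<string, StoredClick[]> = {};
  for (const event of events) {
    const day = dayOf(event.timestamp);
    if (groups[day] === undefined) {
      groups[day] = [];
    }
    groups[day].push(event);
  }
  return groups;
}

// Build a Hive-style partitioned object key for one day's batch.
function keyForDay(prefix: string, day: string, batchId: string): string {
  const [year, month, date] = day.split("-");
  return `${prefix}/year=${year}/month=${month}/day=${date}/${batchId}.parquet`;
}
```

For example, `keyForDay("clicks", "2024-05-01", "batch-001")` yields `clicks/year=2024/month=05/day=01/batch-001.parquet`. Partitioned prefixes like this let query engines skip days that a query does not touch.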
Challenges I ran into
- Defining the Scope: Keeping the project simple yet effective was tricky.
- Securing ALB + HTTPS: Fixing SSL issues took two days; Route 53, ACM, and the ALB all had to be configured correctly.
- Optimizing Data Format: Raw JSON was inefficient and costly. Switching to Parquet optimized storage and querying.
- Automating Data Processing: Initially, I considered running DynamoDB scans manually, but later automated the process with EventBridge + Lambda.
- IAM Headaches: QuickSight wasn’t working due to missing S3 permissions. Lesson learned—always check IAM first!
- Containerizing the Right Parts: I initially planned to containerize both frontend and backend, but rebuilding and pushing images for every frontend update was unnecessary. Hosting it via S3 + CloudFront simplified things.
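For the IAM issue above, the missing piece was read access to the S3 data. Below is a hedged sketch of a read-only policy document, built as a plain object. The bucket name and prefix are placeholders, and the exact set of actions QuickSight needs depends on how it reads the data (directly from S3 or through a query service), so treat this as a starting point rather than the policy the project uses.

```typescript
// Shape of an IAM policy document, limited to the parts this sketch needs.
interface PolicyStatement {
  Effect: "Allow" | "Deny";
  Action: string[];
  Resource: string[];
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Build a read-only policy for one bucket, optionally scoped to a key prefix.
// s3:ListBucket applies to the bucket ARN; s3:GetObject applies to object ARNs.
function s3ReadOnlyPolicy(bucket: string, prefix = ""): PolicyDocument {
  const bucketArn = `arn:aws:s3:::${bucket}`;
  const objectArn = `${bucketArn}/${prefix}*`;
  return {
    Version: "2012-10-17",
    Statement: [
      { Effect: "Allow", Action: ["s3:ListBucket"], Resource: [bucketArn] },
      { Effect: "Allow", Action: ["s3:GetObject"], Resource: [objectArn] },
    ],
  };
}

// "my-clickstream-bucket" and "processed/" are hypothetical names.
const examplePolicy = s3ReadOnlyPolicy("my-clickstream-bucket", "processed/");
const examplePolicyJson = JSON.stringify(examplePolicy, null, 2);
```

A common mistake is attaching `s3:ListBucket` to the object ARN (`bucket/*`) instead of the bucket ARN; keeping the two statements separate makes that harder to get wrong.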
Accomplishments that I'm proud of
- Built a fully serverless, scalable analytics pipeline.
- Overcame multiple debugging challenges to get everything working.
- Created an architecture diagram that effectively explains the system.
- Learned how to think like an architect while designing a real-world project.
What I learned
- Every problem has a solution. Just stay on it, and you'll figure it out.
- Docker is great, but not always necessary. Hosting the frontend on S3 + CloudFront was the simpler and more efficient choice.
- Small details matter. From IAM roles to API Gateway configurations, every small component plays a big role in making the system work.
- Building real projects forces you to think deeply. This wasn’t just about setting up AWS services—it was about solving real problems.
What's next for this project?
- Automating QuickSight Dashboards: Right now, dashboards are manually configured; automating them would be cool.
- Adding More Events: Expanding beyond simple button clicks to capture richer user interactions.
- Exploring Real-Time Streaming: Investigating Kinesis or Kafka for even faster data processing.
If you’ve read this far, I hope you found this interesting and insightful! Feel free to reach out if you have any questions or suggestions—I’d love to hear your thoughts.
Built With
- amazon-athena
- amazon-cloudfront-cdn
- amazon-dynamodb
- amazon-web-services
- aws-glue
- docker
- eventbridge
- express.js
- lambda
- quicksight
- react