Building a Serverless Clickstream Analytics Pipeline on AWS

Inspiration

I wanted to build something beyond just a tutorial: something real, scalable, and meaningful. Clickstream analytics caught my attention because of how widely it's used in the industry. The idea of designing a fully serverless pipeline that could process and visualize real-time data excited me, and I saw this as an opportunity to think like an architect.

What it does

This project collects user interaction data from a web application, processes it in near real time, and visualizes it in Amazon QuickSight. The backend is containerized and runs on Amazon ECS with Fargate, while the frontend is a static site hosted on S3 + CloudFront. The data flows through a custom API, Lambda, DynamoDB, and EventBridge, eventually landing in S3 in Parquet format for efficient analysis.

How I built it

Frontend: Hosted as a static site on S3 + CloudFront.
Backend: An Express.js server containerized with Docker and deployed on ECS Fargate.
Data Ingestion: A custom API triggers a Lambda function that stores click events in DynamoDB.
Processing & Transformation: EventBridge triggers a Lambda function to process data and convert it into Parquet format.
Storage: Processed data is stored in S3 for further analysis.
Visualization: Amazon QuickSight connects to S3 to generate analytics dashboards.
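
To make the ingestion step more concrete, here is a minimal sketch of what a click-event handler could look like. It is not the project's actual code: the field names (session_id, element_id, page), the key layout (event_id, timestamp_ms), and the shape of the incoming event (a JSON string under "body") are all assumptions made for illustration. The DynamoDB write is passed in as a function so the sketch runs without AWS credentials.

```python
import json
import time
import uuid
from typing import Any, Callable, Dict

# Hypothetical schema: the write-up does not list the real event fields.
REQUIRED_FIELDS = ("session_id", "element_id", "page")


def build_click_item(payload: Any, now_ms: int) -> Dict[str, Any]:
    """Validate a click payload and shape it as a DynamoDB-style item."""
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    missing = [f for f in REQUIRED_FIELDS if not payload.get(f)]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return {
        "event_id": str(uuid.uuid4()),  # assumed partition key
        "timestamp_ms": now_ms,         # assumed sort key
        "session_id": str(payload["session_id"]),
        "element_id": str(payload["element_id"]),
        "page": str(payload["page"]),
    }


def handle_click(
    event: Dict[str, Any],
    put_item: Callable[[Dict[str, Any]], None],
) -> Dict[str, Any]:
    """Lambda-style handler; put_item is injected so the sketch runs offline."""
    try:
        payload = json.loads(event.get("body") or "{}")
        item = build_click_item(payload, now_ms=int(time.time() * 1000))
    except (ValueError, TypeError) as exc:
        # json.JSONDecodeError is a ValueError, so malformed JSON lands here too.
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}
    put_item(item)
    return {"statusCode": 202, "body": json.dumps({"event_id": item["event_id"]})}
```

In a deployed function, put_item would typically wrap a boto3 Table resource, for example lambda item: table.put_item(Item=item). Rejecting malformed events before the write keeps bad rows out of the later Parquet conversion.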

Challenges I ran into

Defining the Scope: Keeping the project simple yet effective was tricky.
Securing ALB + HTTPS: Fixing the SSL issues took two days; Route 53, ACM, and the ALB each had to be configured correctly.
Optimizing Data Format: Raw JSON was inefficient and costly to store and query. Switching to Parquet, a columnar format, improved both storage and query efficiency.
Automating Data Processing: Initially, I considered running DynamoDB scans manually, but I later automated the processing with EventBridge + Lambda.
IAM Headaches: QuickSight wasn’t working because of missing S3 permissions. Lesson learned: always check IAM first!
Containerizing the Right Parts: I initially planned to containerize both frontend and backend, but rebuilding and pushing images for every frontend update was unnecessary. Hosting it via S3 + CloudFront simplified things.
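
The data-format and automation points above can be illustrated with a small sketch of what the scheduled processing step might do before writing Parquet: group events by day and pivot each group from rows into columns, which is the layout a columnar format stores. This is a sketch under assumptions, not the project's code. The timestamp_ms field and the year=/month=/day= prefix scheme are hypothetical, and the actual Parquet encoding is left out so the example needs only the standard library.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Any, Dict, List


def partition_prefix(timestamp_ms: int) -> str:
    """Date-based key prefix (UTC) derived from an epoch-millisecond timestamp."""
    day = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return f"year={day:%Y}/month={day:%m}/day={day:%d}"


def to_columns(rows: List[Dict[str, Any]], fields: List[str]) -> Dict[str, List[Any]]:
    """Pivot row dicts into one list per field; missing values become None."""
    return {field: [row.get(field) for row in rows] for field in fields}


def plan_batches(
    rows: List[Dict[str, Any]], fields: List[str]
) -> Dict[str, Dict[str, List[Any]]]:
    """Group rows by day, then pivot each day's rows into columns."""
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        # Assumes every row carries the hypothetical timestamp_ms field.
        grouped[partition_prefix(row["timestamp_ms"])].append(row)
    return {prefix: to_columns(group, fields) for prefix, group in grouped.items()}
```

Each column dictionary returned by plan_batches could then be handed to a Parquet writer (one option, assuming pyarrow is packaged with the function, is pyarrow.Table.from_pydict followed by pyarrow.parquet.write_table) and uploaded under its prefix. Date-based prefixes like these are a common way to let query engines skip data outside the requested range.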

Accomplishments that I'm proud of

Built a fully serverless, scalable analytics pipeline.
Overcame multiple debugging challenges to get everything working.
Created an architecture diagram that effectively explains the system.
Learned how to think like an architect while designing a real-world project.

What I learned

Every problem has a solution. Just stay on it, and you'll figure it out.
Docker is great, but not always necessary. Hosting the frontend on S3 + CloudFront was the simpler and more efficient choice.
Small details matter. From IAM roles to API Gateway configurations, every small component plays a big role in making the system work.
Building real projects forces you to think deeply. This wasn’t just about setting up AWS services—it was about solving real problems.

What's next for this project?

Automating QuickSight Dashboards: Right now, dashboards are manually configured; automating them would be cool.
Adding More Events: Expanding beyond simple button clicks to capture richer user interactions.
Exploring Real-Time Streaming: Investigating Kinesis or Kafka for even faster data processing.

If you’ve read this far, I hope you found this interesting and insightful! Feel free to reach out if you have any questions or suggestions—I’d love to hear your thoughts.

Built With

amazon-athena
amazon-cloudfront-cdn
amazon-dynamodb
amazon-web-services
aws-glue
docker
eventbridge
express.js
lambda
quicksight
react

Updates

Rohith Gowtham G started this project — Mar 13, 2025 10:53 AM EDT
