Building a Serverless Clickstream Analytics Pipeline on AWS

Inspiration

I wanted to build something beyond a tutorial: something real, scalable, and meaningful. Clickstream analytics caught my attention because of how widely it's used in the industry. The idea of designing a fully serverless pipeline that could process and visualize real-time data excited me, and I saw this as an opportunity to think like an architect.

What it does

This project collects user interaction data from a web application, processes it in real time, and visualizes it in Amazon QuickSight. The backend is containerized and runs on AWS ECS Fargate, while the frontend is a static site hosted on S3 + CloudFront. The data flows through a custom API, Lambda, DynamoDB, and EventBridge, eventually landing in S3 in Parquet format for efficient analysis.
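To make the ingestion step more concrete, here is a minimal Python sketch of what the API-to-Lambda-to-DynamoDB hop could look like. It is an illustration under assumptions, not the project's actual code: the field names (session_id, element_id, page), the function names, and the response shape are all hypothetical. The table is passed in as a parameter so the logic can be exercised without AWS; in a real Lambda it would typically be a boto3 DynamoDB Table resource, which exposes put_item(Item=...).

```python
import json
import time
import uuid
from typing import Any, Dict, Optional

# Hypothetical schema: the write-up does not list the real event fields.
REQUIRED_FIELDS = ("session_id", "element_id", "page")


def build_click_item(payload: Any, now_ms: Optional[int] = None) -> Dict[str, Any]:
    """Validate a raw click payload and shape it into a DynamoDB item."""
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if not payload.get(name)]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return {
        "event_id": str(uuid.uuid4()),
        # Integer milliseconds, since DynamoDB via boto3 rejects Python floats.
        "timestamp_ms": now_ms if now_ms is not None else int(time.time() * 1000),
        "session_id": str(payload["session_id"]),
        "element_id": str(payload["element_id"]),
        "page": str(payload["page"]),
    }


def handle_click(event: Dict[str, Any], table: Any) -> Dict[str, Any]:
    """Lambda-style handler body; `table` is anything with put_item(Item=...)."""
    try:
        payload = json.loads(event.get("body") or "{}")
        item = build_click_item(payload)
    except (ValueError, TypeError) as exc:
        # json.JSONDecodeError is a ValueError, so malformed JSON lands here too.
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}
    table.put_item(Item=item)
    return {"statusCode": 202, "body": json.dumps({"event_id": item["event_id"]})}
```

Keeping validation in a pure function and injecting the table makes the handler easy to test locally with a small stand-in object that records the items it receives.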

How I built it

  • Frontend: Hosted as a static site on S3 + CloudFront.
  • Backend: An Express.js server containerized with Docker and deployed on ECS Fargate.
  • Data Ingestion: Custom API triggers Lambda to store click events in DynamoDB.
  • Processing & Transformation: EventBridge triggers a Lambda function to process data and convert it into Parquet format.
  • Storage: Processed data is stored in S3 for further analysis.
  • Visualization: Amazon QuickSight connects to S3 to generate analytics dashboards.
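The processing step in the list above can be sketched roughly as follows. This is a hedged illustration, not the project's implementation: the column names, the date-partitioned key layout, and the helper names are assumptions. The sketch pivots row-shaped DynamoDB items into one list per column, which is the shape pyarrow's Table.from_pydict accepts, and builds an S3 object key for the batch. pyarrow is imported lazily inside write_parquet so the rest of the code runs with only the standard library.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List

# Hypothetical column set; the real event schema is not given in the write-up.
COLUMNS = ("event_id", "timestamp_ms", "session_id", "element_id", "page")


def to_columns(items: Iterable[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row-shaped items into one list per column."""
    columns: Dict[str, List[Any]] = {name: [] for name in COLUMNS}
    for item in items:
        for name in COLUMNS:
            value = item.get(name)
            # boto3 returns DynamoDB numbers as Decimal; store plain ints instead.
            if name == "timestamp_ms" and value is not None:
                value = int(value)
            columns[name].append(value)
    return columns


def partition_key(batch_time: datetime, prefix: str = "clicks") -> str:
    """Build a date-partitioned S3 object key for one batch (assumed layout)."""
    t = batch_time.astimezone(timezone.utc)
    return (
        f"{prefix}/year={t:%Y}/month={t:%m}/day={t:%d}/"
        f"batch-{t:%H%M%S}.parquet"
    )


def write_parquet(columns: Dict[str, List[Any]], path: str) -> None:
    """Write the columns to a Parquet file. Requires the pyarrow package."""
    import pyarrow as pa
    import pyarrow.parquet as pq

    pq.write_table(pa.Table.from_pydict(columns), path)
```

In a scheduled Lambda, a handler would read the new items from DynamoDB, call to_columns and write_parquet on a temporary file, and then upload that file to S3 under the key returned by partition_key.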

Challenges I ran into

  • Defining the Scope: Keeping the project simple yet effective was tricky.
  • Securing ALB + HTTPS: It took two days to fix SSL issues, because Route 53, ACM, and the ALB each had to be configured correctly.
  • Optimizing Data Format: Raw JSON was inefficient and costly to store and query. Switching to Parquet, a compressed columnar format, optimized both storage and querying.
  • Automating Data Processing: Initially, I considered running manual DynamoDB scans, but later automated the process with EventBridge + Lambda.
  • IAM Headaches: QuickSight wasn’t working due to missing S3 permissions. Lesson learned: always check IAM first!
  • Containerizing the Right Parts: I initially planned to containerize both frontend and backend, but rebuilding and pushing images for every frontend update was unnecessary. Hosting it via S3 + CloudFront simplified things.
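For the IAM issue mentioned above, the missing piece is usually read access to the bucket and its objects. The following Python sketch builds a minimal read-only policy document to show the shape of those permissions. The bucket name and prefix are placeholders, and the function name is hypothetical. How the policy gets attached depends on the setup; QuickSight's S3 access is often granted from its own account settings rather than by hand, so treat this as an illustration of what to check rather than the exact fix used in this project.

```python
import json
from typing import Any, Dict


def s3_read_policy(bucket: str, prefix: str = "") -> Dict[str, Any]:
    """Build a minimal read-only IAM policy document for one bucket prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket ARN itself.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reading is granted on object ARNs under the prefix.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }


if __name__ == "__main__":
    # "example-clickstream-bucket" is a placeholder, not the project's bucket.
    print(json.dumps(s3_read_policy("example-clickstream-bucket", "clicks/"), indent=2))
```

A common mistake is granting s3:GetObject on the bucket ARN or s3:ListBucket on the object ARN; the two actions apply to different resource types, which is why the sketch keeps them in separate statements.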

Accomplishments that I'm proud of

  • Built a fully serverless, scalable analytics pipeline.
  • Overcame multiple debugging challenges to get everything working.
  • Created an architecture diagram that effectively explains the system.
  • Learned how to think like an architect while designing a real-world project.

What I learned

  • Every problem has a solution. Just stay on it, and you'll figure it out.
  • Docker is great, but not always necessary. Hosting the frontend on S3 + CloudFront was the simpler and more efficient choice.
  • Small details matter. From IAM roles to API Gateway configurations, every small component plays a big role in making the system work.
  • Building real projects forces you to think deeply. This wasn’t just about setting up AWS services; it was about solving real problems.

What's next for this project?

  • Automating QuickSight Dashboards: Right now, dashboards are manually configured; automating them would be cool.
  • Adding More Events: Expanding beyond simple button clicks to capture richer user interactions.
  • Exploring Real-Time Streaming: Investigating Kinesis or Kafka for even faster data processing.

If you’ve read this far, I hope you found this interesting and insightful! Feel free to reach out if you have any questions or suggestions. I’d love to hear your thoughts.
