Inspiration
Pharma brand perception is no longer shaped only by official communication or clinical literature. Influencers on platforms such as YouTube, Twitter, and Instagram increasingly act as intermediaries who interpret, frame, and sometimes distort medical information. Audience reactions in comments often reveal trust, skepticism, misinformation, or unmet concerns that are invisible in traditional analytics. The project was inspired by the absence of structured, scalable tools that can analyze this influencer-driven perception layer with sufficient depth, rigor, and interpretability for pharma use cases.
About the Project
digit-izer is an AI-driven market perception analysis system focused on influencer-generated content and audience reactions across social platforms. The system ingests videos, posts, and comments, converts multimodal data into structured representations, and analyzes how narratives propagated by influencers affect audience perception of pharma brands, drugs, or health topics.
A core design principle is separating influencer intent from audience interpretation. The platform identifies dominant narratives, tracks how they evolve over time, segments the responding audience, and quantifies perception shifts across platforms in a form usable by market research, brand, and communication teams.
How I Built It
Content ingestion is performed using platform-specific collectors. Video content is handled via yt-dlp, followed by automated speech-to-text transcription. Textual data from transcripts, captions, and comments is normalized, filtered for noise, and aligned temporally.
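The normalization and noise-filtering step can be illustrated with a minimal, standard-library-only sketch. It is not the project's actual code: the function names, the `text` and `timestamp` field names, and the three-word noise threshold are all assumptions made for the example, and sorting by timestamp stands in for whatever temporal alignment the real pipeline performs.

```python
import html
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+")
WS_RE = re.compile(r"\s+")


def normalize_text(raw: str) -> str:
    """Unescape HTML entities, normalize Unicode, drop URLs, collapse whitespace."""
    text = html.unescape(raw)
    text = unicodedata.normalize("NFKC", text)
    text = URL_RE.sub(" ", text)
    return WS_RE.sub(" ", text).strip()


def is_noise(text: str, min_words: int = 3) -> bool:
    """Heuristic (assumed) noise check: too few words or no letters at all."""
    words = text.split()
    return len(words) < min_words or not any(ch.isalpha() for ch in text)


def clean_comments(comments: list[dict]) -> list[dict]:
    """Normalize each comment, drop noisy ones, and order the rest by timestamp.

    Each comment is assumed to be a dict with hypothetical keys
    'text' (str) and 'timestamp' (a sortable number, e.g. epoch seconds).
    """
    cleaned = []
    for comment in comments:
        text = normalize_text(comment["text"])
        if not is_noise(text):
            cleaned.append({**comment, "text": text})
    return sorted(cleaned, key=lambda c: c["timestamp"])
```

Transcript segments could pass through the same `normalize_text` function, so that comments and transcripts share one cleaning path before analysis.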
The analysis layer is built using transformer-based language models, which form the backbone for sentiment analysis, topic extraction, and named entity recognition. Rather than relying on coarse sentiment scores, the system applies aspect-based sentiment analysis to disentangle opinions related to efficacy, safety, trust, cost, and side effects, reflecting current best practices in healthcare NLP.
Audience reactions are embedded using sentence-level semantic encoders and clustered to infer latent audience personas. Embedding-based segmentation was chosen over keyword-driven or rule-based methods because semantic embeddings can group comments that express the same concern in different words. Perception signals are weighted by engagement intensity to better approximate opinion impact.
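The engagement weighting can be sketched as follows. This is a hedged illustration, not the project's formula: it covers only the weighting step (not the embedding or clustering), it assumes an upstream model has already produced per-comment aspect and polarity predictions, and the log-damped weight, the `likes` and `replies` fields, and the aspect labels are assumptions introduced for the example.

```python
import math
from collections import defaultdict

# Aspect labels taken from the write-up; the exact label strings are assumed.
ASPECTS = ("efficacy", "safety", "trust", "cost", "side_effects")


def engagement_weight(likes: int, replies: int) -> float:
    """Log-damped weight (an assumption): popular comments count more,
    but a single viral comment cannot dominate the aggregate."""
    return 1.0 + math.log1p(likes + replies)


def weighted_aspect_scores(predictions: list[dict]) -> dict[str, float]:
    """Engagement-weighted mean polarity per aspect.

    Each prediction is a dict with hypothetical keys:
    'aspect' (str), 'polarity' (float in [-1, 1]), 'likes', 'replies' (int).
    Predictions with an unknown aspect are ignored.
    """
    totals = defaultdict(float)
    weights = defaultdict(float)
    for p in predictions:
        if p["aspect"] not in ASPECTS:
            continue
        w = engagement_weight(p["likes"], p["replies"])
        totals[p["aspect"]] += w * p["polarity"]
        weights[p["aspect"]] += w
    return {aspect: totals[aspect] / weights[aspect] for aspect in totals}
```

With this weighting, a heavily liked negative comment about safety pulls the safety score below zero even when an unliked positive comment balances it in raw counts.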
To handle video-driven narratives, the system uses a late-fusion multimodal architecture inspired by CLIP-style alignment, combining transcript semantics with engagement metadata instead of brittle end-to-end video models. Narrative evolution is tracked as a time series over topic clusters, in line with discourse analysis research that emphasizes narrative dynamics over static aggregates.
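One simple way to picture narrative tracking over topic clusters is to bucket labeled items by week and compute each topic's share. The sketch below assumes every item already carries a topic label from an upstream clustering step; the weekly granularity, the function name, and the input shape are assumptions for illustration rather than the system's actual design.

```python
from collections import Counter, defaultdict
from datetime import date


def weekly_topic_shares(
    items: list[tuple[date, str]],
) -> dict[tuple[int, int], dict[str, float]]:
    """Map (ISO year, ISO week) to the share of items per topic in that week.

    `items` is a list of (publication date, topic label) pairs; the topic
    labels are assumed to come from an upstream clustering step.
    """
    counts: dict[tuple[int, int], Counter] = defaultdict(Counter)
    for day, topic in items:
        iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
        counts[(iso[0], iso[1])][topic] += 1

    shares = {}
    for week in sorted(counts):
        total = sum(counts[week].values())
        shares[week] = {topic: n / total for topic, n in counts[week].items()}
    return shares
```

Comparing the share of a topic across consecutive weeks then shows whether a narrative is growing or fading, which a single static aggregate would hide.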
The architecture is modular and auditable, allowing individual components to be extended or replaced without affecting the overall pipeline.
What I Learned
A key insight was that perception is inherently multi-dimensional: influencer tone, audience sentiment, and engagement dynamics frequently diverge, and comments often surface concerns and misconceptions absent from the original content. Another lesson was that explainability and traceability are necessary when applying state-of-the-art models in regulated domains such as pharma.
Challenges
The primary challenges included handling noisy user-generated content, aligning video narratives with asynchronous comment discourse, and avoiding overconfident inference in the absence of explicit demographic data. Platform heterogeneity and data access constraints further limited real-time analysis. These challenges were addressed through conservative modeling assumptions, engagement-weighted metrics, and a strict separation between descriptive analytics and prescriptive interpretation.
Built With
- clustering
- fastapi
- natural-language-processing
- postgresql
- python
- react
- speech-to-text
- transformer-based
- vector-database
- whisper
- yt-dlp