Inspiration

Drug discovery tooling is usually locked inside heavy desktop suites or server clusters. For students, clinicians, and early-stage startups, that creates a huge barrier: you need a workstation, a license, a GPU – just to answer a simple question like “Is this molecule likely to be soluble and non-toxic?”

PocketQSAR was born from the idea of putting QSAR in your pocket:

  • Show that a modern ARM phone is powerful enough to run meaningful cheminformatics models fully on-device with ExecuTorch.
  • Turn abstract ML metrics into visual, intuitive chemistry – chemical space maps, descriptor radar plots, and 2D structures you can tap.
  • Provide an educational playground where anyone can feel how changes in MW, logP or TPSA move a molecule across developability space.
  • Use the ARM AI challenge as a real-world testbed: not just “it runs on a phone”, but “here is the latency, cores, ABI and performance profile of this QSAR model on your device”.

In short: PocketQSAR is an experiment in democratizing molecular ML – making serious drug-discovery-style analysis feel as simple as opening an app.

What it does

PocketQSAR is an on-device QSAR workbench for small molecules:

  • Runs a QSAR model on your phone
    Uses an ExecuTorch .pte model to predict:

    • Aqueous solubility (logS)
    • Toxicity probability
    • A combined Developability score (0–100)
  • Visualizes chemical space

    • Each demo molecule is a point in a 2D “chemical space” scatter plot.
    • Tap a point to select that molecule, run inference, and see where it sits relative to others (green = friendlier / red = riskier).
  • Shows 2D molecular structures

    • For the selected molecule, the app loads a 2D structure image from assets/mol_imgs and displays it next to the predictions.
  • Explains key descriptors

    • MW, logP and TPSA are shown with progress bars and numeric values.
    • A radar chart highlights how “balanced” the molecule is across these properties.
  • Keeps a prediction history

    • Every run is stored in a lightweight in-memory log.
    • A History dialog lets you quickly review past molecules and their scores.
  • Lets you browse the whole library

    • The Browse molecules button opens a list of all demo compounds so judges can jump directly to any example.
  • Benchmarks ARM on-device performance

    • A Run latency benchmark button runs many inferences on the current molecule and reports:
      • Average, p50, p90, min, max latency (ms)
      • Device model, hardware, ABI and core count
    • This makes the app both a QSAR demo and a mini ARM AI benchmark tool.

How we built it

We split the work into two main pieces: a Python / ML pipeline and an Android / ExecuTorch app.

1. Data + Model in Python (Colab)

  • Started from the ESOL (Delaney) aqueous solubility dataset.
  • Used RDKit to:
    • Parse SMILES
    • Compute descriptors (MW, logP, TPSA, HBD, HBA, etc.)
    • Generate model input features (2048-dim fingerprint-like vector).
  • Trained a small PyTorch MLP that:
    • Regresses logS (solubility).
    • Predicts toxicity probability (binary output via sigmoid).
  • Tuned basic hyperparameters (layers, hidden size, dropout) until it was stable on validation and small enough for on-device inference.

2. Preparing data for mobile

  • Selected a curated subset of molecules for the demo.
  • Built a mobile-friendly JSON (mobile_molecules.json) with, for each molecule:
    • id, name
    • descriptors (mw, logp, tpsa, hbd, hba, etc.)
    • features (the exact FloatArray we feed into ExecuTorch)
    • embed2d coordinates from a PCA/UMAP embedding to drive the chemical space scatterplot.
  • Generated 2D PNG structures for each molecule using RDKit and saved them to app/src/main/assets/mol_imgs/.
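
As a rough illustration of the asset format described above, the following Python sketch builds and serializes one record. The field names id, name, descriptors, features and embed2d come from the list above; the exact nesting, the make_record helper and the example values are assumptions made for illustration, not the project's actual script.

```python
import json

FEATURE_DIM = 2048  # model input size mentioned in this write-up


def make_record(mol_id, name, descriptors, features, embed2d):
    """Build one entry in the (assumed) mobile_molecules.json layout."""
    if len(embed2d) != 2:
        raise ValueError(f"{mol_id}: embed2d must be an (x, y) pair")
    return {
        "id": mol_id,
        "name": name,
        # Plain floats serialize cleanly and map onto Kotlin Float values.
        "descriptors": {key: float(val) for key, val in descriptors.items()},
        "features": [float(x) for x in features],
        "embed2d": [float(embed2d[0]), float(embed2d[1])],
    }


# Placeholder values for illustration only, not real model inputs.
record = make_record(
    "mol_001",
    "example",
    {"mw": 180.16, "logp": 1.2, "tpsa": 63.6, "hbd": 1, "hba": 4},
    [0.0] * FEATURE_DIM,
    (0.25, -1.5),
)
payload = json.dumps([record])
```

Storing the exact feature vector in the JSON means the app never has to recompute fingerprints on-device, at the cost of a larger asset file.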

3. Exporting the model to ExecuTorch

  • Took the trained PyTorch model and:
    • Traced/exported it to an ExecuTorch-compatible representation.
    • Targeted an XNNPACK / CPU backend suitable for ARM devices.
  • Saved the final mobile model as qsar_net_xnnpack.pte.
  • Placed the .pte model under assets/models/ and wrote a small helper (ModelFileHelper) to:
    • Copy it to internal storage on first run.
    • Load it via org.pytorch.executorch.Module.load().

4. Android app in Kotlin

  • Built a single-activity app (MainActivity) in Kotlin, wired to:
    • QsarExecuTorch: thin wrapper around ExecuTorch model inference.
    • DemoMoleculeRepository: loads and parses mobile_molecules.json.
  • Designed the UI in activity_main.xml:
    • Run Demo Prediction button.
    • 2D molecule image (ImageView) that loads from assets/mol_imgs.
    • ChemicalSpaceView: custom View that draws the 2D embedding; tap to select molecules.
    • DescriptorRadarView: custom View for MW/logP/TPSA radar plot.
    • Descriptor bars (MW, logP, TPSA) + a Developability score progress bar.
    • History and Browse molecules buttons.
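
The tap-to-select behaviour of ChemicalSpaceView boils down to finding the plotted point nearest to the touch position. A minimal Python sketch of that idea is shown below (the app itself is written in Kotlin). The function name, the max_dist tolerance and the coordinate layout are assumptions for illustration.

```python
import math


def nearest_point(points, tap_x, tap_y, max_dist):
    """Return the index of the point closest to the tap, or None if none is near.

    points: list of (x, y) screen coordinates, one per molecule.
    max_dist: tap tolerance in the same units (hypothetical parameter).
    """
    best_idx = None
    best_dist = None
    for idx, (x, y) in enumerate(points):
        dist = math.hypot(x - tap_x, y - tap_y)
        if dist <= max_dist and (best_dist is None or dist < best_dist):
            best_idx, best_dist = idx, dist
    return best_idx
```

A linear scan like this is sufficient for a small demo library; a spatial index would only be worth considering for much larger collections.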

5. On-device ARM benchmarking

  • Instrumented the model call in Kotlin with SystemClock.elapsedRealtimeNanos().
  • Added a “Run latency benchmark” section:
    • Runs the model N times (e.g. 50 / 200 / 1000) on the current molecule.
    • Collects per-run latency in ms, sorts, and computes:
      • Average, p50, p90, min, max.
    • Displays:
      • Build.MODEL, Build.HARDWARE, Build.SUPPORTED_ABIS,
      • Runtime.getRuntime().availableProcessors(),
      • and the latency stats.
  • This turns PocketQSAR into both:
    • A scientific demo (QSAR + descriptors + chemical space), and
    • A practical ARM ExecuTorch benchmark on real devices.
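
The summary statistics listed above can be sketched in Python as follows (the app computes them in Kotlin). The nearest-rank percentile definition and the function names are assumptions; the write-up does not state which percentile method the app uses.

```python
import math


def percentile(sorted_vals, q):
    """Nearest-rank percentile of an ascending list, with integer q in 1..100."""
    rank = max(1, math.ceil(q * len(sorted_vals) / 100))
    return sorted_vals[rank - 1]


def summarize(latencies_ms):
    """Return avg / p50 / p90 / min / max for a list of per-run latencies."""
    vals = sorted(latencies_ms)
    return {
        "avg": sum(vals) / len(vals),
        "p50": percentile(vals, 50),
        "p90": percentile(vals, 90),
        "min": vals[0],
        "max": vals[-1],
    }
```

Reporting p50 and p90 alongside the average makes occasional slow runs visible instead of letting them silently skew a single mean value.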

6. Packaging & testing

  • Built debug and release APKs from Android Studio.
  • Tested on:
    • Emulator (x86_64, for layout & logic),
    • Real ARM phones (for actual on-device ExecuTorch latency and performance).
  • Captured screenshots and benchmark outputs for the ARM AI challenge submission.

Challenges we ran into

We hit real-world problems at almost every layer of the stack:

1. Getting from PyTorch → ONNX → ExecuTorch

Exporting the QSAR model wasn’t trivial:

  • The first ONNX export failed because we had accidentally wrapped the model in a plain function, whereas torch.export requires an actual nn.Module.
  • We also saw dynamic-shape and ONNX opset-version warnings (17 vs 18), so we had to adjust the export call and simplify the model before ExecuTorch could run it reliably on mobile.

Once we finally had a .pte file, we still had to confirm that input tensor shapes and dtypes matched exactly on Android.

2. Feature dimension mismatch (2048 vs 2055)

On device we hit:

Attempted to resize a static tensor. Expected shape (1, 2048), but received (1, 2055).

This forced us to go back and “audit” the full pipeline:

  • Check how features were built in Python.
  • Confirm the model’s first layer input size.
  • Make sure the JSON feature vectors and the Kotlin inputDim were in sync.

After aligning everything to 2048 and regenerating the assets, ExecuTorch finally ran without shape errors.
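
A check along these lines, run before shipping the assets, would have caught the mismatch earlier. This is a minimal sketch: the function name is hypothetical, and the record layout (id plus features) follows the JSON fields described earlier.

```python
def find_dim_mismatches(records, expected_dim=2048):
    """Return (id, actual_length) for every record whose feature vector has the wrong size."""
    return [
        (rec["id"], len(rec["features"]))
        for rec in records
        if len(rec["features"]) != expected_dim
    ]


# Two toy records: one correct, one with the 2055-length vector from the error above.
records = [
    {"id": "ok", "features": [0.0] * 2048},
    {"id": "too_long", "features": [0.0] * 2055},
]
mismatches = find_dim_mismatches(records)
```

An empty result means every vector matches the model's static input shape; anything else points directly at the offending molecules.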

3. Android + ExecuTorch integration and assets

We hit several small but painful Android issues:

  • Making sure the .pte model was correctly copied from assets/ to internal storage before calling Module.load().
  • Getting the package name / namespace / paths consistent so Java/Kotlin code, XML views and custom views (ChemicalSpaceView, DescriptorRadarView) all worked together.
  • Dealing with NullPointerException whenever the IDs passed to findViewById (including those of the custom views) didn’t match the IDs in the XML layout.

4. Rendering molecules & matching images

For the 2D structures:

  • Filenames generated offline didn’t always match our in-app IDs / names.
  • We had to implement a fuzzy matching loader that tries multiple patterns: id.png, name.png, lowercased, underscores, etc.
  • We also had to handle the “no image found” case gracefully so the app doesn’t show stale or broken images.
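
The fuzzy matching idea can be sketched in Python as below (the app's loader is written in Kotlin). The specific normalization rules, the ordering of candidates and both function names are assumptions; the write-up only says that id, name, lowercased and underscore variants are tried.

```python
import re


def candidate_filenames(mol_id, name):
    """Yield PNG filenames to try, in order, without duplicates."""
    seen = set()
    for base in (mol_id, name):
        # Replace runs of non-alphanumeric characters with a single underscore.
        underscored = re.sub(r"[^A-Za-z0-9]+", "_", base).strip("_")
        for variant in (base, base.lower(), underscored, underscored.lower()):
            fname = f"{variant}.png"
            if variant and fname not in seen:
                seen.add(fname)
                yield fname


def find_image(mol_id, name, available):
    """Return the first candidate present in `available`, or None if no image matches."""
    for fname in candidate_filenames(mol_id, name):
        if fname in available:
            return fname
    return None
```

Returning None explicitly lets the caller clear the image view or show a placeholder instead of leaving the previous molecule's picture on screen.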

5. Making the UI both scientific and mobile-friendly

We wanted:

  • Chemical space scatter,
  • Descriptor bars,
  • Radar plot,
  • Structure image,
  • Prediction text,
  • History, browsing, and benchmark controls…

…all on a phone screen.

Balancing information density vs readability was a challenge:

  • We moved to a scrollable layout, simplified labels, and used small custom views instead of heavy chart libraries to keep performance snappy on ARM.

6. On-device benchmarking nuances

Measuring latency correctly isn’t as simple as end - start:

  • We added a warm-up pass so that first-run JIT and cache effects are not counted in the results.
  • We used SystemClock.elapsedRealtimeNanos() instead of System.currentTimeMillis(), because it is monotonic and has nanosecond resolution.
  • We sorted the latencies and reported avg / p50 / p90 / min / max so judges see stable, distribution-aware numbers, not just a single noisy value.
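
The measurement loop can be sketched in Python as follows. This is an analogue of the Kotlin code, not a copy of it: time.perf_counter_ns() stands in for SystemClock.elapsedRealtimeNanos(), fn stands in for the ExecuTorch forward call, and the warm-up count of 5 is an assumed default.

```python
import time


def benchmark(fn, runs, warmup=5):
    """Call `fn` `warmup` times untimed, then time `runs` calls; returns latencies in ms."""
    for _ in range(warmup):
        fn()  # results discarded: this only warms caches and lazy initialization
    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter_ns()  # monotonic, high-resolution clock
        fn()
        latencies_ms.append((time.perf_counter_ns() - start) / 1e6)
    return latencies_ms
```

Timing each call separately, rather than timing the whole loop and dividing, is what makes percentile statistics possible afterwards.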

All of these challenges shaped PocketQSAR into something that is not only a working demo, but also a realistic snapshot of what running molecular ML on ARM phones actually feels like.

Accomplishments that we're proud of

1. Turning a phone into a mini medicinal chemistry lab

We didn’t just show a number on screen – we built a full QSAR workbench:

  • On-device prediction of logS and toxicity probability using ExecuTorch.
  • A derived Developability score (0–100) that combines solubility, toxicity, and simple Lipinski-style rules.
  • Linked 2D structures, descriptors, chemical space, and scores in one coherent workflow that feels like a pocket-sized drug discovery UI.
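
To make the idea of a combined score concrete, here is one way such a score could be assembled. This is a hypothetical sketch, not the formula used in the app: the weights (0.4 / 0.4 / 0.2), the logS range of -6 to 0 and the function name are all assumptions. Only the Lipinski thresholds (MW ≤ 500, logP ≤ 5, HBD ≤ 5, HBA ≤ 10) are standard.

```python
def developability_score(log_s, tox_prob, mw, logp, hbd, hba):
    """Hypothetical 0-100 score; weights and ranges are illustrative only."""
    # Map logS from an assumed range of [-6, 0] onto [0, 1]; more soluble is better.
    solubility = min(1.0, max(0.0, (log_s + 6.0) / 6.0))
    # Lower predicted toxicity probability is better.
    safety = 1.0 - min(1.0, max(0.0, tox_prob))
    # Count Lipinski rule-of-five violations.
    violations = sum([mw > 500, logp > 5, hbd > 5, hba > 10])
    rules = 1.0 - violations / 4.0
    return round(100.0 * (0.4 * solubility + 0.4 * safety + 0.2 * rules))
```

Clamping each component to the 0 to 1 range before weighting keeps the final value inside 0 to 100 even when the model extrapolates.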

2. Rich, explainable visualizations on mobile

On a small ARM device, we still managed to pack:

  • An interactive chemical space view where each molecule is a point you can tap.
  • A descriptor radar plot for MW/logP/TPSA to give quick “shape” intuition.
  • Live descriptor bars and a clear progress-based developability gauge.
  • Instant feedback when switching molecules, so the model feels tangible and not like a black box.

3. Three-tier on-device benchmark system

We’re especially proud of the built-in ARM benchmarking, designed for judges:

  • Quick benchmark (e.g. 50 runs)

    • Fast sanity check while demoing.
    • Shows how ExecuTorch responds under light load.
  • Standard benchmark (e.g. 200 runs)

    • Balanced mode for screenshots and fair comparisons between devices.
    • Enough runs to stabilize average and median latency.
  • Deep benchmark (e.g. 1000 runs)

    • Serious, statistically meaningful measurement for the ARM challenge.
    • Reports:
      • Average latency
      • p50 and p90
      • Min / Max
      • Device model, hardware, ABI, and CPU core count

All three modes share the same UI and use real ExecuTorch inference on the actual device, so PocketQSAR doubles as a QSAR demo and a portable ARM AI benchmarking tool.

4. A complete end-to-end pipeline, not just a demo screen

We’re proud that PocketQSAR covers the full path:

  • Data curation (ESOL), descriptor engineering, and model training in PyTorch.
  • Model export to ExecuTorch and careful input-shape alignment.
  • JSON + image asset pipeline for molecules, descriptors, and embeddings.
  • Kotlin/Android integration with custom views, history, browsing, and benchmarking.

It’s a project that touches ML, cheminformatics, mobile engineering, and performance profiling – all running smoothly on a single ARM phone.

What we learned

1. Bridging ML and mobile is more than “export and run”

We learned that taking a PyTorch model to a phone is not a one-click step:

  • torch.onnx.export / torch.export are sensitive to how the model is wrapped – you really do need a clean nn.Module.
  • Input shapes and dtypes must match exactly from Python → JSON → Kotlin → ExecuTorch; even a small mismatch (2048 vs 2055) blows up at runtime.
  • Designing the network with mobile in mind (size, ops, output heads) up front saves pain later.

2. ExecuTorch is powerful but strict

Working with ExecuTorch taught us:

  • Keep the model graph simple and well-behaved – avoid exotic layers or unsupported ops when planning for mobile.
  • Asset handling matters: the .pte file must be copied correctly from assets/ to internal storage and loaded from the right path.
  • When ExecuTorch fails, the errors are usually telling you something real about shapes, static tensors, or unsupported behavior.

3. On-device benchmarking needs rigor, not just a stopwatch

We learned how tricky “just benchmark it” can be:

  • You need warm-up runs to avoid counting JIT / cold-start overhead.
  • Using elapsedRealtimeNanos() and multiple runs gives you stable distributions (avg, p50, p90, min, max), not just one noisy number.
  • Different run counts (quick / standard / deep) are useful for different audiences: live demo vs serious analysis.

4. Good UX makes ML feel understandable

We also learned how much visual design affects understanding:

  • Chemical space plots + radar charts + descriptor bars make the model’s behavior feel more intuitive than raw numbers.
  • Small details (colors for risk, smooth layout in a scroll view, clear labels) matter when you’re trying to explain ML decisions to non-ML users.
  • Tapping points and instantly seeing predictions and structures is a powerful way to build trust in a model.

5. Mobile constraints force clarity

Working within a phone’s constraints taught us to:

  • Keep the model compact and efficient enough for real-time use.
  • Avoid overcomplicated charts / libraries and instead build lightweight custom views that are fast and easy to control.
  • Think end-to-end: training, export, assets, Android lifecycle, and user experience all have to line up for the app to feel “simple” to the user.

Overall, we came away with a much clearer idea of what it takes to turn a research QSAR model into a real, shippable, on-device experience on ARM.

What's next for PocketQSAR: On-Device Drug Discovery with ExecuTorch

PocketQSAR is a first step toward serious cheminformatics on ARM phones. There are several directions we’re excited to explore:

1. Richer models and endpoints

  • Add more prediction heads beyond logS + toxicity:
    • Permeability, clearance, hERG risk, basic ADMET flags.
  • Experiment with multi-task learning so one compact model can power several medicinal chemistry questions at once.
  • Integrate uncertainty estimation (MC dropout / ensembles) so the app can say “I don’t know” when far outside its training domain.

2. SMILES input and lightweight editing

  • Let users type or paste SMILES and run on-the-fly predictions.
  • A simple “edit mode” to tweak fragments or R-groups and instantly see how developability shifts.
  • Highlight counterfactual suggestions (e.g., “reduce logP”, “lower MW”) based on descriptor patterns.

3. More advanced chemical space analytics

  • Ship UMAP + Louvain / Leiden clusters precomputed for larger libraries, so users can explore pockets of similar chemotypes on-device.
  • Allow switching between PCA, UMAP, and maybe t-SNE views.
  • Add simple cluster-level summaries: typical MW/logP/TPSA ranges and risk profiles per cluster.

4. Deeper ARM + ExecuTorch benchmarking

  • Extend the current three benchmark modes with:
    • Optional throughput mode (molecules per second).
    • Battery / thermal awareness (not full profiling, but user-facing hints).
  • Compare different ExecuTorch backends (where available) or model variants to illustrate accuracy vs latency trade-offs on ARM.
  • Export benchmark reports (e.g., JSON) that can be aggregated across devices for larger studies.

5. Domain adaptation for specific projects

  • Allow loading different .pte models (e.g., “oncology set”, “CNS set”) so PocketQSAR becomes a front-end shell for multiple specialized pipelines.
  • Explore on-device fine-tuning or calibration on small local datasets (where feasible) to adapt scores to specific labs or chemotypes.

6. Educational and collaboration features

  • Add an “Explain this prediction” screen:
    • Show descriptor contributions, thresholds, and rule-of-five style messages.
  • Include a teaching mode for students with guided examples (“find a more soluble analogue”, “minimize toxicity at fixed MW”).
  • Optional export of session history (molecules + scores + timings) that teams can share or attach to reports.

In short, PocketQSAR can evolve from a single-app demo into a modular, ARM-first platform for interactive, explainable, and benchmarkable molecular ML on everyday devices.
