This repository contains a set of scripts to automate the process of gathering data from malware samples, training a machine learning model on that data, and plotting its classification accuracy.

  1. Make a copy of config-template.ini called config.ini and edit it.

  2. Ensure that the "tools" subdirectory has been initialized by running "git submodule update --init tools".

  3. Either use get_samples.py to download samples or copy them into "all_apks" from another source.

  4. sort_malicious.py uses andrototal.org to sort the samples in "all_apks" into "malicious_apk" and "benign_apk" folders.

  5. extract_apks.sh unpacks the .apk files into folders and checks the AndroidManifest.xml files for validity.

  6. parse_xml.py reads the AndroidManifest.xml files and puts the permissions requested by each app into "app_permission_vectors.json".

  7. run_trials.sh runs the tensorflow_learn.py script (where the ML happens) a number of times and writes the results to "results.csv".

  8. plot_data.py plots the data produced by the previous step using matplotlib.
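
As a rough illustration of step 6, the sketch below pulls the requested permissions out of a decoded AndroidManifest.xml and turns them into 0/1 vectors. It is not the code in parse_xml.py: the function names, the sample manifest, and the vector layout are assumptions made for this example, and the real layout of "app_permission_vectors.json" may differ.

```python
import json
import xml.etree.ElementTree as ET

# Namespace that Android uses for attributes such as android:name.
ANDROID_NS = "http://schemas.android.com/apk/res/android"


def permissions_from_manifest(manifest_xml):
    """Return the sorted permission names requested in a decoded manifest."""
    root = ET.fromstring(manifest_xml)
    name_attr = "{%s}name" % ANDROID_NS
    names = {el.get(name_attr) for el in root.iter("uses-permission")}
    return sorted(n for n in names if n)


def to_vectors(app_permissions):
    """Map each app to a 0/1 vector over all permissions seen (hypothetical layout)."""
    vocabulary = sorted({p for perms in app_permissions.values() for p in perms})
    vectors = {
        app: [1 if p in perms else 0 for p in vocabulary]
        for app, perms in app_permissions.items()
    }
    return vocabulary, vectors


# Hypothetical manifest text, standing in for a file unpacked in step 5.
SAMPLE = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.app">
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.SEND_SMS"/>
</manifest>"""

if __name__ == "__main__":
    apps = {
        "com.example.app": permissions_from_manifest(SAMPLE),
        "com.example.other": ["android.permission.INTERNET"],
    }
    vocabulary, vectors = to_vectors(apps)
    print(json.dumps({"permissions": vocabulary, "apps": vectors}, indent=2))
```

A fixed, sorted permission vocabulary keeps every app's vector the same length, which is what a classifier needs as input.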
