Inspiration
Discovering genomic features is a task governed by some rules, and by a lot of exceptions to those rules. These rules are rarely expressed all in one place, which makes it hard to understand how they play off of each other.
What it does
In our model transcription region, the character 'X' represents a wildcard and 'Y' represents the payload. In this example, there are two AT-rich sequences: one (the Pribnow box) appearing 10 characters before the start of the transcribed section and another appearing 35 characters before it. This structure is typical for bacteria. The program also supports eukaryotes.
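As a rough illustration of the model described above, here is a minimal Python sketch that checks for AT-rich boxes starting 10 and 35 characters before the payload. It is not the project's s(CASP) code: the function names, the box length of 6, and the 0.8 AT threshold are all assumptions made for this example.

```python
def at_fraction(window: str) -> float:
    """Return the fraction of bases in the window that are A or T."""
    if not window:
        return 0.0
    return sum(base in "AT" for base in window.upper()) / len(window)


def has_promoter_signature(seq: str, payload_start: int,
                           box_len: int = 6, min_at: float = 0.8) -> bool:
    """Check for AT-rich boxes starting 10 and 35 characters before the payload.

    box_len and min_at are illustrative assumptions, not project values.
    """
    for offset in (10, 35):
        start = payload_start - offset
        if start < 0:
            return False  # not enough upstream sequence to hold this box
        box = seq[start:start + box_len]
        if len(box) < box_len or at_fraction(box) < min_at:
            return False
    return True


# Hypothetical 46-character sequence: G stands in for wildcard positions,
# and the payload starts at index 40.
example = "GGGGG" + "TATAAT" + "G" * 19 + "TATAAT" + "GGGG" + "ATGGCC"
print(has_promoter_signature(example, payload_start=40))
```

A fixed threshold like this cannot express exceptions to the rule, which is the kind of reasoning the s(CASP) rules are meant to capture.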
How we built it
Our s(CASP) program implements rules for identifying promoters in DNA sequences. Our team first compiled these rules in plain English, as seen in this document. These English rules were then converted to s(CASP) code.
Challenges we ran into
Learning s(CASP).
Accomplishments that we're proud of
Actually getting an MVP working with a genome feature.
What we learned
- s(CASP)
- DNA
- Transcription
What's next for Automated Genome Feature Discovery
Implement other genomic features beyond just promoters, such as enhancers. We'd also like to support other forms of input data, such as histone modifications and DNA methylation, to further our rules.
Built With
- prolog
- python
- s(casp)
