Hacker's Background
Hi! My name is Kathryn Dahlgren and I'm a second-year PhD student in Computer Science at UCSC. My research interests focus primarily on topics in the fields of Databases and Distributed Systems.
Project Background
A major problem at the cutting edge of database research is the general neglect of integrated support for sophisticated data management tools in NoSQL database systems. In the late twentieth century, a number of successful database management systems (DBMSs) centered on the relational data model and promoted the standardization of many data management tools, including basic aggregates, joins, and various types of integrity constraints. However, as traditional relational DBMSs became increasingly feature-heavy and increasingly costly to build and maintain, a number of researchers and developers in the early 2000s rebelled against the idea that monolithic software adhering to a strict data model was the best way to satisfy database needs. This rebellion triggered the development of a plethora of "NoSQL" database systems (DBSs) characterized by more flexible data models and leaner sets of built-in features. Accordingly, NoSQL DBSs generally ship without the range of data management tools made canonical by traditional DBMSs. However, this widely accepted design decision forces NoSQL users into a k-implementation cycle: every user who needs a standard data management tool must implement it independently for their own instance of a NoSQL system, so the same tool ends up implemented k separate times. This situation increases the work required to turn a NoSQL database system into a database management system, which ultimately translates into wasted time and resources.
Accordingly, to free NoSQL users from the k-implementation nightmare, I propose Piper, a curated open source package index and management system for collecting and maintaining general implementations of data management tools for installation on a wide variety of NoSQL systems.
Inspiration
Piper is inspired by my research into the cutting edge of NoSQL systems as part of a graduate class on database management systems. My project for the class consisted of building a NoSQL DBS infused with high-level data management features, including built-in support for basic aggregate functions, joins, and domain subsumption integrity constraints via ontologies. While developing that system, I realized I had suddenly been drawn into the k-implementation nightmare passively created by the design decisions of NoSQL DBS developers. After I presented my ideas for a package index and management system at a recent conference on innovative database research (CIDR'17), I was encouraged to hear leaders in the field acknowledge the problem, and I came away with a keen sense of its gravity and of the bottlenecks it poses on the horizon of database technologies.
What it does
Piper is a package index and management system for database management tools allowing users of NoSQL systems to customize the functionality of NoSQL databases with standardized implementations of database management tools without engaging in the k-implementation cycle currently plaguing the field of databases. The project submitted as part of Hack UCSC is the first prototype of the Piper system. Currently, the prototype supports the five basic aggregate functions and a simple (albeit inefficient) implementation of natural JOIN on representatives of two classes of NoSQL systems, specifically the MongoDB document store and the PickleDB key-value store.
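To make the two supported tool families concrete, here is a minimal, hypothetical sketch of what such tools might look like when written once against plain Python records. It assumes the "five basic aggregates" are COUNT, SUM, AVG, MIN, and MAX (the usual SQL set; the prototype's actual list and function names may differ), and it implements natural join as a nested loop, which matches the "simple (albeit inefficient)" description but is not taken from the simpleJoin source.

```python
from typing import Any, Dict, List, Optional

# Hypothetical record type: NoSQL records are modeled as plain dicts.
Record = Dict[str, Any]


def _values(records: List[Record], attr: str) -> List[Any]:
    # Collect attr from each record, skipping records that lack it,
    # since schema-less stores do not guarantee every record has it.
    return [r[attr] for r in records if attr in r]


# Assumed aggregate set: COUNT, SUM, AVG, MIN, MAX (names are hypothetical).
def agg_count(records: List[Record], attr: str) -> int:
    return len(_values(records, attr))


def agg_sum(records: List[Record], attr: str) -> Any:
    return sum(_values(records, attr))


def agg_avg(records: List[Record], attr: str) -> Optional[float]:
    vals = _values(records, attr)
    return sum(vals) / len(vals) if vals else None


def agg_min(records: List[Record], attr: str) -> Any:
    vals = _values(records, attr)
    return min(vals) if vals else None


def agg_max(records: List[Record], attr: str) -> Any:
    vals = _values(records, attr)
    return max(vals) if vals else None


def natural_join(left: List[Record], right: List[Record]) -> List[Record]:
    """Nested-loop natural join: compares every pair, O(len(left) * len(right)).

    Two records match when they agree on every attribute they share; with no
    shared attributes the pair always matches, giving a cross product.
    """
    result = []
    for l_rec in left:
        for r_rec in right:
            shared = l_rec.keys() & r_rec.keys()
            if all(l_rec[k] == r_rec[k] for k in shared):
                result.append({**l_rec, **r_rec})
    return result
```

For example, joining `[{"id": 1, "name": "Ada"}]` with `[{"id": 1, "age": 36}]` on their shared `id` attribute yields `[{"id": 1, "name": "Ada", "age": 36}]`. Skipping records that lack an attribute is one design choice among several; a tool could instead raise an error or treat the missing value as NULL.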
How I built it
The Piper prototype currently exists as a GitHub organization composed of the base Piper management code (piper) and two indexed packages (aggsPack and simpleJoin). All of the core code is written in Python. Scripts that orchestrate code execution on Linux(-based) systems are written in Bash.
How it works
Piper provides a set of packages containing standard implementations of data management tools. Piper also provides the interface for translating the database calls native to the underlying NoSQL database into the calls the management tool logic requires. Specifically, for every NoSQL system supported by Piper, the Piper code base contains an adapter that maps a subset of that system's API onto the functions needed by the standard management tool implementations in the index. Currently, the prototype includes two adapters: one for the MongoDB document store and one for the PickleDB key-value store. At the moment, the adapters are hand-written, but the small size of the necessary transformation code makes the process nearly trivial. (Automatically generating adapters is an exciting area of future work.)
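The adapter idea can be sketched as a small interface that tool code depends on, with one thin class per store. This is a hypothetical illustration, not Piper's actual adapter code: the `Adapter`/`scan` names, the `collection:key` naming scheme for the key-value store, and the in-memory dicts standing in for PickleDB and MongoDB are all assumptions made so the example runs without external dependencies.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, List

Record = Dict[str, Any]


class Adapter(ABC):
    """Hypothetical minimal interface that tool packages program against."""

    @abstractmethod
    def scan(self, collection: str) -> Iterator[Record]:
        """Yield every record in the named collection."""


class KeyValueAdapter(Adapter):
    """Maps a key-value store onto scan().

    A plain dict stands in for the store (e.g. PickleDB) so the sketch runs
    without dependencies. The 'collection:key' naming scheme is an assumption.
    """

    def __init__(self, store: Dict[str, Record]) -> None:
        self.store = store

    def scan(self, collection: str) -> Iterator[Record]:
        prefix = collection + ":"
        for key, value in self.store.items():
            if key.startswith(prefix):
                yield value


class DocumentAdapter(Adapter):
    """Maps a document store onto scan().

    A dict of lists stands in for the database; with pymongo, the body could
    roughly become `yield from db[collection].find({})`.
    """

    def __init__(self, collections: Dict[str, List[Record]]) -> None:
        self.collections = collections

    def scan(self, collection: str) -> Iterator[Record]:
        yield from self.collections.get(collection, [])


def max_of(adapter: Adapter, collection: str, attr: str) -> Any:
    """A tool written once against Adapter runs on any store with an adapter."""
    vals = [r[attr] for r in adapter.scan(collection) if attr in r]
    return max(vals) if vals else None
```

Both adapters expose the same `scan`, so `max_of` runs unchanged against either store. Keeping the adapter surface this small is what keeps each hand-written adapter short.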
Accordingly, from the user's perspective, given an interface to the underlying NoSQL database, the user merely needs to install the desired packages from the Piper index and call the installed package's API directly from that interface. As a result, instead of writing home-grown implementations of the desired data management tools, users only need to worry about correctly defining the paths where Piper packages are installed.
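In Python, "defining the paths to install packages" typically amounts to making the install directory importable. The following is a minimal sketch under the hypothetical assumption that each Piper package is an importable Python module or package placed directly under one install directory; `load_package` is an illustrative helper, not part of Piper.

```python
import importlib
import sys
from pathlib import Path
from types import ModuleType


def load_package(install_dir: str, package_name: str) -> ModuleType:
    """Import a package from the directory it was installed into.

    Assumes (hypothetically) that each Piper package is an importable Python
    module or package placed directly under install_dir.
    """
    path = str(Path(install_dir).resolve())
    # Put the install directory on Python's module search path once.
    if path not in sys.path:
        sys.path.insert(0, path)
    # Refresh import finder caches in case files were added after startup.
    importlib.invalidate_caches()
    return importlib.import_module(package_name)
```

A call such as `load_package("./packages", "aggsPack")` would then expose whatever API that package defines; both the directory and the module name here are placeholders.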
Getting Started
After cloning piper (https://github.com/PiperProject/piper), please review the README for installation and usage instructions, guidelines, and examples.
Challenges I ran into
I encountered a number of challenges during the development of Piper, primarily due to my lack of experience with particular tools and processes. The most prominent challenge, which eluded a clean, elegant solution within the duration of the hackathon, was automating the installation of packages from the Piper Project organization as submodules of a single, specific package (piper) that is itself managed in the same organization. During development, I wanted to push changes made in the package index submodules to their respective git repos without including the submodules themselves in piper commits. In the interest of time, I adopted an unsatisfactory workaround: (1) copying submodule updates into separate clones of the index packages and pushing from there, and (2) uninstalling the submodules before committing and pushing piper updates. An immediate task for future work is to take the time to study the relevant git commands more closely and arrive at a cleaner solution.
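As one possible alternative to step (1) of that workaround (not what the prototype does), git's `-C <path>` option runs a command as if git were started inside `<path>`, so each submodule can be committed and pushed in place. The sketch below only builds the command lists; the helper name and the example path `packages/aggsPack` are hypothetical, and it does not address step (2), keeping submodule pointers out of piper commits.

```python
from typing import List


def submodule_push_commands(paths: List[str], message: str) -> List[List[str]]:
    """Build (without running) git commands that commit and push each
    submodule's changes straight to that submodule's own remote.

    `git -C <path>` runs git as if it were started inside <path>, so the
    changes never need to be copied into a separate clone. Assumes each
    submodule has a branch checked out (after `git submodule update`,
    submodules are usually on a detached HEAD, where a plain push fails).
    """
    commands: List[List[str]] = []
    for path in paths:
        commands.append(["git", "-C", path, "add", "-A"])
        commands.append(["git", "-C", path, "commit", "-m", message])
        commands.append(["git", "-C", path, "push"])
    return commands
```

Each command list could be passed to `subprocess.run(cmd, check=True)`. Note that `git commit` exits with a nonzero status when there is nothing to commit, so a real script would need to check for changes first or tolerate that failure.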
Accomplishments that I'm proud of
The Piper prototype as a whole is a source of pride. The project is the working culmination of an idea born over many days and nights invested in developing my own NoSQL DBMS. Furthermore, the project represents a promising open source solution to a current problem that will only grow in importance in the coming years. Overall, I take pride in exploring an initial, general solution to a very real and very challenging problem in the field of Databases: the k-implementation of data management tools made necessary by the general lack of built-in data management support in NoSQL systems.
What I learned
The Piper prototype taught me a number of lessons. Over the course of the 36-hour hackathon, I:
- Learned how to use both MongoDB and PickleDB for the first time;
- Learned how to create a GitHub organization;
- Learned how to better micro-manage git submodules; and
- Gained a better appreciation for the level of user interaction necessary to drive the success of Piper in the future (succinctly summarized in the phrase 'as minimal as possible').
What's next for Piper Project
The immediate horizon for the Piper Project suggests a number of extensions:
- The integration of more high-level database management tools, especially:
  - different flavors of joins;
  - group by and order by; and
  - domain subsumption integrity constraints.
- Support for systems implemented in other languages. The current prototype is tightly tied to interacting with the underlying NoSQL database code through a Python interface, so extending the variety of NoSQL systems supported by Piper will require building better interfaces for database systems implemented in other languages.
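As a rough illustration of the group by and order by extension, here is a hypothetical sketch in the same records-as-dicts style. It is not part of the prototype; the function names and the policy of skipping records that lack the relevant attributes are assumptions.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def group_by(records: List[Record], key: str, attr: str,
             agg: Callable[[List[Any]], Any]) -> Dict[Any, Any]:
    """Group records by `key` and apply `agg` to each group's `attr` values.

    Records missing either attribute are skipped (a design assumption).
    """
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for r in records:
        if key in r and attr in r:
            groups[r[key]].append(r[attr])
    return {k: agg(vals) for k, vals in groups.items()}


def order_by(records: List[Record], key: str, descending: bool = False) -> List[Record]:
    """Return the records that have `key`, sorted by it (stable sort)."""
    present = [r for r in records if key in r]
    return sorted(present, key=lambda r: r[key], reverse=descending)
```

Passing the aggregate as a function lets any aggregate (for instance Python's built-in `sum`, `min`, or `max`) be reused per group without a separate group-by implementation for each one.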
Additionally, the NoSQL data management tool k-implementation problem is a relatively unexplored space in the field of Databases. Nevertheless, the problem already strains the experiences of NoSQL users and developers, and given the widespread and growing popularity of NoSQL technologies, it will only grow in importance over the coming years. Accordingly, Piper is a timely solution with the potential to make a sizable impact on the lives of NoSQL users and to remove an imminent obstacle to progress in the field of Databases. I look forward to continuing my work on Piper and growing it into a successful and influential open source project.