For data scientists, developing neural network models is often hard to coordinate and manage, due to the need to juggle diverse tasks such as pre-processing, PyTorch layers, loss functions and post-processing, as well as maintenance of config files, code bases and communicating results between teams. PADL is a tool to alleviate several aspects of this work.
While developing and deploying our deep learning models in PyTorch, we found that important design decisions and even data-dependent hyper-parameters lived not just in the forward passes and modules but also in the pre-processing and post-processing. For example:
- in NLP the exact steps and objects necessary to convert a sentence to a tensor
- in neural translation the details of beam search post-processing and filtering based on business logic
- in vision applications, the normalization constants applied to image tensors
- in classification the label lookup dictionaries, formatting the tensor to human readable output
In terms of the functional mental model for deep learning we typically enjoy working with, these steps constitute key initial and end nodes on the computation graph which is executed for each model forward or backward pass.
The standard approach to deal with these steps is to maintain a library of routines for these software components and log with the model or in code which functions are necessary to deploy and use the model. This approach has several drawbacks.
- A complex versioning problem is created in which each model may require a different version of this library. This means that models using different versions cannot be served side-by-side.
- Importing and using the correct pre- and post-processing is a laborious process when working interactively (as data scientists are accustomed to doing).
- It is difficult to create exciting variants of a model based on slightly different pre- and post-processing without first going through the steps to modify the library in a git branch or similar.
- There is no easy way to robustly save and inspect the results of "quick and dirty" experimentation in, for example, Jupyter notebooks. This way of operating is a major workhorse of a data scientist's daily routine.
In creating PADL we aimed to create:
- A beautiful functional API including all mission critical computational steps in a single formalism -- pre-processing, post-processing, forward pass, batching and inference modes.
- An intuitive serialization/ saving routine, yielding nicely formatted output, saved weights and necessary data blobs which allows for easily comprehensible and reproducible results even after creating a model in a highly experimental, "notebook" fashion.
- An "interactive" or "notebook-friendly" philosophy, with print statements and model inspection designed with a view to applying and viewing the models, and inspecting model outputs.
With PADL it's easy to maintain a single pipeline object for each experiment which includes pre-processing, forward pass and post-processing, based on the central Transform abstraction. When the time comes to inspect previous results, simply load that object and inspect the model topology and outputs interactively in a Jupyter or IPython session. When moving to production, simply load the entire pipeline into the serving environment or app, without needing to maintain disparate libraries for the various model components. If the experiment needs to be reproduced down the line, then simply re-execute the experiment by pointing the training function to the saved model output.
What it does
Defining atomic transforms
from padl import this, transform, batch, unbatch, value
import padl
import torch
Transforms are defined using the transform decorator. Any callable class implementing __call__ can also become a transform:
@transform
def split_string(x):
    return x.split()

@transform
class ToInteger:
    def __init__(self, words):
        # '<unk>' is appended so that unknown words always have an index
        self.words = words + ['<unk>']
        self.dictionary = dict(zip(self.words, range(len(self.words))))

    def __call__(self, word):
        if word not in self.dictionary:
            word = '<unk>'
        return self.dictionary[word]

to_integer = ToInteger(WORDS)
EOS_VALUE = to_integer.dictionary['</s>']

@transform
def to_tensor(x):
    # truncate to 10 tokens (copying the list), then pad with EOS up to length 10
    x = x[:10][:]
    for _ in range(10 - len(x)):
        x.append(EOS_VALUE)
    return torch.tensor(x)
transform also supports inline lambda functions as transforms:
split_string = transform(lambda x: x.split())
this yields inline transforms which reflexively reference object methods:
left_shift = this[:, :-1]
lower_case = this.lower()
PyTorch layers are first-class citizens: a torch.nn.Module subclass decorated with transform is both a module and a Transform:
@transform
class LM(torch.nn.Module):
    def __init__(self, n_words):
        super().__init__()
        self.rnn = torch.nn.GRU(64, 512, 2, batch_first=True)
        self.embed = torch.nn.Embedding(n_words, 64)
        self.project = torch.nn.Linear(512, n_words)

    def forward(self, x):
        # GRU returns (output, final hidden state); only the output is projected
        output, _ = self.rnn(self.embed(x))
        return self.project(output)

model = LM(N_WORDS)

print(isinstance(model, torch.nn.Module))  # prints "True"
print(isinstance(model, padl.transforms.Transform))  # prints "True"
Finally, it's possible to invoke all callables from an imported module as Transforms directly. This saves writing the transforms explicitly:
import numpy
import torchvision

normalize = transform(torchvision).transforms.Normalize(*args, **kwargs)
cosine = transform(numpy).cos

print(isinstance(normalize, padl.transforms.Transform))  # prints "True"
print(isinstance(cosine, padl.transforms.Transform))  # prints "True"
Defining compound transforms
Atomic transforms may be combined using 4 functional primitives:
Transform composition: compose
s = transform_1 >> transform_2
Applying a single transform over multiple inputs: map
s = ~ transform
Applying transforms in parallel to multiple inputs: parallel
s = transform_1 / transform_2
Applying multiple transforms to a single input: rollout
s = transform_1 + transform_2
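As a rough illustration of how the four primitives above can map onto Python operators, here is a minimal pure-Python sketch. The class `Mini` and the helpers `double` and `increment` are hypothetical names made up for this example. This is not PADL's implementation, only a toy that follows the same operator conventions on two-way combinations.

```python
# Hypothetical sketch: NOT PADL's implementation, only an illustration of
# how compose, map, parallel and rollout could map onto Python operators.
class Mini:
    """Wraps a callable and overloads >>, ~, / and +."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, x):
        return self.fn(x)

    def __rshift__(self, other):
        # compose: feed the output of self into other
        return Mini(lambda x: other(self(x)))

    def __invert__(self):
        # map: apply self to every element of the input
        return Mini(lambda xs: tuple(self(x) for x in xs))

    def __truediv__(self, other):
        # parallel: the i-th transform receives the i-th input
        return Mini(lambda xs: (self(xs[0]), other(xs[1])))

    def __add__(self, other):
        # rollout: every transform receives the same input
        return Mini(lambda x: (self(x), other(x)))


double = Mini(lambda x: 2 * x)
increment = Mini(lambda x: x + 1)

composed = (double >> increment)(3)          # double first, then increment
mapped = (~double)((1, 2, 3))                # double each element
in_parallel = (double / increment)((3, 10))  # double 3, increment 10
rolled_out = (double + increment)(3)         # both applied to 3
```

The toy only handles two operands for parallel and rollout; chaining three or more would need the results to be flattened, which is left out here for brevity.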
Large transforms may be built from combinations of these operations. For example, a branching training pipeline may be implemented as:
preprocess = (
    lower_case
    >> clean
    >> tokenize
    >> ~ to_integer
    >> to_tensor
    >> batch
)

forward_pass = (
    left_shift
    >> IfTrain(word_dropout)
    >> model
)

train_model = (
    (preprocess >> model >> left_shift)
    + (preprocess >> right_shift)
) >> loss
Passing inputs between transform stages
In a compose model, if transform_1 has 2 outputs and transform_2 has 2 inputs, then in applying the composition transform_1 >> transform_2 to data, the outputs of transform_1 are passed to transform_2 positionally. So output-1 of transform_1 is passed to input-1 of transform_2, and output-2 to input-2. If transform_2 has only one input, then the outputs of transform_1 are passed as a tuple to transform_2.
In an upcoming release, we plan to allow for passing inputs from one stage to the next using input/ output names.
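The positional passing rule described above can be sketched in plain Python. The helper `pass_on` and the three example functions are hypothetical and written only for this illustration. How PADL decides this internally is not shown in this write-up, so the use of `inspect.signature` here is an assumption.

```python
import inspect


def pass_on(outputs, next_fn):
    """Call next_fn with the outputs of the previous stage.

    A tuple is unpacked positionally when next_fn takes several
    parameters; otherwise it is handed over as a single argument.
    """
    n_params = len(inspect.signature(next_fn).parameters)
    if isinstance(outputs, tuple) and n_params > 1:
        return next_fn(*outputs)
    return next_fn(outputs)


def split_name(full_name):  # one input, two outputs
    first, last = full_name.split()
    return first, last


def initials(first, last):  # two inputs: outputs arrive positionally
    return first[0] + last[0]


def count_parts(parts):  # one input: receives the whole tuple
    return len(parts)


two_inputs = pass_on(split_name("Ada Lovelace"), initials)
one_input = pass_on(split_name("Ada Lovelace"), count_parts)
```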
Often it is instructive to look at slices of a model -- this helps with e.g. checking intermediate computations:
Individual components may be obtained using indexing:
step_1 = model
Naming transforms inside models
Transform instances may be named inline:
s = (transform_1 - 'a') / (transform_2 - 'b')
These components may then be referenced by name:
print(s['a'] == s[0]) # prints "True"
Applying transforms to data
To pass a single data point through the transform:
prediction = t.infer_apply('the cat sat on the mat .')
To pass data points in batches but no gradients:
for x in t.eval_apply(
    ['the cat sat on the mat', 'the dog sh...', 'the man stepped in th...', 'the man kic...'],
    batch_size=2,
    num_workers=2,
):
    ...
To pass data points in batches but with gradients:
for x in t.train_apply(
    ['the cat sat on the mat', 'the dog sh...', 'the man stepped in th...', 'the man kic...'],
    batch_size=2,
    num_workers=2,
):
    ...
"batch" and "unbatch" key transforms
The batch transform denotes where to split a transform between preprocessing and forward pass. The unbatch transform denotes where to split between forward pass and postprocessing. Everything before batch is performed in the data loader. This means that multiprocessing may be leveraged without extra boilerplate, to prepare data quickly for the forward pass. Everything between batch and unbatch is performed on the GPU (if CUDA is being used) and in batches. Everything after unbatch is applied in a for loop over the rows of the forward pass output.
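The three-way split described above can be mimicked with plain Python lists. In this sketch `BATCH` and `UNBATCH` are stand-in string markers and `split_stages`, `run` and `apply_batched` are hypothetical helpers. No data loader or GPU is involved, so this only illustrates where each stage runs (per item, per batch, per row), not how PADL implements it.

```python
BATCH, UNBATCH = "batch", "unbatch"  # stand-in markers, not PADL objects


def split_stages(stages):
    """Split a flat list of stages into (preprocess, forward, postprocess)."""
    i, j = stages.index(BATCH), stages.index(UNBATCH)
    return stages[:i], stages[i + 1:j], stages[j + 1:]


def run(fns, x):
    """Apply the functions in fns one after another."""
    for fn in fns:
        x = fn(x)
    return x


def apply_batched(stages, inputs, batch_size):
    pre, fwd, post = split_stages(stages)
    # Pre-processing runs once per item (the data loader's job).
    items = [run(pre, x) for x in inputs]
    for start in range(0, len(items), batch_size):
        chunk = items[start:start + batch_size]  # collate into a batch
        out = run(fwd, chunk)                    # forward pass sees the whole batch
        for row in out:                          # post-processing runs per row
            yield run(post, row)


stages = [
    str.split,                                # pre: sentence -> tokens
    len,                                      # pre: tokens -> token count
    BATCH,
    lambda counts: [c * 10 for c in counts],  # toy "forward pass" on a batch
    UNBATCH,
    lambda score: f"score={score}",           # post: format a single row
]

results = list(apply_batched(stages, ["a b", "c", "d e f"], batch_size=2))
```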
When using Transform.infer_apply to apply a transform to a single data point, the batch transform adds the additional dimension which is otherwise created by batching in the data loader implicit in Transform.eval_apply. Analogously, in Transform.infer_apply the unbatch transform serves to remove this additional dimension, so that the output going to the postprocessing step has the same number of dimensions as the rows which come out of the forward pass in the batched modes.
As a very simple example:
m = transform(torch.nn.Linear)(10, 20)
t = (
    transform(lambda x: torch.tensor(x))
    >> batch
    >> m
    >> unbatch
    >> this.tolist()
)
t.infer_apply(x) is approximately equivalent to:
m(torch.tensor(x).unsqueeze(0))[0].tolist()
t.eval_apply(x) and t.train_apply(x) are approximately equivalent to:
[y.tolist() for y in m(torch.stack([torch.tensor(y) for y in x]))]
Important model attributes, such as all model parameters, are accessible via pd_-prefixed methods:
o = torch.optim.Adam(model.pd_parameters(), lr=LR)
For a model which emits a tensor scalar, training is super straightforward using standard torch functionality:
for loss in model.train_apply(TRAIN_DATA, batch_size=BATCH_SIZE, num_workers=NUM_WORKERS):
    o.zero_grad()
    loss.backward()
    o.step()
from padl import load

# save the whole pipeline
model.save('test.padl')

# load it back
model = load('test.padl')
How we built it
We built PADL at LF1 with a team of 6, in the process of building highly branching and multi-modal models using NLP, information retrieval and vision deep learning. We originally started with a more complex version of the software which was statically typed, with a pattern matcher, and a large focus on JIT compilation. In the process we realized that the clear value proposition of the concept lay in model building, saving and working interactively with the models. We would like to remain agnostic with respect to whether users prefer to use JIT models or standard modules.
Our philosophy in the released version is to maintain a minimal set of software requirements and to keep the transform concept and its associated builder and serializer central. Once these are available, users can build their transforms using a combination of PyTorch and whichever favourite data-science packages they wish to use for their data processing. Key work-horses are inspect and the ast parser, as well as, of course, PyTorch functionality.
Challenges we ran into
Operator precedence in python
PADL overloads certain Python operators to enhance the usability of its functional API. However, Python operators have a fixed built-in precedence which must be respected. A challenge was to find a suitable collection of primitives with corresponding Python operators, whose precedence also reflected the intuitions of a data scientist building a transform.
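To make the precedence constraint concrete, the sketch below records how Python itself groups an unparenthesized expression built from the four operators. The class `Expr` is a hypothetical helper written for this illustration and has nothing to do with PADL's internals. The grouping it reports follows from Python's operator precedence rules: unary `~` binds tightest, then `/`, then `+`, and `>>` binds loosest.

```python
class Expr:
    """Records how Python groups an expression built from overloaded operators."""

    def __init__(self, text):
        self.text = text

    def __invert__(self):
        return Expr(f"map({self.text})")

    def __truediv__(self, other):
        return Expr(f"parallel({self.text}, {other.text})")

    def __add__(self, other):
        return Expr(f"rollout({self.text}, {other.text})")

    def __rshift__(self, other):
        return Expr(f"compose({self.text}, {other.text})")


a, b, c, d = (Expr(name) for name in "abcd")

# Without parentheses, Python decides the grouping.
grouping = (a >> ~b / c + d).text

# Parentheses are needed when a composition should be one branch of a rollout.
without_parens = (a >> b + c).text
with_parens = ((a >> b) + c).text
```

This is why a branching pipeline needs each composed branch wrapped in parentheses before the branches are combined with `+`.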
Introspecting python objects
We used inspect to access the code which created a Transform object. Often we needed to extract imports and global variables which were key in creating the object and isolate them so that we can robustly save our models. This presented several technical challenges, especially for objects with multiple dependencies and nested definitions.
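A heavily simplified sketch of the dependency-extraction idea is shown below, using only the standard library ast module on a source string. The function `free_names` is a hypothetical helper for this illustration and is not PADL's serializer. It ignores nested scopes, keyword-only arguments, attribute chains and many other cases that a robust implementation would have to handle.

```python
import ast
import builtins


def free_names(source):
    """Return the names a function reads but does not define itself."""
    func = ast.parse(source).body[0]
    params = {arg.arg for arg in func.args.args}
    loaded, stored = set(), set()
    for node in ast.walk(func):
        if isinstance(node, ast.Name):
            # Load context means the name is read; anything else assigns it.
            target = loaded if isinstance(node.ctx, ast.Load) else stored
            target.add(node.id)
    return loaded - stored - params - set(dir(builtins))


SOURCE = '''
def to_tensor(x):
    x = x[:10][:]
    for _ in range(10 - len(x)):
        x.append(EOS_VALUE)
    return torch.tensor(x)
'''

# Names that would have to be saved alongside the function: a global
# variable (EOS_VALUE) and an imported module (torch).
needed = free_names(SOURCE)
```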
With the batchify operator, we are able to handle batching and extraction from batches automatically within our Transform objects. This presented the challenging task of identifying how a PyTorch data-loader object should be built, by recursively navigating the computation graph and splitting it between pre-processing, forward pass and post-processing.
Accomplishments that we're proud of
Completely self contained model saving
With PADL we are able to save and load models which require multiple imports, data blobs, weights, layers etc. without specification of additional paths, packages, files etc. Everything occurs by introspection and code navigation of the created Transform object. This gives the security that the training results may be easily loaded and recovered at any point down the line.
Due to the human readable format of saved output, it is super easy to inspect a previously trained model and even modify it in a new experiment. This alleviates a key pain point in the data-science life cycle, namely reproducibility.
An enjoyable developer experience
We found that the functional philosophy applied at the high level of model structure and the object-oriented PyTorch approach applied at the level of individual layers and the forward pass make a good match, and get very close to a commonly used mental model which data scientists enjoy working in. The operator overloading means that the way that Transform objects are written is visually close to the way data scientists think about their models.
What we learned
New features of python3 such as advanced introspection have enabled new ways of working for Python developers -- for example as used in pytest. We learned that these features can be used to great advantage for working with PyTorch code and models. By building on top of the great PyTorch data API, we learned that PADL's Transform formalism can provide intuitive and easy to use abstractions for deep learning development and deployment.
What's next for PADL
In the next steps we plan to support:
- Model conversion/interchangeability with MAR files, so that, for example, a simple interface with torchserve may be built.
- A simple interface to
- Support for arbitrary serialization.
- Model import from torch model repository.
- Support for skip-connections between diverse points on the