Structured Pruning of Deep Convolutional Neural Networks

Structured Pruning of Deep Convolutional Neural Networks, Sajid Anwar et al. In the ACM Journal on Emerging Technologies in Computing special issue on hardware and algorithms for learning-on-a-chip, May 2017.

Notes

Quick, a software engineer mentions a “performance” problem to you. What do they mean?

This is, of course, an unfair question. There are too many different ideas that all get branded “performance” for us to know what we are trying to solve. This paper is simultaneously about two different flavours of performance.

On the one hand, the “performance” of a neural network is related to its ability to perform its task: the correctness of its inferences. There isn’t really a good way to know what neural network configuration will perform efficiently for the (unknown) function you want it to approximate. Therefore, the rule of thumb is find a network that’s too complex, and stop training it when it begins to overfit (when its performance starts to degrade, because it’s being too specific about whether an input looks like an example from the training set rather than whether it shares important features).

Now we meet the other kind of performance: the amount of resources consumed to do the work. A large neural network needs to do a lot of computations with a lot of numbers to classify an input, and that means using a lot of processor cycles and a lot of memory. Because our approach to designing the network was to overspecify it, we are using more computer than we need. But if that computer is relatively low-specification and battery operated—a mobile phone for example—this may render our solution unusable.

So, how can we turn a complex neural network into a simpler neural network? While this isn’t totally satisfying, the answer is: “guess”. Turn off bits of the network (i.e. set weights to zero), and see whether it still classifies (performs) well. This act of turning bits of the network off is called pruning.

Ironically some previous work in this space has actually not been great for performance (the resource kind). You can “unroll” convolutional layers (commonly found in image-classifying networks) into matrix multiplications, and you can turn that into a sparse matrix by approximating all small weights with zero and only storing the non-zero values and their locations. But now, even though you have fewer calculations, you may have more memory accesses in trying to solve where the weights should be used. And that could be slower than not reducing the network.

The work in this paper takes a structured approach to pruning the network. Whole feature maps (scores indicating whether particular characteristics of an image were found, and where, in the input image) can be removed, the network retrained, and the performance (ability to classify) measured afterwards. At smaller scales, the kernels can be pruned in particular deterministic ways, replacing a full weights matrix with a start index, a “stride” (gap between each non-zero value) and the list of non-zero weights. The different possibilities are explored using a combination of random generation and evolutionary iteration; networks that have a misclassification rate within given tolerance the original are kept into subsequent generations.

The results seem promising. With pruning at both levels of abstraction, the resulting network is just as deep (it contains as many layers) but it has fewer nodes at each layer and fewer connections between nodes. The systematic pruning approach means that the resulting networks are smaller in memory and faster in use: CPU time measurements are down approximately two thirds when compared with the initial, unpruned network.

However, be careful when interpreting the graph: the authors are showing the reduced execution time of the unrolled matrix multiplication for one layer of one network configuration. It is not clear what this means for overall behaviour of the network, what the misclassification rate of this network was (they show a tolerance cutoff at 4%, which may be too high for a given use case), or in general how the CPU time savings vary with network topology. In other words, we have a single graph, and don’t know how to generalise it.

I hope that at some point a sound theoretical basis for choosing the architecture for a neural network to solve a given problem will be developed. In fact, I sort of hope that it exists now, and that I haven’t found it. I don’t think so: for the moment, whack-a-mole is the state of the art, and this paper shows you can whack quite a lot of moles and still get reasonable results.

Posted in academia, AI | Leave a comment

On the continuous history of approximation

The Difference Engine – the Charles Babbage machine, not the steampunk novel – is a device for finding successive solutions to polynomial equations by adding up the differences introduced by each term between the successive input values.

This sounds like a fairly niche market, but in fact it’s quite useful because there are a whole lot of other functions that can be approximated by polynomial equations. The approach, which is based in calculus, generates a Taylor series (or a MacLaurin series, if the approximation is for input values near zero).

Now, it happens that this collection of other functions includes logarithms:

\(ln(1+x) \approx x – x^2/2 + x^3/3 – x^4/4 + \ldots\)

and exponents:

\(e^x \approx 1 + x + x^2/2! + x^3/3! + x^4/4! + \ldots\)

and so, given a difference engine, you can make tables of logarithms and exponents.

In fact, your computer is probably using exactly this approach to calculate those functions. Here’s how glibc calculates ln(x) for x roughly equal to 1:

  r = x - 1.0;
  r2 = r * r;
  r3 = r * r2;
  y = r3 * (B[1] + r * B[2] + r2 * B[3]
    + r3 * (B[4] + r * B[5] + r2 * B[6]
        + r3 * (B[7] + r * B[8] + r2 * B[9] + r3 * B[10])));
  // some more twiddling that add terms in r and r*r, then return y

In other words, it works out r so that it is calculating ln(1+r), instead of ln(x). Then it adds together r + a*r^2 + b*r^3 + c*r^4 + d*r^5 + ... + k*r^12…it does the Taylor series for ln(1+r)!

Now given these approximations, we can combine numbers into probabilities (using the sigmoid function, which is in terms of e^x) and find the errors on those probabilities (using the cross entropy, which is in terms of ln(x). We can build a learning neural network!

And, more than a century after it was designed, our technique could still do it using the Difference Engine.

Posted in AI | Tagged | Leave a comment

HPC’s Shift to the Cloud

Timothy Prickett Morgan writes on The Next Platform about the slow but inevitable shift to cloudy infrastructure. It seems that a tipping point has been reached, where the amount of IT money spent on “cloudy” infrastructure overtook the amount spent on “traditional” datacentre gear. This happened in 2018Q3, according to the IDC report cited in the article.

Prickett Morgan suggests that the transformation from bare metal to the cloud has been faster in HPC than in enterprise IT. In some senses, this makes sense, because HPC has long had the sorts of abstractions between the application and its environment that make it possible to change infrastructure. The days where an atomic energy or climate situation would be capable of running only on dedicated hardware with integrated bench seating are long gone, and all of the top supercomputers are now (highly tuned, admittedly) GNU/Linux clusters running on normal-ish CPUs: mostly Intel, some IBM POWER, and ARM are moving from evaluation to deployment too. All of these technologies, as well as the Nvidia GPUs used in CUDA codes and deep learning applications, and even Google’s TPUs, are to be found in public cloud environments from the big providers.

On the other hand, there still are big honkin’ boxes of bare metal, with the number one spot changing almost every year. So not all HPC applications are cloud-suitable, and some of those codes that people are trying to port to the cloud may prove challenging. Here’s a summary of the components of a “traditional” HPC deployment, and how it might help or hinder adaptation to the cloud.

Infrastructure

Modules

Plenty of HPC sites already virtualise their filesystems to some extent, with the Modules package. Modules let administrators separate the installation and management of packages from their availability to users, by defining modulefiles that configure the environment to make each package accessible.

Where a team is already using modules to set up its environment for building or running codes, adopting containers and similar abstractions should be straightforward. Docker images, for example, can contain the module packages and the environment changes necessary to use the modules, and the HPC application image can be composed on top to include the relevant environment.

Job submission

HPC systems tend to already be built with the kind of self-service in mind that devops teams in commercial software development strive to provide. This heritage has evolved from the necessarily multi-user nature of a large supercomputer deployment. Mainframe batch submission systems, grid middleware (such as Sun -> Oracle -> Univa Grid Engine) and SLURM are based around the idea that a user can request a certain amount of resources to run their codes, the request being queued until the resources are available.

The open source SLURM project already supports cloud-native demand scheduling. Others are using Kubernetes as an elastic demand scheduler.

However, a lot of teams find job-specific submission scripts with hard coded assumptions about the environment they will run on, and codes that are tightly coupled to the submission script. Loosening that coupling will require some effort, but will make the codes “portable” to a cloud environment and enable new workflows for testing and development.

File systems

HPC sites frequently use high-performance parallel filesystems like Lustre or IBM’s GPFS. While these filesystems can be deployed to a cloud environment, the performance characteristics will differ and it will be harder to tune to the specific topology offered by a physical deployment. Notice that HPC filesystems do not perform well in some scenarios so some applications like AI training may benefit from re-evaluating the data access strategy anyway. Portable codes could be tested against new hardware without significant capital outlay; for example Google Cloud uses Intel Optane non-volatile memory.

Job-specific nodes

A traditional cluster will often have login nodes for accessing the cluster from the scientific workstations, batch nodes for running and using the batch submission systems, compiler nodes for building codes, metadata nodes if it uses a parallel filesystem, and finally compute nodes on which the simulations and deep learning jobs are actually executed. The compute nodes may be divided into groups to service different queues, or to differentiate between testing/debugging and production jobs.

While operations teams may be interested in getting close to 100% utilisation out of the compute nodes, the fact is that the other classes of machine exist because they need different configurations, not because they need to always be available with dedicated hardware. They are ideal candidates to lead the transition to on-demand scaling, perhaps treating a physical cluster as a “private cloud” that commits as much hardware to compute as possible, scaling its other functions as needed.

Meanwhile, compilation and computation can be modelled as serverless workloads, consuming resource when they are executing but scaling to zero when not in use.

Application Support

MPI

MPI libraries like Open MPI already support demand-based scaling at job launch, using the -np option to control how many processes are started and the --hostfile to indicate where those hosts are. In principle it might seem like the hosts in the host file could be discovered using the Kubernetes service registry or similar services from other cloud orchestration layers. In practice the MPI library will need to support launching the process on the nodes so a middleware (see above) will still need to be deployed on top, or the MPI software extended with native support for the cloud’s orchestration API.

Software Licences

This turns out to be one of the biggest hurdles for demand-scaling for many teams. HPC software such as proprietary compilers, numerical algorithms libraries and developer tools are licensed with a particular maximum number of parallel uses. Lab-developed codes may have evolved with assumptions about where the licence file is located, and not built defensively against the possibility that a licence can’t be checked out. The ISV may have built assumptions into their licensing scheme, for example the host having a fixed IP or MAC address. A researcher or developer could have copied a particular licence file into their home directory, using that beyond other agreements being arranged with the vendor.

Where the licensing scheme is flexible enough to allow portability of the software, a good technique is to centralise management of the licenses using a secrets store, for example Vault, and to inject them into the HPC applications’ containers when they are launched.

Alternatively, particularly if the licensing scheme is too rigid, it’s worth evaluating the effort required and performance impact sustained to port the code to a different technology, for example an open source compiler. The trade off of this approach is that on the one hand, increased deployment flexibility is strategically beneficial, but the short term costs, staffing requirements and impact on the scientific mission can make it hard to justify or unworkable.

Conclusion

While there are significant benefits to be had in porting high-performance codes to cloud environments, the task is not without its challenges. Labrary consultancy with Graham Lee, bringing his experience in cloud-first devops teams, scalable systems at Facebook, and High-Performance Computing on ARM, can help your team identify and overcome these challenges.

Graham will be at the HPC, Big Data and Data Science devroom at FOSDEM in Brussels, February 2-3. Say hello, grab some time and let’s move your codes forward!

Posted in HPC | Leave a comment

The ABC of Software Engineering Research

About this paper

The ABC of Software Engineering Research by Klaas-Jan Stol and Brian Fitzgerald, published October 2018. See link for full citation.

Notes

There are too many ways in which terms describing research methods in software engineering get used, and these authors have a solution. The reason, at least according to the introductory discussion in this paper, is in part a case of discipline envy. This is the idea that we don’t quite know what software engineering is, but we know what those people over there do, and we like that, so we’ll co-opt it.

You could argue that the entire idea of software engineering is discipline envy. A collection of computing experts from academia and industry didn’t quite know how to formalise the problems faced by software makers in the 1960s, but they did know what engineers do, and they liked that. In fact, it’s not clear that they (or at least we, their intellectual descendents) truly understood what engineers do, but nonetheless we gave it a jolly good go. In 1968, in the town of Garmisch in Germany, a discipline was born.

Now, it’s interesting that discipline envy has turned up so early in this discussion, because the contribution in this paper is a framework borrowed from social science researchers. To understand the applicability of cross-discipline seeding, we have to ask how strong the analogy “software engineering is just like X” seems to be (as well as identify how well the proposed idea worked in field X). So, is software engineering like a social science?

Here, the authors carve the field in two. They distinguish “solution-seeking” research, in which we identify what we ought to do about a problem, from “knowledge-seeking” research, in which we identify what people do do about the problem. The bad news about ditching the solution-seeking half of the discipline is that we just lost the engineering from software engineering, the bit where we use scientific results to propose novel solutions to problems.

The good news is that knowledge-seeking software engineering research does look quite a bit like a social science. People, in some context, do things, and we can try to understand that. Indeed that is the origin of the ABC initialism: Actors (the people), Behaviour (the things) and Context.

Well, we can understand bits of it at a time. Like good consultants, the authors introduce a quadrant diagram. On one axis, the “generalisability” of a research method, from highly universal contexts to deeply specific contexts. On the other, the “obtrusiveness” of the method, reflecting whether the researchers are passive observers or active interferers.

As the Labrary stands at the intersection, we approve of the idea that two different things lie on a continuum, rather than being an either/or choice. This framework makes the point that while a particular research technique or strategy is situated somewhere in the general/obtrusive map, others are available elsewhere. The reaction to a highly-controlled lab experiment should not be to declare that the result is not generalisable, but to understand what else could be done to explore generalisations of its results.

The discussions of where particular research strategies fit in the map are interesting, though some of the analogies drawn are fairly tenuous. The authors show where on the map the maximum applicability of a method for each of the key properties lies: universally contextual, unobtrusive research generalises over Actors best, while highly-obtrusive methods allow more precise measures of particular Behaviours and more focus gives a more realistic Context. It would be really beneficial to see a similar framework for “solution-seeking” literature, so we can evaluate the applicability of techniques developed in software engineering research to “practical” problems.

Posted in academia, software-engineering | Tagged | Leave a comment

Java By Contract: a Worked Example

Java by Contract is an implementation of Design by Contract, as promoted by Bertrand Meyer and the Eiffel Software company, for the Java programming language. The contract is specified using standard Java methods and annotations, making it a more reliable tool than earlier work which used javadoc comments and rewrote the Java source code to include the relevant tests.

Which is all well and good, but how do you use it? Here’s an example.

The problem

There is a whole class of algorithms to approximately find roots to a function, using an iterative technique. Given, in Java syntax, the abstract type MathFunction that implements a function over the double type:

interface MathFunction {
    double f(double x);
}

Define the abstract interface that exposes such an iterative solution, including the details of its contract.

The solution

The interface is designed using the Command-Query Separation Principle. Given access to the function f(), the interface has a command findRoot(seed1, seed2) which locates the root between those two values, and a query root() which returns that root. Additionally, a boolean query exhaustedIterations() reports whether the solution converged.

Both of the queries have the precondition that the command must previously have successfully run; i.e. you cannot ask what the answer was without requesting that the answer be discovered.

The contract on the command is more interesting. The precondition is that for the two seed values seed1 and seed2, one of them must correspond to a point f(x) > 0 and the other to a point f(x) < 0 (it does not matter which). This guarantees an odd, and therefore non-zero, number of roots[*] to f(x) between the two, and the method will iterate toward one of them. If the precondition does not hold, then an even number of roots (including possibly zero) lies between the seed values, so it cannot be guaranteed that a solution exists to find.

In return for satisfying the precondition, the command guarantees that it either finds a root or exhausts its iteration allowance looking. Another way of putting that: if the method exits early, it is because it has already found a convergent solution.

Many, but not all, of these contract details can be provided as default method implementations in the interface. The remainder must be supplied by the implementing class.

/**
 * Given a mathematical function f over the doubles, and two bounds for a root to that function,
 * find the root using an (unspecified) iterative approach.
 * A root is an input value x such that f(x)=0.
 */
public interface RootFinder {
    /**
     * @return The function that this object is finding a root for.
     */
    MathFunction f();
    /**
     * A root to the function f() is thought to lie between seed1 and seed2. Find it.
     * @param seed1 One boundary for the root to f().
     * @param seed2 Another boundary for the root to f().
     */
    @Precondition(name = "seedGuessesStraddleRoot")
    @Postcondition(name = "earlyExitImpliesConvergence")
    void findRoot(double seed1, double seed2);
    /**
     * @return The root to the function f() that was discovered.
     */
    @Precondition(name = "guessWasCalculated")
    Double root();
    /**
     * @return Whether the iterative solution used the maximum number of iterations.
     */
    @Precondition(name = "guessWasCalculated")
    boolean exhaustedIterations();

    default Boolean guessWasCalculated() {
        return this.root() != null;
    }
    default Boolean seedGuessesStraddleRoot(Double seed1, Double seed2) {
        double r1 = f().f(seed1);
        double r2 = f().f(seed2);
        return ((r1 > 0 && r2 < 0) || (r1 < 0 && r2 > 0));
    }
    Boolean earlyExitImpliesConvergence(Double seed1, Double seed2, Void result);
}

Example usage

There are swaths of algorithms to implement this interface. See, for example, the book Numerical Recipes. Given a particular implementation, we can look for roots of a simple function, for example f(x) = x^2 - 2:

    RootFinder squareRootOfTwo = SecantRootFinder.finderForFunction((double x) -> x*x - 2);
    squareRootOfTwo.findRoot(1.0, 2.0);
    System.out.println(String.format("Root: %f", squareRootOfTwo.root()));
    System.out.println(String.format("The solution did%s converge before hitting the iteration limit",
            squareRootOfTwo.exhaustedIterations()?"n't":""));

This suggests that a root exists at x~=1.414214, and that it converged on the solution before running out of goes. Let’s see if there’s another root between 2 and 3:

Exception in thread "main" online.labrary.javaByContract.ContractViolationException:
  online.labrary.javaByContract.Precondition seedGuessesStraddleRoot had unexpected value false on object
  online.labrary.jbcTests.TestGeneratorTests$SecantRootFinder@29774679
    at javaByContract/online.labrary.javaByContract.ContractEnforcer.invoke(ContractEnforcer.java:92)
    at jdk.proxy1/com.sun.proxy.jdk.proxy1.$Proxy4.findRoot(Unknown Source)
    at javaByContract/online.labrary.rootFinder.RootFinder.main(RootFinder.java:10)

Whoops! I’m holding it wrong: the function doesn’t change sign between x=2 and x=3. I shouldn’t expect the tool to work, and indeed it’s been designed to communicate that expectation by failing a precondition.

[*] Nitpick: roots _or singularities_.

Posted in Java | Leave a comment

Updates to JavaByContract

Some improvements to JavaByContract, the design-by-contract tool for Java:

  • Preconditions, Postconditions and Invariants now appear in the Javadoc for types that use JavaByContract. While this is only a small source change, it’s a huge usability improvement, as programmers using your types can now read the contracts for those types in their documentation.
  • There is Javadoc for the JavaByContract package.
  • The error message on contract violation distinguishes between precondition, postcondition and invariant violation.

I’m speaking generally about moving beyond TDD, using JavaByContract as a specific example, at Coventry Tech Meetup next week. See you there!

Posted in Java | Leave a comment

Impossibility and Uncertainty in AI

About this paper

Impossibility and Uncertainty Theorems in AI Value Alignment (or why your AGI should not have a utility function), Peter Eckersley. Submitted to the ArXiV on December 31, 2018.

Notes

Ethical considerations in artificial intelligence applications have arguably been present since the birth of the field, if not earlier. Karel Čapek wrote R.U.R., the play that introduced both the word “robot” to our lexicon, and the question of whether such creations would want to work. Isaac Asimov and Mary Shelley both asked how a society with “natural” and “artificial” people would function.

Getting Trollied

More recently, as we’ve been able to create things that we can both identify as solving problems and market under the term Artificial Intelligence, the question of AI ethics has reappeared with different specialisations. Should we apply AI techniques to particular fields of endeavour at all, such as autonomous weaponry? Who should be responsible for a decision made, for an example, by an autonomous vehicle? How should such a vehicle make a decision when faced by the classic trolley problem?

Eckersley opens this paper with an aside, important to those who consider AI as an engineering discipline (which is not a clear statement in itself): autonomous vehicle research is a long, long way away from even contemplating the trolley problem. Currently, reducing the uncertainty with which obstacles are detected and identified from noisy input signals is a much more pressing issue.

However, human drivers are better at object identification, so may be sufficiently advanced to address the trolley problem should it arise in piloting a vehicle. And this is really an important point in thinking philosophically about how AIs should act: we already have intelligent actors, and already think about how they act.

Utilitarianism

Indeed, we already have artificial intelligences, including society-level AIs. A government bureaucracy; an enterprise; a policy document; all of these are abstractions of human intelligence behind machines, rules, and assumptions of how to work in given contexts. Any time that someone relies on external input to make a decision, whether that input is a checklist, a company handbook, a ready reckoner tool or a complex software-built decision support tool, you could replace that tool with an AI system with no change in the ethics of the situation.

Anywhere you could consider an AI making choices that are suboptimal to the welfare of the affected people, you could imagine those choices being suggested by the checklist or ready reckoner. None of this is to minimise the impact or applicability of AI ethics, quite the contrary: if we are only considering these problems now because we think AI has become useful, we are late to the party. Very late: what was the Soviet Gosplan other than a big decision support system trying to optimise a society along utilitarian lines?

The early sections of this paper summarise earlier proofs that it is impossible to define an objective utility function that will optimise for multiple, conflicting output dimensions. Not because it is hard to discover or compute the function, but because an acceptable trade-off may not exist. The proofs take this form:

  • imagine a guess at a solution, s.
  • there is a rule r1 which, applied to s, yields a better solution, s1.
  • there is a rule r2 which, applied to s1, yields a better solution, s2.
  • there is a rule rN which, applied to s(n-1), yields a better solution, s.

In other words, the different states that represent solutions cannot be “totally ordered” from “best” to “worst”; a cycle exists such that the question “which is the best” is paradoxical. While that has clear impact on AIs that use objective utility functions to choose solutions to their inputs, the wider implication is that no utilitarian system can objectively choose the “best” arrangement for welfare.

If you’re building the multivac from Isaac Asimov’s Franchise, then I’m going to have to disappoint you now: it’s impossible. But so is having a room full of people with slide rules and 383 different folders containing next year’s production targets. It’s not building an AI to optimise for the utility function that cannot be done; it’s finding a maximum on the utility function.

Finding a Workaround

A few techniques for addressing this problem are discussed and rejected: we could ignore the problem on the basis that our AIs are not currently important enough to have huge impact on social welfare. But AI is being used in welfare systems, in criminal justice systems, in financial management. And even where AI is not used, corporations and bureaucracies are already being applied, and these will have their decision support tools, which will eventually include AI.

Similarly, simplifying the problem is possible, but undesirable. You could define some kind of moral exchange rate to reduce the dimensionality of your problem: this much wealth in retirement is worth this much poverty today; this much social equality can be traded for this much value of statistical life. Or you could accept one or more of the undesirable properties of a solution as axiomatic; we can solve for inequality if we accept poverty, perhaps. Neither of these are ethically unambiguous.

Embracing Uncertainty

Ultimately, Eckersley concludes that the only feasible approach is to relax the requirement for objective, total ordering of the possible outcomes. One way is to introduce partial ordering of at least two of the comparison dimensions: to go from saying that a solution is { better than, worse than, the same as } another along some axis to saying it is { better than, worse than, the same as, incomparable to } the other solution. While this breaks the deadlock it’s difficult to work with, and domain-specific interpretations of partial ordering will be needed. And now we’re back at the trolley problem: there are multiple outcomes to choose from, we don’t know (because it’s impossible to say) which is better, and yet a decision must be taken. Which?

A second model replaces the definite ordering of solutions with probabilistic ordering. Rather than the impossible statement “outcome B is better than outcome A”, the pair of statements “outcome B has probability p of being better than outcome A” and “outcome A has probability (1-p) of being better than outcome B” are given. Now a system can find the outcome with the highest likelihood of being best, or pick from some with sufficient likelihood of being best, even though it is still impossible to find the best. There will always be some chance of violating a constraint, but those should at least be free of systematic bias.

Implications

If your AI strategy is based on the idea that your domain can be “solved numerically”, by finding some utility function or by training an AI to approximate the utility function, then you need to ask yourself whether this is true without introducing some biases or unethical choices into the solutions. Consider whether your system should find what is “best” and do it, or find actions that are “good enough” and either choose between them, or present them for human supervision.

Posted in academia, AI | Leave a comment

The App that Wasn’t (Yet)

One of the early goals written into the mission statement of the Labrary was an eponymous app for organising research notes. I’ve used Mekentosj Springer Readcube Papers for years, and encountered Mendeley and others, and found that they were all more focussed on the minutiae of reference management, rather than the activity of studying and learning from the material you’re collecting in your library. Clearly those are successful apps that have an audience, but is there space for something more lightweight?

I talked to a few people, and the answer was yes. There were people in software engineering, data science, and physics who identified as “light” consumers of academic literature, people who read the primary literature to learn from and find techniques to apply, but do not need or even want the full cognitive weight of bibliographic reference management. They (well, “we”, I wanted it too) wanted to make notes while they were reading papers, and find those notes again. We wanted to keep tags on interesting references to follow up. We wanted to identify the questions we had, and whether they were answered. And we wanted to have enough information—but not more—to help us find the original article again.

My first prototype was as simple as I could make it. There’s a picture below: it’s a ring binder, with topic dividers, and paper notes (at least one separate sheet for each article) which quickly converged on a pro forma layout as shown.

An early prototype of the Labrary app.

An early prototype of the Labrary app.

I liked it, in fact I quickly got to a point where I wouldn’t read an article unless I had access to a pad and pen to add a page to my binder. People I showed it to liked it, too. So this seemed like a good time to crack open the software making tools!

The first software prototype was put together in spare time using GNUstep and Renaissance, and evinced two problems:

  • The UI design led back down the route of “bibliopedantry”, forcing students to put more effort into getting the citation details correct than they wanted to.
  • Renaissance lacked support for some Cocoa controls it would have been helpful to use, so there was a choice to be made to invest more into improving Renaissance or finding a different UI layout tool.
A screenshot of the ill-fated "Library" window in Labrary's GNUstep prototype.

A screenshot of the ill-fated “Library” window in Labrary’s GNUstep prototype.

This experience made me look for other inspiration for ways to organise the user interface so that students get the experience of taking notes, not of fiddling with citation data. I considered writing Labrary as a plugin for the free Calibre e-reader app, so that Labrary could focus on being about study notes and Calibre could focus on being about library management. But ultimately I found the tool that solved the problem best: Apple’s Finder.

The Labrary pro forma note as Finder stationery.

The Labrary pro forma note as Finder stationery.

I’ve recreated the pro forma note from the binder as a text file, and set the “Stationery Pad” flag in the Finder. When I open this file, Finder creates a duplicate and opens that instead, in my editor of choice: ready to become a new study note! I put this in a folder with a Zim index file, so I can get the “shoebox” view of all the notes by opening the folder in Zim. It also does full-content searching, so the goal of finding a student’s notes again is achieved.

Zim open on my research notes folder.

Zim open on my research notes folder.

I’m glad I created the lo-fi paper prototype. It let me understand what I was trying to achieve, and show very quickly that my software implementation was going in the wrong direction. And I’m always happy to be the person to say “do we need to write this, or can it be built out of other bits?”, as I explored for this project with Zim and Calibre.

Posted in architecture of sorts, design | Leave a comment

Research Watch, and Java by Contract

I introduced Java by Contract, a tool for building design-by-contract style invariants, preconditions and postconditions in Java using annotations. It’s MIT licensed, contributions are welcome, and I hope this helps lots of people to introduce stronger correctness checking into your software. And book office hours if you’d like me to help you with that.

Java by Contract came about as part of Research Watch, a new blog series over at The Labrary where I talk about academic work and how us “practitioners” (i.e. people who computer who aren’t in academia) can make use of the results. The first post considers a report of Teaching Quality Object-Oriented Programming to computer science students.

By the way, I will be speaking at Coventry Tech Meetup on 10th January on the topic “Beyond TDD”, and Java by Contract will make an appearance there.

Long-time SICPers readers will remember Programming Literate, a Tumblr discussing results from empirical software engineering. And if you don’t, you’ll probably remember your feeds exploding on July 15, 2013 when I imported all of the posts from there to here. You can think of Research Watch as a reboot of Programming Literate. There’ll be papers new and vintage, empirical and opinionated, on a range of computing topics. If that sounds interesting, subscribe to the Labrary’s RSS feed.

Posted in academia, Java, OOP | Leave a comment

Teaching Quality Object-Oriented Programming

About this paper

Teaching Quality OOP by Yishai A. Feldman, published March 2005 (see the link for full citation).

Notes

One of the points made in my book Object-Oriented Programming the Easy Way is that objects should be specified by their interfaces through contracts, which say what messages the objects respond to, how you use them, and what happens as a result.

While it is up to any one object to decide how it responds to messages, we need to know whether that object represents a useful addition to our system. In other words, we want to know what the object will do in response to what messages. (Page 61)

The book Structure and Interpretation of Computer Programs says the same thing. An abstract data type has a collection of things that can be done, and stuff that happens when you do it.

In general, we can think of data as defined by some collection of selectors and constructors, together with specified conditions that these procedures must fulfill in order to be a valid representation. (§2.1.3)

This seems evident. Knowing that I have a int count(), a boolean contains(Object o) and a void add(Object o) method is insufficient, I need to know how they interact before I can use them. For an array, given:

int x = a.count();
a.add(anObject);
int y = a.count();

you would always expect y - x == 1. For a set, it would depend on the content of the collection before addition; it could be 1 or 0. Knowing what methods are called is insufficient to know what type I am dealing with.

Why is it, then, that programming languages give you types for expressing the methods on an object, but not what they do or how they relate? Why does a Java interface, a Swift protocol, or a C++ abstract class only have the part of the contract related to names, not behaviours?

Famously, the Eiffel language addresses this, and it’s here that we come into contact with the paper that is the topic of this post. Eiffel, and its underlying theory, is well-described in the book Object-Oriented Software Construction by Bertrand Meyer, CTO of Eiffel Software and researcher at ETH Zurich. Feldman wanted to teach his students the theory of “Quality Software” based on two principles that are well-described in OOSC:

  • Design by Contract, because it’s more approachable and usable than formal methods, and more useful than testing; and
  • Command-Query Separation, because it isolates state changes, making it easier to draw conclusions about the behaviour of a software system.

But he didn’t want to teach them Eiffel, because:

it might leave the students with the mistaken and harmful impression that quality programming is confined to one language. (§3)

Additionally, OOSC does not include any exercises, so is not useful as a teaching support book. I will note here that Meyer has also written A Touch of Class, which does have supporting teaching material including exercises, but still uses the Eiffel language and therefore only solves half of the author’s problems.

Feldman taught the theory from OOSC using Eiffel notation, and encouraged students to complete exercises that were in variants of the Java programming language. Tools available at the time read the contract out of special additions to the class’s Javadoc comments, and modified the source code to include assertions at the relevant points in execution.

This led me to wonder about modern Java syntax, and whether it’s possible to make a similar tool using Java’s annotation features so that programmers don’t have to worry whether a source conversion tool has introduced errors, or trace changes to the source code when working back from a failure report to the broken source.

The answer is yes, and so now the Labrary can offer Java by Contract as a tool for designing Java types by contract. It encodes the parts of the contract as names of methods to invoke that return boolean, failing if the answer is false. A rewrite of the Map interface from page 13 of the paper is given below.

public interface Map {
  /**
   * Does the key k appear in the map?
   */
  @Precondition(name = "nonNullK")
  @Postcondition(name = "inMapIffInKeys")
  boolean has(Object k);

  default boolean nonNullK(Object k) {
    return (k != null);
  }
  default boolean inMapIffInKeys(Object k, Boolean result) {
    return (result == this.keys().has(k));
  }

  /**
   * The value of the map at key k, null if undefined.
   */
  @Precondition(name = "nonNullK")
  @Postcondition(name = "nonNullValueIffHasKey")
  Object item(Object k);

  default boolean nonNullValueIffHasKey(Object k, Object ret) {
    return ((ret != null) == this.has(k));
  }

  /**
   * Associate key k with value v.
   */
  @Precondition(name = "nonNullKeyAndValue")
  @Postcondition(name = "hasK")
  @Postcondition(name = "itemForKIsNowV")
  void put(Object k, Object v);

  default boolean nonNullKeyAndValue(Object k, Object v) {
    return ((k != null) && (v != null));
  }
  default boolean hasK(Object k, Object v, Void result) {
    return this.has(k);
  }
  default boolean itemForKIsNowV(Object k, Object v, Void result) {
    return (this.item(k) == v)
  }

  /**
   * Remove key k and associated value from map.
   */
  @Precondition(name = "nonNullK")
  @Postcondition(name = "doesNotHaveK")
  void prune(Object k);

  default boolean doesNotHaveK(Object k, Void result) {
    return !this.has(k);
  }

  /**
   * The set of all keys in the map.
   */
  @Postcondition(name = "nonNullReturn")
  ReadOnlySet keys();

  default boolean nonNullReturn(ReadOnlySet result) {
    return (result != null);
  }
}

This approach lets us write contract conditions that have full access to the internal state of the objects they are implemented on, so internal invariants can be verified in addition to the properties explored through the interface (by specifying them on the implementing class, not on the declaring interface). As a few pages of the paper are dedicated to the author’s (and the class’s) experience with various design by contract tools and their shortcomings, it’s valuable to explore this space and come up with better approaches. I hope that Java by Contract is a useful addition.

Feldman additionally notes that the Java class library, including the Collections library, violates the Command-Query Separation principle. He even notes that the language does: the construct:

x++;

is both a command (increment x) and a query (what value did x previously have?) in a single expression. The early exercises in his class instruct students to write CQS-satisfying library objects (hence the Map example above), which the subsequent exercises build on.

Feldman left academia the year after this paper was written, and is now at IBM Research. That means the trail of development of this class has run out, and we cannot say how the teaching would have adapted to the subsequent decade of progress in Java.

Posted in academia | Leave a comment