I initially wanted to make an alternative to tarsnapper, and then I realized there is no need to make it tarsnap-specific because I have a few other periodic backups that could also use the same cleanup system. And the Unix philosophy is a great guide here:
Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new “features”.
Expect the output of every program to become the input to another, as yet unknown, program. Don’t clutter output with extraneous information. Avoid stringently columnar or binary input formats. Don’t insist on interactive input.
I thought of the program in 2021 and started the implementation that March. The core of the program is the time-filtering algorithm, so it was a good opportunity to try writing property-based tests more seriously. It went well at first for simple properties (such as “never remove times newer than ‘now’”), but it quickly became very complicated for more complex properties that needed to generate different times and verify the algorithm’s results. So I got bogged down and couldn’t continue. The old history is in the repository, before the _start tag.
I restarted the project this January, and it’s been more successful this time. Most of the unit tests are property-based tests, although I had to add several example tests for cases where defining and implementing the corresponding properties would be hard, and even harder to understand later. Example tests are still easier to understand for more complex scenarios, but they won’t uncover edge cases unless you think of them yourself.
Ultimately I realized that a lot of the difficulty here comes from calendar-based date manipulations, which are hard to formalize due to the variable number of days in a month. For example, 2023-05-31 - 1 month = 2023-04-30 because there is no 2023-04-31, but 2023-04-30 + 1 month = 2023-05-30, not 2023-05-31. Therefore, the roundtrip property (subtracting a time interval from a time and adding it back yields the initial time) does not always hold.
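This asymmetry is easy to see with the clipping month arithmetic in Data.Time (a standalone sketch, not code from the project):

```haskell
import Data.Time.Calendar (Day, addGregorianMonthsClip, fromGregorian)

may31, april30, roundtrip :: Day
may31 = fromGregorian 2023 5 31

-- subtracting one month clips to the last valid day: 2023-04-30,
-- because 2023-04-31 does not exist
april30 = addGregorianMonthsClip (-1) may31

-- adding the month back gives 2023-05-30, not the original 2023-05-31
roundtrip = addGregorianMonthsClip 1 april30
```

So the roundtrip property fails here even though every individual step is the “correct” calendar answer.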
In the new implementation, I split the GFS cleanup process into three steps (the GFS module):
given the GFS ranges (defining time intervals where only a single backup should be kept) and the time “now”, generate a list of checkpoint times, between which a single backup should be kept;
determine the extra times to remove;
ignore (don’t remove) the newest time.
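The steps above might be sketched roughly as follows, over an abstract ordered “time” type. This is a hypothetical, much-simplified illustration: the function names and interval conventions are mine, not the actual GFS module’s API, and step 1 (deriving checkpoints from the GFS ranges and “now”) is omitted.

```haskell
import Data.List (sort)

-- step 2: between consecutive checkpoints (given oldest first), keep the
-- oldest time in each half-open interval [lo, hi) and mark the rest as extra
extraTimes :: Ord t => [t] -> [t] -> [t]
extraTimes checkpoints times =
  concat
    [ drop 1 (sort [t | t <- times, lo <= t && t < hi])
    | (lo, hi) <- zip checkpoints (drop 1 checkpoints) ]

-- step 3: never remove the newest time, even if it fell into an interval
cleanup :: Ord t => [t] -> [t] -> [t]
cleanup _ [] = []
cleanup checkpoints times = filter (/= newest) (extraTimes checkpoints times)
  where newest = maximum times
```

Keeping each step this small is what makes per-step property tests manageable.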
The pieces are implemented in the GFS.Internal modules. This was very helpful to manage the complexity of tests: there are property-based tests for each part; even though the parts themselves are small, implementing data generators for properties of the entire process is complicated.
Along the way, I gained a deeper understanding of property-based testing.
Getting arbitrary values
There are two ways of receiving arbitrary values from QuickCheck.
The first is property $ \foo bar -> … where you get arbitrary foo and bar in your function; two parameters are just an example — the function can accept one or more parameters as needed. These parameters are automatically printed on test failure (using their Show instances). In order to use this, the types have to be instances of Arbitrary.
The second way is to use forAll chooseX $ \(baz, bax) -> … inside property to get additional generated values. This is more ad-hoc and lightweight because you only need to define a function chooseX :: … -> Gen (Baz, Bax). The benefits: chooseX can accept parameters; you can easily have many generator variants specific to each test case; and it doesn’t require a newtype (see below). To display the generated values, or any other information, when the test fails, use the counterexample function.
On the other hand, Arbitrary instances can provide values for shrinking (I couldn’t figure out a good approach to shrinking in this project).
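To illustrate the two styles side by side (chooseRange is a made-up generator name, not something from QuickCheck):

```haskell
import Test.QuickCheck

-- first way: Arbitrary-driven parameters, printed automatically on failure
prop_reverseTwice :: [Int] -> Bool
prop_reverseTwice xs = reverse (reverse xs) == xs

-- second way: an ad-hoc generator used with forAll; it can take parameters
chooseRange :: Int -> Gen (Int, Int)
chooseRange hi = do
  a <- choose (0, hi)
  b <- choose (a, hi)
  pure (a, b)

-- counterexample attaches extra diagnostics to a failure report
prop_rangeOrdered :: Property
prop_rangeOrdered = forAll (chooseRange 100) $ \(a, b) ->
  counterexample ("width = " ++ show (b - a)) (a <= b)
```

Both properties can be run with quickCheck; only the first requires the inputs’ types to have Arbitrary instances.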
It makes sense to write simpler tests first, which means more generic tests first. For example, given a function that takes a list of integers and removes the maximum value, we can come up with these properties from less to more specific:
“non-empty list’s size is decreased” — the function should always remove something from a (non-empty) list; in this case, it’s easy to see that exactly one item is removed;
“non-empty list’s size is decreased by 1” (assuming no duplicates);
“non-empty list’s maximum value is removed” — see below.
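The first two properties might be sketched like this, with a hypothetical removeMax standing in for the function under test (the names are mine):

```haskell
import Data.List (delete, nub)
import Test.QuickCheck

-- hypothetical function under test: remove the maximum value from a list
removeMax :: [Int] -> [Int]
removeMax [] = []
removeMax xs = delete (maximum xs) xs

-- most generic: a non-empty list always loses at least one element
prop_sizeDecreases :: NonEmptyList Int -> Bool
prop_sizeDecreases (NonEmpty xs) = length (removeMax xs) < length xs

-- more specific: exactly one element is removed (assuming no duplicates,
-- which nub enforces on the generated list)
prop_sizeDecreasesBy1 :: Property
prop_sizeDecreasesBy1 =
  forAll (nub <$> arbitrary) $ \xs ->
    not (null xs) ==> length (removeMax xs) == length xs - 1
```

The third, most specific property is trickier; see below.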
Repeating the implementation code in tests in order to verify a property is no good. I realized that it’s often possible to generate input data knowing the expected result. For example, given the same function that removes the maximum value from a list of integers, there are (at least) two ways to test the “non-empty list’s maximum value is removed” property:
generate a list of integers, get the expected result by removing the maximum value, call the function with the original list, verify the result — this very likely duplicates the implementation code;
generate an integer, which will be the maximum value; generate a list of integers smaller than the maximum; add the maximum to the list and call the function; verify the result against the generated list — by generating the expected values first, this approach avoids duplicating the implementation code.
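The second approach might look like this for the same hypothetical removeMax (redefined here so the block is self-contained; the generator ranges are arbitrary choices of mine):

```haskell
import Data.List (delete, sort)
import Test.QuickCheck

removeMax :: [Int] -> [Int]
removeMax [] = []
removeMax xs = delete (maximum xs) xs

-- generate the maximum first, then a list of strictly smaller values,
-- so the expected result is known without re-running the implementation
genMaxAndRest :: Gen (Int, [Int])
genMaxAndRest = do
  m  <- choose (0, 1000)
  xs <- listOf (choose (-1000, m - 1))
  pure (m, xs)

-- the expected result is exactly the generated "rest" (as a multiset)
prop_maxRemoved :: Property
prop_maxRemoved =
  forAll genMaxAndRest $ \(m, xs) ->
    forAll (shuffle (m : xs)) $ \input ->
      sort (removeMax input) === sort xs
```

Sorting both sides makes the comparison order-independent, which keeps the property honest even after shuffling the input.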
Types are important. You could add an Arbitrary instance for a base type (e.g., NominalDiffTime) or your own type in the test target, but that triggers an orphan-instance warning because both the type and the Arbitrary typeclass are defined outside the test module. The warning is not a real problem in this case, but a better way is to add a newtype wrapper.
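A minimal sketch of such a wrapper (the wrapper name and the generator’s range are my own choices):

```haskell
import Data.Time.Clock (NominalDiffTime)
import Test.QuickCheck

-- a local newtype owns the Arbitrary instance, so it is not an orphan
newtype ArbitraryDiffTime = ArbitraryDiffTime NominalDiffTime
  deriving Show

instance Arbitrary ArbitraryDiffTime where
  -- up to a year's worth of seconds; adjust the range to your domain
  arbitrary = ArbitraryDiffTime . fromInteger <$> choose (0, 365 * 86400)
```

Properties then take ArbitraryDiffTime and unwrap it, keeping the base type untouched.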
Overall, there have been a few challenges even in such a small project. As usual, there is more testing code for the algorithm than algorithm code itself; this is necessary to ensure that it works as expected (up to a point) because removing backups should not be taken lightly. I’m happy that I now have a setup for cleaning up older Tarsnap backups, and I want to slightly extend the CLI to make it easier to use with other backup systems.