Don’t Repeat Yourself (Meta Edition)

“Don’t Repeat Yourself”

Developers should all be familiar with this mantra, but we tend to think of it only as it relates to the code, and not the coder. It’s the second part that I want to talk about, so skip ahead if you want to, but I’m going to talk about the traditional DRY (Don’t Repeat Yourself) principle first.

Don’t Repeat Your Code

We factor out code all the time. We move things to a base class, or extract it to a helper so that we only have one copy to maintain. If we find a flaw, we fix it once, in one place, and the rest of the code reaps the benefits.

This is infinitely better than the copy/paste reuse model in which we’d have to spend not only the time to fix the problem, but to hunt down all of the other places where this code exists and fix them too. If, that is, you are lucky enough to work on a team that values craftsmanship and long-term maintainability of the codebase. All too often, you find yourself in a situation where some piece of code has been copied and pasted into a few locations before anyone realized that it was becoming into a pattern. One developer wrote something he needed, and then someone else needed to do either the same thing, or something very similar. It’s the “very similar” where things start to fall apart.

The problem is that there are now two copies of very similar code, but they’re not quite identical. If a flaw is found, someone has to find all the instances of that flaw and correct them. This is more time-consuming than fixing one single shared instance of that code. It gets worse though. Since the instances aren’t guaranteed to be identical, how do you know if you found them all? If the variable names are different in each instance, what are you even going to search for? Maybe you’ll get lucky and there are a couple keywords in a row that aren’t instance-specific, and you can search on those to find all of the copies. Hopefully, you recognize the pattern and extract it before it becomes widespread.

Once is unique, twice is coincidence, three times is a pattern. The “Rule of Three” states that we should have refactored out the first two instances as soon as we needed the same code a third time. Hopefully you don’t work in a control-freak environment where you are prevented from touching the first two instances because they’re not “your code”, or they are locked down because the original features they were written for are considered “done” and are not to be touched in any way ever again.

Don’t laugh. I know places like that.

That’s why we have unit tests in the first place. I want to be able to prove that my change didn’t break someone else’s code. I can only do that if I have adequate coverage of that code in the first place. With good test coverage, I can refactor with wild abandon, and know that I haven’t changed any of our documented assumptions about how the code should behave.

Don’t Repeat Your Mistakes

This is what I’m really here to talk about today. I want developers to start thinking about how the DRY principle relates to their daily activities, and in particular how it relates to our mistakes. We’re human, we forget things, we slip up, we communicate poorly with our team members, and things fall through the cracks as a result. When we find the resulting flaws, our instinct is to fix them and move on. The next time you find a problem, I’d like you to stop and think about whether you can do anything to prevent it from happening again in the future. Can you stop anyone else from making the same mistake?

One way I like to combat this problem is through unit testing, but not in the way you’re thinking. Developers who are “test infected” like to put tests around everything, but most stop at the immediate feature at hand. When a bug is discovered, we write a test that demonstrates it by failing (Red). We take our best stab at fixing the bug until the system works (Green). Finally, we clean up after ourselves, extract and refactor code to help out the next guy, and use the new test to check that we didn’t break the code in the process (Refactor).

Whenever possible, I like to write a “guard rail” test that lets me know whether anyone else has done the same thing. Here are just a few examples of this kind of test.

Missing Parameters

Here’s a unit test that uses reflection to let me know if any of my API controller endpoints have unused route parameters.

[Test]
public void Path_parameters_must_match_method_parameters()
{
    var apiAssembly = typeof(Controller).Assembly;
    var baseType = typeof(Controller).GetTypeInfo();
    var controllerTypes = apiAssembly.GetTypes()
        .Where(x => !x.IsGenericTypeDefinition
            && !x.IsAbstract && baseType.IsAssignableFrom(x));
    var violations = new List<string>();
    var regex = new Regex(@"(?<=\{)[^}]*(?=\})");

    foreach (var controllerType in controllerTypes)
    {
        var methods = controllerType.GetMethods(
            BindingFlags.Public | BindingFlags.Instance);
        foreach (var method in methods)
        {
            var attribute = method.GetCustomAttributes()
                .OfType<HttpMethodAttribute>().FirstOrDefault();
            if (attribute?.Template != null)
            {
                var routeParameters = regex.Matches(attribute.Template)
                    .Select(x => x.Value);
                var methodParameters = method.GetParameters()
                    .Select(x => x.Name);
                var unmatched = routeParameters.Except(methodParameters);
                violations.AddRange(unmatched.Select(x => 
                    $"{controllerType.Name}.{method.Name} - {x}"));
            }
        }
    }

    if (violations.Any())
    {
        Assert.Fail($"The following route parameters have no matching method parameters:\r\n  {string.Join("\r\n  ", violations)}");
    }
}

This particular test depends on the fact that we’re defining our routes directly in the HttpMethodAttribute (HttpGet, HttpPost, etc.), but it could easily be adjusted to account for explicit RouteAttribute usage as well.

The test extracts the route from the parameter, uses a RegEx to find all of the parameters in the format {param}, and then compares them to the method’s parameters. If there are any unused route parameters, the test fails.

Note: I probably could have collapsed the nested foreach loops into a single LINQ projection if I wanted to (I’m known for that), but in the case of this test, the readability was more important. I wanted other developers to be able to look at this and see exactly what was going on.

So what have we accomplished? Well, for one thing, no-one can accidentally leave out a route parameter without breaking the build. I’ve not only found and fixed my immediate a problem, but I’ve prevented myself or anyone else from repeating the mistake in the future.

Bad References

Here’s another example. Tools like ReSharper make it extremely easy to add missing references to a project, or “using” statements to a class. Tools make it so easy, in fact, that you can accidentally add a reference to something you shouldn’t. Here’s a test I made to make sure that no-one adds a reference from the API assembly to the Data assembly. There’s a service layer in between these two assemblies for a reason.

[Test]
public void The_Api_project_should_not_reference_the_data_project()
{
    var apiAssembly = typeof(Api.Startup).Assembly;
    var dataAssembly = typeof(Data.Startup).Assembly;
    var referencedAssemblies = apiAssembly.GetReferencedAssemblies();
    referencedAssemblies.Any(x => x.FullName == dataAssembly.FullName)
        .ShouldBeFalse("The Api assembly should not directly reference the data assembly.");
}

It’s simple, right, but it’s going to save me a huge headache caused by an errant “Alt-Enter-Enter”. I litter my test projects with these kinds of “structural” tests to raise a red flag if anyone repeats a mistake I’ve made in the past.

Right Concept, Wrong Place

The difference between Unit and Integration tests seems obvious to me. The former tests system components in isolation from one another, the latter tests the complete, fully-assembled, real-world system from end to end. It may do so against a fake database, or on a secondary network that simulates the real production system, but the important thing is that it’s the real code doing what the real code will really do. To make sure someone doesn’t get the testing patterns in the wrong place, here’s a one line test to make sure that the integration test assembly doesn’t touch the mocking framework.

[Test]
public void Integration_tests_should_not_reference_Moq()
{
    GetType().Assembly.GetReferencedAssemblies()
        .Any(x => x.Name == "Moq")
        .ShouldBeFalse("The Integraton test assembly should not reference Moq.");
}

A fresh, junior member of the team that’s new to the concepts and the differences between them will be reminded by this test if they are applying the right patterns to the wrong set of tests.

Right Attribute, Wrong Layer

In my most recent project, the DTOs (Data Transfer Objects) are returned from an API to a public consumer. They are, more or less, the ViewModels of this system, and as such, they need to be validated when they are posted back to an endpoint. We can do this easily through the use of the DataAnnotation attributes, but we need to make sure we’re using the right one. MaxLengthAttribute is used to control how Entity Framework will generate migrations, whereas StringLengthAttribute is used by MVC to validate models. They are very similar, but they are not the same. It’s very easy to slip up and use the wrong one, and then your validation won’t work.

These two tests work in tandem to make sure that you are using the StringLengthAttribute, and that you aren’t using the MaxLengthAttribute on any DTOs. Of course, this depends on your DTOs having a common base class.

[Test]
public void String_Dto_properties_should_not_use_MaxLengthAttribute()
{
    var baseType = typeof(Dto).GetTypeInfo();
    var types = baseType.Assembly.DefinedTypes
        .Where(x => !x.IsAbstract && baseType.IsAssignableFrom(x));
    var propertyTypes = types.SelectMany(x => x.GetProperties())
        .Where(x => x.PropertyType == typeof(string) 
            && x.GetCustomAttributes<MaxLengthAttribute>().Any())
        .Select(x => $"{x.DeclaringType.FullName}.{x.Name}");
    if (propertyTypes.Any())
    {
        Assert.Fail($"The MaxLengthAttribute is for controlling database generation. DTOs should use the StringLengthAttribute instead.\r\nThe following String DTO properties incorrectly use the MaxLength attribute:\r\n  {string.Join("\r\n  ", propertyTypes)}");
    }
}

[Test]
public void String_Dto_properties_should_specify_StringLength()
{
    var baseType = typeof(Dto).GetTypeInfo();
    var types = baseType.Assembly.DefinedTypes
        .Where(x => !x.IsAbstract && baseType.IsAssignableFrom(x));
    var propertyTypes = types.SelectMany(x => x.GetProperties())
        .Where(x => x.PropertyType == typeof(string) 
            && !x.GetCustomAttributes<StringLengthAttribute>().Any())
        .Select(x => $"{x.DeclaringType.FullName}.{x.Name}");
    if (propertyTypes.Any())
    {
        Assert.Fail($"The following String DTO properties have no StringLength attribute:\r\n  {string.Join("\r\n  ", propertyTypes)}");
    }
}

Check Yourself

These are just a few examples of kind of “structural” or “compliance” tests that I write on a typical project. If I can stop myself from making the same mistake twice, then that’s good, and will save me time down the road in the form of bugs I don’t have to investigate.