Cruft (noun) [INFORMAL | COMPUTING] : badly designed, unnecessarily complicated, or unwanted code or software.
First and foremost, keep things simple…
As much as possible, keep everything smaller. Smaller projects are better. Smaller packages are better. Smaller interfaces are better. Smaller classes are better. Smaller functions / methods are better. Smaller tests are better.
Make sure any developer on the team (or a new joiner) can encompass the entirety of your “unit” in their mind. This reduces accidental complexity and makes everything else easier.
When working with existing code which is intimidating, try to think of ways to break it down into smaller pieces and to simplify. Think of ways to reduce it. Deleting code (when done with care) can be as valuable as writing new code.
In particular, strive to keep functions concise. Bigger functions create bigger problems and are more difficult to test.
Begin by simplifying the logic
“Branching” is the term used to describe a point in the code where the business logic can go in one of several different ways. Branching adds complexity, as each possible path needs to be understood and tested in isolation. The way to reduce branching is to use fewer conditional statements (like if). Code that uses a lot of if statements (or other conditional logic) has many branches, which makes it difficult to follow, to debug and to test.
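As a minimal sketch of this idea, a chain of if / elif statements can sometimes be replaced by a lookup table. The function names, regions and rates below are hypothetical, made up purely for illustration.

```python
# Hypothetical example: region names and rates are invented for illustration.

def shipping_cost_branchy(region: str) -> float:
    # Four paths through the function, each needing its own test.
    if region == "domestic":
        return 5.0
    elif region == "europe":
        return 12.5
    elif region == "asia":
        return 20.0
    else:
        return 30.0


SHIPPING_RATES = {"domestic": 5.0, "europe": 12.5, "asia": 20.0}
DEFAULT_RATE = 30.0


def shipping_cost(region: str) -> float:
    # A single lookup with a default: no explicit conditionals here.
    return SHIPPING_RATES.get(region, DEFAULT_RATE)
```

Both functions return the same values, but the second keeps the data in one place and leaves a single path to test.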
“Side-effects” is a term we use to describe any way in which the code affects its environment or the flow of execution, other than a function returning a value. Side effects, like branching, also increase the logical complexity of the code, making it less readable, difficult to follow and difficult to test.
As much as possible, minimise and isolate side effects. This includes not only I/O, network calls and OS interrupts, but also raising exceptions and modifying the internal state of data structures defined outside the function.
Of course, both branching and side-effects are unavoidable – they are essential building blocks of any useful software system. The important thing is to remain conscious of the added complexity and do our best to stay on top of it.
Try to keep side effects higher up in the call stack. This helps make code more reusable and simplifies testing by reducing the need to use mocks.
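One way to picture this, as a rough sketch: the calculation lives in a pure function, while file access and printing stay in the outermost caller. The order structure and both function names are assumptions made for the example.

```python
import json
from pathlib import Path


def total_price(order: dict) -> float:
    # Pure: depends only on its argument and touches nothing outside itself.
    return sum(item["price"] * item["quantity"] for item in order["items"])


def main(path: str) -> None:
    # Side effects (file I/O, printing) live here, at the top of the call stack.
    order = json.loads(Path(path).read_text())
    print(f"Total: {total_price(order):.2f}")
```

With this split, total_price can be tested with a plain dictionary, with no files and no mocks involved.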
Keep your code clean
Keep function signatures clean, readable and consistent. This helps form team habits and expectations which increase productivity.
Make good use of type annotations / type hints in Python. They make the codebase more readable. The type checker is an essential tool in the build process that can catch both trivial errors and unexpected business logic behaviours. (Make sure to correctly use int, etc.) There is no need to go overboard with annotations. Try to keep your annotations clean and simple. Messy annotations are often symptoms of design that can be improved (e.g. try to choose better data structures or refactor into even simpler functions).
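A small sketch of what clean, simple annotations can look like. The parse_port function is hypothetical, and the use of Optional is an assumption chosen for the example rather than something prescribed above.

```python
from typing import Optional


def parse_port(raw: Optional[str], default: int = 8080) -> int:
    # Optional[str] tells both the reader and the type checker that None is allowed.
    if raw is None:
        return default
    return int(raw)


# A static type checker would be expected to flag a call such as
# parse_port(5432), because an int is passed where Optional[str] is declared.
```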
Limit the use of class inheritance. Keep inheritance hierarchies shallow. Overuse of inheritance makes code difficult to follow and often results in complex objects with unnecessary properties and methods. See also: “Liskov substitution principle”.
Keep your tests clean
Keep tests clean, readable and concise. Avoid patching and mocking as much as possible. Messy tests are a symptom of messy code.
Remember, unit tests are meant to exercise units. Units should be small. Unit tests should be small.
A classical unit test provides some controlled input to a function (or method) and makes assertions about the return value.
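For illustration, a classical test in this sense might look like the sketch below, written with the standard unittest module. The normalise_name function is a hypothetical unit under test.

```python
import unittest


def normalise_name(name: str) -> str:
    # Hypothetical function under test: collapses whitespace and title-cases.
    return " ".join(name.split()).title()


class NormaliseNameTest(unittest.TestCase):
    def test_collapses_whitespace_and_capitalises(self):
        # Controlled input in, assertion on the return value out.
        self.assertEqual(normalise_name("  ada   lovelace "), "Ada Lovelace")

    def test_empty_string_stays_empty(self):
        self.assertEqual(normalise_name(""), "")
```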
Avoid writing tests which only assert on log entries or other side effects. If it’s impossible to test the function in any other way, it may be better to re-design the function to be testable in a more classical way.
Avoid writing tests that are overly concerned with HOW a method does what it does. E.g. tests which only make assertions about what other functions a method calls are missing the point of unit testing. Again, if there is no obviously better way to test that method, it may be better to spend time refactoring it instead of coming up with over-complicated but meaningless assertions.
When testing exceptions in Python, make use of assertRaises() instead of wrapping the function in a try / except block and failing the test explicitly in the else clause.
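The two styles can be compared side by side in the following sketch. The parse_age function is a hypothetical unit under test.

```python
import unittest


def parse_age(raw: str) -> int:
    # Hypothetical function under test.
    age = int(raw)
    if age < 0:
        raise ValueError("age must be non-negative")
    return age


class ParseAgeTest(unittest.TestCase):
    def test_negative_age_verbose(self):
        # The pattern to avoid: manual try / except / else bookkeeping.
        try:
            parse_age("-1")
        except ValueError:
            pass
        else:
            self.fail("ValueError not raised")

    def test_negative_age(self):
        # Preferred: assertRaises used as a context manager.
        with self.assertRaises(ValueError):
            parse_age("-1")
```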
Whenever a more elaborate setup is required, one which involves a lot of test data (e.g. configuration JSON), put the test data in an external fixture. Large blobs of data in the main test code make it difficult to read and maintain.
Keep improving your mastery of the language
Make the best possible use of the Python language (or whichever language you are using). Avoid reinventing the wheel or using libraries where a simpler language construct will do. In Python, make sure to master the use of list and dictionary comprehensions; the ternary operator ("a if condition else b"); the use of "or" for fall-back values / defaults; generators, etc.
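The constructs listed above can be sketched in a few lines. The stock data and the display_name helper are hypothetical, used only to give the constructs something to work on.

```python
stock = {"apple": 3, "pear": 0, "plum": 7}

# List comprehension: names of the items that are in stock.
in_stock = [name for name, count in stock.items() if count > 0]

# Dictionary comprehension: a new mapping with every count doubled.
doubled = {name: count * 2 for name, count in stock.items()}

# Ternary operator.
label = "sold out" if stock["pear"] == 0 else "available"


def display_name(nickname):
    # "or" as a fall-back; note that it replaces ANY falsy value (None, "", 0).
    return nickname or "anonymous"


# Generator expression: values are produced lazily, one at a time.
total = sum(count for count in stock.values())
```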
Continuously improve your knowledge of the (rather vast) Python standard library and the functions and constructs available, which do not require any external dependencies.
Learn a bit about functional programming and try to apply the main principles to your Python code as much as possible. Use the higher-order features of the language and the standard library for cleaner, more concise and more powerful functional programming solutions to common problems.
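As a brief sketch of what the standard library offers here, the example below uses functools, itertools and operator. The scale function and the sample data are hypothetical.

```python
from functools import partial, reduce
from itertools import chain
from operator import mul


def scale(factor: int, value: int) -> int:
    return factor * value


# partial() builds a new function with some arguments fixed in advance.
double = partial(scale, 2)

batches = [[1, 2], [3], [4, 5]]

# chain.from_iterable() flattens lazily; map() applies a function to each item.
doubled = list(map(double, chain.from_iterable(batches)))

# reduce() folds a sequence into a single value.
product = reduce(mul, [1, 2, 3, 4], 1)
```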
Reduce reliance on external dependencies
Scrutinise all dependence on external libraries and frameworks to keep them to the necessary minimum. Dependency cruft is a real thing – and it can weigh a lot.
As much as possible, use the latest releases of all libraries and tools (including Python itself). The problems you’ll run into with “bleeding edge” tools are less common than the problems you’ll have with out-of-date tools.
In your Python virtual environment, make use of pip-tools to stay on top of your requirements.txt files and dependency versions.
Do your best to name things better
Names are important (except when they’re not – see next paragraph). It’s notoriously hard, but try your best to come up with concise yet meaningful names for files, modules, classes, functions, variables, etc. It is generally safe to assume that whoever is reading your code has enough domain knowledge to be familiar with the terminology in the context, but whenever possible, avoid too much jargon and stick to simpler terms / names.
The scope of named things is also important. For very short-lived, temporary variables (for example, those used in one-liners) it’s OK to use much shorter symbolic names (e.g., n) for the sake of brevity. For longer-lived, more important names, if the value (or use) of the variable changes fundamentally from what it was when you first named it, make sure to change the name as well. You don’t have to be extra frugal when dispensing new variables in those circumstances. It’s far more important that original_values changes to processed_values once it’s been modified, than to save a couple of bytes of memory by stubbornly sticking to the old variable.
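A small sketch of the renaming advice. The raw_readings data is hypothetical; only original_values and processed_values come from the paragraph above.

```python
raw_readings = ["3", " 4", "x", "5 "]

# Harder to follow: the same name holds strings first and integers later.
values = raw_readings
values = [int(v) for v in values if v.strip().isdigit()]

# Clearer: a new name once the data has fundamentally changed.
original_values = raw_readings
processed_values = [int(v) for v in original_values if v.strip().isdigit()]
```

Both versions compute the same list, but in the second one each name means one thing for its whole lifetime.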
Avoid shadowing, redefining or overloading variables (or other names). This makes the logic difficult for humans to follow and often leads to mistakes. It also makes the code difficult for static analysis tools to follow, e.g. type checkers (mypy / pytype spin extra hard to try and infer all possible types each time a variable is redefined, which can lead to false positives as well as inefficient type inference). Be particularly careful to avoid shadowing built-in names or library names (e.g.