
Notes On Laszlo Sragner's PyData Talk On Code Smells

Introduction

These are my notes from the video “Laszlo Sragner - Code Smells in Data Science: What can we do about them? | PyData London 2023”. I don’t necessarily agree with all the points made, but I echo them as they were delivered.

Notes

Motivation

Why do we care about code smells?

  • Programming is communication
  • Communication needs a language
  • We read more than we write
  • Issues need standardized solutions
  • Drive for productivity

Code smells are not bugs, and don’t need immediate attention. They’re also not technical debt or code rot. A code smell might cause problems, but not necessarily. A code smell can get in the way of changing the code. There are various named code smells. Code smells have (or should have) recipes to resolve them.

Readability

  • Dead and unreachable code
    • Delete code
  • Comments explaining code
    • Delete comments
    • Rename variables
    • Extract helpers
  • Excess variables
    • Inline
    • Comprehensions
  • Improper variable scope
    • Move lines with the same variables together
  • Too many levels: if branches
    • Extract guard clauses
  • Too many levels: for loops
    • Use comprehensions
  • Multiple returns
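
As a minimal sketch of two of the recipes above (inlining excess variables and replacing a nested for loop with a comprehension), consider the following. The function names and data are hypothetical and not taken from the talk.

```python
# Hypothetical example: names and data are illustrative, not from the talk.

def clean_names_before(names):
    # Smells: two levels of nesting (for + if) and excess temporary variables.
    result = []
    for name in names:
        stripped = name.strip()
        if len(stripped) >= 3:
            upper = stripped.upper()
            result.append(upper)
    return result


def clean_names_after(names):
    # Recipe: a generator expression and a comprehension replace the
    # nested loop, and the one-use temporary `upper` is inlined.
    stripped = (name.strip() for name in names)
    return [name.upper() for name in stripped if len(name) >= 3]


names = [" ada ", "bob", "  grace", "al "]
print(clean_names_after(names))
```

Both versions return the same list for the same input; the second keeps the filter and the transformation on one level.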

Outcomes

  • The code is one continuous logical flow
  • Guard clauses on the top
  • “Happy path” on the left
  • Variable lifecycle is contained
  • Return in the last line
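
A small sketch of what these outcomes can look like in practice follows. The functions are hypothetical, and raising an exception in the guard clauses (rather than returning None) is an assumption made here to reconcile "guard clauses on the top" with "return in the last line"; the talk may have intended something different.

```python
# Hypothetical example: not code from the talk.

def mean_before(values):
    # Smells: nested if branches and returns scattered through the body.
    if values is not None:
        if len(values) > 0:
            total = sum(values)
            return total / len(values)
        else:
            return None
    else:
        return None


def mean_after(values):
    # Guard clauses at the top reject invalid input up front.
    # Assumption: invalid input raises instead of returning None.
    if values is None:
        raise ValueError("values must not be None")
    if len(values) == 0:
        raise ValueError("values must not be empty")

    # The happy path stays at the left margin, with one return on the last line.
    return sum(values) / len(values)


print(mean_after([1, 2, 3]))
```

Note that the refactored version changes behaviour for invalid input, so callers that relied on None would need updating.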

Bloaters

  • Bloaters -> Extract class
    • Long parameter lists
    • Data clumps
    • Primitive obsession
  • Couplers -> Extract method
    • Feature envy
  • Boolean parameters -> Dependency Injection
    • Move code to a class
    • Instantiate and call it in main()
  • Couplers
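
To make a few of these recipes concrete, here is a hedged sketch: a data clump becomes a class, the behaviour that used the clump moves onto that class, and a boolean parameter is replaced by an injected object that is instantiated in main(). All class and function names are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Sequence


def scale_before(values, low, high, clip, verbose):
    # Smells: long parameter list, a data clump (low, high), boolean flags.
    scaled = [(v - low) / (high - low) for v in values]
    if clip:
        scaled = [min(1.0, max(0.0, s)) for s in scaled]
    if verbose:
        print(f"scaled {len(scaled)} values")
    return scaled


@dataclass(frozen=True)
class Bounds:
    # Extract class: low and high always travel together.
    low: float
    high: float

    def scale(self, value: float) -> float:
        # Extract method: the logic that used both fields lives next to them.
        return (value - self.low) / (self.high - self.low)


class ClipToUnit:
    def apply(self, value: float) -> float:
        return min(1.0, max(0.0, value))


class NoClip:
    def apply(self, value: float) -> float:
        return value


class Scaler:
    # The post-processing step is injected instead of toggled by clip=True/False.
    def __init__(self, bounds: Bounds, post) -> None:
        self.bounds = bounds
        self.post = post

    def scale_all(self, values: Sequence[float]) -> List[float]:
        return [self.post.apply(self.bounds.scale(v)) for v in values]


def main() -> List[float]:
    # Dependencies are instantiated and wired together in one place.
    scaler = Scaler(Bounds(low=0.0, high=10.0), ClipToUnit())
    return scaler.scale_all([-5.0, 5.0, 20.0])


if __name__ == "__main__":
    print(main())
```

Swapping ClipToUnit() for NoClip() in main() changes the behaviour without touching Scaler, which is the point of injecting the dependency rather than passing a flag.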

Establishing Culture

  • Code review
    • Programming is communication
  • Total cost of ownership
    • Managing long and short term goals
  • Developer happiness
    • Autonomy
    • Mastery
    • Relatedness
  • Drive cultural change
    • Increase velocity

Takeaways

  • Programming is communication
  • Concentrate on the dataflow
  • Identify problem areas
  • Prepare the code
  • Identify code smells
  • Use refactoring recipes
  • Evaluate the TCO of code
  • Establish a culture

Commentary

I agree that programming is sometimes a form of communication. It can be communication with others or with yourself in the future. There is no objective way to do this. One person may look at a piece of code and find it very clear, while another may look at the same code and find it very unclear. How code is written for communication matters, but it matters in a statistical way. And yet high-quality empirical research to characterize this is rare because producing such studies requires an extraordinary amount of work.

Communication doesn’t necessarily require a language, at least in the linguist’s sense of the term. You can communicate with pets or wildlife that do not have a language. For computer programmers, a useful language is obviously the source code, but not just the source code. Patrick Viafore’s Robust Python: Write clean and maintainable code discusses multiple modes of communication: (executable) code, comments, version control history, tests, wikis, in-project documentation, video recordings, design documentation, agile boards, email, instant messaging, direct communication, talks, code review, and meetings. Each of these modes varies in its cost and the degree of proximity required. Some modes are synchronous, and others asynchronous.

I’ve heard this “we read more than we write” claim before. I don’t know if it is true in general. On some projects I have read way more than I coded, and other times I coded pretty much the whole time. I suspect this just depends too much on context.

Having standardized solutions to well-understood issues makes sense. Issues that are not well-understood are not ready for a standard solution.

Productivity is definitely an important aspect of programming that the end-users/clients often appreciate. As long as it is genuine productivity and not just speed running towards something mediocre or worse.

Code smells are pretty much just people’s intuitions about what is good or bad. People can come up with rationalizations for them, but these should often be taken with a grain of salt, since what is good for human productivity isn’t always intuitive, and sometimes what counts as better or worse is subjective anyway.

This post is licensed under CC BY 4.0 by the author.
