Tests – living documentation – or are they?

One of biggest advantages of tests is them being live documentation – one that screams and goes red if you go against it’s advice. I decided to follow Daniel Lindner’s idea, and put this to test.

So, this Saturday I had a very special code kata.

We had teams (pairs, and one triplet) working on three assignments for 3 hours. Then, we had pizza break and a revelation.

– Guys, now please delete all the implementation code, leave just tests. Next, choose a team that had a different assignment than you had, and give them your tests, while they give you theirs. Next, using those tests, recreate the code they deleted, and let them have at your tests recreating your code.

Oh, the faces they made!

Next part of the exercise was dedicated to code recreation, without speaking to the people who created it, using just tests. With that, I hoped we will have some nice clues about how much tests work like a living documentation.

Not here

There is a lot I’ve written about the event and a lot I could have written more. But, blog posts should be dedicated to their topic, so I’ll skip the information on preparation, how it went, the organization and about the idea itself, how to organize this… Sometime this week, I hope.
Finally, there are lessons I learned, as organizer. I could have done better. This, I’ll share perhaps even today.

For now, I’ll focus on test as live documentation. Starting with teams feedbacks and adding my own try to reconstruct code from test. I started Sunday evening (10PM), and I finished around 2AM with one code fully recreated (I believe). I took notes while I did it, so all my notes are fresh.

Event retrospective

We discussed a lot (over an hour) and had a lot of ideas. I’ll skip all those NOT dealing with reconstruction (bulk, unfortunately).

  • make short coding session (an hour) and make the switch then, then make a dojo with feedback from reconstruction
  • surprise is essential 😀
  • verbose names may be problematic, some very long test names had their opponents
  • naming was only mentioned to be a problem by two people
  • environmental problems can last quite some time, imposing the language from the get-go is a mighty fine idea
  • recording the feedback is nice, but won’t it be too long to watch later?
  • false TDD is easy to spot – you make all tests pass with nulls and/or other minimal implementation
  • more tests = better

Unfortunately there was little time spent on reconstruction – this caused us not to have as many insights into tests being live docs as I hoped we would. Well, next time.

My take at reconstructing

GitHub link – TBA once I categorize the code and push it at last. I’ll create a separate branch for just tests, in case one wants to recreate this on his own. If you wish to first try your own hand at this, later read, stop here.

Despite trying the assignments, I had yet to try rewrite code from tests. I first took a bite at code that was discussed at most length during retrospective, the one with long test names.

First look

First I just looked at it. Remembering the discussion points I was quick to learn that indeed

  • testing hands comparison was divided into Equal Cards and Not Equals Cards
  • tests were grouped into given when then, some even had comments – that was cool, I immediately felt somewhat at ease
  • tests had VERY long names, VERY descriptive… if you, as I did, knew the convention beforehand. If you just wanted to look in IDE test window… the normal size gave you little info
too long names can be detrimental
too long!

First steps should be easy

Nevertheless, I thought first steps would be trivial. Bulk reconstruction, I’ll just use IDE to recreate missing objects / methods.

Not entirely though.

    1. STS did NOT pull JUnit to build path upon importing new Maven project. I had to add it to build path myself, refreshes, cleans, etc. did not help. Adding to build path was easy, sure, but I found myself raising my eyebrow at tool’s inability to do this immediately. Well, I was on a very weak net connection, so let’s leave it at that.


    1. Guys used Maven. And they’ve deleted the “main” folder. That meant each and every time I wanted to create an object I had to change it’s location from suggested src/test/java to src/main/java one. Mildly annoying, but Eclipse never tries to differentiate between main and test dir (I experience this each time I see a class without tests and try to create a new one – it then suggests src/main/java dir, and more than few times I had to move the test class manually later).
      It’s just that with bulk recreation that was annoying, and if there would be more classes, I’d either suggest to first create them in test dir then move them all, or to write yourself a macro / shell script to create them automatically.


    1. Enums. This was interesting, I nearly created classes where Enumerations were used. If not for the fact I KNEW Enums were used, and not for the Color.CLUB notation, I might have first used classes. :-)Also, while on Enums, I should add, that it’s MUCH easier to create enums manually than to do so with IDE’s help. Of course, in case contents are clear, as is with cards colors or values! I knew what to put there, and this was much faster and more orderly, then recreating it “as errors went”. I first did the latter, and soon I had CardValues enumeration like: CARD_2, CARD_A, CARD_9…
      Then I decided I want my enum in order and promptly finished it so. One more thing about this decision that turned out very important. Comparison relied on ordinality of the enums. Had I left it to IDE to generate them for me in the order I encountered them, I wouldn’t have them completed (guys used only 3 cards and one color) and the ordering would effectively destroy comparison in some cases. 🙂


  1. Time! With all that, first steps took me less than writing this part of this post.

Let’s have at it! CardValueCompare

First steps for me ended in CardValueCompareTest. I thus decided this will be first test class to be made green. Few moments of reading gave me first few glimpses in the authors intent.

Poker hands?

They compared card values, not hands. I found that surprising, I would calculate whole hand and compare it that way.

Comparison result = a number?

Comparison result was an integer. That made me raise my eyebrow and pause. I actually noticed that when I was recreating missing methods, since this one had return type int, but somehow then I only registered it, now, upon more “slower” reading, I took notice of it. Especially since assert messages (yes, they had assertion messages!), code and even a constant (private static final int EQUAL_OBJECTS_RESULT = 0;) spelled it out very clearly.

Typo can do damage!

Consider test name:

public void testCompareOtherCardValue_someCards_shouldReturnZero(){

Now actually, what they wanted was:

public void testCompareOtherCardValue_sAmeCards_shouldReturnZero(){

Capitalization mine. Why I say so: they had the constant for equality, right? And it was set at zero. Thankfully, constant was really a giveaway that this typo was just that – a typo. Furthermore, test itself compared one card to itself and had the expected and actual results defined as actualEqualObjectsResult and expectedEqualObjectResult.

Still, with all that, I had my doubts for two minutes, before reading the test code, which was actually intended, comparing SOME cards, or same one. Then I fixed the typo – first change in test code.

Usually helpful – now detrimental IDE

Default implementation of int returning method in STS is return 0; and this alone is enough to get some tests passing. While not exactly an insight into what authors had in mind, I played with some results and noticed that IDE option to “show failures only” was definitely not helping in reconstructing.

It simply hid information! Because I knew so little of this code. Usually, when I’m working with code I know, code I wrote, or code I’ve already read, the info on passing tests is irrelevant, I want to focus on those NOT passing. This time around, it’s simply NOT the case. So, I turned the option off and was rewarded with… little information, since first thing that guys use in test names is method name. Since all tests were for one method… that was all that appeared in the JUnit test result window in it’s default size and with default STS layout. I promptly moved the window down. This was better, but somehow I found it irking. I like the JUnit window at the side. Also, the window rearranged it’s layout, now failure trace was NEXT to test names, and I liked it below. :/

I moved the window back, maximized the code and decided to try it that way. Still, the tests names were at this point too long for me and I wished authors skipped the method name. I ended up using either maximized view or JUnit at the bottom, neither was enough to show test names though.

Suit? Value? Both?

First failing tests I took on, was one that wanted greater than zero returned when first card was greater. Since method that compared the card took whole card and since int was returned, I went with enum compareTo method only on value. Why only on value? Because I knew the rules, suits weren’t needed. Still, I noted for future, that by passing whole card for comparison, suggestion was to use whole card rather than just it’s value. How then should this be attempted? Tests offered no clue. If results of value comparison were to be added to that of color comparison, or multiplied – no information was offered.

That was interesting for me, I made a point to remember to look for something in tests that would tell me otherwise, since I wondered, what other folks would do, if they wouldn’t know rules like I did and therefore had no way of knowing that suits were unordered and NOT taken into account when comparing cards. Poker I played usually had suits ordered (perhaps wrong)!

Green and onto next one

Minimum implementation of compare method to get all tests green:
return value.compareTo(card_Second.value);
this plus parametrized constructor of card was all I needed, at cost of one warning on private field suit, which was never really used in card except when constructor was setting it’s value.

In the end, this was a test to see if Enum compareTo works as expected. While they were very useful to let me know what compareTo implementation was, I frankly wonder, if tests like this are all that common. Still, recalling some conversations with guys I had when they were writing the code, I can see why such tests.

Next one: EqualCardsCompareTest

This one was lightning fast. Upon seeing two private fields, “first” and “second” both of Card type, the @Before setUp, which set both of them at 2 Club, I took a look at tests: two on equality of two cards, two on hashCodes being the same.
Names of tests were great help – though I still would cut out first parts.

Minimal implementation left to IDE: generated hashCode and equals methods did the trick.

Next one: last compare. NotEqualCardsCompareTest

Test class had lowerCard (2) and greaterCard (A – for Ace), both of Clubs. This unfortunately gave little information if authors intended suits be unordered or just didn’t care about them (or perhaps have forgotten they may matter).

By the time I got to this test, I’ve had all the implementation for it to pass with flying colours… or, in this case, flying greens. 😉

Last step: GameEngineTest

First run showed some errors (missing implementation) and failed assertions (likewise).

I started with assertions, since they expected ints. Seemed easier. After adding a private int deckSize to gameEngine and making getDeckSize return 52, I had my first test. Then, adding a private setter to deckSize and tweaking drawCards method so that it called said setter, I had two tests green.

Including “testInit_void_shouldCreateCardset”. This bothered me slightly, as I had no cardset for now. Heck, I had no deck! I had just some phony int. :-/

Fortunately, testInit_void_shouldCreateUniqueCardset was stark red and that made it my next target. Obviously I was being peevish and stubborn at the same time, but I wanted to see if I can get this to work without creating a constructor or having the init method I knew guys had. 🙂

So I inlined field initialization, and made it from test error to test failure – I had the deck, but it didn’t had the size it should.
Two foreachs later it did. Foreachs landed in getDeck. I tried cheating and pushing 52 nulls inside, but:
– test used HashSet implementation so I followed (nulls or same cards – out),
– deck was a collection of Cards with size 52 – easy to see what needs to be in
– test name was about Unique Card set.
Nuff said. Was time to concede with peevishness. I complied, refactored my code and put init stuff in a constructor.

Just in time, next test used getDeck to see if after drawing it still had the cards that were drawn. This one nicely told me that:
– I need to draw cards
– and remove them from deck afterwards

It also confused me with using a different deck implementation. All of a sudden deck became an ArrayList, not a HashSet. :-/

Still, I complied and it said it’s NOT enough. Why? No clue for a while, which had showed me the need for a debug method. I promptly added a toString on Card, to avoid having to look at message that said expected Card@45432, Card@543455… but got Card@… Card@… etc. and told me nothing but “test failed”.

With toString it was much easier to debug. I found out that… ordering in the hand matters!

Hand of all 2s and Spade 3 was different than hand of all 2s and Spade 3:


ordering matters? say what?
ordering matters? say what?

The problem was in using List. Hand drawn was NOT a list, it was a set of unique cards. I made the test pass when I realized that ordering mattered.

In other words – I have fallen into enum trap, which I thought I’ve avoided. Turned out I didn’t. What led me to it was the ordering. I used foreachs to fill out the deck, and this caused the spade to be second (it was second in enum!). When I implemented hand drawing, I made it draw one hand: all 2s and one 3, in a different order. So, when the test was comparing, the ordering factored into two collections being NOT equal.

I changed the ordering in the Suit enum and last test started passing.

Closing thoughts

Three test classes here were about comparing cards. One covered general rules, one covered equality, one non-equality. During retrospective one of the guys said, that he would compile all three into one class. I kinda concur. I had on occasion split up test classes into cases (sometimes by logic, sometimes by something else) but usually then, I leave a Javadoc explaining my reasoning, on test class, with proper @see tags, pointing to other split test classes. I missed seeing this here – I could extrapolate the intent, but I was uncertain if I was right and if this is really worth it.

What further complicated this problem was that all three classes really made sure Enum compareTo method worked as designed. This was actually testing Java itself, and in my book that’s going to deep. You should test business logic, not language you use. I’d rather see test cases like: “a full house wins against four of a kind or “two pairs win against pair” and other. That would tell me more about the code then a dozen tests verifying that if I compare two cards, one latter one is null, then exception will be thrown, or if cards hash code will be same while cards are equal, or different when they’re not.

Also, multiple tests passed with one line of code. Once I had enums and compareTo method (one liner!) I essentially had three test classes passing! That’s serious test duplication. Or perhaps my BDD sympathies are showing.

What you pass to method matters. Method comparing a card taking a card suggests both suit and value matter. Method taking only card value is clearer in it’s message.

What your method returns matters. public int getDeckSize() suggests that class has deck size as integer. This is not the same as card set.

Game engine was both game engine and a deck. It also dealt cards. Perhaps too many things at once? Especially strange as tests for cards comparison were so split, and here tests for deck and drawing and game engine… were intertwined.

Expected integer, method getDeckSize in GameEngine that returns int, this really made me think about private int deckSize. But had I been honestly coding like usual, I’d have applied dependency injection (and other nice notions) to infuse GameEngine with Deck, and would refactor test code. This time however, I tried to treat tests as something that should NOT change.

Summary on reconstructing code as tests

  1. Illuminating. Really, I learned a lot. This was totally worth it.
  2. Task knowledge helped in realizing differences between actual task and the implementation that could be derived from tests. As well as realize the parts that were missing. Major differences:
    1. Suits should NOT matter, when comparing cards, but nothing in tests showed this.
    2. Cards should NOT have been compared, Hands should be compared instead.
    3. Card ordering in deck and in hand drawn should NOT matter, but it did.
    4. Comparing on Enum had a hidden implication that this enumeration will be in particular order. Yet recreating via IDE would’ve effectively killed it.
  3. When basing just on tests and recreating code, IDE functions can be detrimental (show only failures) even if in day-to-day work are useful.
  4. Verbose tests names are a mild annoyance. I’d prefer if guys skipped first part of each tests name (the one with method name).
  5. Assertion messages in tests help a lot!
  6. Typo in test name can sow confusion, but clear test code will prevail.
  7. Business tests mean more than implementation tests. I have high chances of knowing (or googling) how enum comparator works, I could better use tests showcasing rules typical for your application only.
  8. Clarity of the code helps TREMENDOUSLY. Guys not only used assertion messages and long tests names, they also had constants, meaningfully named variables, etc. Their test code was split into given when then sections and generally was a pleasure to work with. <bows> I know this kinda ties with point about typo, but readability of this code deserves praise. Really.
  9. If during the kata I would spent more time with those guys, I would be able to point them better and the end result would be more TDD like. The advice would go along these lines: write all test scenarios (behaviours?) first, then go with making them happen.
  10. What you pass to method matters, what your method returns also matters.
  11. IDE can led you astray (Enums ordering) if not enough test data is present. If I wouldn’t know card suits, and had not deck in GameEngine with size tested for being equal to 52 I could have left card colors at just clubs (only one used in tests).
  12. Test setup tells a lot about intent. Equality comparison (and non-equality comparison) tests had very nicely constructed setups.
  13. I definitely did NOT play nice when recreating. I went for minimal implementation and did not care for code, because I wanted to lean heavily on tests, and tests alone, avoiding any changes there.


Thanks to AGS sending me this link.

Thanks to Daniel for blogging about it, and replying to my email and questions, and even kindly sending me the assignments!

Also thanks to people who helped me organize this, both at my company and outside.

Finally, thanks to all Software Craftsmans from Kraków who participated! 🙂 Links to GitHub will appear shortly, also on SCKRK page!


One thought on “Tests – living documentation – or are they?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s