Tuesday, 5 September 2017

Program as a total function - the real nature of a bug

Things seem obvious to us. We live on auto-pilot, passing the same things every day, so we have stopped wondering what they really are. We got accustomed to them and gave up any reflection or analysis.

"2+2=4" seems obvious. Can there be any surprise here? Maybe in a language which allows overriding the "+" operator someone could set a trap, but otherwise nothing surprising should happen. Yet there actually is a non-trivial proof that 2+2 equals 4. You can find it here -> A real proof that 2+2=4

Another situation and another statement. "There is a bug in my program" - this is such a common situation that you have most likely become used to its context, and you will not redefine the general concepts of "program" and "bug" each time - it's as obvious as 2+2=4.

But if a program is working without any exception, is there a bug in it? If there is a bug but no one is able to spot it - can we say that there is a bug? Is a bug the same thing in theory and in practice? To answer these questions we need to investigate the concepts of "program" and "bug" a little more deeply...

Roots

The picture posted above is copied from the Wikipedia page called Human Computer. Yes... "computers" existed even before computers were invented.

The term "computer", in use from the early 17th century (the first known written reference dates from 1613), meant "one who computes": a person performing mathematical calculations, before electronic computers became commercially available.

Ha! So not only do we need to define "program" - it is also not clear what a computer is. Now the sentence "I just bought a computer" may as well describe the purchase of a metal box or, in the worst case, an act of slavery.

The part where they are described as "a person performing mathematical calculations" is very interesting. So a "computer" was performing mathematical calculations - even ones as simple as the following:

fun isRectangularTriangle(a:Integer,b:Integer,c:Integer):Boolean = 
                         pow(a,2)+pow(b,2) == pow(c,2)

So this "mathematical calculation" checks whether

a^2 + b^2 = c^2

or, in other words, whether a triangle is rectangular. But can we be sure it works that way? What if pow actually just triggers a "beep" from the computer speaker?

What would be extremely useful here is a "law" or "theorem" guaranteeing that for every possible argument pow(x,2) = x*x. Of course now we need to define how the "*" sign works, but let's assume it is standard multiplication - this is just an introduction to more interesting topics.
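Short of a real proof, we can at least gain confidence in such a law with a brute-force check over sampled arguments - a poor man's property test (the law is assumed here, not proven):

```java
import java.util.Random;

public class PowLaw {
    public static void main(String[] args) {
        Random random = new Random();
        // check the "law" pow(x, 2) == x * x for 1000 random arguments
        for (int i = 0; i < 1000; i++) {
            int x = random.nextInt(20001) - 10000;   // x in [-10000, 10000]
            long squared = (long) Math.pow(x, 2);
            assert squared == (long) x * x : "law broken for x = " + x;
        }
        System.out.println("pow(x,2) == x*x held for all sampled arguments");
    }
}
```

This only samples the domain, of course - which is exactly the difference between testing and a theorem.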

Can something go wrong?

If we look closely at our rectangular triangle function, we may think that obvious problems appear at the ends of the int bounds. What is the int range? "I know, I know, in the Java docs it is 2^31 - 1". Yes. This is an example of seeing the world through the lens of a single programming language. Unfortunately many programmers stay their whole lives with their first commercial language and mistake its limitations for general programming rules. If you switch from Java to Haskell you will see that there can be no limits for an Integer:
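Even without switching languages, the same freedom is available from Java itself: java.math.BigInteger is bounded only by available memory. A quick sketch:

```java
import java.math.BigInteger;

public class Unbounded {
    public static void main(String[] args) {
        // int overflows silently...
        int overflow = Integer.MAX_VALUE + 1;   // wraps around to a negative number
        System.out.println(overflow);           // -2147483648

        // ...while BigInteger just keeps growing
        BigInteger big = BigInteger.valueOf(2).pow(100);
        System.out.println(big);                // 1267650600228229401496703205376
    }
}
```

So the 2^31 - 1 "law" is a property of one type in one language, not of integers.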

Mathematical proof of reading from a file

While we theorize about the rectangular triangle problem, we need to take one step back so as not to miss a very interesting area. The question is: how are a, b and c actually provided? Maybe there is a csv file somewhere with only one line?

3,4,5

Then maybe we can read those values with two provided functions

fun read(f:File):String
fun parse(line:String):(a:Integer,b:Integer,c:Integer)

And then compose those three functions into one solid mechanism which calculates the answer from the provided data file:

read andThen parse andThen isRectangularTriangle
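In Java 8 terms the same pipeline could be sketched with Function.andThen (read is stubbed here to always return the file content - exactly the optimistic assumption this section is about):

```java
import java.util.function.Function;

public class Pipeline {
    public static void main(String[] args) {
        // stub: pretend reading the file always succeeds
        Function<String, String> read = file -> "3,4,5";

        Function<String, int[]> parse = line -> {
            String[] parts = line.split(",");
            return new int[]{Integer.parseInt(parts[0]),
                             Integer.parseInt(parts[1]),
                             Integer.parseInt(parts[2])};
        };

        Function<int[], Boolean> isRectangularTriangle =
                t -> t[0] * t[0] + t[1] * t[1] == t[2] * t[2];

        Function<String, Boolean> program =
                read.andThen(parse).andThen(isRectangularTriangle);

        System.out.println(program.apply("triangle.csv"));  // true
    }
}
```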

- Hooooooooooooooold on! And what if there is an error?
- what error?
- a file reading error
- what? .. how??
- like there is no file or something
- so maybe... maybe... we could assume that it is always there...?

The following problem - a very practical one - was described, with possible solutions, in the following abstract:

As stated in the screenshot, theoretical knowledge comes across serious difficulties when you want to describe something which can break. How can you mathematically describe a function reading a file or executing a query on a database?

The solution to this problem was finally found by Eugenio Moggi, and the abstract above is taken from his paper Notions of computation and monads

Bug as a function

As a reminder: an example of a Total Function is addition, because the result is defined for every argument. In contrast, division is not defined for zero, so it is a Partial Function. When we have a method/function/whatever which claims that it is able to divide any two numbers

divide(a:Double,b:Double) : Double
Then we have an error. For one specific value only, but still - an error. Other examples:

  • The function RegistrationForm => User is defined for every "proper" form, but not for a form which has the value "aaaaa" in the age field.
  • The function File => String is defined only for the set of files which actually exist under the expected location
  • The function OrderId => Order is defined only for existing orders
  • The function RequestToExternalAPI => Response is of course a partial function, defined only for cases when we actually receive any response

Dear reader - I don't know if you have ever come to a definition of error from this perspective. During my "Java period" I treated partial functions like total functions by cheating with exceptions. Sometimes we control exceptions by throwing them, but sometimes exceptions control us by throwing themselves by surprise (maybe breaking a 1000000 dollar purchase process). So besides the most obvious definition - that a bug is a situation when a program returns an incorrect result - we can add another one:

Error - a situation when a partial function is used as a total function.

So something like this :

 readGeometry:String=>Boolean = parseLine andThen isRectangularTriangle

can be defined as a bug, because the parseLine function is defined only for specially formatted lines, which are a subset of all possible text lines. This solution can even pass all tests if the test data is naive enough.

Partial Function visualization. Red represents an input parameter for which the function is not defined - like a database query which causes a failure
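A minimal Java illustration of such a bug (parseLine here is a hypothetical parser, defined only for lines of three comma-separated integers):

```java
public class PartialParse {
    // defined only for lines like "3,4,5" - a subset of all possible strings
    static int[] parseLine(String line) {
        String[] parts = line.split(",");
        return new int[]{Integer.parseInt(parts[0]),
                         Integer.parseInt(parts[1]),
                         Integer.parseInt(parts[2])};
    }

    public static void main(String[] args) {
        System.out.println(parseLine("3,4,5")[2]);  // 5 - the "happy path" a naive test covers
        parseLine("hello world");                   // throws NumberFormatException - the hidden partiality
    }
}
```

The type claims String => int[], but the function silently keeps the right to blow up.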

Compositions and Effects

The general problem with exceptions is that it is difficult or even impossible to compose functions when we are cheating with types. And we want to compose, because composition is the Holy Grail of programming. How to explain it? When we have a function String => (Int,Int) which promises to return a tuple (Int,Int), but in reality can return this tuple or who knows which exception, then we cannot just compose it with another function (Int,Int) => Boolean, because we cannot be sure that we will receive (Int,Int) at its input.

OK - it is very easy to find flaws, but is there any solution? Yes, there is! Instead of pretending that a partial function is total, we can actually convert the partial function into a total one using type manipulation. And this is (most likely) what Eugenio Moggi proposed in his work ("most likely" because I'm unable to understand all of it).

So let's start with the classical solution:

val divide:(Int,Int)=>Int = (a,b) => a/b

try{
    divide(4,0)
}catch{
    case e:ArithmeticException => println(e)
}

It is important to understand that try is artificial glue here, connecting reality to fake types. Let's reveal the true nature of the function.

val partialDivide : PartialFunction[(Int,Int),Int] = {
    case (a,b) if b!=0 => a/b
}

println(partialDivide.isDefinedAt((4,0)))  //false

Now the magic happens! We change the partial function into a total function by introducing a different category of types

val totalDivide: ((Int, Int)) => Option[Int] = partialDivide.lift

println(totalDivide((4,0)))  //None
println(totalDivide((4,2)))  //Some(2)

"totalDivide" is a total function: for every argument it has a defined result. Where division makes sense the result will be Some(number), but where it is not defined - for zero - it is None. A question may appear: "OK, but how does this differ from null?". First of all, we know the type of Option - it may be Option[Int], Option[String], Option[User] - where null is just... null. And when we have a defined set of type values, Some and None, we can transform those values with pattern matching and higher-order functions - the first mechanism doesn't exist in Java and the second is quite new there - maybe that's why this approach is not so popular in the mainstream.
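For readers in their own "Java period": a rough analogue of this lifting, built by hand with java.util.Optional (Java has no PartialFunction.lift, so we write the total version ourselves):

```java
import java.util.Optional;

public class TotalDivide {
    // total: defined for every pair of arguments
    static Optional<Integer> divide(int a, int b) {
        return b != 0 ? Optional.of(a / b) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(divide(4, 0));  // Optional.empty
        System.out.println(divide(4, 2));  // Optional[2]
    }
}
```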

We talked a little bit about composition, so how can this totality help here? There is a whole theory describing how to compose those things - some popular words which may be known to you, dear reader, are Functor or Monad - but in general you can call those "wrapper types" Effects. So Option will be an effect of "lack of something", like the result of division by zero, Future can be an effect of time in computation, etc...

An Effect should follow some well defined laws which can be checked by tools like quickcheck or scalacheck - there is even a ready to use test template in scalaz

If our effect follows those laws we can easily compose it with pure total functions, like in this example

 Option(x).map(f).map(g) is equal to Option(x).map(f andThen g)
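The same law can be checked directly for Java's Optional (f and g are arbitrary sample functions here):

```java
import java.util.Optional;
import java.util.function.Function;

public class FunctorLaw {
    public static void main(String[] args) {
        Function<Integer, Integer> f = x -> x + 1;
        Function<Integer, Integer> g = x -> x * 2;

        // map(f).map(g) must behave exactly like map(f andThen g)
        for (int x = -100; x <= 100; x++) {
            Optional<Integer> stepByStep = Optional.of(x).map(f).map(g);
            Optional<Integer> composed   = Optional.of(x).map(f.andThen(g));
            assert stepByStep.equals(composed) : "law broken for x = " + x;
        }
        System.out.println("map(f).map(g) == map(f andThen g) for all sampled x");
    }
}
```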

Summary

Let's stop here. The topic is enormous, and a lot of powerful Effect composition constructs like Monad Transformers, Kleisli or Foldable wait around the corner. We started this article with the thesis that every day we work in auto-pilot mode and don't have the time or energy to contemplate the things we do daily. So instead of diving into the details of the machinery, stay on a more abstract level and compare your usual approach to writing programs with the philosophy of programming described in this article. Maybe something magical will happen...

Monday, 7 August 2017

Levels of Abstraction

At the very beginning, feel the very different levels of abstraction in these sentences: "turn on the computer" vs "on the molecular level, trigger a flow of electrons which causes millions of transistors and hundreds of various electrical circuits to reach a particular state"

In our daily life and communication we subconsciously adapt the level of detail in our words so that the other side can understand us with minimal mental energy and without the necessity of decoding many irrelevant aspects of the information.

Do we do the same in our code?

Robert C. Martin in his book "Clean Code" describes easy to maintain functions/methods as:

  • small
  • responsible for only one thing
  • working on one level of abstraction

The first two points should be quite easy to imagine (however, functional languages like Scala add another constraint: functions should be not only short but also thin). Yet the third point, about levels of abstraction, is... well, abstract itself, and thus may be unintuitive.

It is always easier to demonstrate abstract concepts with a practical metaphor. And what can be more practical than a recipe for scrambled eggs!

The recipe

  1. Heat the butter in a small frying pan
  2. Drop small slices of ham and finely chopped chives into the melted butter
  3. Break the eggshell with a sharp knife and pour the contents into the pan
  4. Add some salt and pepper and mix until the eggs set

The recipe is rather intuitive and no one should have a problem understanding it. On the level of abstraction used in the recipe you only need to understand how to handle a couple of "elements" from the kitchen domain, and our final product is ready!

However, some relatively complex things are happening "beneath". Those complex things bring up concepts that are not important from our food domain perspective.

If we wanted to model this recipe in code, how would such code with a broken level of abstraction look?

The bad code

add(butter)

while(butter.isInConsistentState){
  for(fatMolecule in butter)
    for( atom in atomsInFatMolecule){
      atom.incrementEnergy(heat)
  }
}
add(slicedHam)

while(chive.isNotInManyPieces){
  use knife to create force which will break chive internal structure 
}

add(choppedChive)

knife.storePotentialEnergy
knife.transformPotentialEnergyIntoKineticEnergy
knife.hit(egg)

add(eggContent)
add(salt) 

for(pepperSeed in aPinchOfPepper){
 add(pepperSeed)
}

and heat till it is ready

(The proper naming of physical concepts may not be accurate, but this is just an illustration. Also, the food names are taken directly from Google Translate - the author knows only several English words from the food domain: proteins, fat, carbohydrates, chicken, rice, broccoli)

Now imagine such a recipe posted on a page with food recipes. People would be extremely confused. Moreover, to understand some of the concepts it is no longer enough to be a "good cook" - now it would help to remember some things from physics lessons.

Now let's return to a more natural single level of abstraction

One Level of Abstraction

fryingPan.add(butter)
fryingPan.heat(butterIsMelted)
fryingPan.add(slicedHam)
fryingPan.add(choppedChive)
fryingPan.add(break(egg))
fryingPan.add(salt)
fryingPan.add(pepper)
fryingPan.heatTillReady()

This recipe is shorter, more readable, and doesn't force you to know concepts from outside the "food domain". You can also observe an interesting visual effect which can be used as a test for one level of abstraction - in the second implementation there are no indentations!

So after this pseudo-example, how does this problem touch IT? Very often you will see something similar to this:

function(object){
  domainOperation()
  internals=object.getInternals
  ...
  while(..){
   for{
    for{
     for{
      if(...){
       for{...}
      }
     }
    } 
   }
  }
  domainOperation()
  for(i in object internals) ...
}
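The usual cure is the same one the recipe used: push every low-level block one level down, into a helper named after its domain meaning. A sketch in the same pseudocode (all names invented for illustration):

```
function(object){
  domainOperation()
  processInternals(object.getInternals)
  domainOperation()
  summarizeInternals(object.getInternals)
}

processInternals(internals){
  // the while/for/for/for/if nest lives here, one level below
}

summarizeInternals(internals){
  // the trailing "for(i in object internals)" loop lives here
}
```

As with the frying pan version, the top-level function now reads with no indentation - the visual test for a single level of abstraction still works.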

This code breaks one level of abstraction and very often forces the reader to jump between concepts from different domains. So next time you see a domain concept cut by a technical low-level detail, you should recognize that you have just jumped through different layers of abstraction - maybe it's your fault, or maybe someone some time ago wrote shitty code and said "this is technical debt, this is fine".

For managers

This situation costs your company real money. When levels of abstraction are mixed, code analysis takes longer and is thus more expensive. Also, very often changes have to be introduced on a couple of levels, raising the probability of introducing new bugs - so new costs again. Remember that.

Sunday, 16 July 2017

Rich in paradigms

"You can have more!" - this slogan can be found in many TV commercials. The thesis is quite dangerous in the IT world, because you don't necessarily want more lines of code, which may raise the probability of bugs (although there are stories of companies having the number of code lines in their KPIs!). But on the other hand, maybe you can benefit from having/knowing more approaches to solving a given problem?

Our brains are overloaded, and maybe that's why simple explanations are so pleasant for us. I remember a time when I was actually jealous that .NET developers had only one framework to learn, while in Java you would feel frustrated knowing you didn't understand the N available tools - because to "be pragmatic" you would have to justify why you chose one approach over another. Yet while having a rich choice of tools, you had only one choice in the solution domain - "everything is an object".

This approach started losing popularity in recent years. When I had a chance to teach Scala to both C# and Java developers, the first group understood functions quite naturally and even monads were something familiar to them - "it's like LINQ, it's like LINQ". And Java? "Java made functions popular by not having them". When in 2014 we posted news about Java 8 on our JUG group - well, the C# group invited us to their meeting to see what would be in Java in 5-10 years.

By having a richer language, C# devs started learning new paradigms earlier than the standard Java dev. But of course there is the other side - the less complicated a language is, the less power you have, but also maybe fewer chances to make exotic mistakes. So the question is: can we "have more" good things and "have less" problems at the same time?

Golden hammer

In the picture above you can see the programming paradigms family - there are several different programming paradigms, but we usually focus on only one of them. Why?

I have observed that when programmers discuss different approaches to coding problems, they often concentrate on particular differences which show the "superiority" of their favorite solution only in a very specific situation. So (now we are entering "metaphor mode") for example, when someone claims that a fish is better than a bird because the bird reveals many disadvantages 10 meters under water - then the second person will counter that the fish is unable to catch food in trees. Yet both the fish and the bird were adapted to a specific "context" by evolution.

Can a similar adaptation occur on a more conceptual level? For example, we have one paradigm with the limitation "an object has to have a mutable state" and paradigm two - "code has to preserve referential transparency" (RT in short means that function calls need to be independent from the surrounding state - this improves code readability a lot!)
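A tiny illustration of the difference (all names invented): a referentially transparent call can always be replaced by its result, a state-dependent one cannot:

```java
public class Transparency {
    static int counter = 0;

    // referentially transparent: the result depends only on the argument
    static int pure(int x) {
        return x * 2;
    }

    // not referentially transparent: the result depends on surrounding mutable state
    static int impure(int x) {
        counter++;
        return x * 2 + counter;
    }

    public static void main(String[] args) {
        System.out.println(pure(2) == pure(2));      // true - always
        System.out.println(impure(2) == impure(2));  // false - the state moved between calls
    }
}
```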

Similarly to the bird and the fish, it is easy to find examples where those conceptual limitations bring more problems than solutions. But again, as with the "animal metaphor" - those approaches evolved in specific circumstances.

And limitation understood as specialization can bring a lot of good things. Each year we organize the Global Day of Code Retreat, where each participant is forbidden to use some specific constructions they are used to in everyday work, like "for -> inside for -> inside for" etc., which leads them to better Object Oriented Design. OOD - because most people choose Java as their primary language.

There is Code Retreat and there is professional life. In life, a given language brings limitations like:

  • Java - for the last 20 years Java did a Code Retreat with the limitation "everything is an object" - which for 5% of the programmers I know means good OOP design practices, but for the rest - dozens of fors enclosed in a "SomethingManager".
  • Haskell - you are limited to good (or bad) Functional Programming design.

Maybe the first language wins in one specific environment with specific problems, and the other can solve a different class of problems? And what if you have both classes of problems in your project?

To support this thesis let's look at a very interesting quote from a very nice book: Concepts, Techniques, and Models of Computer Programming

"Adding a concept to a computation model introduces new forms of expression, making some programs simpler, but it also makes reasoning about programs harder. For example, by adding explicit state (mutable variables) to a functional programming model we can express the full range of object-oriented programming techniques. However, reasoning about object-oriented programs is harder than reasoning about functional programs"

Also Effective Java - a more than 10 year old book now - promotes immutable structures as easier to operate on and understand. (Still, whenever performance is at stake, you will most likely end up with mutable structures.)

Everything is an object?

Let's start by looking at one interesting example of a limitation which exists only in thinking. According to my current knowledge, OOP focuses on solving problems by managing changes of mutable state. State is encapsulated in an object, which is a noun. The noun is from time to time called Invoice, Customer or Money, which makes it easier to convince business people to this approach.

However, OOP has no monopoly on nouns. In FP we can easily create new types.

data Customer= Customer {firstName::String, lastName::String} deriving (Show)

let c=Customer {firstName="Stefan", lastName="Zakupowy"}

Prelude > c
Customer {firstName = "Stefan", lastName = "Zakupowy"}

Polymorphism

There was a time when the word "polymorphism" triggered a "Pavlov reaction" in my brain: "polymorphism <--> inheritance". How surprised I was when I discovered there is a lot more!

Take for example "ad hoc polymorphism" - it is something very exotic in the Java world. Let's assume we have some tomatoes - very clean, very pure domain tomatoes.

case class Tomato(weight:Int)

val tomatoes=Seq(Tomato(1),Tomato(4),Tomato(7))

How do we quickly calculate the sum of all tomatoes? You can add a Numeric nature to Tomato ad hoc and then just use tomatoes as numbers:

implicit val numericNatureOfTomato=new Numeric[Tomato]{
  override def plus(x: Tomato, y: Tomato): Tomato =  Tomato(x.weight+y.weight)
  override def fromInt(x: Int): Tomato = Tomato(x)
  override def toInt(x: Tomato): Int = x.weight
  override def toDouble(x: Tomato): Double = ???
  override def toFloat(x: Tomato): Float = ???
  override def negate(x: Tomato): Tomato = ???
  override def toLong(x: Tomato): Long = ???
  override def times(x: Tomato, y: Tomato): Tomato = ???
  override def minus(x: Tomato, y: Tomato): Tomato = ???
  override def compare(x: Tomato, y: Tomato): Int = ???
}

tomatoes.sum  
//res0: Tomato = Tomato(12)
//def sum[B >: A](implicit num: Numeric[B]): B

In Java you would have to implement a specific interface and add it to Tomato upfront - by extending/implementing you would bake the Numeric nature into your class. The Scala implementation above may look like the Strategy pattern at first - here I can suggest learning Haskell, just for educational purposes, to better understand these mechanisms.
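For contrast, a sketch of the upfront Java approach (Summable is a made-up interface for this sketch - Java has no Numeric type class to join ad hoc):

```java
import java.util.Arrays;
import java.util.List;

interface Summable<T> {
    T plus(T other);
}

// the "nature" has to be declared when the class is written, not added later
class Tomato implements Summable<Tomato> {
    final int weight;

    Tomato(int weight) { this.weight = weight; }

    @Override
    public Tomato plus(Tomato other) {
        return new Tomato(weight + other.weight);
    }
}

public class TomatoSum {
    public static void main(String[] args) {
        List<Tomato> tomatoes = Arrays.asList(new Tomato(1), new Tomato(4), new Tomato(7));
        Tomato sum = tomatoes.stream().reduce(new Tomato(0), Tomato::plus);
        System.out.println(sum.weight);  // 12
    }
}
```

If Tomato comes from someone else's jar, you are stuck - which is exactly what "ad hoc" saves you from.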

And since we mentioned implementing/extending and inheritance in general - technically it is just a realization of a more general mechanism called Single Dispatch, and when you understand the general context, maybe it will be easier to spot the advantages and disadvantages of this mechanism - in contrast to the popular example from OOP books, "Cat and Dog extend Animal".

Information Hiding

Usually when we think OOP we think Encapsulation (Pavlov again?) - and to be clear, it is a very good practice - but let's stop and think for a moment (again) to understand where it came from, and whether it is characteristic only of OOP, or maybe it is more general and we don't need OOP to have it?

And again, encapsulation is an example of a more general concept: Information Hiding, which leads to better modularisation of source code. Haskell is far from OOP, but it has modules and is able to hide information in those modules. Then you have alternative mechanisms to "object methods" for accessing that private data. One such mechanism, considered mainly an FP mechanism, is Pattern Matching.

First of all, be aware that data does not always have to hide something - an example is Scala's case class, which just represents a record. Still, Scala, because of its FP+OOP nature, allows you to connect Pattern Matching with object encapsulation very nicely.

So, having a module to play a card game:

trait Game[Card]{
  class Deck (private val cards:List[Card]){
    def takeCard : (Card,Deck) = (cards.head,Deck(cards.tail:_*))
  }

  object Deck{
    def apply(cards: Card*)= new Deck(cards.toList)
    def unapply(deck: Deck):Option[Card] = Some(deck.cards.head)
  }
}

The structure used to store the cards (List) is invisible from the outside. The next step is to implement a concrete game with a simple String representation.

object StringsGame extends Game[String]{
  def play(d:Deck): String = d match {
    case Deck("ace") => "I have ace"
    case _ => "something else :("
  }
}

And now we can play

import StringsGame._
val deck1=Deck("ace","king","ten")
val (_,deck2)=deck1.takeCard

play(deck1) //res2: String = "I have ace"
play(deck2) //res3: String = "something else :("

Abstractions

Talking about abstractions can be very abstract itself (here is a very good presentation about this topic: Constraints Liberate, Liberties Constrain - Runar Bjarnason). To not "fly" too far from the topic, let's focus on a good practice frequently used in Java: programming towards an abstract interface to postpone revealing details and preserve freedom of choice. For example, the Collection or List types are very often used in declarations so as not to reveal that we have a LinkedList.

Yet sometimes abstractions are not that obvious, and to spot them you need to join known facts in a new way. Maybe there is something similar between Optional and CompletableFuture, even though at first sight those mechanisms seem completely separate and designed for completely different things? It may be the hardest thing to change someone's "stabilized" opinions and views on what is and what is not good practice in a given context, in order to learn new approaches to a problem.

When Good is Bad?

Till now we have mainly discussed the good sides of using multiple paradigms, but are there downsides? Sometimes borrowing a concept from one paradigm and using it in a different context creates a disaster, and then people tend to blame the concept, forgetting about the context.

For example, a Utils class in an OOP context very often signals a design smell, because where we expect separate encapsulated states, we come across a "bag of loosely connected methods".

On the other hand there is org.apache.commons.lang.StringUtils, which is... indeed... very convenient to use even in a "hardcore" OOP context. What went right there? In OOP there is a metric called LCOM4 which measures class cohesion by checking whether there are independent clusters within the object's state. Yet we cannot use it for StringUtils - StringUtils doesn't have any state. StringUtils isn't coupled with any external state either, and this is crucial. Custom "Utils" very often are procedures which operate on external state - StringUtils is independent. A pure function (without the function keyword) in the middle of the Object Oriented Paradigm.

When this difference is not understood correctly, the concept of a "stateless class with static methods/functions" gets treated as always being a code smell.

Hidden Paradigm

There is one special "paradigm" which is not well described in books about good practices - the "paradigm" of writing f***ing unreadable code. It is a complex social phenomenon in which a whole range of people from the corporate ladder participate. This paradigm is like a logic gate in front of other paradigms: if it is present, then we cannot talk about a correct OOP or a correct FP approach, because we just have a "big ball of mud", "spaghetti" or simply "f***ing bad code". Until you remove this "bad paradigm", discussion about other paradigms is just science fiction.

Summary

Imagine that you have two paradigms of movement: walking and running. There can be endless discussion about which is better: "When you are running you have a chance to flee from an angry dog" - "hey! but if you walk silently then maybe the dog won't notice you". This example is of course quite silly, because "moving" is a very intuitive "thing" for us, and through using it every day we gain intuition about the context.

Programming is not that intuitive for us, and we tend to judge different approaches too quickly. Very often we learn one approach with our first technology ecosystem and then we tend to reject any other. Maybe it is better to first learn in depth how to "walk" and how to "run", and only then compare those two ways of movement? This is called "being an engineer" - knowing when to use a given tool. (You can of course earn a ton of money by being a so-called Technology Evangelist and promoting only one technology.)

In the JVM world, for more than a decade there was only one "politically correct" paradigm, so maybe to "equalize" things, people from other paradigms have to be more aggressive than necessary? I hope, dear reader, that through this article I was able to show you that different paradigms can enrich each other. Or maybe there is just "one" paradigm, of which we see only parts? There is a good quote from one smart book about programming which I think matches the thesis of this text perfectly and shows how different approaches are connected - "(...)A function with internal memory is usually called an object(...)"

Sunday, 2 July 2017

Consequences of invariance

The first step is curiosity. If you are lucky enough to use at least Java 8 and you have java.util.function.Function at your disposal, then maybe you have seen methods (yes, "methods on a function") like compose or andThen. If you follow the implementation, you will see quite broad generics in the declaration

default <V> Function<V, R> compose(Function<? super V, ? extends T> before)

default <V> Function<T, V> andThen(Function<? super R, ? extends V> after) 

And the point is that we will see such declarations more and more in our daily programming. In this article I will try to show that technically only declarations with "? super" and "? extends" make pragmatic sense, and that you will have to type those signatures each time you pass a function as a parameter. We will also see where this design comes from, and that there was - and maybe still is - a more convenient alternative.

When types have a type

A long, long time ago, everything in Java was literally an Object. If you had a list, then you had a list of Objects - always Objects. It was a time when CRT monitors would burn your eyes and an application took 15 minutes to start, just to throw a ClassCastException right afterwards - because you had to predict, or just guess, what type you were operating on. It was the time when Java did not have generics.

When generics came in Java 5, those problems became marginal. However, a new class of problems appeared, because the assumption "everything is an Object" started generating some strange issues.

Because if a String is an Object, and we have a "list of Strings" - so technically a list of objects, because everything is an object - then is this list also an Object, and finally, can it also be seen as a list of Objects? Unfortunately - for a Java list - the last one is not true.

If it were possible, you could write code like this:

List<String> strings=new LinkedList<>();
List<Object> objects = strings; // luckily, this line does not compile
objects.add(1); //disaster!!

This would be a disaster. But we can make a pact with the compiler that we will not put anything "bad" there, and write a declaration like the following one:

List<? extends Object> pactWithCompiler=strings;
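The pact means: the compiler lets us read from the list (everything in it IS-AN Object) but rejects any attempt to add to it:

```java
import java.util.LinkedList;
import java.util.List;

public class Pact {
    public static void main(String[] args) {
        List<String> strings = new LinkedList<>();
        strings.add("hello");

        List<? extends Object> pactWithCompiler = strings;

        Object first = pactWithCompiler.get(0);   // reading is fine
        System.out.println(first);                // hello
        // pactWithCompiler.add(1);               // does not compile - the pact in action
    }
}
```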

Could Java have chosen a different approach? There is a convenient alternative which we will see soon. Still, looking at this piece of code, the list declaration seems to be OK and prevents a ClassCastException. This was the year 2004. Ten years pass. Java 8 is born with java.util.function.Function in it. And it is in this new mechanism that we will see the flaws of Java's generics design.

Use Site Variance

So again, let's take a look at java.util.function.Function to understand the consequences of how generics are implemented in Java.

default <V> Function<V, R> compose(Function<? super V, ? extends T> before) {
        Objects.requireNonNull(before);
        return (V v) -> apply(before.apply(v));
}

Do extends and super have to be there? Now focus :) - they have to be there because... there is no reason not to put them there! If you don't put them there, you limit the function signature without any justification.

extends

What is the difference between the two following functions?
public static <A,B> Collection<B> libraryMethod(Collection<A> c, Function<A,B> f){
        List<B> l =new ArrayList<>();
        for(A a: c){
            l.add(f.apply(a));
        }

        return l;
    }


public static <A,B> Collection<B> libraryMethod2(Collection<A> c, Function<? super A,? extends B> f){
        List<B> l =new ArrayList<>();
        for(A a: c){
            l.add(f.apply(a));
        }

        return l;
}

The difference occurs during invocation. Because, having the following class

class User{
    private String name;
    private Integer age;

    public User(String name, Integer age) {
        this.name = name;
        this.age = age;
    }

    public String getName() {
        return name;
    }

    public Integer getAge() {
        return age;
    }
}

we would like to execute the following computation:

Collection<User>  users=new LinkedList<>();
Function<User,String> display=o->o.toString();
Collection<Object> res1=libraryMethod(users,display); // error
Collection<Object> res2=libraryMethod2(users,display);

Unfortunately we cannot do it :( Without "super" and "extends" we introduced an artificial limitation to our function, so that we now cannot produce a collection of a supertype of String. Removing this limitation requires additional effort on our side, and there is no justification for the limitation in the first place. And this will be a popular "pattern" whenever you want to respect subtype polymorphism.

super

I hope the need for "extends" is now explained. The situation with "super" is less intuitive. Let's try with the following: "a User has a subclass which is its specialization".

class SpecialUser extends User{

    private String somethingSpecial;

    public SpecialUser(String name, Integer age, String somethingSpecial) {
        super(name, age);
        this.somethingSpecial = somethingSpecial;
    }
}

And now we have another library function, this time for filtering our objects

public static <A> Collection<A> filter1(Collection<A> c, Function<A,Boolean> f){
        List<A> l =new ArrayList<>();
        for(A a: c){
            if(f.apply(a)) l.add(a);
        }

        return l;
}


public static <A> Collection<A> filter2(Collection<A> c, Function<? super A,Boolean> f){
        List<A> l =new ArrayList<>();
        for(A a: c){
            if(f.apply(a)) l.add(a);
        }

        return l;
}

And again there is no reason to prohibit a function which works on the supertype, because after all a SpecialUser IS-A User.

Function<User,Boolean> isAdult=user->user.getAge()>= 18;

Collection<SpecialUser>  specialUsers=new LinkedList<>();
// filter1(specialUsers,isAdult); //error because of no "super"
filter2(specialUsers,isAdult);

Now if you check how, for example, map is implemented on java.util.Stream, you will see the same pattern again :

<R> Stream<R> map(Function<? super T, ? extends R> mapper);

because really, other declarations don't make much sense. But in that case, would it be possible to implement generics in a different way so that programmers could type less - and, what's more important, introduce fewer bugs (what if you forget about "extends")? Yes, it is possible and it actually works quite well. The Java approach is called use-site variance and the alternative is...
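The payoff of that signature is easy to demonstrate with a runnable sketch (names are mine): a mapper defined on the supertype Object can be passed straight to map on a Stream of Integers, precisely because of "? super T".

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class StreamMapDemo {
    static List<String> describeAll(List<Integer> numbers) {
        // A function defined on the SUPERTYPE Object...
        Function<Object, String> describe = o -> "item:" + o;
        // ...is accepted by map thanks to "? super T" in its signature.
        return numbers.stream().map(describe).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(describeAll(List.of(1, 2))); // prints [item:1, item:2]
    }
}
```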

Declaration Site Variance

In other languages - instead of writing "extends" in 1000 places of use, we can write it once - on the declaration. This way we can set the "nature" of a given construct once and for all. Let's see how it is implemented then :

C#

interface IProducer<out T> // Covariant - "extends"
{
    T produce();
}
 
interface IConsumer<in T> // Contravariant - "super"
{
    void consume(T t);
}

Kotlin

abstract class Source<out T> {
    abstract fun nextT(): T
}

Scala

trait Function1[-T1,+R] extends AnyRef 

//now it is default behaviour of every function that it works as 
//'<super,extends>' with input and output types

But why?

Why does Java have use-site variance? I don't know, and I'm unable to find an answer on Google. Most likely this mechanism made a lot of sense in 2004, when it was created for mutable collections, IE had a 90% market share, people used tons of XML to share messages, and no one thought about functions. In Scala, mutable collections like Array are invariant, and theoretically in this one place Java gives more freedom because you can change a construct's nature at the place where it is used. But this can actually raise more problems than benefits, because now library users - not designers - are responsible for the proper declaration. And because it was implemented this way in 2004, it was also used this way in 2014 for functions - maybe this is an example of technical debt.

You can read about the few advantages and many flaws of "Use-Site Variance" here -> https://en.wikipedia.org/wiki/Covariance_and_contravariance_(computer_science)#Comparing_declaration-site_and_use-site_annotations. In general, I hope this article shows clearly that declaration-site variance is a much better choice for functions.

Links

  1. https://en.wikipedia.org/wiki/Covariance_and_contravariance_(computer_science)
  2. https://kotlinlang.org/docs/reference/generics.html#declaration-site-variance
  3. https://schneide.wordpress.com/2015/05/11/declaration-site-and-use-site-variance-explained/
  4. https://medium.com/byte-code/variance-in-java-and-scala-63af925d21dc#.lleehih3p

Tuesday, 15 December 2015

Build intuition around quality metrics

When you hear that a car drives at a speed of 250 km/h or that a tree is 2 meters tall, you can intuitively classify it as fast/slow or short/tall.

When you hear that the Cyclomatic Complexity of your code is 2.7 or its LCOM4 is 1.6, can you do the same? For many developers the answer is unfortunately no, because in general there is no such real-life intuition about programming. Many times I have seen people surprised when they found four nested ifs in their code. It looked like it appeared from nowhere.

We are not born with internal knowledge about the surrounding code. Here is the first article returned by Google which claims that we are actually born with some intuition about physics: Babies are born with 'intuitive physics' knowledge, says researcher. Some knowledge about the world around us was quite useful for many generations - on the other hand, computers are a relatively new thing in the history of the human species.

So the idea of this exercise is to observe how complex code is created step by step, and also to learn how some approaches help keep the value of this metric small. The code is used only for demonstration, so some bugs may appear in it.

Cyclomatic Complexity A.K.A Hadouken code

Wikipedia defines Cyclomatic Complexity as "a quantitative measure of the number of linearly independent paths through a program's source code", but maybe more intuitive for practitioners is the fact that when CC=2 you need to write 2 tests, when CC=3, 3 tests - etc.
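The CC-equals-number-of-tests intuition can be shown on the smallest possible example (a sketch, names are mine): one if gives CC=2, so exactly two tests cover all the independent paths.

```java
public class CcTests {
    // One if => CC = 2 => two independent paths => two tests cover it.
    static String label(int age) {
        if (age > 18) {
            return "Adult";
        } else {
            return "Child";
        }
    }

    public static void main(String[] args) {
        // The two "tests" - one per independent path:
        System.out.println(label(30)); // prints "Adult"
        System.out.println(label(10)); // prints "Child"
    }
}
```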

Sounds trivial, but now we are going to learn that Cyclomatic Complexity has a non-linear nature, which means that if you couple two pieces of code with CC=2 you may obtain code with CC>4. Let's see this on the battlefield.

We are going to start with Java because it has maybe the best tools to measure complexity, and then we will see what we can measure in Scala code.

Battlefield

Our experimental domain is very simple. Notice that at this stage we build just types with values. I used to call these DTOs in my old Java days. Now I'd call them Records. Later we will see how the nature of those classes changes during the OOP refactoring.

class User{
    public final String name;
    public final Integer age;
    public final Gender gender;
    public final List<Product> products;

    public User(String name, Integer age, Gender gender, List<Product> products) {
        this.name = name;
        this.age = age;
        this.gender = gender;
        this.products = products;
    }
}

enum Gender{MALE,FEMALE}

class Product{
    public final String name;
    public final Category category;

    public Product(String name, Category category) {
        this.name = name;
        this.category = category;
    }
}

enum Category{FITNESS,COMPUTER,ADULT}

Laboratory

We are going to develop and measure a piece of code which simulates the transformation of a business object into its text representation. It's more than enough for this experiment, so let's see the first version.

  public String complexMethod(User u){
        String value="";

        if(u.age > 18){
            value="Adult : " +u.name;
        }else{
            value="Child : " +u.name;
        }

        return value;
    }

We can easily measure the CC of this code; to do this we are going to use Sonarqube 4.5.6.



And also "Metrics Reloaded" plugin will be usable in some places.

So after the first measurement we receive CC=2 - we have just one if statement, so we need two tests.

CC=4

Now let's add another conditional which executes different actions according to the Gender property.

       if(u.age > 18){
            if(u.gender==Gender.MALE){
                value="Adult Male: " +u.name;
            }else{
                value="Adult Female: "+u.name;
            }
        }else{
            if(u.gender==Gender.MALE){
                value="Child Male: " +u.name;
            }else{
                value="Child Female: "+u.name;
            }
        }

Now let's try to guess the CC of this code. The following execution paths are possible.

  1. IF -> IF
  2. IF -> ELSE
  3. ELSE -> IF
  4. ELSE -> ELSE

Sonar agrees.

CC=5

Our logic is expanding. Now we also need to add information about products... but only for a User who is an Adult Male.

public String complexMethod(User u){
        String value="";

        if(u.age > 18){
            if(u.gender==Gender.MALE){
                value="Adult Male: " +u.name;

                for (Product p: u.products) {
                   value+="has product "+p.name+",";
                }
            }else{
                value="Adult Female: "+u.name;
            }
        }else{
            if(u.gender==Gender.MALE){
                value="Child Male: " +u.name;
            }else{
                value="Child Female: "+u.name;
            }
        }

        return value;
    }

Technically Sonar just counts the number of fors and ifs to calculate CC, but we can understand the result of CC=5 this way :

  • For an empty collection nothing changes : CC+0
  • For a non-empty collection we have another possible path : CC+1

CC=6

Let's make things more interesting by adding a filtering condition to the products.

 public String complexMethod(User u){
        String value="";

        if(u.age > 18){
            if(u.gender==Gender.MALE){
                value="Adult Male: " +u.name;

                for (Product p: u.products) {
                    if(p.category!= Category.ADULT) {
/*HAAADUUUKEN  ~~~~@ */    value += "has product " + p.name + ",";
                    }
                }
            }else{
                value="Adult Female: "+u.name;
            }
        }else{
            if(u.gender==Gender.MALE){
                value="Child Male: " +u.name;
            }else{
                value="Child Female: "+u.name;
            }
        }

        return value;
    }

We have another single if in our code. The cyclomatic complexity is CC=6, which may seem small but it isn't. We already have the Arrow Code Anti-pattern, and the context awareness of this piece of (sh...) code makes it almost impossible to reuse anywhere else.

Soon we will see how this way of coding raises complexity in an exponential rather than linear way. But first let's look at something called essential complexity.

Essential Complexity

What if we don't want to initialize a mutable variable, but would like to return from within the nested ifs?

public String complexMethod(User u){
        if(u.age > 18){
            if(u.gender==Gender.MALE){
                String value="Adult Male: " +u.name;

                for (Product p: u.products) {
                    if(p.category!= Category.ADULT) {
/*HAAADUUUKEN  ~~~~@ */    value += "has product " + p.name + ",";
                    }
                }

                return value;
            }else{
                return "Adult Female: "+u.name;
            }
        }else{
            if(u.gender==Gender.MALE){
                return "Child Male: " +u.name;
            }else{
                return "Child Female: "+u.name;
            }
        }

    }

Now Sonar will return CC=10, because technically complexity in Sonar is not just Cyclomatic Complexity but CC + Essential Complexity (and maybe + something else). Here we will get a better measurement with the metrics plugin.

So Complexity=CC+EC=6+4=10. Essential Complexity - in my understanding - measures the number of places where your logic execution can end. Because we have multiple returns, the code is more difficult to analyze. Whether this statement is correct is a matter of discussion, but generally, since I started learning Functional Programming and thinking in expressions, I believe I haven't used a construction like return in the middle of the code.

Ok, we see the problem, now let's find a cure.

Refactoring OOP way

The easiest thing at the beginning is to move the logic responsible for displaying products to a dedicated component. It's not easy to think about proper domain abstractions when we don't actually have any domain problem, but let's try with something generic.

interface ProductPolicy{
    boolean isCensored(Category category);
}

interface PolicyFactory{
    ProductPolicy create(Collection<Category> forbiddenCategories);
}

interface ProductDisplayer{
    String display(Collection<Product> products);
}

So there is a policy which may be configured through a factory. And we have an interface for our displayer, which may look like this:

class CensoredDisplayer implements ProductDisplayer{

    private ProductPolicy productPolicy;

    public CensoredDisplayer(ProductPolicy productPolicy) {
        this.productPolicy = productPolicy;
    }

    @Override
    public String display(Collection<Product> products) {
        String result="";
        for (Product p: products) {
            result+=addToDisplay(p);
        }
        return result;
    }

    private String addToDisplay(Product p){
        return productPolicy.isCensored(p.category)? "" : " has product "+p.name+",";
    }
}
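Wiring this component up is straightforward, since ProductPolicy has a single method and can be implemented with a lambda. Here is a condensed, self-contained sketch of the same idea (the display method inlines CensoredDisplayer's loop; all names besides the interfaces are mine):

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class DisplayerDemo {
    enum Category { FITNESS, COMPUTER, ADULT }

    static class Product {
        final String name;
        final Category category;
        Product(String name, Category category) { this.name = name; this.category = category; }
    }

    interface ProductPolicy { boolean isCensored(Category category); }

    // Same loop as in CensoredDisplayer, condensed into one static method.
    static String display(ProductPolicy policy, Collection<Product> products) {
        StringBuilder result = new StringBuilder();
        for (Product p : products) {
            if (!policy.isCensored(p.category)) {
                result.append(" has product ").append(p.name).append(",");
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // The policy role is played by a lambda.
        ProductPolicy adultPolicy = c -> c == Category.ADULT;
        List<Product> ps = Arrays.asList(
                new Product("book", Category.COMPUTER),
                new Product("secret", Category.ADULT));
        System.out.println(display(adultPolicy, ps)); // only the uncensored product survives
    }
}
```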

Now let's take a look at our laboratory code.

 private ProductDisplayer productDisplayer;

    public String complexMethod(User u){
        String value="";

        if(u.age > 18){
            if(u.gender==Gender.MALE){
                value="Adult Male: " +u.name;
                value+= productDisplayer.display(u.products);
            }else{
                value="Adult Female: "+u.name;
            }
        }else{
            if(u.gender==Gender.MALE){
                value="Child Male: " +u.name;
            }else{
                value="Child Female: "+u.name;
            }
        }

        return value;
    }

The complexity of this code is CC=4 and the Displayer's average complexity is CC=1.7, so technically the whole system is already a little less complex. (And "Beans" is the name of the class where I put all the interfaces.)

OOP data types

To move further, we can change the nature of the data types from records into richer entities and use inheritance polymorphism to dispatch execution between specific pieces of code.

Check the code.

abstract class User{
    protected final String name;
    protected Integer age;
    protected final List<Product> products;

    public User(String name, Integer age,  List<Product> products) {
        this.name = name;
        this.age = age;
        this.products = products;
    }

    abstract String introduceYourself();

    abstract List<Product> showProducts();
}

class MaleUser extends User{

    public MaleUser(String name, Integer age, List<Product> products) {
        super(name, age, products);
    }

    @Override
    String introduceYourself() {
        return (age>18?"ADULT MALE":"CHILD MALE") + " : "+name;
    }

    @Override
    List<Product> showProducts() {
        return age>18? products: new LinkedList<>();
    }

}

class FemaleUser extends User{

    public FemaleUser(String name, Integer age, List<Product> products) {
        super(name, age, products);
    }

    @Override
    String introduceYourself() {
        return (age>18?"ADULT FEMALE":"CHILD FEMALE")+ " : "+name;
    }

    @Override
    List<Product> showProducts() {
        return new LinkedList<>();
    }
}

What is important here is to notice how the context information has now moved into the children of the (now abstract) class User. Context information is encapsulated inside the classes, and this construction reduces Cyclomatic Complexity in the place where the objects are used. Look at this:

 public String complexMethod(User u){
        String result=u.introduceYourself();
        result+=productDisplayer.display(u.showProducts());
        return result;
 }

So once again - we take control over the context awareness of particular logic by moving this logic inside classes. Execution is controlled by polymorphic method dispatch.
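A compact, self-contained mirror of this hierarchy shows the dispatch in action (the hierarchy is trimmed to introduceYourself only; all names are mine):

```java
public class DispatchDemo {
    // Minimal mirror of the abstract User hierarchy above.
    static abstract class User {
        final String name;
        final int age;
        User(String name, int age) { this.name = name; this.age = age; }
        abstract String introduceYourself();
    }

    static class MaleUser extends User {
        MaleUser(String name, int age) { super(name, age); }
        @Override String introduceYourself() {
            return (age > 18 ? "ADULT MALE" : "CHILD MALE") + " : " + name;
        }
    }

    static class FemaleUser extends User {
        FemaleUser(String name, int age) { super(name, age); }
        @Override String introduceYourself() {
            return (age > 18 ? "ADULT FEMALE" : "CHILD FEMALE") + " : " + name;
        }
    }

    // CC = 1 here: the runtime type of u selects the right behaviour for us.
    static String complexMethod(User u) {
        return u.introduceYourself();
    }

    public static void main(String[] args) {
        System.out.println(complexMethod(new MaleUser("Bob", 30)));   // ADULT MALE : Bob
        System.out.println(complexMethod(new FemaleUser("Eve", 10))); // CHILD FEMALE : Eve
    }
}
```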

Refactoring FP Way

We are going to start in a similar way as we did in the OOP example. So at the beginning, let's move the logic responsible for filtering and displaying products into dedicated functions.

Function<Set<Category>,Function<Category,Boolean>> policy=
            forbiddenCategories -> category -> !forbiddenCategories.contains(category);

Function<Category,Boolean> adultPolicy=policy.apply(new HashSet<>(Arrays.asList(Category.ADULT)));
  

Function<Function<Category,Boolean>,Function<Collection<Product>,String>> displayProducts= policy-> ps ->
        ps.stream()
                .filter(p->policy.apply(p.category))
                .map(p->" has product "+p.name)
                .collect(Collectors.joining(","));

So we have a Policy, a Parametrized Policy and a Product Displayer. The complexity of our lab method will now be reduced to CC=4.

public String complexMethod(User u){
        String value="";

        if(u.age > 18){
            if(u.gender==Gender.MALE){
                value="Adult Male: " +u.name;

                value += displayProducts.apply(adultPolicy).apply(u.products);
            }else{
                value="Adult Female: "+u.name;
            }
        }else{
            if(u.gender==Gender.MALE){
                value="Child Male: " +u.name;
            }else{
                value="Child Female: "+u.name;
            }
        }

        return value;
    }

Now let's introduce some conditional logic into the next functions, mainly to check how good Sonar is at measuring the CC of functions.

Function<User,String> ageLabel= u -> {
        if (u.age>18)
            return "ADULT";
        else
            return "CHILD";
    };

Function<User,String> introduce=u-> {
        String result="";
        switch(u.gender) {
            case MALE:
                result= ageLabel.apply(u) + " MALE" + " : " + u.name;
                break;
            case FEMALE: result= ageLabel.apply(u) + " FEMALE" + " : " + u.name;
        }
        return result;
    };
Function<User,Collection<Product>> getProducts = u-> {
        if(u.gender==Gender.MALE && u.age>18 )
            return u.products;
        else
            return new ArrayList<>();
    };

And finally let's see how all this functional machinery can help us!

//composition!!
Function<User, String> productDisplayer = getProducts.andThen(displayProducts.apply(adultPolicy));


public String complexMethod(User u){
     return introduce.apply(u) + productDisplayer.apply(u);
}

This looks wonderful - and now the good and bad news.

We have reduced the complexity of our lab function to CC=1, but unfortunately Sonar is unable to measure complexity inside lambdas. I tried Sonar 4.X and Sonar 5.X - both without success. The only solution I found to get proper CC measurements is to use method references.

static String introduceMethod(User u){
        String result="";
        switch(u.gender) {
            case MALE:
                result= ageLabel.apply(u) + "pełnoletni" + " : " + u.name;
                break;
            case FEMALE: result= ageLabel.apply(u) + "pełnoletnia" + " : " + u.name;
        }
        return result;
    }
Function<User,String> introduce=BlogComplexityFP::introduceMethod;

We saw how to handle complexity in Java in both the OOP and FP way - now let's quickly check how to measure complexity in the Scala world.

Scala Measurements

I believe Scala's main quality tool is Scalastyle. It has a plugin for Sonar, but it is not as rich as the one for Java. Generally in Scalastyle we can set an acceptable Cyclomatic Complexity level, and if code exceeds it then a warning will be raised.

So if I have this ugly piece of code

def complexMethod(u: User): String = {
    if (u.age > 18) {
      if (u.gender == MALE) {
        var value: String = "Adult Male: " + u.name
        for (p <- u.products) {
          if (p.category != Category.ADULT) {
            value += " has product " + p.name + ","
          }
        }
        value
      }
      else {
        "Adult Female: " + u.name
      }
    }
    else {
      if (u.gender == MALE) {
        "Child Male: " + u.name
      }
      else {
        "Child Female: " + u.name
      }
    }
  }

Then I will only receive a warning

More info will be displayed in the sbt console

ProceduralExample.scala:26:6: Cyclomatic complexity of 6 exceeds max of 1

And finally Scala code with CC=1

object FPExample {

  object Gender extends Enumeration{
    type Gender=Value
    val MALE,FEMALE=Value
  }

  object Category extends Enumeration{
    type Category=Value
    val FITNESS,COMPUTER,ADULT=Value
  }

  import Gender._
  import Category._

  case class Product(name:String,category: Category)
  case class User(name:String,age:Int,gender:Gender,products:List[Product])


  val policy : Set[Category] => Category => Boolean =
    forbiddenCategories => category => !forbiddenCategories.contains(category)

  val adultPolicy = policy(Set(ADULT))

  def displayProducts(policy:Category=>Boolean)(ps:Seq[Product]):String=
    ps
      .filter(p=>policy(p.category))
      .map(" and have product " + _.name)
      .mkString(",")

  val agePrefix: User=>String= u =>
    if(u.age>17) "adult" else "child"

  val introduce : User=>String = _ match {
    case u @ User(name,_,MALE,_) => agePrefix(u) + " Male :" + name
    case u @ User(name,_,FEMALE,_) => agePrefix(u) + " Female :" + name
  }

  val getProducts: User=>Seq[Product]= _ match {
    case User(_,age,MALE,products) if age>18 => products
    case _ => Seq[Product]()
  }

  val productDisplayer=getProducts andThen displayProducts(adultPolicy)

  def complexMethod(u: User): String = introduce(u) + productDisplayer(u)
}

Is it complex or simple code? It's difficult to say, because Cyclomatic Complexity tends to be not very helpful when lambdas enter the scene. A couple of paragraphs below we will think about a possible solution.

Non Linear Nature of Complexity

Ok, after all those examples and exercises, let's try to understand the non-linear nature of complexity. We saw that when we move logic to external components or functions, we can make them context-unaware to some degree. They then receive information about the context they are being used in through various kinds of parametrization.

So if, for example, I have a Component A with CC=2 and I want to use this component in some Component B, then this action does not raise the CC of Component B. Of course, I assume that Component A was properly tested and behaves in a predictable way.

Now if I took the logic out of Component A and pasted it directly into Component B, then most likely I could not reuse this piece of code in Component C, because it is already embedded in the context of B - so I would have to paste it there as well. This way I raised the total CC by 4 with the same piece of code.

If it is not clear yet then let's return to this part :

if(age>18){
   if(MALE){...} //1
   else{...}
}else{
   if(MALE){...} //2
   else{...}
}
Both ifs //1 and //2 are similar, but one is aware that it is in the context of age>18 and the second in age<=18. So every occurrence of this similar code raises CC by 2.

OOP & FP

I hope I also managed to show that proper usage of either OOP or FP mechanisms reduces complexity. From my observation, programmers generate complexity mainly not by using the wrong paradigm but by using any paradigm in a wrong way - in short: first learn OOP or FP properly and only then discuss FP vs OOP (and of course I should do this too).

Psychological Aspect

Till now we discussed how code complexity influences the code itself, but can we actually check how this complexity affects developers?

There was actually an interesting experiment more than half a century ago - an experiment which described the borders of human cognitive limits. The experiment description can be found here: The magic number seven plus/minus two. According to the researcher, George A. Miller, a human being can process 7 +/- 2 chunks of information at once. You can find more info on Wikipedia to better understand what exactly a chunk of information is.

And the answer to the question "why do developers write code with bugs"? Because developers are (still) humans ;)

Summary

I hope that this "complexity adventure" will be helpful to those who are curious how their code obtain "arrow shape" and where complexity is lurking. Also we checked how usage of FP influences Cyclomatic Complexity and a limitation of procedural metrics in FP world. It opens up an interesting question about how to measure complexity of functional code. Very often functional constructs build solution without ifs and fors which are base for CC calculation.

What can we do?

  • build more intuition around scala style rules
  • research functional programming complexity measurements papers
  • Some materials for further investigation: