Luca Ongaro

Web Engineer

Scala Pattern Matching and Validation of Structured Input

Pattern matching is one of the most powerful features in Scala (kindly borrowed from functional languages such as Haskell). Nonetheless, developers approaching the language for the first time might wonder where it is actually useful in real life. In this post, I describe one particular real-life scenario in which pattern matching shines: validation of structured input.

For the sake of argument, suppose we have an application that deals with bank accounts, expressed as IBAN codes (International Bank Account Numbers). IBANs can have different formats depending on the country, but they all follow certain rules:

  • They begin with a country code followed by two digits used for the checksum calculation
  • They must only contain digits and the 26 Latin alphabetic characters from A to Z
  • It is possible to validate them by calculating a checksum
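
The checksum rule is not spelled out in this post, so here is a minimal sketch of the standard IBAN check (the mod-97 procedure from ISO 13616). The names `IbanChecksum` and `hasValidChecksum` are made up for illustration, and the function takes the whole IBAN string rather than its separate parts:

```scala
object IbanChecksum {
  // Hypothetical helper, not part of the code in this post.
  // Standard IBAN check: move the first four characters to the end,
  // replace letters with numbers (A = 10, ..., Z = 35), read the result
  // as one big integer and require a remainder of 1 when divided by 97.
  def hasValidChecksum(iban: String): Boolean = {
    val wellFormed = iban.length > 4 &&
      iban.forall(c => (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z'))

    wellFormed && {
      val rearranged = iban.drop(4) + iban.take(4)
      // Character.getNumericValue maps '0'-'9' to 0-9 and 'A'-'Z' to 10-35
      val digits = rearranged.map(c => Character.getNumericValue(c).toString).mkString
      BigInt(digits).mod(BigInt(97)) == BigInt(1)
    }
  }
}
```

For the commonly used example IBAN `GB82WEST12345698765432` this returns `true`; changing the check digits to `83` makes it return `false`.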

Imagine that our system needs to accept an IBAN from user input, validate it, and decompose it into its parts so that we can use it. One naive way would be to create a wrapper class with methods that take care of validation and give access to the various components of the IBAN code:

// Suboptimal example.
// A better solution is explained after
class IBAN( code: String ) {
  def isValid: Boolean = {
    // Perform format validation
    // and checksum validation here...
    ??? // placeholder so that the sketch compiles
  }

  def country = {
    code.take( 2 )
  }

  def account = {
    code.drop( 4 )
  }
}

This imperative style is quite brittle. Whenever we accept an IBAN we want to validate and decompose it before use, so this solution leads to a lot of repetition and to the risk of forgetting to handle invalid codes:

// Assume userInput is a user provided string,
// like "GB82WEST12345698765432" or "invalidIBAN"
val iban = new IBAN( userInput )

if ( iban.isValid ) {
  val account = iban.account
  val country = iban.country

  if ( country == "DE" )
    // ...German IBAN, do something
  else if ( country == "IT" )
    // ...Italian IBAN, do something else
  else
    // ...IBAN from other country
} else {
  // Handle invalid IBAN. What if we forget?
}

We can improve a lot on this by creating a custom extractor, enabling pattern matching on valid IBAN strings (explaining custom extractors is not in the scope of this post, but if needed you can read this awesome article):

object IBAN {
  // Regex for format validation and decomposition,
  // compiled once instead of on every match
  private val IBANPattern =
    """([A-Z]{2})(\d{2})([A-Z0-9]{12,27})""".r

  def unapply( code: String ): Option[(String, String)] =
    code match {
      // The `check` part is only used for checksum
      // validation, no need to return it
      case IBANPattern( country, check, account )
        if checksumIsValid( country, check, account ) =>
          Some( (country, account) )
      case _ => None
    }

  private def checksumIsValid(
    country: String,
    check:   String,
    account: String
  ): Boolean = {
    // Perform checksum validation here...
    ??? // placeholder so that the sketch compiles
  }
}
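
To see the format check and the decomposition in isolation, here is a small runnable sketch that uses the same regular expression but leaves out the checksum step. `IbanFormat` and `decompose` are hypothetical names used only for this illustration:

```scala
object IbanFormat {
  // Same pattern as in the extractor: country code, check digits, account part.
  // A Regex used as a pattern only matches when the whole string matches.
  private val IBANPattern = """([A-Z]{2})(\d{2})([A-Z0-9]{12,27})""".r

  // Hypothetical helper: returns the three raw parts,
  // or None when the string does not have the expected format.
  def decompose(code: String): Option[(String, String, String)] =
    code match {
      case IBANPattern(country, check, account) => Some((country, check, account))
      case _                                    => None
    }
}
```

With this, `IbanFormat.decompose("GB82WEST12345698765432")` yields `Some(("GB", "82", "WEST12345698765432"))`, while `IbanFormat.decompose("invalidIBAN")` yields `None`.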

This extractor makes it possible to match a string and, if it is a valid IBAN, extract its components. Pattern matching in this situation makes the handling of valid and invalid IBANs very explicit:

// Assume userInput is a user provided string,
// like "GB82WEST12345698765432" or "invalidIBAN"
userInput match {
  case IBAN( country, account ) =>
    // ...valid IBAN, do something with it
  case _ =>
    // ...invalid IBAN, provide proper feedback.
}

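Note how the invalid case now sits right next to the valid one, which makes it hard to forget. By default the compiler does not check matches on plain strings for exhaustiveness, but if we left out the fallback case, an invalid input would fail loudly with a MatchError at runtime instead of silently being treated as valid. Also, we can easily handle accounts from different countries separately: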

userInput match {
  case IBAN( "DE", account ) =>
    // ...valid German IBAN
  case IBAN( "IT", account ) =>
    // ...valid Italian IBAN
  case IBAN( country, account ) =>
    // ...valid IBAN from another country
  case _ =>
    // ...invalid IBAN
}

The code that handles validation and decomposition stays in one single place, responsible only for these two operations, which always go together. We also removed the need for a wrapper class, so we only work with plain strings. Freed from validation and decomposition concerns, the rest of the application code also becomes very legible, with a clear and symmetric way to handle invalid input.
