I recently started the scodec project to simplify handling of binary data in Scala. From the project page:

This library focuses on contract-first and pure functional encoding and decoding of binary data. The following design constraints are considered:

- Binary structure should mirror protocol definitions and be self-evident under casual reading
- Mapping binary structures to types should be statically verified
- Encoding and decoding should be purely functional
- Failures in encoding and decoding should provide descriptive errors
- Compiler plugin should not be used
As a result, the library is implemented as a combinator based DSL. Performance is considered but yields to the above design constraints.

This article will introduce some of the main abstractions in scodec and discuss some of the design decisions. Subsequent articles will discuss more interesting parts of scodec, including invariant functors and use of Shapeless.

## Terminology

*encoding*- converting a value in to its binary representation*decoding*- converting binary data in to a value*codec*- a value that can encode and decode values of a given Scala type

## Representing Binary

Binary data is passed around a lot in scodec. As such, a representation of binary data is needed. The representation must satisfy a number of constraints:

- Must be immutable
- Must support bit operations such as
`take(bitCount)`

,`drop(bitCount)`

, and typical bitwise operators (negation, and, or, xor, …) - Must support fast concatenation in order to handle encoding a number of values in to a single binary structure
- Must support fast
`take(bitCount)`

/`drop(bitCount)`

operations in order to allow fast decoding

Focusing on individual bits as the fundamental element instead of bytes is the most unique constraint. By focusing on bits instead of bytes or words, the combinator DSL is able to support a more natural description of bit fields.

`Array[Byte]`

is an unsuitable representation given that it meets none of the stated constraints. `java.nio.ByteBuffer`

does not provide a way to enforce immutable at the type level and fails on the other constraints. `Vector[Byte]`

satisfies all constraints except for providing bit operations.

To satisfy these constraints, scodec includes the `BitVector`

and `ByteVector`

data types and the `BitwiseOperation`

supporting trait. This approach allows the API provided to codec authors to be more focused than generic collections. Additionally, the backing implementation of `BitVector`

and `ByteVector`

can be changed for performance reasons without impacting users.

### BitwiseOperations

`BitwiseOperations`

is a supporting trait that’s extended by both `BitVector`

and `ByteVector`

. It provides bitwise shifting operators as well as bitwise negation, bitwise and, bitwise or, and bitwise xor.

```
trait BitwiseOperations[Repr] {
final def <<(n: Int): Repr = leftShift(n)
def leftShift(n: Int): Repr
final def >>(n: Int): Repr = rightShift(n, true)
final def >>>(n: Int): Repr = rightShift(n, false)
def rightShift(n: Int, signExtension: Boolean): Repr
final def unary_~(): Repr = not
def not: Repr
final def &(other: Repr): Repr = and(other)
def and(other: Repr): Repr
final def |(other: Repr): Repr = or(other)
def or(other: Repr): Repr
final def ^(other: Repr): Repr = xor(other)
def xor(other: Repr): Repr
}
```

### BitVector

`BitVector`

represents an immutable sequence of bits. By extending the `IndexedSeqOptimized[Boolean, BitVector]`

trait from the Scala collection library, a bit vector can be used as a collection of booleans. As a bonus, most of the implementation is provided by `IndexedSeqOptimized`

- albeit, sometimes inefficiently.

```
trait BitVector extends IndexedSeqOptimized[Boolean, BitVector] with BitwiseOperations[BitVector] {
...
def toByteVector: ByteVector
def toByteArray: Array[Byte]
def toByteBuffer: ByteBuffer
}
```

Constructing a bit vector can be done in a number of ways. The `apply`

method on the bit vector companion allows direct wrapping of a `ByteVector`

, `Array[Byte]`

, or `java.nio.ByteBuffer`

. To create a bit vector from literal bytes, the literal bytes can be passed directly to apply. For example, `BitVector(0x55, 0x2a)`

.

When decoding, many codecs need to take a certain number of bits and convert them to a value. For example, a codec for a signed 32-bit integer might take 32 bits and return those bits converted to an `Int`

along with the remaining bits in the input vector (via `input.drop(32)`

). `BitVector#take`

is insufficient for this task because taking `n`

bits from a `m`

bit vector where `n > m`

results in `m`

bits being returned. As a result, every codec would need similar error handling for this case.

To simplify this case, `BitVector`

has an `acquire`

method, which gets a bit vector of exactly `n`

bits or returns an error.

```
def acquire(n: Int): String \/ BitVector = {
if (size < n) \/ left s"cannot acquire $n bits from a vector that contains $size bits"
else \/ right take(n)
}
```

Note the use of Scalaz’s disjunction (i.e., `\/`

) data type, instead of `Option[BitVector]`

or throwing an exception. By providing a disjunction that has a descriptive error message, overall error reporting is improved. (In general, I think Scalaz’s disjunction is one of the most undervalued parts of Scalaz.)

The `acquire`

method only handles getting the necessary bits. Converting the acquired bits to a value may fail. Combining bit acquisition with failure checked type conversion can be refactored in to:

```
def consume[A](n: Int)(decode: BitVector => String \/ A): String \/ (BitVector, A) = for {
toDecode <- acquire(n)
decoded <- decode(toDecode)
} yield (drop(n), decoded)
```

### ByteVector

`ByteVector`

represents an immutable sequence of bytes.

```
trait ByteVector extends IndexedSeqOptimized[Byte, ByteVector] with BitwiseOperations[ByteVector] {
...
def toArray: Array[Byte]
def toByteBuffer: ByteBuffer = ByteBuffer.wrap(toArray)
def toBitVector: BitVector = BitVector(this)
def toHexadecimal: String
}
```

Unsurprisingly, `ByteVector`

is currently implemented by delegating to a `Vector[Byte]`

. There are a number of potential improvements, including:

- for vectors with length <= 8, use a special byte vector implementaiton backed by a single
`Long`

value - delegate to Akka’s
`ByteString`

- write a custom vector-like structure that’s manually specialized for bytes

Before any of these performance improvements are implemented, benchmarks will be developed in order to measure impacts.

## Codecs

With the basic data types out of the way, we can concretize the definition of a codec.

Encoding is represented by a function `A => String \/ BitVector`

. By modeling encoding this way, a failure to encode a value of type `A`

can be communicated without throwing an exception. For example, given a `Codec[Int]`

that encodes unsigned 24-bit integers, calling encode with a negative integer would return an error message on the left side of the disjunction.

Decoding is represented by a function `BitVector => String \/ (BitVector, A)`

. Similar to the rationale for encoding, a failure
to decode a value of type `A`

results in an error message on the left side of the disjunction. Successful decoding results in a tuple of the remaining (non-consumed) bits and the decoded value.

```
trait Codec[A] {
def encode(a: A): String \/ BitVector
def decode(b: BitVector): String \/ (BitVector, A)
}
```

### Tuple Codec

Consider a codec for a tuple `(A, B)`

:

```
class TupleCodec[A, B](codecA: Codec[A], codecB: Codec[B]) extends Codec[(A, B)] {
override def encode(t: (A, B)) =
Codec.encodeBoth(codecA, codecB)(t._1, t._2)
override def decode(buffer: BitVector) =
Codec.decodeBoth(codecA, codecB)(buffer)
}
```

Let’s implement `Codec.encodeBoth`

and `Codec.decodeBoth`

.

First, encoding:

```
def encodeBoth[A, B](codecA: Codec[A], codecB: Codec[B])(a: A, b: B): String \/ BitVector =
???
```

Encoding is fairly straightforward. First, `a`

is encoded, then `b`

is encoded, and then the resulting vectors are concatenated. Any error results in the overall encoding failing. This is easily accomplished with a for-comprehension:

```
def encodeBoth[A, B](codecA: Codec[A], codecB: Codec[B])(a: A, b: B): String \/ BitVector =
for {
encA <- codecA.encode(a)
encB <- codecB.encode(b)
} yield encA ++ encB
```

Second, decoding:

```
def decodeBoth[A, B]
(codecA: Codec[A], codecB: Codec[B])
(buffer: BitVector): String \/ (BitVector, (A, B)) = ???
```

Decoding is accomplished by first decoding with `codecA`

, then, assuming successful decoding, decoding the remaining bits with `codecB`

, and finally returning both decoded values in a tuple. Any error results in the overall decoding failing. This could be implemented directly:

```
def decodeBoth[A, B]
(codecA: Codec[A], codecB: Codec[B])
(buffer: BitVector): String \/ (BitVector, (A, B)) = {
val decA = codecA.decode(buffer)
decA flatMap { case (afterA, a) =>
val decB = codecB.decode(afterA)
decB map { case (afterB, b) => (afterB, (a, B)) }
}
}
```

We can do better than this though. Note the type signature of `Codec#decode`

: `BitVector => String \/ (BitVector, A)`

. The `scalaz.StateT`

monad transformer wraps functions of the form `S => F[(S, A)]`

for a monad `F`

. We can set `F`

to `String \/ ?`

resulting in `StateT[({type λ[+α] = Error \/ α})#λ, BitVector, A]`

. This pattern occurs often enough when working with codecs to warrant special support:

```
type DecodingContext[+A] = StateT[({type λ[+α] = Error \/ α})#λ, BitVector, A]
object DecodingContext {
def apply[A](f: BitVector => Error \/ (BitVector, A)): DecodingContext[A] =
StateT[({type λ[+α] = Error \/ α})#λ, BitVector, A](f)
}
```

Returning to `decodeBoth`

, we can use `DecodingContext`

to implement it simply:

```
def decodeBoth[A, B]
(codecA: Codec[A], codecB: Codec[B])
(buffer: BitVector): String \/ (BitVector, (A, B)) = {
for {
a <- DecodingContext(codecA.decode)
b <- DecodingContext(codecB.decode)
} yield (a, b)
}.run(buffer)
```

## Wrap up

This article presented the basic data types used in scodec. In part 2, we’ll look at the `Codec`

trait in more detail and investigate how to convert a `Codec[A]`

in to a `Codec[B]`

given some relations between `A`

and `B`

.