Das Miscellany

On The Countdown Numbers Game

2024-04-11T00:00:00+00:00

Game Shows

Ah, TV game shows. We all have our favorites, and one of mine is Countdown. Well, actually I much prefer 8 Out of 10 Cats Does Countdown really. It has all the good stuff from Countdown while also having so much more. It also introduced another game into the mix which I will mention in due course.

Countdown has two main games: The word game and the numbers game, both with two players. The first one is OK but I’ve never really been a word guy. My brain just doesn’t really work that well with words. Ah, but the numbers game … that’s the juice. The contestant in control chooses six of 24 shuffled face-down number tiles, arranged into two groups: 20 “small numbers” (two each of 1 to 10) and four “large numbers” (25, 50, 75 and 100). Then an off-stage computer (triggered via a suitably theatrical button) generates a three digit target number. Then it’s up to the contestants to combine the numbers, along with the standard arithmetic operators (+, -, *, /), in order to come up with an expression that will evaluate to the target number. They have 30 seconds to do this, enforced by the most famous countdown timer jingle in all of TV.
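To make the setup concrete, a round could be dealt with something like the sketch below. The names `deal_round` and `num_large` are made up for illustration, and the target range of 100 to 999 is an assumption (the rules above only say "three digit").

```python
import random

SMALL_TILES = [n for n in range(1, 11) for _ in range(2)]   # Two each of 1 to 10
LARGE_TILES = [25, 50, 75, 100]

def deal_round(num_large, rng=random):
  # Pick num_large large tiles and make the selection up to six with small tiles
  if not 0 <= num_large <= 4:
    raise ValueError('num_large must be between 0 and 4')
  numbers = rng.sample(LARGE_TILES, num_large) + rng.sample(SMALL_TILES, 6 - num_large)
  target = rng.randint(100, 999)    # Assumption: any three digit number is possible
  return target, sorted(numbers)

target, numbers = deal_round(2)
print(f'target = {target}, numbers = {numbers}')
```

Passing a seeded `random.Random` instance as `rng` makes a deal repeatable, which is handy when testing a solver.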

Sometimes the numbers game is easy but sometimes it can be fiendishly hard with a complicated solution. For example, this one …

target  = 952
numbers = 3 6 25 50 75 100

… has only two solutions, but even so, was famously solved by a clever chap in quite some style.

The TV game is a mental challenge but this is the sort of thing that naturally begs for a computer solver, and that is the subject of today’s post. Onward!

That Other Game

Oh yes, the other game, from Cats Does Countdown. It’s called Carrot in a Box and it’s ace. It was so good in fact that they played it again. And again.

Initial Approach and Attempts

There are no clever shortcut solutions here; this calls for a brute force search through the space of possible expressions with a focus on efficiency. I’ve been thinking about this for some time and have had a few runs at coding something up in Python.

My first attempt worked but was slow. I came up with the idea of writing a Reverse Polish Notation (RPN) calculator and then generating all possible RPN expressions from the six numbers and four arithmetic operators, evaluating each expression in turn and then seeing which ones matched the target number. I thought that RPN would be useful since it avoids any issues of operator precedence in the evaluation of the expressions. This implementation was very memory inefficient though.
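For anyone who hasn't met RPN, a minimal stack-based evaluator might look something like this. It is an illustrative sketch only, not the code from that first attempt; the name `eval_rpn` is made up here.

```python
def eval_rpn(tokens):
  # Evaluate a list of RPN tokens (numbers and operator strings) using a stack
  ops = {
    '+': lambda a, b: a + b,
    '-': lambda a, b: a - b,
    '*': lambda a, b: a * b,
    '/': lambda a, b: a / b,
  }
  stack = []
  for token in tokens:
    if token in ops:
      rhs = stack.pop()       # The most recent operand is the right-hand side
      lhs = stack.pop()
      stack.append(ops[token](lhs, rhs))
    else:
      stack.append(token)
  if len(stack) != 1:
    raise ValueError('malformed RPN expression')
  return stack[0]

# (((((3 + 100) * 6) * 75) / 50) + 25) written as RPN
print(eval_rpn([3, 100, '+', 6, '*', 75, '*', 50, '/', 25, '+']))
```

No brackets and no precedence rules are needed: the order of the tokens alone determines the order of evaluation.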

Then I found some Countdown Numbers Game solvers online (e.g. this one) and I saw that they could solve the game almost instantly. That one was implemented in JavaScript and I could find the code with a little “view page source” action. It gave me some ideas for a different approach which I then ran with in order to come up with a much faster solver. And I learnt some things along the way.

My Solution

My new solution is still brute force but it involves a much simpler recursive search. Here’s the core function …

#
# The main recursive routine to search the space of possible expressions and find those that evaluate
# to the target
#
def _find_expressions_for_target(expressions, target, found_fn):
  first_expression = expressions[0]
  if _get_expression_value(first_expression) == target:
    if found_fn(first_expression):
      return 1
    
  for i in range(len(expressions) - 1):
    ei = expressions[i]
    for j in range(i + 1, len(expressions)):
      ej = expressions[j]
      for op_key in _operators.keys():
        valid, new_expression = _do_operation(op_key, ei, ej)
        if not valid: continue
        new_expressions = _new_expression_list(expressions, new_expression, i, j)
        if _find_expressions_for_target(new_expressions, target, found_fn):
          return 1

This assumes that the expressions argument contains a list of expressions where each expression is either a number or a tuple of the form (number, operator, lhs_expression, rhs_expression). Nested expressions are supported, and are actually required as part of the logic.
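As a small illustration of that structure, here is how one nested expression could be built by hand. The helper `leaves` is hypothetical and only there to show that the tree can be walked recursively.

```python
# A plain number is an expression, and so is a (value, op, lhs, rhs) tuple
six_minus_two = (4, '-', 6, 2)
quotient = (1, '/', six_minus_two, 4)        # ((6 - 2) / 4) = 1
product = (250, '*', 5, 50)                  # (5 * 50) = 250
solution = (251, '+', quotient, product)     # The whole tree, with its value up front

def leaves(expression):
  # List the original numbers used by an expression, left to right
  if not isinstance(expression, tuple):
    return [expression]
  return leaves(expression[2]) + leaves(expression[3])

print(solution[0], leaves(solution))
```

Keeping the value in slot 0 means a sub-expression never has to be re-evaluated when it is combined with something else.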

I follow the canonical structure of a recursive function …

Check for my base case, i.e. that the first expression in the list has a value equal to the target
Proceed to enumerate all the ways that I can “reduce” the expression list by one (combine two elements via an operation to create a new element and move everything else down) and call myself with each of those smaller lists

I abstracted out what the user wants to do with a found solution expression by requiring them to pass in a function that I will call with the solution expression as its argument. I also allow the user to control whether they are happy with the first solution expression found or whether they want to continue and find all possible solution expressions. If their function returns true (1, True, or any other value that Python considers “true”) then they are “one and done”, whereas if they return false then we keep going.

I store all the possible operators in a dict structure …

def _swap_args(a, b):
  return b, a

def _args(a, b):
  return a, b

#
# A dict to contain info about possible operators
#
_operators = {
  'ADD' : ['+', lambda a, b: a + b, _args],
  'SUB' : ['-', lambda a, b: a - b, _args],
  'SUB2' : ['-', lambda a, b: b - a, _swap_args],    # Subtraction with the args swapped
  'MUL' : ['*', lambda a, b: a * b, _args],
  'DIV' : ['/', lambda a, b: a // b, _args],         # Integer division; only exact divisions are ever performed
  'DIV2' : ['/', lambda a, b: b // a, _swap_args],   # Division with the args swapped
}

… and use a trick that I learnt from the online JavaScript solver that I found, to consider subtraction and division two ways (as “a - b” and “b - a”, and as “a / b” and “b / a”). I store the actual operation logic in the dict value as a lambda and also store a function that will swap the order of the operands if I am using one of my “reverse” sub/div operations.

Then my _do_operation function is this …

#
# Perform an operation, if valid.  Returns a bool (indicating whether the expression was valid) and
# a tuple containing (result, op, lhs, rhs).
#
def _do_operation(op_key, lhs, rhs):
  lhs_value = _get_expression_value(lhs)
  rhs_value = _get_expression_value(rhs)
  if not _is_valid_operation(op_key, lhs_value, rhs_value):
    return 0, None

  op_symbol, op_fn, args_fn = _operators[op_key]
  result = op_fn(lhs_value, rhs_value)
  lhs, rhs = args_fn(lhs, rhs)             # Potentially swap the args, for DIV2 and SUB2
  return 1, (result, op_symbol, lhs, rhs)

… which uses some other helper functions …

#
# Get the value from an expression tuple of the form (value, op, lhs, rhs)
#
def _get_expression_value(expression):
  return expression[0] if type(expression) is tuple else expression


#
# Is a given operation valid according to our use case?
#
def _is_valid_operation(op_key, lhs, rhs):
  # Skip division by zero or any division that results in a remainder
  if(op_key == 'DIV' and (rhs == 0 or lhs % rhs != 0)): return 0
  if(op_key == 'DIV2' and (lhs == 0 or rhs % lhs != 0)): return 0

  # Skip any subtraction that results in a negative number
  if(op_key == 'SUB' and rhs > lhs): return 0
  if(op_key == 'SUB2' and lhs > rhs): return 0

  # It's a legit operation
  return 1

The _new_expression_list function is the important one for recursion …

#
# Create a new list of expressions from the old one, a new expression and the indexes of the args
# to that new expression
#
def _new_expression_list(expressions, new_expression, i, j):
  # Replace the element at position i with the new expression and exclude the element at position j
  result = expressions[0:i] + [new_expression] + expressions[i + 1:j]
  # Concatenate any elements after position j
  if j < len(expressions):
    result += expressions[j + 1:]
  return result

Above I said that I …

… proceed to enumerate all the ways that I can “reduce” the expression list by one (combine two elements via an operation to create a new element and move everything else down) and call myself with each of those smaller lists

At a given level of recursion the expressions argument to the _find_expressions_for_target function references a list of expressions.

expressions = [ e0, e1, e2, ..., eN ]

We will enumerate all pairs of elements of this list with a double loop and then a loop over all possible operators …

  for i in range(len(expressions) - 1):
    ei = expressions[i]
    for j in range(i + 1, len(expressions)):
      ej = expressions[j]
      for op_key in _operators.keys():
        ...

The variable i takes values from 0 (the index of the first element in the list) through one less than the index of the last element in the list, and j takes values from i + 1 (the index of the next element in the list after the ith element) through the index of the last element in the list. As such (i, j) are the indexes of each pair of elements in the list. Then for each of these pairs we combine them using each of the possible operators.
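The double loop produces exactly the index pairs that `itertools.combinations` would. The function name `index_pairs` below is just for illustration.

```python
from itertools import combinations

def index_pairs(n):
  # The same (i, j) pairs as the double loop, for a list of length n
  pairs = []
  for i in range(n - 1):
    for j in range(i + 1, n):
      pairs.append((i, j))
  return pairs

print(index_pairs(4))
print(index_pairs(4) == list(combinations(range(4), 2)))
```

With six numbers there are 15 such pairs, so with six operator keys the top level of the search tries at most 90 combinations before recursing.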

For each ith expression (ei), jth expression (ej) and operator (op), we calculate the result (r = lhs op rhs), where (lhs, rhs) could be (ei, ej) or (ej, ei) depending on the operator (we have those SUB2 and DIV2 operators, remember), and then create a new expression list by replacing ei with [r, op, lhs, rhs] and removing ej. Here are some examples …

expressions     = [ e0, e1, e2, e3, e4, e5, e6 ]
                    ^   ^
                    ei  ej
new_expressions = [ [ r, op, lhs, rhs ], e2, e3, e4, e5, e6 ]


expressions     = [ e0, e1, e2, e3, e4, e5, e6 ]
                    ^       ^
                    ei      ej
new_expressions = [ [ r, op, lhs, rhs ], e1, e3, e4, e5, e6 ]


expressions     = [ e0, e1, e2, e3, e4, e5, e6 ]
                            ^           ^
                            ei          ej
new_expressions = [ e0, e1, [ r, op, lhs, rhs ], e3, e4, e6 ]


expressions     = [ e0, e1, e2, e3, e4, e5, e6 ]
                            ^               ^
                            ei              ej
new_expressions = [ e0, e1, [ r, op, lhs, rhs ], e3, e4, e5 ]

In each case len(new_expressions) is one less than len(expressions). This facilitates our recursion.
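The examples can be checked with a few lines of Python. This sketch uses a compact stand-in for the list-building helper (same behaviour, written as a single expression) and plain strings in place of real expressions.

```python
def new_expression_list(expressions, new_expression, i, j):
  # The new expression takes slot i and slot j is dropped
  return expressions[:i] + [new_expression] + expressions[i + 1:j] + expressions[j + 1:]

expressions = ['e0', 'e1', 'e2', 'e3', 'e4', 'e5', 'e6']
print(new_expression_list(expressions, 'NEW', 0, 1))
print(new_expression_list(expressions, 'NEW', 2, 6))
print(len(expressions), len(new_expression_list(expressions, 'NEW', 2, 5)))
```

The original list is never modified, so each recursive call gets its own smaller copy to work on.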

How Does It Work?

It works pretty well. It’s practically instantaneous to find a single solution to any set of inputs and it’ll find all solutions in a couple of seconds.

Handling found solution expressions is delegated to the user of the module. Initially I wrote a simple CLI that just printed out the solution in each call to my found function. But this resulted in some duplicate expressions. E.g. …

[adavies@bob ~/countdown-numbers]$python3 ./countdown_numbers_solver_cli.py  952 3 6 25 50 75 100 -all
target = 952, numbers = [3, 6, 25, 50, 75, 100]
((((3 * 75) * (6 + 100)) - 50) / 25)
(((((3 + 100) * 6) * 75) / 50) + 25)
(((((3 + 100) * 75) * 6) / 50) + 25)
((((3 + 100) * (6 * 75)) / 50) + 25)
(((3 + 100) * ((6 * 75) / 50)) + 25)
((((3 + 100) * (6 * 75)) / 50) + 25)
(((3 + 100) * ((6 * 75) / 50)) + 25)
(((3 + 100) * ((6 * 75) / 50)) + 25)
((((3 * (6 + 100)) * 75) - 50) / 25)
((((3 * 75) * (6 + 100)) - 50) / 25)
(((3 * ((6 + 100) * 75)) - 50) / 25)
[adavies@bob ~/countdown-numbers]$

You can see in the above that solutions 1 and 10 are the same. Also solutions 7 and 8. Then, solutions 2 and 3, although not having exactly the same infix representation, are logically the same given the associative/commutative properties of multiplication.

How can we get exact duplicates? It’s because we can take different paths through the space to get to the same expression. Here’s an example using different numbers.

target = 251, numbers = [2, 4, 5, 6, 50]

Path 1
  level 1
  expressions: [2, 4, 5, 6, 50]
  i = 0, j = 3, op = SUB2, ei = 2, ej = 6
  2 SUB2 6 = 4
    level 2
    expressions: [[4, -, 6, 2], 4, 5, 50]
    i = 0, j = 1, op = DIV, ei = [4, -, 6, 2], ej = 4
    4 DIV 4 = 1
      level 3
      expressions: [[1, /, [4, -, 6, 2], 4], 5, 50]        <- Different: ((6 - 2) / 4) before (5 * 50)
      i = 1, j = 2, op = MUL, ei = 5, ej = 50
      5 MUL 50 = 250
        level 4
        expressions: [[1, /, [4, -, 6, 2], 4], [250, *, 5, 50]]
        i = 0, j = 1, op = ADD, ei = [1, /, [4, -, 6, 2], 4], ej = [250, *, 5, 50]
        1 ADD 250 = 251
          level 5
          expressions: [[251, +, [1, /, [4, -, 6, 2], 4], [250, *, 5, 50]]]

Path 2
  level 1
  expressions: [2, 4, 5, 6, 50]
  i = 0, j = 3, op = SUB2, ei = 2, ej = 6
  2 SUB2 6 = 4
    level 2
    expressions: [[4, -, 6, 2], 4, 5, 50]
    i = 2, j = 3, op = MUL, ei = 5, ej = 50
    5 MUL 50 = 250
      level 3
      expressions: [[4, -, 6, 2], 4, [250, *, 5, 50]]      <- Different: (5 * 50) before ((6 - 2) / 4)
      i = 0, j = 1, op = DIV, ei = [4, -, 6, 2], ej = 4
      4 DIV 4 = 1
        level 4
        expressions: [[1, /, [4, -, 6, 2], 4], [250, *, 5, 50]]
        i = 0, j = 1, op = ADD, ei = [1, /, [4, -, 6, 2], 4], ej = [250, *, 5, 50]
        1 ADD 250 = 251
          level 5
          expressions: [[251, +, [1, /, [4, -, 6, 2], 4], [250, *, 5, 50]]]

So, how can we deal with this? I don’t want these duplicates. Filtering out actual duplicate expression strings is easy but I don’t like the idea of writing code to apply associative/commutative rules to find the other kind of dupes.
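The easy half, dropping exact repeats while keeping the order in which solutions were found, could be sketched like this. The name `unique_in_order` is made up for the example.

```python
def unique_in_order(strings):
  # dict keys are unique and keep insertion order (Python 3.7+), so first occurrences win
  return list(dict.fromkeys(strings))

found = [
  '((((3 * 75) * (6 + 100)) - 50) / 25)',
  '(((((3 + 100) * 6) * 75) / 50) + 25)',
  '(((((3 + 100) * 75) * 6) / 50) + 25)',
  '((((3 * 75) * (6 + 100)) - 50) / 25)',    # Exact repeat of the first one
]

for s in unique_in_order(found):
  print(s)
```

The exact repeat goes away, but the second and third strings both survive even though they are the same solution with the multiplications reordered. That is the harder kind of duplicate.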

But wait, someone must have written code to do this sort of thing already, right? Yes indeed. Because of the depth of the Python community there happens to be a symbolic math library called SymPy, and it exposes functionality to parse expression strings into a standard expression tree form and also to format those trees in various ways. We can use this to create a canonical representation for each solution expression that can be used as the key in a map to remove duplicates. I love it when someone else solves my problems for me.

We can use it like this …

import countdown_numbers_solver as solver
import sys
import sympy
import os


_solutions = {}    # A dict to eliminate duplicate solution expressions


def _found_expression(expression, only_one):
  infix_expression = _format_expression_infix(expression)
  key = sympy.srepr(sympy.sympify(infix_expression, evaluate = False))    # Canonical expression as key
  if key not in _solutions:
    _solutions[key] = infix_expression
  return only_one


def _is_number(expression):
  return 0 if type(expression) is tuple else 1


def _format_expression_infix(expression):
  if _is_number(expression):
    return expression 
  op_symbol = expression[1]
  lhs = _format_expression_infix(expression[2])
  rhs = _format_expression_infix(expression[3])
  return f'({lhs} {op_symbol} {rhs})'


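def _print_usage():
  # Hypothetical helper: the original is not shown in this post
  print(f'usage: {os.path.basename(sys.argv[0])} <target> <number> [<number> ...] [-all]')


def main():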
  if len(sys.argv) < 3:
    _print_usage()
    return 1
  
  target = int(sys.argv[1])
  numbers = [int(s) for s in sys.argv[2:] if s[0] != '-']
  only_one = 0 if '-all' in sys.argv[2:] else 1

  numbers.sort()    # Don't have to do this but let's be tidy shall we?

  print(f'target = {target}, numbers = {numbers}')

  solver.find_expressions_for_target(numbers, target, lambda e: _found_expression(e, only_one))

  for solution in _solutions.values():
    print(f'{target} = {solution}')
  return 0


if __name__ == "__main__":
  sys.exit(main())

This is much better …

[adavies@bob ~/countdown-numbers]$python3 ./countdown_numbers_solver_cli.py 952 3 6 25 50 75 100 -all
target = 952, numbers = [3, 6, 25, 50, 75, 100]
952 = ((((3 * 75) * (6 + 100)) - 50) / 25)
952 = (((((3 + 100) * 6) * 75) / 50) + 25)
[adavies@bob ~/countdown-numbers]$

As I said above, there are only two solutions for this classic set of numbers, and my code can find both of them in ~3 seconds. Human players get 30.

I’m done with this game for now. I’m off to get two boxes and a carrot.

On Fizz Buzz

2024-03-30T00:00:00+00:00

Coding Interviews

Coding interviews are typically horrible. Rather than evaluating the useful knowledge and experience of a candidate, and their ability to contribute to real world projects, they’re invariably just invitations to write code to solve arbitrary algorithm and data structures puzzles, and under time and situational pressure to boot.

Now don’t get me wrong, such puzzles are fun and there is something to be said for asking a candidate to actually write code as part of an interview; but more often than not, the ability to write code - on demand - to reverse a linked list, is not an accurate indication of someone’s ability to contribute as a member of a dev team in most real world situations. But I digress, this isn’t a post about how to interview more effectively - although I should write such a thing one day - it’s about coding puzzles, and more importantly, abstraction and code style.

Fizz Buzz

A popular, some might say hackneyed, coding interview puzzle is Fizz Buzz. Here’s the ask …

Write code that will print out the numbers 1 to 100, in order, with each number on a new line. However, if the number is divisible by 3 then print out the word ‘Fizz’ instead, if the number is divisible by 5 then print out the word ‘Buzz’ instead, and if the number is divisible by both 3 and 5 then print out the word ‘FizzBuzz’ instead.

Simple enough right? Let’s give it a try, in Python.

for n in range(1, 101):
  if n % 3 == 0 and n % 5 == 0:
    print('FizzBuzz')
  elif n % 3 == 0:
    print('Fizz')
  elif n % 5 == 0:
    print('Buzz')
  else:
    print(n)

This works but it’s not pretty. We repeat the strings to print and have three cases to test. Can we do it with simpler logic? First let’s try to remove the duplication and combine the cases.

for n in range(1, 101):
  if n % 3 == 0:
    print('Fizz', end = '')
  if n % 5 == 0:
    print('Buzz')
  else:
    print(n)

This doesn’t quite meet the spec though. It produces this output …

1
2
Fizz3
4
Buzz
Fizz6
7
8
Fizz9
Buzz
11
Fizz12
13
14
FizzBuzz
16

We need to print a newline after ‘Fizz’ if the number is divisible by 3 but not divisible by 5, and we also need to stop printing the number itself after ‘Fizz’, something like …

for n in range(1, 101):
  if n % 3 == 0:
    print('Fizz', end = '')
    if n % 5 != 0:
      print()
  if n % 5 == 0:
    print('Buzz')
  elif n % 3 != 0:
    print(n)

This does away with the duplicate strings but still has more logic cases than I’d like. Let’s think about this a bit more. We want to print something for each of the numbers 1 through 100, so let’s abstract out that something.

def fizz_buzz(n):
  return f'Output for n = {n}'

for n in range(100):
  print(fizz_buzz(n + 1))

This has a slightly cleaner loop. We don’t need to use two parameters to the range function; just one will give us the numbers 0 to 99, and we can then pass n + 1 into our function. Also, the function doesn’t know or care that its result will be printed out so it doesn’t need to worry about newlines anymore.

We still need the modulo logic in the function though. Let’s see if we can factor out the 3 and 5, the logic is the same regardless of the divisor and the string to return. How do we factor that out? Another function!

def value_if_divisor(n, divisor, value):
  if n % divisor == 0:
    return value
  return ''

We can make this more concise.

def value_if_divisor(n, divisor, value):
  return value if n % divisor == 0 else ''

And use it like this …

def value_if_divisor(n, divisor, value):
  return value if n % divisor == 0 else ''

def fizz_buzz(n):
  ret = value_if_divisor(n, 3, 'Fizz')
  ret += value_if_divisor(n, 5, 'Buzz')
  return ret or str(n)

for n in range(100):
  print(fizz_buzz(n + 1))

But why limit ourselves to just two possible divisors? We could support an arbitrary list.

def value_if_divisor(n, divisor, value):
  return value if n % divisor == 0 else ''

def values_for_divisors_or_n(divisors, values, n):
  ret = ''
  for divisor, value in zip(divisors, values):
    ret += value_if_divisor(n, divisor, value)
  return ret or str(n)

for n in range(100):
  print(values_for_divisors_or_n((3, 5), ('Fizz', 'Buzz'), n + 1))

Note the use of the zip function to take two iterable collections and return a single iterator that gives you a pair of values for corresponding elements in turn. Zip is useful.
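A quick look at what zip produces for the two tuples used above, plus one behaviour worth knowing about: it stops at the shortest input.

```python
divisors = (3, 5)
values = ('Fizz', 'Buzz')

# zip pairs up corresponding elements
pairs = list(zip(divisors, values))
print(pairs)    # [(3, 'Fizz'), (5, 'Buzz')]

# If the inputs differ in length, the extra elements are silently dropped
short = list(zip((3, 5, 7), ('Fizz', 'Buzz')))
print(short)    # [(3, 'Fizz'), (5, 'Buzz')]
```

That second behaviour means a divisor without a matching value is ignored rather than raising an error, which may or may not be what you want.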

Do we always want to concatenate the replacement values? Could we abstract out the combination logic? Sounds like we need another function.

def value_if_divisor(n, divisor, value):
  return value if n % divisor == 0 else ''

def values_for_divisors_or_n(divisors, values, n, combine_fn):
  ret = ''
  for divisor, value in zip(divisors, values):
    ret = combine_fn(ret, value_if_divisor(n, divisor, value))
  return ret or str(n)

def concatenate(a, b):
  return a + b

for n in range(100):
  print(values_for_divisors_or_n((3, 5), ('Fizz', 'Buzz'), n + 1, concatenate))

So now we can use more complex combination logic without having to think about the rest of the logic.

...

def concatenate(a, b):
  sep = '' if a == '' or b == '' else '-'
  return a + sep + b

...
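Filling in the elided parts, a self-contained version with a separator-aware combine function might look like the following. The name `join_with_dash` is made up here to keep it distinct from the plain concatenate.

```python
def value_if_divisor(n, divisor, value):
  return value if n % divisor == 0 else ''

def values_for_divisors_or_n(divisors, values, n, combine_fn):
  ret = ''
  for divisor, value in zip(divisors, values):
    ret = combine_fn(ret, value_if_divisor(n, divisor, value))
  return ret or str(n)

def join_with_dash(a, b):
  # Only add the separator when there is something on both sides
  sep = '' if a == '' or b == '' else '-'
  return a + sep + b

for n in (3, 5, 7, 15):
  print(values_for_divisors_or_n((3, 5), ('Fizz', 'Buzz'), n, join_with_dash))
```

This prints Fizz, Buzz, 7 and Fizz-Buzz in turn. Nothing else in the code had to change to get the new behaviour.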

And for simple logic we can use a lambda.

...

for n in range(100):
  print(values_for_divisors_or_n((3, 5), ('Fizz', 'Buzz'), n + 1, lambda a, b: a + b))

This is pretty generic now but is it efficient? We need to allocate memory to hold the full list of divisors and replacement values. The range function returns a lazy range object rather than a list, so we don’t need to hold all the integers that we want to test in memory at the same time; we just hold each integer as we process it and then forget it.

For extra style points we could take advantage of the fact that print is a variadic function and use a generator expression along with the prefix ‘*’ unpack operator, like this …

print(*(values_for_divisors_or_n((3, 5), ('Fizz', 'Buzz'), n + 1, lambda a, b: a + b)
        for n in range(100)),
      sep = '\n')

… but this would then require all of the elements returned by the values_for_divisors_or_n function to be materialized in memory at the same time, so that we could then pass them all as parameters to the single call to the print function. This would not scale well with larger lists of integers.
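The materialization is easy to see with a small experiment. The names `numbers` and `count_args` below are made up for the demonstration; `count_args` stands in for any variadic function such as print.

```python
produced = []

def numbers(limit):
  # A generator that records each value at the moment it is produced
  for n in range(1, limit + 1):
    produced.append(n)
    yield n

def count_args(*args):
  # By the time this body runs, every argument is already sitting in the args tuple
  return len(args), len(produced)

print(count_args(*numbers(5)))    # (5, 5)
```

All five values have been generated before the function body starts, so the laziness of the generator buys nothing once it is unpacked with ‘*’.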

So, let’s stick with calling print once for each integer. Here’s our final - generic and neat - Fizz Buzz solution.

def value_if_divisor(n, divisor, value):
  return value if n % divisor == 0 else ''

def values_for_divisors_or_n(divisors, values, n, combine_fn):
  ret = ''
  for divisor, value in zip(divisors, values):
    ret = combine_fn(ret, value_if_divisor(n, divisor, value))
  return ret or str(n)

for n in range(100):
  print(values_for_divisors_or_n((3, 5), ('Fizz', 'Buzz'), n + 1, lambda a, b: a + b))

So when the next tech interviewer asks you the Fizz Buzz question you can really impress them.

On Ants, Boxes and Straight Lines

2023-08-25T00:00:00+00:00

A Puzzle

I found this puzzle the other day, posted on Facebook by Wait But Why. It intrigued and entertained me, and so I wanted to share it here. Here’s the description …

“An ant is inside a box (see image above), 1cm from the bottom on the left face, and wants to go eat a drop of honey on the opposite side, 1cm from the top. Both the ant and the honey drop are exactly 6cm from the front and back walls. Crawling on the inside of the box, what’s the shortest distance the ant needs to travel to get to its prize?”

When presented with this most people start with a strategy like …

Walk 1 cm down to the middle bottom of the left face
Walk 30 cm across the middle of the bottom face
Walk 11 cm up the right face
Enjoy that lovely sweet honey dude

This makes for a total journey of 42cm. But this is not the shortest route, the ant can do better.

OK, how about some diagonals?

Walk 6cm across the left face, parallel to the lower face, to get to a point 1cm up from the front, lower, left corner
Walk diagonally across the front face to a point 1cm down from the front, top, right corner ($\sqrt{30^2 + 10^2}$ cm)
Walk 6cm across the right face, parallel to the lower face, to get to the honey
Honey, nom, nom!

This makes for a total journey of $12 + \sqrt{30^2 + 10^2} \approx 43.6$ cm. Hmm, that’s even longer.

This is all a bit random though, there must be a systematic way to think about this. Straight lines! The shortest distance between two points is a straight line. But that only works on “flat” surfaces. What’s a straight line on the inside of a box?

We need to transform our box into a flat surface. Let’s imagine that it was a real cardboard box so that we could cut it along some of its edges and then “unroll” it to get …

Aha! Now there is an obvious straight line path from the ant to the honey …

This path is $\sqrt{32^2 + 24^2} = 40$ cm long. Better! And this is indeed the shortest path.
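The three route lengths can be checked with a few lines of Python. The box dimensions (30 x 12 x 12 cm) are an assumption inferred from the distances quoted above, and the split of the unfolded path into 1 + 30 + 1 by 6 + 12 + 6 is one reading of the unrolled box that is consistent with the 32 and 24 in the formula.

```python
import math

# Assumed box dimensions, inferred from the route lengths above
LENGTH, WIDTH, HEIGHT = 30, 12, 12

# Route 1: down the left face, along the bottom, up the right face
down_across_up = 1 + LENGTH + (HEIGHT - 1)

# Route 2: sideways to the front edge, diagonally across the front face, sideways to the honey
via_front_face = WIDTH / 2 + math.hypot(LENGTH, HEIGHT - 2) + WIDTH / 2

# Route 3: a single straight line on the unrolled box
unfolded = math.hypot(1 + LENGTH + 1, WIDTH / 2 + HEIGHT + WIDTH / 2)

print(down_across_up, round(via_front_face, 1), unfolded)
```

This gives 42, roughly 43.6 and 40 cm respectively, matching the figures in the text.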

So what does this look like on the box? Well, it looks like this …

And how would we write down a description of the route? The strategy is essentially to “cut corners”, but exactly how much? We can easily calculate the coordinates of the way points, and then just tell the ant to walk in a straight line to each point in turn, but something tells me that he won’t really listen and will just do his own thing. There’s no telling these ants.

Anyway, I thought this was a neat little exercise that required some “outside of the box” thinking, pun very much intended.

On Macs

2022-10-09T00:00:00+00:00

A New Macbook Pro

I used to like Macs. I still do, but I used to too. I was a PC guy for a long time (it was just what I was used to) but I finally took the plunge and have loved Macs ever since.

My first was a 2015 Macbook Pro 13. That was the one that hooked me with its sublime trackpad. Oh the trackpad … it was so smooth. Then I traded up to a 2017 Macbook Pro 15, the first with the Touch Bar. That was a gimmick that never really seemed that useful to me. In fact I think it was a huge mistake that Apple got rid of the physical function keys and especially the escape (Esc) key at the same time. Plus the keyboard wasn’t that good. The keys just weren’t tactile enough.

Keyboards are important because they are an essential part of the interface between the user and the computer, along with the mouse, trackpad, nub or whatever. If you type a lot, and we developer/admins do type a lot, then you care about the quality and feel of your primary typing tool and will want the best. People, myself included, pay good money for custom external keyboards that feel (and sound) just right.

I didn’t love that Macbook Pro but it was still way better than any PC laptop that I saw around. It was still serving me well but then I started seeing the new Apple Silicon products and I was intrigued. I got a new M1 Macbook Air for my son, who had been happily using the old 2015 13” Macbook Pro for school for some time, and I saw that it was good. When they launched the 14” and 16” Macbook Pros with M1 chips I was about ready to consider an upgrade. And here I am typing this post on my new Space Grey 16” M1 Macbook Pro.

It’s nice.

The keyboard is so much better. It’s tactile and feels good. There’s no more Touch Bar and we have real function/Esc keys again. Power users want function keys! The trackpad is still sublime, the battery life is good and the overall performance of the computer is just great. This is the best laptop that you can buy. I can see myself using this as my primary personal computer for some time.

Now, this isn’t the only computer that I will use. I will continue to use a Windows desktop as my main work computer and I will also continue to use many remote servers, both Windows and Linux. And this brings me to another point about the human computer interface and keyboards, namely that they are all subtly different. A Mac keyboard is not the same as a PC keyboard and also the OS keyboard conventions are different too, as are conventions in the Linux shell and cmd/PowerShell. We power users use keyboard shortcuts to navigate through all sorts of environments: the shell, a text editor, apps (e.g. Excel), etc. Moving from one environment to another requires challenging muscle memory gymnastics. I try to set things up so that the keyboard environments are as similar as possible. It’s impossible to get things to work the same everywhere but we can come up with a good compromise that I can live with.

How to get there is something that deserves its own post.

On Getting Back on the Horse ... Again

2022-09-15T00:00:00+00:00

I’ve written about this before but it seems that I need to write about it again. It takes discipline to keep up a blog and clearly I don’t have any. I could blame COVID but given all that time in lockdown one would imagine that I would have had “more” time to write. I guess I got too used to binge watching shows on streaming services and drinking. Sigh.

OK, let’s get this thing started up again. I’ve got a bunch of topics in mind to write about, and I’ll try not to get distracted by anything; like my long put off project to build a big wooden badger …

On Perspective

2021-08-22T00:00:00+00:00

A long time ago in a galaxy far, far away (Tokyo, to be precise) my wife and I welcomed our first born into the world. His name is Edward but from the moment he first entered our lives he somehow became Ted/Teddy and such he remains to this day. For my birthday that year My Lovely Wife^(tm) gave me one of the best presents I have ever received, a little homemade photo album called “The Tao of Ted”, a collection of cute pictures of the boy accompanied by quotes from all sorts of people on the nature of life and happiness. I don’t think she knew just exactly how profound it was at the time but it encapsulates a collection of simple truths (perspectives) that we should all embrace.

It popped back up on my desk the other day after some tidying up around the house and I was reminded of just how special it is. It deserves to be held up and for its wisdom to be shared. Listen to The Ted, he knows what he’s talking about.

The Tao of Ted

Reflections on a simple, happy life
by Ted, aged 6 1/2 months

Here's looking at you Dad!

My father used to play with my brother and me in the yard.
Mother would come out and say "You're tearing up the grass!"
"We're not raising grass," Dad would reply. "We're raising boys."
~ Harmon Killebrew

Nothing quite beats a nap
~ Ted

Once my fingers were the most fascinating thing I'd ever felt
~ Ted

A father carries pictures where his money used to be
~ Unknown

When there's nothing but mess all around you, it helps to smile
~ Ted

My riches consist not in the extent of my possessions but in the fewness of my wants
~ J. Brotherton

Things turn out the best for people who make the best out of the way things turn out
~ Art Linkletter

To be upset over what you don't have is to waste what you do have
~ Ken Keyes Jr.

We are all in the gutter but some of us are looking at the stars
~ Oscar Wilde

He didn't tell me how to live; he lived, and let me watch him do it
~ Clarence Budington Kelland

If you don't get everything you want, think of the things that you didn't get that you didn't want
~ Unknown

A happy person is not a person with a certain set of circumstances but rather a person with a certain set of attitudes
~ Hugh Downs

I am an optimist. It does not seem too much use being anything else.
~ Winston Churchill

Enjoy the little things, for one day you may look back and realize they were the big things
~ Robert Brault

If you can't see things clearly, try looking at them upside down
~ Ted

Let us rise up and be thankful, for if we didn't learn a lot today, at least we learnt a little; and if we didn't learn a little, at least we didn't get sick; and if we got sick, at least we didn't die; so let us all be thankful.
~ The Buddha

Surprises can be fun!
~ Ted

It's all about perspective
~ Ted

"It's snowing still" said Eeyore gloomily.
"So it is"
"And freezing"
"Is it?"
"Yes," said Eeyore. "However," he said, brightening up a little, "We haven't
had an earthquake lately."
~ A. A. Milne

On a Tricky Derivative

2021-04-27T00:00:00+00:00

If I have a function, f, defined as …

\[f(x) = x^x\]

Then what is its derivative $f'(x)$?

Well, the answer is $f'(x) = x^x(1 + ln(x))$ but how would we figure that out?

Tools

How to start here? None of our basic rules for derivatives apply. This isn’t a polynomial; it doesn’t involve trigonometric functions; it’s not a traditional exponential with a constant base; it doesn’t involve logarithms.

Well, the answer does involve logarithms and we’ll soon see why. Logarithms invariably show up when dealing with exponential things. What other tools are in our toolbox for calculating derivatives? The product and quotient rules, the chain rule, etc. Do these help? They do actually and we can see how by introducing some other functions.

Approach #1

As an aside, let’s consider two new functions, g and h …

\[g(x) = ln(x)\] \[h(x) = g(f(x))\]

We know the derivative of the natural logarithm function and we can apply the chain rule to calculate the derivative of the composition of two functions. So …

\[g'(x) = \frac{1}{x}\] \[h'(x) = g'(f(x))f'(x)\]

Combining the two we get …

\[h'(x) = \frac{f'(x)}{f(x)} = \frac{f'(x)}{x^x}\]

We also know, from its definition, that …

\[h(x) = ln(x^x)\]

And we can apply one of the rules of logarithms to rewrite this as …

\[h(x) = xln(x)\]

Now we can apply the product rule for derivatives, namely, that if $c(x) = a(x)b(x)$ then …

\[c'(x) = a'(x)b(x) + a(x)b'(x)\]

To calculate $h'(x)$ as the following, using $a(x) = x$ and $b(x) = ln(x)$, …

\[h'(x) = 1\cdot{ln(x)} + x\cdot{\frac{1}{x}}\]

Which simplifies to …

\[h'(x) = ln(x) + 1\]

Now let’s put things together. We have …

\[h'(x) = 1 + ln(x)\]

And

\[h'(x) = \frac{f'(x)}{x^x}\]

And so …

\[\frac{f'(x)}{x^x} = 1 + ln(x)\]

And finally …

\[f'(x) = x^x(1 + ln(x))\]

QED

Approach #2

Let’s rewrite $f(x)$ as …

\[f(x) = e^{ln(x^x)} = e^{xln(x)}\]

Now we can apply the chain rule directly with $f(x) = g(h(x))$, $g(x) = e^x$ and $h(x) = xln(x)$ to get …

\[f'(x) = g'(h(x))h'(x)\] \[g'(x) = e^x\]

And via the product rule …

\[h'(x) = 1 + ln(x)\]

So …

\[f'(x) = e^{xln(x)}(1 + ln(x)) = x^x(1 + ln(x))\]

QED
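As a quick sanity check on both derivations, here is a minimal Python sketch that compares the closed form $x^x(1 + ln(x))$ against a numerical central-difference estimate. The function names and the step size `h` are my own choices for illustration.

```python
import math


def f(x):
    """The function under study: x to the power x (for x > 0)."""
    return x ** x


def f_prime(x):
    """Closed-form derivative derived above: x^x * (1 + ln x)."""
    return x ** x * (1 + math.log(x))


def central_difference(func, x, h=1e-6):
    """Numerical estimate of func'(x) using a symmetric difference."""
    return (func(x + h) - func(x - h)) / (2 * h)


for x in (0.5, 1.0, 2.0, 3.0):
    print(x, f_prime(x), central_difference(f, x))
```

The two columns should agree to several decimal places, which is about as much as a finite-difference estimate can promise.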

Comments on the function

The function $f(x) = x^x$ is an interesting one. If we think of it as a function from the real numbers to the real numbers then its domain obviously includes $x \in (0, \infty)$. For all such real values it has a real value. Clearly $\lim_{x\to\infty} f(x) = \infty$ and its minimum value is where $f'(x) = 0$, i.e. where $x^x(1 + ln(x)) = 0$. Since $x^x > 0$ this requires $ln(x) = -1$, so $x = \frac{1}{e}$. At this value of $x$, $f(x) = (\frac{1}{e})^{\frac{1}{e}} \approx 0.6922$. For $x \in (0, \frac{1}{e})$, $f'(x)$ is negative and so the value of $f(x)$ is decreasing.

So $f(x)$ decreases as $x$ increases over $x \in (0, \frac{1}{e})$ to a minimum value of $(\frac{1}{e})^{\frac{1}{e}}$ for $x = \frac{1}{e}$, and then increases as x increases over $x \in (\frac{1}{e}, \infty)$.
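The location of the minimum can be checked numerically. This is a rough sketch that scans a grid of $x$ values (the grid spacing is an arbitrary choice of mine) and compares the best grid point with $\frac{1}{e}$.

```python
import math


def f(x):
    """x to the power x, for x > 0."""
    return x ** x


# Scan x = 0.001, 0.002, ..., 3.000 and keep the x with the smallest f(x).
grid = [i / 1000 for i in range(1, 3001)]
best_x = min(grid, key=f)

x_min = 1 / math.e
print("grid minimum near x =", best_x)
print("1/e                 =", x_min)
print("f(1/e)              =", f(x_min))
```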

But what about $x = 0$? What is the value of $0^0$? Or more precisely what is $\lim_{x \to 0+} x^x$?

This is the same as asking what is $\lim_{x \to 0+} e^{xln(x)}$?

Because $e^x$ is such a well behaved continuous function we can say that …

\[\lim_{x \to 0+} e^{xln(x)} = e^{ \lim_{x \to 0+} xln(x) }\]

So let’s focus on the limit in the exponent now, i.e. $\lim_{x \to 0+} xln(x)$. We can rewrite this as …

\[\lim_{x \to 0+} xln(x) = \lim_{x \to 0+} \frac{ln(x)}{\frac{1}{x}}\]

Now, recall L’Hôpital’s rule that states that for sufficiently well behaved functions $f(x)$ and $g(x)$ …

\[\lim_{x \to 0+} \frac{f(x)}{g(x)} = \lim_{x \to 0+} \frac{f'(x)}{g'(x)}\]

We can apply this to our limit above and differentiate the numerator and denominator to see that …

\[\lim_{x \to 0+} \frac{ln(x)}{\frac{1}{x}} = \lim_{x \to 0+} \frac{\frac{1}{x}}{\frac{-1}{x^2}} = \lim_{x \to 0+} -x = 0\]

So …

\[\lim_{x \to 0+} x^x = \lim_{x \to 0+} e^{xln(x)} = e^{ \lim_{x \to 0+} xln(x) } = e^0 = 1\]

And finally we can extend the domain of our function to include $x = 0$ and say that $f(0) = 1$.
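The limit is easy to watch numerically. The sketch below (the sample points are just powers of ten that I picked) tabulates $xln(x)$ and $x^x$ as $x$ shrinks towards $0$ from the right.

```python
import math

# Each entry is (x, x * ln(x), x ** x) for x = 1e-1, 1e-2, ..., 1e-8.
samples = []
for k in range(1, 9):
    x = 10.0 ** -k
    samples.append((x, x * math.log(x), x ** x))

for x, exponent, value in samples:
    print(f"x = {x:.0e}   x*ln(x) = {exponent:.3e}   x^x = {value:.10f}")
```

The middle column heads to $0$ and the last column heads to $1$, in line with the L’Hôpital argument.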

This is summarized in the chart from the top of this post, shown again below …

Now It All Gets a Bit Complex …

But what about values of $x^x$ for $x$ less than $0$? For the negative integers the function’s value is still real …

\[\frac{1}{(-1)^1}, \frac{1}{(-2)^2}, \frac{1}{(-3)^3}, \frac{1}{(-4)^4}, \frac{1}{(-5)^5}, ...\]

I.e.

\[-1, \frac{1}{4}, -\frac{1}{27}, \frac{1}{256}, -\frac{1}{3125}, ...\]

Or more generally, for $x = -n$ where $n$ is a positive integer …

\[(-1)^n\frac{1}{n^n}\]

But what about for non-integer negative values of $x$? Here the function is not defined over the real numbers but it is defined over the complex numbers. When we step into the complex plane we see a beautiful continuous spiral appear with the complex value of $x^x$ rotating clockwise around the negative real axis for increasing negative real values of $x$. The top and bottom points on this spiral correspond to a zero imaginary component and the real values that we saw above for negative integer values of $x$.
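To poke at the complex values directly, here is a small Python sketch. It assumes we define $x^x$ as $e^{xln(x)}$ using the principal branch of the complex logarithm (that branch choice is my assumption; other branches give other values), and the function name is just for illustration.

```python
import cmath


def x_to_the_x(x):
    """Evaluate x^x as exp(x * log(x)) using the principal complex logarithm."""
    z = complex(x)
    return cmath.exp(z * cmath.log(z))


# Negative integers: the imaginary part vanishes (up to rounding error).
for n in range(1, 6):
    print(-n, x_to_the_x(-n))

# A non-integer negative value: the result is genuinely complex.
print(-0.5, x_to_the_x(-0.5))
```

For the negative integers the imaginary part is zero to within floating point noise and the real part matches $(-1)^n\frac{1}{n^n}$.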

The following chart illustrates this …

Quite beautiful.

On a Cool Logarithm Identity

2020-05-14T00:00:00+00:00

What is the value of this sum?

\[\frac{1}{\log_{2}(100!)} + \frac{1}{\log_{3}(100!)} + \frac{1}{\log_{4}(100!)} + \ldots + \frac{1}{\log_{100}(100!)}\]

Well, it turns out that the answer is just $1$. Isn’t that cool. Let’s prove it.

Prep

Let’s recall some basic logarithmic identities. First the relationship between logs of different bases …

\[\log_{b}(a) = \frac{\log_{c}(a)}{\log_{c}(b)}\]

And then the relationship between the log of the product of two numbers and the logs of those numbers …

\[\log(ab) = \log(a) + \log(b)\]

Proof

We are asked to show that …

\[\frac{1}{\log_{2}(100!)} + \frac{1}{\log_{3}(100!)} + \frac{1}{\log_{4}(100!)} + \ldots + \frac{1}{\log_{100}(100!)} = 1\]

We know that …

\[\log_{b}(a) = \frac{\log_{c}(a)}{\log_{c}(b)}\]

So …

\[\log_{2}(100!) = \frac{\log(100!)}{\log(2)}\]

Where $\log$ here represents log to the base $e$.

Therefore …

\[\frac{1}{\log_{2}(100!)} = \frac{\log(2)}{\log(100!)}\]

Similarly …

\[\frac{1}{\log_{3}(100!)} = \frac{\log(3)}{\log(100!)}\] \[\frac{1}{\log_{4}(100!)} = \frac{\log(4)}{\log(100!)}\] \[\dots\] \[\frac{1}{\log_{100}(100!)} = \frac{\log(100)}{\log(100!)}\]

So the left hand side of the original expression is …

\[\frac{\log(2) + \log(3) + \log(4) + \ldots + \log(100)}{\log(100!)}\]

We also know that …

\[\log(a) + \log(b) = \log(ab)\]

So …

\[\log(2) + \log(3) + \log(4) + \ldots + \log(100) = \log(2 \cdot 3 \cdot 4 \cdot \ldots \cdot 100) = \log(100!)\]

Plugging this back into the left hand side of the original expression we get …

\[\frac{\log(100!)}{\log(100!)}\]

Which is equal to $1$. Therefore …

\[\frac{1}{\log_{2}(100!)} + \frac{1}{\log_{3}(100!)} + \frac{1}{\log_{4}(100!)} + \ldots + \frac{1}{\log_{100}(100!)} = 1\]

QED
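For the skeptical, the identity can also be checked numerically. This is a minimal Python sketch; the function name is mine, and the result is only equal to $1$ up to floating point rounding.

```python
import math


def reciprocal_log_sum(n):
    """Sum of 1 / log_k(n!) for k = 2, 3, ..., n."""
    factorial = math.factorial(n)
    return sum(1 / math.log(factorial, base) for base in range(2, n + 1))


print(reciprocal_log_sum(100))
print(reciprocal_log_sum(10))
```

Nothing in the proof is special to $100$, so the same sum comes out as $1$ for any $n \geq 2$.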

On a Neat Excel Formula

2020-05-06T00:00:00+00:00

Tools

Excel is one of the greatest pieces of software ever written. There, I said it. I love Excel and use it all the time, even for things that it probably shouldn’t be used for. It was once said that “The solution to any problem is an Excel spreadsheet” and I sometimes think I hew too closely to that adage. You can do all sorts of things in Excel even if it isn’t immediately obvious how to do it.

Searching for occurrences of a substring inside a string

Excel has a nifty FIND function that you can use to find the index of the “first” occurrence of a substring inside a string starting from a given index into the string. I use it all the time.

FIND(<find_text>, <within_text>, [<start_index>])

Returns the index (1-based) of the first occurrence of <find_text> within <within_text> starting from the
character at index <start_index>.  If <find_text> is not found then it returns #VALUE!.

<find_text>      Required.  The text to find.
<within_text>    Required.  The text to search.
<start_index>    Optional.  A (1-based) index offset into <within_text> to start the search for <find_text>
                 from.  The default is the first character.

Note that because Excel was written for business users as opposed to programmers, the indexes are all 1-based as opposed to 0-based. Once we get past that little annoyance then we can use it to do neat things. For example, to find the index of the first occurrence of the string value in cell A2 within the string value in cell A1, we would use this …

=FIND(A2, A1)

Nice. However, Excel does not have an equivalent function to find the index of the “last” occurrence of a substring inside a string. It can search from the left but it can’t search from the right. This is annoying. However with some inventive use of a few other Excel functions we can do it. Here’s how …

=IF(ISERROR(FIND(A2, A1)),
    #VALUE!,
    FIND(REPT("~", LEN(A2)),
         SUBSTITUTE(A1, A2, REPT("~", LEN(A2)), (LEN(A1) - LEN(SUBSTITUTE(A1, A2, ""))) / LEN(A2))))

OK, that’s clearly not as simple as FIND, but it works. How does it work though? Let’s break it down.

The outer enclosing IF statement is there to mimic the behavior of the FIND function if the substring isn’t found in the string at all. We use FIND to test for that. If FIND returns #VALUE!, an error, then we return #VALUE! too. Let’s remove that logic and look at what’s left …

FIND(REPT("~", LEN(A2)),
     SUBSTITUTE(A1, A2, REPT("~", LEN(A2)), (LEN(A1) - LEN(SUBSTITUTE(A1, A2, ""))) / LEN(A2)))

We are now using FIND to search for a different substring in a modified version of the string. What’s this all about? The modified version of the string is the result of a call to Excel’s SUBSTITUTE function …

SUBSTITUTE(<text>, <old_text>, <new_text>, [<instance_num>])

<text>           Required.  The text within which you want to substitute characters.

<old_text>       Required.  The text you want to replace.

<new_text>       Required.  The text you want to replace <old_text> with.

<instance_num>   Optional.  Specifies which occurrence of <old_text> you want to replace with <new_text>.  If
                 you specify <instance_num>, only that instance of <old_text> is replaced.  Otherwise, every
                 occurrence of <old_text> in <text> is changed to <new_text>.

We use SUBSTITUTE to create a version of the string that has the last instance of the substring replaced with a different (hopefully unique) substring of the same length. It’s this different substring that we can then search for using the outer FIND call. Note that the outer FIND call is now searching for REPT("~", LEN(A2)) as opposed to just A2.

We hope that a sequence of repeated ‘~’ characters will be a unique substring within the string. This isn’t perfect but it’ll work most of the time. Good enough for government work as they say. Depending on the nature of the strings that you are searching through in a real world application you could change this to something more likely to be unique.

So how do we replace the last instance of the substring in the string? Well, SUBSTITUTE has that optional fourth argument that allows us to specify which instance of <old_text> that we want to replace with <new_text> within <text>. If we know how many instances of substring are in our string then that’s the number that we need to pass as <instance_num>. The formula that we pass to this argument is …

(LEN(A1) - LEN(SUBSTITUTE(A1, A2, ""))) / LEN(A2)

This does indeed calculate the number of instances of substring there are in our string. But how?

We use SUBSTITUTE again. This time to replace “all” instances of our substring with the empty string. We then calculate the length of this modified string and subtract it from the length of the original string. The result is the total number of characters in the n instances of the substring in the string. But we know the length of each substring so we can divide that into this total to get the number of substring instances that there were. Voila!

And there you have it ladies and gentlemen. Excel hackery at its best.
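To make the logic easier to follow outside of Excel, here is a rough Python sketch that mirrors the formula step by step. The helpers `substitute` and `find_last` are hypothetical names of mine that only imitate the parts of SUBSTITUTE and FIND used here; they are not a faithful reimplementation of Excel's functions.

```python
def substitute(text, old_text, new_text, instance_num=None):
    """Imitate SUBSTITUTE: replace every occurrence, or only the n-th one."""
    if instance_num is None:
        return text.replace(old_text, new_text)
    parts = text.split(old_text, instance_num)
    if len(parts) <= instance_num:
        return text  # fewer than instance_num occurrences, nothing to replace
    return old_text.join(parts[:instance_num]) + new_text + parts[instance_num]


def find_last(find_text, within_text):
    """1-based index of the last occurrence of find_text, or None if absent."""
    if not find_text or find_text not in within_text:
        return None  # stands in for the #VALUE! branch of the outer IF
    marker = "~" * len(find_text)  # REPT("~", LEN(A2))
    # (LEN(A1) - LEN(SUBSTITUTE(A1, A2, ""))) / LEN(A2)
    removed = len(within_text) - len(substitute(within_text, find_text, ""))
    count = removed // len(find_text)
    # FIND(marker, SUBSTITUTE(A1, A2, marker, count)), converted to 1-based
    return substitute(within_text, find_text, marker, count).index(marker) + 1


print(find_last("\\", r"C:\some\quite\nested\file\path\foo.txt"))
print(find_last("|||", "foo|||bar|||baz|||boof|||bang|||bung"))
```

Like the formula, this sketch assumes the marker does not already appear in the string.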

Let’s work through some examples to demonstrate how this works.

Example 1

A1: C:\some\quite\nested\file\path\foo.txt
A2: \

We can find the index of that last ‘\’ with …

A3: =IF(ISERROR(FIND(A2, A1)),
        #VALUE!,
        FIND(REPT("~", LEN(A2)),
             SUBSTITUTE(A1, A2, REPT("~", LEN(A2)), (LEN(A1) - LEN(SUBSTITUTE(A1, A2, ""))) / LEN(A2))))

And we can extract the filename with …

A4: =MID(A1, A3 + 1, LEN(A1) - A3)      -> foo.txt

Let’s work this through from the inside out …

SUBSTITUTE(A1, A2, "")                                              -> C:somequitenestedfilepathfoo.txt
LEN(SUBSTITUTE(A1, A2, ""))                                         -> 32
LEN(A1) - LEN(SUBSTITUTE(A1, A2, ""))                               -> 6
(LEN(A1) - LEN(SUBSTITUTE(A1, A2, ""))) / LEN(A2)                   -> 6
SUBSTITUTE(A1, A2, REPT("~", LEN(A2)), 6)                           -> C:\some\quite\nested\file\path~foo.txt
FIND(REPT("~", LEN(A2)), SUBSTITUTE(A1, A2, REPT("~", LEN(A2)), 6)) -> 31

Since the substring that we were searching for was only one character long we could have simplified this a bit to …

A3: =IF(ISERROR(FIND(A2, A1)),
        #VALUE!,
        FIND("~",
             SUBSTITUTE(A1, A2, "~", (LEN(A1) - LEN(SUBSTITUTE(A1, A2, ""))) / LEN(A2))))

Example 2

Of course the substring will not always be just one character.

A1: foo|||bar|||baz|||boof|||bang|||bung
A2: |||

We can find the index of that last ‘|||’ with …

A3: =IF(ISERROR(FIND(A2, A1)),
        #VALUE!,
        FIND(REPT("~", LEN(A2)),
             SUBSTITUTE(A1, A2, REPT("~", LEN(A2)), (LEN(A1) - LEN(SUBSTITUTE(A1, A2, ""))) / LEN(A2))))

And we can extract the last token, this time skipping over the full length of the delimiter, with …

A4: =MID(A1, A3 + LEN(A2), LEN(A1) - A3 - LEN(A2) + 1)      -> bung

Let’s work this through from the inside out …

SUBSTITUTE(A1, A2, "")                                              -> foobarbazboofbangbung
LEN(SUBSTITUTE(A1, A2, ""))                                         -> 21
LEN(A1) - LEN(SUBSTITUTE(A1, A2, ""))                               -> 15
(LEN(A1) - LEN(SUBSTITUTE(A1, A2, ""))) / LEN(A2)                   -> 5
SUBSTITUTE(A1, A2, REPT("~", LEN(A2)), 5)                           -> foo|||bar|||baz|||boof|||bang~~~bung
FIND(REPT("~", LEN(A2)), SUBSTITUTE(A1, A2, REPT("~", LEN(A2)), 5)) -> 30

QED

On Windows Auth and Kerberos

2020-03-17T00:00:00+00:00

Authenticate You I Will. But How?

When you configure a SQL Server instance one choice is what authentication modes it will support. There are two choices: “Windows Authentication mode” and “SQL Server and Windows Authentication mode”, otherwise known as mixed mode. Windows Authentication is always preferred for applications but sometimes you need to support legacy apps or clients that do not natively support Windows Auth. But what is Windows auth? Actually, what is authentication?

Authentication and Authorization

Two terms are often thrown around when it comes to how applications connect to a server/service: authentication and authorization. They are similar but different concepts. Authentication means confirming identity, whereas authorization means confirming access. In even simpler terms, authentication is the process of verifying who someone is, while authorization is the process of determining what someone should have access to.

Authentication is about validating credentials and establishing identity. A system takes the given credentials and then somehow checks whether you are who/what you say you are. These credentials are typically a username and password although there are various other types of credentials and mechanisms of authentication as well.

Authorization occurs after identity has been established. Based on the confirmed identity the system will then grant appropriate access to resources such as databases, files, records, accounts, funds, etc. Not all identities will have access to all resources. For example a system may have the notion of roles that user identities map to: customer, staff, manager, admin, etc.
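As a toy illustration of the authorization step, here is a minimal Python sketch of a role check that runs after identity has been established. The role names come from the example above; the permission names and the `is_authorized` helper are made up for illustration.

```python
# Hypothetical mapping of roles to the permissions they carry.
ROLE_PERMISSIONS = {
    "customer": {"view_own_account"},
    "staff": {"view_own_account", "view_customer_accounts"},
    "manager": {"view_own_account", "view_customer_accounts", "approve_refunds"},
    "admin": {"view_own_account", "view_customer_accounts", "approve_refunds", "manage_users"},
}


def is_authorized(role, permission):
    """Return True if the (already authenticated) role carries the permission."""
    return permission in ROLE_PERMISSIONS.get(role, set())


print(is_authorized("staff", "view_customer_accounts"))
print(is_authorized("customer", "approve_refunds"))
```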

Different Types of Authentication

Single-Factor Authentication

This is the simplest form of authentication scheme. In order to establish identity for potential access to a system a user presents a username (a statement of identity) and a secret token (commonly known as a password). The system takes these two pieces of information and compares the secret token to stored information for that username in order to determine that the user is who they claim to be. If the secret matches the saved secret then the claimed identity is accepted.

A secure system does not store the secret tokens in plaintext in a datastore, rather it stores hashed versions of the secret tokens. By “hashed” I mean the result of passing the secret token through a one-way transformation that reliably maps inputs to outputs (the hash of the input) but for which it is prohibitively hard to take a given output hash value and convert it back into the input that generated that output. The system stores the hashed secret tokens in its data store. When a user presents their username and secret token, the system calculates the hash of that secret token and compares it to the saved value for the presented username. If the values match then identity is established.
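Here is a minimal sketch of that store-and-compare flow in Python. The specifics are my assumptions rather than a description of any particular system: it uses PBKDF2 with SHA-256 from the standard library, adds a random per-user salt (not mentioned above, but common practice), and picks an arbitrary iteration count.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=100_000):
    """Return (salt, iterations, digest) suitable for saving in the data store."""
    if salt is None:
        salt = os.urandom(16)  # random per-user salt (an assumption of this sketch)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest


def verify_password(password, salt, iterations, stored_digest):
    """Hash the presented password the same way and compare with the saved hash."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, stored_digest)  # constant-time comparison


salt, iterations, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, iterations, stored))
print(verify_password("wrong guess", salt, iterations, stored))
```

Only the salt, iteration count and digest are stored; the plaintext password never is.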

Two-Factor Authentication

This scheme requires a two-step verification process which not only requires a username and password, but also an additional piece of information only the user knows or is in possession of. This commonly involves some sort of “callback” to a previously registered device (e.g. a cell phone) with a token that the user then presents back to the system, or a request for a token from a previously configured token generator scheme (e.g. the Google Authenticator app), typically a known pseudo random number generation algorithm with a coordinated seed. Such schemes are becoming increasingly common.
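One widely used token generator scheme of this kind is the time-based one-time password (TOTP) algorithm from RFC 6238, built on HOTP from RFC 4226. The sketch below is a bare-bones standard library version for illustration only; the function names are mine, and a real deployment would also handle secret provisioning, clock drift and rate limiting.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret, counter, digits=6):
    """HMAC-based one-time password (RFC 4226) for a shared secret and counter."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret, at_time=None, step=30, digits=6):
    """Time-based variant (RFC 6238): the counter is the number of elapsed steps."""
    if at_time is None:
        at_time = time.time()
    return hotp(secret, int(at_time) // step, digits)


shared_secret = b"12345678901234567890"  # the test secret used in the RFCs
print(totp(shared_secret))
```

Both sides hold the same secret and read the same clock, so they can compute the same short-lived code independently.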

Multi-Factor Authentication

This is the most advanced method of authentication which requires three or more levels of security from independent categories of authentication in order to establish identity. These “factors” are independent of each other in order to minimize the risk of data exposure.

SQL Server Authentication

SQL Server auth is a simple single-factor authentication scheme. Once a client program has established its initial connection to the SQL Server process (e.g. a TCP socket) then the username and password are sent over the wire for processing. SQL Server hashes the password, compares it to the saved hash for the username and confirms identity if they match. This is not ideal since the application has to have access to both the username and password in order to present them every time it needs to connect to the database server. The application is wholly responsible for securing the username and password. Often this leads to passwords being stored in app configuration files in an insecure way.

Windows Authentication

Windows authentication can actually use various schemes. The two main ones are NTLM and Kerberos. When you connect to SQL Server using “Windows authentication” then you might use either scheme depending on the context. NTLM will be used for systems configured as a member of a workgroup and for local logon authentication on non-domain controllers. In a domain, Kerberos is the default scheme but its use requires that the SQL Server service is running as an account that has the appropriate permissions to domain objects and that the configuration in the domain is correct.

NTLM

NTLM credentials consist of the domain name, the username and a one-way hash of the user’s password. NTLM uses an encrypted challenge/response protocol to authenticate a user without sending the user’s password over the wire. Instead, the system requesting authentication must perform a calculation that proves it has access to the secured NTLM credentials. More details are beyond the scope of this post.

Kerberos

Kerberos is more complex and a complete treatment is definitely beyond the scope of this post. Ultimately it provides a mechanism for mutual authentication between entities before a secure network connection is established. It assumes that transactions between clients and servers take place on an open network where machines are not physically secure, and packets can be monitored and modified at will. It is much more secure than NTLM and should always be preferred.

Did my SQL Server connection Use Kerberos or NTLM?

When you logon to a remote SQL Server instance using Windows Authentication how do you know what authentication scheme was used? You can check this by querying an appropriate dynamic management view (DMV).

SELECT auth_scheme FROM sys.dm_exec_connections WHERE session_id = @@SPID;

This will return either “NTLM” or “KERBEROS”. Now if you think that you should be using Kerberos (because you are connecting to a remote SQL Server instance in a domain environment) but you are not then there are a few things to check.

Kerberos auth requires the registration of appropriate Service Principal Names (SPNs) on appropriate objects in Active Directory. These are service-specific key-value pairs saved as part of the servicePrincipalName attribute on the AD computer object of the machine that is running SQL Server (see below for an exception to this when using AD managed service accounts). If these are not set correctly then Kerberos can’t be used and SQL Server will fall back to using NTLM.

If SPNs are not set correctly you may also see other errors when trying to connect to a SQL Server instance, e.g. “Cannot Generate SSPI Context”.

You can check the SPN configuration using the ADSI Edit app in Windows (adsiedit.msc). Navigate to the CN record for the SQL Server machine then right click and select Properties to bring up the Properties dialog box. In the “Attribute Editor” tab scroll down and find the “servicePrincipalName” entry. Click on “Edit” or “View” (depending on your access) and you should see a collection of values something like this …

Values
...
HOST/servername
HOST/servername.mycompany.com
MSSQLSvc/servername.mycompany.com
MSSQLSvc/servername.mycompany.com:1433
...

You should see lines prefixed with “MSSQLSvc”. These are the SPNs for the SQL Server service running on this machine. If they are missing then something is wrong.
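If you have copied the servicePrincipalName values out of ADSI Edit, a few lines of Python can do the eyeballing for you. This is only a sketch: the helper names are hypothetical, and it assumes the two SPN shapes shown in the listing above (host name alone, and host name plus port 1433).

```python
def expected_mssql_spns(fqdn, port=1433):
    """The two MSSQLSvc SPN shapes shown above for a given fully qualified host name."""
    return [f"MSSQLSvc/{fqdn}", f"MSSQLSvc/{fqdn}:{port}"]


def missing_spns(spn_values, fqdn, port=1433):
    """Return the expected SPNs that do not appear in the copied attribute values."""
    present = {value.strip().lower() for value in spn_values}
    return [spn for spn in expected_mssql_spns(fqdn, port) if spn.lower() not in present]


values = [
    "HOST/servername",
    "HOST/servername.mycompany.com",
    "MSSQLSvc/servername.mycompany.com",
    "MSSQLSvc/servername.mycompany.com:1433",
]
print(missing_spns(values, "servername.mycompany.com"))
print(missing_spns(values[:2], "servername.mycompany.com"))
```

An empty list means both expected entries were found; anything else is a hint that SPN registration did not happen.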

These entries should be automatically added by the SQL Server service when it starts up. However this assumes that the account that the SQL Server service is logging on with has the appropriate rights to set these attributes in Active Directory. If someone has changed the account that the service will run as (e.g. to an incorrectly configured domain service account) then this might be an issue. Check the SQL Server error log for entries that say something like this …

The SQL Server Network Interface library could not register the Service Principal Name (SPN)
[MSSQLSvc/servername.mycompany.com:1433 ] for the SQL Server Service.  Windows return code:
0x2098, state: 15.  Failure to register an SPN might cause integrated authentication to use
NTLM instead of Kerberos.  This is an informational message.  Further action is only required
if Kerberos authentication is required by authentication policies and if the SPN has not been
manually registered.

The account running the SQL Server service has to have the appropriate permissions to write the SPN to AD. The default SQL Server service account (NT Service\MSSQLSERVER) can do this. If you want to use a service account - and in a domain environment you should - then you will need to use an appropriately configured managed service account. See here for more on that.

On the subject of using group managed service accounts, if you are doing that then the SQL Server SPNs will actually be registered on a different object in AD. As opposed to being registered on the CN object for the computer on which a given SQL Server instance is running, any SQL Server instance configured to logon as a group managed service account will actually register its SPN on the CN for the group managed service account. So, using the same terminology from my other blog post, about managed service accounts, if we have three machines (sqlnj01, sqlnj02 and sqlca01) all configured to run the SQL Server service as the managed service account gmsa-sqlag01 then we will not find MSSQLSvc SPNs on the CN objects for sqlnj01, sqlnj02 and sqlca01 in AD. Rather, on the CN object for gmsa-sqlag01 we will see these values under the servicePrincipalName attribute …

Values
...
MSSQLSvc/sqlnj01.mycompany.com
MSSQLSvc/sqlnj01.mycompany.com:1433
MSSQLSvc/sqlnj02.mycompany.com
MSSQLSvc/sqlnj02.mycompany.com:1433
MSSQLSvc/sqlca01.mycompany.com
MSSQLSvc/sqlca01.mycompany.com:1433
...

Happy authenticating!

On AD Managed Service Accounts

2020-03-16T00:00:00+00:00

What Account Should a Windows Service Run As?

Best practice suggests that a given service running on a Windows host should be configured to logon as a dedicated account that has the minimum privileges required to do what the service is supposed to do. Windows has built in accounts that can be used as the principal for services (such as Local System, Local Service, Network Service, etc.) but some of these have elevated access to the local machine and so pose a risk if the service were to be compromised via some attack. You could create a dedicated local machine account to own the service but a more manageable option is to create a service account in the Windows domain. Windows Active Directory actually has special features to manage such domain service accounts including the ability to centrally manage passwords and to automatically change them on a configured schedule. Configuring such accounts involves several steps and there are some wrinkles along the way. I just went through this learning process and I wanted to document my experiences for future reference.

Creating and Configuring a Group Managed Service Account

I don’t believe that you can do all the following steps using the Windows UI. I did all of this using PowerShell.

The first thing to do is logon to a convenient Windows workstation/server that is a member of our domain using an account that has the appropriate admin access to create/configure objects in the domain. Next we need to start a PowerShell session (with elevated privileges) and install a few additional modules …

PS C:\Users\username> Import-Module ServerManager
PS C:\Users\username> Add-WindowsFeature RSAT-AD-PowerShell,RSAT-AD-AdminCenter

If this is the first time you are creating any group managed service accounts you will first have to generate a new root key for the Microsoft Group Key Distribution Service (KdsSvc), like so …

PS C:\Users\username> Add-KdsRootKey -EffectiveImmediately

For the purposes of this example let’s assume that we are creating a service account to own the SQL Server service instances that will be part of a new Availability Group. There will be three machines, two in a data center in NJ and one in a data center in CA; let’s call them sqlnj01, sqlnj02 and sqlca01. Collectively these will be part of an Availability Group that we will call ag01. So let’s call the service account gmsa-sqlag01.

First we need to create a new security group to contain the names of these servers, the computers that will be allowed to use the group managed service account.

PS C:\Users\username> New-ADGroup `
>>> -Name grp-sqlag01 `
>>> -GroupCategory Security `
>>> -GroupScope Global `
>>> -Path "OU=Service Account Groups,OU=Groups,DC=MyCompany,DC=Com" `
>>> -Description "Computers for the gmsa-sqlag01 service account"

Note that we specified the OU where we want the group to live.

Next we add the relevant machine names to the group, like so …

PS C:\Users\username> Add-ADGroupMember -Identity grp-sqlag01 -Members sqlnj01$,sqlnj02$,sqlca01$

Note that you have to add the $ postfix to each machine name.

We can then check the group membership, like so …

PS C:\Users\username> Get-ADGroupMember -Identity grp-sqlag01

Next we create the service account, like so …

PS C:\Users\username> New-ADServiceAccount `
>>> -Name gmsa-sqlag01 `
>>> -DNSHostName gmsa-sqlag01.mycompany.com `
>>> -PrincipalsAllowedToRetrieveManagedPassword grp-sqlag01 `
>>> -Path "OU=Managed Service Accounts,DC=MyCompany,DC=Com"

Note that we specified the OU where we want the service account to live.

Installing a Group Managed Service Account on a Computer

Next we need to install the service account on the three machines where it is going to be used.

Note that if your environment consists of geographically distributed, replicated domain controllers, and you edited the group managed service account settings on a computer in a location far from the computer where you will be installing the service account, then you will have to wait until AD replication has brought the new settings to the domain controller closest to that computer before you execute the following. If any of the following commands raise an error just wait a while and try again.

We can install the service account from the machine where we have been working already, via PowerShell …

PS C:\Users\username> Enter-PSSession -ComputerName <hostname>

Or we can RDP to each machine in turn and start a PowerShell session (with elevated privileges). Either way we need to run this command on each machine …

PS C:\Users\username> Install-ADServiceAccount -Identity gmsa-sqlag01

Now, the above command could return an “Access denied” error. The basic problem is that Windows machines cache the membership of AD security groups. If a machine has cached the membership of the grp-sqlag01 group from before its name was added to the group, then when we try to install the new service account Windows will not know that the machine is allowed to retrieve the managed password. Thus the “access denied” error.

Before we can install the new service account we need to get each machine to reread the associated security group membership from AD. Some articles I found online said that to fix this you had to reboot the server. Now that will work but it seems a little bit extreme, especially if you are trying to install the new service account on a production server. There are better ways. One way is to wait. The cached local security groups will eventually time out. We don’t have to wait though. With some help from this article I learnt about the klist command which is used to list/manage currently cached Kerberos tickets. We can use this command to purge the tickets and by doing that we also clear the locally cached security groups.

So we can run this …

PS C:\Users\username> klist -lh 0 -li 0x3e7 purge

And then rerun the above Install-ADServiceAccount command which should work. We can then test the installation using …

PS C:\Users\username> Test-ADServiceAccount -Identity gmsa-sqlag01

… which should return True.

Configuring the SQL Server Service to Logon as the Service Account

By default, the SQL Server setup program will configure the SQL Server service to logon as a special virtual account called “NT Service\MSSQLSERVER”. In order to change this we need to launch the “Services” app on the machine, right click on the “SQL Server (MSSQLSERVER)” entry in the services list and then click on Properties. We click on the “Log On” tab and then click on the “Browse…” button to launch the “Select User or Service Account” dialog. We click on the “Locations…” button and change the option to “Entire Directory”; we click on the “Object Types…” button and ensure that “Service Accounts” is selected; and then we enter “MyCompany\gmsa-sqlag01$” in the “Enter the object to select” text box. We click “Check Names” to verify that the account can be found and then we click the OK button to return to the “Log On” tab. We clear out the “Password” and “Confirm password” fields like so …

And then finally we click the “OK” button.

We should be informed that “The account MyCompany\gmsa-sqlag01$ has been granted the Log On As A Service right” and then that “The new logon name will not take effect until you stop and restart the service”. Restarting the service is the final step.

We repeat this for the other machines in the group and we are done.

Service Dependencies

One consequence of having a service log on as a managed service account is that a domain controller must be accessible when the service starts up in order for Windows to retrieve the managed password from AD. This means that the network must be up and fully configured in order for a service to start. This makes sense but unfortunately it leads to a problem when a computer reboots, since the Service Control Manager will likely try to start services before the network is itself fully started. This will lead to “automatic” services failing to start after a reboot. Not good.

However, we can fix this by configuring service dependencies.

Run regedit.exe and then navigate to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ folder. Then open the folder for the service that you have configured to log on as a service account (in this example it was MSSQLSERVER) and look for a REG_MULTI_SZ value called DependOnService. Create it if it doesn’t already exist and then edit the value to add W32Time and Netlogon with each value on a new line. Like this …

Click OK and then close regedit. With this done the Windows Service Control Manager will not try to start this service until the Netlogon and W32Time services are started which should ensure that AD is accessible.

On the Structure of Pi

2019-11-03T00:00:00+00:00

Number

Let’s start with a question … What is a number? It’s an innocuous little inquiry but one that has a lot of depth. The answer could be various things: A quantity of stuff, a measure of something, a metric ascribed to some property. At its most basic it could be considered to be a count of things, and that’s probably where we (humble members of species Homo sapiens) first got started with the concept. Gog wanted to communicate to Stig how many rocks he had and so held up a number of fingers (ever present conveniently indicative counting devices) to show him. Thus the concept of number could have pre-dated language.

Conceptualizing a count of things is a natural way to start thinking about numbers. This could start symbolically, by holding up three fingers to indicate three things, but grow into other symbolic representations of a count: a word or an inscribed symbol. Of course there could be various different sounds/symbols that become associated with a given number, each born from their own cultural context, but there is also something universal about what those symbols represent; an overarching three-ness that transcends use of the word three or the symbol 3; or indeed the words “trois” (French), “tres” (Spanish) and “drei” (German); or the symbols III (Roman Numerals), 三 (Kanji) or 11 (binary). Some cultures even use different words to count different types of things, e.g. in Japanese you would say “mittsu” (三つ) when just arbitrarily counting but “san’nin” (三人) to indicate three people. Complex stuff and all born of the different context within which different human groups evolved language and writing. At the end of the day though the concept of three is universal, at least mathematically so.

Number Systems, Base and a Bias for 10

Let’s take a moment to think about the symbolic representation of numbers though. Those of you reading this blog (if there ever are any readers of this blog …) are probably English speaking and educated in the modern Western tradition of the positional decimal numeral system using Arabic numerals to represent the digits 0 to 9. I’ve written about this before as part of some articles on floating point numbers.

That system is a very useful one that makes it somewhat easy to do arithmetical operations by hand. It includes concepts that were once radical and even heretical; such as using a separate symbol (0) to represent “nil”, “nothing”, “none”; and the corresponding idea of using place/position to indicate a count of a base number, in this case 10. It always amazes me to think that the concept of a symbol for “none” was considered un-holy and that its use could be outlawed by the church. That says so much about the nature of religion. Sigh.

Why base ten though? I guess because we humans, on the whole, have ten fingers. The word digit, used both to refer to a finger and also a symbol in a place within a number, lends some credence to this idea. Not all cultures used 10 though. The Babylonians, the first to utilize a positional number system, used base 60 and had symbols for the digits 0 to 59. Hah, I used the word digit there again, to refer to a symbol indicating one of a specific set of numbers in a place within a positional number system. Could I have used a different word? Not one as concise, that would communicate the concept I wanted. Oh the limitations of language.

The Babylonians’ choice of 60 lives on in some form in the modern world. It’s why we have 60 seconds in a minute, 60 minutes in an hour and 360 degrees in a circle. 60 is also a great number for commerce because it has a lot of factors. If you have 60 things then you can easily split them into equal batches of 2, 3, 4, 5, 6, 10, 12, 15, 20 and 30. To some degree this is why 12 (a dozen) was often a popular number for many uses because - again - it has a lot of factors (2, 3, 4 and 6). 10 only has 2 and 5.

Anyway, I just want to point out that base ten isn’t special, it just turned out to be popular enough to ultimately become a de-facto standard for the modern world.

Other Types of Number

So far we’ve only thought about numbers for counting whole things (0, 1, 2, 3, …). Such numbers are generally referred to as the Natural Numbers. Over time we humans (and mathematicians) have innovated and extended the concept of number in other directions, to usefully describe and work with other concepts.

Negative numbers were invented to track the idea of magnitude along with a binary direction (e.g. credit or debit in ledgers and banking). Numbers born of debt one could say. These numbers were subsequently codified as the Integers.

In order to deal with pieces of a whole (half, a third, etc.) and proportions, humans came up with fractions, later codified as the Rational Numbers, and rather than deal directly with fractions we extended the idea of the positional decimal numeral system via the introduction of the “decimal point” and the use of digits to the right of it to indicate quantities of 1/10, 1/100, 1/1000, etc. that were to be included in the implicit overall aggregate value that the chain of symbols represented (E.g. 25.5 to indicate twenty five and a half).

Next up we have numbers born more from mathematical exploration as opposed to common utility. Firstly we have the Irrational Numbers. It was the Greeks who first had a run in with these fellas when they considered the length of the hypotenuse of a right triangle whose two shorter sides both have length one, the quantity otherwise known as $\sqrt{2}$. Such a number cannot be written as a fraction and so is not a rational number. For a neat proof of this see here. Thus it was discovered that there are numbers beyond the rationals and lo the irrationals were born.

Then we have the Real Numbers, the Algebraic Numbers, the Transcendental Numbers and the Complex Numbers. I’m not going to go into any real detail on these but they are all very important little beasties.

For now I want to focus back on decimal numbers and certain decimal representations in particular.

Special Decimal Numbers

$\pi$ is probably the first “special” number that we meet during our mathematical education and very quickly we learn that it has an infinite decimal representation …

\[3.1415926535897932384626433832 ...\]

Perhaps you tried to memorize the first N digits of $\pi$ once. Then perhaps you had a life and you didn’t. Either way it’s definitely true that we humans have imbued these digits with some significance. Why though? They are not fundamentally special and instead are tied inextricably to the choice of base and the decimal positional number system. We could write $\pi$ using various bases as I discussed here. It’s worth reminding ourselves of this …

The choice of base is arbitrary and the sequence of digits is equally arbitrary.

However, there does exist a more fundamental infinite representation of $\pi$ but before we can talk about it we have to understand continued fractions.

Continued Fractions

A Continued Fraction is an expression obtained via an iterative process of representing a number as the sum of its integer part and the reciprocal of another number. Well, that’s the technical definition. Let’s start with a few examples and think about rectangles. Hey, geometry!

The continued fraction for a natural number is, trivially, just the natural number. Let’s consider the rational number $\frac{45}{16}$ and think of the numerator and denominator as the lengths of the sides of a rectangle.

Now let’s try to decompose this rectangle into squares. We can think of it as being composed of two squares (of side $16$) and another smaller rectangle (with sides $16$ and $13$).

Then we can think of this smaller rectangle as being composed of one square (of side $13$) and another yet smaller rectangle (with sides $13$ and $3$).

Then we can think of this yet smaller rectangle as being composed of four squares (of side $3$) and a rectangle (with sides $3$ and $1$).

Finally we can think of this tiny rectangle as being composed of three squares (of side $1$) which completes the decomposition.

This is the same as observing that $\cfrac{45}{16} = \cfrac{(2 \times 16) + 13}{16} = 2 + \cfrac{13}{16}$

that $\cfrac{16}{13} = \cfrac{(1 \times 13) + 3}{13} = 1 + \cfrac{3}{13}$

that $\cfrac{13}{3} = \cfrac{(4 \times 3) + 1}{3} = 4 + \cfrac{1}{3}$

By inverting the above we can see that $\cfrac{13}{16} = \cfrac{1}{1 + \cfrac{3}{13}}$

and that $\cfrac{3}{13} = \cfrac{1}{4 + \cfrac{1}{3}}$

Substituting these back into our first equation we can finally get …

\[\cfrac{45}{16} = 2 + \cfrac{1}{1 + \cfrac{1}{4 + \cfrac{1}{3}}}\]

… where the numbers are the counts of squares in the decomposition of the 45 by 16 rectangle.
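The square-counting procedure above is just Euclid's algorithm, so it is easy to sketch in code. Here is a minimal Python version; the function name `continued_fraction` is my own choice for illustration and it assumes a positive denominator.

```python
def continued_fraction(numerator: int, denominator: int) -> list[int]:
    """Return the continued fraction terms of numerator/denominator."""
    terms = []
    while denominator != 0:
        # How many squares of side `denominator` fit along the longer side?
        count, remainder = divmod(numerator, denominator)
        terms.append(count)
        # What is left over is a rectangle with sides `denominator` and `remainder`.
        numerator, denominator = denominator, remainder
    return terms


print(continued_fraction(45, 16))  # [2, 1, 4, 3]
```

The loop stops when a rectangle is tiled exactly, which is why every rational number gives a finite list of terms.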

Continued Fractions for Irrational Numbers

A rational number will always have a terminating continued fraction but what about an irrational number? Let’s think about the first irrational number that people usually learn about, namely $\sqrt{2}$.

If we start with the obvious identity $\sqrt{2} = 1 + (\sqrt{2} - 1)$ then we can further see that $\sqrt{2} = 1 + (\sqrt{2} - 1)\cfrac{(\sqrt{2} + 1)}{(\sqrt{2} + 1)}$ and then multiply out to get $\sqrt{2} = 1 + \cfrac{(2 - \sqrt{2} + \sqrt{2} - 1)}{(1 + \sqrt{2})} = 1 + \cfrac{1}{1 + \sqrt{2}}$

This recursive identity for $\sqrt{2}$ can be expanded …

\[\sqrt{2} = 1 + \cfrac{1}{1 + 1 + \cfrac{1}{1 + \sqrt{2}}}\]

and expanded …

\[\sqrt{2} = 1 + \cfrac{1}{2 + \cfrac{1}{1 + 1 + \cfrac{1}{1 + \sqrt{2}}}}\]

We can see that this results in an infinite continued fraction with the following pattern …

\[\sqrt{2} = 1 + \cfrac{1}{2 + \cfrac{1}{2 + \cfrac{1}{2 + \cfrac{1}{2 + \cfrac{1}{2 + \ddots}}}}}\]

The sequence of numbers in a continued fraction like this is often written more concisely as …

\[\sqrt{2} = [1;2,2,2,2,2,\dots]\]

It is true that all irrational numbers will have an infinite continued fraction. Here are some other examples …

\[\sqrt{19} = [4;2,1,3,1,2,8,2,1,3,1,2,8,2,1,3,1,2,8,2,\dots]\] \[e = [2;1,2,1,1,4,1,1,6,1,1,8,1,1,10,1,1,12,1,\dots]\] \[\phi = [1;1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,\dots]\] \[\pi = [3;7,15,1,292,1,1,1,2,1,3,1,14,2,1,1,2,\dots]\]

These sequences (including the one for $\sqrt{2}$ above) are, respectively, A040000, A010124, A003417, A000012 and A001203 in the wonderful On-Line Encyclopedia of Integer Sequences.

There’s some structure here too. The sequence for $\sqrt{19}$ repeats the sub-sequence $2,1,3,1,2,8$ indefinitely with period $6$. The sequence for $e$ repeats the sub-sequence $1,n,1$ indefinitely with $n$ starting as $2$ and then increasing by $2$ each time. The sequence for $\phi$ is probably the simplest of all. However there is no currently known structure to the sequence for $\pi$ which makes it all the more special.
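The repeating pattern for a square root can be generated with exact integer arithmetic. The sketch below uses the standard recurrence for the continued fraction of a quadratic irrational, which is not derived in this post, so treat it as an illustration rather than part of the argument. The name `sqrt_continued_fraction` is hypothetical.

```python
from math import isqrt


def sqrt_continued_fraction(n: int, count: int) -> list[int]:
    """Return the first `count` continued fraction terms of sqrt(n)."""
    a0 = isqrt(n)
    if a0 * a0 == n:
        # Perfect squares have a one-term expansion.
        return [a0]
    terms = [a0]
    # Each complete quotient has the form (sqrt(n) + m) / d.
    m, d, a = 0, 1, a0
    while len(terms) < count:
        m = d * a - m
        d = (n - m * m) // d  # this division is always exact
        a = (a0 + m) // d
        terms.append(a)
    return terms


print(sqrt_continued_fraction(19, 13))  # [4, 2, 1, 3, 1, 2, 8, 2, 1, 3, 1, 2, 8]
print(sqrt_continued_fraction(2, 6))    # [1, 2, 2, 2, 2, 2]
```

Because only integers are involved there is no rounding error, so the period shows up exactly no matter how many terms are requested.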

The Continued Fraction for $\pi$

I propose that the sequence of integers in $\pi$’s continued fraction is a much better thing for people to focus on than the digits of its decimal representation. This sequence has nothing to do with a choice of base and is truly indicative of the fundamental structure of probably the most famous number in all of mathematics.

The Continued Fraction for $\sqrt{x}$

We can generalize what we did above for $\sqrt{2}$ …

\[\sqrt{x} = 1 + (\sqrt{x} - 1)\] \[\sqrt{x} = 1 + (\sqrt{x} - 1)\cfrac{(\sqrt{x} + 1)}{(\sqrt{x} + 1)}\] \[\sqrt{x} = 1 + \cfrac{(x - \sqrt{x} + \sqrt{x} - 1)}{(1 + \sqrt{x})} = 1 + \cfrac{x - 1}{1 + \sqrt{x}}\]

Thus …

\[\sqrt{x} = 1 + \cfrac{x - 1}{2 + \cfrac{x - 1}{2 + \cfrac{x - 1}{2 + \cfrac{x - 1}{2 + \cfrac{x - 1}{2 + \ddots}}}}}\]

The recursive relation above makes for a very nice recursive algorithm for the calculation of square roots to a given level of precision.
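As a rough sketch of what such an algorithm might look like, here is a Python version that unrolls the identity $\sqrt{x} = 1 + \cfrac{x - 1}{1 + \sqrt{x}}$ to a fixed depth. The function name, the depth parameter and the starting guess of $1$ are my own assumptions, and it is only intended for positive $x$.

```python
import math


def sqrt_by_recursion(x: float, depth: int) -> float:
    """Approximate sqrt(x) by unrolling sqrt(x) = 1 + (x - 1) / (1 + sqrt(x))."""
    if depth == 0:
        # Innermost guess: pretend that sqrt(x) is 1.
        return 1.0
    return 1.0 + (x - 1.0) / (1.0 + sqrt_by_recursion(x, depth - 1))


for depth in (1, 5, 10, 30):
    approx = sqrt_by_recursion(2.0, depth)
    print(depth, approx, abs(approx - math.sqrt(2.0)))
```

Each extra level shrinks the error by a roughly constant factor, so this converges steadily but nowhere near as quickly as Newton's method.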

Rational Approximations to Irrational Numbers

We can use these sequences to construct successively better rational approximations to an irrational number and, in the case of $\pi$, some familiar rational approximations fall out. We do this by truncating the continued fraction at successive levels and assuming that the remaining fractional part is zero. Starting at the top we have …

$\pi$

\[\pi \approx 3\]

The most basic approximation and not a very good one. $3 - \pi = -1.42 \times 10^{-1}$, an error of $-4.5\%$.

\[\pi \approx 3 + \cfrac{1}{7} = \cfrac{(3 \times 7) + 1}{7} = \cfrac{22}{7}\]

A familiar old approximation and reasonably accurate. $\frac{22}{7} - \pi = 1.26 \times 10^{-3}$, an error of $0.04\%$. We can do better though.

\[\pi \approx 3 + \cfrac{1}{7 + \cfrac{1}{15}} = 3 + \cfrac{1}{\cfrac{(7 \times 15) + 1}{15}} = 3 + \cfrac{15}{106} = \cfrac{(3 \times 106) + 15}{106} = \cfrac{333}{106}\]

$\frac{333}{106} - \pi = -8.32 \times 10^{-5}$, an error of $-0.00265\%$. Let’s keep going.

\[\pi \approx 3 + \cfrac{1}{7 + \cfrac{1}{15 + \cfrac{1}{1}}} = 3 + \cfrac{1}{7 + \cfrac{1}{16}} = 3 + \cfrac{1}{\cfrac{(7 \times 16) + 1}{16}} = 3 + \cfrac{16}{113} = \cfrac{355}{113}\]

A particularly good approximation for $\pi$ since the presence of the number $292$ as the next element in the sequence ensures that the residue after truncating the continued fraction at this point is quite small. $\frac{355}{113} - \pi = 2.67 \times 10^{-7}$, a tiny error of $0.0000085\%$ and basically zero for all practical purposes.
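The truncations above can be reproduced mechanically. Here is a small Python sketch that evaluates each truncated continued fraction from the bottom up using exact fractions; the helper name `convergents` is my own.

```python
import math
from fractions import Fraction


def convergents(terms: list[int]) -> list[Fraction]:
    """Return the value of the continued fraction truncated after each term."""
    results = []
    for length in range(1, len(terms) + 1):
        # Start at the last kept term and fold upwards: value = term + 1 / value.
        value = Fraction(terms[length - 1])
        for term in reversed(terms[:length - 1]):
            value = term + 1 / value
        results.append(value)
    return results


for c in convergents([3, 7, 15, 1]):
    # Print each approximation and its signed error relative to pi.
    print(c, float(c) - math.pi)
```

This prints the fractions 3, 22/7, 333/106 and 355/113 alongside their errors, which alternate in sign as the approximations close in on $\pi$ from either side.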

$\phi$

The continued fraction for $\phi$ (The Golden Ratio) may seem trivially simple but actually it makes this number all the more special. It can be considered the most irrational of irrational numbers because it is the most difficult to approximate with a rational number.

On Flowers

2019-11-01T00:00:00+00:00

Inspiration

I recently came upon a gem of a video on YouTube from the Numberphile channel. If you don’t follow Numberphile then shame on you, it’s a wonderful series of videos on all matters mathematic and numerical. There’s a sibling channel called Computerphile that is equally good but focuses on all matters computery. Now you may be wondering if that is a real word but it is! Isn’t it amazing how all kinds of terms ultimately become legitimized via sufficient adoption. Computery can refer to things of or pertaining to computers or it can sometimes refer to the quality of a person too. Anyway, I digress.

The subject of the inspirational video was that of irrational numbers and specifically how some of them are more (or less) irrational than others. This was illustrated via a discussion of how seeds are packed into the head of a flower. The video contained various illustrations and animations that were very neat and enlightening and after seeing them I decided that I wanted to write some code to render those animations myself. And that, dear reader, is the point of this post.

A Seed Renderer

Here’s a little JavaScript app that will render seeds according to various parameters. The basic parameters are presented just below the image and are 1) the width (in pixels) of a seed, 2) the fraction of a clockwise turn at which to render the next seed (this can be entered as a fraction or a decimal), and 3) an outward radial spacing factor which controls how much each seed is moved out away from the origin per turn (a value of 1 will ensure that after one full turn around the origin the next seed will be placed one seed width further out). Some initial values are plugged in but you can change them and then click the ‘Draw Flower’ button.

Just below the basic parameters is an input that allows you to dial in a delta to the rotation-factor and two buttons that will then re-render the flower after having incremented or decremented the rotation-factor by the delta amount. This allows you to see how the layout of the seeds changes as the rotation-factor changes. Try it.

Animation

Here’s another JavaScript app that will animate the changes in the seed rendering pattern as the rotation-factor changes. Just dial in a starting rotation-factor, a delta for the factor between each render and a delay (the number of milliseconds to wait between renders). The ‘Start’ button will start the animation. I bet you can guess what the ‘Stop’ button does.

On How to Fold a Fitted Sheet

2019-10-13T00:00:00+00:00

Another video post! It only took me 48 years to learn how to fold a fitted sheet. Never be satisfied with the fact that you don’t know how to do something. It’s never too late to learn something new or to reconsider something old, as I showed here.

On Trees and Paths

2019-05-01T00:00:00+00:00

The Question

This post is inspired by a question that I was asked in an interview once. Here it is …

Assume that you have an unordered binary tree where the values stored at the nodes are positive integers. Write code to find the length of the longest path through the tree that consists of nodes whose values are in strict increasing or decreasing order. A path proceeds from node to node via the obvious parent-child relationships and can go up and down.

An example is in order. In this tree …

… the longest path is of length four. The path is [10, 20, 30, 40] (starting from the lowest 10 node, proceeding up to the 30 node and then down to the lowest 40 node). We could also list this path in descending order, the order of traversal doesn’t matter.

Solution Strategies

So how would we go about solving this for a general binary tree? Well, most tree algorithms involve some sort of recursive traversal while maintaining state. The traversals are pretty generic, it’s the state you track along the way and how you use that state that will be key.

Binary Tree Traversals

There are various types of tree traversal. The first distinction is between breadth-first and depth-first. In the former the tree is fully explored left to right at each level before proceeding to the next level, whereas in the latter the tree is fully explored root to leaf by following the left child at each node before then jumping back up and following the right child.

With a depth-first search there are three further types of traversal based on the order in which the root node and children (sub-trees) of a given tree are processed. These are pre-order (the root is processed first then the children), in-order (the left is processed first, then the root, then the right) and post-order (the children are processed first and then the root).
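To make the three orders concrete, here is a tiny sketch. It is written in Python purely for brevity (the rest of this post uses C#) and the `Node` class and function names here are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def pre_order(node: Optional[Node]) -> list[int]:
    # Root first, then the left and right sub-trees.
    if node is None:
        return []
    return [node.value] + pre_order(node.left) + pre_order(node.right)


def in_order(node: Optional[Node]) -> list[int]:
    # Left sub-tree, then the root, then the right sub-tree.
    if node is None:
        return []
    return in_order(node.left) + [node.value] + in_order(node.right)


def post_order(node: Optional[Node]) -> list[int]:
    # Both sub-trees first, then the root.
    if node is None:
        return []
    return post_order(node.left) + post_order(node.right) + [node.value]


tree = Node(2, Node(1), Node(3))
print(pre_order(tree))   # [2, 1, 3]
print(in_order(tree))    # [1, 2, 3]
print(post_order(tree))  # [1, 3, 2]
```

The only difference between the three functions is where the root's value is placed relative to the results from the two recursive calls.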

Generic Code for Tree Traversal

To solve this problem we are going to need a depth-first, post-order traversal, i.e. one where we process all the children before we process the root. This represents an exhaustive, bottom up search. Let’s write some code to do that.

namespace GinjaSoft.TreeStuff
{
  using System;

  public class Node<T>
  {
    public Node(T value) { Value = value; }

    public T Value { get; set; }
    public Node<T> Left { get; set; }
    public Node<T> Right { get; set; }
  }

  public static class BinaryTreeTraversals
  {
    public static TResult DepthFirstPostOrder<T, TResult>(Node<T> root, Func<Node<T>, TResult, TResult, TResult> fn)
    {
      TResult leftResult = default;
      TResult rightResult = default;

      if(root.Left != null) leftResult = DepthFirstPostOrder(root.Left, fn);
      if(root.Right != null) rightResult = DepthFirstPostOrder(root.Right, fn);

      return fn(root, leftResult, rightResult);
    }
  }
}

This code is generic. What we actually “do” at each node is factored out into a function that the client code will supply. This function has to adhere to a given prototype: it needs to take three parameters (a reference to the node to process and two instances of a result type, generated from the processing of the left and right children respectively) and it needs to return a result type that will represent the result of processing the node. If a child is missing then the corresponding result is simply the default value of the result type (null for a class).

Specific Code for the Problem at Hand

Here’s how we might write some client code to use the generic traversal code. First we need to define the result type (TResult) that will represent the result of processing a node. This type will be used to hold information about the paths that exist in each sub-tree within a tree as we recursively traverse all the nodes in a depth-first post-order manner.

namespace GinjaSoft.TreeStuff
{
  public partial class LongestPath
  {
    //
    // This type contains information about the paths that exist within a given sub-tree
    //
    public class NodeInfo
    {
      // The value of the tree's root node
      public uint RootValue { get; set; }

      // The length of the longest path of nodes (up through the root) with consecutively increasing values 
      public uint MaxIncPathLenToRoot { get; set; }

      // The length of the longest path of nodes (up through the root) with consecutively decreasing values
      public uint MaxDecPathLenToRoot { get; set; }

      // The length of the longest path of nodes with either consecutively increasing or decreasing values.
      // This path does not have to include the root node and also doesn't have to extend exclusively up the
      // tree towards the root.  The path can start from a given node, proceed upwards through other nodes
      // and then downwards again through child nodes.  This is the value of interest.
      public uint MaxPathLen { get; set; }
    }
  }
}

Here’s how we’ll use this type with the DepthFirstPostOrder method …

namespace GinjaSoft.TreeStuff
{
  using System;

  public partial class LongestPath
  {
    //
    // This function is the solution to the interview question
    //
    public static uint GetMaxPathLength(Node<uint> tree)
    {
      var nodeInfo = GetNodeInfo(tree);
      return nodeInfo.MaxPathLen;
    }

    //
    // Implementation
    //

    internal static NodeInfo GetNodeInfo(Node<uint> tree)
    {
      // Unfortunately I can't pass ProcessNode directly to DepthFirstPostOrder.  C# won't do the implicit
      // cast from method group to Func and so a local variable is required.  Sigh.
      Func<Node<uint>, NodeInfo, NodeInfo, NodeInfo> fn = ProcessNode;
      return BinaryTreeTraversals.DepthFirstPostOrder(tree, fn);
    }

    private static NodeInfo ProcessNode(Node<uint> root, NodeInfo left, NodeInfo right)
    {
      // ...
    }
  }
}

The ProcessNode function is the meat of the solution. It will be called recursively for each node (sub-tree) in the tree. Each call will contain the following parameters: the root node of the current sub-tree and two NodeInfo objects representing the result of the calls to ProcessNode for the left and right child nodes respectively.

As with most recursive functions we need to identify the base case, for which we return the base result, and otherwise generate a result based on the state passed on the call stack. The logic for the latter case is as follows …

If the root value is greater than the root value for a given child then MaxIncPathLenToRoot through the new root will be one larger than that of the child. Conversely if the root value is less than the child root value then the new MaxDecPathLenToRoot will be one larger than that of the child. The final Max(Inc/Dec)PathLenToRoot values will be the larger of the results for the left and right children.

Then we need to update MaxPathLen. The new value will start out as the larger of MaxPathLen from the two children. Then we will check the new Max(Inc/Dec)PathLenToRoot values and take the larger of the two if it is larger than the current MaxPathLen. Finally we check for an up/down path through the current node, i.e. whether the current root value lies between the child root values, in which case we can calculate a new candidate MaxPathLen as the sum of the Max(Inc/Dec)PathLenToRoot values from the children plus one. If this value is greater than the current MaxPathLen then we use it.

Here’s the actual code …

namespace GinjaSoft.TreeStuff
{
  using System;

  public partial class LongestPath
  {
    private static NodeInfo ProcessNode(Node<uint> root, NodeInfo left, NodeInfo right)
    {
      // Base result for a leaf node
      var returnValue = new NodeInfo {
        RootValue = root.Value,
        MaxIncPathLenToRoot = 1,
        MaxDecPathLenToRoot = 1,
        MaxPathLen = 1
      };

      // If there are no children to process then we are done
      if(left == null && right == null) return returnValue;

      // Process children ...

      if(left != null) {
        // Update the inc/dec PathLenToRoot based on the current root value and left child root value
        if(root.Value > left.RootValue) returnValue.MaxIncPathLenToRoot = left.MaxIncPathLenToRoot + 1;
        else if(root.Value < left.RootValue) returnValue.MaxDecPathLenToRoot = left.MaxDecPathLenToRoot + 1;
      }

      if(right != null) {
        // Update the inc/dec PathLenToRoot based on the current root value and right child root value
        if(root.Value > right.RootValue) {
          var pathLen = right.MaxIncPathLenToRoot + 1;
          returnValue.MaxIncPathLenToRoot = Math.Max(returnValue.MaxIncPathLenToRoot, pathLen);
        }
        else if(root.Value < right.RootValue) {
          var pathLen = right.MaxDecPathLenToRoot + 1;
          returnValue.MaxDecPathLenToRoot = Math.Max(returnValue.MaxDecPathLenToRoot, pathLen);
        }
      }

      // The initial new MaxPathLen is the max of the children ...
      var leftMaxPathLen = left != null ? left.MaxPathLen : 0;
      var rightMaxPathLen = right != null ? right.MaxPathLen : 0;
      returnValue.MaxPathLen = Math.Max(leftMaxPathLen, rightMaxPathLen);

      // Take either of the two new max path lengths through the root if larger ...
      returnValue.MaxPathLen = Math.Max(returnValue.MaxPathLen, returnValue.MaxIncPathLenToRoot);
      returnValue.MaxPathLen = Math.Max(returnValue.MaxPathLen, returnValue.MaxDecPathLenToRoot);

      // Now check for paths that go up and down through the root ...
      if(left != null && right != null) {
        if(root.Value > left.RootValue && root.Value < right.RootValue) {
          var upDownPathLen = left.MaxIncPathLenToRoot + right.MaxDecPathLenToRoot + 1;
          returnValue.MaxPathLen = Math.Max(returnValue.MaxPathLen, upDownPathLen);
        }
        else if(root.Value < left.RootValue && root.Value > right.RootValue) {
          var upDownPathLen = left.MaxDecPathLenToRoot + right.MaxIncPathLenToRoot + 1;
          returnValue.MaxPathLen = Math.Max(returnValue.MaxPathLen, upDownPathLen);
        }
      }

      return returnValue;
    }
  }
}

Solution Validation

Let’s verify that our code will solve the example that we provided before …

namespace GinjaSoft.TreeStuff.Tests.LongestPathTests
{
  using Xunit;
  using Xunit.Abstractions;

  public class GetNodeInfoTests
  {
    private readonly ITestOutputHelper _output;

    public GetNodeInfoTests(ITestOutputHelper output)
    {
      _output = output;
    }

    [Fact]
    public void ThroughPathDoesNotIncludeRoot()
    {
      //          10
      //         /  \
      //       30    40
      //      /  \
      //    40    20
      //            \
      //             10

      var tree = new Node<uint>(10) {
        Left = new Node<uint>(30) {
          Left = new Node<uint>(40) { },
          Right = new Node<uint>(20) {
            Right = new Node<uint>(10) { }
          }
        },
        Right = new Node<uint>(40) { }
      };

      var nodeInfo = LongestPath.GetNodeInfo(tree);

      Assert.Equal(10u, nodeInfo.RootValue);
      Assert.Equal(1u, nodeInfo.MaxIncPathLenToRoot);
      Assert.Equal(3u, nodeInfo.MaxDecPathLenToRoot);
      Assert.Equal(4u, nodeInfo.MaxPathLen);
    }
  }
}

Woot! The test passes.

Resources

GitHub repo with code and full unit test suite

On an Interesting 9 Digit Number

2019-04-15T00:00:00+00:00

A Puzzle

This post is inspired by a puzzle that was posed by Mr. James Grime on his YouTube channel, singingbanana. The puzzle can be stated thus …

Find a nine digit number, using the digits 1 to 9 without repeats, such that the first two digits form a number divisible by 2; the first three digits form a number divisible by 3; the first four digits form a number divisible by 4; and so on up through 9. Trivially, the first digit will always be a number divisible by 1.

How many such numbers have this property?

Solution Strategies

Clearly we could write some code to implement a brute-force solution. We could generate all the possible 9 digit numbers and test for the requisite properties of the sub-numbers. That would be inelegant though. Surely we can use some clever mathematics to solve this, or at least to significantly pare down the space of possible solutions such that we could solve it by hand. Let’s try.
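For reference, the brute-force check described above might look something like this in Python. It is only a sketch of the inelegant approach (the function name is mine) and it simply tries every arrangement of the nine digits.

```python
from itertools import permutations


def find_solutions() -> list[int]:
    """Try every ordering of the digits 1 to 9 and keep those that pass all the tests."""
    solutions = []
    for digits in permutations("123456789"):
        candidate = "".join(digits)
        # The first k digits must form a number divisible by k, for k = 2 .. 9.
        if all(int(candidate[:k]) % k == 0 for k in range(2, 10)):
            solutions.append(int(candidate))
    return solutions


if __name__ == "__main__":
    print(find_solutions())
```

There are only $9! = 362880$ arrangements to check, so this finishes quickly, but it tells us nothing about why the answer is what it is.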

A Mathematical Approach

Let’s use the letters $a$ through $i$ as labels for the digits in the number that we seek. So the number is $abcdefghi$ and we can describe its required properties using the following expressions …

A: $10a + b \equiv 0 \bmod 2$

B: $100a + 10b + c \equiv 0 \bmod 3$

C: $1000a + 100b + 10c + d \equiv 0 \bmod 4$

D: $10000a + 1000b + 100c + 10d + e \equiv 0 \bmod 5$

E: $100000a + 10000b + 1000c + 100d + 10e + f \equiv 0 \bmod 6$

F: $1000000a + 100000b + 10000c + 1000d + 100e + 10f + g \equiv 0 \bmod 7$

G: $10000000a + 1000000b + 100000c + 10000d + 1000e + 100f + 10g + h \equiv 0 \bmod 8$

H: $100000000a + 10000000b + 1000000c + 100000d + 10000e + 1000f + 100g + 10h + i \equiv 0 \bmod 9$

Now, since the coefficients of $a, b, c, …$ can be reinterpreted as values modulo the appropriate number (2, 3, 4, …), we can simplify the above expressions. For example, $10 \equiv 0 \bmod 2$ and so expression A becomes $b \equiv 0 \bmod 2$. Also $100 \equiv 1 \bmod 3$ and $10 \equiv 1 \bmod 3$, so expression B becomes $a + b + c \equiv 0 \bmod 3$. Overall …

A: $b \equiv 0 \bmod 2$

B: $a + b + c \equiv 0 \bmod 3$

C: $2c + d \equiv 0 \bmod 4$

D: $e \equiv 0 \bmod 5$

E: $4a + 4b + 4c + 4d + 4e + f \equiv 0 \bmod 6$

F: $a + 5b + 4c + 6d + 2e + 3f + g \equiv 0 \bmod 7$

G: $4f + 2g + h \equiv 0 \bmod 8$

H: $a + b + c + d + e + f + g + h + i \equiv 0 \bmod 9$
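The reduced coefficients above are easy to double check by machine. This little Python sketch (the helper name `reduced_coefficients` is hypothetical) reduces each power of ten modulo the length of the prefix being tested.

```python
def reduced_coefficients(prefix_length: int) -> list[int]:
    """Coefficients of the leading digits, reduced modulo prefix_length."""
    modulus = prefix_length
    # The leftmost digit is multiplied by 10^(prefix_length - 1), the last by 10^0.
    return [10 ** power % modulus for power in range(prefix_length - 1, -1, -1)]


for label, n in zip("ABCDEFGH", range(2, 10)):
    print(label, n, reduced_coefficients(n))
```

A coefficient of $0$ means that the digit drops out of the expression entirely, which is why expressions A, C, D and G end up so short.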

We also know that all of the values $a, b, c, …$ have to be in $\{1, 2, 3, 4, 5, 6, 7, 8, 9\}$ and must be unique. So clearly $e = 5$ and …

$b$ is even (since $b \equiv 0 \bmod 2$)

$d$ is even (since $2c + d$ is even and $2c$ is even)

$f$ is even (since $4a + 4b + 4c + 4d + 4e + f$ is even and $4 \times$ anything is even)

$h$ is even (since $4f + 2g + h$ is even and $4f$ and $2g$ are even)

And …

$a$ is odd and not $5$

$c$ is odd and not $5$

$g$ is odd and not $5$

$i$ is odd and not $5$

… since all the even numbers are taken by $\{b, d, f, h\}$ and $e = 5$.

Now, let’s look at expression C above, i.e. $2c + d \equiv 0 \bmod 4$. We know that $c \in \{1, 3, 7, 9\}$ so …

$c = 1$: $2 + d \equiv 0 \bmod 4 \implies d \equiv 2 \bmod 4$, or

$c = 3$: $6 + d \equiv 0 \bmod 4 \implies d \equiv 2 \bmod 4$, or

$c = 7$: $14 + d \equiv 0 \bmod 4 \implies d \equiv 2 \bmod 4$, or

$c = 9$: $18 + d \equiv 0 \bmod 4 \implies d \equiv 2 \bmod 4$

We already know that $d \in \{2, 4, 6, 8\}$. Of these, the only ones that satisfy $d \equiv 2 \bmod 4$ are $\{2, 6\}$. So $d \in \{2, 6\}$.

Now, let’s look at expression G above, i.e. $4f + 2g + h \equiv 0 \bmod 8$. Since $4f + 2g + h$ has to be a multiple of 8 it has to be a multiple of 4 too. So $4f + 2g + h \equiv 0 \bmod 4$ which implies that $2g + h \equiv 0 \bmod 4$. Now, by the same logic as we applied above for $2c + d \equiv 0 \bmod 4$ where $c \in \{1, 3, 7, 9\}$ we can take $2g + h \equiv 0 \bmod 4$ where $g \in \{1, 3, 7, 9\}$ and deduce that $h \in \{2, 6\}$.

The fact that $d \in \{2, 6\}$ and $h \in \{2, 6\}$ means that no other digit can be a 2 or a 6. So now we know …

$a \in \{1, 3, 7, 9\}$

$b \in \{4, 8\}$

$c \in \{1, 3, 7, 9\}$

$d \in \{2, 6\}$

$e = 5$

$f \in \{4, 8\}$

$g \in \{1, 3, 7, 9\}$

$h \in \{2, 6\}$

$i \in \{1, 3, 7, 9\}$

Now, recall that expression E states that $4a + 4b + 4c + 4d + 4e + f \equiv 0 \bmod 6$. But $e = 5$ so this becomes $4a + 4b + 4c + 4d + 20 + f \equiv 0 \bmod 6$ which implies that $4a + 4b + 4c + 4d + (2 \bmod 6) + f \equiv 0 \bmod 6$ or $4a + 4b + 4c + 4d + f \equiv 4 \bmod 6$.

Expression B says $a + b + c \equiv 0 \bmod 3$ so $4(a + b + c) + 4d + f \equiv 4 \bmod 6$ gives us $4(0 \bmod 3) + 4d + f \equiv 4 \bmod 6$ which implies that $(0 \bmod 12) + 4d + f \equiv 4 \bmod 6$ which implies that $4d + f \equiv 4 \bmod 6$.

We know that $d \in \{2, 6\}$ …

$d = 2 \implies 8 + f \equiv 4 \bmod 6 \implies 2 \bmod 6 + f \equiv 4 \bmod 6 \implies f \equiv 2 \bmod 6 \implies f = 8$

$d = 6 \implies 24 + f \equiv 4 \bmod 6 \implies 0 \bmod 6 + f \equiv 4 \bmod 6 \implies f \equiv 4 \bmod 6 \implies f = 4$

So either $def = 258$ or $def = 654$
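
These two deductions are small enough to check by enumeration. A minimal Python sketch, assuming only the digit sets derived so far (the variable names are mine) …

```python
odd_digits = (1, 3, 7, 9)    # candidates for c (odd and not 5)
even_digits = (2, 4, 6, 8)   # candidates for d before any further pruning

# Expression C: 2c + d = 0 (mod 4) for at least one admissible c
d_candidates = sorted({d for c in odd_digits for d in even_digits
                       if (2 * c + d) % 4 == 0})

# Expressions B and E combined: 4d + f = 4 (mod 6), with f in {4, 8} and e = 5
def_candidates = [100 * d + 50 + f
                  for d in d_candidates
                  for f in (4, 8)
                  if (4 * d + f) % 6 == 4]

print(d_candidates)    # [2, 6]
print(def_candidates)  # [258, 654]
```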

Let’s assume $def = 258$. So then $b$ must be $4$ and so $abc$ could be $143$, $147$, $149$, $341$, $347$, $349$, $741$, $743$, $749$, $941$, $943$ or $947$. But $abc$ must be divisible by $3$ which eliminates several of these and leaves us with just $147$ or $741$.

If $def = 654$ then $b$ must be $8$ and $abc$ could be $183$, $187$, $189$, $381$, $387$, $389$, $781$, $783$, $789$, $981$, $983$ or $987$. But, again, $abc$ must be divisible by $3$ which leaves us with just $183$, $189$, $381$, $387$, $783$, $789$, $981$ or $987$.

So now we know that $abcdef$ can be …

$147258$

$741258$

$183654$

$189654$

$381654$

$387654$

$783654$

$789654$

$981654$

$987654$

Enumerating the possible values of $ghi$ gives us …

$147258369$ - $14725836$ is not divisible by $8$

$147258963$ - $1472589$ is not divisible by $7$

$741258369$ - $7412583$ is not divisible by $7$

$741258963$ - $7412589$ is not divisible by $7$

$183654729$ - $1836547$ is not divisible by $7$

$183654927$ - $1836549$ is not divisible by $7$

$189654327$ - $1896543$ is not divisible by $7$

$189654723$ - $1896547$ is not divisible by $7$

$381654729$

$381654927$ - $3816549$ is not divisible by $7$

$387654129$ - $3876541$ is not divisible by $7$

$387654921$ - $3876549$ is not divisible by $7$

$783654129$ - $7836541$ is not divisible by $7$

$783654921$ - $78365492$ is not divisible by $8$

$789654123$ - $7896541$ is not divisible by $7$

$789654321$ - $7896543$ is not divisible by $7$

$981654327$ - $9816543$ is not divisible by $7$

$981654723$ - $9816547$ is not divisible by $7$

$987654123$ - $9876541$ is not divisible by $7$

$987654321$ - $9876543$ is not divisible by $7$

So, only one number isn’t eliminated. The unique answer to the puzzle is $381654729$.
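
As a sanity check on all of that hand reasoning, the search space is small enough to brute force. The following Python sketch (the function name is my own invention) extends a prefix one digit at a time and keeps only those prefixes whose first $k$ digits are divisible by $k$ …

```python
def polydivisible_pandigitals():
    """Return every 9-digit number that uses each of 1..9 exactly once
    and whose first k digits are divisible by k for k = 1..9."""
    results = []

    def extend(prefix, length, unused):
        if length == 9:
            results.append(prefix)
            return
        for digit in sorted(unused):
            candidate = prefix * 10 + digit
            # The new prefix has length + 1 digits
            if candidate % (length + 1) == 0:
                extend(candidate, length + 1, unused - {digit})

    extend(0, 0, frozenset(range(1, 10)))
    return results

print(polydivisible_pandigitals())  # [381654729]
```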

On Parabolas and Multiplication

2019-01-14T00:00:00+00:00

I visited the National Museum of Mathematics in New York City with my family just recently. It’s a neat little museum that can be experienced in a few hours and I highly recommend it. One of the exhibits there is called “String Product”. It’s a large model of a paraboloid that sits in the middle of a spiral staircase between floors and illustrates an interesting property of the simple parabola $y = x^2$. If you take two positive numbers $a$ and $b$ and draw two vertical lines, parallel to the y-axis, from $x = -a$ and $x = b$, and then draw a line through the two points where these vertical lines cross the parabola, then that line will meet the y-axis at the value $a * b$. So this gives a nice geometric trick for multiplication.

Proof

Why is this so? Let’s see.

Points on the parabola are parameterized by the coordinates $(x, x^2)$. So if we take the two positive numbers $a$ and $b$ and then look at the vertical lines $x = -a$ and $x = b$, the points where these cross the parabola are $(-a, a^2)$ and $(b, b^2)$.

The general equation of a line is $y = mx + c$ where $m$ is the slope and $c$ is the value at which the line crosses the y-axis. To find $m$ and $c$ we can use the two points that we know are on the line, namely $(-a, a^2)$ and $(b, b^2)$. So …

1) From point $(-a, a^2): a^2 = -ma + c$

2) From point $(b, b^2): b^2 = mb + c$

We can eliminate c from these two equations by subtracting them …

\[b^2 - a^2 = mb + ma + c - c\] \[b^2 - a^2 = m(b + a)\]

Note that $b^2 - a^2 = (b + a)(b - a)$ and so …

\[(b + a)(b - a) = m(b + a)\]

Since $a$ and $b$ are both positive, $b + a$ is not zero and we can cancel it from both sides to get $m = (b - a)$. Substituting this back into equation 2 above we get …

\[b^2 = (b - a)b + c\] \[b^2 = b^2 - ab + c\] \[c = ab\]

So the value at which the line crosses the y-axis is equal to $ab$.

QED.
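
For the sceptical, here is a quick numerical check of the result as a minimal Python sketch. The function name `y_intercept` is just for illustration, and exact fractions are used so that no rounding gets in the way …

```python
from fractions import Fraction

def y_intercept(a, b):
    """y-intercept of the line through the points of y = x^2
    at x = -a and x = b."""
    x1, y1 = Fraction(-a), Fraction(a) ** 2
    x2, y2 = Fraction(b), Fraction(b) ** 2
    slope = (y2 - y1) / (x2 - x1)   # works out to b - a
    return y1 - slope * x1          # c = y - m*x, evaluated at the first point

for a, b in [(3, 4), (7, 12), (Fraction(5, 2), Fraction(4, 3))]:
    print(a, b, y_intercept(a, b), a * b)
```

The last two columns of each printed row should agree.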

On Primes

2018-02-18T00:00:00+00:00

A prime number is a positive integer (a natural number) that is only evenly divisible by itself and one. The number one itself, by convention, is not considered a prime. A natural number, greater than one, that is not prime is said to be composite.

Primes can be considered as the building blocks of all positive numbers. This statement is more formally expressed by the fundamental theorem of arithmetic, which states that every natural number greater than one is either a prime or can be factorized as a product of primes that is unique except for their order.

Some interesting questions come to mind …

1) How many primes are there?
2) How common are primes?
3) How do we determine whether a given number is prime?
4) What is the billionth prime?

I already answered #1 in my post about proof. There are infinitely many.

The distribution of primes within the natural numbers can be statistically modelled. The prime number theorem formalizes the intuitive idea that primes become less common as they become larger and introduces the prime counting function, $\pi(N)$, defined as the number of prime numbers less than or equal to N. The use of $\pi$ as a function here is unrelated to the number $\pi$.

The prime number theorem states that …

\[\pi(n) \sim \frac{n}{\log n}\]

Or more formally …

\[\lim_{n\to\infty} \frac{\pi(n)}{\frac{n}{\log n}} = 1\]

This means that for large enough $N$, the probability that a random integer not greater than $N$ is prime is very close to $\frac{1}{\log N}$, where $\log$ is the natural logarithm.
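
To get a feel for how good the approximation is, we can count primes directly and compare. This is only a rough Python sketch (the function name `prime_count` is mine, and the simple sieve inside it is just a convenient way to count) …

```python
import math

def prime_count(n):
    """pi(n): the number of primes less than or equal to n."""
    if n < 2:
        return 0
    is_composite = bytearray(n + 1)
    count = 0
    for i in range(2, n + 1):
        if not is_composite[i]:
            count += 1
            for m in range(i * i, n + 1, i):
                is_composite[m] = 1
    return count

for n in (10**3, 10**4, 10**5, 10**6):
    actual = prime_count(n)
    estimate = n / math.log(n)   # math.log is the natural logarithm
    print(n, actual, round(estimate), round(actual / estimate, 3))
```

The ratio in the last column creeps towards 1 as $n$ grows, but only slowly.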

How to determine that a given number, N, is prime

Such a test is known as a primality test.

A simple brute force algorithm would be to enumerate all of the natural numbers greater than one and less than N and see whether any of them evenly divide N. If any of them do then N is composite, otherwise it is prime.

Actually, we only have to test potential divisors less than or equal to $\sqrt N$. This is because if N is composite then at least one of its factors must be less than or equal to $\sqrt N$. To justify this assume that N is composite and $N = a \times b$. If both $a$ and $b$ were greater than $\sqrt N$ then $a \times b$ would be greater than N. This is clearly impossible and so at least one of the factors must be $\le \sqrt N$.

Ideally we would enumerate only the prime numbers less than N but this presupposes that we know all such primes. We can limit the test divisors somewhat by noting that all primes > 3 can be written as $6n - 1$ or $6n + 1$ for $n = 1, 2, ...$ so we only have to enumerate test factors of that form.

To see why this is so note that we can enumerate all natural numbers (> 5) as …

$6n, 6n + 1, 6n + 2, 6n + 3, 6n + 4, 6n + 5$ for $n = 1, 2, 3, ...$

The numbers of the form $6n, 6n + 2$ and $6n + 4$ are all divisible by 2 and therefore are not prime. The numbers of the form $6n$ and $6n + 3$ are all divisible by 3 and therefore are not prime either. This just leaves those of the form $6n + 1$ and $6n + 5$ as candidate primes. The $6n+5$ numbers are equivalent to $6n - 1$.
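
This claim is easy to spot check. The Python sketch below (names are my own) finds the primes below 2000 by the naive method and looks at their remainders modulo 6 …

```python
def is_prime_naive(n):
    """Trial division by every integer from 2 to n - 1."""
    return n > 1 and all(n % d for d in range(2, n))

primes = [n for n in range(2, 2000) if is_prime_naive(n)]

# Remainders modulo 6 of every prime greater than 3
residues = {p % 6 for p in primes if p > 3}
print(residues)   # {1, 5}

# The candidate set 6n - 1, 6n + 1 is a superset of those primes
candidates = [n for n in range(5, 2000) if n % 6 in (1, 5)]
print(len(primes), len(candidates))
```

Note that the candidate set still contains composites such as 25 and 35, so it only thins out the divisors that we have to try.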

Here’s an initial implementation of the brute force primality test in C# …

namespace PrimeNumbers {
  using System;

  public static class NumberExtentions {
    public static bool IsPrime(this ulong number)
    {
      if(number < 2) return false;        // 0 & 1 are not prime
      if(number < 4) return true;         // 2 & 3 are prime
      if(number % 2 == 0) return false;   // 4, 6, 8, 10, 12, 14, ... are composite
      if(number % 3 == 0) return false;   // 9, 15, 21, 27, 33, ... are composite

      // Now test for factors of the form 6n - 1 and 6n + 1 for n = 1, 2, 3, ...
      //  6n - 1 : 5,    11,     17,     23,     29, ...
      //  6n + 1 :    7,     13,     19,     25,     31, ...
      // ... up through floor(sqrt(number))

      var max = (ulong)Math.Sqrt(number);

      // We will get here for number = 5, 7, 11, 13, 17, 19, 23, 25, 29, 31, 35, ...
      // Those numbers with floor(sqrt(number)) < 5 will not go through this loop at all
      // That's OK though since all of those numbers are prime: 5, 7, 11, 13, 17, 19, 23

      // 6n - 1 and 6n + 1 for n = 1, 2, 3, ...
      // is equivalent to n and n + 2 for n = 6m + 5 where m = 0, 1, 2, ...
      for(ulong n = 5; n <= max; n += 6) {
        if(number % n == 0) return false;
        if(number % (n + 2) == 0) return false;
      }

      return true;
    }
  }
}

We can also leverage known primes. If N is small then we can simply do a lookup into a list of all known primes up to some value.

There are also probabilistic methods for testing primality, e.g. the Miller-Rabin test. This sort of method takes the number to be tested along with a factor indicating the required accuracy. Methods like this are efficient and useful when testing very large numbers for primality but for moderately sized numbers their performance lags significantly behind the exhaustive test methods.
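
For illustration only, here is roughly what a Miller-Rabin test looks like in Python. This is a sketch under my own assumptions rather than anything I benchmarked: the function name, the `rounds` parameter (playing the part of the accuracy factor) and the initial trial division by a few small primes are all choices made for this example …

```python
import random

_SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)

def is_probable_prime(n, rounds=20, rng=random):
    """Miller-Rabin test. False means n is definitely composite; True means
    n is prime with an error probability of at most 4**-rounds."""
    if n < 2:
        return False
    for p in _SMALL_PRIMES:              # cheap trial division first
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)      # random base in [2, n - 2]
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue                     # this base says "probably prime"
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                 # a is a witness: n is composite
    return True

print(is_probable_prime(15485863))       # True, this one is prime
print(is_probable_prime(15485863 * 3))   # False
```

A prime always passes, whatever bases are drawn. Only composites can be misreported, and each extra round cuts that chance by a factor of at least four.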

The nth Prime

Being able to test a given number for primality is one thing but enumerating the primes in sequence is something else. In order to determine the billionth prime we need to generate, and count, prime numbers from 2 upward until we have counted one billion of them. What’s an efficient way to do this?

Well, let’s start with brute force and then try to optimize from there. We’ll start by looking for the millionth prime, a little less ambitious than the billionth to start with.

Method 1

Loop over all the natural numbers, test each for primality using the above IsPrime function, count the primes and stop when we get to the millionth one. Do this all in a single thread.

Results …

With JIT compiler optimized C# code running on my 2017 MacBook Pro Core i7 @ 2.9GHz this takes, on average, 7.17 seconds to complete and then reports that the millionth prime is $15,485,863$. According to the Interwebz that is the correct answer.

We can do better. We don’t need to test every natural number for primality. As in the IsPrime test function we can iterate over the superset of the primes defined by $6n-1$ and $6n+1$.

Method 2

As method 1 but looping over the smaller set of candidate primes.

Results …

A tiny bit better. Runtime is now 7.03 seconds on average. No significant improvement though.

We need a different approach.

The Sieve of Eratosthenes

We will look to a polymath from antiquity for inspiration, a man named Eratosthenes of Cyrene. He was a Greek mathematician, geographer, poet, astronomer and music theorist who ultimately became the chief librarian of the Library at Alexandria. He is credited with the invention of a simple algorithm for finding all the prime numbers up to any given limit. This algorithm is now known as the Sieve of Eratosthenes.

The algorithm iteratively marks as composite the multiples of each prime up to a given limit.

Example: To Find All the Primes Less Than or Equal to 30

First list all the natural numbers from 2 to 30 …

\[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30\]

The first number in the list is $2$. Go through the list and cross out all the multiples of $2$ other than $2$ itself …

\[2, 3, \color{red}{4}, 5, \color{red}{6}, 7, \color{red}{8}, 9, \color{red}{10}, 11, \color{red}{12}, 13, \color{red}{14}, 15, \color{red}{16}, 17, \color{red}{18}, 19, \color{red}{20}, 21, \color{red}{22}, 23, \color{red}{24}, 25, \color{red}{26}, 27, \color{red}{28}, 29, \color{red}{30}\]

The next uncrossed number in the list is $3$. Go through the list and cross out all the multiples of $3$ other than $3$ itself …

\[2, 3, \color{lightgrey}{4}, 5, \color{lightgrey}{6}, 7, \color{lightgrey}{8}, \color{red}{9}, \color{lightgrey}{10}, 11, \color{lightgrey}{12}, 13, \color{lightgrey}{14}, \color{red}{15}, \color{lightgrey}{16}, 17, \color{lightgrey}{18}, 19, \color{lightgrey}{20}, \color{red}{21}, \color{lightgrey}{22}, 23, \color{lightgrey}{24}, 25, \color{lightgrey}{26}, \color{red}{27}, \color{lightgrey}{28}, 29, \color{lightgrey}{30}\]

The next uncrossed number in the list is $5$. Go through the list and cross out all the multiples of $5$ other than $5$ itself …

\[2, 3, \color{lightgrey}{4}, 5, \color{lightgrey}{6}, 7, \color{lightgrey}{8}, \color{lightgrey}{9}, \color{lightgrey}{10}, 11, \color{lightgrey}{12}, 13, \color{lightgrey}{14}, \color{lightgrey}{15}, \color{lightgrey}{16}, 17, \color{lightgrey}{18}, 19, \color{lightgrey}{20}, \color{lightgrey}{21}, \color{lightgrey}{22}, 23, \color{lightgrey}{24}, \color{red}{25}, \color{lightgrey}{26}, \color{lightgrey}{27}, \color{lightgrey}{28}, 29, \color{lightgrey}{30}\]

The next uncrossed number in the list is $7$. Crossing out the multiples of $7$ results in no more exclusions since all such numbers have already been crossed out. This will be the case as soon as the first uncrossed number in a given pass through the list is greater than $\sqrt{N}$, where $N$ is the largest number in the list, $30$ in our example.

At this point we are done. All of the remaining uncrossed numbers are the primes.

Note that all the even numbers will be crossed out as we walk through the array eliminating multiples of $2$ so there’s really no point writing them down in the first place. We can just write down the odd numbers and save half the space. Of course we have to remember that there is one even prime though, namely $2$.

Another thing to note is that for each starting number, $n$, all the multiples of that number less than the square of $n$ will have already been crossed out, so we can start crossing out multiples of $n$ from $n^2$.
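
Before moving to a real implementation, the worked example can be reproduced in a few lines of Python. This is a minimal sketch (the function name `primes_up_to` is mine) that stores only the odd numbers and starts crossing out at $n^2$ …

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes over the odd numbers only.
    Index i of the flag list stands for the number 2*i + 1."""
    if limit < 2:
        return []
    size = (limit + 1) // 2               # odd numbers 1, 3, 5, ... <= limit
    is_composite = [False] * size
    primes = [2]                          # the only even prime
    for i in range(1, size):
        if is_composite[i]:
            continue
        p = 2 * i + 1
        primes.append(p)
        # Start at p*p and step by 2p so that only odd multiples are visited
        for m in range(p * p, limit + 1, 2 * p):
            is_composite[m // 2] = True
    return primes

print(primes_up_to(30))   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```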

We can write an implementation of this algorithm in C# …

namespace PrimeNumbers {
  using System;
  using System.Collections.Generic;

  public static class Numbers {
    public static IEnumerable<uint> PrimesLessThan(uint maxNumber)
    {
      if(maxNumber < 3) yield break;

      // Allocate an array of flags
      // We don't need to consider even numbers so we only need maxNumber / 2
      var numberIsComposite = new bool[maxNumber / 2];

      yield return 2;  // 2 is the only even prime, output that one by default

      // For odd n, 3 -> sqrt max, output primes and mark all prime multiples (from n^2) as composite
      uint n = 3;
      var sqrtMaxNumber = (uint)System.Math.Sqrt(maxNumber);
      for(; n <= sqrtMaxNumber; n += 2) {
        if(numberIsComposite[n / 2]) continue;
        yield return n;
        for(ulong m = n * n; m < maxNumber; m += 2 * n)  // m is ulong to avoid overflow
          numberIsComposite[m / 2] = true;
      }

      // Continue to walk through the rest of the array and output each prime number
      for(; n < maxNumber; n += 2)
        if(!numberIsComposite[n / 2]) yield return n;
    }
  }
}

This will generate each prime up to, but not including, maxNumber.

We can use this algorithm to find the millionth prime. The only tricky part is that we need to pass in a value for the largest number to consider. In other words we need to come up with an estimate for how big the millionth prime will be. We will address this limitation in due course but for now let’s just get a rough benchmark on the performance of this strategy.
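
One possible way to pick the bound, shown here purely as a sketch and not necessarily how my code ends up doing it, is to use the known inequality $p_n < n(\log n + \log \log n)$ which holds for $n \ge 6$. The function name below is made up for the example …

```python
import math

def nth_prime_upper_bound(n):
    """An exclusive upper bound on the n-th prime.
    For n >= 6 this uses p_n < n * (ln n + ln ln n)."""
    if n < 6:
        return 12          # the first five primes are 2, 3, 5, 7 and 11
    return math.ceil(n * (math.log(n) + math.log(math.log(n))))

print(nth_prime_upper_bound(1_000_000))
print(nth_prime_upper_bound(1_000_000_000))
```

For the millionth prime this gives a bound of roughly 16.4 million, a little looser than a hand picked value but with no magic involved.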

Method 3

Use the Sieve of Eratosthenes with an upper bound of $15,500,000$ for the internal array of numbers. Count the resulting primes and output the millionth one. Here’s the code …

namespace ScratchConsoleApp
{
  using System;
  using System.Diagnostics;
  using PrimeNumbers;

  class Program
  {
    static void Main(string[] args)
    {
      const uint n = 1000000;            // Which prime we want (the millionth)
      const uint maxNumber = 15500000;   // Magically chosen max value ...
      uint count = 0;
      uint nthPrime = 0;

      var sw = new Stopwatch();
      sw.Start();
      foreach(var prime in Numbers.PrimesLessThan(maxNumber)) {
        ++count;
        if(count == n) nthPrime = prime;
      }
      sw.Stop();

      Console.WriteLine($"The {n}th prime is {nthPrime}");
      Console.WriteLine($"Elapsed = {sw.Elapsed}");
    }
  }
}

Results …

With JIT compiler optimized C# code running on my 2017 MacBook Pro Core i7 @ 2.9GHz this takes, on average, 0.13 seconds to complete and then reports that the millionth prime is $15,485,863$, which we know to be correct.

Wow! That’s much faster than before.

This implementation has additional scope for improvement as well.

Support finding primes larger than uint.MaxValue
Create a simpler interface to allow us to request the Nth prime and remove the need to specify an upper bound for the value of the prime
Chunking, to achieve better locality of memory access (to improve CPU L1 and L2 cache hit rates)
Use threads to leverage multiple CPU cores in parallel

There are some simple tweaks too. It turns out that the IEnumerable interface and the yield return construct have some overhead. If we change the code to …

namespace PrimeNumbers {
  using System;

  public static class Numbers {
    public static void ForEachPrimeLessThan(uint maxNumber, Action<uint> primeFn)
    {
      if(maxNumber < 3) return;

      // Allocate an array of flags
      // We don't need to consider even numbers so we only need maxNumber / 2
      var numberIsComposite = new bool[maxNumber / 2];

      primeFn(2);  // 2 is the only even prime, output that one by default

      // For odd n, 3 -> sqrt max, output primes and mark all prime multiples (from n^2) as composite
      uint n = 3;
      var sqrtMaxNumber = (uint)System.Math.Sqrt(maxNumber);
      for(; n <= sqrtMaxNumber; n += 2) {
        if(numberIsComposite[n / 2]) continue;
        primeFn(n);
        for(ulong m = n * n; m < maxNumber; m += 2 * n)  // m is ulong to avoid overflow
          numberIsComposite[m / 2] = true;
      }

      // Continue to walk through the rest of the array and output each prime number
      for(; n < maxNumber; n += 2)
        if(!numberIsComposite[n / 2]) primeFn(n);
    }
  }
}

… and use it like this …

namespace ScratchConsoleApp
{
  using System;
  using System.Diagnostics;
  using PrimeNumbers;

  class Program
  {
    static void Main(string[] args)
    {
      const uint n = 1_000_000;            // Which prime we want (the millionth)
      const uint maxNumber = 15_500_000;   // Magically chosen max value ...
      uint count = 0;
      uint nthPrime = 0;

      var sw = new Stopwatch();
      sw.Start();
      Numbers.ForEachPrimeLessThan(maxNumber, prime => {
        ++count;
        if(count == n) nthPrime = prime;
      });
      sw.Stop();

      Console.WriteLine($"The {n}th prime is {nthPrime}");
      Console.WriteLine($"Elapsed = {sw.Elapsed}");
    }
  }
}

… then the results are slightly better.

A Better Sieve

Let’s make our sieve implementation better. First we want to support finding larger primes which means that we need to use 64 bit integers (ulong) as opposed to 32 bit (uint). For the smaller primes this will be a waste of space but we will need the extra capacity in order to work with primes up to and including the billionth one.

Our existing sieve implementation uses a single working array of bool where the index of each element corresponds to a number that we want to test for primality. This means that the maximum number that we can test is the array size less one. Now, the .NET Framework imposes a limit of Int32.MaxValue ($2^{31}-1$) on the number of elements in an array and therefore also imposes a limit on the size of the maximum prime that we can find. We optimize our use of the working array by ignoring even numbers (> 2) and associating index $n$ with the number $2n + 1$ but this still caps us out at 4 billion or so. To test larger numbers we will need a different approach.

Sieving in Blocks

We can search for primes in blocks and reuse our working array for each block. Let’s look at an example of how this will work. Say we want to limit the size of our working array to 30 elements but we want to find all the primes up to 300.

Recall that we only ever need to eliminate multiples of primes from our working array where the primes are less than or equal to the square root of the largest number in the array. Now, floor(sqrt(300)) = 17 and so first we need to find all the primes up to 17. To do this we need a working array with 9 elements. We assume that we have a working array with 30 elements so we have enough for this first task.

Array index (n):                0  1  2  3  4  5  6  7  8  9 10 ... 29
Corresponding number (2n + 1):  1  3  5  7  9 11 13 15 17 19 21 ... 59

We run the basic sieve (as before) on this working array to generate the primes: 2, 3, 5, 7, 11, 13 & 17.

Now with this list of primes saved we will proceed to examine the numbers up to 300 in blocks of 60 (2 * working array size). Here’s the first block …

block #1
startNumber (s) = 0
n:       0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
s+2n+1:  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59

We eliminate all multiples of the primes that we determined before. For each prime $p$ we eliminate $p^2 + 2kp$ for $k = 0, 1, 2, ...$ until we pass the end of the block.

block #1
startNumber (s) = 0
n:       0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
s+2n+1:  1  3  5  7    11 13    17 19    23       29 31       37    41 43    47       53       59

Now the next block …

block #2
startNumber (s) = 60
n:       0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19  20  21  22  23  24  25 ...  29
s+2n+1: 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95 97 99 101 103 105 107 109 111 ... 119

… which becomes the following after eliminating multiples of our primes …

block #2
startNumber (s) = 60
n:       0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19  20  21  22  23  24  25 ...  29
s+2n+1: 61       67    71 73       79    83       89          97    101 103     107 109     ...

Let’s look at a later block …

block #5
startNumber (s) = 240
n:        0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20 ...  29
s+2n+1: 241 243 245 247 249 251 253 255 257 259 261 263 265 267 269 271 273 275 277 279 281 ... 299

Here the first number in the block is 241. As we enumerate our primes we end up skipping over many multiples of them before we get to a number that is in the block.

  p p^2 +2p +2p +2p +2p +2p +2p +2p +2p +2p +2p +2p +2p +2p +2p +2p +2p +2p +2p +2p +2p +2p +2p ...
  3   9  15  21  27  33  39  45  51  57  63  69  75  81  87  93  99 105 111 117 123 129 135 141 ...
  5  25  35  45  55  65  75  85  95 105 115 125 135 145 155 165 175 185 195 205 215 225 235 245 ...
  7  49  63  77  91 105 119 133 147 161 175 189 203 217 231 245 259 273 287 301 ...
 11 121 143 165 187 209 231 253 275 297 319 ...
 13 169 195 221 247 273 299 ...
 17 289 ...

Rather than having to iterate over all of these redundant prime multiples we can skip ahead with some appropriate arithmetic (see below).
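
The skip ahead boils down to one ceiling division. As a small Python sketch of the idea (the function name is mine), here is the first value of $p^2 + 2kp$ that lands at or beyond the start of block #5 for each of our primes …

```python
def first_odd_multiple_in_block(p, start_number):
    """Smallest n of the form p*p + 2*k*p (k >= 0) with n >= start_number."""
    n = p * p
    if n >= start_number:
        return n
    step = 2 * p
    k = -(-(start_number - n) // step)   # ceiling of (start_number - n) / step
    return n + k * step

for p in (3, 5, 7, 11, 13, 17):
    print(p, first_odd_multiple_in_block(p, 240))
```

This prints 243, 245, 245, 253, 247 and 289, which are exactly the first entries at or above 240 in each row of the table above.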

Here’s some C# code to sieve a block …

private static void SieveBlock(bool[] isPrime,
                               IReadOnlyList<ulong> primes,
                               uint startIndex,
                               ulong startNumber,
                               ulong endNumber)
{
  foreach(var p in primes) {
    if(p == 2) continue;   // 2 is a special case not covered by our working array

    // Start eliminating elements of the working array from p^2
    var n = p * p;
    if(n > endNumber) break;

    // In the for loop below, n = p*p + 2*k*p for non-negative integers k = 0, 1, 2, ...
    // We don't need to do anything until n is >= startNumber so fast forward until that is true
    if(n < startNumber) {
      // Find the smallest k such that n + 2*k*p >= startNumber
      //   k >= (startNumber - n) / (2 * p)
      //   k >= a / b for a = startNumber - n and b = 2 * p
      //   k = a / b if a % b == 0 otherwise k = (a / b) + 1
      var a = startNumber - n;
      var b = 2 * p;
      var k = a % b == 0 ? a / b : (a / b) + 1;
      n = n + (2 * k * p);
    }

    // Mark all multiples of this prime as composite in the current range of the working array
    for(; n < endNumber; n += 2 * p) {
      var index = startIndex + (n - startNumber) / 2;
      isPrime[index] = false;
    }
  }
}

Using block processing like this we can come up with a sieve that will work for much larger numbers.
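
To tie the pieces together, here is the whole 30 element, primes up to 300 example as a compact Python sketch. It is a simplified illustration rather than a port of my C# code: the function names are invented, a fresh flag list is allocated per block instead of reusing one array, and the result excludes the end number itself …

```python
import math

def small_primes(limit):
    """Plain sieve for the primes <= limit (used for the initial list)."""
    if limit < 2:
        return []
    flags = [True] * (limit + 1)
    flags[0] = flags[1] = False
    for i in range(2, math.isqrt(limit) + 1):
        if flags[i]:
            for m in range(i * i, limit + 1, i):
                flags[m] = False
    return [i for i, flag in enumerate(flags) if flag]

def block_sieve(end_number, block_elements=30):
    """Primes < end_number, sieved in blocks of block_elements odd numbers."""
    if end_number <= 2:
        return []
    base = small_primes(math.isqrt(end_number))
    primes = [2]
    span = 2 * block_elements                 # integers covered by one block
    for start in range(0, end_number, span):
        end = min(start + span, end_number)
        is_prime = [True] * block_elements    # index i stands for start + 2*i + 1
        if start == 0:
            is_prime[0] = False               # 1 is not prime
        for p in base:
            if p == 2:
                continue                      # evens are not in the array at all
            n = p * p
            if n >= end:
                break
            if n < start:                     # fast forward to the first n >= start
                n += -(-(start - n) // (2 * p)) * 2 * p
            for m in range(n, end, 2 * p):
                is_prime[(m - start) // 2] = False
        for i, flag in enumerate(is_prime):
            number = start + 2 * i + 1
            if flag and number < end:
                primes.append(number)
    return primes

print(block_sieve(300)[-5:])   # the last few primes below 300
```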

Method 4

Use a block processing Sieve of Eratosthenes with a block size of 500,000 and an upper bound of 15,500,000. Count the resulting primes and output the millionth one.

Results …

c:\src\eratosthenes\BlockSieve.exe -blockSize=500000 -maxNumber=15500000 -nthPrime=1000000
The 1,000,000th prime is 15,485,863
Elapsed = 00:00:00.0622126

What about the 100 millionth prime? Well, our implementation can do that now.

Results …

c:\src\eratosthenes\BlockSieve.exe -blockSize=500000 -maxNumber=2100000000 -nthPrime=100000000
The 100,000,000th prime is 2,038,074,743
Elapsed = 00:00:09.6867032

What about the 500 millionth prime?

Results …

c:\src\eratosthenes\BlockSieve.exe -blockSize=500000 -maxNumber=11100000000 -nthPrime=500000000
The 500,000,000th prime is 11,037,271,757
Elapsed = 00:00:57.2004362

But what about that billionth prime?

Results …

c:\src\eratosthenes\BlockSieve.exe -blockSize=500000 -maxNumber=23000000000 -nthPrime=1000000000
The 1,000,000,000th prime is 22,801,763,489
Elapsed = 00:02:05.9492586

According to the Interwebz that is indeed the correct answer for the billionth prime.

But wait, we’re not done …

Parallel Block Processing

Once we have determined our initial list of primes then we can eliminate composite numbers from blocks in parallel.

private static void ParallelSieveBlocks(ulong endNumber,
                                        uint blockSize,
                                        bool[] isPrime,
                                        IReadOnlyList<ulong> primes,
                                        uint numTasks,
                                        ulong startNum)
{
  var tasks = new List<Task>((int)numTasks);

  for(uint m = 0; m < numTasks; ++m) {
    // Init the start index of the section of the working array that this task will use and the
    // bounds of the numbers that it will process
    var taskStartIndex = m * blockSize;
    var taskStartNum = startNum + (m * 2 * blockSize);
    var taskEndNum = taskStartNum + (2 * blockSize);

    if(taskStartNum >= endNumber) break;                // Short circuit if there's no more work to do
    if(taskEndNum > endNumber) taskEndNum = endNumber;  // Adjust endNum if we are the last block

    tasks.Add(Task.Run(() => SieveBlock(isPrime, primes, taskStartIndex, taskStartNum, taskEndNum)));
  }

  Task.WaitAll(tasks.ToArray());
}

How many parallel tasks can we spawn though? There’s really no point in using more than the number of cores on the local system. Thus we will end up with a partially parallel but still somewhat serial algorithm.

public static void ParallelSieve(ulong endNumber,
                                 Func<ulong, bool> primeFn,
                                 uint blockSize = 600000,
                                 uint numTasks = 0)
{
  // Save the value of floor(sqrt(endNumber)) and use it as a cap on the initial set of primes
  var sqrt = (ulong)Math.Sqrt(endNumber);

  if(numTasks == 0) numTasks = (uint)Environment.ProcessorCount;

  // We allocate a working array of bool with blockSize elements for each task
  var arraySize = blockSize * numTasks;

  // We will use this same array for the initial serial sieve to find the primes up to 'sqrt' so
  // it needs to be at least big enough to hold all the numbers up to that value.  Remember that
  // we are only allocating half the values though (the / 2 optimization) so we need to account
  // for that.
  var serialSieveSize = (sqrt + 1) / 2;
  if(arraySize < serialSieveSize) arraySize = (uint)serialSieveSize;

  // Allocate and init the Working array.  This will be reused by the tasks during each loop.
  var isPrime = new bool[arraySize];
  InitArray(isPrime, 0);

  // Allocate enough space to store 'sqrt' primes
  var primes = new List<ulong>((int)sqrt);

  // First serially determine all the primes up to and including 'sqrt'
  Sieve(isPrime, sqrt + 1, p => primes.Add(p));

  // We will now proceed in parallel blocks of size 'blockSize'.  Each block will be for a subset
  // of the numbers up to endNumber.

  // We have to process all the numbers up to endNumber.  How many blocks is this?  Don't forget
  // remainders.
  var numElements = endNumber / 2;
  var numBlocks = (uint)(numElements % blockSize == 0 ?
                  numElements / blockSize :
                  (numElements / blockSize) + 1);

  // We will process the blocks numTasks at a time.  How many loops will we require?  Don't
  // forget remainders.
  var numLoops = numBlocks % numTasks == 0 ? numBlocks / numTasks : (numBlocks / numTasks) + 1;

  // Main loop
  for(uint n = 0; n < numLoops; ++n) {
    InitArray(isPrime, n);  // Re-initialize the array for reuse

    // Init the bounds of the numbers that this loop will be working with
    var startNum = (ulong)n * 2 * blockSize * numTasks;
    var endNum = startNum + (2 * blockSize * numTasks);
    if(endNum > endNumber) endNum = endNumber;

    // Launch tasks to process each block of the working array in parallel and wait for them to
    // complete
    ParallelSieveBlocks(endNumber, blockSize, isPrime, primes, numTasks, startNum);

    // All tasks done, enumerate over the working array and output primes
    if(n == 0) primeFn(2);  // 2 is a special case
    var totalBlockSize = numTasks * blockSize;
    for(uint m = 0; m < totalBlockSize; ++m) {
      if(!isPrime[m]) continue;
      var prime = startNum + (2 * m) + 1;
      if(primeFn(prime)) return;   // Return if the client code indicates they are done
    }
  }
}
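
The overall shape of the parallel version (a serial pass for the small primes, then independent blocks handed to a pool of workers) can be sketched in Python too. Treat this strictly as a structural illustration: the names are invented, each worker gets its own small flag list rather than a slice of one shared array, and because of CPython's global interpreter lock a thread pool like this will not actually run the blocks any faster …

```python
import math
import os
from concurrent.futures import ThreadPoolExecutor

def sieve_block(start, end, base_primes):
    """Return the primes in [start, end); start must be even."""
    size = (end - start) // 2                 # odd numbers start+1, start+3, ...
    is_prime = [True] * size
    if start == 0 and size > 0:
        is_prime[0] = False                   # 1 is not prime
    for p in base_primes:                     # odd primes, in ascending order
        n = p * p
        if n >= end:
            break
        if n < start:                         # fast forward into the block
            n += -(-(start - n) // (2 * p)) * 2 * p
        for m in range(n, end, 2 * p):
            is_prime[(m - start) // 2] = False
    return [start + 2 * i + 1 for i, flag in enumerate(is_prime) if flag]

def parallel_block_sieve(end_number, block_span=60, workers=None):
    """Primes < end_number; block_span must be even."""
    if end_number <= 2:
        return []
    root = math.isqrt(end_number)
    # Serial step: the odd primes up to sqrt(end_number), by trial division
    base = [p for p in range(3, root + 1, 2)
            if all(p % d for d in range(3, math.isqrt(p) + 1, 2))]
    starts = range(0, end_number, block_span)
    with ThreadPoolExecutor(max_workers=workers or os.cpu_count()) as pool:
        # map() yields the per-block results in block order
        blocks = pool.map(
            lambda s: sieve_block(s, min(s + block_span, end_number), base),
            starts)
        return [2] + [p for block in blocks for p in block]

print(len(parallel_block_sieve(300)))   # 62 primes below 300
```

Since no block depends on any other, the only coordination needed is collecting the results in order.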

Method 5

Use a parallel block processing Sieve of Eratosthenes with a block size of 500,000 and various upper bounds, count the resulting primes and output the millionth, 100 millionth, 500 millionth and the billionth ones.

Results …

c:\src\eratosthenes\ParallelBlockSieve.exe -blockSize=500000 -maxNumber=15500000 -nthPrime=1000000
The 1,000,000th prime is 15,485,863
Elapsed = 00:00:00.0558394

c:\src\eratosthenes\ParallelBlockSieve.exe -blockSize=500000 -maxNumber=2100000000 -nthPrime=100000000
The 100,000,000th prime is 2,038,074,743
Elapsed = 00:00:07.4914955

c:\src\eratosthenes\ParallelBlockSieve.exe -blockSize=500000 -maxNumber=11100000000 -nthPrime=500000000
The 500,000,000th prime is 11,037,271,757
Elapsed = 00:00:41.1140395

c:\src\eratosthenes\ParallelBlockSieve.exe -blockSize=500000 -maxNumber=23000000000 -nthPrime=1000000000
The 1,000,000,000th prime is 22,801,763,489
Elapsed = 00:01:23.4179141

Not bad.

I experimented with different block sizes but ultimately found that results were best when the block size was chosen to ensure that each block of working array (in use by the threadpool thread running each task) was small enough that it would stay in L2/3 cache for a core. For larger blocks we may be able to do more work in parallel, and thus ultimately execute fewer loops, but we pay more in memory access latency than we gain in saved serial calculations.
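To make the block idea concrete, here’s a minimal sequential sketch in Python. It is not the C# from this post and it doesn’t run anything in parallel; the names `segmented_primes`, `nth_prime` and the default `block_size` are my own choices for illustration. It just shows the core trick: find the base primes up to the square root of the limit once, then sieve one small working array at a time.

```python
import math


def segmented_primes(limit, block_size=32_768):
    """Yield every prime <= limit, sieving one block of numbers at a time."""
    if limit < 2:
        return
    root = math.isqrt(limit)

    # Ordinary sieve to find the "base" primes up to sqrt(limit).
    is_base_prime = bytearray([1]) * (root + 1)
    is_base_prime[0] = 0
    is_base_prime[1] = 0
    for i in range(2, math.isqrt(root) + 1):
        if is_base_prime[i]:
            for j in range(i * i, root + 1, i):
                is_base_prime[j] = 0
    base_primes = [i for i in range(2, root + 1) if is_base_prime[i]]

    # Sieve [2, limit] block by block, with one small working array per block.
    for low in range(2, limit + 1, block_size):
        high = min(low + block_size - 1, limit)
        block = bytearray([1]) * (high - low + 1)
        for p in base_primes:
            if p * p > high:
                break
            # First multiple of p inside the block, but never p itself.
            start = max(p * p, ((low + p - 1) // p) * p)
            for multiple in range(start, high + 1, p):
                block[multiple - low] = 0
        for offset, flag in enumerate(block):
            if flag:
                yield low + offset


def nth_prime(n, limit):
    """Return the n-th prime (1-based), or None if it is larger than limit."""
    for count, prime in enumerate(segmented_primes(limit), start=1):
        if count == n:
            return prime
    return None
```

Calling `nth_prime(1000, 8000)` walks the blocks in order and stops as soon as the thousandth prime turns up. Pure Python will be far slower than the C# timings above, so treat this as a picture of the technique rather than a benchmark.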

Results Comparison

Here’s a summary of the runtimes of our different methods (All times are minutes, seconds and milliseconds).

nth Prime	Sequential IsPrime	Sieve	Block Sieve	Parallel Block Sieve
1,000,000	07:03.000	00:00.130	00:00.062	00:00.056
100,000,000	n/a	n/a	00:09.687	00:07.491
500,000,000	n/a	n/a	00:57.200	00:41.114
1,000,000,000	n/a	n/a	02:05.949	01:23.418

On Proof

2018-02-12T00:00:00+00:00

Proof. An interesting word with several meanings …

noun

Evidence or argument establishing or helping to establish a fact or the truth of a statement
A trial print of something, in particular
A measure of the content of ethanol in an alcoholic beverage

adjective

Able to withstand something damaging; resistant
Denoting a trial impression of a page or printed work

verb

Make (fabric) waterproof
Make a proof of (a printed work, engraving, etc.)

Although I very much like the third noun usage, today we are going to be discussing proof in the context of mathematics and in particular, the various forms in which it can be achieved.

The Mathematical Edifice

I like to think of mathematics as a structure and a tall and impressive one at that, which only ever gets taller over time. Each level of the structure builds on the layers below and at the very bottom is a foundation upon which everything depends. These foundations are known as axioms: things which are assumed to be true and require no proof of themselves. Above the axioms are theorems, statements which are shown to be true based on the logical application of the axioms and theorems that lie below them. The truth of each level of the structure depends on the truth of the layers below. If any lower level theorem is proven to be false then the truth of all the theorems above is thrown into doubt.

Geometry is a classic example of an edifice built on a set of axioms where the type of structure that you end up with depends critically on which axioms are assumed and not assumed, the so-called parallel postulate being a ready example. Euclidean geometry is the study of geometry that satisfies all of Euclid’s axioms, including the parallel postulate. A geometry where the parallel postulate does not hold is termed non-Euclidean and there are several types thereof (hyperbolic, elliptic, etc.).

The process of building the edifice depends wholly on the nature of mathematical proof.

Mathematical Proof

Formally, in mathematics, a proof is an inferential argument for a mathematical statement. In the argument, other previously established statements, such as theorems, can be used. In principle, a proof can be traced back to self-evident or assumed statements, known as axioms, along with accepted rules of inference. Axioms may be treated as conditions that must be met before the statement applies. Proofs are examples of exhaustive deductive reasoning or inductive reasoning and are distinguished from empirical arguments or non-exhaustive inductive reasoning (or “reasonable expectation”). A proof must demonstrate that a statement is always true (occasionally by listing all possible cases and showing that it holds in each), rather than enumerate many confirmatory cases. An unproved proposition that is believed to be true is known as a conjecture. Proofs employ logic but inevitably include some amount of natural language which can admit some ambiguity. In fact, the vast majority of proofs in written mathematics can be considered as applications of rigorous informal logic.

Pause. Breathe.

Informally, proofs are things that are required in order to pass examinations in mathematics but which many students consider not that important in the grand scheme of things and sometimes even a load of dingo’s kidneys.

The presentation of a mathematical proof generally follows a format. The proposition to be proved is first stated along with any related assumptions about the elements involved in the proposition. What follows is a logical progression of statements, the truth of each of which is inferred from the statements that came before it coupled with other things (definitions, axioms and theorems) that we already know to be true. This progression of true statements proceeds until the original proposition is stated, at which point the proof is complete. Finally, and traditionally, the term QED appears which is an acronym for the Latin phrase quod erat demonstrandum meaning “what was to be demonstrated” or “what was to be shown”. An alternative, although slightly inaccurate, translation is “thus it has been demonstrated”.

Types of Proof

Mathematical proofs fall into various categories. Each of these can be considered an approach to a proof, or a tool that can be employed in pursuit of a proof. Many theorems can be proved in different ways using different techniques. For example, the classic theorem of Pythagoras is said to have at least 370 known distinct proofs.

Proof by Deduction

A proof by deduction is a technique in which the succession of true statements is achieved via deductive reasoning. We proceed from statement A to statement B via an implication inferred from the assumed truth of A conjoined with other known truths (definitions, axioms and theorems).

The direction of the implication of truth between two statements must be carefully considered. For two related statements A and B we can have the following.

A if B

“B is true” implies that “A is true” (B => A)
“A is false” implies that “B is false” (!A => !B)
This is the same as B only if A

A only if B

“B is false” implies that “A is false” (!B => !A)
“A is true” implies that “B is true” (A => B)
This is the same as B if A

A if and only if B

“A is true” implies that “B is true” (A => B)
“A is false” implies that “B is false” (!A => !B)
“B is true” implies that “A is true” (B => A)
“B is false” implies that “A is false” (!B => !A)
The two statements are logically equivalent (A <=> B)

Example 1:

Statement A: x > 10
Statement B: x > 5

B if A (equivalently A => B) but not A if B. B does not imply A. E.g. x = 6 satisfies B but not A.

Example 2:

Statement A: n is an even number
Statement B: n = 2k for some integer value k

A if and only if B (equivalently A <=> B)
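If you want to poke at the two examples, here’s a small Python sketch that checks the implications over a finite range of integers. The helper names (`implies`, `is_twice_an_integer`) and the range are my own; a finite check like this is only an illustration of the direction of each implication, not a proof.

```python
def implies(a, b):
    """Material implication: 'a => b' is false only when a is true and b is false."""
    return (not a) or b


xs = range(-20, 21)

# Example 1: A is "x > 10", B is "x > 5".
a_implies_b = all(implies(x > 10, x > 5) for x in xs)
b_implies_a = all(implies(x > 5, x > 10) for x in xs)
counterexamples = [x for x in xs if x > 5 and not x > 10]


# Example 2: A is "n is even", B is "n = 2k for some integer k".
def is_twice_an_integer(n):
    return any(n == 2 * k for k in range(-abs(n), abs(n) + 1))


equivalent = all((n % 2 == 0) == is_twice_an_integer(n) for n in xs)
```

Here `a_implies_b` holds for every value tested while `b_implies_a` does not, and `counterexamples` lists exactly the values (such as 6) that satisfy B but not A.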

Proof by Induction

A proof by induction is just like an ordinary proof in which every step must be justified. However, it employs a neat trick which allows you to prove a proposition about an arbitrary positive integer n by first proving it is true when n = 1 and then assuming it is true for n = k and showing (via other deductive reasoning) that it is also true for n = k + 1. By doing this we can then say that since the proposition is true for n = 1 then it must also be true for n = 2 (since true for k implies true for k + 1); and since it’s true for n = 2 then it must also be true for n = 3 and so on and so forth ad infinitum. Thus the proposition must be true for all positive integer values of n.

Proof by Contradiction

Proof by contradiction starts out by assuming that the opposite of the proposition is true, and then shows that such an assumption leads to a contradiction. Thus the original assumption (that the proposition is false) cannot be true and so we can deduce that the proposition must be true. This style of proof is a particular kind of the more general form of argument known as reductio ad absurdum.

G. H. Hardy described proof by contradiction as “One of a mathematician’s finest weapons”, saying “It is a far finer gambit than any chess gambit: a chess player may offer the sacrifice of a pawn or even a piece, but a mathematician offers the game.”

Other

Other styles of proof exist, such as proof by construction and proof by exhaustion but the main tools in a mathematician’s toolbox are those outlined above. Another style, although not formal, is the visual proof or proof without words. A great example of this would be the following visual representation of Pythagoras’ Theorem.

Example Proofs

Let’s consider some interesting mathematical propositions and then use the different types of proof to turn those propositions into theorems.

1089

I’m not sure if this one counts as a theorem but it’s a fun mathematical trick that we can demonstrate to be true via deductive reasoning based on the initial conditions.

Proposition …

Pick any three digit positive decimal integer where the first digit is greater than the last. Reverse the digits to form another three digit integer (keeping any leading zero, so 780 reverses to 087) and subtract this new integer from the first. The result will be another positive integer, which we again treat as having three digits (so 99 is written 099). Reverse the digits of this new number and then add the resulting number to the result from the first operation. The answer will be 1089 and will always be 1089 regardless of the initial choice of number.

Examples …

$N = 321$; $N_{rev} = 123$; $N - N_{rev} = M = 198$; $M_{rev} = 891$; $M + M_{rev} = 1089$
$N = 780$; $N_{rev} = 087$; $N - N_{rev} = M = 693$; $M_{rev} = 396$; $M + M_{rev} = 1089$
$N = 500$; $N_{rev} = 005$; $N - N_{rev} = M = 495$; $M_{rev} = 594$; $M + M_{rev} = 1089$

Proof (by deduction) …

Let’s assume that the digits of our initial number, $N$, are $a$, $b$ & $c$.

We know: $a \in [1, 9]$, $b \in [0, 9]$, $c \in [0, 9]$, $a > c$ and $N = 100a + 10b + c$.

We know that $a$ cannot be zero because $a > c$.

Now, $N_{rev} = 100c + 10b + a$ and therefore $N - N_{rev} = M = (100a + 10b + c) - (100c + 10b + a)$.

Simplifying, we have $M = 100a - a + 10b - 10b - 100c + c = 99a - 99c = 99(a - c)$.

We know from our initial assumptions that $a - c \in [1, 9]$, so $M$ is a positive multiple of $99$ and further, $M \in \{ 099, 198, 297, 396, 495, 594, 693, 792, 891 \}$.

For each of these possible values of $M$, the middle digit is $9$ and the first and last digits sum to give $9$. To see why, write $k = a - c$; then $M = 99k = 100(k - 1) + 90 + (10 - k)$, so the digits of $M$ are $k - 1$, $9$ and $10 - k$. If we write the digits of $M$ as $x$, $9$ & $y$ then …

\[M + M_{rev} = (100x + 90 + y) + (100y + 90 + x) = 100(x + y) + 180 + x + y\]

But we know that $x + y = 9$ and therefore $M + M_{rev} = 900 + 180 + 9 = 1089$.

QED
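Since there are only a few hundred valid starting numbers, we can also just try them all. Here’s a minimal Python sketch of that (the function names `reverse3` and `trick` are mine); it treats every intermediate value as a three digit number, keeping leading zeros, which is the assumption the trick relies on.

```python
def reverse3(n):
    """Reverse the digits of n, treating it as a three digit number (keeps leading zeros)."""
    return int(f"{n:03d}"[::-1])


def trick(n):
    """Apply the subtract-then-add steps of the trick to a starting number n."""
    m = n - reverse3(n)
    return m + reverse3(m)


# Every three digit number whose first digit is greater than its last.
candidates = [n for n in range(100, 1000) if n // 100 > n % 10]
results = {trick(n) for n in candidates}
differences = sorted({n - reverse3(n) for n in candidates})
```

`results` collects every answer the trick can produce and `differences` collects every possible value of the intermediate number, which should be the multiples of 99 listed in the proof.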

A Formula for the Sum of the First N Positive Integers

Proposition …

$1 + 2 + 3 + 4 + ... + n = \frac{n(n + 1)}{2}$ for all positive integers $n$.

Proof (by induction) …

Assume that the proposition is true for $n = N$. So we can take this as true …

\[1 + 2 + 3 + 4 + ... + N = \frac{N(N + 1)}{2}\]

Add $N + 1$ to both sides to give …

\[1 + 2 + 3 + 4 + ... + N + (N + 1) = \frac{N(N + 1)}{2} + (N + 1)\]

We can rearrange the right hand side to give …

\[\frac{N(N + 1)}{2} + (N + 1) = \frac{N(N + 1) + 2(N + 1)}{2} = \frac{N^2 + 3N + 2}{2} = \frac{(N + 1)(N + 2)}{2}\]

and so …

\[1 + 2 + 3 + 4 + ... + N + (N + 1) = \frac{(N + 1)(N + 2)}{2}\]

which is the same as the original proposition but with $n = N + 1$.

So the proposition being true for $n = N$ logically implies it is also true for $n = N + 1$.

It is trivially true for $n = 1$ since $1 = \frac{1(2)}{2}$ is true.

Therefore it must also be true for $n = 2, 3, 4, ...$.

Therefore it is true for all positive integers.

QED
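The shape of the induction argument can be mirrored numerically. This is a small Python sketch (the name `triangular` is mine) that checks the base case, the step from $N$ to $N + 1$ for a run of values, and the formula against a direct sum. It only ever checks finitely many cases, which is exactly why the proof above is still needed.

```python
def triangular(n):
    """The closed form n(n + 1) / 2 from the proposition."""
    return n * (n + 1) // 2


# Base case: the formula holds for n = 1.
base_case = triangular(1) == 1

# Inductive step: adding (k + 1) to the formula for k gives the formula for k + 1.
steps_hold = all(triangular(k) + (k + 1) == triangular(k + 1) for k in range(1, 1000))

# Direct comparison against the sum itself.
matches_sum = all(sum(range(1, n + 1)) == triangular(n) for n in range(1, 200))
```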

The Infinitude of the Primes

Definition: A prime number is a positive integer greater than one that is only evenly divisible by itself and one.

Proposition …

There are infinitely many prime numbers.

Proof (by construction/induction) …

Let’s take any finite set of prime numbers $\{p_1, p_2, p_3, ..., p_n\}$ and then let’s form the number $N = (p_1 \times p_2 \times p_3 \times ... \times p_n) + 1$.

$N$ must have a prime factor but that factor can’t be any of $\{p_1, p_2, p_3, ..., p_n\}$ because, by construction, there will always be a remainder of $1$ when $N$ is divided by any of the primes in our set.

Therefore there must be another prime that is not in our set.

But we chose our initial finite set of primes arbitrarily. Thus for any finite set of primes there must be another that is not in the set.

Therefore there are infinitely many prime numbers.

QED
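It’s worth seeing the construction in action, because $N$ itself doesn’t have to be prime; it just has to have a prime factor that isn’t in the set. Here’s a minimal Python sketch with hypothetical helper names (`smallest_prime_factor`, `new_prime_from`) and simple trial division, which is fine for small examples like these.

```python
from math import prod


def smallest_prime_factor(n):
    """Return the smallest prime factor of n (n >= 2) by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


def new_prime_from(primes):
    """Form N = (product of primes) + 1 and return N with one of its prime factors."""
    n = prod(primes) + 1
    return n, smallest_prime_factor(n)


first_six = [2, 3, 5, 7, 11, 13]
n, p = new_prime_from(first_six)
```

With the first six primes, $N = 30031 = 59 \times 509$, so $N$ is composite but its prime factors are both new. Starting from $\{3, 5\}$ instead gives $N = 16$ and the “missing” prime $2$.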

The Irrationality of the Square Root of Two

Definition: A rational number is one that can be expressed as the quotient $\frac{p}{q}$ of two integers, a numerator $p$ and a non-zero denominator $q$. An irrational number is a real number that is not a rational number.

Proposition …

The square root of two is an irrational number.

Proof (by contradiction) …

Assumption: $\sqrt{2}$ is rational

Thus $\sqrt{2}$ can be expressed as $\frac{p}{q}$ where $p$ and $q$ are integers with no common factors (other than $1$) and where $q$ is non-zero.

$\sqrt{2} = \frac{p}{q}$ therefore $2 = \frac{p^2}{q^2}$ and so $2q^2 = p^2$

Thus $p^2$ must be an even number since it is an integer multiple of $2$.

Now, if $p$ was odd, i.e. $p = 2k + 1$ for some integer $k$ then $p^2 = (2k + 1)^2 = 4k^2 + 4k + 1$ which is also odd. If $p$ was even, i.e. $p = 2k$ for some integer $k$ then $p^2 = (2k)^2 = 4k^2$ which is also even. So $p^2$ odd <=> $p$ odd and $p^2$ even <=> $p$ even.

So, $p$ is even, i.e. $p = 2k$ for some integer $k$.

Substituting this back into $2q^2 = p^2$ gives $2q^2 = (2k)^2 = 4k^2$ and thus $q^2 = 2k^2$ which means that $q^2$ is even as well (since it’s a multiple of $2$) and so, by the same argument as above, $q$ must be even too.

So, $p$ and $q$ are both even.

But this contradicts our initial assumption that $p$ and $q$ have no common factors other than $1$. Thus our initial assumption must be false and so $\sqrt{2}$ can’t be rational. Thus it is irrational.

QED
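A computer can’t prove this one by searching, but it can at least fail to find a counterexample. Here’s a small Python sketch (the function name and the search bound are my own choices) that looks for integers with $p^2 = 2q^2$ and also checks the parity fact used in the proof over a finite range.

```python
from math import isqrt


def rational_sqrt2_candidates(max_q):
    """Return every (p, q) with 1 <= q <= max_q and p*p == 2*q*q."""
    hits = []
    for q in range(1, max_q + 1):
        p = isqrt(2 * q * q)  # the only integer that could possibly work
        if p * p == 2 * q * q:
            hits.append((p, q))
    return hits


# p squared is even exactly when p is even (checked for a finite range only).
parity_lemma = all((p * p) % 2 == p % 2 for p in range(-1000, 1001))
```

The search comes back empty for every bound you care to try, which is what the proof says must happen.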

On Mathematics and its Teaching

2017-12-01T00:00:00+00:00

I love mathematics. It’s logical, universal and beautiful. I loved mathematics as a child and was naturally good at it in school. In fact it was one of the things that defined me as a kid, I was good at math and science. I wasn’t into sports or any other significant activities back then. Yes, I was one of those kids. Mathematics just came naturally to me and I wanted to learn more and more. I wanted to know everything.

I grew up in a nondescript family in a nondescript town and went to (I realize now, looking back) nondescript, mediocre schools. I wasn’t driven to achieve; there was no expectation that I should aspire to attend an elite university that would act as a stepping stone for me to then move on to a “career”. I just went to school and learnt what I was presented with.

I did have the benefit of having some great, fun math teachers though; in middle school (what, in the UK, we called high school - Redmoor High School in Hinckley to be precise) and then in high school (sixth form college in the UK, specifically the John Cleveland College, also located in the same Leicestershire idyll).

Now I described these teachers as “great” and at the time that’s what I thought they were. Looking back now though, with the benefit of hindsight, I can say that they didn’t do that great a job of preparing me to study mathematics at a higher level (more on this below). However, they were encouraging and they made the subject fun, and that is an essential aspect of mathematical pedagogy; and something that was noticeably absent from my experience of university-level teaching.

I was streamed ahead in middle school and then again in high school; and for the last two years there I was in a tiny class (with only five kids in total if I recall correctly) where everyone was motivated to learn and so the teachers could just focus on teaching without having to do general crowd control as well. There was also a competitive dynamic between three of us in the class that really drove us on. I recall it fondly.

One day my teachers suggested that my fellow math geeks and I should apply to Oxbridge (a portmanteau used to collectively refer to the two ancient elite universities in the UK, Oxford and Cambridge). This was an idea that would never have occurred to me or my family ourselves. As I said before, I was not raised to aspire to such things. The high school didn’t have a grand tradition of sending students to Oxbridge either but they clearly had an eye for it and we were considered worthy. From that point (towards the end of my penultimate year of school) our math class shifted to focus on the relevant entrance exams and preparing us for interview. We were successful too. Ultimately I and two of my fellow classmates were accepted, two of us to Oxford and one to Cambridge. And lo, I got to read mathematics at St. John’s College, Oxford. And boy was it different.

A few things changed when I got to Oxford. First, I went from being one of the brightest kids in school (without really having to try) to being middle of the pack at best. This forced me to reevaluate my sense of who I was and what defined me. Also, many of my fellow students studying mathematics, as well as the lecturers and tutors, were - how can I say this - really nerdy and not cool. Now God knows I was never cool myself in high school but I aspired to be, and I think I had the core ability to be social and funny, I just had to grow into it and that process really started once I was at university. I remember having to spend time together with some of my fellow mathematicians and tutors (one in particular) and feeling really awkward in their company. The tutor hardly spoke. I absolutely felt that these were “not my people” and thus began the process where I started to drift away from mathematics. I couldn’t love the subject anymore because to do maths at this level was to be like these people.

I gravitated more to some of the students who were studying engineering, physics and chemistry. They were still smart and geeky (geeky is NOT the same as nerdy) but more fun to be around. I also very much enjoyed the company of those who were studying a whole variety of completely different things (history, English, geography, psychology, philosophy, law, …) as I was exposed to these people on a daily basis. More on this below.

Now one of the great things about being at Oxford is the fact that you are a member of a college as well as a member of the university. There are 38 colleges that collectively make up the University of Oxford with each college being an independent institution with its own history. My college (St. John’s) was founded in 1555 and housed about 400 undergraduate students, 250 postgraduate students and 100 academic staff. The students and staff are drawn from all academic disciplines and so life in college is a wonderful mix of personalities, ideas and intellectual points of view. The college bar, when evaluated purely as a bar, was bloody awful but it was the place where this wonderful mix of people hung out and where you could find people discussing and debating all aspects of intellectual pursuit as well as a variety of trivia and stupidity (which was just as awesome). It also served the absolute best cheese and ham toasties, and baked potatoes with cheese and beans. Then there was the Junior Common Room (or JCR) which would hold regular meetings and debates, well lubricated by booze provided by the JCR Pratt (an official position on the JCR Committee whose responsibilities included being a pratt and getting the beer in for JCR meetings). I recall one debate in particular where great energy was expended in resolving the matter of whether Mr Nathan Byers (the JCR Pratt at the time) should change his name (by deed poll) to Nathan-Madonna Byers in honor of the “Queen of Pop”. We supported the motion and he did actually go ahead and change his name. What a stand-up fellow.

Anyway, back to the main point of this post. Just being at St. John’s was much more interesting than studying mathematics. Not to say that I didn’t want to study and learn, I did. Which brings me to my next point.

Undergraduate teaching at Oxford is done in two parts. I’ve already described the college system and the fact that when you are at Oxford you are a member of a college as well as a member of the university. The university as a whole still exists of course and there are university departments for each subject area. All tutors (professors) are a member of a college but also a member of a university department. The departments organize and teach the curriculum for a subject area and set exams. It’s your performance in these final exams that ultimately determines your class of degree. Teaching consists of group lectures at the department level augmented by small group tutorials (in my case it was two students to one tutor) with your local college professors within college. This all sounds great and it would be if all of the lecturers and tutors were good teachers. Unfortunately, in my experience and especially in Mathematics, often they were not.

An aspect of the teaching of mathematics at the undergraduate level is that it is presented in a fundamentally different way than in high school. A degree of rigor is introduced that represents a step change in the way things are taught. There is much less of a focus on the student developing an intuitive understanding of the concepts and instead things are presented in terms of a series of “definition, theorem, proof” cycles. This was something that I had not seen before and was quite hard for me to come to terms with. This is what I meant when I said that my high school teachers didn’t prepare me to study mathematics at a higher level. I learnt from several of my fellow students at Oxford that they had been introduced to some degree of rigor, and this style of presentation, in school and it really helped them. I guess that’s the difference between a state school in a nondescript provincial town and a fee-paying private school. Sigh.

I struggled to grasp the new presentation of the subject. Often the rigor seemed pointless (“But that’s just obvious!”) and other times it obfuscated and hid the concept that was being introduced.

Given time it became more familiar and looking back now I absolutely understand and appreciate the focus on rigor and the logical construction of things. At the time though I was frustrated and didn’t feel like I could ask for help in getting up the learning curve since my tutors felt unapproachable. It was yet another factor that tainted my original love of the subject.

I ultimately graduated from Oxford and earned a degree in mathematics with a good enough grade (a 2:1). The competitive high school kid I used to be would never have been satisfied with anything less than a first but I wasn’t that kid anymore. I just wanted to get on with the next phase of my life and that I did.

Some years later I read a book on mathematics written for the layman and I once again saw something in it that was beautiful. I read more, and especially read a lot more about the history of mathematical discovery and the iconic figures who developed much of what we take for granted today. The historical context was fascinating and the presentation, geared to the non-academic, emphasised the concepts and the consequences first and foremost. I ultimately came to the conclusion that you can’t study math in isolation, it’s too sterile and without meaning. By taking a leisurely tour through its history and landscape I rediscovered my love for it. I’ve since been motivated to try to re-learn much of what I was first exposed to back at university and also I now feel a personal mission to help others to learn math the “right way”, so that they can see it and grok it while they also learn it properly.

I commit to writing posts about math as I continue my journey of rediscovery through its gentle rolling hills.

On Concurrent UPSERTS

2017-11-30T00:00:00+00:00

Introduction

It’s a very common use case to have to either INSERT a new row into a table or UPDATE an existing row depending on whether the row already exists. This logic is commonly referred to as an UPSERT. Let’s see how we can handle this in T-SQL.

We assume that we have this table …

CREATE TABLE dbo.Foo (
  fooId int          NOT NULL,
  stuff varchar(256) NOT NULL,

  CONSTRAINT Foo__PK PRIMARY KEY CLUSTERED (fooId)
);

Some naive T-SQL code to handle the UPSERT might be …

IF EXISTS ( SELECT * FROM dbo.Foo WHERE fooId = @fooId )
  UPDATE dbo.Foo
     SET stuff = @stuff
   WHERE fooId = @fooId;
ELSE
  INSERT dbo.Foo ( fooId, stuff )
  VALUES ( @fooId, @stuff );

This will work fine for a single connection but with multiple connections and high concurrency this will start to fail frequently with primary key violations.
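To see why, it helps to spell out the unlucky interleaving. The sketch below is a toy model in Python, not SQL Server: `Table` is a hypothetical stand-in for dbo.Foo (just a dict keyed on fooId), and the two “connections” are simulated by running their steps in a fixed order rather than with real threads. Both connections perform the EXISTS check before either one writes, so both decide to INSERT.

```python
class PrimaryKeyViolation(Exception):
    """Raised when a second row with the same key is inserted."""


class Table:
    """Toy stand-in for dbo.Foo: a dict keyed on fooId."""

    def __init__(self):
        self.rows = {}

    def exists(self, foo_id):
        return foo_id in self.rows

    def insert(self, foo_id, stuff):
        if foo_id in self.rows:
            raise PrimaryKeyViolation(foo_id)
        self.rows[foo_id] = stuff

    def update(self, foo_id, stuff):
        self.rows[foo_id] = stuff


def interleaved_upserts(table, foo_id):
    """Run the naive upsert for connections A and B with the checks done first."""
    # Both connections run the EXISTS check before either one writes.
    a_saw_row = table.exists(foo_id)
    b_saw_row = table.exists(foo_id)

    outcomes = []
    for name, saw_row in (("A", a_saw_row), ("B", b_saw_row)):
        try:
            if saw_row:
                table.update(foo_id, f"from {name}")
            else:
                table.insert(foo_id, f"from {name}")
            outcomes.append((name, "ok"))
        except PrimaryKeyViolation:
            outcomes.append((name, "primary key violation"))
    return outcomes
```

On an empty table, connection A’s insert succeeds and connection B’s fails, because B’s check is stale by the time it acts. If the row already exists, both connections update and nothing goes wrong, which is why the problem only shows up under concurrency on new keys.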

There are various ways to handle this badly. If you are interested in these then I encourage you to read several of the articles listed in the further reading section below. This one in particular summarizes things very well. However, if all you care about is how to do it well then I offer the following solution (from here).

Solution

I recommend using a MERGE statement with HOLDLOCK.

MERGE dbo.Foo WITH (HOLDLOCK) AS target
USING (SELECT @fooId AS fooId, @stuff AS stuff) AS source
ON    source.fooId = target.fooId

WHEN MATCHED THEN UPDATE
  SET target.stuff = source.stuff

WHEN NOT MATCHED THEN
  INSERT ( fooId, stuff )
  VALUES ( @fooId, @stuff );

No transaction is required here since MERGE is an atomic statement. MERGE takes out a key update lock by default so we don’t need to use an UPDLOCK hint (as is the case with some other possible solutions). We do need a HOLDLOCK hint though in order to ensure that SQL Server doesn’t release the key update lock before the INSERT.

References and Further Reading

On Things I Hate

2017-11-29T00:00:00+00:00

“Not that I condone vandalism, or any -ism for that matter. -Isms in my opinion are not good. A person should not believe in an -ism, he should believe in himself. I quote John Lennon, ‘I don’t believe in Beatles, I just believe in me.’ Good point there. After all, he was the walrus. I could be the walrus. I’d still have to bum rides off people.”

Anonymous

On Floating Point 2.0

2017-11-29T00:00:00+00:00

I recently got hold of a copy of a very interesting book all about the development of the early PC game Wolfenstein 3D. Rather than being a history of id Software, and the team that went on to develop the seminal games Doom and Quake (for which see here), it’s more of a review of the Wolfenstein 3D codebase (which is open source and available on GitHub) and a presentation of the challenges that were overcome in developing a groundbreaking 3D game for the PC with typical specs as of the time: i386, VGA graphics, Sound Blaster sound card, running MS-DOS with perhaps 2MB RAM.

A PC of that era had significantly more CPU horsepower (in terms of MIPS) than any of the contemporary games consoles and that was what made it appealing as a target platform for a 3D game. The nature of the segmented memory model that DOS imposed (as a consequence of being tied to Real Mode on the 386), the architecture of the VGA graphics card and the lack of hardware floating point on the 386 presented lots of challenges to the project, and the strategies that Carmack and co. came up with to make it all work are ingenious and very interesting. It’s a good read … for those of a more geeky persuasion anyway. People like me, and - I suppose - you too dear reader.

All this is very interesting but not really the subject of this blog post. This post is a follow up to a previous one I wrote about the floating point representation of real numbers and was inspired by a description of floating point from this book that was unlike any that I have seen before. It represented a different way to think about that particular numeric format and, I think, is a very helpful way to try to visualize (and understand) how it was designed. I also realized that I’d left out descriptions of the “special” floating point values in my previous article and I needed to remedy that. So here we go. Onward!

The Traditional View

The traditional description of floating point representation involves the presentation of the following pattern of the 32 bits in a regular float in memory …

… and the following expression involving a sum of powers of 2 …

\[value = (-1)^{s} \times (1 + m) \times 2^e\]

where $s = b_{31}$, $e = e_{biased} - 127$, $e_{biased} = \sum_{i=23}^{30}{b_i 2^{i-23}}$ and $m = \sum_{i=0}^{22}{b_i 2^{i-23}}$.

It then proceeds to inform you that one bit of memory is used to indicate the sign of the number (the value $s$ in the expression, where 0 indicates positive and 1 indicates negative), eight are used to represent the biased exponent (the value $e_{biased}$) and 23 are used to store the digits of what is known as the mantissa where there is an assumed leading 1 bit and the stored bits are assumed to be the part to the right of the binary point (the equivalent of the decimal point in the decimal representation of a real number). These are bits $b_i$ for $i$ in $[0, 22]$.
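The traditional view is easy to try out. Here’s a minimal Python sketch that uses the standard `struct` module to get at the 32 bits of a float and then evaluates the expression above; the function names are my own, and it only handles ordinary (normal) numbers, not the special values.

```python
import struct


def decode_float32(x):
    """Split x (rounded to a 32 bit float) into sign, biased exponent and mantissa bits."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> 31                   # bit 31
    e_biased = (bits >> 23) & 0xFF   # bits 30..23
    mantissa_bits = bits & 0x7FFFFF  # bits 22..0
    return s, e_biased, mantissa_bits


def traditional_value(s, e_biased, mantissa_bits):
    """Evaluate (-1)^s * (1 + m) * 2^e for a normal number (e_biased not 0 or 255)."""
    m = mantissa_bits / 2**23  # the stored bits sit to the right of the binary point
    e = e_biased - 127
    return (-1) ** s * (1 + m) * 2.0 ** e
```

For example, `decode_float32(6.0)` gives a sign of 0, a biased exponent of 129 and a mantissa with only its top bit set, i.e. $6 = (1 + \frac{1}{2}) \times 2^2$.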

The Alternative View

An alternative spin on things is to observe that the floating-point representation of a number is an approximation to a given real number $R$ as follows …

First we note whether $R$ is positive or negative and save that in a value (which we will call $s$) where positive will be stored as $0$ and negative will be stored as $1$
Then we determine which two consecutive powers of $2$ bound $|R|$ (the absolute value of $R$) above and below. Let’s denote these two values as $2^e$ and $2^{e+1}$.
Next we divide the difference between $2^e$ and $2^{e+1}$ (which is $2^e$) into $N$ equal parts, each of width $\frac{2^e}{N}$, and find the number $n$ such that $2^e + n\frac{2^e}{N} \le |R| < 2^e + (n + 1)\frac{2^e}{N}$
The floating point approximation to $R$ is parameterized by the values $s$, $e$, $N$ and $n$

So we basically map out a comb of $N$ discrete values between each two consecutive powers of $2$ (within a range of such powers, $2^{e_{min}}$ and $2^{e_{max}}$) and approximate $|R|$ as the closest tooth of the comb equal to or less than $|R|$.

I hope you can see that the accuracy of this approximation will depend on $N$. The larger the value, the finer grain the divisions between the bounding powers of two will be and the closer we can get to the actual value of $|R|$. I also hope you can see that the range of possible values we can approximate (from smallest to largest) will depend on $e_{min}$ and $e_{max}$.

Examples

Let’s look at some examples to illustrate things. We’ll assume that $N$ is 8 for now and look at how we would represent the real numbers $0.05$, $6$ and $50$.

First, $0.05$. This number falls between $2^{-5} = \frac{1}{32} = 0.03125$ and $2^{-4} = \frac{1}{16} = 0.0625$ and the gap between the teeth of our comb will be $\frac{2^{-5}}{8} = \frac{1}{256} = 0.00390625$.

We can see that we can’t represent $0.05$ perfectly and end up using $\frac{3}{64} = 0.046875$ as an approximation.

Next, 6. This number falls between $2^2 = 4$ and $2^3 = 8$ and the gap between the teeth of our comb will be $\frac{2^2}{8} = \frac{1}{2} = 0.5$.

In this case we can represent $6$ perfectly and so we do.

Finally, 50. This number falls between $2^5 = 32$ and $2^6 = 64$ and the gap between the teeth of our comb will be $\frac{2^5}{8} = 4$.

Again, we can’t represent $50$ perfectly. The best we can do, within the bounds of the parameters we have chosen, is to use $48$ as an approximation.
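The three examples can be reproduced with a few lines of Python. This is a minimal sketch of the comb idea under the same assumptions as the examples (the gap between teeth is the lower power of two divided by the number of bins, and we always take the tooth at or below $|R|$); the function and parameter names are mine.

```python
import math


def comb_approximation(r, num_bins):
    """Approximate r by the comb tooth at or below |r|; returns (s, e, n, value)."""
    if r == 0:
        raise ValueError("zero has no bounding powers of two")
    s = 0 if r > 0 else 1
    magnitude = abs(r)
    # frexp gives magnitude = f * 2**exp with 0.5 <= f < 1, so 2**(exp - 1) <= magnitude.
    _, exp = math.frexp(magnitude)
    e = exp - 1
    lower = 2.0 ** e
    gap = lower / num_bins               # distance between adjacent teeth
    n = int((magnitude - lower) // gap)  # how many whole gaps fit below |r|
    value = (-1) ** s * (lower + n * gap)
    return s, e, n, value
```

With `num_bins=8`, the inputs 0.05, 6 and 50 land on 0.046875, 6 and 48 respectively, matching the worked examples.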

Mapping Between the Two Views

How does this view map to the traditional view of floating point and to the bit representation that we saw before?

Well, the sign flag $s$ should be obvious; it maps to the sign bit.

The value $N$ from the new view is equal to the number of distinct values that can be stored in the mantissa bits of the bit representation. In a regular 32 bit float there are 23 bits to store the mantissa. Those 23 bits can hold $2^{23} = 8{,}388{,}608$ distinct (unsigned) integer values, from $0$ to $2^{23} - 1$, and so, in a 32 bit float, the range between consecutive powers of two is quantized into $8{,}388{,}608$ parts.

Finally we need to think about the range of possible values for the first power in the pair of consecutive powers of $2$. We referred to this before as $[2^{e_{min}}, 2^{e_{max}}]$. In a 32 bit float, 8 bits are used to store a biased exponent which can take any value from the range $[0, 255]$. The actual exponent (in the sum interpretation of float) is equal to $\texttt{stored_exponent} - 127$. Thus the actual exponent can take any value in the range $[-127, 128]$. Therefore our pair of consecutive powers of $2$ can be anything from $[2^{-127}, 2^{-126}]$ to $[2^{128}, 2^{129}]$, although the two extreme stored values ($0$ and $255$) are reserved for the special values described below.

Remember the traditional view …

\[value = (-1)^{s} \times (1 + m) \times 2^e\]

where $e = e_{biased} - 127$, $e_{biased} = \sum_{i=23}^{30}{b_i2^{i-23}}$ and $m = \sum_{i=0}^{22}{b_i2^{i-23}}$

Here’s the alternate view …

\[value = (-1)^{s} \times (2^e + n \times \frac{2^e}{N})\]

where $n = \sum_{i=0}^{22}{b_i2^i}$ and $N = 2^{23}$
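To check that the two views agree, here is a small sketch that pulls $s$, $e_{biased}$ and $n$ out of a real 32 bit float and evaluates both. The function names are hypothetical, it only handles ordinary (normal) numbers, and it takes each of the $n$ steps above $2^e$ to be worth $\frac{2^e}{N}$, which is the tooth gap from the examples earlier.

```python
import struct
from fractions import Fraction

N = 2 ** 23  # number of divisions between consecutive powers of 2

def decode(x):
    """Split a 32 bit float into (s, e_biased, n)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    s = bits >> 31                  # sign bit b_31
    e_biased = (bits >> 23) & 0xFF  # exponent bits b_30..b_23
    n = bits & (N - 1)              # mantissa bits b_22..b_0 as an integer
    return s, e_biased, n

def traditional_view(s, e_biased, n):
    m = Fraction(n, N)              # mantissa as a fraction in [0, 1)
    return (-1) ** s * (1 + m) * Fraction(2) ** (e_biased - 127)

def comb_view(s, e_biased, n):
    base = Fraction(2) ** (e_biased - 127)    # first power of 2 in the pair
    return (-1) ** s * (base + n * base / N)  # plus n teeth of width base/N

for x in (6.0, -0.046875, 48.0):
    fields = decode(x)
    print(x, fields, traditional_view(*fields), comb_view(*fields))
```

Both functions return the same exact fraction for each input, which is just the algebraic identity $(1 + \frac{n}{N}) \times 2^e = 2^e + n \times \frac{2^e}{N}$.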

Some Observations

The choice of $N$ is critical and represents the granularity of the quantization that we impose on the ranges between consecutive powers of $2$. The quantum is equal to $\frac{1}{N}$ of the power of $2$ at the start of the range, i.e. $\frac{2^e}{N}$.

The mantissa bits, read as the integer $n$, represent how many quanta we add to the first power of $2$.

The choice of storage size for $e$ is critical and dictates the extremities of the range of values that we can represent.

We always use the same granularity of quantization (the same number of bins) regardless of the power of $2$ at the start of the range. This means that the absolute size of each bin (our quantum of value) grows the larger the number that we are representing. It also means, though, that the size of each bin relative to the starting value of the range remains the same for all ranges. We can see this in the following image …

Here we are breaking each range into $16$ bins. Each range is progressively wider but the number of bins remains the same. Each bin between $1$ and $2$ is wider than each bin between $\frac{1}{8}$ and $\frac{1}{4}$.
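The same effect can be measured on real 32 bit floats, where each range is split into $2^{23}$ bins rather than the $16$ in the picture. The helper names below are made up for this sketch, and `bin_width` assumes its argument is a positive value that a 32 bit float can hold exactly.

```python
import struct

def float32_bits(x):
    """Return the 32 bit pattern of x as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def bits_to_float32(bits):
    """Return the float whose 32 bit pattern is `bits`."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def bin_width(x):
    """Distance from x to the next larger 32 bit float."""
    return bits_to_float32(float32_bits(x) + 1) - x

for start in (0.125, 1.0, 64.0):
    width = bin_width(start)
    print(f"range starting at {start}: bin width {width}, relative {width / start}")
```

The absolute width grows with the start of the range ($2^{-26}$ at $\frac{1}{8}$, $2^{-23}$ at $1$) while the relative width stays at $2^{-23}$ throughout.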

Special Values

Floating point uses a few special values. These are $0$, $\texttt{Infinity}$, $\texttt{NaN}$ (short for Not a Number) and Denormalized Numbers; and there are actually two kinds of $\texttt{NaN}$ value: Signalling and Quiet. More on the difference between Signalling and Quiet in a moment. For now let’s look at the representation of these values.

$0$ is indicated by $e_{biased} = 0$ and $m = 0$. This leaves two possibilities for the sign bit and so there are actually two values $+0$ and $-0$.

$\texttt{Infinity}$ is indicated by $e_{biased} = 255$ and $m = 0$. Again the sign can be $0$ or $1$ and so there are actually two values $+\texttt{Infinity}$ and $-\texttt{Infinity}$.

There are lots of $\texttt{NaN}$ values. Any value with $e_{biased} = 255$ and $m \neq 0$ is considered $\texttt{NaN}$. Signalling $\texttt{NaN}$ values have the most significant bit of $m$ (i.e. $b_{22}$) set to $0$ and Quiet $\texttt{NaN}$ values have it set to $1$.
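These rules are simple enough to write down as a classifier over raw 32 bit patterns. This is a sketch with a made-up function name and labels; it also covers the denormalized numbers described further down.

```python
def classify(bits):
    """Name the kind of value held in a 32 bit float pattern."""
    sign = "-" if bits >> 31 else "+"
    e_biased = (bits >> 23) & 0xFF
    n = bits & 0x7FFFFF                 # the 23 mantissa bits
    if e_biased == 0:
        if n == 0:
            return sign + "0"
        return "denormalized"
    if e_biased == 255:
        if n == 0:
            return sign + "Infinity"
        # b_22, the top mantissa bit, separates Quiet from Signalling
        if n & 0x400000:
            return "quiet NaN"
        return "signalling NaN"
    return "normal"

for pattern in (0x00000000, 0x80000000, 0x7F800000, 0x7FC00000, 0x7F800001):
    print(f"{pattern:#010x}: {classify(pattern)}")
```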

Signalling vs. Quiet …

Various floating point operations can generate an invalid result (e.g. dividing zero by zero) and such operations will return a value of $\texttt{NaN}$. Whether the operation results in a Signalling $\texttt{NaN}$ or a Quiet $\texttt{NaN}$ gets into the arcana of floating point unit (FPU) implementation and is beyond my current knowledge. I can say though that if an operation results in a Quiet $\texttt{NaN}$ then there is no indication that anything is unusual until the program checks the result and sees the $\texttt{NaN}$. That is, computation continues without any signal from the FPU (or library if floating-point is implemented in software). However a Signalling $\texttt{NaN}$ will immediately produce a signal, usually in the form of an exception from the FPU. Whether the exception is thrown depends on the state of the FPU.
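The quiet behaviour is easy to see from Python, with the caveat that Python floats are 64 bit rather than 32 bit and that Python itself raises `ZeroDivisionError` for `0.0 / 0.0` instead of handing back a $\texttt{NaN}$. Subtracting infinity from infinity is another invalid operation and it does get through, so this sketch uses that.

```python
import math

inf = float("inf")

result = inf - inf           # invalid operation, so the answer is NaN
downstream = result * 2 + 1  # later arithmetic carries on without complaint

print(result, downstream)    # both print as nan
print(math.isnan(result))    # the only way to find out is to check
print(result == result)      # a NaN is not even equal to itself
```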

Denormalized Numbers …

These are represented by $e_{biased} = 0$ and $m \neq 0$. This state indicates a slightly different interpretation of the rest of the bits in the value. Denormalized numbers are a special case used to smooth out the progression of represented numbers into and through $0$. When $e_{biased} = 0$ we use this formula for the value …

\[value = (-1)^{s} \times (m) \times 2^{e + 1}\]

… as opposed to the regular …

\[value = (-1)^{s} \times (1 + m) \times 2^e\]

Or with our alternate view …

\[value = (-1)^{s} \times (n \times \frac{2^{e + 1}}{N})\]

… as opposed to …

\[value = (-1)^{s} \times (2^e + n \times \frac{2^e}{N})\]
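As a sanity check, the denormalized formula $(-1)^{s} \times m \times 2^{e + 1}$ can be evaluated for a 32 bit float, where $e = 0 - 127$, and compared with what the hardware decodes. The function names here are hypothetical and the sketch only accepts patterns with $e_{biased} = 0$.

```python
import struct
from fractions import Fraction

N = 2 ** 23
E = 0 - 127  # e = e_biased - 127 with e_biased = 0

def denormal_value(bits):
    """Evaluate (-1)^s * m * 2^(e + 1) for a pattern with e_biased = 0."""
    if (bits >> 23) & 0xFF != 0:
        raise ValueError("not a denormalized pattern")
    s = bits >> 31
    n = bits & (N - 1)
    m = Fraction(n, N)
    return (-1) ** s * m * Fraction(2) ** (E + 1)

def hardware_value(bits):
    """Let struct decode the same 32 bits as a float, kept exact."""
    return Fraction(struct.unpack(">f", struct.pack(">I", bits))[0])

for bits in (0x00000001, 0x00400000, 0x007FFFFF, 0x80000001):
    print(f"{bits:#010x}: {float(denormal_value(bits))}")
```

The smallest positive value comes out as $2^{-149}$, and the largest denormalized value sits exactly one step of $2^{-149}$ below $2^{-126}$, the smallest normal number.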

One Final Picture or two, or three …

To close let’s imagine a hypothetical one byte floating point format with 1 bit for the sign, 3 bits for the biased exponent and 4 bits for the mantissa. In this format, $e_{biased}$ can range from $0$ to $7$ and $m$ can range from $0$ to $15$. We can show all the possible values in a grid, as the rational values …

… and as the corresponding decimal values …

The values above include the denormalized numbers. To see the smoothing effect that these have let’s look at a picture showing the values of our representation near $0$ with and without denormalization. First without …

… and now with …

Much smoother.
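The numbers behind those two pictures can be sketched in a few lines. Note the assumptions: the text above doesn't give the exponent bias for the one byte format, so I take it to be $3$ here, and the "without" case simply treats $e_{biased} = 0$ like any other exponent (so it has no way to represent $0$ at all).

```python
from fractions import Fraction

BIAS = 3  # assumption: the bias of the one byte format is not stated above
N = 16    # 4 mantissa bits give 16 divisions per range

def value(e_biased, n, denormalize):
    """Positive value for a given exponent and mantissa integer n."""
    if e_biased == 0 and denormalize:
        return Fraction(n, N) * Fraction(2) ** (1 - BIAS)
    return (1 + Fraction(n, N)) * Fraction(2) ** (e_biased - BIAS)

def smallest_values(denormalize, count=4):
    """The `count` smallest non-negative values in the format."""
    values = {value(e, n, denormalize) for e in range(2) for n in range(N)}
    return sorted(values)[:count]

print("without:", [str(v) for v in smallest_values(False)])
print("with:   ", [str(v) for v in smallest_values(True)])
```

Under these assumptions the first value without denormalization is $\frac{1}{8}$, a jump sixteen times bigger than the $\frac{1}{128}$ steps that follow it. With denormalization the values climb from $0$ in even steps of $\frac{1}{64}$.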

Other Resources

Lots of people have written about floating point online. I have no illusions that my witterings are especially insightful or worthy. I have cribbed much of my knowledge from other places on the web actually. Here are some links to such resources. Go read more on the subject.

On Tech Conferences

2017-11-26T00:00:00+00:00

For some time now I’ve wanted to attend a big tech conference. Part of the motivation was to attend a large event but the main part was to be exposed to a concentrated collection of information and learning opportunities. I finally managed to remember to bring a proposal to my employer and I got approval to attend the 2017 PASS Summit in Seattle in November.

PASS is the Professional Association for SQL Server. It’s a community organization for users of the Microsoft Data Platform (SQL Server and related technologies) that “facilitates member networking and the exchange of information through local groups, online events, local/regional events and international conferences”, or so their blurb says. They hold an international summit each year, typically in Seattle, which is sponsored by Microsoft and other vendors of tools and technologies related to the Microsoft Data Platform. This was the event that I was going to attend.

The main conference was held over three days (Wed-Fri) but there were two days of pre-conference events to be held on the Monday and Tuesday as well. On these days one could attend a full day focused seminar on various topics. Since I was going to be flying all the way from New York to Seattle for this thing I thought that I may as well make the most of it and booked myself into the pre-conference sessions as well.

PASS has historically been a conference for DBAs, Data Architects and BI (an oxymoron?) professionals. More recently though they have been trying to include developers in their ranks and the tracks on offer at the summit (and pre-conference) reflected that. I chose to attend a full day on “Modern Web API Design” on Monday and another full day on “Entity Framework” (the Microsoft full-fat ORM framework) on the Tuesday. The Tuesday session was ultimately cancelled due to the fact that the presenter’s inbound flight was cancelled due to bad weather on the East Coast and I ended up attending a full day workshop on “Expert Performance Tuning” instead.

These pre-conference sessions were interesting and well attended but come the first day of the main conference (on Wednesday) I realized just how many people were actually going to attend this thing; literally thousands. The event was held at the Washington State Conference Center in downtown Seattle, a huge facility that we took over for the week. Catering for this many people was a military exercise and the food was about as good as you would expect. I suffered the breakfast buffet for a few days before I finally gave it up and just grabbed some good coffee (this was Seattle after all) and pastries in the hotel each morning.

One other colleague from work was attending the conference with me but we were exploring different tracks and although we did reconnect regularly our paths didn’t cross too much. Every session, meal and coffee break was an opportunity to meet other SQL Server professionals and users and meet them I did, many of whom were deeply knowledgeable. It was a truly international event with a big European contingent and I met people from the UK, Denmark, Sweden, France & Germany as well as all across the US. As expected, the attendees were predominantly DBAs but I did meet a fair share of fellow developers as well.

I must say though that stereotypes aside the PASS Massive was an eclectic mix of people with impressive diversity, overwhelmingly of rude health and sartorial good taste. The beards though … what can I say? Hirsute is not a sufficiently descriptive word.

During the main conference there was a “keynote” presentation each morning followed by a series of roughly one hour long presentations running in sixteen parallel tracks. At any time you had a choice of sixteen things to see; a lot of content to choose from. The presentations were all delivered by members of the PASS community and on the whole were not marketing pieces for related products and tools, which can be the case at some conferences. Most of the sessions were interesting and useful to me and only a few were disappointing. Clearly I only managed to experience a sixteenth of the material that was on offer but I should be able to access slides and recordings of all of the presentations in due course. It was definitely worth my time to attend, although I’m glad that I wasn’t the one paying for the flights, hotel and conference fee …

Of course, Microsoft are the largest sponsor of this event and I would say that it is the biggest single conference for SQL Server that there is, although there are bigger conferences for Microsoft technologies in general, and for Microsoft platform developers in particular (Ignite, Build, etc.). As such there were some significant new releases for the Microsoft data platform that were showcased during the week, SQL Server 2017 in particular. In addition to several new features, the 2017 release of SQL Server brings full(ish) support for Linux as a platform and there were some fascinating presentations on the details of how that port was achieved. Psst: It wasn’t a port. More on that in another blog post.

Here are some of my takeaways from the summit experience.

Notable Quotes

“If you want the data model to be simple then go out and make the world simple, then come back to me”

“SQL Server does key-value, it’s just a table with two columns”

Notable presentations

Hacking SQL Server

Of course you should protect against SQL Injection attacks
Encrypt your connections and use certificates to authenticate servers in order to prevent man in the middle attacks and TDS inspection/injection
Divide by zero exceptions can facilitate some clever brute force attacks against encrypted data. If you know the types of the fields in a table (and if I recall correctly the system tables are not encrypted so you can query for them) then you can create successive queries of the form SELECT 1 / (testValue - fieldName) FROM tableName that explore the space of possible values of each field and wait until you get a divide by zero exception. Once you do then you know that the field value is equal to testValue.
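To make the shape of that last attack concrete, here is a toy model in plain Python with no database involved. Everything in it is hypothetical: `make_oracle` stands in for a server that will evaluate an expression over a hidden field but never show the field itself, and the only thing the caller learns is whether a divide by zero error came back.

```python
def make_oracle(secret):
    """Stand-in for the server: computes 1 / (test_value - field) blindly."""
    def query(test_value):
        # Raises ZeroDivisionError only when test_value equals the secret
        return 1 / (test_value - secret)
    return query

def recover(query, candidates):
    """Try each candidate until the oracle reports a divide by zero."""
    for test_value in candidates:
        try:
            query(test_value)
        except ZeroDivisionError:
            return test_value
    return None  # the secret was not among the candidates

oracle = make_oracle(42)
print(recover(oracle, range(100)))
```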

Data Architecture and NoSQL

Data models still matter in the NoSQL world
There is no schemaless data
Data patterns have existed for a long time and are just as relevant today, it’s just that now we have more database systems within which to express them, especially true graph databases

Lightweight ORMs

Entity Framework is not the only game in town when it comes to ORMs for C#. Dapper is simple (based on POCOs), lightweight and fast. I will be giving it a try.

SQL Server Service Broker

It has a bad rep and is not “cool”
However it is very powerful and is a legitimate tool in many workflows
We should all keep it in our toolboxes

SQL Server Client Tools

There’s a new Electron-based (i.e. like VS Code) cross-platform client app for SQL Server called SQL Server Operations Studio. It’s not fully featured but is featured enough, lightweight and fast. It’s a real alternative to SSMS for general querying and database management.

SQL Server on Linux

Works and works well. It’s the same binaries that run on Windows.
When SQL Server runs on Linux it thinks it’s running on Windows
It installs simply and quickly. Far quicker than the install on Windows …
Performance is basically the same
You can run it in Docker and that makes for a very compelling DevOps story for dev/testing
I wouldn’t consider running it in prod right now (HADR is not so robust yet) but the Docker use case is fascinating and I will definitely be looking at that for dev/test

Notable Unplanned Events

Bob Ward (Principal SQL Server Architect at Microsoft) was in the middle of his presentation on how they conceived of, and delivered, a version of SQL Server that runs on Linux when his laptop started installing automatic updates for Windows which couldn’t be cancelled. It then proceeded to reboot (several times) while it carried on with the update. His presentation was completely hosed and morphed into a (still very interesting) Q&A session. I must say that he handled the whole affair with some grace.

Later I found this Haiku which I proffer in his honor …

Random Observations

On the PASS Summit …

I am not old relative to the DBA community but there are a bunch of super smart and really knowledgeable people out there who are a lot younger than me. So … I still feel old.
Brent Ozar is a funny dude; and he is the Stig.
Erland Sommarskog may write very informative articles but, in person, he’s pretty dull. Sorry Erland.
All .Net/ C# references I saw were to .Net Core and all demos were using VS Code and .Net Core. It’s the future.
I didn’t win any of the vendor raffles. Life is so unfair.
Why does a conference of this size only choose to lay on coffee stations in the morning and at lunch?!? Don’t they understand that tech workers need regular caffeination in order to function?

On Seattle …

It has cool out of the way jazz restaurants
It has great dive bars that play great music
It has great coffee
And I thought London had a lot of Hipsters
It has great craft beer
It’s very progressive

On Mac vs PC

2017-11-18T00:00:00+00:00

I’ve been a PC person for years. At work I only ever used (and use) PCs, although I worked with remote VMS (yes, that VMS) and Linux systems all the time. A few years back though, I bought a MacBook Pro 13 in order to get access to Xcode and try my hand at iOS development. Thus was my hands on intro to the world of Mac …

I have come to love that MacBook. It was a long and windy courtship full of frustration as I had to unlearn years of muscle memory based on Windows norms. I still struggle with that today but I’m better. The one thing that has led me to become a fully paid up member of the cult of Mac though is that trackpad.

The trackpad is sublime. It’s like glass, like silk. The mouse pointer glides around the screen and the two finger scrolling is a pure joy. No PC laptop has come close to matching the feel of it and believe me I’ve looked and tried a bunch. There’s something truly magical about the fact that Apple controls the whole package: hardware, drivers, OS, apps. They can, and do, focus on the tiniest of details and I’ve decided that those details matter. I can’t stand to use any other laptop now.

Desktop computers are different. With a good external mouse the experience of using Windows on a PC can be as good. I have yet to find a Windows laptop with a trackpad to match though.

And so, now that I find myself wanting to get a new power laptop for home use (for dev but also for general stuff) I have to go with a new MacBook Pro 15. I must say that the keyboard doesn’t feel great as yet. I hope I’ll get used to it. Thinkpads are still the gold standard for laptop keyboards I’d say. But that trackpad is still like butter, and it’s sooo much bigger now. And the new TouchBar is intriguing.

I’m done. I’m a fanboy. I don’t care that they are the most expensive laptops. They are the best laptops. By far.

On Getting Back on the Horse

2017-11-18T00:00:00+00:00

If you don’t keep doing something regularly then you will not do it regularly and soon that new habit will add up to weeks, months and even years. I was so motivated to start this blog back in late 2014, and I kept it up for a while, but I didn’t force myself to stick at it and soon it became an afterthought. Then boom … two years pass since my last post. Life is a succession of habits. It’s hard to develop good habits and it’s so easy to fall into bad habits.

When you recognize a bad habit though - or the absence of a good habit that you should have - that’s the first step to changing it. And so … let’s try to change. Let’s give this blog a new lease of life. I am recommitting to this and will endeavor to write posts regularly starting today.

A good way to get started with something is to change your environment, and so with this site. I was hosting this on a WordPress instance with Go Daddy but I was never happy with that experience and the site’s presentation. I don’t know exactly how the WordPress instance was hosted but the site was slow to load on the first request and wasn’t especially responsive thereafter. That suggests some kind of virtualization with minimal resources.

An old colleague of mine keeps a blog and it was a post there that introduced me to a tool called Jekyll which allows you to statically build a blog site from a bunch of markdown files. I checked that out, liked what I saw and also learnt that Jekyll is natively supported by GitHub via a facility they offer called GitHub Pages. You can maintain a repo at GitHub that contains a Jekyll project and GitHub will automatically build and publish the site as you check in changes to your repo. This works pretty well and is the new home of this little corner of the Internet.

As part of the move I’ve broken away from WordPress and its stock themes and I’ve taken it upon myself to own all the HTML, CSS and site design myself. This is something that I have dabbled with before but never used in earnest. I’ve learnt a lot as a result and those learnings will become a post at some point soon. The site design will evolve I’m sure but the great thing about Jekyll is that the source for the site builds to static files that can then be served up with minimal overhead from the web server environment. As such, the site is very snappy to load and use.

So here we are, back on the horse. Now I just need to learn how to ride this damn thing. I pull on the tail and it goes right?

On pythagorean cups

2015-12-29T00:00:00+00:00

Is it magic or is it science?

Don’t be greedy now. Fill your cup but not too much.

On LEGO guns

2015-09-12T00:00:00+00:00

More fun with LEGO®. This time we make a gun, and a badass one too.

On How to Tie Your Shoes

2015-07-08T00:00:00+00:00

A video post this time. How do you tie your shoes? Have you actually even given this any thought since you learnt when you were a little kid? There’s always room to improve.

On How Legos is Wrong

2015-07-08T00:00:00+00:00

For the last time, it’s LEGO® damnit! A LEGO® set generally contains many LEGO® bricks although most people, as I know from personal experience, just end up with a big old box of LEGO® that gets tipped out on to the floor for play. And then when you step on a LEGO® brick in your bare feet it hurts like hell and you hop around for a while desperately trying not to land on any more LEGO®.

On Group Theory and LEGO

2015-07-08T00:00:00+00:00

Another video post! This time involving a Rubik’s Cube and a bunch of LEGO®. By the way, anyone who says Legos should be taken out back and shot. Only Americans say Legos and they are wrong, about many things. Although they are right about some things too; but that, dear reader, is a post for another day.

On useless stuff

2015-05-16T00:00:00+00:00

What do you do when you get an Arduino starter kit as a gift?

You make a useless box! Yes, I know you can make one of these with just a servo motor and two switches but you can also program an Arduino to do it. Why do it the simple way when you can add complexity?! Enjoy!

On Computers

2015-03-20T00:00:00+00:00

My First Computer

At the risk of giving away my approximate age I would like to present my very first computer, the Sinclair ZX81; and what a beast it was with its Z80A 8-bit 3.5 MHz CPU, 1kB of on board RAM, 8kB of ROM containing a BASIC interpreter, monochrome UHF video output (supporting up to 64 x 48 pixels of graphics mode via 2x2 block characters) and the ubiquitous (at the time) external audio tape data storage. Ah, those were the days. How could one resist such awesomeness?

My Dad bought this for me around 1981/82 I think. This is where I first started to code, in BASIC, although my programs did nothing more sophisticated than print text to the screen. You can’t do much with 1kB of RAM so, in short order, we added a RAM expansion pack to bring us up to a whopping 16kB. You had to be careful not to jostle things though since the edge connector was a bit delicate and too much movement could crash the machine. That was a bit hard given the nature of the keyboard …

A Step Up

My next computer was a huge step up in power and sophistication: an Acorn BBC Micro model B. I’ll have you know that these were the mutt’s nuts back in the day.

This was an expensive device as I recall; and I chose to get it, at age 13 or so, as opposed to go on a foreign exchange trip to the USA with school. And that, dear reader, tells you everything you need to know about the sort of child that I was. It had a 2 MHz 6502 CPU, 32kB of RAM and multiple graphics modes including colour (yes, that U is supposed to be there - this was a computer made at the behest of the BRITISH Broadcasting Corporation remember). I wrote more BASIC code here, mostly just drawing colored geometrical graphics on the screen as I recall. I also remember my parents buying me a magazine called INPUT which I collected and studied (somewhat) for a year. More significant though were the games, including such gems as Chuckie Egg and the seminal Elite. I never did make it to Elite as I recall, just Deadly. Mostly via trading narcotics and slaves … a great education for a young boy.

Chuckie Egg. Fiendishly addictive.

Elite! 3D wire-frame graphics with hidden line removal. Groundbreaking at the time.

School Stuff

I remember that in the school library at Redmoor High School (this would be Junior High for the Americans out there) we had a few computers. We had a BBC Micro of course (on which we would sneak in a few games of Elite over lunch) but we also had one of these bad boys (a Research Machines 380Z) that I don’t recall doing much with to be honest …

RM 380Z. I used to observe this thing from afar, in awe of its raw computing power. It had floppy disks!

And I think we had a Commodore PET too. I had a cat at home myself.

It was around this time that I started going to a computer club in the evenings at the school where my Mum used to work. I didn’t go for long though as I recall. I wanted to learn to program in “machine code” but the chap who was running the club didn’t really know much more than COBOL so I was a bit disappointed. I do remember playing with some Logo robots though; and spending an hour typing in a printed program listing from a magazine only to then hear a tinny computer speaker squeak out the theme from Close Encounters of the Third Kind. Oh the wonder of it all.

The Games Machine

Then I got one of these, an Amstrad CPC 464. This was primarily used to play games on, including as I recall a poor knock off of Konami’s Track and Field arcade game. This was the era when the arcade was the place where the “real” computer games lived and kids used to hang out for hours on end watching each other try to “beat the machine”. Oh, them were the days.

Alan Michael Sugar Trading FTW! Back then no one knew that he’d be on the telly so much in the future.

I had me one of these bad boys too. You could customize all the buttons and it had a rapid fire switch too. I recall hacking said Track and Field game by setting the left leg run button to ‘x’ and the right leg run button to ‘X’ in the game and then programming the joystick rapid-fire function to press ‘shift’. Ah, such youthful sport.

Then there were the magazines. One publishing company seemed to have a lock on the cool kids with versions of basically the same format of mag for each of the popular games computers of the time (the Sinclair Spectrum, Commodore 64 and Amstrad). Here’s the Amstrad one.

It was basically just full of games reviews.

Games

Some games that I spent way too long playing back in the day …

The Hobbit - Where’s Thorin?

Manic Miner - In the Halls of the Mountain King …

Way of the Exploding Fist - Don’t try to kick me in the head, I’ll just crouch and punch you in the balls if you do …

Knight Lore - Ultimate, Play the Game!

Time Out

After a few years I lost interest in computer games and became embroiled in the world of Dungeons and Dragons. This was a great chapter in my childhood and led to me making some very good friends. There was nothing quite like some old school role playing with your mates around your parents’ dinner table. There’s another blog post that I’ll write one day.

Getting Back on the Horse

I didn’t really return to computers until my first job after university. I did use some of the machines in the college computer lab to write up my CV for job applications during my final year but the main thing I recall from that is having to learn how to use MS Word on a Mac and it taking me ages. So much for the intuitiveness of Macs. I was thrust into the world of MS DOS with Lotus 123 (which seemed so natural and powerful) and Word Perfect (which I hated, I used to write letters in Lotus 123 … honestly). I got myself a Dell PC for home, because my neighbor worked for Dell and could get me a deal. I think it was a 486 20MHz with a 40MB hard drive. It came with Windows 3.1 which seemed so cool at the time.

After about a year of work I admitted that I hated my first job (we all make mistakes) and I realized that I needed to do something more cerebral where my smarts would get me somewhere. I ended up getting a lucky break and finding a job with a very young but very successful American company focused on delivering financial data and analytics to the institutional investment community. Clearly computers were an integral part of that and so I started to learn more and more.

I never studied Computer Science at university. Everything I now know I learnt on the job based on my own motivations. That was over 20 years ago. It’s been a fun ride.

Some Final Perspective

This has been a nostalgic walk down memory lane but also a graphic reminder of the power of Moore’s Law in that today (March 2015 as of this writing), for US$35, you can get a fully integrated computer the size of a credit card that is orders of magnitude more powerful than any of these old things. That’s progress for you.

What sort of computers will we have, and take for granted, in another 20 years time? I guess we’ll see in due course.

On Turning on RCS

2015-03-19T00:00:00+00:00

This is the second in a series of posts on Microsoft SQL Server. If you are the sort of person who doesn’t care about context and the logical flow of information then please, feel free to read on. However, I do suggest that you start your stroll through my mumblings on this subject at the beginning. It’s your choice though.

Read Committed Snapshot

Microsoft SQL Server offers a database-level setting called READ_COMMITTED_SNAPSHOT that controls whether data snapshots are used for transactions that run under the Read Committed isolation level. For a primer on the whole notion of transactions, isolation levels and the nature of this setting I direct your attention to the first in this series of posts. As mentioned in that previous article, turning on this setting can, in some edge cases, lead to a change in the behavior of transactions running under the READ COMMITTED isolation level. This is an edge case but it’s an instructive one to explore since it will give us a greater understanding of the nuance of row-versioning and snapshotting in the process.

Before we proceed I want to emphasise that running a database with the READ_COMMITTED_SNAPSHOT setting on is not a bad thing by any stretch of the imagination. In fact it’s a great feature to enable and will minimize contention in a database application. There’s no risk of inconsistent behavior among running transactions when it is on, but there is a risk of a behavior change from when it is on to when it is off, or vice versa. The risk is in having a live database that has been running with the setting off for a while and then turning it on. Some use-cases will see a behavior change when you do this. However if you are creating a new database then I would suggest that you enable it from the start.

Allow Snapshot Isolation

I should also remind you of the other, related, setting that SQL Server offers, i.e. ALLOW_SNAPSHOT_ISOLATION. When this is set to on then an additional transaction isolation level (SNAPSHOT) becomes available for use by clients. I would recommend that this setting should always be on and that clients should use it for any transactions that are modifying data in the database. In fact, use of this setting would mitigate the behavior change that I am about to describe. I’ll explain why at the end of the article.

Let’s still look at what could happen to unmodified T-SQL code, running under the default READ COMMITTED isolation level, before and after turning the READ_COMMITTED_SNAPSHOT setting on.

An Example of a Behavior Change after turning on Read Committed Snapshot

Let’s look at the behavior of a theoretical situation. Open a connection to an MS SQL instance (we will call this connection #1) and run this initial query:

-- Query 1 ---------------------------------------------------------------------

USE master;
GO

IF EXISTS (SELECT * FROM sys.databases WHERE name = 'MarblesTest')
BEGIN
  ALTER DATABASE MarblesTest SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
  DROP DATABASE MarblesTest;
END
GO

CREATE DATABASE MarblesTest;
--ALTER DATABASE MarblesTest SET READ_COMMITTED_SNAPSHOT ON;
GO

USE MarblesTest;
GO

CREATE TABLE dbo.Marbles (
  id INT PRIMARY KEY,
  color CHAR(5)
);
GO

INSERT INTO dbo.Marbles VALUES ( 1, 'Black' ), ( 2, 'White' );
GO

Note that the ALTER DATABASE statement to turn on READ_COMMITTED_SNAPSHOT is commented out and so the database will be created with that setting off (the default).

Now execute this query on the same connection:

-- Query 2 ---------------------------------------------------------------------
USE MarblesTest;
GO

DECLARE @id INT;

--
-- By default this transaction will run under the Read Committed isolation level
--
BEGIN TRAN
  SELECT  @id = MIN(id)
  FROM    dbo.Marbles
  WHERE   color = 'Black';
  
  UPDATE  dbo.Marbles
  SET     color = 'White'
  WHERE   id = @id;

The query will complete immediately. Now open up another connection to your MS SQL instance (we will call this connection #2) and run this query:

-- Query 3 ---------------------------------------------------------------------
USE MarblesTest;
GO

DECLARE @id INT;

--
-- By default this transaction will run under the Read Committed isolation level
--
BEGIN TRAN
  SELECT  @id = MIN(id)
  FROM    dbo.Marbles
  WHERE   color = 'Black';

  UPDATE  dbo.Marbles
  SET     color = 'Red'
  WHERE   id = @id;
COMMIT
GO

This query will block and sit executing until you take some further action. Now go back to connection #1 and execute this query:

-- Query 4 ---------------------------------------------------------------------
COMMIT
GO

This query will complete immediately, and once it has completed the other query (running on connection #2) will complete too.

Now, in either connection run this query:

-- Query 5 ---------------------------------------------------------------------
SELECT * FROM dbo.Marbles;

You will see this result:

id	color
1	White
2	White

Now, go back to connection #1, uncomment the ALTER DATABASE … SET READ_COMMITTED_SNAPSHOT … line from within Query 1 and run it again. This will drop and recreate the database, but this time with the setting on.

Now rerun the other queries on the different connections exactly as before. This time the final result will be:

id	color
1	Red
2	White

“Err, what!?” I hear you say. “How can that be?”

Let’s take a closer look at the queries from above, starting with query 2 (running on connection #1) …

-- Query 2 ---------------------------------------------------------------------
USE MarblesTest;
GO

DECLARE @id INT;

--
-- By default this transaction will run under the Read Committed isolation level
--
BEGIN TRAN
  SELECT  @id = MIN(id)
  FROM    dbo.Marbles
  WHERE   color = 'Black';
  
  UPDATE  dbo.Marbles
  SET     color = 'White'
  WHERE   id = @id;

First we’ll look at what happens when the READ_COMMITTED_SNAPSHOT setting is off. This query starts a transaction and then proceeds to issue a SELECT statement to determine the minimum id across all of the rows in the Marbles table that have a color of ‘Black’. Since the transaction is running with an isolation level of READ COMMITTED (the default) and the READ_COMMITTED_SNAPSHOT setting is off, then this tries to, and does, take out a shared lock on all of the rows that match the predicate, all one of them. The SELECT statement’s predicate selects just that one row, the row with id = 1, and then calculates the minimum id across that one row, which is obviously 1; and so we set @id to 1. The transaction then releases its shared lock as soon as the statement completes. Next it issues an UPDATE statement to set the color of the row (with id = 1) to ‘White’. This tries to, and does, take out an exclusive lock on that row and the UPDATE completes. The lock is not released yet however. It will be held until the transaction is committed, and this query does not commit the transaction. That comes later.

With the READ_COMMITTED_SNAPSHOT setting on nothing materially different happens (at least in terms of why we see this strange behavior). The first statement in the transaction (the SELECT) will not issue a shared lock in this case, instead it will read from a snapshot of the transactionally consistent row data as of the start of the statement. The second statement (the UPDATE) will still take out an exclusive lock as before and that lock will again be held until the transaction commits, which will happen at some point later.
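In both modes the exclusive lock taken by the UPDATE can be observed while the transaction on connection #1 is still open. A minimal sketch, assuming a third connection (not part of the original walkthrough) and a login that is permitted to view server state:

```sql
-- Run on a separate connection while Query 2's transaction is still open.
-- Lists the locks currently held or requested in the MarblesTest database.
SELECT  request_session_id,
        resource_type,
        resource_description,
        request_mode,      -- e.g. X = exclusive, S = shared, IX = intent exclusive
        request_status     -- GRANT for held locks, WAIT for blocked requests
FROM    sys.dm_tran_locks
WHERE   resource_database_id = DB_ID('MarblesTest')
ORDER BY request_session_id, resource_type;
```

Once Query 3 is running on connection #2, the same query should also show that session's blocked lock request with a status of WAIT.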

Now let’s take another look at query 3 (running on connection #2) …

-- Query 3 - Annotated ---------------------------------------------------------
USE MarblesTest;
GO
 
DECLARE @id INT;
 
--
-- By default this transaction will run under the Read Committed isolation level
--
BEGIN TRAN
  -- With the READ_COMMITTED_SNAPSHOT setting off, this query will block here
  SELECT  @id = MIN(id)
  FROM    dbo.Marbles
  WHERE   color = 'Black';
 
  -- With the READ_COMMITTED_SNAPSHOT setting on, this query will block here
  UPDATE  dbo.Marbles
  SET     color = 'Red'
  WHERE   id = @id;
COMMIT
GO

Again, we’ll first consider what happens when the READ_COMMITTED_SNAPSHOT setting is off. The query starts a transaction and then proceeds to issue a SELECT statement to determine the minimum id across all of the rows in the Marbles table that have a color of ‘Black’. Since the transaction is running with an isolation level of READ COMMITTED (the default) and the READ_COMMITTED_SNAPSHOT setting is off then this tries to take out a shared lock on all of the rows that match the predicate. It can’t take out all of those locks though since the other transaction (running on connection #1) has an exclusive lock on one of the rows that this transaction wants a shared lock on. So, this transaction (on connection #2) blocks here and the first statement (the SELECT) will not run yet. Once we execute the commit statement back on connection #1 then that transaction releases its exclusive lock on the row it updated and the transaction (running on connection #2) can now take the shared lock on that row and proceed with its SELECT statement. Because this transaction is running as READ COMMITTED (meaning that it will see transactionally consistent data as of the start of each statement) then it will read the updated data written by the other transaction and thus will now see that both rows have a color of ‘White’. The minimum id value across the rows with a color of ‘Black’ is thus now NULL (there are no rows with a color of ‘Black’) and so @id is set to NULL. The subsequent UPDATE statement has no effect since there are no rows that match the predicate id = NULL. The transaction is committed and this query completes. The end result is that we have both rows with color = ‘White’.

With the READ_COMMITTED_SNAPSHOT setting on we see different behavior. The query starts a transaction and then proceeds to issue a SELECT statement to determine the minimum id across all of the rows in the Marbles table that have a color of ‘Black’. Since the transaction is running with an isolation level of READ COMMITTED (the default) and the READ_COMMITTED_SNAPSHOT setting is on then this statement does not require a shared lock and instead reads from a snapshot copy of the transactionally consistent data as of the start of the statement. This snapshot will contain the row data as it was before the other transaction (on connection #1) started, i.e. rowId 1 with a color of ‘Black’ and rowId 2 with a color of ‘White’. So the SELECT query’s predicate will select the one row with a color of ‘Black’ (rowId 1) and that will also be the minimum id of course. Thus @id will end up being set to 1. The subsequent UPDATE statement will try to take out an exclusive lock on the row with rowId 1 but will be unable to get it because the other transaction (on connection #1) is holding an exclusive lock on the same row. Once we execute the commit statement back on connection #1 then that transaction releases its exclusive lock on the row and the transaction (running on connection #2) can now take the exclusive lock and proceed with its UPDATE. Note that at this time the modification to rowId 1 (color now set to ‘White’) is committed. Because the transaction (running on connection #2) is running as READ COMMITTED then this statement will see a transactionally consistent view of the data as of the start of the statement (i.e. it will see rowId 1 with the modified color of ‘White’) but that doesn’t really matter since this statement is just going to go ahead and update the color of that row to ‘Red’. This it does. The transaction is then committed and the query completes. The end result is that we have rowId 1 with a color of ‘Red’ and rowId 2 with a color of ‘White’.

So there you go. Changing the READ_COMMITTED_SNAPSHOT setting can change the behavior of queries. Be wary. Having your databases run with READ_COMMITTED_SNAPSHOT on can provide real benefits to the concurrency of database queries and it is worth doing. Just make sure that you turn it on early in the life of your database (ideally at the start) so that clients do not become accustomed to the shared-lock based behavior.
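If a particular statement depends on the old lock-based read behavior, SQL Server's READCOMMITTEDLOCK table hint asks for shared locks on that one read regardless of the READ_COMMITTED_SNAPSHOT setting. The following variation on Query 3 is a sketch of that idea rather than something from the original test. With the hint in place the SELECT should block behind connection #1's exclusive lock, just as it did with the setting off:

```sql
-- Query 3, variation (hypothetical): force a locking read for the SELECT
USE MarblesTest;
GO

DECLARE @id INT;

BEGIN TRAN
  -- READCOMMITTEDLOCK: take shared locks for this read even when
  -- READ_COMMITTED_SNAPSHOT is on for the database
  SELECT  @id = MIN(id)
  FROM    dbo.Marbles WITH (READCOMMITTEDLOCK)
  WHERE   color = 'Black';

  UPDATE  dbo.Marbles
  SET     color = 'Red'
  WHERE   id = @id;
COMMIT
GO
```

The shared lock is still released at the end of the SELECT, so this restores the earlier behavior for this example but does not make the read-then-write sequence atomic.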

With Snapshot Isolation

I mentioned above that if the client code were to use Snapshot isolation then this difference would not occur. Let’s look at that and explain why.

First let’s observe that this example involves overlapping multi-statement transactions that are modifying data. In order to ensure full isolation (the I in ACID) for these operations they should be running under an isolation level higher than the default level of READ COMMITTED. One could argue that the above code is not guaranteed to work correctly for precisely this reason. According to the ANSI SQL standard, both client transactions should be running under the SERIALIZABLE level but, in SQL Server at least, that level involves excessive locking. The SNAPSHOT level provides comparable guarantees for code like this (strictly speaking it still permits write skew, which SERIALIZABLE does not) without paying the excessive lock overhead. In fact the behavior (implementation) of SQL Server’s SNAPSHOT isolation level is basically the same as the behavior of Oracle under its SERIALIZABLE level and of PostgreSQL under REPEATABLE READ (or under SERIALIZABLE prior to version 9.1). Those engines fundamentally use an MVCC scheme based on row versioning and snapshots; there is no other way for them to work. This is in contrast to SQL Server which, by default, does not use row versioning and snapshots and instead relies on locking to implement the requested transaction isolation level. SQL Server can be set to behave like Oracle/PostgreSQL though, by turning on both the READ_COMMITTED_SNAPSHOT and ALLOW_SNAPSHOT_ISOLATION settings and by using SNAPSHOT as the isolation level for multi-statement write transactions.
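Note that the SNAPSHOT level has to be allowed on the database before the queries below will run. Assuming the MarblesTest database created by Query 1, something like this would be needed first (it is not shown in the original script):

```sql
USE master;
GO

-- Allow transactions in MarblesTest to request the SNAPSHOT isolation level
ALTER DATABASE MarblesTest SET ALLOW_SNAPSHOT_ISOLATION ON;
GO

-- Verify: snapshot_isolation_state_desc should report ON
SELECT name, snapshot_isolation_state_desc, is_read_committed_snapshot_on
FROM   sys.databases
WHERE  name = 'MarblesTest';
```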

Let’s revisit query 2 (running on connection #1) from the example above …

-- Query 2 ---------------------------------------------------------------------
USE MarblesTest;
GO

DECLARE @id INT;

-- Explicitly set the isolation level
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

BEGIN TRAN
  SELECT  @id = MIN(id)
  FROM    dbo.Marbles
  WHERE   color = 'Black';
  
  UPDATE  dbo.Marbles
  SET     color = 'White'
  WHERE   id = @id;

This query starts a transaction and then proceeds to issue a SELECT statement to determine the minimum id across all of the rows in the Marbles table that have a color of ‘Black’. This will read from a snapshot of the transactionally consistent row data as of the start of the statement. No shared lock is required.

The difference between this case and the example above (with READ_COMMITTED_SNAPSHOT on and running under the READ COMMITTED isolation level) is that this snapshot view of the data will persist until the transaction is committed. Any subsequent SELECTs to read from the same table will re-use the same snapshot. If the isolation level were READ COMMITTED then the snapshot would be discarded after the first SELECT and subsequent SELECTs would take a new snapshot of the table as of that time. This isn’t pertinent to the behavior that we are discussing in this example, since we are only issuing one SELECT, but it is worth noting.

The second statement (the UPDATE) will take out an exclusive lock and that lock will be held until the transaction commits, which will happen at some point later.

Now let’s take another look at query 3 (running on connection #2) …

-- Query 3 - Annotated ---------------------------------------------------------
USE MarblesTest;
GO
 
DECLARE @id INT;
 
-- Explicitly set the isolation level
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

BEGIN TRAN
  SELECT  @id = MIN(id)
  FROM    dbo.Marbles
  WHERE   color = 'Black';
 
  -- This query will block here
  UPDATE  dbo.Marbles
  SET     color = 'Red'
  WHERE   id = @id;
COMMIT
GO

The query starts a transaction and then proceeds to issue a SELECT statement to determine the minimum id across all of the rows in the Marbles table that have a color of ‘Black’. This will read from a snapshot of the transactionally consistent row data as of the start of the statement. No shared lock is required. This snapshot will contain the row data as it was before the other transaction (on connection #1) started, i.e. rowId 1 with a color of ‘Black’ and rowId 2 with a color of ‘White’. So the SELECT query’s predicate will select the one row with a color of ‘Black’ (rowId 1) and that will also be the minimum id of course. Thus @id will end up being set to 1.

The subsequent UPDATE statement will try to take out an exclusive lock on the row with rowId 1 but will be unable to get it because the other transaction (on connection #1) is holding an exclusive lock on the same row. Once we execute the commit statement back on connection #1 then that transaction releases its exclusive lock on the row and this transaction can now take the exclusive lock and proceed with its UPDATE. Note that at this time the modification to rowId 1 (color now set to ‘White’) has been committed by the transaction on connection #1. This is where another aspect of the SNAPSHOT isolation level comes into play.

As well as extending the lifetime of data snapshots to that of the transaction as opposed to just the statement, under the SNAPSHOT isolation level SQL Server will check for multiple modifications to the same rows by different transactions and will not allow transaction B to commit if it has modified a row that another committed transaction (transaction A) has modified since transaction B began. This check is what will prevent the current transaction (running on connection #2) from setting the color of the row with rowId 1 to ‘Red’. SQL Server will detect this attempt and will immediately terminate the transaction with this error …

Msg 3960, Level 16, State 2, Line 15
Snapshot isolation transaction aborted due to update conflict. You cannot use snapshot isolation to access table
'dbo.Marbles' directly or indirectly in database 'MarblesTest' to update, delete, or insert the row that has been
modified or deleted by another transaction. Retry the transaction or change the isolation level for the update/delete
statement.

If the client running this query were to catch this error and then retry (as is suggested in the error message), and that retry were not to overlap with another transaction trying to modify the same row, then the second run of the above logic would not find any rows with color = ‘Black’, since the other transaction (on connection #1) already committed its change to set the only row with color = ‘Black’ to ‘White’. So for this run no rows would be returned from the first SELECT, @id would be NULL and the UPDATE would not happen. The upshot would be that once the two transactions (on the two different connections) had both completed successfully then the result would be the same as in the original scenario, with READ_COMMITTED_SNAPSHOT off, i.e. …

id	color
1	White
2	White
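The catch-and-retry logic described above could be written in T-SQL itself. This is a minimal sketch, assuming SQL Server 2012 or later (for THROW) and an arbitrary limit of three attempts. The @attempt, @maxAttempts and @done variables are illustrative and not part of the original queries:

```sql
USE MarblesTest;
GO

SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

DECLARE @id INT;
DECLARE @attempt INT = 1;
DECLARE @maxAttempts INT = 3;   -- hypothetical retry limit
DECLARE @done BIT = 0;

WHILE @done = 0
BEGIN
  BEGIN TRY
    BEGIN TRAN;
      SELECT  @id = MIN(id)
      FROM    dbo.Marbles
      WHERE   color = 'Black';

      UPDATE  dbo.Marbles
      SET     color = 'Red'
      WHERE   id = @id;
    COMMIT;
    SET @done = 1;
  END TRY
  BEGIN CATCH
    -- Make sure no transaction is left open before retrying or re-raising
    IF XACT_STATE() <> 0 ROLLBACK;

    -- Only retry on the snapshot update conflict (error 3960);
    -- re-raise anything else, or give up after the last attempt
    IF ERROR_NUMBER() <> 3960 OR @attempt >= @maxAttempts
    BEGIN
      THROW;
    END

    SET @attempt += 1;
  END CATCH
END
GO
```

Each pass through the loop starts a new transaction and therefore reads from a fresh snapshot, which is why the retry sees the change committed by connection #1.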

Conclusion

The example used here may seem a bit contrived, and it is, but use cases like this can and will occur in real world applications. The point of this article is not to warn anyone off from turning on READ_COMMITTED_SNAPSHOT for their SQL Server databases. In fact, as I have said above, I firmly believe that READ_COMMITTED_SNAPSHOT and ALLOW_SNAPSHOT_ISOLATION should both be on for all SQL Server databases, since by doing so you are just telling SQL Server to work like Oracle and PostgreSQL and not be a dusty old 1980s RDBMS that uses locking for everything. The real point of the article is to warn you that your applications probably have poorly written queries in them, that should be written to specifically use higher levels of transaction isolation for correctness but don’t; and there’s a small risk that the behavior of those queries may change if and when you turn on READ_COMMITTED_SNAPSHOT. These behavior changes will only happen for certain types of multi-statement transactions that are modifying data and overlap in their execution with other such transactions. However, it’s precisely because of highly concurrent workloads that you might be considering turning READ_COMMITTED_SNAPSHOT on.

Just be educated and be cautious.

On RDBMS

2015-03-19T00:00:00+00:00

I’ve been working with relational database technologies on and off for a big part of my tech career but over the last few years I’ve had the fortune (some may say the misfortune …) to be able to use a few of them in some depth. I’m currently involved in a heavy evaluation of RDBMS products (including Oracle, DB2, SQL Server, PostgreSQL and MySQL/MariaDB) as part of a standardization initiative, and I’m getting to learn a lot about all of them (their differences, commonalities and nuances). However, as of this writing, the one that I know best is Microsoft SQL Server (MS SQL).

Cognitive Friction

One thing that I’ve noticed as I’ve got up the learning curve with MS SQL, and helped others get up that same curve, is that there are some cognitive frustrations that experienced developers inevitably suffer when they first start working with SQL and relational engines. Developers naturally want to control the algorithm that is used to access data and perform a calculation, but that’s precisely what you DO NOT do when you write a SQL query. SQL is a declarative language, as opposed to an imperative one, and it’s the query optimizer in the RDBMS that actually “writes” the code to really execute the query. I have spent many an hour staring at execution plans and thinking to myself “Why is the fracking optimizer choosing to do it that way as opposed to the way that I want it to be done?”. There comes a point where you just have to let go and trust that the optimizer knows what it’s doing. This is when you also start to develop an appreciation of the difference between a good cost-based optimizer and a poor one. You also appreciate the critical importance of creating the appropriate constructs in your database (e.g. indexes, foreign keys, up-to-date statistics) to give the optimizer as much information as you can on which to base its decisions.

Transactions and the ACID Properties

Another cognitive hurdle is that of concurrency. An RDBMS is generally a multi-user system and has to manage and coordinate access to data across multiple client connections at varying granularities. It’s all part and parcel of ensuring that the database engine can support its ACID promise. Remember that one? Atomicity, Consistency, Isolation and Durability.

At the heart of this is the concept of a transaction. As Wikipedia says so well, a transaction symbolizes a unit of work performed within a database management system (DBMS) against a database, and treated in a coherent and reliable way independent of other transactions. They provide an “all-or-nothing” proposition, stating that each work-unit must either complete in its entirety or have no effect whatsoever. Further, the system must isolate each transaction from other transactions, results must conform to existing constraints in the database and transactions that complete successfully must get written to durable storage. Transactions are one of the key mechanisms through which a DBMS ensures that it meets its ACID contract. And speaking of that contract …

Atomicity

The atomicity property requires that each transaction be “all-or-nothing”; if one part of the transaction fails, the entire transaction fails, and the database state is left unchanged. An atomic system must guarantee atomicity in each and every situation including power failures, errors and crashes. To the outside world, a committed transaction appears (by its effects on the database) to be indivisible (“atomic”) and an aborted transaction does not happen.

Consistency

The consistency property ensures that any transaction will bring the database from one valid state to another. Any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers and any combination thereof. This does not guarantee correctness of the transaction in all ways the application programmer might have wanted (that is the responsibility of application-level code) but merely that any programming errors cannot result in the violation of any defined rules in the database.

Isolation

The isolation property ensures that the concurrent execution of transactions results in a system state that would be obtained if transactions were executed serially, i.e. one after the other. Providing isolation is the main goal of concurrency control. More on this below.

Durability

The durability property requires that once a transaction has been committed, it will remain so, even in the event of power loss, crashes or errors. To defend against power loss, transactions (or their effects) must be recorded in non-volatile memory.

Cognitive Friction Revisited

“So that’s all fine and dandy Alan, thanks for the background, but what has that got to do with this other cognitive hurdle that you say developers have to get over? What’s the thing that they need to let go of?” - Ed.

Why, I’m glad you asked. It’s locking, or rather their control (or lack of control) over it. Any developer with experience of multi-threaded programming will look at an RDBMS and automatically think that data will need to be locked to ensure transaction isolation. However they will quickly realize that they never explicitly issue locks on things themselves. As with writing the actual query execution plan, this is the job of the database engine as opposed to the job of the database developer. This lack of direct control can freak out many an individual, especially when they see queries blocking each other and potentially even getting into deadlocks. Just recently I saw one, very experienced, C++ developer throw his hands up and claim that a database was “broken” because it was experiencing frequent query deadlocks. He wanted to “take control” of its locking but could not. It was not until I helped him to debug why his queries were deadlocking, and educated him about transaction isolation levels (see below), that he finally calmed down.

This is another example of the need to “let go” and trust that the database engine will ultimately do the right thing, as long as you embrace declarative programming and specify the sort of isolation level that you require for a given query. That’s the real trick; knowing that you can control the required isolation level for a transaction, or set of transactions, and understanding what those isolation levels mean. It’s more declarative programming again.

Transaction Isolation Levels

MS SQL supports four transaction isolation levels by default with a fifth one (Snapshot) being available if a database-level setting is on. There’s also another database-level setting that controls the behavior of the Read Committed transaction isolation level. More on this in due course. Full details of the available levels can be found on MSDN but a quick description follows.

Read Uncommitted

Transactions running at this level can read rows that have been modified by other transactions but not yet committed. They also do not issue shared locks to prevent other transactions from modifying data read by the current transaction. Such transactions are also not blocked by exclusive locks that would prevent the current transaction from reading rows that have been modified but not committed by other transactions. When this option is set, it is possible to read uncommitted modifications (you can get “dirty reads”). Values in the data can be changed and rows can appear or disappear in the data set before the end of the transaction.

This is the least restrictive of the isolation levels and is not typically used unless application developers specifically do not need consistency in their data model.

Read Committed

Transactions running at this level cannot read data that has been modified but not committed by other transactions, thus preventing “dirty reads”. However, data can be changed by other transactions between individual statements within the current transaction, resulting in non-repeatable reads (a row returned from a previous query has a different value when queried again) or phantom data (different rows come back from a repeat evaluation of the same query).

This is the default transaction isolation level. Unless a different level is specified (via the SET TRANSACTION ISOLATION LEVEL command) then this is the level that will be used. The mechanism by which MS SQL achieves the semantics of this isolation level depends on the value of the database-level setting READ_COMMITTED_SNAPSHOT. If that setting is off (the default) then the semantics are implemented via locking (reads will take out shared locks on objects and will block writers but not other readers). However, if the setting is on then MS SQL will use a row versioning scheme in order to allow transactions to see a snapshot of the state of committed data as of the start of each statement. Shared locks will no longer be taken by read statements in the transaction and so other transactions, that need exclusive locks, will not be blocked and their modifications will be accommodated via the row versioning scheme.

Note that the life of the data snapshot is just for the statement. The next statement in a multi-statement transaction will see a new snapshot (as of the start of that statement), and so data committed by other transactions will be visible within the current transaction. Non-repeatable reads and phantom data are still possible.
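As a quick illustration of the SET TRANSACTION ISOLATION LEVEL command mentioned above, a session can change its level and then read it back from the sys.dm_exec_sessions view. This is only a sketch for experimenting; the numeric codes in the comment are those documented for that view:

```sql
-- Request a different isolation level for the current session
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- 1 = Read Uncommitted, 2 = Read Committed, 3 = Repeatable Read,
-- 4 = Serializable, 5 = Snapshot
SELECT transaction_isolation_level
FROM   sys.dm_exec_sessions
WHERE  session_id = @@SPID;

-- Return to the default level
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
```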

Repeatable Read

Transactions running at this level cannot read data that has been modified but not yet committed by other transactions; and also, no other transactions can modify data that has been read by the current transaction until the current transaction completes. This is achieved via locking. Shared locks are placed on all data read by each statement in the transaction and are held until the transaction completes. This prevents other transactions from modifying any rows that have been read by the current transaction.

Note that although other transactions can’t modify (update or delete) rows that match the search conditions of statements issued by the current transaction (i.e. the non-repeatable reads problem is avoided), they can insert new rows that would match those search conditions. This means that phantom data is still possible if the current transaction were to re-run statements.

Because shared locks are held to the end of a transaction instead of being released at the end of each statement, concurrency is lower than the default READ COMMITTED isolation level. It’s a trade off. If you need the data isolation guarantees of this level then you have to use it, but there will be more blocking of other transactions.

Serializable

In transactions running at this level: statements cannot read data that has been modified but not yet committed by other transactions, no other transactions can modify data that has been read by the current transaction until the current transaction completes, and other transactions cannot insert new rows with key values that would fall in the range of keys read by any statements in the current transaction; until the current transaction completes.

This is achieved via yet more aggressive locking. Range locks are placed on the range of key values that match the search conditions of each statement executed in the transaction. This blocks other transactions from updating or inserting any rows that would qualify for any of the statements executed by the current transaction. Thus if any of the statements in a transaction are executed a second time, they will read the same set of rows.

This is the most restrictive of the isolation levels because it locks entire ranges of keys and holds the locks until the transaction completes. Again, it’s a trade off. Transaction concurrency is lowest when this level is in use but sometimes you just need the isolation guarantees.

Snapshot

As mentioned above, this level is not available by default but is enabled via a database-level setting (ALLOW_SNAPSHOT_ISOLATION). Similar to the implementation strategy for the READ COMMITTED isolation level (when the database-level READ_COMMITTED_SNAPSHOT setting is on), MS SQL uses row versioning to facilitate snapshots of committed data and avoid shared locks. Different semantics to READ COMMITTED are provided though.

Any statement in a transaction running at this level will read the state of committed data that existed at the start of the transaction. Data modifications made by other transactions after the start of the current transaction are not visible to any statements executing in the current transaction. So effectively, the data snapshot lives for the life of the transaction as opposed to just one statement. Note however that the transaction can view changes made by itself. For example, if the transaction performs an UPDATE on a table and then issues a SELECT statement against the same table, the modified data will be visible.

Reads will not take out shared locks (as with the READ COMMITTED level when the READ_COMMITTED_SNAPSHOT setting is on) and so other transactions that want to write data will not be blocked. Also though, transactions writing data do not block other SNAPSHOT level transactions from reading data.

Snapshot Disambiguation

One of the things that I see people confusing a lot when they start using MS SQL is the difference between the READ_COMMITTED_SNAPSHOT setting and the SNAPSHOT transaction isolation level. People tend to think that READ_COMMITTED_SNAPSHOT is, itself, another transaction isolation level just like SNAPSHOT. It is not. It is a database setting that “changes the implementation strategy” of the existing READ COMMITTED transaction isolation level. If someone says that they are running a transaction as READ_COMMITTED_SNAPSHOT or using the READ_COMMITTED_SNAPSHOT isolation level then that is wrong, and the speaker should be corrected.

Executing a transaction under the READ COMMITTED isolation level is a statement (this is declarative programming remember) of the sort of isolation that you require for your transaction. The semantics of that isolation are as described above. By default MS SQL will achieve these semantics via the use of shared locks on the objects that each statement in your transaction is reading; and those shared locks will cause other transactions that want to write to the same object, to block since they will request exclusive locks on those objects. However, if the database-level READ_COMMITTED_SNAPSHOT setting is on, then MS SQL will achieve the required semantics via row-versioning and no shared locks will be taken. This will result in no contention with transactions that need to write data and is generally considered a GoodThing™.

Turning on the READ_COMMITTED_SNAPSHOT database-level setting is a significant step. It will cause MS SQL to physically change the way that it stores data pages on disk, since it will now store “versions” of rows. Additional information has to be added to every row that is written to a data page under a row-versioned scheme and the old versions of the rows will be stored in tempdb as opposed to the regular data pages. Your database will grow in size and will be more reliant on tempdb. The degree to which this is the case will depend on the pattern of access from your clients.

Turning on this setting can, in some edge cases, lead to a change in the behavior of transactions running under the READ COMMITTED isolation level. An example of such an edge case can be found here. Now, there’s nothing wrong with turning on READ_COMMITTED_SNAPSHOT, in fact I think that it should typically be used. There’s no risk of inconsistent behavior among running transactions when it is on, but there is a risk of a behavior change from when it is on to when it is off, or vice versa. If you determine that your database would benefit from row-versioning for the READ COMMITTED isolation level then turn it on early in the life of the database and then leave it on. If your database has been in place for years, and many client applications may have queries in place that use READ COMMITTED (which they will since it’s the default) then be careful since there is a risk, however small, that those clients will see a behavior change.

Optimism in Locking

We have been talking about row-versioning for a bit but let’s come back to locking. Locking is essential in order for an RDBMS to support a transaction’s requested isolation semantics when writes are involved, but that locking can either be done in a pessimistic or an optimistic way.

Under a “Pessimistic Locking” scheme an object (row, page, table, …) is locked immediately when a write lock is requested, while in an “Optimistic Locking” scheme it is only locked when the changes made to that record are actually written. With pessimistic locking a write transaction will always succeed, but under optimistic locking there’s a chance that another transaction might have written to the record first. When the first transaction then tries to commit its change, MS SQL will detect the data inconsistency and cause the transaction to fail.

So, optimistic locking leads to the potential for write transactions to fail, and client applications will need to catch such failures and re-try, at which point it’s likely that the write will succeed. Optimistic locking will provide for less blocking between transactions though, and thus will allow client load against the database to scale better. It’s a trade off. I like to think that it’s one that is well worth it though, as long as you can coordinate the necessary logic changes with client applications. As with the READ_COMMITTED_SNAPSHOT option above, SNAPSHOT isolation is easier to deploy if you adopt it early in the life of a database, before client applications become accustomed to the existing behavior.
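To make the “catch and re-try” idea concrete, here is a minimal sketch of what the client side might look like. It is not tied to any real database driver: `UpdateConflictError` and `run_with_retry` are names I made up for illustration, and the “transaction” is just a Python callable that fails twice before succeeding.

```python
import random
import time


class UpdateConflictError(Exception):
    """Hypothetical stand-in for the error a driver raises on an update conflict."""


def run_with_retry(transaction, max_attempts=3, base_delay=0.01):
    """Run `transaction` (a zero-argument callable), retrying on conflicts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except UpdateConflictError:
            if attempt == max_attempts:
                raise  # give up and let the caller decide what to do
            # Back off briefly (with jitter) so competing writers spread out.
            time.sleep(base_delay * attempt * (1 + random.random()))


# Demonstration: a fake transaction that loses the race twice, then succeeds.
attempts = []


def flaky_transaction():
    attempts.append(1)
    if len(attempts) < 3:
        raise UpdateConflictError("row was changed by another transaction")
    return "committed"


result = run_with_retry(flaky_transaction)
print(result, "after", len(attempts), "attempts")
```

The important design point is that the whole transaction is re-run from the top, so it re-reads the current data rather than re-submitting a write based on stale values.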

You can read up on all the nuances of how to enable the READ_COMMITTED_SNAPSHOT and ALLOW_SNAPSHOT_ISOLATION settings here.

On Who You Want to Be

2015-02-27T00:00:00+00:00

I recently learned that an old colleague of mine just passed away. This was sad news and a surprise, even though I knew that he had been diagnosed with cancer some time ago and that his time was limited. He had not been working for a while and so the status of his health had moved beyond my day to day visibility. It’s always a shock when someone you know finally dies, even though we know that everyone will die one day. He was relatively young (early 50s) and left a loving family and many friends and colleagues behind.

I have not experienced too much death as an adult. My grandparents mostly passed away when I was a child, and at that age you don’t tend to think much about the ramifications of death other than the fact that you feel sad for a while. You certainly don’t then tend to evaluate your own life and what you are doing. Recently, a few people that I know have passed and it’s made me think about who we are and the way we live our lives.

My old work colleague, Mike D, was one such person. He was one of the early founding fathers of the company where I myself have worked for 20 years, FactSet Research Systems. Now, the very phenomenon of working for the same firm for 20 years in today’s day and age is rare but in my case it’s been a journey that has followed the arc of a great tiny company growing up to be a great big company (there’s another blog post to write about that …). Mike was there from when FactSet was basically a startup and was closer to 30 years in. He didn’t hire me, I was actually hired in another continent by another founding father (who never seems to get the historical credit that he deserves), but he was always a significant figure in the background in my career. He was one of FactSet’s earliest engineers and did a lot of the foundational work that made the FactSet system what it is today. He did hire many of my contemporaries, a lot of whom are still around too, and to hear the tributes that they have offered after his death, it’s clear that he was very influential to them; he was a mentor, a role model and just a good guy. They commented on his contribution to the company but, more than that, they talked about him as a person, what he was like and what he did over and above his work as an engineer, engineering director and ultimately President and COO of the company.

It’s this last aspect that resonates with me and the reason that I felt compelled to write this blog post. He was a multi-faceted man with diverse interests and a good sense of humor. As I said, he didn’t hire me and I didn’t work for him directly. He was never a close, personal mentor figure to me or even an engineering role model. I only ever knew him once he was already quite senior in the company, as Director of the entire Software Engineering department. It would be easy for someone that senior to feel aloof and remote but he was never like that.

I remember taking trips with him to visit some of our clients and talking about work but also about books and art. He was the person who persuaded me that I had to read Zen and the Art of Motorcycle Maintenance. He was the one who persuaded me that I had to visit the Musée d’Orsay in Paris.

I didn’t have a formal education in Computer Science and I had joined FactSet as a customer support rep. However, once I saw what software engineering was all about it was clear to me that that was what I wanted to do. I took it upon myself to learn the necessary skills but it was Mike who ultimately said yes and allowed me to join the software department.

I remember working with him in Greenwich on FactSet’s first, and ultimately aborted, attempts to build a web product back in 98/99. I remember working on refinements to our =FDS() Excel functions and then strategizing with Mike on ways to “sell” this approach to one of FactSet’s other engineering founding fathers, who was most definitely aloof and not always of a mind to agree to do things that were not his ideas in the first place. I remember him visiting our remote offices in London and Tokyo when I worked there and enjoying some great dinners.

Others talk about his prowess on the sports field, and indeed he was a big guy and a varsity athlete in several sports. They also talk about how he was an avid fan who always had something to say about your team and would often make a cutting joke out of it. That didn’t mean much to me, being a non-athlete myself and also a Brit, unfamiliar with the American sports fan dynamic. I just remember him looking like Clark Kent and being able to sit very low in his office chair for such a tall man. There were some great images of his fandom though (see above).

He was not the greatest individual engineer, although his work ethic was amazing; he was not the greatest COO, being unwilling to shake things up when they really needed it sometimes, but he was a leader that people respected. He was someone who epitomized the idea of there being a FactSet family. He wasn’t a personal mentor to me but it’s clear that he was one to many others and he set the tone for much of the style and approach of FactSet’s engineering for years to come.

He was someone who represented the idea that you need to be more than just the role you serve in at a company. You need to influence the people around you as a person just as much as you do as an individual contributor (salesperson, marketer, engineer) or manager. Share your interests and love of many things and try to instill in others a similar love. Be a person who people will remember for more than the job you do day to day.

I’ve worked with, and for, many people over the years who were nothing much more than the job they did. I never felt a personal connection to those people and I will not remember them fondly. Don’t be that guy, be like Mike. I believe I have tried to do this and I will continue to do so all the more.

Goodbye sir. You were a good man, I wish I had known you better.

So dear reader … Who are you? Who do you want to be? How do you want to be remembered?

On Puns

2014-10-08T00:00:00+00:00

I tried to catch some fog. I mist.

When chemists die they barium.

Jokes about German sausage are the wurst.

A soldier who survived mustard gas and pepper spray is now a seasoned veteran.

I know a guy who’s addicted to brake fluid. He says he can stop anytime.

How does Moses make his tea? Hebrews it.

I stayed up all night to see where the sun went. Then it dawned on me.

This girl said she recognized me from the vegetarian club, but I’d never met herbivore.

I’m reading a book about anti-gravity. I can’t put it down.

I did a theatrical performance about puns. It was a play on words.

They told me I had type A blood, but it was a Type O.

A dyslexic man walks into a bra.

PMS jokes aren’t funny. Period.

Why were the Indians here first? They had reservations.

Class trip to the Coca-Cola factory. I hope there’s no pop quiz.

Energizer Bunny arrested: Charged with battery.

I didn’t like my beard at first. Then it grew on me.

How do you make holy water? Boil the hell out of it!

What do you call a dinosaur with an extensive vocabulary? A thesaurus.

When you get a bladder infection, urine trouble.

What does a clock do when it’s hungry? It goes back four seconds.

I wondered why the baseball was getting bigger. Then it hit me!

Broken pencils are pointless.

On Character Encodings

2014-10-08T00:00:00+00:00

In the beginning was the telegraph. It started with human operators manually sending messages in Morse code but in time, technological evolution led to automatic teleprinters using codings such as Baudot code and ultimately ASCII, the latter growing out of the desire to support upper and lower case letters as well as numerals and punctuation.

ASCII coding is actually very clever. It uses 7 bits to encode a total of 128 ‘characters’ including control characters (for the teleprinter or end display device) as well as actual characters for display. The structure of the encoding is what’s interesting though. All the control characters (with the exception of DEL) start with the bit sequence $00$. This means that it’s easy to check whether a character is printable or not: you just have to check bits 6 and 7 for 0. Also, the letters of the alphabet are all encoded in sequence starting with ‘A’ at 65 ($1000001$). Then ‘B’ is $1000010$, ‘C’ is $1000011$, etc. So if you mask bit 7 then you get direct access to the position of the character in the alphabet. Then the lowercase letters start with ‘a’ at 97 ($1100001$) and you can see the same sequence but with bit 6 set to 1 as well. To shift from uppercase to lowercase is a simple bit flip (the equivalent of adding or subtracting 32). Finally, the numerical digits start with ‘0’ at 48 ($0110000$) and then run consecutively through ‘9’ at $0111001$, and again we see that by masking bits 5 and 6 we just get the actual binary values 0 - 9.

DEL is an interesting character. It’s not a control character meant to indicate that the user typed the key to delete the previous character; that’s Backspace (with code 8). Rather, DEL is a way to void out an existing character that may have been stored somewhere. A good example of this is when punched cards would have been used to store data to be used in mainframe batches. To void out a character (indicated by a pattern of holes punched in a column on a card) you just had to punch out all the holes, effectively making that character deleted, i.e. ‘DEL’ ($1111111$).
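These bit tricks are easy to try out. The following is just a sketch; the helper names are my own, and I mask both of the top two bits when computing the alphabet position so that the same function works for upper and lower case.

```python
def is_control(ch):
    # Control characters have the top two of the 7 bits clear; DEL (127) is the exception.
    code = ord(ch)
    return (code & 0b1100000) == 0 or code == 127


def alphabet_position(ch):
    # Keep only the low five bits: 'A' and 'a' both give 1, 'Z' and 'z' give 26.
    return ord(ch) & 0b0011111


def toggle_case(ch):
    # Flip bit 6 (value 32) to switch a letter between uppercase and lowercase.
    return chr(ord(ch) ^ 0b0100000)


def digit_value(ch):
    # Keep only the low four bits of '0'..'9' to get the numeric value.
    return ord(ch) & 0b0001111


for ch in "ABCabc":
    print(ch, format(ord(ch), "07b"), alphabet_position(ch), toggle_case(ch))
```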

Here’s the table:

So far, so good … for the English speaking world at least. But what about countries that used additional characters in their alphabets? They developed their own encoding systems of course. The arrival of the computer age, and the fact that systems processed data in 8 bit chunks (bytes), helped in that now a character would be represented as an 8 bit pattern and so 256 different values could be stored. ASCII was naturally extended and the values 128 - 255 were used to store additional characters. Different regions of the world chose to use this extra space differently though, and thus was born the idea of a codepage, where a byte was interpreted relative to any one of a number of different lookup tables in order to determine what character it actually represented. Most codepages used exactly the same bit patterns to encode the standard ASCII character set but they all used the patterns with a leading 1 bit (values above 127) differently. The Latin-1 codepage (aka ISO/IEC 8859-1) was used widely in Western Europe since it contained all the characters needed for those languages. Other codepages were created to accommodate the Cyrillic and other alphabets.

In countries like Japan, China and Korea - whose languages contained way more glyphs than could be encoded in a single byte - multibyte encoding schemes were invented. Often these were still compatible with ASCII but the byte values above 127 were used to indicate a shift into a different page where the following byte value was interpreted. This allowed them to encode pretty much all the characters that they needed. Standards proliferated and incompatibilities abounded. It wasn’t so much of a problem in the early days of computing but with the advent of the Internet, and the fact that data generated on one system (e.g. in Japan) could be shared with another system (e.g. in France), the incompatibilities were brought into stark relief.
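You can see the codepage problem for yourself with Python’s built-in codecs. This is only an illustration: I picked Latin-1, Windows-1251 (one of the Cyrillic codepages) and Shift JIS as examples, but any pair of codepages would make the same point.

```python
raw = bytes([0xE9])  # a single byte with the high bit set (value 233)

as_latin1 = raw.decode("latin-1")  # Western European interpretation
as_cp1251 = raw.decode("cp1251")   # Cyrillic interpretation of the very same byte

# The ASCII range (below 128) means the same thing in both codepages.
ascii_in_both = (b"A".decode("latin-1"), b"A".decode("cp1251"))

# A multibyte scheme: Shift JIS needs two bytes for a hiragana character.
hiragana_a = "\u3042"  # the character 'a' in hiragana
sjis = hiragana_a.encode("shift_jis")

print(repr(as_latin1), repr(as_cp1251), ascii_in_both, sjis.hex())
```

Same byte, two different characters; which one you get depends entirely on which lookup table the reader assumes.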

Things could be made to work via clever switching of codepages but it wasn’t pretty. A new standard was needed. An industry working group was formed called the Unicode Consortium and, via a minor miracle, they managed to create a new, all encompassing standard called, unsurprisingly, Unicode. In very simple terms Unicode is a big table that assigns a unique number (a Unicode code point) to all characters. The Unicode standard actually includes a lot more than that though: rules for character collation and other essential matters such as support for right to left text; but for our purposes we can think of it as character = unique number. The code point is just a number now, how it is stored in computer memory is another matter.

Initially computer vendors implemented Unicode by storing all characters using either two or four bytes. These two schemes were known as UCS-2 and UCS-4 respectively (UCS = Universal Character Set). There were a few problems with this approach though. Firstly, the vast majority of the textual data stored on computers around the world was English and using two (and especially using four) bytes per character, to store data that was mostly just ASCII, was incredibly wasteful. UCS-4 encoded files were four times the size of the equivalent ASCII files. Secondly, the C programming language had introduced the concept of the null terminated string (commonly called C strings) whereby string data was stored as an array of bytes (characters) with the end of the string marked by a null byte. This assumed string structure was baked in to a “lot” of code, and that code was now incompatible with strings stored as arrays of UCS-2/4 characters because those character arrays contained null bytes in the middle of the strings. Code written to expect C strings would misinterpret the leading null byte in the UCS-2 encoding of an uppercase letter ‘A’ (ASCII code 65) as an end of string marker.

UCS-2 and UCS-4 were fine for new applications that stored, and interpreted, all strings as arrays of multiple bytes but they were no good for the efficient transmission of character data (because of the bloat) and interaction with legacy APIs (because they broke the C string paradigm). A new encoding was needed.
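Both problems are easy to demonstrate. In this sketch I use Python’s `utf-16-be` and `utf-32-be` codecs as stand-ins for UCS-2 and UCS-4 (they produce the same bytes for these characters), and `c_strlen` is my own toy imitation of how C code finds the end of a string.

```python
text = "A"
ucs2 = text.encode("utf-16-be")  # two bytes per character
ucs4 = text.encode("utf-32-be")  # four bytes per character


def c_strlen(buffer):
    """Mimic C's strlen: count bytes up to (not including) the first null byte."""
    end = buffer.find(b"\x00")
    return len(buffer) if end == -1 else end


message = "Hello"
sizes = {enc: len(message.encode(enc)) for enc in ("ascii", "utf-16-be", "utf-32-be")}

print(ucs2, ucs4)       # the leading byte of each is a null
print(c_strlen(ucs2))   # legacy code sees an empty string
print(sizes)            # the bloat for plain English text
```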

So along came UTF-8, a variable length Unicode encoding scheme, supposedly designed in an evening on a placemat in a diner. UTF = Unicode Transformation Format. UTF-8 is very clever too.

The UTF-8 encoding of ASCII is ASCII. Nice. For all the character data out there that fits in that space it just stays the same.
UTF-8 does not have embedded nulls and so UTF-8 strings can still be considered as null-terminated byte arrays, and thus can be consumed by legacy C APIs.
UTF-8 supports Unicode codepoints above 127 via a variable length byte encoding with the following scheme:

Notice that the following bytes in a multibyte sequence all start with $10$. This means that those bytes will never be misinterpreted as ASCII characters.
Also it’s very easy for code to move forwards and backwards by characters in a UTF-8 string. Simply scan bytes (forwards or backwards) until you find the next (previous) one that starts with something other than $10$.
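Here is a small sketch of the properties listed above, using the standard UTF-8 bit layout: one byte `0xxxxxxx` for code points below 128, then a lead byte of `110xxxxx`, `1110xxxx` or `11110xxx` followed by one, two or three continuation bytes of the form `10xxxxxx`. The function names are mine, and the encoder is deliberately naive (it doesn’t reject surrogate code points, for instance); in real code you would just call `str.encode("utf-8")`.

```python
def utf8_encode(code_point):
    """Encode a single Unicode code point as UTF-8 bytes."""
    if code_point < 0x80:
        return bytes([code_point])           # plain ASCII, unchanged
    if code_point < 0x800:
        lead, n_cont = 0b11000000, 1
    elif code_point < 0x10000:
        lead, n_cont = 0b11100000, 2
    elif code_point < 0x110000:
        lead, n_cont = 0b11110000, 3
    else:
        raise ValueError("not a Unicode code point")
    cont = []
    for _ in range(n_cont):
        # Peel off six bits at a time, lowest first, and tag each with '10'.
        cont.append(0b10000000 | (code_point & 0b00111111))
        code_point >>= 6
    return bytes([lead | code_point] + cont[::-1])


def previous_char_start(data, index):
    """Step backwards from `index` to the first byte of the previous character."""
    index -= 1
    while index > 0 and (data[index] & 0b11000000) == 0b10000000:
        index -= 1  # skip over continuation bytes (those starting with '10')
    return index


data = b"".join(utf8_encode(cp) for cp in (0x61, 0xE9, 0x20AC))
print(data.hex(" "), previous_char_start(data, len(data)))
```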

Let’s look at an example: é, the lower case acute accented e character, whose Unicode codepoint is 233. This is encoded as follows:
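Worked through by hand in Python, assuming the standard two-byte layout `110xxxxx 10xxxxxx` (which covers code points from 128 to 2047), it goes like this:

```python
code_point = ord("\u00e9")          # 233, i.e. binary 11101001
bits = format(code_point, "011b")   # pad to the 11 payload bits of a two-byte sequence

byte1 = "110" + bits[:5]            # lead byte carries the top five bits
byte2 = "10" + bits[5:]             # continuation byte carries the low six bits

encoded = bytes([int(byte1, 2), int(byte2, 2)])
print(bits, byte1, byte2, encoded.hex())
```

So the single Latin-1 byte `E9` becomes the two bytes `C3 A9` in UTF-8.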

UTF-8 became incredibly popular and I think it’s fair to say that it is now considered the de-facto way for character data to be encoded and exchanged across the Internet. In fact I saw a statistic the other day that stated that now there is more UTF-8 encoded data stored across computer hosts in the world than old codepage encoded data.

For strings stored in memory inside of a given modern application it’s likely that a fixed width character encoding will still be used, for the efficient (offset based) random access to characters that it gives. However as soon as that character data leaves the application domain and is exchanged with another application it will almost certainly be serialized as UTF-8.

On Pi

2014-10-07T00:00:00+00:00

Given that my last post was about floating point I thought this comic strip was very apropos, and it also introduces the topic of $\pi$ which is worth a post of its own.

So much has been written about $\pi$ already, and it’s probably the most broadly known mathematical constant, but for those (hopefully few) who don’t know what it is I refer you to this button …

The ratio of the circumference of a circle to its diameter is a constant that is the same for any size circle. This constant has been known since antiquity and is represented by the Greek letter $\pi$ (pronounced pie). It is an irrational, and transcendental, number whose decimal representation is $3.141592…$.

Now, why does the definition involve the diameter of the circle as opposed to the radius? This is a question around which there is some controversy and there are people who are firm believers that $\pi$ is wrong and instead we should use the number $6.283185…$ that they represent by the Greek letter $\tau$ (pronounced tau).

Most people can quote you the first few digits of $\pi$ ($3.141592…$), and to many that “is” $\pi$, but of course we know (as we learnt in my post about floating point) that it is but one possible representation, namely the base 10 positional notation representation. We can represent $\pi$ in different bases as we can any number. Here are a few alternative representations:
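If you want to generate such representations yourself, here is a rough sketch. `to_base` is my own helper; it works from `math.pi`, which is only a double precision approximation of $\pi$, so only the first dozen or so digits it produces can be trusted.

```python
import math
from fractions import Fraction

DIGITS = "0123456789ABCDEF"


def to_base(value, base, places):
    """Write a non-negative `value` in `base`, truncated to `places` fractional digits."""
    exact = Fraction(value)  # the exact value of the float, so no extra rounding creeps in
    whole = exact.numerator // exact.denominator
    frac = exact - whole
    int_digits = ""
    while whole:
        whole, d = divmod(whole, base)
        int_digits = DIGITS[d] + int_digits
    frac_digits = ""
    for _ in range(places):
        frac *= base          # shift the next digit to the left of the point
        d = int(frac)
        frac_digits += DIGITS[d]
        frac -= d
    return (int_digits or "0") + "." + frac_digits


for base, places in ((2, 14), (8, 8), (10, 6), (16, 8)):
    print(base, to_base(math.pi, base, places))
```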

Resources About $\pi$

What Wikipedia has to say on the subject
The Joy of Pi (the website that accompanied the book)
A slice of Pi
A million digits of Pi
Vi Hart on Pi
More Vi Hart on Pi

On Floating Point

2014-09-24T00:00:00+00:00

I work for a software company that develops productivity applications for financial professionals (bankers, traders, portfolio managers, etc.). Our core activity is the distribution and presentation of numbers. Again and again over the years I have seen questions coming from clients, or from our client support staff, about the accuracy of numbers. Lay people (i.e. non software developers) understand the basic notion that computers store numbers to a limited degree of precision but their conceptual understanding stops there. If I had a dollar for every time I’ve heard someone say (or more accurately, write) that “floating point” numbers (they know the term at least) are accurate to only seven significant figures, then I would be a rich man. It’s not that straightforward. I’ve tried, and failed, on several occasions to educate people as to how floating point representation actually works. However, despite that track record, I am going to try one more time.

Base

The first thing we need to understand is the concept of base in the representation of numbers. Actually, even before that we need to recognize that “123” is just a symbolic representation of the abstract concept of the number one hundred and twenty three. And (and this is the important part) it’s only one of a whole range of possible symbolic representations. “123” is the representation of the number in base 10 positional notation. Now that representational system happens to be a very natural and convenient one that is rooted in the history of western culture. In fact, it’s how we “say” numbers in English (and many other languages): “one hundred and twenty three”. Well, roughly speaking that is, one might argue that perhaps it should be “one hundred, two tens and three”, but we’re not here to talk about the history of the evolution of the English language; we’re here to talk about floating point, and first about base.

Let’s think about what a set of digits in base 10 positional notation actually means. It’s a description of a sum of multiples of powers of ten.

\[123_{base10} = (1 \times 10^2) + (2 \times 10^1) + (3 \times 10^0)\]

More generally, for an integer N:

\[N_{base10} = \sum_{i=+\infty}^{0}{d_i10^i}\]

Where $d_i$ are the digits of N in base 10.

We can extend this beyond integers of course.

\[123.45_{base10} = (1 \times 10^2) + (2 \times 10^1) + (3 \times 10^0) + (4 \times 10^{-1}) + (5 \times 10^{-2})\]

Or more generally, for a real number R:

\[R_{base10} = \sum_{i=+\infty}^{-\infty}{d_i10^i}\]

Of course we don’t have to use base 10; any base will do. The representation of R in base B is:

\[R_{baseB} = \sum_{i=+\infty}^{-\infty}{d_iB^i}\]

For example, using base 8:

\[123.45_{base10} = (1 \times 8^2) + (7 \times 8^1) + (3 \times 8^0) + (3 \times 8^{-1}) + (4 \times 8^{-2}) + (6 \times 8^{-3}) + (3 \times 8^{-4}) + (1 \times 8^{-5}) + ...\] \[123.45_{base10} = 173.34631..._{base8}\]

Note the use of the ellipsis there to indicate that the digits continue. They continue forever actually with the four digit pattern $4631$ repeating again and again. The thing to note here is that a fraction with a relatively compact representation in one base ($.45_{base10}$) can have a much more verbose representation in another base. And the reverse can be true too.

\[0.001953125_{base10} = (1 \times 8^{-3}) = 0.001_{base8}\]

The fractional part of the base N positional notation representation for a rational number will either terminate or fall into a repeating sequence of a fixed number of digits. Whether it’s one or the other will depend on the base. An irrational number, however, will have a non-terminating, non-repeating fractional part (e.g. $\pi = 3.14159265 …$).

Consider $1 \over 10$. This is $0.1$ in base 10 but $0.000110011001100…$ in base 2, where the pattern of repeated digits (bits in this case) is $0011$. And $1 \over 3$ is $0.1$ in base 3 but is $0.3333333…$ in base 10 with $3$ repeated forever.
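The terminate-or-repeat behavior falls straight out of long division: there are only finitely many possible remainders, so either the remainder hits zero (the expansion terminates) or a remainder comes round again (and the digits repeat from there). Here is a sketch of that idea; `expand_fraction` is a name I made up, and it only handles the fractional part.

```python
def expand_fraction(numerator, denominator, base):
    """Expand numerator/denominator (taken modulo 1) in `base`.

    Returns (prefix, cycle): the digits before the repeat starts and the
    repeating block (empty if the expansion terminates).
    """
    digits = []
    seen = {}  # remainder -> index of the digit it produced
    remainder = numerator % denominator
    while remainder and remainder not in seen:
        seen[remainder] = len(digits)
        remainder *= base
        digits.append(remainder // denominator)
        remainder %= denominator
    if not remainder:
        return digits, []          # terminated
    start = seen[remainder]
    return digits[:start], digits[start:]


print(expand_fraction(45, 100, 8))   # the .45 from 123.45, in base 8
print(expand_fraction(1, 10, 2))     # one tenth in binary
print(expand_fraction(1, 3, 3))      # one third in base 3
print(expand_fraction(1, 3, 10))     # one third in base 10
```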

Scientific Notation

The second thing we need to understand is the concept of scientific notation, a standard way of writing numbers that are too big or too small to be conveniently written in decimal form. In this notation any Real number R is represented as:

\[R = m \times B^e\]

Where m is the normalized value of R (called the significand or mantissa) and e is an integer (called the exponent). m is chosen such that $B^0 \leq |m| < B^1$ and e is chosen accordingly. For example:

\[123.456_{base10} = 1.23456 \times 10^2\] \[0.00123456_{base10} = 1.23456 \times 10^{-3}\]

Significant Figures

The final thing we need to understand is that for obvious reasons we can’t store and manipulate numbers in a representation that requires an infinite number of digits. We are always going to have to limit the amount of space (computer memory or literal space on a page) required, and trade precision for storage size. So, we have the concept of the maximum number of contiguous digits (known as the number of significant figures) that can be part of a given representation.

Let’s look at a base 10 example using scientific notation and assume that we use a fixed 6 significant figures for the mantissa and 2 for the exponent.

\[123.456789012_{base10} = 1.23457 \times 10^{02}\] \[123,456,789,012_{base10} = 1.23457 \times 10^{11}\] \[0.000000000123456789012_{base10} = 1.23457 \times 10^{-10}\]

In the first case we actually store $123.457$ which is off by $0.000210988$; in the second case we actually store $123,457,000,000$ which is off by $210,988$; and in the third case we actually store $0.000000000123457$ which is off by $0.000000000000000210988$. In all cases the delta in percent terms is the same: $0.000171\%$.
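Python’s `decimal` module lets you play with exactly this kind of limited precision decimal storage. In this sketch the 6 digit context and the `store` helper are my own choices to mirror the example above (I haven’t modelled the 2 digit exponent limit).

```python
from decimal import Context, Decimal

six_figures = Context(prec=6)  # keep only 6 significant decimal digits


def store(text):
    """Return the 6-figure stored value and its relative error."""
    exact = Decimal(text)
    stored = six_figures.create_decimal(text)  # rounds to the context precision
    relative_error = abs(stored - exact) / exact
    return stored, relative_error


for text in ("123.456789012", "123456789012", "1.23456789012E-10"):
    stored, rel = store(text)
    print(stored, "{:.6f}%".format(rel * 100))
```

The absolute errors differ by many orders of magnitude but the relative error is identical in all three cases, which is the whole point of a floating (as opposed to fixed) point.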

Floating Point

And so finally we come to floating point representation as codified in the IEEE 754 standard. Commonly encountered formats are 32 bit (“single precision” or “float”) and 64 bit (“double precision” or “double”). Both use the same structure to represent a number in base 2 scientific notation but allocate a different number of bits to each part. A single precision value is structured like this:

There’s an initial sign bit, followed by 8 bits in which the exponent value (e) is stored as an unsigned integer (with a bias of 127), followed by 23 bits in which the digits of the normalized mantissa are represented with an implicit leading 1 digit (i.e. $1.b_{22}b_{21}b_{20}…b_0$). The represented value is given by:

\[value = (-1)^{s} \times (1 + m) \times 2^e\]

where $ e = e_{biased} - 127 $, $ e_{biased} = \sum_{i=23}^{30}{b_i2^{i-23}} $ and $m = \sum_{i=22}^{0}{b_i}2^{i-23}$.

In the above example:

\[sign = 0\] \[m = \sum_{i=22}^{0}{b_i2^{i-23}} = 2^{-2} = 0.25\] \[e_{biased} = \sum_{i=23}^{30}{b_i2^{i-23}} = 2^2 + 2^3 + 2^4 + 2^5 + 2^6 = 124\] \[e = -3\] \[2^e = 2^{-3}\] \[value = (-1)^0 \times (1 + 0.25) \times 2^{-3} = 0.15625\]
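You can pull a float apart like this with Python’s `struct` module. A sketch only: `decode_float32` is my own helper, and the reconstruction formula applies to ordinary (normalized) values, not to zero, denormals, infinities or NaNs.

```python
import struct


def decode_float32(value):
    """Split a number, stored as single precision, into (sign, biased exponent, m)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                # bit 31
    e_biased = (bits >> 23) & 0xFF   # bits 30..23
    m = (bits & 0x7FFFFF) / 2**23    # bits 22..0 as a fraction of 1
    return sign, e_biased, m


sign, e_biased, m = decode_float32(0.15625)
value = (-1) ** sign * (1 + m) * 2.0 ** (e_biased - 127)
print(sign, e_biased, m, value)
```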

Double precision values use 11 bits to store the exponent and 52 bits to store the mantissa.

So, what about the misconception that we mentioned at the top of this post, that a float is accurate to 7 significant figures? Well, I hope you now realize that it depends on the number and how that number can be represented as a sum of powers of 2. The number $1 \over 65536$ is $0.0000152587890625$ in base 10 but that can be represented with 100% accuracy in a float since it’s just $1 \times 2^{-16}$. However the number $1 \over 10$, $0.1$ in base 10, cannot be represented accurately in a float since it cannot be represented as a finite sum of powers of 2.

\[0.1 = 2^{-4} + 2^{-5} + 2^{-8} + 2^{-9} + 2^{-12} + 2^{-13} + 2^{-16} + 2^{-17} + ...\]

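Truncating the infinite summation at 24 significant binary digits gives $3355443 \over 33554432$, which is $0.0999999940395355$. (IEEE 754 actually rounds to the nearest representable value rather than truncating, so the float that gets stored for $0.1$ is the next one up, $13421773 \over 134217728$, or about $0.100000001490116$.)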
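To check what a float really holds, you can round-trip a value through single precision and look at it as an exact fraction. This is a sketch using `struct` and `fractions`; `as_float32` is my own helper name. One thing to be aware of is that the conversion rounds to the nearest float rather than chopping the sum off, so for $0.1$ the stored value lands one step above the truncated sum.

```python
import struct
from fractions import Fraction


def as_float32(value):
    """Round a Python float (a double) to the nearest single precision value."""
    return struct.unpack(">f", struct.pack(">f", value))[0]


stored = Fraction(as_float32(0.1))       # exact value held by the float
truncated = Fraction(3355443, 33554432)  # the 24-bit truncated sum

print(stored, float(stored))
print(truncated, float(truncated))
print(as_float32(1 / 65536) == 1 / 65536)  # a power of 2 survives exactly
```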

So, the next time someone says that a float has 7 significant figures, the correct answer is “well, that depends …”.

On Intellectual Jokes

2014-09-19T00:00:00+00:00

I saw this old joke again today …

“What’s the difference between a joke and a rhetorical question?”

… and it reminded me of the beauty of the “Intellectual Joke” - a joke that is only funny to those “in the know”, those with a base of knowledge or understanding about certain topics. I love these jokes. So I thought I’d compile a list of a few of my favorites.

Entropy isn’t what it used to be.

Q: What’s the difference between an entomologist and an etymologist?

A: An etymologist knows the difference.

Ed: Something about that joke bugs me a little.

Some scientists want to cool my body down to -273.15 degrees Celsius. My wife thinks it’ll kill me, but I think I’ll be $0$k.

Is it solipsistic in here or is it just me?

Werner Heisenberg, Kurt Gödel, and Noam Chomsky walk into a bar. Heisenberg turns to the other two and says, “Clearly this is a joke, but how can we figure out if it’s funny or not?” Gödel replies, “We can’t know that because we’re inside the joke.” Chomsky says, “Of course it’s funny. You’re just telling it wrong.”

Werner Heisenberg and Erwin Schrödinger are driving together and get pulled over for speeding. The cop asks Heisenberg “Do you know how fast you were going?” Heisenberg replies, “No, but we know exactly where we are!” The officer looks at him confused and says “you were going 108 miles per hour!” Heisenberg throws his arms up and cries, “Great! Now we’re lost!”

The officer looks over the car and asks Schrödinger if the two men have anything in the trunk. ”A cat,” Schrödinger replies. The cop opens the trunk and yells “Hey! This cat is dead.” Schrödinger angrily replies, “Well he is now.”

Q: Why do engineers confuse Halloween and Christmas?

A: Because Oct 31 = Dec 25

There are 10 kinds of people in the world, those who understand binary and those who don’t.

There are two kinds of people in the world, those who think that there are two kinds of people in the world and those who don’t.

There are two kinds of people in the world, those who can extrapolate from incomplete data and …

Three logicians walk into a bar. The bartender asks “Do all of you want a drink?” The first logician says “I don’t know.” The second also says “I don’t know.” The third says “Yes!”

A Helium atom walks into a bar and orders a beer. The bartender says, “Sorry, we don’t serve noble gases here.” He doesn’t react.

The French existentialist Jean-Paul Sartre was sitting in a cafe when a waitress approached him: “Can I get you something to drink, Monsieur Sartre?” Sartre replied, “Yes, I’d like a cup of coffee with sugar, but no cream.” Nodding agreement, the waitress walked off to fill the order and Sartre returned to working. A few minutes later, however, the waitress returned and said, “I’m sorry, Monsieur Sartre, we are all out of cream - how about with no milk?”

Q: What does the “B” stand for in Benoît B. Mandelbrot?

A: Benoît B. Mandelbrot.

I’m so meta even this acronym.

xkcd - Some of my favorites

On Writing

2014-09-15T00:00:00+00:00

Writing … a skill that is vastly underappreciated.

Good writing is rare these days, especially in the modern business arena where throw away communication - email and, increasingly, instant messaging - tends to dominate. A well constructed, and well edited, document is a pleasant breath of fresh air among the usual collection of odors that masquerade as typical business communication. So, it seems appropriate that my first real blog post should be on the subject of writing. It is after all one of the main reasons that I decided to finally start a blog, long after blogs have become old hat, to be replaced with the modern world of social media, throw away “posts” and 140 character tweets.

Now, my views are, inevitably, colored by the arena of my experience, that of software development in the service of financial services. Perhaps other fields - the law or journalism - are still bastions of well constructed rhetoric, I don’t know. However, in the world of software, good code is seen far more often than good prose.

I have always valued communication skills alongside technical skills and I’d like to think that I have a good collection of both. My career to date has certainly been defined by the blend of communication and technology, and often by the task of communicating “about” technology, whether that is as part of the sales or client support function, internal project management, or training and documentation. The moniker “Communication Skills” is ascribed to a category of competences including oral presentation, coaching, selling, consulting, writing and more. We all get opportunities to exercise our verbal skills every day, and I can wax extemporaneous with the best of them, but how often do we go out of our way to exercise our skill in writing?

I am starting this blog for various reasons: to create more of a public face for myself (if I’m honest) but also to give me a reason to write. I want to take the time to craft articles about topics that interest me and publish them for posterity. Perhaps some people will get some value out of the things I write, I sincerely hope so, but it doesn’t really matter if they do or they don’t. I’m doing this so that I can exercise some discipline in order to research and craft a collection of articles, each of which will represent a whole lot more effort than a Facebook post or a tweet. Now this is not to say that I don’t see Facebook and Twitter as valuable communication channels, I do. I use Facebook all the time (perhaps too much actually) but it’s a very different medium, all (for me at least) about staying in contact with friends (old and new, near and far). I’m a big fan of throw away posts and sharing random witticisms and aphorisms but that’s not writing. This blog will be my writing.

At least I’m going to limit myself to the “typed” word as opposed to the written word; I’m not that much of a masochist. I wrote way too many essays by hand back in school (far too many years ago) and I don’t think my hand would last for more than a page these days before it would collapse into a withered claw of cramp. No diary for me, just some online articles about software development, mathematics, guitars, culture and other miscellaneous crap.

Of course, this could all be hubris and bravado, and I might end up writing a bunch of cheesy fanboy pieces on nostalgic topics from my youth.

So you have been warned. Abandon all hope ye who continue beyond this post …

On Getting Started

2014-09-12T00:00:00+00:00

I have been meaning to create a personal website and associated blog for some time now and finally, and self-evidently, I have. Every journey has a first step and every blog has a first post, so this is it. It will be brief and will end here. Until next time.
