The Goal

The media is buzzing with discussions about ChatGPT, Stable Diffusion, and other artificial intelligences (AIs) that seemingly threaten to replace all our jobs. But do these stories present a complete and accurate picture? To better understand these emerging technologies and their implications, we need to dig deeper. In my humble opinion, the most effective way to answer these questions is to deconstruct and reconstruct some of these technologies ourselves. Thankfully, brilliant minds such as Grant Sanderson (3Blue1Brown) and Andrej Karpathy (Stanford University) have already paved the way for us. In this blog post, we will use the frameworks they have provided to rebuild these models from scratch. I highly recommend watching the videos in the upcoming section, as they form the foundation of our journey today.

This series will consist of the following parts:

  1. Create a Value object that can store floats and has the basic arithmetic operators implemented.
  2. Implement a graphing solution to visualize our arithmetic operations
  3. Modify our Value object with some bonus features (Gradient Descent, Topological Sorting, ReLU)
  4. Train and solve the MNIST handwriting database

Hopefully, by the end of this series of posts, we will have a deep understanding of how these neural networks are designed, implemented, trained, and tested.

Prerequisite Learning

The Initial Library

Let’s start by creating a library of Values that have some special properties. Those properties are:

  1. Can store (wrap) a float value
  2. Can execute simple arithmetic operators (+, -, *, /, **)
  3. Can store the operands of the resulting value

Our requirements will evolve over time as we continue on our journey.

Storing Values and Arithmetic

First, we need a Value object that can store a float. We will use it later to do some simple arithmetic.


class Value:
    pass

class Value:

    def __init__(self, value = 0) -> None:
        self.value = value

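Note: The first, empty definition of Value (the one that only contains pass) is a placeholder. Python evaluates a return annotation such as -> Value while the second class body is being executed, and at that point the name Value must already exist or a NameError is raised. The placeholder matters as soon as a method is annotated as returning a Value, as the methods added below are. Defining the class twice is the lazy solution.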
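As an aside, the placeholder class is not the only way around this. A minimal sketch of one alternative, assuming we are free to change the annotations, is to quote the class name so that it is stored as a string and never looked up while the class body runs (placing from __future__ import annotations at the top of the module has a similar effect). The add method here is purely illustrative and is not part of the class built in this post.

```python
class Value:

    def __init__(self, value: float = 0) -> None:
        self.value = value

    # Illustrative method only: the quoted "Value" is a forward reference,
    # so no placeholder class is needed for the annotation to be accepted.
    def add(self, other: "Value") -> "Value":
        return Value(self.value + other.value)


a = Value(2.0)
total = a.add(Value(1.0))

print(total.value)                          # 3.0
print(Value.add.__annotations__["return"])  # Value (stored as a plain string)
```

The rest of this post keeps the two-definition approach so the code matches the original listings.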

Now let’s see what our Value object looks like:


from value import Value

a = Value(2.0)

print(a)

When we run this we get:

<value.Value object at 0x7f516c103550>

This is the default representation, which shows only the class name and the object’s memory address. It is not a useful representation of our object.

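This is where Python’s magic methods come into play. Magic methods (also called dunder methods) let a class define how its objects behave with built-in syntax and functions. Simply put, when we type:

print(a) or b = a + 3.0

the Python interpreter effectively runs:

print(a.__repr__()) and b = a.__add__(3.0)

(Strictly speaking, print() looks for __str__() first and falls back to __repr__() when the class does not define it.)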
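The same dispatch can be observed on built-in types before we write any methods of our own. The short sketch below uses only a plain float; the variable name x is arbitrary.

```python
x = 2.0

# An operator and its magic method give the same result.
print(x + 3.0)         # 5.0
print(x.__add__(3.0))  # 5.0

# repr() delegates to __repr__().
print(repr(x) == x.__repr__())  # True

# A float does not know how to add itself to a string, so its __add__
# returns the special value NotImplemented instead of raising an error.
print(x.__add__("three") is NotImplemented)  # True
```

The operator syntax only raises a TypeError after every candidate method has returned NotImplemented.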

Knowing this, we can now define the methods for addition, multiplication, and representation.


class Value:
    pass

class Value:

    def __init__(self, value = 0) -> None:
        self.value = value

    def __add__(self, other) -> Value:
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.value + other.value)

    def __mul__(self, other) -> Value:
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.value * other.value)

    def __repr__(self) -> str:
        return "Value({})".format(self.value)

This makes a test like the following work:


from value import Value

a = Value(2.0)
b = 3.0

c = a + b
d = a * b

print(c)
print(d)

Resulting in:

Value(5.0)
Value(6.0)

But what if we did the operations the other way?


a = Value(2.0)
b = 3.0

c = b + a
d = b * a

print(c)
print(d)

Results in:

Traceback (most recent call last):
  File "/home/otis/github/llms/tests.py", line 6, in <module>
    c = b + a
TypeError: unsupported operand type(s) for +: 'float' and 'Value'

Luckily, Python has another set of magic methods, the reflected operators, whose names are prefixed with the character r.

    def __radd__(self, other):
        return self + other

    def __rmul__(self, other):
        return self * other

When __add__() is called on the float 3.0, it does not know how to handle an operand of type Value, so it returns NotImplemented. Python then calls the __radd__() method on the Value operand and passes in the float as the parameter. Conceptually, that looks like this:


a = Value(1.0)
b = 2.0

c = b + a

# Python first tries the float's own method, which returns NotImplemented
c = b.__add__(a)

# so it falls back to the reflected method on the Value operand
c = a.__radd__(b)
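The fallback described above can be sketched in plain Python. The helper emulate_add below is hypothetical and deliberately simplified: the real interpreter has extra rules (for example, it gives priority to the right operand when its type is a subclass of the left operand's type), which are ignored here. The Value class is a trimmed stand-in for the one in this post.

```python
class Value:
    """Trimmed stand-in for the Value class in this post."""

    def __init__(self, value=0):
        self.value = value

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.value + other.value)

    def __radd__(self, other):
        return self + other


def emulate_add(lhs, rhs):
    """Hypothetical helper: a simplified model of how `lhs + rhs` is resolved."""
    result = NotImplemented

    # Step 1: try the left operand's __add__.
    forward = getattr(type(lhs), "__add__", None)
    if forward is not None:
        result = forward(lhs, rhs)

    # Step 2: if that gave up, try the right operand's reflected __radd__.
    if result is NotImplemented:
        reflected = getattr(type(rhs), "__radd__", None)
        if reflected is not None:
            result = reflected(rhs, lhs)

    # Step 3: nobody could handle it, so raise like the real operator does.
    if result is NotImplemented:
        raise TypeError("unsupported operand type(s) for +")
    return result


c = emulate_add(2.0, Value(1.0))
print(c.value)  # 3.0
```

Calling emulate_add(2.0, "x") raises a TypeError, because float.__add__ returns NotImplemented and str defines no __radd__.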

Using this knowledge we can now define some more simple arithmetic for our Value class including division, powers (floats and ints only), and subtraction.


class Value:
    pass

class Value:

    def __init__(self, value = 0) -> None:
        self.value = value

    def __add__(self, other) -> Value:
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.value + other.value)

    def __mul__(self, other) -> Value:
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.value * other.value)

    def __pow__(self, other) -> Value:
        assert isinstance(other, (float, int))
        return Value(self.value ** other)

    def __truediv__(self, other) -> Value:
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.value / other.value)

    def __sub__(self, other) -> Value:
        return self + (-other)

    def __neg__(self) -> Value:
        return self * -1

    def __repr__(self) -> str:
        return "Value({})".format(self.value)    

    def __radd__(self, other) -> Value:
        return self + other

    def __rmul__(self, other) -> Value:
        return self * other

    def __rsub__(self, other) -> Value:
        return other + (-self)

    def __rtruediv__(self, other) -> Value:
        return Value(other) / self
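Note that subtraction and negation are not implemented from scratch: they are expressed through addition and multiplication. The sketch below repeats a trimmed copy of the class (type hints and a few methods omitted for brevity) so that it runs on its own, and checks a few cases where the plain number is on the left-hand side.

```python
class Value:
    """Trimmed copy of the class above: only what the checks below need."""

    def __init__(self, value=0):
        self.value = value

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.value + other.value)

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.value * other.value)

    def __truediv__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.value / other.value)

    def __neg__(self):
        return self * -1          # negation is multiplication by -1

    def __sub__(self, other):
        return self + (-other)    # a - b is a + (-b)

    def __radd__(self, other):
        return self + other

    def __rsub__(self, other):
        return other + (-self)    # b - a is b + (-a), resolved through __radd__

    def __rtruediv__(self, other):
        return Value(other) / self


a = Value(2.0)

print((-a).value)                # -2.0
print((a - 5.0).value)           # -3.0
print((5.0 - a).value)           # 3.0
print((1.0 / Value(4.0)).value)  # 0.25
```

Building the derived operators this way keeps the arithmetic itself confined to addition, multiplication, division, and powers.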

Storing the Children

Now that we have a Value object that supports common operators like addition, subtraction, multiplication, and division, we want to track the children of these operations. This will come in handy when we start graphing and want to see the genealogy of an operation.

Let’s start by creating a place to store our operands. Each operation has at most two operands, so we pass the children in as a tuple. Internally we convert the tuple to a set as a speed optimization (set membership checks are fast).


class Value:

    def __init__(self, value = 0, op = "", children = ()) -> None:
        self.value = value
        self.op = op
        self.children = set(children)
    
    ...

Let’s see how to implement this with the addition operation. Modify the __add__() method as follows:


    def __add__(self, other) -> Value:
        other = other if isinstance(other, Value) else Value(other)

        out = Value(self.value + other.value, "+", (self, other))

        return out
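To see what the stored children look like in practice, here is a small self-contained sketch using only the constructor and __add__() from above (type hints omitted). It also shows two consequences of this design that are easy to miss: a plain float operand is wrapped in a new Value and becomes a child, and because the children are kept in a set, adding a Value to itself records a single child.

```python
class Value:
    """Trimmed copy of the class above with child tracking."""

    def __init__(self, value=0, op="", children=()):
        self.value = value
        self.op = op
        self.children = set(children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.value + other.value, "+", (self, other))


a = Value(2.0)
b = Value(3.0)

c = a + b
print(c.op)                  # +
print(c.children == {a, b})  # True
print(a.children)            # set()  (a was not produced by an operation)

# A float operand is wrapped in a fresh Value, which is stored as a child.
e = a + 1.0
print(len(e.children))       # 2

# The set collapses duplicates: a + a records a only once.
d = a + a
print(len(d.children))       # 1
```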

We can now modify the rest of the methods in a similar way, resulting in this semi-final class:


class Value:
    pass

class Value:

    def __init__(self, value = 0, op = "", children = ()) -> None:
        self.value = value
        self.op = op
        self.children = set(children)

    def __add__(self, other) -> Value:
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.value + other.value, "+", (self,other))
        return out

    def __mul__(self, other) -> Value:
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.value * other.value, "*", (self,other))
        return out

    def __pow__(self, other) -> Value:
        assert isinstance(other, (float, int))
        out = Value(self.value ** other, "exp {}".format(other), (self, ))
        return out

    def __truediv__(self, other) -> Value:
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.value / other.value, "/", (self, other))
        return out

    def __sub__(self, other) -> Value:
        return self + (-other)

    def __neg__(self) -> Value:
        return self * -1

    def __repr__(self) -> str:
        return "Value({})".format(self.value)    

    def __radd__(self, other) -> Value:
        return self + other

    def __rmul__(self, other) -> Value:
        return self * other

    def __rsub__(self, other) -> Value:
        return other + (-self)

    def __rtruediv__(self, other) -> Value:
        return Value(other) / self
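With children recorded on every result, the full genealogy of an expression can be recovered by following the children links. The sketch below is one possible way to do that, not the graphing solution of the next post; collect_nodes is a hypothetical helper name, and the class is trimmed to addition and multiplication so the example runs on its own.

```python
class Value:
    """Trimmed copy of the class above: addition and multiplication only."""

    def __init__(self, value=0, op="", children=()):
        self.value = value
        self.op = op
        self.children = set(children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.value + other.value, "+", (self, other))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.value * other.value, "*", (self, other))


def collect_nodes(root):
    """Hypothetical helper: return every Value reachable from root."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(node.children)  # visit the operands next
    return seen


a, b, c = Value(2.0), Value(3.0), Value(4.0)
e = a * b + c

nodes = collect_nodes(e)
print(e.value)                      # 10.0
print(len(nodes))                   # 5  (a, b, c, a * b, and e itself)
print(sorted(n.op for n in nodes))  # ['', '', '', '*', '+']
```

Leaf values have an empty op, so the three inputs can be told apart from the two intermediate results.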


Conclusion

So far we have created a Value object that can store numbers and conduct some arithmetic operations. It can also store its children, which are the operands of each operation. In the next post we will be implementing a graphing solution so we can better visualize these children-parent relationships.

We will also be modifying our Value object with gradient descent and ReLU in the next section, transforming it from a simple numbers library into something a neural network might be able to use.