Automatic Differentiation
Automatic differentiation is a simple yet powerful tool which allows you to calculate (evaluate) the partial derivatives of any function at any point where it is defined ! In this article we are going to code a simple automatic differentiation tool in C++.
Dual Numbers
Several techniques exist, but the simplest, especially in C++ (you will see why), uses dual numbers. Dual numbers, just like complex numbers, are an extension of the real numbers. They can be expressed as the following :
Here (a,b) are real numbers, and epsilon is an abstract number, just like i²=-1 for the complex. It gets really interesting when you use dual numbers this way :
Where x’ is the derivative of x, with respect to itself in which case it will be 1 (we call it the seed value), or with respect to another variable in which case it will be 0. It is very interesting because of the arithmetic that we can build upon this. Lets make some examples. What happens if you add two dual numbers ?
We get a new dual number with the real part being the sum of the real parts of each individual dual numbers, and the derivative part is indeed the derivative of the real part (derivative of a sum is the sum of the derivatives). Lets try multiplication.
As you can see, the very simple rule which states that epsilon²=0 makes the whole thing work beautifully. You can expand these calculus to all operations, always associating an expression with its derivative :
Operators Overloading
C++ allows us to overload operators. It means that we can redefine the behavior of the elementary operations (+, -, *, /) on classes we’ve created. That makes the whole thing super easy to implement !
Therefore, the idea is to create a Dual class which acts as a number, but we will overload its operators to compute derivatives when calculations are made. We will first write our header file which contains all the function’s prototypes.
Now we need to write the actual body of these functions in a source file. First the constructors, getters and setters.
Then the arithmetic and output operators.
The last function simply says that whenever we decide to print a dual number, we print the value part of it. Finally, we implement some commonly used math functions.
That’s it ! Now that we have overloaded the arithmetic operators, whenever you are going to write something like a+b
where a and b are instances of the Dual class, the derivative of our expression is going to be calculated at the same time as the expression itself.
Lets say we want to differentiate f = y*x^2
with respect to any of the two variables. We know that df/dx = 2xy
and df/dy = x^2
. Thus, if we evaluate the derivative of f
with respect to x
at the point (x,y)=(5,6)
we should get as a result 2*5*6 = 60
.
This is exactly what you get if you run this code ! Isn’t it amazingly simple ? You can now differentiate any math formula !
You can find the full code used in this article, plus some extra gradient descent implementation, on GitHub.
Conclusion
Technique presented here is called forward mode automatic differentiation and you may have noticed that it has a major downside : if you want to get the full gradient of a function, you have to compute the actual function as many times as it has input variables (changing the seed value every time). This will become really inefficient for a function that has many inputs.
Another technique called reverse mode automatic differentiation allows you to compute the gradient running the computation as many times as you have output variables. That means you can calculate the gradient of a many-to-1 function in only one go ! This technique is a little harder to implement but it is way more efficient.
I hope you learned something ! If you liked this article please give it a clap (or two), it would help me very much !