I love the idea behind declarative programming languages: you start by specifying the result you want to achieve rather than how you want to achieve it. Some, such as SQL hide away the complexity by providing a fixed set of highly optimized and yet expressive high-level operations such as SELECT, JOIN etc. Others, such as my personal favorite, Prolog, allow unlimited flexibility as long as you're willing to implement your own operators.
The design of Prolog is simple and beautiful. It basically does just two things, and it does them really well: allow you to specify a search tree for your problem and then traverse that tree.
The devil is in the details though: Prolog's logic based rules are merciless. One overlooked condition, and the program would be stuck forever, generating and mindlessly following an infinite number of branches in the tree, while the solution might lie just in the sibling of one of the looping branches.
In my opinion the language of the future does not traverse the tree specified by the programmer, whether it's an Abstract Syntax Tree that is translated literally to a sequence of operations, or a tree of theorems and conclusions traversed by Prolog. Instead it explores it, not afraid to skip some operational branches if a solution can be found faster.
Currently it is not yet feasible to have an entire program generated or executed by some intelligent system based on a loose set of requirements - I believe that, dependent on how loose and high level the requirements are, achieving this goal would be very close to building a general artificial intelligence. We can, however, relax our imperative set of instructions a little bit, allowing our program to be clever when the operational details are vague. This saves us some time by decreasing the number of operations we have to explicitly code in a program.
Consider the following program:
val k = choose between 1 and 5
val answer = (k == 4) ? "Happy" : "Sad"
select answer, k where answer == "Happy"
Imagine that the constant 4
is replaced by some hidden variable that our program is trying to find, in order to know when the answer would say "Happy"
. The only prior knowledge we have is that it's value lies somewhere between 1 and 5. Looks simple enough, right? We could write a simple for loop to find the desired value of k
. With a slight syntax alterations Prolog could do it very easily by testing each value of k
as a separate hypothesis. SQL could do it, if choose
was a function that generates a table of numbers. Even in Python, a program where choose
is replaced by range
would look similar enough.
How about
val k = choose between 1.0 and 5.0
val answer = (k ~= 4.0) ? "Happy" : "Sad"
select answer, k where answer == "Happy"
where ~=
is float comparison. It's easy to see how a standard left-to-right search would waste a lot of cycles, since there's a lot of values between 1.0
and 4.0
. The same happens if we add more parameters to the function - the search space grows exponentially. If the problem was instead stated as looking for a solution to a function k - 4.0 ~= 0
, it could be solved much faster using e.g. Newton's method. The takeaway from this is: if we have a useful model of our parameter space (in this example a differentiable function of one variable), we are able to search for a solution efficiently.
If the above examples don't look very useful consider instead
val k = choose between 2 and 50
val experiment = trainKnn(k, trainingData, trainingLabels)
optimize experiment by experiment.errorRate limit 10
What I would like this expression to do is to run a K Nearest Neighbors while trying to minimize errorRate
by tweaking the parameter k
. Furthermore, I would like the optimization to finish after no more than 10 iterations. Such optimization procedure is often implemented as grid search in popular machine learning packages such as sklearn. However, it turns out that KNN and other machine learning algorithms' objective function can be successfully modeled as a Gaussian Processes and iteratively optimized wrt. their hyperparameters.
Relaks
Above snippets are valid programs in Relaks - a Scala DSL. It treats unknown parameters as first-class citizens - to a programmer there's no distinction between a parameter and operationally computed variable while writing the program. This allows stating a parameter search problem in a declarative way, abstracting away the "how" and focusing on the "what".
The similarities between Relaks and SQL are intentional - it uses a powerful experiment database metaphor to define the optimization problem. It exposes a LINQ-like API to the programmer. The columns are inputs and outputs of the algorithms under optimization - hyperparameters and evaluated fitness, but also any output an algorithm may produce, for bookkeeping. Once defined, an experiment table can be queried using familiar SQL-like syntax - LIMIT, WHERE etc via the LINQ API. Using the API the programmer can manipulate the search space and optimization process using almost natural language. For example a WHERE clause on a hyperparameter would narrow down the search space and LIMIT would decrease the number of performed search iterations. Under the hood the experiment table doesn't actually exist - it is generated row by row, using hints from the query to limit possibly very large search space.
Let's take a look at a comprehensive example:
object Program extends
DSLOptimizerInterpreter(SpearmintOptimizer) {
val x = choose between 0.0 and 15.0
val y = choose between -5.0 and 10.0
val result = optimize (x, y) map { case Tup(x, y) =>
val res = to (branin _) apply (x, y)
res as 'answer
} by 'answer
limit 100
// This could alternatively be written as
// val answer = to (branin _) apply (x, y)
// val result = optimize (x, y) by answer limit 100
store(result)
}
Program.dump()
This program will try to find a minimum of the Branin-Hoo function within the specified search space, using up to 100 iterations of an optimization algorithm. store
and dump
are hints to the interpreter to materialize and print out the experiment table at the end of the program.
Work in progress...
An experimental implementation of a language of the future with a Gaussian Process optimizer as a DSL for Scala is available on Github. It consists of LINQ-like extensions for Scala with a query optimiser that can reorder SQL operators and fuse them with hyperparameter definitions. I have also implemented an interpreter for the language and a caching system that can factor out common operations between subsequent optimizer runs and cache the results in memory. This was part of my Master's Thesis (abstract).