An Architecture for Computer Go
2.0 Knowledge Representation
The best Go programs today play only at the advanced beginner level. It is our belief that with better tools for representing and maintaining Go knowledge, it will be possible to develop stronger Go programs. In this paper we describe a rule language for representing Go knowledge declaratively and a software architecture that supports incremental state update and knowledge maintenance. This architecture includes a system for backtracking state changes, a truth maintenance system for propagating the effects of state changes, and a knowledgebase for storing rule-derived information about the state.
A brute-force approach to playing the game of Go is infeasible. The most successful programs try to model human Go play, but play only at the advanced beginner level. Among their weaknesses are inadequate analysis of life and death, lack of consistency of play, and poor static evaluation functions [Müller 95].
Based on our own experience, the computer Go literature [KCN 90, CGML 94-96], and communications with other programmers, it seems that these inadequacies are due in part to difficulties in software engineering and knowledge representation. There is a great deal of Go knowledge available to Go programmers in the form of human expertise and the traditional Go literature, but much of it is difficult to represent, and as a result has been ignored or replaced with overly simple heuristics.
In order to build a program that makes use of more of the available expertise, we have developed a software architecture that facilitates the representation and maintenance of complex Go knowledge.
Two basic observations about competent human Go play motivate our design:
There are certain configurations of stones (usually called patterns in computer Go discourse) that frequently recur in games. The presence of a given configuration, or pattern, often conveys useful positional information that would otherwise have to be derived through difficult, time-consuming search.
For example, a strong human player will instantly recognize the configuration shown in Figure 1, and know that playing at X is probably a very good move for black to connect her stones and put pressure on white's stone.
FIGURE 1. Black to play
Meanwhile, only 4 of the other 31 empty points in the configuration look even plausible for black, and most of the others look downright hopeless.
In addition to knowing that X is probably a good move, knowledge of the follow-up sequences can aid a player in analyzing exactly how good a move it is in a particular context (the surroundings outside the area shown in the diagram). The position shown in Figure 1, for example, is so common that all of the reasonable variations are known to any strong player. Each of these variations is visualized to the end, and only at the leaves, where interactions with stones outside the area pictured begin to occur, is real analysis necessary. Using search to discover these variations during a game would be terribly time-consuming, if not a practical impossibility.
Players apply their pattern knowledge to recognize other kinds of positional information as well, including the existence of positional features such as relationships, territory, walls, and eyes. For example, a Go player who sees the pattern below will recognize that the two stones are related by a keima relationship, and can use that knowledge in positional evaluation.
FIGURE 2. Keima relationship
In order to capture this kind of pattern knowledge declaratively, we use a first-order language to state rules of the form: "if preconditions then assertions." In any state where the preconditions hold, the assertions are known to hold as well.
The preconditions of a rule consist of a set of constraints on occupancy, liberty count, and territory ownership of points, among others. Sets of occupancy constraints that are spatially related are represented in a single pattern map. Pattern maps may be rectangles of any size ranging from 1 × 1 to 19 × 19. If there is more than one pattern map, each map must be related by an inter-map relation to another pattern map in the rule.
Figure 3 shows a rule with two pattern maps. The points marked with diamonds allow black, white, or empty. The inter-map relation requires that points P and Q be in the same white block, but doesn't specify what that block "looks like." Additional constraints require the block labeled Q to have three liberties and each of the blocks labeled R to have more than two. The liberty constraints on Q and R guarantee that Q pushing out at S will not threaten the capture of an R block.
FIGURE 3. Two map "net" rule
X <> S AND SameBlock(P, Q)
libs(Q) = 3 AND libs(R) > 2
The rule whose preconditions are shown in Figure 3 has an assertion of the form:
move(good, black, X, reduceLibs, Q, specificity 30, difficulty 70)
This says that playing at the point marked X is a good move for black to try to reduce the liberties of the white block marked Q.
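To make the form of such an assertion concrete, here is a minimal C++ sketch of the tuple above. The field names and types are our own illustrative assumptions, not the authors' actual data structures:

```cpp
#include <string>

// Hypothetical representation of a rule's move assertion, mirroring the
// tuple (quality, color, point, purpose, target, specificity, difficulty).
struct MoveAssertion {
    std::string quality;    // e.g. "good"
    std::string color;      // e.g. "black"
    std::string point;      // label of the suggested point, e.g. "X"
    std::string purpose;    // e.g. "reduceLibs"
    std::string target;     // block the purpose applies to, e.g. "Q"
    int specificity;        // 1..100; higher means a more specific pattern
    int difficulty;         // 1..100; higher means a harder continuation
};

// The assertion of the Figure 3 "net" rule, transcribed into this form.
inline MoveAssertion netRuleAssertion() {
    return {"good", "black", "X", "reduceLibs", "Q", 30, 70};
}
```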
The specificity is a value between 1 and 100, with low values indicating very general rules and high values very specific ones. All else being equal, we examine a move suggested by a more specific pattern before one suggested by a less specific pattern.
The difficulty is a value between 1 and 100, with low values indicating an easy, stylized continuation and high values a more complicated one. The difficulty is our estimation of the complexity of reading out the situation that results from playing the suggested move. In the presence of time constraints, we prefer to examine simple variations before more complex ones.
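The ordering these two values imply might be sketched as follows (this is our reading of the preference, with illustrative names): sort candidate moves by descending specificity, breaking ties by ascending difficulty.

```cpp
#include <algorithm>
#include <vector>

// Illustrative ordering of move suggestions: more specific patterns first,
// and among equally specific ones, easier continuations first.
struct Suggestion {
    int specificity;   // 1..100, higher = more specific rule
    int difficulty;    // 1..100, higher = more complex continuation
};

inline void orderSuggestions(std::vector<Suggestion>& s) {
    std::stable_sort(s.begin(), s.end(),
        [](const Suggestion& a, const Suggestion& b) {
            if (a.specificity != b.specificity)
                return a.specificity > b.specificity;  // prefer specific
            return a.difficulty < b.difficulty;        // prefer simple
        });
}
```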
If there are obvious counter-moves to the move suggested by a rule then we include pointers to the rules suggesting these counter-moves. In this way, we represent the reasonable variations stemming from the suggested move. In addition, if there are no counter-moves, we allow specification of directions, or hints to the reading routine for how best to evaluate or continue the analysis of the position.
Figure 4 shows a position where the rule of Figure 3 would match.
The first pattern map matches the diagram under a 90° counter-clockwise rotation with Q mapped to point a. The second map matches under a 180° rotation with P mapped to point c. The map relations are satisfied since d is not the same point as either of the points marked e, and a and c are both in the same block. The additional constraints are satisfied since the block containing a and c has three liberties and each of the blocks labeled b has more than two. In this situation, playing at point d is recommended for black as a means to capture the white block.
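Matching under rotations like these requires the matcher to try each pattern map under the board's symmetries. A sketch of two of the generating coordinate transforms, under our own conventions for a map of width w (the composition of these two generates all eight symmetries of a square map; the paper does not give its actual conventions):

```cpp
// Illustrative symmetry transforms on pattern-map coordinates, where x is
// the column (0..w-1) and y is the row. Conventions are assumptions.
struct Pt { int x, y; };

// 90-degree counter-clockwise rotation of a point in a map of width w.
// The result lies in the rotated map, whose width is the original height.
inline Pt rot90ccw(Pt p, int w) { return { p.y, w - 1 - p.x }; }

// Horizontal reflection within a map of width w.
inline Pt reflectX(Pt p, int w) { return { w - 1 - p.x, p.y }; }
```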
Developing a good rulebase is an important part of developing a strong program. Errors in the computer's play are often due to a promiscuous rule (one that makes inappropriate assertions), or a missing rule (when the rulebase fails to generate an important valid assertion).
Rulebase refinement is an ongoing process. When detected, a promiscuous rule is replaced with a set of more specific rules that cover all the contexts in which the old rule was useful, but don't generate the inappropriate assertions. When a missing rule is detected, a new rule is created to fill the gap.
To make the creation and editing of rules fast and convenient, we have developed a graphical rule editor.
Human Go players accumulate knowledge about a game as it progresses. Specifically, they recognize and keep track of high-level entities such as blocks of connected stones, groups of "related" blocks (see Figure 2), and territories, as well as attributes of these entities. In the course of evaluating candidate moves, they use goal-directed reading to develop adversarial plans, or trees of move and counter-move in pursuit of some small set of goals in an area. Goal-directed reading is necessary to answer narrow tactical questions, like whether a particular group can be captured.
By applying pattern knowledge and accumulated positional knowledge, a player is able to exclude bad moves from consideration and quickly recognize and stop exploration in hopeless positions during planning.
Human players also try to determine a context for their plans, or a set of conditions under which their analysis will remain valid. Often, these conditions are of the form: "I don't have to worry about it unless my opponent plays in this area." If one of these conditions is violated in subsequent play, a player knows that the plan may no longer be valid. At that point he may choose to reevaluate the plan to see if it still works, or may decide that it is no longer important and discard it.
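A validity context of this kind might be sketched as follows. All names and types here are our assumptions: a plan records the board regions on which its analysis depends, and an opponent move inside any of them flags the plan for reevaluation.

```cpp
#include <vector>

// Illustrative plan-context check (names and types are assumptions).
struct Point  { int x, y; };
struct Region { int x0, y0, x1, y1; };   // inclusive rectangle

struct Plan {
    std::vector<Region> context;   // "worry about it if opponent plays here"
    bool needsReevaluation = false;
};

inline bool inRegion(const Region& r, Point p) {
    return p.x >= r.x0 && p.x <= r.x1 && p.y >= r.y0 && p.y <= r.y1;
}

// Called for each opponent move; marks the plan stale if the move lands
// inside any region of its validity context.
inline void noteOpponentMove(Plan& plan, Point move) {
    for (const Region& r : plan.context)
        if (inRegion(r, move)) { plan.needsReevaluation = true; return; }
}
```

A stale plan can then be reevaluated or discarded, as the text describes.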
By maintaining plans from move to move, players gain a number of advantages: analysis need not be redone while a plan's context remains intact, and attention can be focused on the areas where conditions have changed.
We have developed a software architecture that supports this kind of incremental knowledge maintenance. The main components of this architecture are a reversion system (for backtracking), a truth maintenance system (for propagating the effects of change), and a knowledgebase (for storing assertions about the current position).
Any Go program that performs search must implement some kind of backtracking scheme. There are several obvious approaches. In what follows, we refer collectively to all the data structures representing information accumulated about a game up to a particular point in time as "the state."
One non-incremental approach to backtracking is to make a complete copy of each state while searching, and then backtrack by reverting to the previous copy. Since we expect the changes to the state caused by a single move to be only a small fraction of the whole state, this method is bound to be inefficient.
Another non-incremental alternative is to save nothing, and to regenerate the entire state anew after every move and backtrack. This is easy to implement, but becomes increasingly inefficient as more and more knowledge is represented in the state.
One incremental approach is to write two versions of every routine, one for the forward direction calculating the effects of a move on some part of the state, and one for the backward direction reversing those effects. This method is difficult, tedious, and error-prone.
We have implemented an incremental scheme that supports the efficient backtracking of changes to data structures while placing minimal burden on the programmer. Since our program is written in C++, all structures of interest are objects. Objects whose state must be preserved for backtracking changes are called revertable objects.
Before a revertable object's state is changed, it records an image of its current state, and pushes it onto a private history stack. It then alerts the revertable object manager, which stores a reference to the object.
When a move is backtracked, the revertable object manager tells each revertable object that was changed during that move's state update to revert to its previous state. Objects created and destroyed during the update are destroyed and reactivated, respectively.
For example, if a value is to be appended to a revertable linked list, a new node n with this value is constructed, and the last node of the list, m, saves its state before pointing to n. After the backtrack, n is destroyed, and m's old state (pointing to nil) is restored.
The system is unobtrusive to the programmer. It requires only that each new revertable class be derived from the base class Revertable, that two macros be instantiated, and that a "save" function be called in every member function that changes the object's state. The system is efficient because objects are copied only when they are changed, and only the first time they are changed after a new move. Any state update routine changing a revertable object through its interface need not worry about how these changes will be backtracked later.
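A minimal sketch of this scheme, as we read it, is shown below. Revertable and the "revertable object manager" are names from the text; every member interface here is an assumption, and for brevity the sketch saves an image on every change rather than only the first change after a new move (restoring the images in reverse still yields the start-of-move state).

```cpp
#include <stack>
#include <vector>

class Revertable;

// Tracks which revertable objects changed during each move's state update.
class RevertManager {
public:
    void beginMove() { frames_.push({}); }
    void noteChanged(Revertable* obj) { frames_.top().push_back(obj); }
    void backtrack();                       // defined after Revertable
    bool inMove() const { return !frames_.empty(); }
private:
    std::stack<std::vector<Revertable*>> frames_;
};

class Revertable {
public:
    explicit Revertable(RevertManager& mgr) : mgr_(mgr) {}
    virtual ~Revertable() = default;
    virtual void saveImage() = 0;     // push current state on history stack
    virtual void restoreImage() = 0;  // pop and restore previous state
protected:
    // The "save" call required in every mutating member function.
    void save() {
        if (mgr_.inMove()) { saveImage(); mgr_.noteChanged(this); }
    }
private:
    RevertManager& mgr_;
};

inline void RevertManager::backtrack() {
    // Restore in reverse order of change, back to the start-of-move state.
    auto& frame = frames_.top();
    for (auto it = frame.rbegin(); it != frame.rend(); ++it)
        (*it)->restoreImage();
    frames_.pop();
}

// The simplest concrete example: a revertable integer cell.
class RevertableInt : public Revertable {
public:
    RevertableInt(RevertManager& m, int v) : Revertable(m), value_(v) {}
    void set(int v) { save(); value_ = v; }
    int get() const { return value_; }
    void saveImage() override { history_.push(value_); }
    void restoreImage() override { value_ = history_.top(); history_.pop(); }
private:
    int value_;
    std::stack<int> history_;
};
```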
Using this system, we have developed a revertable library of common container classes including linked lists, stacks, queues, binary search trees, hash tables, dictionaries, sets, etc. This library benefits us as Go programmers in the same way that standard class libraries benefit other programmers, with the added benefit that we are free to implement algorithms that make arbitrarily complex changes to the state without having to worry about how the changes will be reverted.
In our program's state, there is a natural hierarchy of Go knowledge. At the bottom are the primitives of the game: the occupancy of the points of the board. From occupancy we construct higher level objects like blocks. Using rules like the keima relationship in Figure 2, we construct objects representing positional features. From blocks and relationships, we construct group objects, and so on. The existence of each object is contingent on the existence of its lower-level constituents.
Knowledge in Go is by nature time-varying: empty points are constantly being occupied, and, when there is a capture, occupied points are emptied. An incremental state update procedure must propagate the effects of playing a move up through the knowledge hierarchy. A change in a lower level object may require the modification or destruction of a higher level object. Because this dependency of one object on another is pervasive, we have built explicit support for it into our program.
We allow one object to register its dependency on another. When the depended-upon object changes, or is about to be destroyed, it sends an alert message to its wards. If a ward has different kinds of dependencies, it registers itself and requests a different alert message from each.
This system provides the programmer with two ways of handling alert messages: they can be processed immediately as they are received (asynchronous update), or processing can be postponed until an update signal is received (synchronous update). Synchronous update is efficient when any action taken before all alerts have been received will be wasted.
This system is regular and convenient to use. Each class that is depended upon by another is derived from the base class Dependency, and each class that depends on another is derived from the base class Ward. Each ward class defines a function to handle each different alert message that it will receive. Dependency classes inherit functions to register wards and issue alerts.
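The mechanism might look like the following sketch. Dependency and Ward are the base-class names given in the text; the member functions, the alert representation, and the concrete Block and Group classes are our assumptions, with Group illustrating synchronous update (alerts queued until update() is called):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A ward receives alert messages from the objects it depends on.
class Ward {
public:
    virtual ~Ward() = default;
    virtual void onAlert(const std::string& kind) = 0;
};

// A dependency keeps a list of its wards and alerts them when it changes.
class Dependency {
public:
    void registerWard(Ward* w) { wards_.push_back(w); }
protected:
    void alertWards(const std::string& kind) {
        for (Ward* w : wards_) w->onAlert(kind);
    }
private:
    std::vector<Ward*> wards_;
};

// Synchronous update: alerts are queued and processed only when update()
// is called, so no work is wasted before all alerts have arrived.
class Group : public Ward {
public:
    void onAlert(const std::string& kind) override { pending_.push_back(kind); }
    void update() { processed_ += pending_.size(); pending_.clear(); }
    std::size_t pendingCount() const { return pending_.size(); }
    std::size_t processedCount() const { return processed_; }
private:
    std::vector<std::string> pending_;
    std::size_t processed_ = 0;
};

class Block : public Dependency {
public:
    void merge() { alertWards("blockChanged"); }  // e.g. two blocks join
};
```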
Explicitly representing dependencies between objects using this mechanism has helped us think clearly about the structure of our state, and has resulted in simpler, more uniform code.
As described in Section 2.0, every rule in the rulebase is of the form: if preconditions then assertions. In each state, if the preconditions hold, the assertions are added to a knowledgebase.
The assertions can be accessed by making queries against the knowledgebase. The system currently supports only conjunctive queries in which each assertion field is fully specified or left wild. For example, to query the database for all good moves for black to trap the block on point (3, 3), we would construct the following query: (good, black, *, trap, (3, 3)). The result would be a set of assertions each suggesting a move to accomplish the goal. So far this restricted query language has proved sufficient for our purposes, but could be enhanced should the need arise.
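Such conjunctive wildcard matching is easy to sketch. The representation below (assertions as tuples of strings, "*" as the wild field) is an illustrative assumption, not the program's actual storage format:

```cpp
#include <string>
#include <vector>

// An assertion is a tuple of fields; a query is an assertion in which
// any field may be the wildcard "*".
using Assertion = std::vector<std::string>;

inline bool matches(const Assertion& a, const Assertion& query) {
    if (a.size() != query.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (query[i] != "*" && query[i] != a[i]) return false;
    return true;
}

// Return every assertion in the knowledgebase matching the query.
inline std::vector<Assertion> queryKB(const std::vector<Assertion>& kb,
                                      const Assertion& query) {
    std::vector<Assertion> result;
    for (const auto& a : kb)
        if (matches(a, query)) result.push_back(a);
    return result;
}
```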
Pattern matching, like other state update, is performed incrementally. At the beginning of the game the matching rules are found and their assertions entered into the knowledgebase. Then, after each move, any rule that failed to match before but matches now has its assertions added to the knowledgebase, and any rules that no longer match have their assertions revoked.
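The bookkeeping for this incremental step amounts to a set difference between the rules matching before and after the move. A sketch under the assumption that matched rules are identified by integer ids:

```cpp
#include <algorithm>
#include <iterator>
#include <set>

// Rules that newly match have their assertions added; rules that no
// longer match have their assertions revoked.
struct MatchDelta {
    std::set<int> added;     // matched after the move but not before
    std::set<int> revoked;   // matched before the move but not after
};

inline MatchDelta diffMatches(const std::set<int>& before,
                              const std::set<int>& after) {
    MatchDelta d;
    std::set_difference(after.begin(), after.end(),
                        before.begin(), before.end(),
                        std::inserter(d.added, d.added.begin()));
    std::set_difference(before.begin(), before.end(),
                        after.begin(), after.end(),
                        std::inserter(d.revoked, d.revoked.begin()));
    return d;
}
```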
The knowledgebase is itself a revertable object (see Section 3.1). This means that any changes made to it during reading will be automatically reverted in backtracking.
Assertions in the database may be depended upon by other objects. When an assertion is revoked, it issues alerts to its wards (as described in Section 3.2).
We have implemented the reversion system and revertable class library, truth maintenance system, rule matching system, and knowledgebase described above in 16,000 lines of C++. The graphical rule editor is 6,000 lines of C++.
On top of this software architecture, we are building a Go playing program. The program's rulebase currently contains a few hundred rules, mainly for feature recognition and move suggestion in the opening stages of the game. We ultimately expect it to contain between 5,000 and 10,000 rules.
Our program has a module that performs stylized middle game reading. This module identifies sets of candidate moves (of both colors) with related purposes as "theaters" of action. Each candidate move in a theater is evaluated by using minimax with pruning on the tree of reasonable variations (see Section 2.2). The value of a theater is the swing between the value of the best white and best black move in that theater.
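The theater computation described above might be sketched as follows. The tree shape, the scoring convention (values from black's point of view), and all names are our assumptions; the module's pruning is omitted for brevity:

```cpp
#include <algorithm>
#include <climits>
#include <vector>

// A node in the tree of reasonable variations stemming from a candidate move.
struct Variation {
    int leafValue = 0;               // static value at a leaf (black's view)
    bool blackToPlay = true;         // who moves at this node
    std::vector<Variation> replies;  // counter-moves from the rulebase
};

inline Variation leaf(int v) { Variation x; x.leafValue = v; return x; }

// Plain minimax over the variation tree.
inline int minimax(const Variation& v) {
    if (v.replies.empty()) return v.leafValue;
    int best = v.blackToPlay ? INT_MIN : INT_MAX;
    for (const auto& r : v.replies)
        best = v.blackToPlay ? std::max(best, minimax(r))
                             : std::min(best, minimax(r));
    return best;
}

// A theater's value is the swing between the best black result and the
// best white result among its candidate moves.
inline int theaterValue(const std::vector<Variation>& blackMoves,
                        const std::vector<Variation>& whiteMoves) {
    int bestBlack = INT_MIN, bestWhite = INT_MAX;
    for (const auto& m : blackMoves) bestBlack = std::max(bestBlack, minimax(m));
    for (const auto& m : whiteMoves) bestWhite = std::min(bestWhite, minimax(m));
    return bestBlack - bestWhite;
}
```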
We are now implementing a system for goal-directed reading (adversarial planning) and plan maintenance.
Our purpose in developing the software architecture described in this paper was to support the incremental management of knowledge in a Go program.
Using this architecture has freed us to a great extent from the complexities of ad-hoc knowledge management, and allowed us to focus our energy on developing new algorithms that more closely model expert human Go play.
[CGML 94-96] Computer Go Mailing List Archive. ftp://bsdserver.ucsf.edu/
[KCN 90] Kierulf, A., Chen, K., and Nievergelt, J. 1990. Smart Game Board and Go Explorer: A Study in Software and Knowledge Engineering. Communications of the ACM, v. 33, pp. 152-167.
[Müller 95] Müller, M. 1995. Computer Go as a Sum of Local Games: An Application of Combinatorial Game Theory. Ph.D. Dissertation, Swiss Federal Institute of Technology, Zurich.