Moved from Opening systematic classification.
DougRidgway: Consider the problem of indexing joseki compactly. Specifically, how about a four-letter alphabetical code such as AAAA, unique for each line of play, i.e. one code for each leaf in the joseki tree? Four letters is about right: three letters give only 26^3 = 17,576 codes, not quite enough given that there are more than 20,000 joseki in print, while four letters give 26^4 = 456,976, which leaves room for additions and classification. The first two letters could key the classification, giving 26^2 = 676 possible classes of sequences, and the last two could list all possibilities within the class, giving up to 676 lines per class. More than that, and you have to subdivide the class. To capture every joseki in Kogo's joseki dictionary, which I think has around 6,000, you'd need to average about nine lines per class.
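To make the arithmetic concrete, here is a minimal sketch that treats a four-letter code as a base-26 number. The function names (`index_to_code`, `code_to_index`, `split_code`) are made up for illustration and are not part of any existing tool; the ordering of codes (AAAA = 0, AAAB = 1, ...) is likewise only an assumption.

```python
import string

LETTERS = string.ascii_uppercase  # the 26 letters A-Z


def index_to_code(index: int, length: int = 4) -> str:
    """Turn a 0-based index into a fixed-length base-26 letter code."""
    if not 0 <= index < 26 ** length:
        raise ValueError("index out of range for this code length")
    chars = []
    for _ in range(length):
        index, rem = divmod(index, 26)
        chars.append(LETTERS[rem])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(chars))


def code_to_index(code: str) -> int:
    """Inverse of index_to_code: read the letters as base-26 digits."""
    index = 0
    for ch in code.upper():
        index = index * 26 + LETTERS.index(ch)
    return index


def split_code(code: str) -> tuple[str, str]:
    """Split a four-letter code into (class key, line within the class)."""
    return code[:2], code[2:]


if __name__ == "__main__":
    print(26 ** 2, 26 ** 3, 26 ** 4)   # classes, 3-letter codes, 4-letter codes
    print(index_to_code(676))          # first line of the second class
    print(split_code("ABAA"))
```

With this ordering, all codes in one class are contiguous, so listing a class is just a range of 676 consecutive indices.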
The alphabetic code would indicate a line of play, not a particular position. Specific positions could be indicated by AAAAnn, where nn is the number of moves along the sequence AAAA.
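A position code of the form AAAAnn could be built and taken apart as sketched below. This assumes that nn is always written as exactly two digits with zero padding (so move 7 is "07"), which the proposal does not spell out; `format_position` and `parse_position` are hypothetical helper names.

```python
import re

# Four upper-case letters followed by exactly two digits (assumed format).
POSITION_RE = re.compile(r"([A-Z]{4})(\d{2})")


def format_position(line_code: str, move_number: int) -> str:
    """Build a position code AAAAnn from a line code and a move number."""
    if not re.fullmatch(r"[A-Z]{4}", line_code):
        raise ValueError("line code must be four upper-case letters")
    if not 0 <= move_number <= 99:
        raise ValueError("move number must fit in two digits")
    return f"{line_code}{move_number:02d}"


def parse_position(position_code: str) -> tuple[str, int]:
    """Split a position code AAAAnn into (line code, move number)."""
    match = POSITION_RE.fullmatch(position_code)
    if match is None:
        raise ValueError("expected four letters followed by two digits")
    return match.group(1), int(match.group(2))


if __name__ == "__main__":
    print(format_position("TAIS", 7))
    print(parse_position("TAIS07"))
```

Two digits cap a line at 99 moves, which seems ample for corner sequences but is a limit of this sketch rather than of the proposal.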
Joseki aren't truly a tree, of course, but transpositions can be handled by reference, as in a paper encyclopedia or SGF file. E.g., W 3-3 invasion of B 4-4 point, followed by B tenuki, transposes to B shoulder hit of W 3-3.
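Handling transpositions by reference might look something like the following sketch, where an index entry either describes a line or points at another code, much like a "see ..." entry in an encyclopedia. The codes AAAB and AAAC and the dictionary layout are invented purely for illustration; no real classification is implied.

```python
def resolve(code: str, index: dict) -> tuple[str, dict]:
    """Follow 'see' references until an entry with real content is reached."""
    seen = set()
    while True:
        if code in seen:
            raise ValueError(f"circular reference involving {code}")
        seen.add(code)
        entry = index[code]
        if "see" not in entry:
            return code, entry
        code = entry["see"]


# Hypothetical index: the codes are placeholders, not a real classification.
EXAMPLE_INDEX = {
    "AAAB": {"line": "B shoulder hit of W 3-3"},
    # W 3-3 invasion of B 4-4 followed by B tenuki transposes to the line above.
    "AAAC": {"see": "AAAB"},
}

if __name__ == "__main__":
    print(resolve("AAAC", EXAMPLE_INDEX))
```

The cycle check matters because references are maintained by hand, and two entries pointing at each other would otherwise loop forever.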
If someone wants to indicate color, corner, and orientation, well, capitalizing each of the four letters independently gives 4 extra bits to play with, although that can't be used in wiki titles. Complete openings could be described by listing all the joseki played.
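As a sketch of the capitalization idea: four letters, each upper or lower case, carry 4 bits, which happens to match one bit for color, two for the corner, and one for orientation. That particular bit assignment is only an assumption made here for illustration, as are the function names.

```python
def encode_case(code: str, flags: int) -> str:
    """Store a 4-bit value in the capitalization of a four-letter code.

    Letter i (from the left) is upper case when bit 3 - i of flags is set.
    """
    if len(code) != 4 or not code.isalpha():
        raise ValueError("code must be four letters")
    if not 0 <= flags < 16:
        raise ValueError("flags must fit in 4 bits")
    out = []
    for i, ch in enumerate(code):
        bit = (flags >> (3 - i)) & 1
        out.append(ch.upper() if bit else ch.lower())
    return "".join(out)


def decode_case(code: str) -> tuple[str, int]:
    """Recover (case-insensitive code, 4-bit flags) from a capitalized code."""
    flags = 0
    for ch in code:
        flags = (flags << 1) | (1 if ch.isupper() else 0)
    return code.upper(), flags


def pack_flags(color: int, corner: int, mirrored: int) -> int:
    """Assumed layout: 1 bit color, 2 bits corner (0-3), 1 bit orientation."""
    if color not in (0, 1) or corner not in range(4) or mirrored not in (0, 1):
        raise ValueError("color and mirrored are 0/1, corner is 0-3")
    return (color << 3) | (corner << 1) | mirrored


if __name__ == "__main__":
    tagged = encode_case("tais", pack_flags(color=1, corner=1, mirrored=0))
    print(tagged)
    print(decode_case(tagged))
```

Since the case now carries information, the plain index would have to compare codes case-insensitively, which is also why the trick does not work for wiki titles.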
This proposal is similar to what Tamsin was suggesting, and to me, it doesn't seem too unreasonable or unwieldy. The hard part is creating a reasonable classification and indexing a large, generally available joseki resource.
Charles: Is that the hard part? I've hardly used Kogo's, but I have indexed my own trees for the taisha, for example. Personally I'm not much interested in competing here with Kogo's. It is surely now not so difficult to get searches with Kombilo or similar engines to produce a big tree from a database, covering all the frequently-seen plays. That would be a very good start to what was suggested. The particular coding problem is more like telephone numbers, isn't it? Which is managed in practice by not being too miserly with the number of digits.