
Review: German Puzzle Championship

This post is about the German Puzzle Championship “Logic Masters” 2021, which took place yesterday as an online event. I want to write about the various rounds while my memory is fresh. If you want to solve the puzzles in an unbiased way, you may want to do so before reading on.

First of all, some acknowledgements to Christoph Seeliger are in order. He was the main organizer of the event; there were a dozen puzzle authors who contributed separate rounds, and Christoph was the one who first put them together and then spent the entire day watching over the contests. This was no doubt an incredible amount of work – many thanks to him!

Second, below you will find some critical remarks on several of the championship rounds. You may notice some correlation between my assessments and my personal results in these rounds. Although I can imagine that my results are tainting my views, I believe some of what I have to say is objectively correct. Anyway, let me start by saying that I think anyone who ended up ahead of me deserves to be there. Congratulations in particular to Ulrich – the appeals period is not over yet, but it would take an earthquake to affect his victory.

Now, to the different rounds. There were nine rounds of 45 minutes each, which I think is a lot for an online competition on a single day. I was pretty exhausted after the event, and I guess so was everybody else. I am not sure what the best volume is for an online Championship, but I think this was a bit too much; perhaps six or seven rounds would have done.

Round 1: This was a Welcome round, and a very good one in my opinion. The combination of puzzles was excellent (some well-known styles as well as a few unfamiliar hybrids), covering most branches of the taxonomy tree. Plus, the difficulty of the puzzles varied to a reasonable degree, without any actual rocks to crush. This is exactly what I believe a Welcome round should look like. I skipped three intermediate puzzles but managed to solve the rest.

Round 2: An Instructionless round. I had some reservations beforehand, although it turned out the examples I dealt with were quite clear regarding the rules. And, once again, the difficulty level was pleasant; I left out one of the four groups (one of the cheaper groups, consisting of four puzzles) and finished the rest with precisely one second left on the clock. Booyah.

Round 3: This is where the trouble started for me. The round featured puzzle combinations: there were four groups of three puzzles each, and within each group there were two basic styles and then a hybrid of the two. Here I made two expensive mistakes. In the Anglers/Nurikabe hybrid, I stumbled over my own notation and, having erased parts of the solution earlier, mistook a fishing line for a few mere water cells. My solution for the (basic) Summon looked fine at first glance, and it took me several minutes afterwards to spot the rule violation, but in the end it was genuinely incorrect.

Anyway, these errors were on me. My main issue with this puzzle round is about balance – something I intend to write about in a future article, but bear with me for now. Most of the basic puzzles were very cheap, and the hybrids were disproportionately expensive. The six cheaper puzzles in this round would come to a total of 80 points, which is half the value of the Star Battle/Summon hybrid. The Masyu/Skyscrapers hybrid was worth 100 points, and the two mistakes I made cost me 130 points. I solved four of the cheap puzzles but obviously could have spent the time better.

Another thought: It is my position that combinations like these make good puzzles if the components somehow interact. The Masyu/Skyscrapers hybrid I solved felt like two separate puzzles. I started drawing the Masyu loop (with only a little input from the Skyscrapers component), and at some point I decided to close the loop as fast as possible, in order to leave as much space as possible for the number entries. This led to an irregular Skyscrapers grid, with not even much Skyscrapers logic behind it. I would not call this puzzle a disappointment, but it did not feel like an actual hybrid either.

Round 4: The first of three Assorted rounds, and it was a nightmare. First of all, it was highly unbalanced as well. With the experience from the previous round I decided to skip the cheaper puzzles entirely and focus on the hard ones. It might have been a good strategy on another day, but this time I broke the Cross Sums puzzle and ended up with only two completed puzzles.

Mind you, I could have approached the round differently. However, apart from Ulrich, barely anyone completed more than half the puzzles, even among the top solvers, which I consider a bad sign. This round was, in my humble opinion, completely unsuitable for a 45-minute time window. Unless one is a multiple world champion, one can either ignore the harder puzzles from the start and accept travelling in the second-class car of participants, or have a go at one of the big targets and risk getting stuck there.

I spent ages with the Even/Odd Japanese Sums, but what really put me off was the Cross Sums. Enter numbers from 2 to 8 – seriously? Why not from 1 to 7? That would effectively yield the same puzzle. It appeared to be designed specifically to confuse the solvers. Well, it worked, at least in my case. After three attempts, I was stuck in a contradiction with not enough time to get out. In the last few minutes of my window I tried to guess the Master Word solution in frustration, but with no success either.

Round 5: Another Assorted round, but it came after a lunch break, which was good because I felt I was close to a heart attack after the disaster of the previous round(s). This round still felt a little overloaded (I solved eight out of sixteen puzzles and made a stupid mistake with the Stations, overlooking one of the solution code columns), but at least it was a decent assortment of puzzle styles. Like the Welcome round, it contained some classics and some unfamiliar types, so that I got to pick the puzzles I liked most.

Round 6: This round, labelled “Divide and Conquer”, had an interesting concept: In each puzzle, the clues had to be split up into two separate grids, which then had to be solved independently. I did not have enough time to do the Shakashaka (well, I was not going to anyway), and I broke the Masyu, but the puzzles themselves were great; each of the ones I tried had a nice flow. I often started using intuition near the end because of the time limit, but I still enjoyed the puzzles very much.

Round 7: The third Assorted round of the event, and another experience I would rather forget. To be fair, I did not like most of the puzzle types featured in the round, which is obviously not the author’s fault. In particular, the second half consisted entirely of loop puzzles which I had rarely solved in the past, and I was never good at Coral puzzles, either.

Still, there was again some of the “overdoing” I mentioned before. There were two expensive Japanese Sums variations with wildcard clues and an unknown range of numbers to enter. Either one of the two rule changes would make for an interesting puzzle, but both at the same time? And two of these puzzles? That was entirely unnecessary in my opinion, apparently done just to make the round as tricky as possible.

I never understood why authors try to make an already difficult puzzle exceptionally hard by replacing most of the clues with question marks, just to keep the given information at an absolute minimum. Puzzles can be nice even if there is a redundant clue somewhere, you know. I started off with the first of these two Japanese Sums variations, but after a lot of sweat I decided not to try the second at all. Instead I spent an age solving the Coral, then misread the solution code instructions, miscounted a large group of unshaded cells and watched another 100 points float away. And finally I broke the Roller Coaster puzzle. Not my finest hour.

Round 8: Geisterbahn! (For some reason, I keep hearing the word to the tune of the Spider-Man theme in my head, which is not beneficial for the solving, I can assure you.) I guess I am pretty much the only participant – at least in the upper section of the table – never to have solved a Geisterbahn round under contest conditions. I tried to familiarize myself with the potential rules beforehand, yet I made a silly mistake early in the round.

In the end, the round felt overloaded, too. Eleven puzzles with instructions that are essentially unknown before the round begins are simply too much for a 45-minute window. Perhaps 60 or even 90 minutes? Clearly this was not an option with the championship day already as packed as it was, hence one could question the round selection for the event as a whole. Anyway, I ended up with three solved puzzles (two actual “puzzles” plus a filler, to relax the mind a little). But Ulrich reached the same score with eight puzzles, including a mistake in one of them. Again, this feels wrong in terms of balance.

Round 9: An interesting concept, consisting of six sets of two puzzles, where the second puzzle in each set requires some input from the first. I got my hands on less than half the puzzles, but this time it was on me again. First, I had a mishap transferring the clues from the Easy As puzzle to the ABC Box; this cost me a lot of time, but fortunately I spotted my mistake in time. Second, I made a computational error in the Japanese Sums which I could not fix. Instead I jumped to the Yin and Yang puzzle and managed to save at least 10 more points.

These mistakes near the end can be explained by the exhaustion after a full day of competition. When it comes to such tendencies in the results over the different rounds, it appears I am not the only one. The scoring tables show a decline in the number of solved puzzles approaching both the lunch break and the end of the day. This suggests that, except perhaps for a handful of participants at the very top, the event was a bit longer and harder than it should have been.

To summarize: my favourites were rounds 1, 2 and 6, followed by 5 and 9. On the other end of the spectrum, I did not like rounds 4 and 7 at all (for the reasons I gave above). The Championship as a whole was too extensive for my taste. As such, it joins a growing list of contests which I feel are over the top, such as past championships, Grand Prix rounds, etc. Of course, as mentioned at the beginning, my view may be tainted by my own results.

There is always feedback to the effect that people now have a large amount of puzzle material left over which they can enjoy in the following weeks. To be honest, this is not my way of thinking. When there is a contest, the proper number of puzzles should be determined by the extent of the contest, not the time after it.

I am under the impression that people are slowly forgetting that puzzles are ultimately there for the solvers. Sometimes I think authors are so excited by their creations that they forget to wonder whether they are still suitable for the audience. This is something we can see in the Puzzle Portal as well; we should keep in mind, though, that the Portal platform serves a totally different purpose. When it comes to puzzle events, the current trend bothers me, and I hope we can stop it.

It is said that one is either part of the problem or part of the solution (or just part of the landscape). I should not exclude myself from any of this, because I know that I have frequently published more and harder puzzles than necessary in the past – yes, including championship rounds. I will not be a championship author in the near future, although I have volunteered to design the puzzles for next year’s qualifier. Time will tell if I can live up to my own expectations and satisfy the demands I am setting for myself.

There is another possibility to consider. Maybe I am overly sensitive in this regard, and the mainstream view is that contests are fine the way they are, with people solving just 30% of the puzzles on average, and all that. If this is the case, I am simply no longer up for it. I used to participate in such events with a certain ambition, and the results of the recent years have already shown that I can no longer compete with the top of the world – regarding both pure solving skills and endurance in larger events.

If I were, say, a professional snooker player, I would probably hold a press conference now, announcing my retirement. This would be an overreaction in every regard, of course. In my previous post I said that my enthusiasm for competitive puzzle solving was back, but yesterday’s championship put a damper on it. Still, rash decisions are always a bad idea, and I think the best thing to do is take another break and see how things unfold. So long.

6 replies on “Review: German Puzzle Championship”

Hi Roland,

thanks for your thoughts; I will try to reply to some things while my memory is still fresh. I am not going into the details of some rounds, since I think we disagree even more there, and opinions might not be totally independent of your solving experience and performance. (I am especially sorry for the 100-point loss on the Coral. I really thought hard about what to do there. I accepted similar codes where people at least counted right, but if there is a wrong number, it is no longer 100% clear to me whether you miscounted or whether there is a local mistake in your solution.)

First, it seems we disagree about the number of puzzles that a typical contest round should have. For my part, I don’t like the trend set in recent years’ GP rounds, where top solvers finish in about 60 minutes with 30 minutes to spare. This gives some people the luxury of a totally different approach, because they can be almost sure to finish and don’t have to worry about time management, puzzle selection and such things, and I don’t like that. (On the other hand, they have to do everything, even puzzle types they are very uncomfortable with.)

So the goal was to make the rounds barely finishable, and from the test solving times this was not unreasonable for the rounds with low point totals. Furthermore, I gave a very clear indication in the instruction booklet of how to read the point values. Something like that is typically not done before championships, but it was imho necessary for fairness, since I revealed a lot of the times to authors when we discussed the rounds. From reading your round descriptions and points, you got close to where I expected you to be, where you did not screw something up. I expected most of the top solvers to conclude that they should start with the high pointers, since otherwise it is likely that they will have one hard puzzle left and not enough time to do it. Maybe I’m also overestimating the amount of preparation and round pre-planning that happens, since I never have to care about that. And maybe I also underestimated the typical amount of time people lose to mistakes, which is not included if you calculate a sum of test solving times.

I might also have overestimated the number of exceptional top performances I have regularly seen at championships. Maybe this just doesn’t happen that often in online contests. I expected more things like the 28-minute solve of Round 6 by Freddie Hand, where nobody else finished. He was not exhausted, though, and maybe this is only possible with some “all-in” approach that can backfire badly, unless you are in a very specific situation (like fighting for a playoff spot in the last round). And that’s the range where time bonus points start to get really unfair, especially with the risk of losing them all.

Altogether, I don’t think too many puzzles are that bad, as long as the top people still have to choose which few low-to-mid pointers to leave out, rather than what to solve. That some rounds felt like they had much too many puzzles was not planned, and maybe exhaustion also kicked in. I agree that the championship was maybe one or two rounds too long, and continuous 45-minute rounds were more exhausting than I expected. Especially the drop in the normalization point value in rounds 7 and 8 came really unexpectedly.

Regarding hard puzzles, I agree that everything above 100 points (that should be 10–12 minutes for you and 30 minutes for me) is maybe too hard to put into a 45-minute round. But in most cases the hard puzzles that remained were just an attempt to cut the losses. With the very limited time (not even five weeks from when I first saw most of the puzzles until the championship), it was just not possible to tell an author “please make a new round”. We already scratched some super hard puzzles (like the Laser without given crossings that Eva put into the portal) and made replacements (that’s how the second Japanese Sums happened). In some more cases, I also suggested making changes or adding clues (even which ones), but only got “I’d rather make a new puzzle” as an answer, and the next puzzle was just as hard. I did not overrule that and change the puzzle myself if we couldn’t reach an agreement. Maybe I should have done that in some cases, but there is always the risk of discouraging people forever, and I’m really happy that we have such a big pool of authors who contribute to championships and other contests. But if you have so many people, it is not possible for them all to have the same opinions and experience regarding what is good and bad for a contest, puzzle selection, difficulty and so on. Some don’t even care; they are happy just to write some puzzles that will get used in a championship.

I really hope this contest doesn’t drive you into a longer break, and that you can enjoy the multitude of leftovers.

Thank you for your feedback, Christoph. You are making many fair points; let me go over some of them again.

First of all, there is no need to be sorry for the Coral (and other puzzles) I screwed up. On the contrary, I think your approach regarding manual score adjustments is spot-on, and we should try to keep the standards you are setting here (i.e. accept entries which use a clear alternative notation, such as X’s and O’s swapped, but not otherwise). In the Coral case, I had several layers of pencil/pen solutions on my sheet, so it might or might not have been accepted by a scorer in a real-life event. Like I said, those mistakes are on me.

Next, the perfect size for a contest: clearly this is where opinions collide. Personally, I prefer contests (and rounds) where a significant number of participants have a reasonable chance of finishing. There are various reasons for this, such as the feeling of success for weaker solvers, the lack of incentive to solve the remaining puzzles afterwards, etc. Of course, either view has its justification, and this is just my personal taste. Incidentally, I had a similar exchange with Rainer after the qualifier.

The mere number of puzzles is not the only factor. There are certain complex puzzles which have the potential to “trap” the solvers, for example if they include bottleneck solving steps or otherwise tricky situations where it is particularly tempting to go wrong. By their nature, these puzzles create a much higher variance, and if the total difficulty is already beyond a typical solver’s scope, it adds to the risk that only a very small percentage of the round gets solved. I saw some puzzles of this kind yesterday, even several in a single round, which I usually try to avoid.

The danger I mentioned in the previous paragraph is even higher if the puzzle type is unfamiliar, perhaps a variant or hybrid of some sort. And it turns out that many authors are trying to impress with exceptional constructions (no offense – I am one of them myself). Perhaps I am just missing old-fashioned assorted rounds which feature only basic puzzle styles: a Sudoku, a Kakuro, a Slitherlink, a Cave, etc. – you get the idea. We don’t have many of these any more. Today’s rounds often include stuff that sounds as if it comes from another planet: an Innen-Knapp-Daneben-Aussen-Beruehrungs-Rundweg, Coded Anti-Knight Japanese Sums with zeroes, or – Ulrich’s favourite example – a “Standard JaTaHoKu”.

Referring to difficulty once more, I take it that the test solving gave you a good deal of guidance beforehand. A recommendation I often try to spread is this: the feedback of test solvers should cover much more than just raw solving times. For example, if the solving leads to the discovery of either shortcuts or bottlenecks, it should be included at all costs. This is especially important if the test solvers come up with either very good or very poor results. If some kind of guesswork was involved, it should also be included in the feedback. All this helps assess the “real” difficulty of the puzzles.

Finally, you are certainly right about discouraging authors. (This has frequently been an issue.) If, in addition, as happened this year, the entire work had to be done under enormous time pressure, the efforts of the contributing parties must be applauded even more.

What I would like to point out, though, is that critical feedback is important. For some time now, people have tended to make only positive noises after an event, in a few cases using all the superlatives they could think of (“Great! How wonderful!”, etc.), which I consider counter-productive. It gives the impression that everything is perfect and nothing has to be changed. Sometimes I have the feeling that negative opinions are highly unwelcome, so nobody expresses them. But I am not trying to trash the entire Championship; there were many rounds that I enjoyed.

Speaking about what went badly (or less than ideally) is the only way to do better next time. I also don’t feel comfortable with extreme praise and no constructive feedback. Overall, it went pretty smoothly, and there were some things that can rightfully be criticized, just like every year before.

I generally don’t pay much attention to how narrow the solving path is if I don’t get stuck. I should keep that in mind for future contests. But I almost always construct my own puzzles with a linear path, and if there are multiple ways to proceed, it’s never on purpose. My experience with general puzzle construction is still limited, though (I don’t think it’s more than 30 non-Sudokus in total, and a lot of those are still Latin squares).

However, I think it’s asking a bit too much to get such feedback from test solvers. Maybe having Ulrich as a brother helps with getting valuable feedback for puzzle contests (as having you helps Ulrich), but good testers for puzzle contests are incredibly hard to find (much harder than for Sudoku contests).
While I was quite happy with a few of the people who helped me, there were again a lot of disappointments, just as when I last tried this in 2014. Some don’t solve half the round, and the times you get are highly inconsistent and don’t make any sense compared to other people’s. Or you get great feedback for a first round and then never hear from them again (and as far as I remember, I always said please and thank you in each mail). Getting valuable feedback besides the pure times is just the icing on the cake, and more than you can ask from 90% of the people you get when you ask for test solvers somewhere. (Once again, I’m really grateful to the people who solved all or most of the puzzles I sent them. With the time limits we had, a lot of what they got was a draft at best, and they also had to be pretty fast.)

Disclaimer: I did not participate in the German Puzzle Championship as it started in the middle of the night for me, so I can’t personally speak to the contents of those sets.
My sweet spot for competition length is between what you and Christoph said above: top solvers should have a chance to finish, but with very little time to spare. Some of the recent GP rounds have hit this mark; others haven’t, owing to some heavily overvalued puzzles – the 100-point Double Choco from 2021 GP R5 comes to mind, as I solved it in 2 minutes.

As for test solving and feedback, I recently ran a puzzle contest of my own (well, it’s still ongoing for another day); the test solving was done by a few non-contest puzzlers, and by myself re-solving on paper while determining solution code rows. For a 2-hour contest, one of my test solvers took over 10 hours to solve everything, but those timings and comments were still useful regarding likely sticking points and how error-prone certain puzzles were. It’s difficult, but certainly not impossible, to balance things given enough time. The hard part is finding that time.

It must surely be said that the right length for a contest depends both on the audience and the purpose. At the WPC, for instance, I think it’s right that most rounds are not finishable or barely finishable, because otherwise the primary discriminators for top solvers are endurance and the ability not to make mistakes under pressure. While those are valuable skills, I don’t want them to be the main factors in deciding a WPC podium.

I think I’ve read this post a few times now and had various thoughts about it (none of which are hugely original), but it’s taken me this long to gather them…

Firstly, I have never liked the idea that puzzle selection should play a large part in determining the winners of contests; if nothing else, it feels like a more robust comparison between competitors if they are all solving much the same thing. Of course, I note that individual competitors will have their own preferences and relative strengths and weaknesses. Still, that’s largely a matter of taste, and I don’t mean that there shouldn’t be any contests like that.

However, my second observation would be that if puzzle choice does come into things, then there is always the risk that your testing isn’t fully consistent and you end up with “easy points” on offer, particularly when these easy points come attached in large quantity to one or two puzzles among many. Now, this will inevitably be true for at least some individual solvers, who after all have their own individual preferences and relative strengths and weaknesses, but it may also be more widely true for larger groups of solvers. It’s there that you get into the territory of allowing serendipity to potentially determine your champion, and that’s where you need to be careful. Not necessarily because it means you allow someone to fluke a title they didn’t otherwise deserve, but more likely because you rule out people who had the ability to be serious contenders had they not had the misfortune to step on a puzzling land-mine.

The third observation is contrary to the second: harder puzzles with more points attached are always going to be higher variance than smaller puzzles. I don’t think I have a satisfactory answer to the problem of an objectively hard puzzle – hard, that is, if you prove uniqueness – which nevertheless has convenient places to guess and get at least one solution out. Maybe you have to prove uniqueness separately and tell your testers to go no holds barred at the puzzle and get an answer out as quickly as possible. Still, it doesn’t quite sit right with me to have a 50-point puzzle which solves smoothly alongside another 50-point puzzle that warrants that valuation only given the right guess, and which otherwise is more like a 100-point puzzle.

I suppose one answer to a related problem is to make sure that no single puzzle dominates the points in a round, and instead to offer a selection of puzzles with a fairly narrow distribution of points. That way you aren’t trapping solvers in an unpleasant sunk-cost dilemma.

Finally, I share your lament about “Standard JaTaHoKu”, and I also wish that there were a few more basic styles in contests these days. I see a lot of innovation for its own sake, with many examples of combinations that, having solved them once, I’d be pleased never to encounter a second time.
