Roy Dyckhoff, University of St Andrews, 30 June 1997

We propose a small collection of problem classes, mainly gathered from other sources, for benchmarking theorem provers for intuitionistic propositional logic. Our approach is based on that of Heuerding et al [HS], in proposing not just certain particular formulae but classes of formulae that will scale up for faster machines and better techniques. Nonprovable formulae are used as well as provable ones, to ensure that provers tuned to spot positive answers have some work to do.

The 6 classes we include are the de Bruijn formulae, the pigeonhole formulae, some examples of Korn & Kreitz, equivalences, formulae needing many contractions and formulae with big normal NJ proofs. Code for these is given both in Prolog and in the LWB language.

[HS] has proposed a suite of benchmarks for several propositional modal logics, using a method which is intended to allow the comparison of provers even when faster machines and better techniques make old benchmarks obsolete. Our purpose here is to apply the same method to develop a benchmark suite for intuitionistic propositional logic, providing data that will allow a rough comparison of several techniques.

One intended use of the benchmarks is for a comparison of submissions of results by others, in association with the Tableaux'98 meeting. For details of this, see here, from which address the present report is also available in electronic form.

Propositional logic is of course of less interest than first-order logic, which should be the subject of a future report. (We think in fact that for this logic, proof enumeration rather than just provability are required.) Time efficiency is not the only criterion that should be used; space efficiency and verifiable correctness are equally important issues. It is even arguable that space is more important than time, since it is harder to add RAM to a machine just when the theorem prover runs out of memory than to allow extra time. The present benchmarks aim just to address the time complexity issue.

Some of the benchmarks appear also in [Fr2], to which source we are indebted. There, however, the organisation was (a) a wider range of benchmarks; (b) a special interest in experimenting with the speedups, if any, gained with a parallel algorithm and (c) little consideration of the range of values n for which the n^th formula could be quickly decided. The same work comments that "Experience suggests that decision procedures for intuitionistic propositional logic are very much more sensitive to the exact formulation of a problem than classical algorithms, and it is probably difficult to justify the choice of any particular set of benchmarks". We shall not attempt to do so either. These benchmarks differ from those of [Fr3], devoted to first-order logic.

For simplicity we do not repeat ideas in [H] discussing and explaining the methodology. In summary, we allow

- unprovable as well as provable formulae
- formulae of various kinds
- arbitrarily hard formulae

and require

- that for each formula, whether or not it is provable is already known
- that simple tricks do not help to solve the problems
- that benchmarking of a prover takes not too much time
- that the results can be summarised concisely.

We add one criterion: that the unprovable formulae should all be provable in classical logic, otherwise it is simple just to preprocess the formulae with a classical decision procedure. Some modifications with double negation or excluded middle are used to make the unprovable formulae classically provable if necessary. Of course, in the middle of a proof search, a prover may spot that a subgoal formula is classically unsolvable (or even that it is classically true and allows appeal to Glivenko's theorem): this is not regarded as a simple trick and is allowed.

We make no suggestion that the problem classes here really arise in practice. (Practice really requires a first-order language). We are aware, for example, of some real-world applications of provers for this logic:

- [BB] (where a much less complex fragment of the logic suffices);
- [Coq] (where we have no data on the kind of problems tackled; an IPL decision procedure is embedded as an 'Auto' tactic)
- Some work on counting constructive proofs of formulae as evidence in an expert system.
- Work by Mendler et al on intuitionistic lax logic applied to hardware design.
- The work of Tyugu and Mints [TM] in the 1980s.
- [NuPRL] and other proof assistants based on constructive type theory.

We will describe in [D97] a theorem prover based on the "contraction-free" approach of [D92] (also developed by Hudelmaier); this is effective at deciding simple problems and some of the harder problems. It is intended just as a decision procedure for propositional logic (rather than as e.g. a proof search procedure). The same ideas are implemented in a number of other systems, such as Coq and the LWB [LWB]. See [D92] for the history of this approach, which goes back to 1950 work in the Soviet Union. Details are available here.

We have tried to find problem classes that are hard for this approach with little feeling for what classes are hard for other approaches. Some classes turn out to be easy; but maybe they are hard for other systems. Hopefully, others will experiment with the problem classes below and report their results.

We are grateful to Alain Heuerding, Jacob Howe and Daniel Korn for their comments and suggestions, to Torkel Franzen for permission to quote freely from his work of 1989 and to Didier Galmiche and University of Nancy 1 for a position allowing work on this topic. The report's inadequacies, however, are the author's.

Atomic formulae P, Q, ... are of almost any form acceptable unquoted as a Prolog non-variable term, other than the compound formulae defined below. Thus 'p(123)', 'a(57)', 'p456', 'p', 'abc', 'a', 'xyZ' are all atoms (without the quotes). In practice we just use roughly the first three kinds.

The three Prolog atoms 'true', 'false' and 'v' are not allowed as atomic formulae; they are logical constants (i.e. logical connectives, of arity 0, 0 and 2 respectively).

Formulae A, B, C, .... are inductively defined from the atoms by the rules:

A ::= P | true | false | A & B | A v B | A `->`

B | ~A | A `<->`

B

for atoms, conjunctions, disjunctions, negations, implications
and equivalences respectively. Parentheses are used to disambiguate
the structure where necessary. '&' and 'v' are left associative;
'`->`

' is right associative; '`<->`

'
is non-associative. We avoid ~~A and A&~B etc because such
combinations would not be correctly read by a Prolog interpreter.
Thus, spaces are required to separate any two connectives.

This syntax is in accordance with that used in the LWB [LWB].

In this section we use

&&_range to indicate a multiple conjunction

and vv_range to indicate a multiple disjunction

We follow the practice of [HS] in giving in each section two classes, e.g. C_p(n) and C_n(n) for the "n^th provable formula" and the "n^th not provable formula" in class C.

de Bruijn [dB] has given an example of a formula (our "debruijn_p(1)")

((b`<->`

c)`->`

(a&b&c))
& ((c`<->`

a)`->`

(a&b&c))
& ((a`<->`

b)`->`

(a&b&c))
`->`

(a&b&c)

which is intuitionistically provable, but is (intended as) quite a tough exercise for students to prove by natural deduction. Variations of it with 2n+1 (for n >= 1) variables form a natural class of provable benchmark formulae. The variations with an even number of variables are classicallly nonprovable; we therefore make these intuitionistically nonprovable (but classically provable) by using an instance of the classical axiom of excluded middle.

Thus,

de_bruijn_p(n) == LHS(2*n+1) `->`

RHS(2*n+1)

de_bruijn_n(n) == LHS(2*n) `->`

(p0 v RHS(2*n) v
~p0)

RHS(m) = &&_{i=1..m} p(i)

LHS(m) = &&_{i=1..m} ((p(i)`<->`

p(i+1))
`->`

c(n))

where addition is computed modulo m.

The n^th provable pigeonhole formula "ph_p(n)" says that if n+1 pigeons occupy n holes then at least one hole contains two pigeons.

occ(p,h) says that the p^th pigeon is occupying the h^th hole

ph_p(n) =def left(n) `->`

right(n)

left(n) =def &&_{p=1..n+1} (vv_{j=1,..n} occ(i,j) )

right(n) =def vv_{h=1..n, p1=1..{n+1}, p2={p1+1}..{n+1}} s(i1,i2,j)

s(p1,p2,h) =def occ(p1,h) & occ(p2,h)

left(n) says that every pigeon p is in some hole h

right(n) says that for some hole h there are (with p1 `<`

p2) two pigeons, p1 and p2, in hole h

s(p1,p2, h) says that pigeons p1 and p2 are sharing hole h

The n^th nonprovable pigeonhole formula "ph_n(n)" says roughly that if n+1 pigeons occupy n holes then at least one hole contains two pigeons; this is made intuitionistically unprovable by use of double negation.

ph_n(n) =def left_n(n) `->`

right(n)

left_n(n) =def &&_{p=1..{n+1}} (vv_{h=1..n} occ_n(p,h)
)

This says that for every pigeon p there is a hole h for which
pigeon p is not absent from hole h. In fact, the formulae are
then easily refuted. So, one just applies DN to the last of the
disjuncts occ(p,h), i.e; when h=n.

occ_n(p,h) is like occ(p,h), with a DN modification as just described.

Other formulae are as before.

Franzen [Fr] gave examples of formulae "fr(n)" whose proof in LJ needs n contractions:

fr(1) is ~ ~(~p1 v p1)

fr(3) is ~ ~(~p1 v ~p2 v ~p3 v (p1 & p2 & p3))

of which the first is one of the standard easy examples illustrating
the importance of contraction for this logic. (Another example,
in the implicational fragment, is "((((p`->`

q)`->`

p)`->`

p)`->`

q)`->`

q".)

Glivenko's theorem easily disposes however of all these formulae.
We therefore avoid use of negation, replacing every ~A by A`->`

f
for a new atom f. We also change the order of the terms so that
a system that blindly takes the first item in a disjunction to
prove will have a hard time. Thus,

con_p(2) is (((p1 & p2) v (p1`->`

f) v (p2`->`

f))`->`

f)`->`

f

Formulae of this class are easy for a prover using appropriate techniques to handle contraction. The class is included to test provers that either ignore the contraction problem or handle it with a depth count or a loop-check.

Easy modification (using DN) of the class yield unprovable formulae con_n(n) where the search in LJ still has to take the possibility of n contractions into account:

con_n(2) is (((p1 & p2) v (~ ~p1`->`

f) v (p2`->`

f))
`->`

f) `->`

f

which are significantly harder to decide than the formulae con_p(n).

Schwichtenberg [Sch] mentions a class of provable formulae so that the n^th formula has no normal natural deduction proof of size less than an expontial function of n.

Our experience of these problems is that they can be decided very fast but can generate space problems, e.g. for some implementations of Prolog.

We use DN at one place to generate a variant class of unprovable formulae.

schwicht_p(n) =def antp(n) `->`

p(0)

antp(n) =def p(n) & &&_{i=1..n} (p(i) `->`

p(i) `->`

p(i-1))

schwicht_n(n) =def antn(n) `->`

p(0)

antn(n) =def (~ ~p(n)) & &&_{i=1..n} (p(i) `->`

p(i) `->`

p(i-1))

Korn and Kreitz proposed [KK] a class of formulae which can be proved "essentially by classical reasoning" and thus "even compared to the contraction-free proof systems presented in [D92] we achieve an O(n!) reduction". Their n^th formula is

~( ~a(0) & (( b(n) `->`

b(0) `->`

a(n) & ( &&_{i=1..n} (( b(i-1) `->`

a(i))
`->`

a(i-1)) ) )

We propose here a minor modification kk_p(n) of these formulae
so that classical reasoning (justified by Glivenko's theorem)
is not obviously possible. This replaces each negation ~A by A`->`

f,
for a new atom 'f', with no left-rule for 'f': thus the proof
search cannot exploit GlivenkoÕs theorem, which is valid
for intuitionistic but not for minimal logic. ([KK]
exploits this theorem.)

The speed of a prover on this example may depend significantly on the order in which antecedent formulae are stored (ie. the order in which they are extracted from the formula). We therefore generate two versions (one for each order) and conjoin them together to get the formulae kk_p(n).

We also propose a variation (suggested by Korn) to generate unprovable formulae kk_n(n), but fixed, using double negation, to ensure that they are classically provable.

These (modified) formulae (both provable qnd unprovable) seem to us to capture the essence of the difficulties of root-first sequent calculus-based proof search for IPL, with the generation of lists of nested implications from which just one has to be selected.

Our own decision procedure [D97] includes a heuristic which orders all nested implications by a modest lookahead, and thus allows quick discovery of proofs of the provable formulae kk_p(n). Such a method fails of course for the unprovable formulae. We considered complicating the provable formulae by adding extra subformulae 'true' to complicate the task of such heuristics; but our scheme of considering also unprovable formulae (for which ALL nested implications may have to be looked at rather than just the most promising) renders this superfluous.

Formulae expressing the associativity of bi-implication are notorious examples for (a) not always being intuitionistically provable and (b) causing difficulties even to classical theorem-provers.

The class of provable formulae equiv_p(n) is defined so that

equiv_p(2) is (a(1)`<->`

a(2))`<->`

(a(2)`<->`

a(1))

equiv_p(3) is ((a(1)`<->`

a(2))`<->`

a(3))`<->`

(a(3)`<->`

(a(2)`<->`

a(1)))

and again we generate nonprovable formulae equiv_n(n) with the use of DN.

There are many other possibilities, using odd parenthesisations and permutations of the order other than our simple reverse. The above two classes together seem to us to be adequate.

Prover: We used a Prolog implementation [D97] of the contraction-free calculus of [D92], compiled under SICSTUS Prolog 2.1 to generate "fastcode".

Availability: see next section.

Hardware: SUN SPARCStation 10, main memory 128 MB, 1 CPU, 80
MHz (?)

Timing: The timing does not include the parsing of the formulas
but includes all preprocessing (basically, just ensuring that
all atoms are easily recognised as such).

To time an example, one typically does

[load].

de_bruijn(3, A), time(A, T).

which succeeds with T as the time (in msec); the result (true/false) is written to standard output.

Results: in each case we give the largest n for which class(n) is solvable in 100 seconds.

WATCH This Space...

class class ------------------------------------------ de_bruijn_p 43 de_bruijn_n XXX ph_p XXX ph_n XXX kk_p XXX kk_n XXX con_p XXX con_n XXX schwicht_p XXX schwicht_n XXX equiv_p XXX equiv_n XXX ------------------------------------------

LWB code and Prolog code generating all the formulae classes is available in the Appendix to this document, which is itself available on the WWW at

"http://www.dcs.st-and.ac.uk/~rd/logic/ipcfmls.html".

The Prolog code for our own theorem prover, LJT.pl, is available at

"http://www.dcs.st-and.ac.uk/~rd/logic/nonmac/"

as is the code for a less refined implementation (not discussed here) based on a multi-succedent calculus: this explicitly generates "refutations" (in the form of Kripke counter-models) for non-theorems.

The Prolog code compiles on a SUN under SICSTUS Prolog 2.1. It also compiles on a Power Macintosh under LPA MacProlog32; a slight modification is required at the top level to measure the run time.

We have carried further the work of [HS] and perhaps of [Fr] in benchmarking non-classical logics. We draw no conclusions about the relative efficiency of our own theorem prover, which will be documented further elsewhere [D97] (our interest therein now being not in its efficiency but its verifiability).

- [BB]

B. Bennett, Spatial Reasoning with Propositional Logics in Doyle, J, Sandewall, E & Torasso, P (editors), Principles of Knowledge Representation and Reasoning: Proceedings of the Fourth International conference (KR94), Morgan Kaufmann, San Francisco, pp.51-62 - [Coq]

The Coq system, details here - [dB]

de Bruijn, N: personal communication in about 1990. - [D92]

R. Dyckhoff, Contraction-free sequent calculi for intuitionistic logic, J. Symbolic Logic, 1992. - [D97]

R. Dyckhoff, Verifiable implementation of a fast intuitionistic propositional logic decision procedure, University of St Andrews Research Report (in preparation). - [Fr1]

T. Franzen, Algorithmic Aspects of intuitionist propositional logic, SICS Research Report R87010B, 1988. - [Fr2]

T. Franzen, Algorithmic Aspects of intuitionist propositional logic II, SICS Research Report R-89/89006, 1989. - [HS]

A. Heuerding & S. Schwendimann, A benchmark method for the propositional modal logics K, KT, S4; Technical report IAM-96-015, University of Bern, October 1996; also available from [LWB]. - [KK]

D. Korn & C. Kreitz, A constructively adequate refutation system for intuitionistic logic, position paper at Tableaux'97, available in Research Report CRIN 97-R-030, University of Nancy 1, Nancy, France or from korn@informatik.th-darmstadt.de. - [H]

J. Howe, Two loop detection mechanisms: a comparison; in Springer LNAI 1227 (1997). - [LWB]

Logic WorkBench documentation, details here. - [NuPRL]

NuPRL - [Sch]

H. Schwichtenberg, Termination of permutative conversions in Gentzen's sequent calculus, unpublished (1997). - [Sh]

N. Shankar, Proof search in the intuitionistic sequent calculus, Springer LNCS 607 (1992). - [SFH]

Sahlin, Franzen and Haridi, An intuitionistic predicate logic theorem prover, Journal of Logic & Computation 2 (1992). - [T]

T. Tammet, A resolution theorem prover for intuitionistic logic, paper (not implementation) available here. - [TM]

Tyugu & Mints.

########################################################################### ########################################################################### ########################################################################### ########################################################################### ########################################################################### 8 Appendix: Formulas generated with the LWB or by Prolog ########################################################################### ########################################################################### ########################################################################### ################################################# 8.1 de Bruijn formulae ################################################# #de_bruijn_p(2) true & ((p(1) <-> p(2)) -> p(1) & p(2) & p(3) & p(4) & p(5)) & ((p(2) <-> p(3)) -> p(1) & p(2) & p(3) & p(4) & p(5)) & ((p(3) <-> p(4)) -> p(1) & p(2) & p(3) & p(4) & p(5)) & ((p(4) <-> p(5)) -> p(1) & p(2) & p(3) & p(4) & p(5)) & ((p(5) <-> p(1)) -> p(1) & p(2) & p(3) & p(4) & p(5)) -> p(1) & p(2) & p(3) & p(4) & p(5) #de_bruijn_n(2) true & ((p(1) <-> p(2)) -> p(1) & p(2) & p(3) & p(4)) & ((p(2) <-> p(3)) -> p(1) & p(2) & p(3) & p(4)) & ((p(3) <-> p(4)) -> p(1) & p(2) & p(3) & p(4)) & ((p(4) <-> p(1)) -> p(1) & p(2) & p(3) & p(4)) -> p(0) v p(1) & p(2) & p(3) & p(4) v ~p(0) ################################################# ################################################# 8.2 Pigeonhole formulae ################################################# #ph_p(3) (o(1, 1) v o(1, 2) v o(1, 3)) & (o(2, 1) v o(2, 2) v o(2, 3)) & (o(3, 1) v o(3, 2) v o(3, 3)) & (o(4, 1) v o(4, 2) v o(4, 3)) -> o(1, 1) & o(2, 1) v o(1, 1) & o(3, 1) v o(1, 1) & o(4, 1) v o(2, 1) & o(3, 1) v o(2, 1) & o(4, 1) v o(3, 1) & o(4, 1) v o(1, 2) & o(2, 2) v o(1, 2) & o(3, 2) v o(1, 2) & o(4, 2) v o(2, 2) & o(3, 2) v o(2, 2) & o(4, 2) v o(3, 2) & o(4, 2) v o(1, 3) & o(2, 3) v o(1, 3) & o(3, 3) v o(1, 3) & o(4, 3) v o(2, 3) & o(3, 3) v o(2, 3) & o(4, 3) v o(3, 3) & o(4, 3) #ph_n(2) (o(1, 1) v ~~o(1, 2)) & (o(2, 1) v ~~o(2, 2)) & (o(3, 1) v ~~o(3, 2)) -> o(1, 1) & o(2, 1) v o(1, 1) & o(3, 1) v o(2, 1) & o(3, 1) v o(1, 2) & o(2, 2) v o(1, 2) & o(3, 2) v o(2, 2) & o(3, 2) ################################################# ################################################# 8.3 Formulae requiring n contractions ################################################# #con_p(3) (p(1)&p(2)&p(3) v ((p(1)->f) v (p(2)->f) v (p(3)->f))->f)->f # i.e. **( (p1 & p2 & p3) v *p1 v *p2 v *p3 ) where *A abbreviates A->f #con_n(3) (p(1)&p(2)&p(3) v ((~ ~p(1)->f) v (p(2)->f) v (p(3)->f))->f)->f ################################################# ################################################# 8.4 Formulae of Schwichtenberg with big NJ deductions ################################################# #schwicht_p(3) p(3)&(p(1)->p(1)->p(0))&(p(2)->p(2)->p(1))&(p(3)->p(3)->p(2))->p(0). #schwicht_n(3) ~ ~p(3) & (p(1)->p(1)->p(0))&(p(2)->p(2)->p(1))&(p(3)->p(3)->p(2))->p(0) ################################################# ################################################# 8.5 Formulae of Korn & Kreitz ################################################# #kk_p(3) ((a(0) -> f) & ((b(3) -> b(0)) -> a(3)) & ((b(0) -> a(1)) -> a(0)) & ((b(1) -> a(2)) -> a(1)) & ((b(2) -> a(3)) -> a(2)) -> f) & (((b(2) -> a(3)) -> a(2)) & (((b(1) -> a(2)) -> a(1)) & (((b(0) -> a(1)) -> a(0)) & (((b(3) -> b(0)) -> a(3)) & (a(0) -> f)))) -> f) #kk_n(3) (a(0) -> f) & ((~~b(3) -> b(0)) -> a(3)) & ((~~b(0) -> a(1)) -> a(0)) & ((~~b(1) -> a(2)) -> a(1)) & ((~~b(2) -> a(3)) -> a(2)) -> f ################################################# ################################################# 8.6 Equivalences ################################################# #equiv_p(3) ((a(1)<->a(2))<->a(3))<->(a(3)<->(a(2)<->a(1))) #equiv_n(3) ((~ ~a(1)<->a(2))<->a(3))<->(a(3)<->(a(2)<->a(1)))

Last modified: Tue Nov 18 22:33:59 GMT 1997