Wolfram Institute Bulletins

Games and Puzzles as Multicomputational Systems

Stephen Wolfram — Wed, 08 Jun 2022 15:32:24 +0000

Humanizing Multicomputational Processes

Multicomputation is one of the core ideas of the Wolfram Physics Project—and in particular is at the heart of our emerging understanding of quantum mechanics. But how can one get an intuition for what is initially the rather abstract idea of multicomputation? A good approach, I believe, is to see it in action in familiar systems and situations. And I explore here what seems like a particularly good example: games and puzzles.

Multicomputational Irreducibility

James Boyd — Mon, 06 Jun 2022 17:18:07 +0000

Multicomputation is the cornerstone of much of the basic science that my teammates and I are doing with Stephen Wolfram. We see an opportunity to metamodel many areas of applied science using the multicomputational paradigm. In fact, the range of opportunities that we envisage is so wide that we are launching the Wolfram Institute in order to expand the effort beyond Wolfram Research. It’s a momentous period. But because multicomputation is still new, I feel a responsibility to help communicate in greater detail the aspects of multicomputation that I personally find to be compelling. And I have great reference material for doing so, because introducing a new paradigm of science is precisely what Stephen began to do in the 1980s with the computational paradigm. And I still refer to those works from the 1980s today because they cover then-novel concepts that have come to serve as guiding principles for the work that we do now. And a key concept, which distinguishes those papers from other theoretical literature on the study of computability, is that of computational irreducibility. So, now that we are developing a new paradigm that builds upon the one that Stephen pioneered decades ago, it seems appropriate to consider irreducibility in the multicomputational context.

A Review of Computational Irreducibility

Computational irreducibility is the property of computational processes whose behaviors resist prediction. Although popularized through A New Kind of Science, the property was already subject to considerable study in the 1980s, with the results published in several papers. Such results were surprising because they contradicted algorithmic information theory (AIT), which was in turn influenced by earlier works in information theory and computability theory. The key presumption of AIT, which Stephen’s findings contradicted, was that one could assign a value to the complexity of an algorithm by measuring its description length, where algorithms with shorter rules were presumed to produce outputs with greater regularity and less randomness. The finding that computations with simple rules could be autoplectic (that is, increasing in complexity without external influence) was surprising, and its implications extended beyond the theory of computation to the general study of physical systems. It cast doubt on a paradigmatic principle of modern science, namely that one could, in the mathematical tradition, design parsimonious models with few parameters that make the world tractable enough for humans to gain predictive power over it.

Multicomputational Irreducibility

In what follows, we too will study the behavior of simple computational rules. However, our focus will not be placed on individual rules; rather, we will examine multicomputations that involve either multiple rules that are computed together or rules that admit multiple pattern matchings. In either case, multicomputations differ from their single-way predecessors in that they have parallel evaluation fronts. Let’s consider the case of a multicomputation that admits two different Turing machine (TM) rules:

Let’s first run each TM rule separately for 25 steps:

Looking at the individual TM plots, we see that the behavior shown in either case is predictable. Next, let’s contrast the behavior of the individual rules with the behavior of a multiway Turing machine (MTM) that accepts both rules, which we’ll run for 10 steps:

Intriguingly, the multicomputational behavior is more difficult to anticipate. We can predict how each rule behaves individually, but we cannot as readily predict the “interactions” between those rules. Thus, it appears as though multicomputations too can be either predictable or unpredictable, depending on the convergence and divergence of their parallel evaluation fronts. We will refer to the property of unpredictability for multicomputations as multicomputational irreducibility. Such a property might seem obvious, but it opens up a trove of new questions about the behavioral study of computational rules (or, as we like to call it, ruliology) and its application to real-world systems (metamodeling).

A Pure n-Machine Definition

Let us try to produce a slightly more formal definition of multicomputational irreducibility. In order to do so, we must think about an idealized machine, one that differs from those that we might usually encounter. The machines that we typically find in theoretical computer science are 1-machines. Graph-theoretically, we model their discrete computations as 0-spaces (vertices) and the overall program that they execute as a 1-space (a path). For a 1-machine, the initial condition is a vertex, as is the final state, and every state in between. A 2-machine, on the other hand, is a machine that computes over paths (1-spaces), with the overall computation forming a sheetlike 2-path (which, prima facie, is not unlike the terms of higher identity types constructed in univalent foundations).

Together, one can use 2-machines, 1-machines and higher n-machines to determine the path deformations, boundaries and genera of multicomputations, as one does in generalized homology with CW-complexes, homotopies and functors. And one can think of the “laps” performed by such machines over multicomputational graphs as playing a role similar to that of homotopy groups. But a key point here, as shall be explored later, is that such concepts are relevant to the experimental study of multicomputational irreducibility. For instance, a maximally confluent multicomputation consists of highly interdeformable paths; such a multicomputation is also easily predictable and thus multicomputationally reducible. As the multicomputation runs, we know that the paths are, and will continue to be, interdeformable.

On the other hand, with multicomputations for which 2-machine laps are non-Abelian (such that as the multicomputation proceeds, one cannot always “go back and forth” between paths), irreducibility is more likely. Because the paths are not all interdeformable, we do not know whether or not they will converge as the multicomputation continues to run. Another elementary example of reducible multicomputations is the class of fully ramified multicomputations that do not exhibit any confluence (and are thus entirely refractory to 2-machines); we can safely anticipate that confluence will never occur.

As of now, 2-machines and higher n-machines are idealizations. No known computational technologies can implement 2-machine capabilities. Nonetheless, 2-machines are useful for positing theoretical definitions. In particular, multicomputational irreducibility can be defined with reference to such machines: a computation with multiple evaluation fronts is multicomputationally irreducible if its corresponding 2-machine computation is undecidable. Put informally, if as a multicomputation proceeds it is unclear if and how one could compute “horizontally” (pathwise) from one evaluation front to another, then the multicomputation is itself irreducible. But, on the other hand, if one always knows that all paths are interdeformable, then the corresponding 2-machine computation is decidable (positively so), as is the case if we know that the multicomputation is fully ramified and never converges (negatively so). Alternatively, as Stephen once put it during a conversation, a multicomputation is irreducible if, in order to know what paths a multicomputation gives, one must simply compute all paths.

However, we do have a way to study multicomputational irreducibility even without 2-machines: branchial space. By extracting from a multicomputation its branchial graph, we can examine the relations between paths, and are thus effectively performing a “2-machine branchial reduction,” or a conversion of an (infeasible) 2-machine computability problem into something that our Physics Project makes tractable. (And Jonathan Gorard’s branchial Turing machine offers a nice connection to the TM and MTM examples given previously.)
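As a rough illustration of what extracting a branchial graph involves, here is a minimal Python sketch. It is not the Wolfram Language code behind the figures in this bulletin: the two rules (n → n + 1 and n → 2n), the initial condition and the helper names are stand-ins chosen for illustration, and the sketch assumes that two states at the same step are branchially connected when they share a parent at the previous step.

```python
from itertools import combinations

# Hypothetical rules, chosen only for illustration (not the bulletin's rules).
RULES = (lambda n: n + 1, lambda n: 2 * n)


def multiway_layers(init, rules, steps):
    """Return layers[t]: a dict mapping each state at step t to its parents."""
    layers = [{init: set()}]
    for _ in range(steps):
        nxt = {}
        for state in layers[-1]:
            for rule in rules:
                # Every rule is applied to every state: parallel evaluation fronts.
                nxt.setdefault(rule(state), set()).add(state)
        layers.append(nxt)
    return layers


def branchial_edges(layer):
    """Join two states of the same step when they share at least one parent."""
    return {
        (a, b)
        for a, b in combinations(sorted(layer), 2)
        if layer[a] & layer[b]
    }


layers = multiway_layers(2, RULES, 3)
for t, layer in enumerate(layers):
    print(t, sorted(layer), sorted(branchial_edges(layer)))
```

Each printed line gives a step, the states reached at that step, and the branchial edges among them. Keeping the parents of every state is the design choice that matters here, since the branchial relation is read off from shared ancestry rather than from edges of the original graph.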

Approximation of Multicomputational Irreducibility Using a Branchial Lyapunov Exponent

Branchial reductions of multicomputations provide a practical way to approximate irreducibility. In order to show how approximations as such can be done, let’s begin with some simple examples.

Consider the numerical multicomputation . As we shall see, it is multicomputationally reducible, as its paths are all interdeformable. Here is the multicomputation, which we run for six steps:

And here are the branchial graphs generated at successive steps:

Evidently, the branchial graphs do not evolve whatsoever, and neither do their corresponding distance matrices (given as array plots here):

And we can predict (correctly) that such will continue to be the case, indefinitely, for all future steps. Graphs for which all paths are interdeformable are multicomputationally reducible, as discussed previously.

Next, consider a multicomputation that is entirely ramified:

The corresponding branchial graphs consist of disconnected graph segments, each containing two vertices:

This behavior is also predictable. At step t, the number of segments in the branchial space is , where . A general remark: if one can readily provide an equation that predicts future branchial behavior, it is clear that one has found a case of multicomputational reducibility.
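To make the ramified case concrete, here is a hedged Python sketch using a stand-in pair of rules, n → 2n and n → 2n + 1, which never produce the same value twice. These rules, the initial condition 1 and the helper names are assumptions for illustration, not the multicomputation shown above. For this stand-in, the number of branchial segments at step t ≥ 1 works out to 2^(t−1); that closed form belongs to the hypothetical rule and is not a reconstruction of the expression given in the text.

```python
from collections import deque
from itertools import combinations
import math

# Stand-in rules (an assumption): n -> 2n and n -> 2n + 1 never collide,
# so the evolution is a binary tree with no confluence.
RULES = (lambda n: 2 * n, lambda n: 2 * n + 1)


def layer_at(init, rules, step):
    """Map each state reached at the given step to the set of its parents."""
    layer = {init: set()}
    for _ in range(step):
        nxt = {}
        for state in layer:
            for rule in rules:
                nxt.setdefault(rule(state), set()).add(state)
        layer = nxt
    return layer


def branchial_edges(layer):
    """Join two states of one step when they share a parent."""
    return {(a, b) for a, b in combinations(sorted(layer), 2)
            if layer[a] & layer[b]}


def distance_matrix(vertices, edges):
    """All-pairs graph distances by breadth-first search (inf if unreachable)."""
    neighbours = {v: set() for v in vertices}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    index = {v: i for i, v in enumerate(vertices)}
    matrix = [[math.inf] * len(vertices) for _ in vertices]
    for source in vertices:
        row = matrix[index[source]]
        row[index[source]] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in neighbours[v]:
                if row[index[w]] == math.inf:
                    row[index[w]] = row[index[v]] + 1
                    queue.append(w)
    return matrix


for t in range(1, 6):
    layer = layer_at(1, RULES, t)
    print(t, len(layer), len(branchial_edges(layer)))  # states, segments

layer = layer_at(1, RULES, 2)
for row in distance_matrix(sorted(layer), branchial_edges(layer)):
    print(row)
```

With the vertices sorted, the distance-1 entries sit directly beside the diagonal and every other off-diagonal entry is infinite, which is the "propagated segment" pattern in matrix form.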

Here’s another view of the same branchial evolution: an array plot of the graph distance matrices for each step. In this case, we see that the distance-1 branchial relation (i.e. the segment) is “propagated” consistently:

One of the initial tests that Stephen performed when studying randomness and autoplectic behavior among cellular automata was the measurement of their Lyapunov exponents. In the case of elementary cellular automata, one can measure Lyapunov exponents by calculating the slopes of the difference patterns that a small change to the initial condition produces. In order to study multicomputational irreducibility, we can introduce a branchial analog of the Lyapunov exponent, denoted by λ_B.

Lyapunov exponents measure the “drift” of initial conditions in dynamical systems. When we perform a branchial reduction of a 2-computation, the initial condition is the branchial graph at step 1, which in this case is a single branchial edge with two vertices (i.e. the segment). And, as we can see in this case, the initial distance does not “drift” as we iterate the ramified multicomputation given previously: the one-unit distance segment is simply “propagated” along the matrix diagonal. As a result, the array plot appears “linear” in that the same branchial distance is “passed” from one branchial pair to another. If, however, the branchial graph assumed a more elaborate, connected graphical configuration, then the branchial evolution would “drift” from the initial condition, with matrix entries appearing farther from the diagonal (as we’ll see shortly).

We can also describe λ_B more formally. Consider the entries of a branchial graph distance matrix D, indexed by row r and column c. Now, consider only those entries for which the distance matrix value d is 1 (which constitute the entries of the adjacency matrix):

Row and column positions of each entry are obtained via a position query :

And yields the positions nearest to the diagonal

where

We measure λ_{B_i} by taking, for each i:

In the case of our ramified multicomputation, for each node in the branchial graph, λ_B = 0:

And in general, “λ_B plot flatness” is a visual heuristic for multicomputational reducibility. In the case of the ramified multicomputation, the result is not surprising. We can see that confluence will never happen; thus, the multicomputation is reducible.

Next, let us consider multicomputations with nontrivial confluence, such as this one, which we run for seven steps:

We see that the branchial distance matrices exhibit slightly more “complex” behavior and cannot be as easily predicted:

Consider now the following multicomputation:

Its branchial graphs are nontrivial

as are their corresponding distance matrices:

Notice here that the original condition, the branchial segment, is not simply “passed” to other segments. With time, as the branchial graph takes shape, there are many branchial vertices that are one branchial unit of distance away from others in a large connected component. Thus, the initial condition is effectively “diffused” throughout branchial space, rather than just being “propagated” at each step in segmentary form.

Here are the λ_{B_i} values for the branchial graph obtained after 10 steps:

As one can see, this plot is far from being flat.

One should keep in mind that our distance matrices can be permuted, and that the distance matrices that we obtain by default in the Wolfram Language correspond to a default assignment of vertex numberings. But it so happens that the matrices that we obtain by default make it particularly easy to study the predictability of branchial behavior. Nik Murzin has suggested that an ensemble-like λ_B measure could be obtained by considering all possible matrix permutations. Such an approach will be subject to further study.

Foliation as Multicomputational Choice

Multicomputational research differs from computational research in a number of ways. One key difference between the two is that we enjoy the prerogative of choice when doing multicomputational research; there are different options available to us that aren’t available when we study single-way computations. When one runs a single computation—and an irreducible one in particular—one has little choice other than to run it and study its behavior. But in the multicomputational case, one is able to make choices in how one studies behavior. This is the case because, when one computes multiple evaluation fronts, one can “synchronize” or “coordinate” multicomputational states in different ways.

And yes—we actually do have a choice in this matter. One might presume that, when we run multiple rules, it is simply the case that all states “from step 1” of each computation are synchronized, all states that result “from step 2” are synchronized and so on. But, as will be shown, one has a choice in the “simultaneity orientation” that dictates which vertices in the multicomputational graph “occur together” at each step.

The researcher makes such choices by selecting a foliation. Originally, the foliation was proposed as a theoretical concept and computational technique for the Physics Project. But we have since come to understand that foliation choices can be made for all multicomputations. Curiously, with multicomputation, we are making the study of computational behavior even more “complex” than before by building systems with many rules (or rules that can be evaluated in different ways); nevertheless, by doing so, we reintroduce a prerogative of choice into our computational methodology that we do not have when we study rules individually. And we anticipate that choice as such affords tremendous metamodeling advantages. For instance, we think that multicomputation makes it possible to capture the aggregate behavior of systems, so long as one selects the right foliation. And we believe that observation and the general transduction of all systems can be metamodeled this way.

But why is choice of foliation important? As we will see, λ_B approximations for the same multicomputation differ by foliation. For instance, let’s revisit the multicomputation , this time experimenting with possible foliations. Nine such foliations are given here (using Nik’s ever-helpful GraphFoliations function):

Each foliation includes 8–9 foliation slices (which, in the original Physics Project metamodel, are hypersurfaces). For the first foliation shown, the final slice includes vertices 14, 16, 17 and 18, whereas the final slice in the last foliation contains only vertex 14. In order to better understand the difference between these foliations, we can plot the “cardinality” of each foliation slice (i.e. the number of vertices in each) for 500 foliations of the same multicomputation:

We might notice that, even though we take 500 different foliations, we don’t see in the previous plot what appear to be 500 distinct “trajectories.” But we should be able to partially disaggregate the trajectories because there are many foliation slices that share a common cardinality value but include different states. As a proxy measure, we can sum the states in each foliation slice (because, if the states differ, then their sums often will too):

Given here is a histogram showing the foliation slice cardinality variance

and the kurtosis:

It appears as though the most “probable” foliation choices do not minimize cardinality. Thus, foliations are not “all the same,” and if one wants slices that minimize volatility (i.e. variance) or surprisal (i.e. kurtosis), then one must select an “improbable” foliation, which requires careful choice.
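The slice statistics discussed above can be sketched in Python under some loudly stated assumptions. The sketch does not use or reproduce GraphFoliations. It treats a foliation as an ordered split of a directed acyclic graph's vertices into slices such that every edge points from an earlier slice to a later one, and it samples such splits at random. The small graph, the sampling scheme, the population form of the variance and the non-excess form of the kurtosis are all choices made here for illustration.

```python
import random

# A small hypothetical DAG (vertex -> successors), standing in for the
# state graph of a multicomputation.
DAG = {1: [2, 3], 2: [4, 5], 3: [5, 6], 4: [7], 5: [7, 8],
       6: [8], 7: [9], 8: [9], 9: []}


def random_foliation(dag, rng):
    """Split the vertices into ordered slices, all edges pointing forward."""
    preds = {v: set() for v in dag}
    for u, successors in dag.items():
        for v in successors:
            preds[v].add(u)
    placed, slices = set(), []
    while len(placed) < len(dag):
        # A vertex becomes available once all of its predecessors are placed.
        ready = sorted(v for v in dag if v not in placed and preds[v] <= placed)
        chosen = rng.sample(ready, rng.randint(1, len(ready)))
        slices.append(sorted(chosen))
        placed.update(chosen)
    return slices


def is_valid_foliation(dag, slices):
    """Every vertex appears exactly once and every edge goes forward."""
    position = {v: i for i, s in enumerate(slices) for v in s}
    return (sorted(position) == sorted(dag)
            and sum(len(s) for s in slices) == len(dag)
            and all(position[u] < position[v] for u in dag for v in dag[u]))


def central_moment(values, k):
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)


def variance(values):
    return central_moment(values, 2)


def kurtosis(values):
    """Non-excess kurtosis m4 / m2^2; nan when all values agree."""
    m2 = central_moment(values, 2)
    return central_moment(values, 4) / m2 ** 2 if m2 else float("nan")


rng = random.Random(0)  # fixed seed so the run is repeatable
foliations = [random_foliation(DAG, rng) for _ in range(500)]
cardinalities = [[len(s) for s in f] for f in foliations]
sums = [[sum(s) for s in f] for f in foliations]  # proxy for slice contents
print(foliations[0], cardinalities[0], sums[0])
print(variance(cardinalities[0]), kurtosis(cardinalities[0]))
```

Histograms of the per-foliation variance and kurtosis would then be built from the `cardinalities` lists. Uniform sampling over subsets of available vertices is only one way to weight foliations, so the resulting notion of a "probable" foliation depends on that choice.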

As we shall see, careful choice of foliation can mitigate multicomputational irreducibility by allowing one to “observe” the multicomputational evolution without “taking in too much at once.” Let’s consider two random foliations for the rule , run for seven steps:

And here are array plots for the graph distance matrices for the respective branchial graphs (computed up to 11 steps):

Let’s compare their respective λ_{B_i} values:

Overall, foliation 2 exhibits “tamer” branchial Lyapunov exponent values. And looking at the previous array plots, we see that the evolution of the multicomputation with foliation 2 is more “gradual” than that of foliation 1.

An Interlude: On Non-Archimedean Reductions and Rulial Primes

Understanding branchial reductions might require honing some unfamiliar modeling sensibilities. This is because branchial distance is non-Archimedean, unlike most graph distances. In a conventional graph, we straightforwardly measure the distance between vertices in terms of the edges that join them. Such a measure of graph distance obeys the Archimedean property: even if the distance d(c, a) from vertex c to vertex a is greater than the distance d(b, a) from a distinct vertex b to vertex a, there is still some positive integer n for which n × d(b, a) > d(c, a). Put differently, one can multiply the distance from b to a by some n and obtain a distance greater than that from c to a. Consider the distance function

computed as a density plot:

Here, d is Archimedean. But branchial distance measures something different. It measures the shared graph ancestry of vertices that may not be joined by edges in the original graph at all; thus, it is non-Archimedean. A common non-Archimedean system in mathematics is that of the p-adic numbers. The p-adic distance d_p(x, y) between distinct x and y, for some prime p, is the reciprocal of the greatest power of p that divides the difference between x and y (with d_p(x, x) = 0). Writing v_p(n) for the largest exponent k such that p^k divides n:

d_p(x, y) = p^(−v_p(x − y))

Being non-Archimedean, p-adic distance can be difficult to visualize. And p-adic space itself is totally disconnected. But nonetheless, for expository purposes, we can construct a “Euclidean-interpolated” contour space of p-adic distances for −10 ≤ x ≤ 10, −10 ≤ y ≤ 10, p = 2, 3, 5, 7, 11, 13. Here, we plot the contours over different surfaces in order to provide a cursory glimpse of the four-dimensional contours:

Branchial distance and p-adic distance share some similarities. When we measure branchial distance, we are effectively asking, “For any two vertices in a graph, how many steps backward, going from vertices to the prior vertices that feed into those vertices, must we take until we find a common vertex?” Similarly, p-adic distance concerns relationships between numbers in a way that corresponds to prime factorization. In neither case is one asking, “How far in some direction must one travel directly to reach there from here?” Rather, in both cases, one is asking, “How far removed are these two things from one another in terms of common feeders?”
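The comparison can be made concrete with a short Python sketch. The p-adic part follows the standard definition for integers. The "steps back to a common ancestor" part uses a hypothetical fully ramified binary tree (n → 2n, n → 2n + 1), in which the parent of n is n // 2; that tree is a stand-in, not one of the multicomputations in this bulletin. Both distances satisfy the strong triangle inequality d(a, c) ≤ max(d(a, b), d(b, c)), which is the usual hallmark of a non-Archimedean distance.

```python
from fractions import Fraction
from itertools import product


def p_adic_valuation(n, p):
    """Largest k such that p**k divides the nonzero integer n."""
    if n == 0:
        raise ValueError("the valuation of 0 is infinite")
    n, k = abs(n), 0
    while n % p == 0:
        n //= p
        k += 1
    return k


def p_adic_distance(x, y, p):
    """d_p(x, y) = p**(-v_p(x - y)), with d_p(x, x) = 0."""
    if x == y:
        return Fraction(0)
    return Fraction(1, p ** p_adic_valuation(x - y, p))


def ancestor_distance(a, b):
    """Steps back to a common ancestor in the hypothetical binary tree
    n -> 2n, n -> 2n + 1, for two states reached at the same step."""
    steps = 0
    while a != b:
        a, b, steps = a // 2, b // 2, steps + 1
    return steps


def is_ultrametric(points, dist):
    """Check the strong triangle inequality on every triple of points."""
    return all(
        dist(a, c) <= max(dist(a, b), dist(b, c))
        for a, b, c in product(points, repeat=3)
    )


print(p_adic_distance(1, 9, 2))  # 9 - 1 = 8 = 2**3
print(is_ultrametric(range(-10, 11), lambda x, y: p_adic_distance(x, y, 3)))
print(is_ultrametric(range(8, 16), ancestor_distance))  # step-3 states from 1
```

The tree case is only the simplest setting in which ancestry-based distance is visibly ultrametric; branchial graphs with confluence need not reduce to it.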

However, the affinities between branchial and p-adic distance become somewhat more concrete when the feeders themselves embody some quality of primality, and primality as such need not be limited to the prime numbers. For instance, consider the case of metamathematical space. One important finding that Stephen made in A New Kind of Science is that, when one lists all possible theorems that can be proved from the axioms of Boolean algebra, it so happens that those that are “named” and are subject to human study are precisely those theorems that “cannot be derived from preceding theorems.” In the following, I plot graphically the dependencies between 60 theorems of Boolean algebra (the same that Stephen considers in the section on empirical metamathematics in the metamathematics piece):

Here, each of the named theorems is an attractor, a common ancestor without its own respective ancestor. I propose that ancestorless theorems in metamathematics are a particular case of rulial primes, objects in entailment fabrics (coarsened slices of the Ruliad that serve as reference frames for observers) that are not constructed from the application of a rule to another object. Of course, all objects are constructed from lower-level computations, with the ultimate primordia of the Ruliad being emes. But the point is that multicomputation gives rise to aggregate properties that can be captured by observers (or instruments that transduce values from systems), by merit of interactions between the different evaluation fronts in the multicomputation. And it is in the aggregate context that rulial primes, untethered from lower-level computational dynamics, arise. In the case of Boolean algebra, the theorems that share a rulial prime as a common “sink” are connected in branchial space, such that the branchial distance between each is a non-Archimedean distance taken with respect to a rulial prime (which is not too different, conceptually, from p-adic distance).

Thus, a branchial 2-machine reduction might not simply be an expedient gadget; it might be important in its own right. It might help us to identify particular bulk objects that “stand out” in systems.

ϱ-Varieties: A Multicomputational Response to Arithmetic and Algebraic Geometry

Multicomputation is a paradigmatic successor to computation, with computation itself already being a successor to the mathematical paradigm. However, it is perfectly possible for multicomputation to “reach backward” two paradigms and metamodel mathematics, even in exotic ways. But multicomputation should also allow us to compute certain objects of mathematics that are not in the canon of the mathematical paradigm, precisely because they involve multicomputational irreducibility and thus cannot be easily studied without some experimental, generative procedure.

For a numerical multicomputation, confluence occurs when different rules yield the same value; that is, their outputs agree. And these outputs are just values that satisfy multiple rules. But the idea that we can study values that satisfy multiple formal specifications is not new. In areas of mathematics such as algebraic geometry, we study common values that satisfy systems of polynomial equations (known as solutions or “roots”) as spaces, known as algebraic varieties. A well-known example of an algebraic variety is an elliptic curve, consisting of solutions to the equation y² = x³ + a x + b. The following are examples of this particular algebraic variety with values for a and b toggled:

And in arithmetic geometry, one seeks to answer questions such as the number of solutions admitted by a variety. Arithmetic geometry is sometimes described as the study of the “complexity” of varieties, though perhaps it doesn’t capture as much complexity as it could.
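As one hedged example of what counting the solutions of a variety can look like, the following Python sketch counts the pairs (x, y) that satisfy y² = x³ + a x + b modulo a prime p. Working over a finite field is an assumption made here (it is one common setting for such counts), the function names are illustrative, and the brute-force approach is only practical for small primes p ≥ 5.

```python
def affine_point_count(a, b, p):
    """Count pairs (x, y) in F_p x F_p with y^2 = x^3 + a x + b (mod p)."""
    # square_roots[r] = how many y in F_p satisfy y^2 = r (mod p).
    square_roots = [0] * p
    for y in range(p):
        square_roots[y * y % p] += 1
    return sum(square_roots[(x ** 3 + a * x + b) % p] for x in range(p))


def is_nonsingular(a, b, p):
    """For p >= 5, the curve is nonsingular when 4a^3 + 27b^2 != 0 (mod p)."""
    return (4 * a ** 3 + 27 * b ** 2) % p != 0


for p in (5, 7, 11, 13):
    if is_nonsingular(1, 1, p):
        print(p, affine_point_count(1, 1, p))
```

For a nonsingular curve, Hasse's theorem bounds how far this count can stray from p, so the answer is a single number with a known envelope. That is the static picture that the generative approach in the next paragraphs sets out to contrast.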

In the case of multicomputation, however, we don’t study static equations. We study systems composed of rules, such as , which can be iterated for an indefinite number of steps. Were we to write such rules as equations, we would use recursive functions; in this case, , where for some initial condition . And, for many multicomputations, one can extract from the overall multicomputation the graph of its confluent values, which I will call a rulial variety (shortened as ϱ-variety).

For something generative like a ϱ-variety, it is much more interesting to study the behavior with which confluent values appear than the total number of “solutions” (which, in arithmetic geometry, we would approximate using some height function). And this behavior can be multicomputationally irreducible.

Consider the multicomputation . In the following, we begin with initial condition 2 and multicompute for two steps:

And here, we multicompute for eight steps:

There are vertices in these graphs with indegree three (maximal indegree); that is, there are values that satisfy all three rules. Thus we can, in turn, extract from this eight-step multicomputation a ϱ-variety, which is a “subgraph” composed of all vertices with maximal indegree. Indegree must be maximal, for vertices with less than maximal indegree are not numerical values that satisfy all rules. The ϱ-variety for the eight-step multicomputation is shown here:
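The extraction step can be sketched in Python as follows. The three rules used here (n → n + 1, n → n + 2 and n → 2n), the initial condition 1 and the function names are hypothetical stand-ins, not the rule discussed above. The sketch also assumes that equal states are merged and duplicate edges collapsed, so that indegree counts distinct predecessor states.

```python
from collections import Counter

# Three hypothetical rules (stand-ins, not the bulletin's rules).
RULES = (lambda n: n + 1, lambda n: n + 2, lambda n: 2 * n)


def state_graph(init, rules, steps):
    """Directed edges (state, successor) of the multiway evolution, with
    equal states merged and duplicate edges collapsed."""
    edges, frontier = set(), {init}
    for _ in range(steps):
        new_edges = {(s, rule(s)) for s in frontier for rule in rules}
        edges |= new_edges
        frontier = {dst for _, dst in new_edges}
    return edges


def max_indegree_subgraph(edges):
    """Induced subgraph on the vertices whose indegree is maximal."""
    indegree = Counter(dst for _, dst in edges)
    top = max(indegree.values())
    keep = {v for v, d in indegree.items() if d == top}
    return keep, {(u, v) for u, v in edges if u in keep and v in keep}


edges = state_graph(1, RULES, 5)
variety_vertices, variety_edges = max_indegree_subgraph(edges)
print(sorted(variety_vertices))
print(sorted(variety_edges))
```

With these stand-in rules, the maximal indegree equals the number of rules, so the retained vertices are exactly the values reached by all three rules from three different predecessors.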

This ϱ-variety possesses notable properties. For instance, it appears to be the case that, if we compute the eight-step ϱ-variety for the rule with any two initial conditions, the resulting ϱ-varieties are isomorphic to one another. Here are the eight-step ϱ-varieties for initial conditions four and five:

They look isomorphic. And indeed, we can prove such to be the case:

And this suggests that, for the rule , one could take the ϱ-variety for all initial conditions and obtain a holochaotic moduli space composed of initial conditions in an isomorphism class (with “holochaotic” being a portmanteau of χάος and ὅλος to connote “possessing all initial states”). And, in principle, one can generate multicomputations that begin with different initial conditions, which makes it easy to study chaos.

We can also continue to yield sub-ϱ-varieties, or subgraphs of ϱ-varieties that are in turn their own respective ϱ-varieties. And we do so by forming a subgraph of vertices that themselves have maximal indegree in the ϱ-variety itself. For the multicomputation , we can yield two further sub-ϱ-varieties, the last of which is a minimal sub-ϱ-variety:
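Repeating the extraction can be sketched as a short Python loop. The graph below is a hand-made hypothetical example rather than a ϱ-variety computed from the rule above, and the stopping rule (halt once the vertex set no longer shrinks) is an assumption about what "minimal" means here.

```python
from collections import Counter

# A hypothetical graph standing in for a rho-variety (not computed from
# the bulletin's rule).
EDGES = {("a", "d"), ("b", "d"), ("c", "d"), ("a", "e"), ("b", "e"),
         ("c", "e"), ("d", "f"), ("e", "f"), ("a", "f"), ("d", "g"),
         ("e", "g"), ("b", "g"), ("f", "h"), ("g", "h")}
VERTICES = {v for edge in EDGES for v in edge}


def max_indegree_subgraph(vertices, edges):
    """Induced subgraph on the vertices whose indegree is maximal."""
    indegree = Counter({v: 0 for v in vertices})
    indegree.update(dst for _, dst in edges)
    top = max(indegree.values())
    keep = {v for v in vertices if indegree[v] == top}
    return keep, {(u, v) for u, v in edges if u in keep and v in keep}


def peel(vertices, edges):
    """Repeat the extraction until the graph stops shrinking."""
    chain = [(set(vertices), set(edges))]
    while True:
        nxt = max_indegree_subgraph(*chain[-1])
        if nxt[0] == chain[-1][0]:
            return chain
        chain.append(nxt)


for level, (vs, es) in enumerate(peel(VERTICES, EDGES)):
    print(level, sorted(vs), sorted(es))
```

Level 0 is the starting graph and each later level is the subgraph of maximal-indegree vertices of the level before it. Once a level has no edges, every vertex has indegree zero, the vertex set stops changing and the loop ends.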

Finally, consider the original ϱ-variety and its sub-ϱ-varieties together:

As one can see, the branchial graphs for the ϱ-variety and the first sub-ϱ-variety consist of paths that are all interdeformable (with the minimal sub-ϱ-variety possessing only one branchial path):

Thus, it appears to be the case that the branchial 2-machine reductions are decidable.

Now, let us once again examine the nettlesome multicomputation . Here are the multicomputations for initial conditions 2, 3 and 4:

Next, we can compute their respective ϱ-varieties:

These are clearly not isomorphic:

And the distance matrices for their respective branchial graphs differ considerably (chaotically) according to choice of initial condition. Here are array plots for graph distance matrices for 15 different initial conditions (ranging from 3 to 45, increasing by increments of three):

We can compare initial conditions by measuring the entropies of the graph distance matrices corresponding to their multicomputations. The lower the entropy, the more uniform the values of the matrix. Here, we plot the entropy over initial conditions from 2 to 101:

Disregarding the cases of small-valued initial conditions, the plot seems to exhibit random walk–like behavior, suggesting that for the rule , one branchial graph for a given initial condition tells us little about the corresponding branchial graph for another. (Note that here we are studying initial conditions for the rule itself, rather than the initial branchial conditions, which we study when we measure λ_B.)
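For readers who want to reproduce an entropy comparison of this kind, here is a minimal Python sketch. It assumes that "entropy" means the Shannon entropy of the empirical distribution of matrix entries, taken with the natural logarithm. Both that reading and the two small example matrices are assumptions made here, not details taken from the plots above.

```python
from collections import Counter
from math import inf, log


def matrix_entropy(matrix):
    """Shannon entropy (natural log) of the distribution of matrix entries."""
    entries = [x for row in matrix for x in row]
    counts = Counter(entries)
    total = len(entries)
    return -sum(c / total * log(c / total) for c in counts.values())


# Two hypothetical distance matrices: one with a single repeated value and
# one from two disconnected branchial segments.
uniform = [[1, 1], [1, 1]]
ramified = [[0, 1, inf, inf],
            [1, 0, inf, inf],
            [inf, inf, 0, 1],
            [inf, inf, 1, 0]]

print(matrix_entropy(uniform), matrix_entropy(ramified))
```

A matrix whose entries all agree has zero entropy, and the value grows as the entries spread over more distinct distances. Plotting this number against the initial condition gives a curve of the kind described above.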

There is much more to be explored with respect to ϱ-varieties. I introduce them here to reinforce the idea that the behavior of interacting computational rules (such as satisfaction of common values) can be studied behaviorally (i.e. ruliologically) rather than in limit cases (as arithmetic geometry does when estimating the number of solutions for systems of polynomial equations). And, what is more, there are many questions of multicomputational irreducibility (as well as chaotic and holochaotic behavior) that can be examined when studying ϱ-varieties and sub-ϱ-varieties.

Some Next Steps: Metamodeling Physics

For those interested in straightforward, initial projects on multicomputational irreducibility, the Wolfram Physics Project presents a few opportunities. It is suspected that multicomputational irreducibility is an important concept for metamodeling quantum interference. (In fact, it was during a conversation with Stephen on the topic that I first proposed the concept of multicomputational irreducibility.) There also exists a Registry of Notable Universes on the Physics Project website, in which the graph distance matrices for Wolfram models are already computed. Thus, it should not be too difficult to identify examples that, at least after a certain number of steps, exhibit irreducibility.

Conclusion

Multicomputation is just beginning. And it appears as though there are countless open questions that we can consider. A careful reader might note that this bulletin introduces several new concepts and raises many open questions, with the findings provided being specific to selected case studies and lacking in generality. This piece is intended to serve as an invitation, with many ideas and provocations introduced with brevity. As the popularity of this paradigm grows, the open questions raised and case studies suggested in early multicomputational works can serve as guideposts for those who are interested and wish to find a way to make helpful contributions.

The computational paradigm of basic science was motivated by the finding that many computations (which are the most fundamental models of discrete processes that follow rules) are unpredictable and autoplectic in their behavior. Multicomputation is an exciting new paradigm because, thanks to the prerogative of choice afforded by foliations, it appears as though what we had previously thought to be physical limits to the understandability of algorithmic behavior can in fact be “negotiated.” As we proceed from ruliologically studying individual processes to systems of interacting processes, we expect to develop a general theory and metamodeling practice with which we can extract bulk, aggregate properties from multicomputations.

Such a goal might sound ambitious, but our quotidian experience as humans suggests overwhelmingly that, despite the dizzying behavior of low-level processes, we, as observers of the world, enjoy a phenomenology with nice “physical UX” and an “interface of macros.” Thus, the foliation can be thought of as the abstract equivalent of a user interface in the pure sciences. And UIs have been incredibly important in that they have made it possible for more people to technologically harness computational power. And in the case of multicomputation, foliations should allow us to scientifically interface between low-level and high-level computational behaviors (including high-level behaviors corresponding to things in the world that we care about), effectively fashioning a bridge from pure computation to the applied sciences. And we will be pursuing many projects in applied multicomputation at the Wolfram Institute for this very reason.

Lastly, multicomputation provides a way to understand the construction and exploration of the Ruliad; it is constructed theoretically by running all possible computations at once, and we explore it by taking foliations over “slices” of the Ruliad and transporting such foliations across rulial space. More will be written about the exploration of the Ruliad—a paramount scientific imperative of our time, also to be pioneered by the Wolfram Institute—on another occasion.

A Note of Appreciation

I would like to thank Stephen Wolfram for taking an interest in my ideas on multicomputational irreducibility and providing helpful advice on how to communicate them effectively to others; Nik Murzin for identifying errors and points in need of greater clarification; Hatem Elshatlawy for his suggestions regarding descriptions of foundational concepts; and Xerxes Arsiwalla for encouraging me to think about long-term directions in which I can take the study of multicomputational irreducibility.

Twenty Years Later: The Surprising Greater Implications of A New Kind of Science

Stephen Wolfram — Mon, 16 May 2022 02:13:49 +0000

From the Foundations Laid by A New Kind of Science

When A New Kind of Science was published twenty years ago I thought what it had to say was important. But what's become increasingly clear—particularly in the last few years—is that it's actually even much more important than I ever imagined. My original goal in A New Kind of Science was to take a step beyond the mathematical paradigm that had defined the state of the art in science for three centuries—and to introduce a new paradigm based on computation and on the exploration of the computational universe of possible programs. And already in A New Kind of Science one can see that there's immense richness to what can be done with this new paradigm.

On the Concept of Motion

Stephen Wolfram — Fri, 18 Mar 2022 17:00:10 +0000

How Is It That Things Can Move?

It seems like the kind of question that might have been hotly debated by ancient philosophers, but would have been settled long ago: how is it that things can move? And indeed with the view of physical space that’s been almost universally adopted for the past two thousand years it’s basically a non-question. As crystallized by the likes of Euclid it’s been assumed that space is ultimately just a kind of “geometrical background” into which any physical thing can be put—and then moved around.

But in our Physics Project we’ve developed a fundamentally different view of space—in which space is not just a background, but has its own elaborate composition and structure. And in fact, we posit that space is in a sense everything that exists, and that all “things” are ultimately just features of the structure of space. We imagine that at the lowest level, space consists of large numbers of abstract “atoms of space” connected in a hypergraph that’s continually getting updated according to definite rules and that’s a huge version of something like this:

The Physicalization of Metamathematics and Its Implications for the Foundations of Mathematics

Stephen Wolfram — Mon, 07 Mar 2022 18:45:50 +0000

Mathematics and Physics Have the Same Foundations

One of the many surprising (and to me, unexpected) implications of our Physics Project is its suggestion of a very deep correspondence between the foundations of physics and mathematics. We might have imagined that physics would have certain laws, and mathematics would have certain theories, and that while they might be historically related, there wouldn’t be any fundamental formal correspondence between them.

But what our Physics Project suggests is that underneath everything we physically experience there is a single very general abstract structure—that we call the ruliad—and that our physical laws arise in an inexorable way from the particular samples we take of this structure. We can think of the ruliad as the entangled limit of all possible computations—or in effect a representation of all possible formal processes. And this then leads us to the idea that perhaps the ruliad might underlie not only physics but also mathematics—and that everything in mathematics, like everything in physics, might just be the result of sampling the ruliad.

The Concept of the Ruliad

Stephen Wolfram — Wed, 10 Nov 2021 18:35:52 +0000

The Entangled Limit of Everything

I call it the ruliad. Think of it as the entangled limit of everything that is computationally possible: the result of following all possible computational rules in all possible ways. It’s yet another surprising construct that’s arisen from our Physics Project. And it’s one that I think has extremely deep implications—both in science and beyond.

In many ways, the ruliad is a strange and profoundly abstract thing. But it’s something very universal—a kind of ultimate limit of all abstraction and generalization. And it encapsulates not only all formal possibilities but also everything about our physical universe—and everything we experience can be thought of as sampling that part of the ruliad that corresponds to our particular way of perceiving and interpreting the universe.

We’re going to be able to say many things about the ruliad without engaging in all its technical details. (And—it should be said at the outset—we’re still only at the very beginning of nailing down those technical details and setting up the difficult mathematics and formalism they involve.) But to ground things here, let’s start with a slightly technical discussion of what the ruliad is.

Pregeometric Spaces from Wolfram Model Rewriting Systems as Homotopy Types

Xerxes D. Arsiwalla — Thu, 04 Nov 2021 16:24:12 +0000

How do spaces emerge from pregeometric discrete building blocks governed by computational rules? To address this, we investigate non-deterministic rewriting systems (multiway systems) of the Wolfram model. We formalize these rewriting systems as homotopy types. Using this new formulation, we outline how spatial structures can be functorially inherited from pregeometric type-theoretic constructions. We show how higher homotopy types are constructed from rewriting rules. These correspond to morphisms of an n-fold category. Subsequently, the n→∞ limit of the Wolfram model rulial multiway system is identified as an ∞-groupoid, with the latter being relevant given Grothendieck's homotopy hypothesis. We then go on to show how this construction extends to the classifying space of rulial multiway systems, which forms a multiverse of multiway systems and carries the formal structure of an (∞,1)-topos. This correspondence to higher categorical structures offers a new way to understand how spaces relevant to physics may result from pregeometric combinatorial models. The key issue we have addressed here is to formally relate abstract non-deterministic rewriting systems to higher homotopy spaces. A consequence of constructing spaces and geometry synthetically is that it removes ad hoc assumptions about geometric attributes of a model such as an a priori background or pre-assigned geometric data. Instead, geometry is inherited functorially from globular structures. This is relevant for formally justifying different choices of underlying spacetime discretization adopted by various models of quantum gravity. Finally, we end with comments on how the framework of higher category-theoretic combinatorial constructions developed here, corroborates with other approaches investigating higher categorical structures relevant to the foundations of physics.

Multicomputation with Numbers: The Case of Simple Multiway Systems

Stephen Wolfram — Thu, 07 Oct 2021 16:54:09 +0000

A Minimal Example of Multicomputation

Multicomputation is an important new paradigm, but one that can be quite difficult to understand. Here my goal is to discuss a minimal example: multiway systems based on numbers. Many general multicomputational phenomena will show up here in simple forms (though others will not). And the involvement of numbers will often allow us to make immediate use of traditional mathematical methods.

A multiway system can be described as taking each of its states and repeatedly replacing it according to some rule or rules with a collection of states, merging any states produced that are identical. In our Physics Project, the states are combinations of relations between elements, represented by hypergraphs. We’ve also often considered string substitution systems, in which the states are strings of characters. But here I’ll consider the case in which the states are numbers, and for now just single integers.

And in this case multiway systems can be represented in a particularly simple way, with each state s just being repeatedly replaced according to:

s → {f₁(s), … , fₖ(s)}

For a “binary branching” case the update rule is

and one can represent the evolution of the system by the multiway graph which begins:

and continues (indicating by red and by blue):

With arbitrary “symbolic” this (“free multiway system”) tree is the only structure one can get. But things can get much less trivial when there are forms for , that “evaluate” in some way, because then there can be identities that make branches merge. And indeed most of what we’ll be discussing here is associated with this phenomenon and with the “entanglements” between states to which it leads.

It’s worth noting that the specific setup we’re using here avoids quite a lot of the structural complexity that can exist in multicomputational systems. In the general case, states can contain multiple “tokens”, and updates can also “consume” multiple tokens. In our case here, each state just contains one token (a single number), and this is what is “consumed” at each step. (In our Physics Project, a state corresponds to a hypergraph which contains many hyperedge tokens, and the update rule typically consumes multiple hyperedges. In a string substitution system, a state is a character string which contains many character tokens, and the update typically consumes multiple, in this case adjacent, character tokens.)
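The one-input, many-output setup described here can be sketched in a few lines of Python. This is only an illustrative stand-in, not the Wolfram Language code of the original; the function names `multiway_layers` and `multiway_edges` are hypothetical, and a rule is assumed to be any function returning a tuple of successor states.

```python
def multiway_layers(rule, start, steps):
    """Return successive layers (sets of states) of a one-input multiway system."""
    layers = [{start}]
    for _ in range(steps):
        # Apply the rule to every current state; building a set merges identical states.
        layers.append({new for state in layers[-1] for new in rule(state)})
    return layers


def multiway_edges(rule, start, steps):
    """Return the set of directed edges (state, successor) of the multiway graph."""
    edges = set()
    frontier = {start}
    for _ in range(steps):
        next_frontier = set()
        for state in frontier:
            for new in rule(state):
                edges.add((state, new))
                next_frontier.add(new)
        frontier = next_frontier
    return edges


def plus_one_two(n):
    """Example rule used for illustration: n -> {n + 1, n + 2}."""
    return (n + 1, n + 2)


print(multiway_layers(plus_one_two, 0, 3))
print(sorted(multiway_edges(plus_one_two, 0, 2)))
```

Because each layer is a set, two different paths that produce the same number automatically end at the same node, which is all that "merging" means in this setting.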

With the setup we’re using here there’s one input but multiple outputs (2 in the example above) each time the update rule is applied (with the inputs and outputs each being individual numbers). It’s also perfectly possible to consider cases in which there are multiple inputs as well as multiple outputs. But here we’ll restrict ourselves to the “one-to-many” (“traditional multiway”) case. And it’s notable that this case is exceptionally easy to describe in the Wolfram Language:

Multiway Systems Based on Addition

As our first example, let’s consider multiway systems whose rules just involve addition.

The trivial (“one-input, one-output”) rule

gives a multiway graph corresponding to a “one-way number line”:

The rule

gives a “two-way number line”:

But even

gives a slightly more complicated multiway graph:

What’s going on here? Basically each triangle represents an identity. For example, starting from 1, applying +1 twice gives 3, which is the same result as applying +2 once. Or, writing the rule in the form

the triangles are all the result of the fact that in this case

For the “number line” rule, it’s obvious that we’ll eventually visit every integer—and the +1, +2 rule also visits every integer.

Consider now instead of +1 and +2 the case of +2 and +3:

After a few steps this gives:

Continuing a little longer gives:

It’s a little difficult to see what’s going on here. It helps to show which edges correspond to +2 and +3:

We’ll return to this a little later, but once again we can see that there are cycles in this graph, corresponding to simple “commutativity identities”, such as

and

as well as “LCM identities” such as

(Note that in this case, all integers above 1 are eventually generated.)

Let’s look now at a case with slightly larger integers:

After 6 steps one gets a simple grid

essentially made up of “commutativity identities”. But continuing a little longer one sees that it begins to “wrap around”

eventually forming a kind of “tube” with a spiral grid on the outside:

The “grid” is defined by “commutativity identities”. But the reason it’s a “closed tube” is that there are also “LCM identities”. To understand this, unravel everything into a grid with +4 and +7 directions—then draw lines between the duplicated numbers:

The “tube” is formed by rolling the grid up in such a way as to merge these numbers. But now if we assume that the multiway graph is laid out (in 3D) so that each graph edge has unit length, application of Pythagoras’s theorem in the picture above shows that the effective circumference of the tube is √(4² + 7²) = √65.

In another representation, we can unravel the tube by plotting numbers at {x, y} according to their decomposition in the form n = 4x + 7y:

(From this representation we can see that every value of n can be reached so long as n > 17.)

For the rule

the multiway graph forms a tube of circumference which can be visualized in 3D as:

And what’s notable here is that even though we’re just following a simple discrete arithmetic process, we’re somehow “inevitably getting geometry” out of it. It’s a tiny, toy example of a much more general and powerful phenomenon that seems to be ubiquitous in multicomputational systems—and that in our models of physics is basically what leads to the emergence of things like the limiting continuum structure of space.

We’ve seen a few specific examples of “multiway addition systems”. What about the more general case?

For

a “tube” is generated with circumference

√(a′² + b′²)

where {a′, b′} = {a, b}/GCD[a, b].

After enough steps, all integers of the form k GCD[a, b] will eventually be produced—which means that all integers are produced if a and b are relatively prime. There’s always a threshold, however, given by FrobeniusNumber[{a, b}]—which for a and b relatively prime is just a b – a – b.
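The threshold behavior can be checked by brute force. The following Python sketch (with hypothetical helper names) enumerates the numbers reachable from 0 and compares the largest unreachable one with a b – a – b. It assumes a and b are relatively prime and both greater than 1, so that at least one unreachable number exists.

```python
from math import gcd


def reachable_by_addition(increments, limit):
    """Integers <= limit reachable from 0 by repeatedly adding any of the increments."""
    reached = {0}
    pending = [0]
    while pending:
        n = pending.pop()
        for inc in increments:
            m = n + inc
            if m <= limit and m not in reached:
                reached.add(m)
                pending.append(m)
    return reached


def largest_unreachable(a, b):
    """Largest non-negative integer never generated by n -> {n + a, n + b} from 0.

    Assumes gcd(a, b) == 1 and a, b > 1. Searching up to a*b is enough,
    because every integer above a*b - a - b is reachable.
    """
    if gcd(a, b) != 1 or min(a, b) < 2:
        raise ValueError("a and b must be relatively prime and greater than 1")
    limit = a * b
    reached = reachable_by_addition((a, b), limit)
    return max(n for n in range(limit + 1) if n not in reached)


for a, b in [(2, 3), (4, 7), (3, 5)]:
    print(a, b, largest_unreachable(a, b), a * b - a - b)
```

When a and b share a common factor, the same enumeration only ever produces multiples of GCD[a, b], which is why no finite threshold exists for the remaining integers.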

By the way, a particular number n—if it’s going to be generated at all—will first be generated at step

(Note that the fact that the multiway graph approximates a finite-radius tube is a consequence of the commensurability of any integers a and b. If we had a rule like , we’d get an infinite 2D grid.)

For

a tube is again formed, with a circumference effectively determined by the smallest pair (after GCD reduction) of a, b and c. And if GCD[a, b, c] = 1, all numbers above FrobeniusNumber[{a, b, c}] will eventually be generated.

Pure Multiplication

Among the simplest multiway systems are those based on pure multiplication. An example is (now starting from 1 rather than 0):

In general, for

we’ll get a simple 2D grid whenever a and b aren’t both powers of the same number. With d elements in the rule we’ll get a d-dimensional grid. For example,

gives a 3D grid:

If the multipliers in the rule are all powers of the same number, the multiway graph degenerates to some kind of ladder. In the case

this is just:

while for

it is

and in general for

it is a “width-m” ladder graph.
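One rough way to see the difference between a grid and a ladder is to count how many distinct numbers have been reached after t steps. The Python sketch below does this; the multiplier sets {2, 3}, {2, 3, 5} and {2, 4} are chosen here purely for illustration, and the helper name is hypothetical. Quadratic growth corresponds to a 2D grid, cubic growth to a 3D grid, and linear growth to the degenerate ladder case.

```python
def cumulative_states(multipliers, steps, start=1):
    """Number of distinct states reached within t steps, for t = 0..steps."""
    seen = {start}
    frontier = {start}
    counts = [1]
    for _ in range(steps):
        # Multiply every state on the current frontier by every multiplier.
        frontier = {n * m for n in frontier for m in multipliers}
        seen |= frontier
        counts.append(len(seen))
    return counts


print(cumulative_states((2, 3), 5))      # independent multipliers
print(cumulative_states((2, 3, 5), 5))   # three independent multipliers
print(cumulative_states((2, 4), 5))      # both powers of 2
```

With multipliers 2 and 3 every state is 2^i 3^j, so the counts are triangular numbers; with 2 and 4 every state is a power of 2, so only the exponent matters and the growth is linear.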

Multiplication and Addition: n ⟼ {a n, n + b}

Let’s look now at combining multiplication and addition—to form what we might call affine multiway systems. As a first example, consider the case (which I actually already mentioned in A New Kind of Science):

Considering the simplicity of the rule by which it was generated, this result looks surprisingly complex. One immediate result is that after t steps, the total number of distinct numbers reached is Fibonacci[t – 1], which increases exponentially like φ^t, where φ ≈ 1.618 is the golden ratio. Eventually the n + 1 ensures that every integer is generated. But the 2n often “jumps ahead”, and since the maximum number generated at step t grows like 2^t, the “average density” of numbers falls exponentially like (φ/2)^t.
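The Fibonacci growth can be checked numerically. The Python sketch below assumes the rule in question is n ⟼ {2n, n + 1} (the rule itself did not survive in this text, but that choice is consistent with the pentagon structure described next) and assumes evolution starts from 0. The exact index offset of the Fibonacci numbers depends on the starting value and on how steps are counted, so the sketch only checks the Fibonacci recurrence and the doubling of the maximum.

```python
def distinct_counts(steps, start=0):
    """Cumulative number of distinct values reached by n -> {2n, n + 1} after each step."""
    seen = {start}
    frontier = {start}
    counts = [1]
    maxima = [start]
    for _ in range(steps):
        # Only newly found values can produce further new values.
        frontier = {m for n in frontier for m in (2 * n, n + 1)} - seen
        seen |= frontier
        counts.append(len(seen))
        maxima.append(max(seen))
    return counts, maxima


counts, maxima = distinct_counts(12)
print(counts)
print(maxima)
# Ratio of count to maximum value: a crude "average density" at each step.
print([round(c / m, 4) for c, m in zip(counts[1:], maxima[1:])])
```

Since the counts follow the Fibonacci recurrence while the maximum doubles at every step, the ratio printed on the last line shrinks by a factor approaching φ/2 ≈ 0.81 per step.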

Continuing the evolution further and using a different rendering we get the very “geometrical” (planar) structure

What can we say about this structure? Apart from the first few steps (rendered at the center), it consists of a spiral of pentagons. Each pentagon (except the one at the center) has the form

reflecting the relation

Going out from the center, each successive layer in the spiral has twice the number of pentagons, with each pentagon at a given layer “spawning” two new pentagons at the next layer.

Removing “incomplete pentagons” this can be rendered as:

What about other rules of the general form:

Here are the corresponding (“complete polygon”) results for through 5:

The multiway graphs in these cases correspond to spirals of ()-gons defined by the identity

or equivalently

At successive layers in the spiral, the number of ()-gons increases like .

Eventually the evolution of the system generates all possible integers, but at step t the number of distinct integers obtained so far is given by the generalized Fibonacci series obtained from

which for large t is

where is the k-nacci generalized golden ratio, which approaches for large k.

If we consider

it turns out that one gets the same basic structure (with ()-gons) for as for . For example, with

one gets:

The Rule n ⟼ {2n + 1, 3n + 1}

For the rule

there are at first no equivalences that cause merging in the multiway graph:

But after 5 steps we get

where now we see that 15 and 31 are connected “across branches”.

After 10 steps this becomes:

At a visual level this seems to consist of two basic components. First, a collection of loops, and second a collection of tree-like “loose ends”. Keeping only complete loops and going a few more steps we get:

Unlike in previous cases, the “loops” (AKA “polygons”) are not of constant size. Here are the first few that occur (note these loops “overlap” in the sense that several “start the same way”):

As before, each of these loops in effect corresponds to an identity about compositions of functions, though now it matters what these compositions are applied to. So, for example, the 4th loop above corresponds to (where k stands for the function ):

In explicit form this becomes:

where both sides evaluate to the same number, in this case 26815.

Much as in the Physics Project, we can think of each “loop” as beginning with the creation of a “branch pair”, and ending with the merger of the different paths from each member of the pair. In a later section we’ll discuss the question of whether every branch pair always in the end re-merges. But for now we can just enumerate mergers—and we find that the first few occur at:

(Note that a merger can never involve more than two branches, since any given number has at most one “pre-image” under n ⟼ 2n + 1 and one under n ⟼ 3n + 1.)
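Enumerating mergers is straightforward to sketch in Python. The code below is only an illustration with a hypothetical function name: it evolves n ⟼ {2n + 1, 3n + 1} from 0 and records every value that is produced from two different predecessors.

```python
def find_mergers(steps, start=0):
    """Sorted values reached from two distinct predecessors within `steps` steps."""
    preimages = {}      # value -> set of states that map to it
    seen = {start}
    frontier = {start}
    for _ in range(steps):
        next_frontier = set()
        for n in frontier:
            for m in (2 * n + 1, 3 * n + 1):
                preimages.setdefault(m, set()).add(n)
                if m not in seen:
                    seen.add(m)
                    next_frontier.add(m)
        frontier = next_frontier
    # A value has at most one pre-image under each of the two maps,
    # so a merger is exactly a value with two recorded predecessors.
    return sorted(m for m, pre in preimages.items() if len(pre) == 2)


print(find_mergers(4))
print(find_mergers(5))
print(find_mergers(9))
```

The first merger found this way is 31, produced both as 2 × 15 + 1 and as 3 × 10 + 1, matching the connection between 15 and 31 seen after 5 steps.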

Here is a plot of the positions of the mergers—together with a quadratic fit (indicated by the dotted line):

(As we’ll discuss later, the numbers at which these mergers occur are for example always of the form .)

Taking second differences indicates a certain apparent randomness:

What can we say about the overall structure of the multiway graph? One basic question is what numbers ever even occur in the evolution of the system. Here are the first few, for evolution starting from 0:

And here are successive differences

Dividing successive m by the number gives a progressive estimate of the density of numbers:

On a log-log scale this becomes

showing a rough fit to —and suggesting an asymptotic density of 0.

Note, by the way, that while the maximum gap grows on average linearly (roughly like 0.17 m)

the distance between gaps of size 1 shows evidence of remaining bounded:

(A related result from the 1970s states that the original sequence contains infinite-length arithmetic progressions—implying the presence of infinite runs of numbers whose differences are constant.)

The More General “Affine” Case: n ⟼ {a n + b, c n + d}

Not every rule of the form

leads to a complex multiway graph. For example

just gives a pure binary tree since 2n just adds a 0 at the end of the binary digit sequence of n, while 2n + 1 adds a 1 at the end:

Meanwhile

gives a simple grid

where at level t the numbers that appear are simply

and the pattern of use of the two cases in the rule makes it clear why the grid structure occurs.

Here are the behaviors of all inequivalent nontrivial rules of the form

with constants up to 3:

“Ribbons” are seen only when . “Simple webs” are seen when . “Simple grids” are seen whenever the two cases in the rule commute, i.e.

which occurs whenever

“Simple trees” are seen whenever

In other cases there seems to be irregular merging, as in the case above. And keeping only nontrivial inequivalent cases these are the results after removing loose ends:

Note that adding another element in the rule can make things significantly more complicated. An example is:

After 8 steps this gives

or in another rendering:

After a few more steps, with “loose ends” removed, one gets the still-rather-unilluminating result (though one that we will discuss further in the next section):

The Phenomenon of Confluence

Will every branching of paths in the multiway graph eventually merge again? If so, then the system is confluent (which in this case is equivalent to saying that it’s causal invariant, an important property in our Physics Project).

It turns out that all rules of the following forms are confluent:

But among rules of the form

confluence depends on the values of a, b, c and d. When multiway graphs are “simple webs” or “simple grids” there is obvious confluence. And when the graphs are simple trees, there is obviously no confluence.

But what about a case like the rule we discussed above:

We plotted above the “positions” of mergers that occur. But are there “enough” mergers to “rejoin” all branchings?

Here are the first few branchings that occur:

For the pair 3, 4 one can reach a “merged” end state on the following paths:

which are embedded in the whole multiway graph (without loose ends) as:

For the pair 9, 13 both eventually reach 177151, but 9 takes 13 steps to do so:

Here’s a summary of what we know about what happens with the first few branchings:

So what about the total number of branchings and mergings? This is what happens for the first several steps:

The number of branchings at step t approximates

while the number of mergings seems to grow systematically more slowly, perhaps like 1.:

And based on this it seems plausible that the system is not in the end confluent. But how might we show this? And what is the best way to figure out if any particular branch pair (say 21, 31) will ever merge?

One way to look for mergings is just to evolve the multiway graph from each member of the pair, and check if they overlap. But as we can see even for the pair {3, 4} this effectively involves “treeing out” an exponential number of cases:

Is there a way to do this more efficiently, or in effect to prune the trees? A notable feature of the original rule is that the numbers it generates always increase at each step. So one thing to do is just to discard all elements at a particular step in one graph that cannot reach the “minimum frontier” in the other graph. But on its own, this leads to only very minor reduction in the size of graph that has to be considered.
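The brute-force search for a merging can be sketched in Python as follows. This is a minimal version with hypothetical function names that simply "trees out" both members of a pair under n ⟼ {2n + 1, 3n + 1}; it does not implement the pruning idea mentioned above, and its cost grows exponentially with the number of steps.

```python
def descendants(n, steps):
    """All values reachable from n in at most `steps` applications of the rule."""
    seen = {n}
    frontier = {n}
    for _ in range(steps):
        frontier = {m for x in frontier for m in (2 * x + 1, 3 * x + 1)} - seen
        seen |= frontier
    return seen


def first_common_descendant(a, b, max_steps):
    """Smallest value reachable from both a and b, using the fewest steps that work.

    Returns None if no common descendant appears within max_steps steps,
    which by itself says nothing about whether one appears later.
    """
    for steps in range(max_steps + 1):
        common = descendants(a, steps) & descendants(b, steps)
        if common:
            return min(common)
    return None


print(first_common_descendant(10, 15, 3))
print(first_common_descendant(3, 4, 12))
print(177151 in descendants(9, 13), 177151 in descendants(13, 9))
```

A `None` result is inconclusive by construction, which is exactly the difficulty: without an upper bound on how long a pair can take to merge, a failed finite search proves nothing.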

To find what is potentially a much more effective “optimization” let’s look at some examples of mergings:

It’s clear that the final step has to consist of one application of n ⟼ 2n + 1 and one of n ⟼ 3n + 1 (i.e. one red edge and one blue edge). But these examples suggest that there are also further regularities.

At the merging point it must be true that

2u + 1 = 3v + 1

for some integers u and v. But for this to be true, the merged value (i.e. 2u + 1 or 3v + 1) must for example be equal to 1 mod 2, 3 and 6.

Using the structure one level back we also have:

implying that the merged value must be 3 mod 4, 7 mod 12, 13 mod 18 and 31 mod 36. Additional constraints from going even further back imply in the end that the merged value must have the following pattern of residues:

But now let’s consider the whole system modulo k. Then there are just k possible values, and the multiway graph must be finite. For example, for we get:

Dropping the “transient parts” leaves just:

These graphs can be thought of as reductions of the multiway graph (and, conversely, the multiway graph is a covering of them). The graphs can also be thought of as finite automata that define regular languages whose elements are the “2” and “3” transformations that appear on the edges. Any sequence of “2” and “3” transformations that can occur in the multiway graph must then correspond to a valid word in this regular language. But what we have seen is that for certain values of k, mergers in the multiway graph always occur at particular (“acceptor”) states in the finite automata.
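As a rough illustration of this reduction, the Python sketch below builds the graph of n ⟼ {2n + 1, 3n + 1} modulo k and then checks at which residues the actual mergers land. The function names are hypothetical, and the values k = 12 and k = 36 are chosen here only as examples.

```python
def residue_graph(k, start=0):
    """Map each residue reachable from `start` to its successors (via 2n+1, via 3n+1) mod k."""
    graph = {}
    pending = [start % k]
    while pending:
        r = pending.pop()
        if r in graph:
            continue
        graph[r] = ((2 * r + 1) % k, (3 * r + 1) % k)
        pending.extend(graph[r])
    return graph


def merger_values(steps, start=0):
    """Values of the full multiway system that are reached from two distinct predecessors."""
    preimages = {}
    seen = {start}
    frontier = {start}
    for _ in range(steps):
        new = set()
        for n in frontier:
            for m in (2 * n + 1, 3 * n + 1):
                preimages.setdefault(m, set()).add(n)
                if m not in seen:
                    seen.add(m)
                    new.add(m)
        frontier = new
    return sorted(m for m, pre in preimages.items() if len(pre) == 2)


def merger_states(k, steps):
    """Residues mod k at which mergers occur, i.e. candidate 'acceptor' states."""
    return {m % k for m in merger_values(steps)}


print(residue_graph(12))
print(merger_states(12, 12), merger_states(36, 12))
```

For the mergers found in the first 12 steps, every one falls on a single residue of the mod-12 graph and on a single residue of the mod-36 graph, which is what allows those states to act as acceptors.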

In the case , every merger occurs at the 7 state. But by tracing possible paths in the finite automaton we now can read off what sequences of transformations can lead to a merger:

And what’s notable is that only a certain fraction of all possible sequences of length m can occur; asymptotically, about 28%.

The most stringent analogous constraints come from the graph:

And we see that even for sequences of length 3 fewer are allowed than from the graph:

Asymptotically the number of allowed sequences is about 3% of the possible. And so the conclusion is that if one wants to find mergings in the multiway graph it’s not necessary to tree out all possible sequences of transformations; one only needs at most the 30× smaller number of sequences “accepted by the mod-144 finite automaton”. It’s possible to do a little better than this, by looking not just at sequences allowed by the finite automaton for a particular k, but at finite automata for a collection of values of k (say as in the table above).

But while these techniques deliver significant practical speedups they do not seem to significantly alter the asymptotic resources needed. So what will it take to determine whether the pair {21, 31} ever merges?

I don’t know. And for example I don’t know any way to find an upper bound on the number of steps after which we’d be able to say “if it hasn’t merged yet, it never will”. I’m sure that if we look at different branch pairs, there will be tricks for particular cases. But I suspect that the general problem of determining merging will show computational irreducibility, and that for example there will be no fundamentally better way to determine whether a particular branch pair has merged after t steps than by essentially enumerating every possible evolution for that number of steps.

But if this is the case, it means that the general infinite-time question of whether a branch pair will merge is undecidable—and can never be guaranteed to be answerable with a bounded amount of computational effort. It’s a lower bar to ask whether the question can be answered using a finite proof in, say, Peano arithmetic. And I think it’s very likely that the overall question of whether all branch pairs merge—so that the system is confluent—is a statement that can never, for example, be established purely within Peano arithmetic. There are quite a few other candidates for the “simplest ‘numerical’ statement independent of Peano arithmetic”. But it seems at least conceivable that this one might be more accessible to proof than most.

It’s worth mentioning, by the way, that (as we have seen extensively in the Physics Project) the presence of confluence does not imply that a multiway system must show simple overall behavior. Consider for example the rule (also discussed at the end of the previous section):

Running for a few more steps, removing loose ends and rendering in 3D gives:

But despite this complexity, this is a confluent rule. It’s already an indication of this that mergings pretty much “keep up” with branchings in this multiway system:

The first few branchings (now all 3-way) are:

All the pairs here merge (often somewhat degenerately) in just a few steps. Here are examples of how they work:

Branchial Space and Numerical Value Space

Consider the first few steps of the rule

At each “layer” we can form a branchial graph by joining nodes that have common ancestors on the step before:

Continuing for a few more steps we get:

We can imagine (as we do in our Physics Project) that in an appropriate (if rather subtle) limit such branchial graphs can be thought of as defining a “branchial space” in which each node has a definite position. (One of many subtleties is that the particular branchial graphs we show here are specific to the particular “layering” of the multiway graph that we’ve used; different foliations would give different results.)
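The construction of a branchial graph from a given layering can be sketched in Python. Since the rule used in the figures is not preserved in this text, the example below assumes n ⟼ {2n + 1, 3n + 1} starting from 1 purely for illustration; the function name is hypothetical.

```python
from itertools import combinations


def branchial_layers(rule, start, steps):
    """For each step, return (layer, branchial_edges).

    Two nodes in a layer are joined when they share a parent on the step before.
    """
    layer = {start}
    result = []
    for _ in range(steps):
        next_layer = set()
        branchial = set()
        for parent in layer:
            children = sorted(set(rule(parent)))
            next_layer.update(children)
            # Every pair of distinct children of one parent gives a branchial edge.
            branchial.update(combinations(children, 2))
        result.append((next_layer, branchial))
        layer = next_layer
    return result


def example_rule(n):
    """Assumed rule for illustration only: n -> {2n + 1, 3n + 1}."""
    return (2 * n + 1, 3 * n + 1)


for nodes, edges in branchial_layers(example_rule, 1, 3):
    print(sorted(nodes), sorted(edges))
```

This version uses the simplest layering, by step number; a different foliation would group nodes into different layers and so give different branchial graphs.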

But whereas in our Physics Project and many other applications of the multicomputational paradigm the only real way to define “positions” for nodes in the multiway graph is through something like branchial space, there is a much more direct approach that can be taken in multiway systems based on numbers—because every node is labeled by a number which one can imagine directly using as a coordinate.

As an example, let’s take the multiway graph above, and make the horizontal position of each node be determined by its value:

Or, better, by the log of its value:

Continuing for more steps, we get:

Now, for example, we can ask—given the particular choice of layers we have made here—what the distribution of (logarithmic) values reached on successive layers will be, and one finds that the results converge quite quickly:

(By the way, in these results we’ve not included “path weights”, which determine how many different paths lead from the initial number to a particular result. In the example shown, including path weights doesn’t make a difference to the form of the final result.)

So what is the correspondence between the layout of nodes in “branchial space” and in “numerical value space”? Here’s what happens if we lay out a branchial graph using (logarithmic) numerical value as x coordinate:

Perhaps more useful is to plot branchial distance versus (logarithmic) numerical distance for every pair of connected nodes at a particular layer:

And at least in this case, there is perhaps a slight correlation to be seen.

Negative Numbers

The rules we’ve considered so far all involve only non-negative numbers. What happens if we include negative numbers? Generally the results are very similar to those with non-negative numbers. For example:

just gives

in which there is effectively both a “positive” and “negative” “web”.

A rule like

turns out to yield essentially only positive numbers, yielding after removing loose ends

gives a more balanced collection of positive and negative numbers (with positive numbers indicated by dark nodes), but the final graph is still quite similar:

So far we’ve considered only rules based on ordinary arithmetic functions. As a first example of going beyond that, consider the rule:

Running this for 50 steps we get:

A notable feature here is that only one “fresh” node is added at each step—and the whole thing grows like a Fermat spiral. After 250 steps the multiway graph has the form

which we can readily see is essentially a “binary tree superimposed on a spiral”.

Dividing by 3 instead of 2 makes it a ternary tree:

Using Round instead of Floor gives a mixed binary and ternary tree:

What about rules of the form:

Here are the results for a few values of a:

Continuing

for more steps we get:

has far fewer “loose ends”:

What are the “grid patches”? Picking out some of the patches we can see they’re places where a number that can be “halved a lot” appears—and just like in our pure multiplication rules above, and 3n represent commuting operations that form a grid:

Conditional Division, Inverse Iterations and the 3n+1 Problem

Including Floor[] is a bit like having different functions for even and odd n. What happens if we do this more explicitly? Consider for example

The result is essentially identical to the Floor case:

Here are a couple of other cases, at least qualitatively similar to what we’ve seen before:

But now consider as we did at the beginning:

What is the inverse of this? One can think of it as being

which gives for example

or continuing for longer:

How about

Now the “inverse” is:

But in this case since most numbers are not reached in the original iteration, most “don’t have inverses”. However, picking an initial number like 4495, which happens to be a merge point, yields:

Note that this “inverse iteration” always monotonically decreases towards 0—reaching it in at most steps.

But now we can compare with the well-known 3n+1 problem, defined by the “singleway” iteration:

And while in this case the intermediate numbers sometimes increase, all known initial conditions eventually evolve to a simple cycle:

But now we can “invert” the problem, by considering the rule:

equivalent to

which gives after 10 steps:

Continuing this to 25 steps one gets:

Removing loose ends this then becomes:

or after more steps, and rendered in 3D:

The 3n+1 problem now asks whether as the multiway graph is built, it will eventually include every number. But from a multicomputational point of view there are new questions to ask—like whether the “inverse-3n+1-problem” multiway system is confluent.

The first few branchings in the multiway graph in this case are

and all of these re-merge after at most 13 steps. The total number of branchings and mergings on successive steps is given by:

Including more steps one gets

which suggests that there is indeed confluence in this case—though, like for the problem of termination in the original 3n+1 problem, it may be extremely difficult to determine this for sure.
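
The inverted 3n+1 system can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the article's own form of the inverse rule is not reproduced in this text, so the version below simply lists every predecessor of n under the standard 3n+1 step, and the function names are hypothetical.

```python
def collatz_step(n):
    """One step of the usual "singleway" 3n+1 iteration."""
    return n // 2 if n % 2 == 0 else 3 * n + 1


def inverse_collatz(n):
    """All positive integers m with collatz_step(m) == n."""
    preds = [2 * n]
    # (n - 1) / 3 is an odd positive integer exactly when n % 6 == 4.
    if n % 6 == 4:
        preds.append((n - 1) // 3)
    return preds


def multiway_layers(rule, init, steps):
    """Sets of states reached after 0, 1, ..., steps applications of the rule."""
    layers = [{init}]
    for _ in range(steps):
        layers.append({m for n in layers[-1] for m in rule(n)})
    return layers


layers = multiway_layers(inverse_collatz, 1, 10)
for t, layer in enumerate(layers):
    # States with two predecessors are the branch points at this step.
    branchings = sum(1 for n in layer if len(inverse_collatz(n)) == 2)
    print(t, sorted(layer), branchings)
```

In this form, asking whether the multiway graph eventually includes every number is the same as asking whether every positive integer appears in some layer when the system is started from 1.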

Other Kinds of Rules

All the rules we’ve used so far are—up to conditionals—fundamentally “linear”. But we can also consider “polynomial” rules. With pure powers, as in

the multiway graph is just the one associated with the addition of exponents:

In a case like

the graph is a pure tree

while in a case like

there is “early merging”, followed by a pure tree:

There are also cases like

which lead to “continued merging”

but when loose ends are removed, they are revealed to behave in rather simple ways:

In a case like

however, there is at least slightly more complicated merging (shown here after removing loose ends):

If we include negative numbers we find cases like:

But in other “polynomial” cases one tends to get only trees; a merging corresponds to a solution to a high-degree Diophantine equation, and things like the ABC conjecture tend to suggest that very few of these exist.

Returning to the “linear” case, we can consider—as we did above—multiway graphs mod k. Such graphs always have just k nodes. And in a case like

with graph

they have a simple interpretation—as “remainder graphs” which one can use to compute a given input number n mod k. Consider for example the number 867, with digits 8, 6 and 7. Start at the 0 node. Follow 8 red arrows, followed by a blue one, thus reaching node 3. Then follow 6 red arrows, followed by blue. Then follow 7 red arrows, with no blue arrow after the final digit. The node that one ends up on by this procedure is exactly the remainder. And in this case it is node 6, indicating that Mod[867, 7] is 6.
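
The walk on a remainder graph can be sketched in Python. The rule behind the graph is not reproduced in this text, so the sketch assumes that a red arrow is n ⟼ (n + 1) mod k and a blue arrow is n ⟼ 10 n mod k, which is consistent with reaching node 3 after the first digit of 867. In this sketch the blue arrow is followed only between digits, which is what makes the walk agree with n mod k. The function name is hypothetical.

```python
def remainder_by_walk(n, k):
    """Compute n mod k by walking an assumed decimal remainder graph with k nodes."""
    node = 0
    digits = [int(ch) for ch in str(n)]
    for i, d in enumerate(digits):
        for _ in range(d):
            node = (node + 1) % k   # red arrow (assumed): add 1 mod k
        if i < len(digits) - 1:
            node = (10 * node) % k  # blue arrow (assumed): shift one decimal place
    return node


print(remainder_by_walk(867, 7))  # 6
```

The walk is Horner's scheme for evaluating the decimal expansion, carried out entirely among the k nodes of the graph.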

Not too surprisingly, there is a definite structure to such remainder graphs. Here is the sequence of “binary remainder graphs” generated from the rule

for successive values of k:

Continuing a number-theoretical theme, we may note that the familiar “divisor graph” for a number can be considered as a multiway graph generated by the rule:

Here’s an example for 100:

Transitive reduction gives a graph which in this case is essentially a grid:

Other initial numbers can give more complicated graphs

but in general the transitive reduction is essentially a grid graph of dimension PrimeNu:
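
The grid structure can be checked directly with a small Python sketch. It assumes the divisor graph has an edge from each divisor m of n to every proper divisor of m (the rule itself is not reproduced in this text), and it uses a simple transitive reduction that is valid because divisibility is already transitive. All function names here are hypothetical.

```python
def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def divisor_graph(n):
    """Nodes are the divisors of n; edges go from m to each proper divisor of m."""
    nodes = divisors(n)
    edges = {(m, d) for m in nodes for d in nodes if d < m and m % d == 0}
    return nodes, edges


def transitive_reduction(nodes, edges):
    """Drop every edge (a, b) that is implied by a two-step path a -> c -> b."""
    return {
        (a, b) for (a, b) in edges
        if not any((a, c) in edges and (c, b) in edges for c in nodes)
    }


def prime_nu(n):
    """Number of distinct prime factors of n (what PrimeNu computes)."""
    count, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            count += 1
            while n % p == 0:
                n //= p
        p += 1
    return count + (1 if n > 1 else 0)


nodes, edges = divisor_graph(100)
reduced = transitive_reduction(nodes, edges)
directions = {a // b for (a, b) in reduced}  # each surviving edge divides by one prime
print(len(nodes), len(reduced), sorted(directions))  # 9 12 [2, 5]
```

Each edge that survives the reduction divides by a single prime, so the distinct primes act as the independent grid directions, and their count is the number of distinct prime factors.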

As an alternative to looking at divisors, we can look, for example, at a rule which transforms any number to the list of numbers relatively prime to it:

The transitive reduction of this is always trivial, however:

One general way to “probe” any function is to look at a multiway graph generated by the rule:

Here, for example, is the result for

starting with

Once again, the transitive reduction is very simple:

As another example, we can look at:

where each “efflorescence” corresponds to a prime gap:

As a final example we can consider the digit-reversal function:

Non-Integer Values

In almost everything we’ve discussed so far, we’ve been considering only integer values, both in our rules and our initial conditions. So what happens if we start a rule like

with a non-integer value? Rather than taking a specific initial value, we can just use a symbolic value x—and it then turns out that the multiway graph is the same regardless of the value of x, integer or non-integer:

What if the rule contains non-integer values? In a case like

the basic properties of addition ensure that the multiway graph will always have the same grid structure, regardless of a, b and the initial value x:

But in a case like

things are more complicated. For arbitrary symbolic a, b and initial x, there are no relations that apply, and so the multiway graph is a pure tree:

For a specific value of b, however, there are already relations, and a more complicated structure develops:

Continuing for more steps and removing loose ends we get

which is to be compared to the result from above for , :

What happens if we choose a non-integer value of b, say:

We immediately see that there are “special relations” associated with and its powers:

Continuing for longer we get the somewhat complex structure:

or in a different rendering with loose ends removed:

This structure is very dependent on the algebraic properties of . For a transcendental number like π there are no “special relations”, and the multiway graph will be a tree. For we get

and for :

Complex Numbers

There are many possible generalizations to consider. An immediate one is to complex integers.

For real numbers always generates a grid. But for example

instead generates

Continuing for longer, the graph becomes:

One feature of having values that are complex numbers is that these values themselves can be used to define coordinates to lay out the nodes of the multiway graph in the plane—giving in this case:

or after more steps:

Similarly

gives

The non-branching rule

yields

while

gives

If we combine multiplication with addition, we get different forms—and we can make some interesting mathematical connections. Consider rules of the form

where c is some complex number. I considered such rules in A New Kind of Science as a practical model of plant growth (though already then I recognized their connection to multiway systems). If we look at the case

the multiway graph is structurally just a tree:

But if we plot nodes at the positions in the complex plane corresponding to their values we get:

Continuing this, and deemphasizing the “multiway edges” we see a characteristic “fractal-like” pattern:

Note that this is in some sense dual to the typical “line segment iteration” nested construction:

Adding a third “real” branch

we get

And with

the result builds up to a typical Sierpinski pattern:

These pictures suggest that at least in the limit of an infinite number of steps there will be all sorts of merging between branches. And indeed it is fairly straightforward to prove this. But what about after, say, t steps?

The result from each branch for the rule

is a polynomial such as

So now the question of merging becomes a question of finding solutions to equations which equate the polynomials associated with different possible branches. The simplest nontrivial case equates branch {1, 1} with branch {2, 2}, yielding the equation:

with solution

We can see this merging in action with the rule:

The core of what it generates is the repetitive structure:

A few additional results are (where the decimals are algebraic numbers of degree 6, and a is a real number):

In a case like

there is an “early merger”

but then the system just generates a tree:

The family of rules of the form

shows more elaborate behavior. For we get:

Continuing for more steps this becomes:

For we get instead:

If we look at the actual distribution of values obtained by such rules we find for example:

If we go beyond multiway systems with pure “1 + c n” rules we soon get results very similar to ones we’ve seen in previous sections. For example

gives multiway graph (after removing loose ends)

Placing nodes according to their numerical values this then has the form:

Collections of Numbers, and Causal Graphs

In studying multiway systems based on complex numbers we’re effectively considering a special case of multiway systems based on collections of numbers. If the complex-number rules are linear, then what we have are iterated affine maps—that form the basis for what I’ve called geometric substitution systems.

As a slightly more general case we can consider multiway systems in which we take pairs of numbers v and apply the rule

where now a and b are matrices. If both matrices are the form then this is equivalent to the case of complex numbers. But we can also for example consider a rule like

which yields

or after more steps and in a different rendering:

Laying this out in 2D using the actual pairs of numbers as coordinates, this becomes:

Here are samples of typical behavior with 0, 1 matrices:

Beyond pure matrix multiplication, we can also consider a rule that adds constant vectors, as in:

We can also think in a more “elementwise” way, constructing for example simple rules such as

This generates the multiway graph:

Continuing for longer and removing loose ends yields:

Using values as coordinates then gives:

In our Physics Project and other applications of multicomputation, we often discuss causal graphs, that track the causal relationships between updating events. So why is it that these haven’t come up in our discussion of multiway systems based on numbers? The basic reason is that when our states are individual numbers, there’s no reason to separately track updating events and transformations of states because these are exactly the same—because every time a state (i.e. a number) is transformed the number as a whole is “consumed” and new numbers are produced. Or, in other words, the flow of “data” is the same as the flow of “causal information”—so that if we did record events, there’d just be one on each edge of the multiway graph.

But the story is different as soon as our states don’t just contain individual “atomic” things, like single numbers. Because then an updating event can affect just part of a state—and asking what causal relationships there may be between events becomes something separate from asking about the transformation of whole states.

With a rule of the form, say,

things are still fairly trivial. Yes, there are separate “x” and “y” events. But they don’t mix, so we’ll just get two independent causal graphs. Things can be less trivial in a case like the one above, of the form:

But now there is a different problem. Let’s say that the rule transforms {x, y} to {y + 1, x + 1}. How should we decompose that into “elementary events”? We could say there’s one event that swaps x and y, and others that add 1. Or something different. It’s hard to know.

So why haven’t we encountered this kind of problem in other multicomputational systems, say in hypergraph rewriting systems or string substitution systems? The point is that in these systems the underlying elements always have a certain unique identity, which allows their “flow” to be traced. In our Physics Project, for example, each hypergraph updating event that occurs affects certain particular “atoms of space” (that we can think of as being labeled by unique identifiers)—and so we can readily trace how the effects of different events are related. Similarly, in a string substitution system, we can trace which characters at which positions in the string were affected by a given event, and we can then trace which new characters at which new positions these affect.

But in a system based on numbers this tracing of “unique elements” doesn’t really apply. We might think of 3 as being 1 + 1 + 1. But there’s nothing that uniquely tags these 1s, and allows us to trace how they affect 1s that might make up other numbers. In a sense, the whole point of numbers is to abstract away from the labeling of individual objects—and just ask the aggregate question of “how many” there are. So in effect the “packaging” of information into numbers can be thought of as “washing out” causal relationships.

When we give a rule based on numbers what it primarily does is to specify transformations for values. But it’s perfectly possible to add an ancillary “causal rule”, that, for example, can define which elements in an “input” list of numbers should be thought of as being “used as the inputs” to produce particular numbers in an output list of numbers.

There’s another subtlety here, though. The point of a multiway graph is to represent all possible different histories for a system, corresponding to all possible sequences of transformations for states. A particular history corresponds to a particular path in the multiway graph. And if—as in a multiway system based on single numbers—each step in this path is associated with a single, specific event, then the causal graph associated with a particular history will always be trivial.

But in something like a hypergraph- or string-based system there’s usually a nontrivial causal graph even for a single path of history. And the reason is that each transformation between states can involve multiple events—acting on different parts of the state—and there can be nontrivial causal relationships between these events “mediated” by shared elements in the state.

One can think of the resulting causal graph as representing causal relationships in “spacetime”. Successive events define the passage of time. And the layout of different elements in each state can be thought of as defining something like space. But in a multiway system based on single numbers, there isn’t a natural notion of space associated with each state, because the states are just single numbers which “don’t have enough structure” to correspond to something like space.

If we’re dealing with collections of numbers, there’s more possibility of “having something like space”. But it’s easiest to imagine this when one’s dealing with very large collections of numbers, and when the “locations” of the numbers are more important than their values—at which point the fact that they’re numbers (rather than, say, characters in a string) doesn’t make much difference.

But in a multiway system one’s dealing with multiple paths of history, not just one. And one can then start asking about causal relationships not just within a single path of history, but across different paths: a multiway causal graph. And that’s the kind of causal graph we’ll readily construct for a multiway system based on numbers. For a system based on strings or hypergraphs there’s a certain wastefulness to starting with a standard multiway graph of transformations between states. Because if one looks at all possible states, there’s typically a lot of repetition between the “context” of different updating events.

And so an alternative approach is to look just at the “tokens” that are involved in each event: hyperedges in a hypergraph, or runs of characters in a string. So how does it work for a multiway system based on numbers? For this we have to again think about how our states are decomposed for purposes of events, or, in other words, what the “tokens” in them are. And for multiway systems based on single numbers, the natural thing is just to consider each number as a token.

For collections of numbers, it’s less obvious how things should work. And one possibility is to treat each number in the collection as a separate token, and perhaps to ignore any ordering or placement in the collection. We could then end up with a “multi-token” rule like

whose behavior we can represent with a token-event graph:

But given this, there is then the issue of deciding how collections of tokens should be thought of as aggregated into states. And in general multi-token numerical multiway systems represent a whole separate domain of exploration from what we have considered here.

A basic point, however, is that while our investigations of things like hypergraph and string systems have usually had a substantial “spatial component”, our investigation of multiway systems based on numbers tends to be “more branchial”, and very much centered around the relationships between different branches of history. This does not mean that there is nothing “geometrical” about what is going on. And in fact we fully expect that in an appropriate limit branchial space will indeed have a geometrical structure—and we have even seen examples of this here. It is just that that geometrical structure is—in the language of physics—about the space of quantum states, not about physical space. So this means that our intuition about ordinary physical space won’t necessarily apply. But the important point is that by studying multiway systems based on numbers we can now hope to sharpen our understanding and intuition about things like quantum mechanics.

Much More to Explore…

The basic setup for multiway systems based on numbers is very simple. But what we’ve seen here is that—just like for so many other kinds of systems in the computational universe—the behavior of multiway systems based on numbers can be far from simple.

In many ways, what’s here just scratches the surface of multiway systems based on numbers. There is much more to explore, in many different directions. There are many additional connections to traditional mathematics (and notably number theory) to be made. There are also questions about the geometrical structures that can be generated, and their mathematical characterization.

In the general study of multicomputational systems, branchial—and causal—graphs are important. But here we have barely begun to consider them. A particularly important issue that we haven’t addressed at all is that of alternative possible foliations. In general it has been difficult to characterize these. But it seems possible that in multiway systems based on numbers these may be amenable to investigation with some kind of mathematical techniques. In addition, for things like our Physics Project questions about the coordinatization of branchial space are of great significance—and the “natural coordinatizability” of numbers makes multiway systems based on numbers potentially an attractive place to study these kinds of questions.

Here we’ve considered only ordinary multiway systems, in which the rules always transform one object into several. It’s also perfectly possible to study more general multicomputational systems in which the rules can “consume” multiple objects—and this is particularly straightforward to set up in the case of numbers.

Here we’ve mostly looked at multiway systems whose states are individual integers. But we can consider other kinds of numbers and collections of numbers. We can also imagine generalizing to other kinds of mathematical objects. These could be algebraic constructs (such as polynomials) based on ordinary real or complex numbers. But they could also, for example, be objects from universal algebra. The basic setup for multiway systems—involving repeatedly applying functions—can be thought of as equivalent to repeatedly multiplying by elements (say, generators) of a semigroup. Without any relations between these elements, the multiway graphs we’ll get will always be trees. But if we add relations things can be more complicated.
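
The contrast between "no relations" and "with relations" can be seen in a minimal Python sketch. It represents elements as words over two generators a and b, started from the empty word (so strictly speaking a monoid), and imposes a relation by reducing each word to a normal form. The function name and the choice of the single relation ab = ba are assumptions made for illustration.

```python
def multiway_words(generators, steps, normal_form=lambda w: w):
    """States reached at each step by right-multiplying by every generator."""
    layers = [{normal_form("")}]
    for _ in range(steps):
        layers.append({normal_form(w + g) for w in layers[-1] for g in generators})
    return layers


# No relations: every word is distinct, so the multiway graph is a binary tree.
free = multiway_words("ab", 5)

# One relation, ab = ba: sorting the letters gives a normal form, and paths merge.
commuting = multiway_words("ab", 5, normal_form=lambda w: "".join(sorted(w)))

print([len(layer) for layer in free])       # [1, 2, 4, 8, 16, 32]
print([len(layer) for layer in commuting])  # [1, 2, 3, 4, 5, 6]
```

With the commutation relation the states at step t are determined only by how many times each generator has been applied, which is the same grid that pure addition or pure multiplication rules produce.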

Multiway systems based on semigroups are in a sense “lower level” than ones based on numbers. In something like arithmetic, one already has immediate knowledge of operations and equivalences between objects. But in a semigroup, these all have to be built up. Of course, if one goes beyond integers, equivalences can be difficult to determine even between numbers (say different representations of radicals or, worse, transcendental numbers).

In their basic construction, multiway systems are fundamentally discrete—involving as they do discrete states, discrete branches, and discrete notions like merging. But in our Physics Project and other applications of the multicomputational paradigm it’s often of interest to think about “continuum limits” of multiway systems. And given that real numbers provide the quintessential example of a continuum one might suppose that by somehow looking at multiway systems based on real numbers one could understand their continuum limit.

But it’s not so simple. Yes, one can imagine allowing a whole “real parameter’s worth” of outputs from the multiway rule. But the issue is how to “knit these together” from one step to the next. The situation is somewhat similar to what happens when one looks at ensembles of random walks, or stochastic partial differential equations. But with multiway systems things are both cleaner and more general. The closest analogy is probably to path integrals of the kind considered in quantum mechanics. And in a sense this is not surprising, because it is precisely the appearance of multiway systems in our Physics Project that seems to lead to quantum mechanics—and in a “continuum limit” to the path integral there.

It’s not clear just how multiway systems are best generalized to the continuum case. But multiway systems based on numbers seem to provide a potentially promising bridge to existing mathematical investigations of the continuum—and I think have a good chance of revealing some elegant and powerful mathematics.

I first looked at multiway systems based on numbers back in the early 1990s, and I always meant to come back and look at them further. But what we’ve found here is that they’re richer and more interesting than I ever imagined. And particularly from what we’ve now seen I expect them to have a very bright future, and for all sorts of important science and mathematics to connect to them, and flow from them.

Thanks

I worked on what’s described here during two distinct periods: May 2020 and September 2021. I thank for help of various kinds Tali Beynon, José Manuel Rodríguez Caballero, Bernat Espigule-Pons, Jonathan Gorard, Eliza Morton, Nik Murzin, Ed Pegg and Joseph Stocke—as well as my weekly virtual high-school “Computational Adventures” group.

Multicomputation: A Fourth Paradigm for Theoretical Science

Stephen Wolfram — Thu, 09 Sep 2021 17:40:18 +0000

The Path to a New Paradigm

One might have thought it was already exciting enough for our Physics Project to be showing a path to a fundamental theory of physics and a fundamental description of how our physical universe works. But what I’ve increasingly been realizing is that actually it’s showing us something even bigger and deeper: a whole fundamentally new paradigm for making models and in general for doing theoretical science. And I fully expect that this new paradigm will give us ways to address a remarkable range of longstanding central problems in all sorts of areas of science—as well as suggesting whole new areas and new directions to pursue.

How Inevitable Is the Concept of Numbers?

Stephen Wolfram — Tue, 25 May 2021 11:16:57 +0000

Based on a talk at Numerous Numerosity: An interdisciplinary meeting on the notions of cardinality, ordinality and arithmetic across the sciences.

Everyone Has to Have Numbers… Don’t They?

The aliens arrive in a starship. Surely, one might think, to have all that technology they must have the idea of numbers. Or maybe one finds an uncontacted tribe deep in the jungle. Surely they too must have the idea of numbers. To us numbers seem so natural—and “obvious”—that it’s hard to imagine everyone wouldn’t have them. But if one digs a little deeper, it’s not so clear.

It’s said that there are human languages that have words for “one”, “a pair” and “many”, but no words for specific larger numbers. In our modern technological world that seems unthinkable. But imagine you’re out in the jungle, with your dogs. Each dog has particular characteristics, and most likely a particular name. Why should you ever think about them collectively, as all “just dogs”, amenable to being counted?
