
Yegor Bugayenko
21 October 2021

Objectionary: Dictionary and Factory for EO Objects

Since the time of Kernighan and Ritchie we have shared binary code in libraries. You need to print some text with printf() in C++? You get the libc library with 700+ other functions inside. You need to copy a Java stream? You get Apache Commons IO with copy() and 140+ other methods and classes. The same happens in all languages I’m aware of, like Ruby, Python, JavaScript, PHP: you need an object, or a class, or a function, or a method, and you have to add the entire library to your build. Wouldn’t it be more elegant to deal with individual objects instead?

The idea is not new and not mine. I got it from the book Object Thinking by David West, where he suggested creating an Objectionary (page 306), a “combination of dictionary and object factory,” with the following properties:

The total number of objects is less than 2000;
Each object is an autonomous executable entity;
Every object has a unique ID and a unique “address”;
Objects are nothing more than collections of objects;
Objects require hardware-specific VMs for execution.

Seventeen years later (the book was published in 2004), we implemented the idea on top of EO, our new programming language. The language is intentionally much simpler than Java or C++. You can read its more or less formal description here.

To turn an EO program into an executable entity and release it to the Objectionary, one has to go through the following mandatory steps, assuming the JVM is used as a target platform (the steps marked with 🌵 are implemented by our eo-maven-plugin):

Assemble🌵:
- Parse🌵: .eo ➜ .xmir
- Optimize🌵: .xmir ➜ better .xmir
- Discover🌵: find all foreign aliases
- Pull🌵: download foreign .eo objects
- Resolve🌵: download and unpack .jar artifacts
- Place🌵: move artifact .class files to target/classes/
- Mark🌵: mark .eo sources found in .jar as foreign
- ↑ Go back to Parse if some .eo files are still not parsed
Transpile🌵: .xmir ➜ .java
Assemble🌵: same as above, but for tests
Compile: .java ➜ .class
Test: run all unit tests
Unplace🌵: remove artifact .class files
Unspile🌵: remove auto-generated .java files
Copy🌵: copy .eo files to EO-SOURCES/ inside .jar
Deploy: package .jar artifact and put it into Maven Central
Push: send a pull request to yegor256/objectionary
Merge: we test and merge the pull request

It is an iterative process, which loops over and over again until all required .eo objects are parsed and their atoms are present as .class files. Then, all .xmir files are transpiled to .java and compiled to .class binaries. Then, the binaries are tested, packaged, and deployed to Maven Central. Finally, the sources are merged into the master branch of Objectionary, via a pull request.

The first part of the algorithm can be automated with our Maven plugin, simply by placing .eo sources in src/main/eo/ and adding this to pom.xml:

<project>
  <build>
    <plugins>
      <plugin>
        <groupId>org.eolang</groupId>
        <artifactId>eo-maven-plugin</artifactId>
        <version><!-- Take it from Maven Central --></version>
        <executions>
          <execution>
            <goals>
              <goal>register</goal>
              <goal>assemble</goal>
              <goal>transpile</goal>
              <goal>copy</goal>
              <goal>unplace</goal>
              <goal>unspile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
    [...]
  </build>
  [...]
</project>

The register goal will scan the src/main/eo/ directory, find all .eo sources, and “register” them in a special CSV catalog at target/eo-foreigns.csv. Next, the assemble goal will call the following goals: parse, optimize, discover, pull, and resolve. All these goals use the CSV catalog when they parse, optimize, pull and so on.
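To make the idea of the catalog more tangible, here is a minimal Python sketch of what "registering" could look like. It is not the plugin's code: the function name register and the CSV columns id and status are assumptions made for illustration, only the directory and catalog paths come from the text above.

```python
import csv
import tempfile
from pathlib import Path


def register(sources_dir: Path, catalog: Path) -> list[str]:
    """Scan sources_dir for .eo files and list them in a CSV catalog."""
    # The object name is the relative path without suffix, joined with dots.
    names = sorted(
        ".".join(path.relative_to(sources_dir).with_suffix("").parts)
        for path in sources_dir.rglob("*.eo")
    )
    catalog.parent.mkdir(parents=True, exist_ok=True)
    with catalog.open("w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["id", "status"])  # hypothetical columns
        for name in names:
            writer.writerow([name, "registered"])
    return names


# A throwaway project layout, just to exercise the function.
with tempfile.TemporaryDirectory() as tmp:
    sources = Path(tmp) / "src" / "main" / "eo"
    (sources / "org" / "eolang" / "io").mkdir(parents=True)
    (sources / "hello.eo").write_text("[] > hello\n")
    (sources / "org" / "eolang" / "io" / "stdout.eo").write_text("[text] > stdout /bool\n")
    catalog = Path(tmp) / "target" / "eo-foreigns.csv"
    registered = register(sources, catalog)
    rows = catalog.read_text().splitlines()
```

The later goals would then read the same file and update the status of each row as they go.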

When all of them are done, assemble checks the catalog: do any .eo files still require parsing? If they do, another cycle starts, again with parsing. When all .eo files are parsed, the goal transpile is executed, which turns .xmir files into .java and places them into target/generated-sources. The rest is done by the standard maven-compiler-plugin.
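The cycle itself can be sketched in a few lines of Python. This is a toy model, not the plugin: the catalog is a plain dictionary, parsing is simulated, and the dependencies map (including the object org.example.helper) is made up to stand in for what parsing would reveal.

```python
def assemble(catalog: dict[str, bool], dependencies: dict[str, list[str]]) -> int:
    """Repeat parse-and-discover cycles until every cataloged object is parsed.

    catalog maps an object name to True once it is parsed;
    dependencies simulates the foreign objects each parse would reveal.
    Returns the number of cycles that were needed.
    """
    cycles = 0
    while not all(catalog.values()):
        cycles += 1
        pending = [name for name, parsed in catalog.items() if not parsed]
        for name in pending:
            catalog[name] = True  # stands for "parse" and "optimize"
            for foreign in dependencies.get(name, []):
                # stands for "discover" and "pull": new objects join unparsed
                catalog.setdefault(foreign, False)
    return cycles


catalog = {"hello": False}
cycles = assemble(catalog, {
    "hello": ["org.eolang.io.stdout"],
    "org.eolang.io.stdout": ["org.example.helper"],  # hypothetical dependency
})
```

The loop stops only when a full cycle adds nothing new to the catalog, which is the same exit condition described above.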

Let’s discuss each step in detail.

Parse 🌵

Say, this is the .eo source code at src/main/eo/hello.eo:

+alias org.eolang.io.stdout
[] > hello
  "Jeff" > user
  stdout > @
    "Hello, %s!"
    user

It will be parsed to this XMIR (XML Intermediate Representation):

<program>
  <o name="hello" line="1">
    <o name="user" data="string" line="2">Jeff</o>
    <o name="@" base="stdout" line="3">
      <o data="string" line="4">Hello, %s!</o>
      <o base="user" line="5"/>
    </o>
  </o>
</program>

If you wonder what this XML means, read this document: there is a section about XMIR.
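Since XMIR is plain XML, any XML library can read it. The following Python snippet is only an illustration of that point: it loads the document shown above with the standard xml.etree.ElementTree module and lists which objects are defined and which are referenced.

```python
import xml.etree.ElementTree as ET

XMIR = """
<program>
  <o name="hello" line="1">
    <o name="user" data="string" line="2">Jeff</o>
    <o name="@" base="stdout" line="3">
      <o data="string" line="4">Hello, %s!</o>
      <o base="user" line="5"/>
    </o>
  </o>
</program>
""".strip()

program = ET.fromstring(XMIR)
# Objects that get a name in this program, in document order.
names = [o.get("name") for o in program.iter("o") if o.get("name")]
# Objects that are used as a base of some other object.
bases = [o.get("base") for o in program.iter("o") if o.get("base")]
```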

Optimize 🌵

At this step the XMIR produced by the parser goes through many XSL transformations, sometimes getting additional elements and attributes. Our example XMIR may get a new attribute @ref, which links the reference to the object user with the line where that object was defined:

<program>
  <o name="hello" line="1">
    <o name="user" data="string" line="2">Jeff</o>
    <o name="@" base="stdout" line="3">
      <o data="string" line="4">Hello, %s!</o>
      <o base="user" line="5" ref="2"/>
    </o>
  </o>
</program>
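In the plugin this is done by XSL stylesheets. As a rough stand-in, here is how the same effect could be reproduced in Python; the function add_refs is hypothetical and, unlike a real transformation, ignores scoping and simply matches names across the whole program.

```python
import xml.etree.ElementTree as ET

XMIR = """
<program>
  <o name="hello" line="1">
    <o name="user" data="string" line="2">Jeff</o>
    <o name="@" base="stdout" line="3">
      <o data="string" line="4">Hello, %s!</o>
      <o base="user" line="5"/>
    </o>
  </o>
</program>
""".strip()


def add_refs(program: ET.Element) -> None:
    """Add a ref attribute to every object whose base is defined in the program."""
    defined = {
        o.get("name"): o.get("line")
        for o in program.iter("o") if o.get("name")
    }
    for o in program.iter("o"):
        base = o.get("base")
        if base in defined:
            o.set("ref", defined[base])


program = ET.fromstring(XMIR)
add_refs(program)
user_ref = program.find(".//o[@base='user']").get("ref")
stdout_ref = program.find(".//o[@base='stdout']").get("ref")
```

The reference to stdout gets no ref, because stdout is not defined in this program.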

Some XSL transformations may check for grammar or semantic errors and add a new element <errors/> if something wrong is found. Thus, if parsing didn’t find any syntax errors, all other errors will be visible inside the XMIR document, for example, like this:

<program>
  <errors>
    <error line="">The program has no package</error>
  </errors>
  <o name="hello" line="1">
    <o name="user" data="string" line="2">Jeff</o>
    <o name="@" base="stdout" line="3">
      <o data="string" line="4">Hello, %s!</o>
      <o base="user" line="5" ref="2"/>
    </o>
  </o>
</program>

By the way, this is not a real error, I just made it up.

Discover 🌵

At this step we find out which objects are “foreign”. In our example, the object user is not foreign, since it’s defined in the code we have in front of us, while the object stdout is not defined here and that’s why it is a foreign one.

Going through all .xmir files we can easily judge which objects are foreign just by looking at their names. Once we see the reference to org.eolang.io.stdout, we check the presence of the file org/eolang/io/stdout.eo in the directory with all .eo sources. If the file is absent, we put the object name into the CSV catalog and claim it to be foreign.
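The check is simple enough to show in code. Below is a small Python sketch of it, where the function name is_foreign is an assumption and the directory is a temporary one created for the demonstration.

```python
import tempfile
from pathlib import Path


def is_foreign(name: str, sources_dir: Path) -> bool:
    """An object is foreign if its .eo file is absent from sources_dir."""
    # org.eolang.io.stdout is expected at org/eolang/io/stdout.eo
    return not (sources_dir / (name.replace(".", "/") + ".eo")).is_file()


with tempfile.TemporaryDirectory() as tmp:
    sources = Path(tmp)
    (sources / "hello.eo").write_text("[] > hello\n")
    hello_is_foreign = is_foreign("hello", sources)
    stdout_is_foreign = is_foreign("org.eolang.io.stdout", sources)
```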

Pull 🌵

Here we simply try to find source code .eo files for all foreign objects in Objectionary, by looking at its GitHub repository. For example, this is where we would find stdout.eo. We find them there and pull them to the local disk.

Pay attention, we pull the sources. Not binaries or compiled XMIR documents, but the sources in .eo format.
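Locating a source is a matter of turning an object name into a path. Here is a sketch of that mapping in Python; the function source_url is hypothetical, and the base address https://example.com/objects is a placeholder, not the real location of Objectionary.

```python
def source_url(name: str, base: str) -> str:
    """Build the address of the .eo source for the given object name."""
    return f"{base.rstrip('/')}/{name.replace('.', '/')}.eo"


# The base below is a placeholder, not the actual repository address.
url = source_url("org.eolang.io.stdout", "https://example.com/objects/")
```

Downloading the file at that address and saving it next to the local sources would complete the step.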

Resolve 🌵

This is what stdout.eo may look like, after the pull:

+package org.eolang.io
+rt jvm org.eolang:eo-runtime:0.10.2
[text] > stdout /bool

The object is an atom. This means that even though we have its source code, it’s not complete without a piece of platform-specific binary code. An atom is an object implemented by the runtime platform where the EO program is executed (a mechanism also known as FFI). The line that starts with +rt (runtime) explains where to get the runtime code. The jvm part is the name of the runtime.

We go to Maven Central, find there the artifact org.eolang:eo-runtime:0.10.2, and unpack it (it’s a zip archive with .class files after all).

By the way, a program may contain a number of +rt meta instructions, for example:

+package org.eolang.io
+rt jvm org.eolang:eo-runtime:0.10.2
+rt ruby eo-core:0.5.8
+rt python eo-basics:0.0.3
[text] > stdout /bool

Here, three runtime platforms will know where to get the missing code for the stdout atom: EO➝Java will go to Maven Central for the JAR artifact, EO➝Ruby will go to RubyGems trying to find the gem by the name eo-core and version 0.5.8, while EO➝Python will go to PyPI trying to find the eo-basics package with the version 0.0.3.
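A resolver first has to read these +rt lines. The Python sketch below shows one way to do it, under the assumption that each line has exactly three space-separated parts. The functions runtimes and jar_path are illustrative names; the path produced by jar_path follows the standard Maven repository layout (group directories, artifact, version, then artifact-version.jar).

```python
def runtimes(source: str) -> dict[str, str]:
    """Map each runtime name to the coordinates given in its +rt line."""
    found = {}
    for line in source.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "+rt":
            found[parts[1]] = parts[2]
    return found


def jar_path(coordinates: str) -> str:
    """Turn group:artifact:version into a path in a Maven repository."""
    group, artifact, version = coordinates.split(":")
    return f"{group.replace('.', '/')}/{artifact}/{version}/{artifact}-{version}.jar"


SOURCE = """+package org.eolang.io
+rt jvm org.eolang:eo-runtime:0.10.2
+rt ruby eo-core:0.5.8
+rt python eo-basics:0.0.3
[text] > stdout /bool
"""

found = runtimes(SOURCE)
path = jar_path(found["jvm"])
```

Each platform picks only the line with its own runtime name and ignores the rest.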

Place 🌵

Next we place all .class files found in the unpacked JAR into the target/classes directory. We do this in order to help Maven Compiler Plugin find them on the classpath.
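Since a JAR is a zip archive, placing boils down to a selective extraction. Here is a minimal Python sketch of it; the function place and the class org/example/Atom.class are made up for the demonstration, and the real plugin may track placed files differently.

```python
import tempfile
import zipfile
from pathlib import Path


def place(jar: Path, classes_dir: Path) -> list[str]:
    """Extract only .class entries from the JAR and return their names."""
    placed = []
    with zipfile.ZipFile(jar) as archive:
        for entry in archive.namelist():
            if entry.endswith(".class"):
                archive.extract(entry, classes_dir)
                placed.append(entry)
    return placed


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny fake JAR with one class and one EO source.
    jar = Path(tmp) / "dependency.jar"
    with zipfile.ZipFile(jar, "w") as archive:
        archive.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
        archive.writestr("org/example/Atom.class", b"\xca\xfe\xba\xbe")
        archive.writestr("EO-SOURCES/org/example/atom.eo", "[] > atom /bool\n")
    classes = Path(tmp) / "target" / "classes"
    placed = place(jar, classes)
    class_exists = (classes / "org" / "example" / "Atom.class").is_file()
    sources_extracted = (classes / "EO-SOURCES").exists()
```

Keeping the list of placed entries matters, because the same list tells us later what has to be removed.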

Mark 🌵

In each JAR file that arrives we can find .eo sources. They are the programs this JAR file had on its classpath while it was built. We consider them foreign objects too and add them to the CSV catalog.
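Finding those sources can be sketched in Python as follows. The function mark is an illustration only, and it assumes that the sources sit in the EO-SOURCES/ directory inside the JAR, which is where the Copy step puts them.

```python
import io
import zipfile


def mark(jar_bytes: bytes) -> list[str]:
    """Return names of objects whose .eo sources are inside the JAR."""
    prefix, suffix = "EO-SOURCES/", ".eo"
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as archive:
        return sorted(
            entry[len(prefix):-len(suffix)].replace("/", ".")
            for entry in archive.namelist()
            if entry.startswith(prefix) and entry.endswith(suffix)
        )


# An in-memory JAR with one EO source and one made-up class file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as jar:
    jar.writestr("EO-SOURCES/org/eolang/io/stdout.eo", "[text] > stdout /bool\n")
    jar.writestr("org/example/Atom.class", b"")
marked = mark(buffer.getvalue())
```

Every name in the result would go to the catalog, to be pulled and parsed in the next cycle.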

Transpile 🌵

When all foreign objects registered in the catalog are downloaded, parsed, and optimized, we are ready to start transpiling. Instead of compiling XMIR directly to Bytecode, we transpile it to .java and let the Java compiler do the job of generating Bytecode.

We believe that there are a few benefits of transpiling to Java vs. compilation to Bytecode:

Output code is easier to read and debug,
Optimization power of existing compilers is reused,
Complexity of a transpiler is lower than of a compiler,
Portability of the output code is higher.

We already have two EO➝Java transpilers: the canonical one and the one made by HSE University. We also have an experimental EO➝Python transpiler made by students of Innopolis University. Most probably, when you read this article, there will be more transpilers available.

Even though we believe in transpiling, it’s still possible to create EO➝Bytecode, EO➝LLVM, or EO➝x86 compilers. You are more than welcome to try!

Compile

At this step, the standard Maven Compiler Plugin finds auto-generated .java files in target/generated-sources and turns them into .class files.

Unplace 🌵

Here, we remove all .class files unpacked from dependencies. This is necessary, in order to avoid getting them packaged into the final JAR.

We do placing and then unplacing simply because Maven Compiler Plugin doesn’t allow us to extend the classpath at runtime. If it were possible, we would just download dependencies from Maven Central and add them to the classpath, without unpacking, placing, and then unplacing.
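Unplacing is the mirror of placing. A Python sketch of it could look like this, assuming that we remember the names of the files that were placed; the function unplace and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def unplace(classes_dir: Path, placed: list[str]) -> None:
    """Delete the files that were placed earlier, leaving our own binaries."""
    for entry in placed:
        (classes_dir / entry).unlink(missing_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    classes = Path(tmp)
    (classes / "org" / "example").mkdir(parents=True)
    (classes / "org" / "example" / "Atom.class").write_bytes(b"")  # from a dependency
    (classes / "Own.class").write_bytes(b"")  # compiled from our own sources
    unplace(classes, ["org/example/Atom.class"])
    remaining = sorted(path.name for path in classes.rglob("*.class"))
```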

Unspile 🌵

Here, we delete all .class files from the target/classes/ directory, which were auto-generated from .eo. We don’t want to ship binaries, which can be generated from .eo sources. We only want to ship atoms, which are .java files originally.

Copy 🌵

At this step we take all .eo sources from src/main/eo/ and copy them to the target/classes/EO-SOURCES/ directory. Later, they will be packaged together with .class files into a .jar, which will be deployed to Maven Central. While copying, we replace 0.0.0 in the runtime version with the version being deployed. Take a look at the file stdout.eo, in its source repository:

+package org.eolang.io
+rt jvm org.eolang:eo-runtime:0.0.0
[text] > stdout /bool

The version at the +rt line is 0.0.0. When sources are copied to the JAR, this text is replaced.
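The replacement can be illustrated with a short Python function. This is a sketch, not what the plugin does internally: the name stamp is made up, and the regular expression assumes that the version is the last colon-separated part of the +rt line.

```python
import re


def stamp(source: str, version: str) -> str:
    """Replace the 0.0.0 placeholder in +rt lines with the given version."""
    return re.sub(
        r"^(\+rt\s+\S+\s+\S+:)0\.0\.0$",
        lambda match: match.group(1) + version,
        source,
        flags=re.MULTILINE,
    )


SOURCE = """+package org.eolang.io
+rt jvm org.eolang:eo-runtime:0.0.0
[text] > stdout /bool"""

stamped = stamp(SOURCE, "0.10.2")
```

Only the +rt lines are touched, so a 0.0.0 that appears anywhere else in the source stays as it is.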

The motivation to ship sources together with binaries is the following. When atom binaries are compiled from Java to Bytecode, they stay next to transpiled sources. They are compiled together. Moreover, unit tests also rely on both atom sources and auto-generated/transpiled sources. We want future users of the JAR to know what sources we had in place when the compilation was going on, so that they can reproduce it or at least know what surrounded the binaries they get.

From a more practical standpoint, we need these sources in the JAR in order to let the Mark step understand what objects are worth pulling next to the atoms resolved.

Deploy

Here, we package everything from target/classes/ into a JAR archive and deploy it to Maven Central.

I suggest deploying sources to GitHub Pages too, to let users see them on the Web. Also, it will be helpful later when we make a pull request to Objectionary. Check this .rultor.yml script in one of my EO libraries: it deploys .eo sources to GitHub Pages, substituting 0.0.0 version markers in them correctly.

Push

When the deployment is finished and Maven Central updates its CDN servers, it’s time to submit a pull request to yegor256/objectionary. The .eo sources of objects go into objects/ and their unit tests go into tests/. Basically, we just copy src/main/eo/ and src/test/eo over there. But, stop… one important detail. In the sources, as was said earlier, we have +rt versions set to 0.0.0. Here, when we copy to Objectionary, versions must be set to real numbers.

Merge

When the pull request arrives, a GitHub Action pre-configured in the yegor256/objectionary repository transpiles all .eo sources to all known platforms and runs all unit tests. If everything is clean, we review the pull request and decide whether the objects suggested go along with others already present in the Objectionary.

Once the pull request is merged, the objects become part of the centralized dictionary of all objects of EO. Take a look at this pull request, where a new object was submitted to Objectionary, after its atom was deployed to Maven Central.