Python Redex Model

Intro

The goal behind this project was to implement the mathematical model behind a programming language - our language being Python. We thought this would be very easy since Python was such a popular language and we were pretty familiar with it. But there are are just so many intricacies about the Python language.

If you don't want to look at our Walmart remake of something beautiful, allow me to redirect you directly to the original paper and model which is much much much better and complete than what we created. Since this page is pretty long, this is your chance to leave.

Now that you've had your chance to leave, allow me to highlight some parts in our model.
All of the code is in this Github

What we did

Our main takeaways from the paper were regarding scope in Python, which is used in a way that's unique to Python and doesn't quite work the same in pretty much every other programming langauge. We also targeted the pass-by-reference data representation of data - a concept related to Python's weird scoping rules. We also implemented a two-register system - which returns the current expression and a store.

Model

This model was made using Racket (a super popular language) and using Redex, a package that is commonly used to implement models of programming languages.

If we start at the top,

Reference 1

The define-language section is our creation of this model. Here we're defining every possible thing that can be created, such as variables, expressions, types, primitive functions. Using these, we want to define the behavior of Python using this language base when we see certain keywords (see Github for examples).

So this is what's going on:

Line 6 refers to a store, a concept in programming languages that means the storage place for everythings. We can see here that our store is a list, which elements that are created as a pair consisting of a ref (reference value) and a v+undef (regular value).
Line 12 defines a value, which we see here can be a variety of expressions, but the two most notable ones include the two (triple ...) values. These say the same thing, except the former is describing a pure, predefined value, while the latter uses undefined variables. This essentially describes that a value in Python is stored as a combination of the variable name, an assigned metavalue (more on this later), and a dictionary.
Our metavalues (which are values that exist in our meta-language, which is Racket in this case) are defined on line 24. One of the fewtimes that we really needed these mvals are when we use them inside of the (triple ...) and we use them to define values.
Line 32-51 are all our expression values and is just a comprehensive list of all the possible types of data that may be implemented in Python. These types are most of the ones we'll be referencing throughout most of the model since expressions make up most of the Python language. We don't actually implement or use most of these expressions but they were included in our language for completeness.
Lines 57, 58, and 59 are just helper components we created to create and access various elements of our program since our stores are basically association list dictionaries, we needed a way to store different values, so these types are used so that our functions which call on these dictionaries only need to return one type of value.

There's some more various parts that aren't really too relevant because we either don't implement them or they're not very important, but this is our model language called λπ!

The boring stuff

There's a lot of stuff covered in the paper that we look over and didn't really cover in depth since they were pretty basic. They're all implemented in the model if you want to check it out but otherwise I'm only going to give a quick summary.

Figures 2, 3, 5, and 6 define creating and accessing identifiers and objects in Python. The paper defines the mathematics and logic behind these operations pretty clearly so we implemented them as is, for the most part. Figure 4 defines the primitive operations on a list, but since our focus for this project wasn't on lists, we decided not to implement this.

Here's a quick summary of these figures:

Figure 2: Creating identifiers with values bound to them and then getting the values bound to these identifiers. Basically, letting us define variables then using these variables.
Figure 3: The rules for creating objects. The 3 types of objects described here are objects in general, tuples, and sets. The objects are added into the store as triple values that are initialized with the value, the metavalue assigned to the object, and an empty (for now) dictionary. This is because a Python object value can be represented at it's basics as a Python type. This is the example of this from the paper:
Reference 2
So the type implementation we see here defines the object using the same three categories as defined in our triple values from Reference 1. The first field of the type is the name of the class (in this case X), the second field is the metavalue (where X is an instance of an object), and finally in the third field is the list of methods that this object holds, so we initialize our object using an empty dictionary since we aren't given what methods the object has.
Figure 5: The rules for accessing objects. The three functions defined here are fetch, set!, and alloc. These are just basic functions for accessing previously created objects.
- fetch passes in a reference value and gets the object located in the store at that reference.
- set! updates the value of the object at a passed in ref into a new value.
- alloc creates a new store with the new value added into it at the next value. Sort of like how malloc works in C and C++ but doesn't necessarily allocate the space but simply adds the value to it.
Figure 6: Allows access to fields within the objects. Since we'd want to populate the (probably) empty dictionaries in objects, Figure 6 allows us to edit, add, and get the fields within an object.

Figure 7

Now this is what most of our focus was on. Figure 7 was undoubtedly the most interesting and complicated part of this paper. Although it's just one function - and a pretty basic one at that - it took a lot of understanding to determine what it's doing. Figure 7 defines the algorithm for field lookups on classes

Reference 3

As you see here, our code pretty much matches exactly the description of the GetField-Class method in the paper. There's a lot of helper functions that we created, but they also match the mathematics closely.

When the model looks for a method within an object, it will either find it or it doesn't. If the expression exists, it will simply get you that method. But otherwise, it will use the "__mro__" (method resolution order) field to determine where to find the field. This field isn't ever defined by the programmer and is done automatically. The "__mro__" field is set to be the inheretence graph of the current object - all the parent objects for this current object (e.g. monkey -> mammal -> animal, so the monkey's "__mro__" field will be set to include the class and methods of the mammal class and animal class).

As seen in Reference 3, the GetField-Class also has helper functions class-lookup, class-lookup-mro, and fetch-pointer.

In order of most to least self-explanitory, these are what the helper functions do:

fetch-pointer: Just gets the value at a given ref.
class-lookup: Uses class-lookup-mro to get each parent class for the object.
class-lookup-mro: Gets the parent methods for the object using the dictionary of the object. If an object exists such that the string we're looking for is in the dictionary for the object at the passed in ref, then we'll return to pointer value that the string points to - otherwise, we'll recursively call class-lookup-mro on the rest of the dictionary.

Undefined values

As a quick side note, probably the most comedic thing about this paper was their description of undefined values. THe actual concept itself was pretty bland but the paper continuously referenced this nonexistent value as skull (no literally: ☠ was written everywhere - page 2 of the pdf). It's really humorous to consider that undefined values are literally symbolized by death (thinking about how much my programs error due to uninitialized variables). That's about it. There's not really much to say about it - I just wanted to highlight this aspect of the paper (great writing).

Other considerations

In our implementation of SetFieldUpdate and SetFieldAdd, we implement one restriction for our store by forcing the programmer to input the parameters in a certain order. This was because these two methods have to account for the scenario where object A could exist before object B OR B could exist before A. To account for this, we decided to specify the order that our objects exist in the store so we would need to write two nearly identical functions to account for both possibilities.
The paper doesn't specify regarding recursion in Python. We attempted to create tests and examples that implemented recursion but the model simply wouldn't allow it. Most attempts ended up in an error, but the few that didn't simply ran the program once through.
Similarly, the paper didn't specify how to link multiple functions together (i.e. one function calls another). So like our attempt at recursion, pretty much all attempts ended in error since the model couldn't handle these complex operations and would only be able to run the first instance of the function.
Our model also wasn't able to link together multiple setting and getting operations together. We came to realize this was partly due to #3, but also because we didn't implement a way to link things together. The paper describes the implementation in depth, but everything is done as a one-off don't really link together - which wasn't too big of an issue but it would have been cool to potentially model a whole program at once rather than single operations.

Examples

Here are some examples of the traces (basically Redex breaking down our program to show what's happening) of our model. There are some helpful arrows that show the order the program is running. The most important thing to pay attention to is the top of each box, or the first element of each expression, which shows what function is about to run.

Update field

Create object then get object

Create variable then get variable (while store is empty)

Create variable then get variable (while store is not empty)

Get object