IMC interns tackle real-world projects

At IMC, we expect our interns to make a genuine impact. That's why they're given real projects that solve real business problems. With over 20 Engineering interns, there are too many projects to recount, so we have selected two projects that highlight the breadth of work that was done over the summer.

The brief

Enhance the use of key internal tools to improve the visibility and analysis of key components within our trading system.

Introducing Pfilter

Feedweb is an internal tool at IMC used to view historical feeds from the financial exchanges we're connected to, stored in Packet Capture (PCAP) files. Each protocol IMC interfaces with has a decoder in Feedweb. This allows us to parse the PCAP into a human-readable display format. Viewing the feed is often helpful for debugging.
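To give a feel for what a decoder has to walk through, here is a minimal sketch of iterating over the records in a classic PCAP buffer. It is not Feedweb's code: the names `for_each_record`, `Record` and `read_u32` are made up for illustration, and the sketch assumes an uncompressed, microsecond-resolution file written in the host's byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

// Classic pcap layout: a 24-byte file header followed by records,
// each with a 16-byte header (ts_sec, ts_usec, incl_len, orig_len).
constexpr std::uint32_t kPcapMagicMicros = 0xa1b2c3d4;
constexpr std::size_t kFileHeaderSize = 24;
constexpr std::size_t kRecordHeaderSize = 16;

// Reads a host-endian 32-bit value at the given offset.
inline std::uint32_t read_u32(const std::vector<std::uint8_t>& buf, std::size_t offset) {
    assert(offset + sizeof(std::uint32_t) <= buf.size());
    std::uint32_t value;
    std::memcpy(&value, buf.data() + offset, sizeof value);
    return value;
}

// Hypothetical view of one captured message.
struct Record {
    std::uint32_t ts_sec;
    std::uint32_t ts_usec;
    const std::uint8_t* data;  // points into the original buffer
    std::uint32_t length;      // captured length (incl_len)
};

// Calls on_record for every record in the buffer.
// Returns the number of records visited, or -1 if the buffer is malformed.
// Assumption: byte-swapped and nanosecond-resolution files are not handled.
inline long for_each_record(const std::vector<std::uint8_t>& buf,
                            const std::function<void(const Record&)>& on_record) {
    if (buf.size() < kFileHeaderSize || read_u32(buf, 0) != kPcapMagicMicros) {
        return -1;
    }
    long count = 0;
    std::size_t pos = kFileHeaderSize;
    while (pos < buf.size()) {
        if (buf.size() - pos < kRecordHeaderSize) return -1;  // truncated header
        const std::uint32_t incl_len = read_u32(buf, pos + 8);
        if (buf.size() - pos - kRecordHeaderSize < incl_len) return -1;  // truncated payload
        const Record rec{read_u32(buf, pos), read_u32(buf, pos + 4),
                         buf.data() + pos + kRecordHeaderSize, incl_len};
        on_record(rec);  // a protocol decoder would take over from here
        pos += kRecordHeaderSize + incl_len;
        ++count;
    }
    return count;
}
```

In a real tool the callback is where a protocol-specific decoder would turn the payload bytes into typed messages.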

One pain point with Feedweb is that it doesn't have built-in filtering. We can receive a lot of data from an exchange, and browsing through thousands of messages sent every few seconds makes it hard to find points of interest.

A possible solution to this problem is to use grep or other filtering tools on Feedweb's output. However, this can be extremely slow, because Feedweb must parse every packet in the file and write a human-readable version of it to standard output before it can be filtered. It is also imprecise, as these tools filter on text and aren't aware of the message types and their fields.

Our project objective was to extend Feedweb with a built-in data filtering function called Pfilter. Switching our filtering process from grep to Pfilter would mean we could filter packets as Feedweb processed them, avoiding the cost of outputting the packets removed by the filter.

A large part of this project was the construction of a bespoke query language for handling the statically typed exchange messages we're dealing with. This required us to write a grammar and a parser.

[Figure: an example Pfilter expression]

With this language, filters can be expressed on the different types of messages we can receive and the fields on those messages.

The parser outputs a filtering tree that can be evaluated with a packet to validate it against the filtering expression.
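As a rough illustration of the idea (not Pfilter's actual implementation or syntax), a filtering tree can be modelled as a small hierarchy of nodes that each answer "does this message pass?". Everything in the sketch below is hypothetical: the `Message` type, the node names and the choice of integer-only fields are assumptions made to keep it short.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical decoded message: a type name plus integer-valued fields.
struct Message {
    std::string type;
    std::map<std::string, std::int64_t> fields;
};

// One node of the filtering tree.
struct FilterNode {
    virtual ~FilterNode() = default;
    virtual bool evaluate(const Message& msg) const = 0;
};
using NodePtr = std::unique_ptr<FilterNode>;

// Leaf: passes when the message has the given type.
struct TypeIs : FilterNode {
    std::string type;
    explicit TypeIs(std::string t) : type(std::move(t)) {}
    bool evaluate(const Message& msg) const override { return msg.type == type; }
};

// Leaf: passes when the named field exists and exceeds the threshold.
struct FieldGreater : FilterNode {
    std::string field;
    std::int64_t threshold;
    FieldGreater(std::string f, std::int64_t t) : field(std::move(f)), threshold(t) {}
    bool evaluate(const Message& msg) const override {
        const auto it = msg.fields.find(field);
        return it != msg.fields.end() && it->second > threshold;
    }
};

// Inner node: both children must pass (short-circuits on the left child).
struct And : FilterNode {
    NodePtr lhs, rhs;
    And(NodePtr l, NodePtr r) : lhs(std::move(l)), rhs(std::move(r)) { assert(lhs && rhs); }
    bool evaluate(const Message& msg) const override {
        return lhs->evaluate(msg) && rhs->evaluate(msg);
    }
};

// Inner node: at least one child must pass.
struct Or : FilterNode {
    NodePtr lhs, rhs;
    Or(NodePtr l, NodePtr r) : lhs(std::move(l)), rhs(std::move(r)) { assert(lhs && rhs); }
    bool evaluate(const Message& msg) const override {
        return lhs->evaluate(msg) || rhs->evaluate(msg);
    }
};
```

A parser for the query language would build such a tree once from the filter string, after which each decoded packet only costs one call to `evaluate` on the root.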

IMC deals with hundreds of exchange protocols, and it was not reasonable to hand-code support for each message of every protocol. This challenge was compounded by the fact that the query only becomes available at runtime, passed as a command-line argument, while the types of the messages being filtered are known only at compile time. With no reflection in C++, we solved both problems by generating code for our own reflection substitute. This allowed our project to parse the filter string given at the command line and look up the message types and their fields in our generated metadata, which is available for all protocols without any handwritten code.
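To make the idea of a generated reflection substitute more concrete, here is a small sketch of what such metadata could look like. It is an assumption-laden illustration rather than the real generator output: the message structs, `registry()` and `read_field()` are invented names, and only integer fields are covered.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Two hypothetical statically typed messages standing in for protocol structs.
struct NewOrder { std::int64_t price; std::int64_t quantity; };
struct Cancel { std::int64_t order_id; };

// Type-erased accessor: reads one integer field from a message of a known type.
using FieldGetter = std::int64_t (*)(const void* msg);

struct MessageMeta {
    std::unordered_map<std::string, FieldGetter> fields;
};
using Registry = std::unordered_map<std::string, MessageMeta>;

// In this sketch, the body of registry() is the part a code generator would
// emit from the protocol definitions, one line per message field.
inline const Registry& registry() {
    static const Registry table = [] {
        Registry r;
        r["NewOrder"].fields["price"] = [](const void* m) -> std::int64_t {
            return static_cast<const NewOrder*>(m)->price;
        };
        r["NewOrder"].fields["quantity"] = [](const void* m) -> std::int64_t {
            return static_cast<const NewOrder*>(m)->quantity;
        };
        r["Cancel"].fields["order_id"] = [](const void* m) -> std::int64_t {
            return static_cast<const Cancel*>(m)->order_id;
        };
        return r;
    }();
    return table;
}

// Runtime lookup by name, as needed when the filter arrives as a string.
// The caller must pass a pointer to a message whose type matches `type`.
inline std::optional<std::int64_t> read_field(const std::string& type,
                                              const std::string& field,
                                              const void* msg) {
    assert(msg != nullptr);
    const Registry& reg = registry();
    const auto type_it = reg.find(type);
    if (type_it == reg.end()) return std::nullopt;
    const auto field_it = type_it->second.fields.find(field);
    if (field_it == type_it->second.fields.end()) return std::nullopt;
    return field_it->second(msg);
}
```

The useful property is that names in the filter string can be validated against the table up front, so a typo in a field name is rejected before any packet is processed.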

The end result is that Feedweb now has its own filtering support which, depending on the queries used, has shown itself in practice to be more than 40 times faster than the fastest grep implementations. Pfilter has been used in both the Sydney and Chicago offices. The solution is also proving valuable for live filtering of packets in another intern project, known as Ringtail.

Introducing Ringtail

One of the first adopters of Pfilter was another intern project in the same cohort: Ringtail.

Many systems at IMC are designed as split components. A controller communicates with a separate 'engine' process, which is responsible for most time-critical operations. For performance reasons, there is relatively little logging involved on either end. Even on the controller side, which is somewhat less sensitive, it is important not to hold up the updates sent to the engine, so that the engine isn't acting on outdated information.

IMC already has systems in place to log specific information. But prior to Ringtail, these systems could only go so far. Ideally, we needed to be able to retain all the communication between the engine and controller without the need for costly logging at either end.

These two components run as separate processes on the same machine, communicating by passing messages through a shared memory-mapped ringbuffer.

This is where Ringtail comes in. It takes the form of a separate component that attaches to the ringbuffer in order to capture all of its communications for processing further down the line. We elected to store the output as a (compressed) PCAP file. Even for data that isn't actually from packet capture, the format provides a simple and easily understood structure for message-oriented binary data. Many existing IMC systems make use of PCAP files in this way, so we were able to leverage a variety of useful existing tooling by doing so.

We designed the recorder so that even if it stopped working completely, it would not interfere with the live ringbuffer. By default, the engine-to-controller ringbuffer we were reading was blocking. In other words, the writer is aware of the reader's progress and will fail to write a new message if there's no space in the buffer. This makes sense for the default use case, but it was obviously suboptimal for our reader. The worst-case scenario would be for Ringtail to crash and stop updating its index, locking up a live trading system.

To guard against this, we implemented a new non-blocking variant of the ringbuffer reader. This allowed us to snoop on the messages flowing through without ever modifying the ringbuffer, leaving the controller and engine none the wiser. A potential drawback of this approach is that if the Ringtail recorder were delayed for any reason, it could be overtaken and lose data. In practice, this is highly unlikely, as the recorder rarely does anything more than compress and write its output to a file. We decided that this possibility was preferable to the alternative, which could potentially result in the whole engine failing.
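The trade-off can be sketched with a toy model. This is not IMC's ringbuffer: `Ring`, `try_write`, `consume` and `SnoopReader` are hypothetical names, messages are single 64-bit values, the buffer lives in ordinary memory rather than a shared mapping, and every atomic uses the default sequentially consistent ordering to keep the reasoning simple.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

constexpr std::size_t kCapacity = 4;  // tiny on purpose, so overruns are easy to trigger
static_assert(kCapacity >= 2, "the snooping reader needs one slot of safety margin");

struct Ring {
    std::array<std::atomic<std::uint64_t>, kCapacity> slots{};
    std::atomic<std::uint64_t> write_seq{0};  // number of messages published
    std::atomic<std::uint64_t> read_seq{0};   // progress of the primary (blocking) reader
};

// Blocking-style write: fails rather than overwrite what the primary reader has not consumed.
inline bool try_write(Ring& ring, std::uint64_t value) {
    const std::uint64_t w = ring.write_seq.load();
    if (w - ring.read_seq.load() >= kCapacity) return false;
    ring.slots[w % kCapacity].store(value);
    ring.write_seq.store(w + 1);
    return true;
}

// Primary reader: publishes its progress, which is what lets it stall the writer.
inline std::optional<std::uint64_t> consume(Ring& ring) {
    const std::uint64_t r = ring.read_seq.load();
    if (r == ring.write_seq.load()) return std::nullopt;
    const std::uint64_t value = ring.slots[r % kCapacity].load();
    ring.read_seq.store(r + 1);
    return value;
}

// Snooping reader: keeps a private cursor and never writes to the ring,
// so it cannot stall the writer. The price is that it can be overtaken.
struct SnoopReader {
    std::uint64_t cursor = 0;  // next sequence number to read
    std::uint64_t lost = 0;    // messages skipped because the writer lapped us

    std::optional<std::uint64_t> poll(const Ring& ring) {
        for (;;) {
            const std::uint64_t published = ring.write_seq.load();
            assert(cursor <= published);
            if (cursor == published) return std::nullopt;  // nothing new
            if (published - cursor >= kCapacity) {
                // Overtaken: jump to the oldest slot the writer cannot be rewriting.
                const std::uint64_t oldest_safe = published - (kCapacity - 1);
                lost += oldest_safe - cursor;
                cursor = oldest_safe;
            }
            const std::uint64_t value = ring.slots[cursor % kCapacity].load();
            // Re-check after the read: only trust the value if the writer
            // has not reached this slot again in the meantime.
            if (ring.write_seq.load() - cursor < kCapacity) {
                ++cursor;
                return value;
            }
        }
    }
};
```

In this model a stalled `SnoopReader` only increases its own `lost` counter, whereas a stalled primary reader eventually makes `try_write` fail.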

Towards the end of our internship (and in the time since), Ringtail has grown from its original roots into something of a 'Swiss army knife' of ringbuffer tools.

Since its launch, we have added the ability to run the recorder in printing mode. This allows a developer or systems engineer troubleshooting a live system to attach to a ringbuffer and instantly visualise the message stream.

We have also added support for Pfilter expressions in both the printing recorder and replay tools, allowing users to easily build queries based on message content.

More recent contributions (some of them from interns themselves) have extended Ringtail to:

  • Support a variety of other flavours of IMC ringbuffers.
  • Ship captured output over Kafka or directly over UDP to avoid clogging up the disk.
  • Go beyond the ringbuffer and capture directly from the network card.

One particularly interesting intern project used Ringtail's Kafka output, which already existed by that time, to provide a better storage solution for another project's analysis recording system.

Ringtail is now widely deployed across IMC, and regularly used to investigate production issues across a wide range of systems.

These are just two examples of the countless real-world projects that IMC interns have the opportunity to work on. Seeing their work make a genuine, positive impact on the business is the best way for our interns to learn and build on their skills.

Interested in becoming an IMC intern?

