Contributing to Tapirx

February 23, 2019

Tapirx is an open-source tool that you can use to discover and identify medical devices on a network. That's a broad category of devices and a big use case, so it's a community effort!

Every contribution is valuable, but several categories of improvement are worth highlighting:

  • Support for protocols that Tapirx doesn't understand yet. In particular, some devices use proprietary protocols that are specific to manufacturers or device models.
  • Support for message fields that may include extra identifiers. Some devices may use fields in unusual ways.
  • Integrating other products with Tapirx by creating REST API endpoints.
  • Tuning performance on high-throughput links.

Even if you're not writing code, you can contribute by sharing anonymized network captures (contributors can help you anonymize them) and testing new features.

Building development versions

We use a Makefile to set version information at compile time (via -ldflags). Note that make runs go install, so the resulting executable ends up in $GOPATH/bin.

$ make
$ $GOPATH/bin/tapirx -version
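For readers unfamiliar with the -ldflags technique, here is a minimal sketch of how a Go program can receive its version string from the linker. The variable name `version`, the helper `versionString`, and the output format are assumptions made for illustration; they are not taken from the Tapirx Makefile or source.

```go
package main

import (
	"flag"
	"fmt"
)

// version is a hypothetical variable name. A build command such as
//
//	go install -ldflags "-X main.version=1.2.3"
//
// overwrites it at link time. Without the flag it keeps this default.
var version = "dev"

// versionString returns the text printed for the -version flag.
// The "tapirx <version>" format is an assumption for this sketch.
func versionString() string {
	return "tapirx " + version
}

func main() {
	showVersion := flag.Bool("version", false, "print version and exit")
	flag.Parse()
	if *showVersion {
		fmt.Println(versionString())
		return
	}
	fmt.Println("run with -version to print the version")
}
```

Keeping a default value in the source means a plain go build still produces a working binary, which is one reason to let a Makefile supply the real value.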

Testing and code quality

We use CircleCI for automated testing. CircleCI runs functional tests as well as style and lint checks to make sure that code remains easy to read, well formatted, and thoroughly tested.

If you're developing on this codebase, here's a good workflow to maximize the chances that your code will pass all of CircleCI's checks:

$ go test
$ go vet
$ golint
$ gofmt -w .

The gofmt command might modify one or more files so that they conform to standard Go formatting. Check with git status, then stage and commit the changes (git commit -m 'gofmt') if necessary.

When writing new features, add unit tests in *_test.go. Open a pull request on this project if you would like help deciding what and how to test.

Architecture

Tapirx uses Google's capable gopacket library to listen on an interface and expose frames/packets/datagrams/payloads to upper layers that watch for specific byte sequences.

The input to tapirx is a sequence of Ethernet frames, which may or may not have VLAN tags on them. This is what you get when you receive data from a SPAN port.
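To illustrate what "may or may not have VLAN tags" means at the byte level, here is a minimal standard-library sketch that finds the payload of a raw Ethernet frame whether or not it carries 802.1Q tags. Tapirx itself relies on gopacket for this; the function name `ethernetPayload` and the sample frames are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	etherTypeVLAN = 0x8100 // IEEE 802.1Q customer tag
	etherTypeQinQ = 0x88a8 // IEEE 802.1ad service tag
)

// ethernetPayload (hypothetical helper) returns the EtherType and payload
// of a raw Ethernet frame, skipping any VLAN tags it finds on the way.
func ethernetPayload(frame []byte) (uint16, []byte, error) {
	offset := 12 // skip destination and source MAC addresses (6 bytes each)
	for {
		if len(frame) < offset+2 {
			return 0, nil, errors.New("frame too short")
		}
		etherType := binary.BigEndian.Uint16(frame[offset : offset+2])
		if etherType != etherTypeVLAN && etherType != etherTypeQinQ {
			return etherType, frame[offset+2:], nil
		}
		offset += 4 // 2-byte tag protocol ID + 2-byte tag control information
	}
}

func main() {
	macs := make([]byte, 12) // zeroed MAC addresses, for illustration only
	untagged := append(append([]byte{}, macs...), 0x08, 0x00, 'h', 'i')
	tagged := append(append([]byte{}, macs...), 0x81, 0x00, 0x00, 0x0a, 0x08, 0x00, 'h', 'i')

	for _, frame := range [][]byte{untagged, tagged} {
		etherType, payload, err := ethernetPayload(frame)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("ethertype=0x%04x payload=%q\n", etherType, payload)
	}
}
```

Both sample frames report EtherType 0x0800 and the same payload, so the layers above do not need to care whether the SPAN port delivered tagged or untagged traffic.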

Tapirx examines one frame at a time and does not reconstruct streams. For each input frame, tapirx's job is to figure out whether it contains data that includes device identifiers. The code in *_decode.go is relatively self-explanatory.

Notes on specific protocols

For HL7, tapirx can find identifiers when an HL7 packet fits entirely inside an MTU (i.e., within one frame). Fields that commonly contain identifiers can be found in hl7_decode.go and include PRT-* and OBX-18.
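As a rough illustration of single-frame HL7 field extraction, the sketch below pulls OBX-18 out of an HL7 v2 payload. It assumes the default delimiters (carriage return between segments, `|` between fields) and is not the code in hl7_decode.go; the function name `obx18` and the sample identifier are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// obx18 (hypothetical helper) returns every non-empty OBX-18 value in an
// HL7 v2 payload, assuming default delimiters.
func obx18(payload string) []string {
	var ids []string
	for _, segment := range strings.Split(payload, "\r") {
		fields := strings.Split(segment, "|")
		// fields[0] is the segment name, so OBX-18 is fields[18].
		if fields[0] == "OBX" && len(fields) > 18 && fields[18] != "" {
			ids = append(ids, fields[18])
		}
	}
	return ids
}

func main() {
	// Build an OBX segment with only field 18 populated.
	obx := make([]string, 19)
	obx[0] = "OBX"
	obx[18] = "PUMP-0042" // made-up identifier
	msg := "MSH|^~\\&|MONITOR\r" + strings.Join(obx, "|") + "\r"

	fmt.Println(obx18(msg))
}
```

Because the function only looks at one payload, a message whose OBX segment is split across two frames would be missed, which is the limitation described above.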

For DICOM, identifiers can often be found in DICOM Associate Request packets. This type of packet includes a Calling Application Entity Title. We need only the first 74 bytes of an Associate Request to determine that it is a well-formed packet and extract the identifier, so for this protocol we do not need packet reassembly.
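The 74-byte figure corresponds to the fixed-length header of an A-ASSOCIATE-RQ PDU: 1 byte PDU type, 1 reserved, 4 length, 2 protocol version, 2 reserved, 16 Called AE Title, 16 Calling AE Title, and 32 reserved. The following sketch reads the Calling AE Title from that header. It is an illustration under those layout assumptions, not Tapirx's decoder; `callingAETitle` and the sample titles are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"strings"
)

const associateRQHeaderLen = 74 // fixed part of an A-ASSOCIATE-RQ PDU

// callingAETitle (hypothetical helper) validates the fixed header of an
// A-ASSOCIATE-RQ PDU and returns its Calling AE Title.
func callingAETitle(payload []byte) (string, error) {
	if len(payload) < associateRQHeaderLen {
		return "", errors.New("payload shorter than 74 bytes")
	}
	if payload[0] != 0x01 { // PDU type 0x01 is A-ASSOCIATE-RQ
		return "", errors.New("not an A-ASSOCIATE-RQ PDU")
	}
	// The length field counts every byte after itself, so a well-formed
	// request must declare at least 74 - 6 = 68 bytes.
	if binary.BigEndian.Uint32(payload[2:6]) < associateRQHeaderLen-6 {
		return "", errors.New("declared PDU length too small")
	}
	// Bytes 10..25 hold the Called AE Title, bytes 26..41 the Calling AE
	// Title. Both are padded with spaces to 16 bytes.
	return strings.TrimSpace(string(payload[26:42])), nil
}

func main() {
	pdu := make([]byte, associateRQHeaderLen)
	pdu[0] = 0x01
	binary.BigEndian.PutUint32(pdu[2:6], associateRQHeaderLen-6)
	binary.BigEndian.PutUint16(pdu[6:8], 1) // protocol version
	copy(pdu[10:26], fmt.Sprintf("%-16s", "ARCHIVE"))
	copy(pdu[26:42], fmt.Sprintf("%-16s", "CT_SCANNER_01"))

	title, err := callingAETitle(pdu)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("calling AE title:", title)
}
```

A real Associate Request continues with variable-length items after byte 74, but none of them are needed to recover the Calling AE Title.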

Future goals

In order to detect identifiers from a wider range of traffic types, it would be good to implement packet reassembly so we can parse payloads that span multiple frames/packets. Notes on gopacket TCP reassembly will come in handy.