This document defines the network protocol for communicating with the KNO.
In this specification, the term "number" means an ordered sequence of bytes (aka octets). When a number is transmitted the bytes are sent in order, and when a number is hashed, the bytes are fed to the hash function in order. Since the protocol is defined in terms of byte sequences, the concept of byte order does not apply at the protocol level (with the exception of the optional timestamp); interpreting the sequences of bytes as encodings of integers or bit streams is a higher level concept discussed separately in the section titled Number Encoding.
"client" means the entity that wishes to determine whether a number "N" is known or not.
"server" means the Known Number Oracle (KNO).
H(X) denotes the SHA-256 hash of X.
The vertical bar "|" denotes concatenation.
0 denotes the number consisting of 256/8 = 32 bytes of all zero bits.
0xAA denotes the single byte with hexadecimal value AA.
The server maintains a database of known numbers, indexed by their SHA-256 hash values.
The server also maintains a random 256-bit secret S which must remain fixed for the duration of the protocol interaction.
The client sends the message M0 = 0x00 | H(N).
The server receives M0, extracts the H(N) provided by the client, and constructs the value T = H(S | H(N)); this is a temporary secret to be withheld from the client until the reveal phase. The server then performs a database lookup with H(N) as the lookup key. If the lookup yields a known number K such that H(K) = H(N), the server constructs a commitment C = H(T | K). If the lookup is unsuccessful, it instead constructs a false commitment C = H(T | 0). The server then responds with the message M1 = 0x01 | C.
The client receives the server's response message M1 and extracts the commitment C.
The client sends the message M2 = 0x02 | N.
The server receives M2, extracts N, saves N in the database, constructs H(N) and T = H(S | H(N)), and responds with M3 = 0x03 | T; this is the reveal.
The client receives message M3, extracts T, and checks whether H(T | N) is equal to the commitment C received in message M1. If equal, the client determines that N must already have been known to the server at the time when it sent message M1, before N itself was transmitted in M3, and therefore N must be considered a "known number". If unequal, no such determination is made.
After a client has classified a number N as "known" by executing the protocol steps above, it may separately query the server for a timestamp indicating when the number became known to the server. This step is optional, and the timestamp provided is informational only; there is no way for the client to verify that the KNO is providing truthful timestamps or that they have not been modified in transit.
The client queries for the timestamp by sending the message M4 = 0x04 | N. The server stores N in the database and responds with M5 = 0x05 | TS, where TS is the time when N was first stored in the database, transmitted as an unsigned 64-bit integer count of seconds since the POSIX epoch, most significant byte first.
To execute the protocol over TCP, the client establishes a TCP connection to tcp.v1.kno.araneus.fi port 6350.
Each message is sent over the TCP connection unencoded, as a sequence of bytes. The server processes and responds to each request in order. Once the client has no further queries to perform, it should immediately close the TCP connection; connections should not be kept idle. To preserve resources, the server may close idle connections after a timeout of a few seconds.
Typically, the M0-M1 request/response pair and the M2-M3 request/response pair are executed over the same TCP connection, but this is not required.
The server should respond to each request within 30 seconds, and the client should send M2 within 30 seconds of receiving M1.
In case of a protocol error, the TCP connection is immediately closed.
To execute the protocol over HTTP, the client performs a HTTP POST to the URL http://http.v1.kno.araneus.fi/api1, with an application/octet-stream request body containing a single request message (M0, M2, or M4) as unencoded binary data. The server responds with a 200 OK response having an application/octet-stream body containing the corresponding response message as unencoded binary data.
If a number that you wish to submit to the KNO is not already a sequence of 16 bytes, you have to decide how to encode it. When the same number is submitted by two different users, the KNO will only be able to detect a match if they are encoded identically, so it is useful to agree on a common standard for encoding. This section defines such a set of encoding rules; this is version 1 ("v1") of the rules.
If the number is a bit stream, such as the output from a random bit generator, the first bit is sent as the most significant bit of the first byte, the next bit as the next-to-most significant bit of the first byte, etc.
If the number is an integer, it is first converted to binary form. The binary form is then converted to a 128-bit number by padding with zero bits at the most significant end (if the number is less than 128 bits) or by omitting the most significant bits (if the number is more than 128 bits). The number is then transmitted in big-endian order (the eight most significant bits in the first byte, etc).
The consequence of a possible disclosure of the secret S is that clients knowing S will be able to determine the known/unknown status of a number by executing the M0/M1 phase of the protocol only, without disclosing N itself to the server.
The value specified for the false commitment C is arbitrary; its purpose is to serve as a placeholder that the client cannot distinguish from a valid commitment until T is revealed, so any random-looking string of the appropriate length would work. The specified C could be interpreted as the commitment to a proof that the number 0 is known, which seems a reasonably truthful statement to make in the absence of more useful knowledge.
The timestamp query has the side effect of storing N in the database because if it did not, a client that does not care about cryptographic authentication of the Oracle's responses could abuse the timestamp query as a way of determine the known/unknown status of a number without it being added to the database.
A client that only wishes to submit numbers to the database and does not care about receiving their known/unknown status in return can do so by sending M2 messages and discarding the M3 responses, without executing the M0/M1 phase.