Elastic and Common Expression Language input

I'm currently using Hosted Elasticsearch for work, and want to grab data via a custom API so we can use it in reporting against events.

Of course, the remote API isn't structured in any way that makes this easy to achieve, so a little pre-processing is needed.

My POC approach is to use bash, curl and jq to restructure the data, but that's not a very robust production solution. It's also difficult to find a reasonable place to run ad-hoc scripts from, when in theory we don't have any infrastructure of our own for this piece of work.

So I've been recommended the “Custom API using CEL” as the Elastic way to do this. https://www.elastic.co/docs/current/integrations/cel

CEL is new to me, and it's been difficult to find any examples that match the environment and sort of work that I want to do. Running CEL scripts through the Elastic web UI is challenging, as the program input box doesn't have any understanding of the script syntax, and results are decoupled from that screen as well.

So, I was hunting to find a way to play with the language in a meaningful way, and I think I've got there now ...

Using mito

We start with Elastic's actual implementation of the engine, https://github.com/elastic/mito. I've got Go installed on my machine, even though I rarely use it ...

$ git clone https://github.com/elastic/mito.git
$ cd mito
$ cd cmd/mito
$ go build

This should give you an executable called mito in the cmd/mito directory of your copy of the repo; I copied mine out to ~/bin, which is on my PATH.

Then I switched to a tmp directory for testing, and tried a very simple CEL program, a Hello World :-

hello.cel:

"Hello World"

Executing this:

$ mito hello.cel
"Hello World"

So that works, and you have a functioning test environment!

CEL is an odd-feeling language for me. It seems like the whole program is “one single statement”, and semi-templated output is mixed in with function calls. Some functions chain using dot-notation, and others have to be invoked with arguments. I haven't worked out why this is the case, or how to read the docs to understand which to use, so I'm still doing a lot of trial and error ...

The 'state'

CEL (and therefore mito) is intended to be used to take some initial state, and to mutate it in a non-Turing-Complete environment, and output it. Although it can access external resources (file and network) the basic flow is “take an initial state, run a program over it, output the result” and that's all. Elastic's implementation adds a data-access cursor, but we can ignore that.

From the command-line, we can pop a JSON Object into a file, and introduce that to our program implicitly. If you invoke mito with -data <filename> the JSON object in that file will be available as 'state' in your program, in just the same way that the Elastic Web UI data fields are available as 'state'. An Elastic example is the API's resource URL field, and other examples would include authentication details.

Specific example – JumpCloud

For this article, I'm going to be asking the API for JumpCloud (an Identity provider) to provide me with a list of 'Applications' (these are per-service SSO setups), and then provide me with the list of Users that are allowed to access them. That's one API call to get the list of Applications, and then another call per Application to get the Users for each. I want the final output document to list all applications and each of their users, all in one go – so that's multiple API calls.

My state.json file is very simple :-

{ "url": "https://console.jumpcloud.com/api" }

My CEL/mito program on the other hand is not! It's basically viewed as a one-line command, but in that one line I can use a map function to do more work per-result ...

We start with an HTTP GET to the API, where I have to add some Headers to the request (specifically a hard-coded API key; there are ways to obscure this key but I'm not using them here).

request("GET", state.url+"/applications") builds the first request against the correct URL, and then I add the headers using .with(). The argument to 'with' is a JSON-style object, but the header values have to be arrays – { "Header": { "Accept": ["application/json"], "x-api-key": ["hardcoded"] } }

(I could and probably should get the api key from something like “state.apikey”, but it isn't clear to me how to do this in the Elastic web UI right now)

Once I have the request object created, I call .do_request() to make the actual HTTP call. This returns the response as an object, and I'm interested only in the Body of the response (I could/should be checking for HTTP 200 though).

request(...).with(...).do_request().Body

Then, for “reasons” I need to cast the whole Body into bytes() (I could use string() if I just wanted to see the response in a human-readable form). These bytes(...) then have .decode_json() called on them.

bytes(request(...).with(...).do_request().Body).decode_json()
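As a rough illustration of those two steps (check the status, then decode the body bytes as JSON), here is a minimal sketch in Python rather than CEL, so it can be run anywhere. The response dictionary and its StatusCode and Body keys are assumptions modelled on the field used in this article, and decode_body is a hypothetical helper name.

```python
import json

def decode_body(response):
    """Return the parsed JSON body, or raise if the call did not succeed."""
    # StatusCode and Body are assumed field names for this sketch.
    if response["StatusCode"] != 200:
        raise RuntimeError(f"unexpected HTTP status {response['StatusCode']}")
    # The body arrives as raw bytes; json.loads accepts bytes directly.
    return json.loads(response["Body"])

# A stand-in response (invented data), so the sketch runs without the network.
sample = {"StatusCode": 200, "Body": b'{"results": [{"id": "abc"}]}'}
print(decode_body(sample)["results"][0]["id"])
```

The point of the shape is simply that the status check happens before any attempt to parse the body.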

So this takes our API call response, and parses it as a JSON object (which is good, because that's what the API returns). I see that our response has an array called results so I'll take that and run map over it. For each of the result objects (which represent each of the “Applications”), I want to grab the 'id' and 'displayName' fields, renaming the latter to just 'name' ...

.decode_json().results.map(app, { "id": app.id, "name": app.displayName } )
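If the map macro feels unfamiliar, it may help to compare it with a list comprehension. This is a small Python sketch of the same reshaping, not CEL; the sample data is invented and the real API returns many more fields.

```python
# Sample data invented for illustration only.
body = {
    "results": [
        {"id": "a1", "displayName": "Wiki", "displayLabel": "Team wiki"},
        {"id": "b2", "displayName": "Mail", "displayLabel": "Webmail"},
    ]
}

# Rough equivalent of:
#   .results.map(app, { "id": app.id, "name": app.displayName })
apps = [{"id": app["id"], "name": app["displayName"]} for app in body["results"]]
print(apps)
```

In both forms, the first argument names the loop variable and the second is the value produced for each element.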

That's great so far ... the next step is to add a “users” key to that, and in there put the results of another API call, one to the '/v2/applications/{id}/users' endpoint ...

This will be similar to the original call: a request(...).with(...), then .do_request().Body to get just the part of the response we need, with the whole thing wrapped in bytes(...).decode_json(). This response is a plain array, so I invoke map() on it directly and select just the value that I want, the user's id ...

The whole mess in one go

It really feels like CEL/mito just doesn't care about whitespace, so I can lay my program out any way I want to get readability ... so here's the whole thing. I hope you can follow it from the explanation above!

bytes(  request("GET", state.url+"/applications")
        .with( { "Header": { "Accept": ["application/json"] , "x-api-key": ["HARDCODED"] } } )
        .do_request().Body).decode_json().results.map(app,
                { "id": app.id
                , "name": app.displayName
                , "label": app.displayLabel
                , "users": bytes(       request("GET", state.url+"/v2/applications/"+app.id+"/users")
                                        .with( { "Header":{ "Accept": ["application/json"], "x-api-key": ["HARDCODED"] } })
                                        .do_request().Body).decode_json()
                                        .map(user, user.id)
} )

For performance, that's one API call per application, plus one. From my desktop, on a JumpCloud org with 29 applications (so 30 requests in total), that takes 13 seconds elapsed.
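To make the "one call per application, plus one" shape concrete, here is a minimal offline sketch in Python (not CEL). Everything in it is an assumption for illustration: list_apps_with_users and fake_fetch are hypothetical names, the base URL is a placeholder, and the responses are invented data shaped like the ones described above.

```python
def list_apps_with_users(fetch, base_url):
    """Build the combined document; fetch(url) must return decoded JSON."""
    apps = fetch(base_url + "/applications")["results"]
    return [
        {
            "id": app["id"],
            "name": app["displayName"],
            "label": app["displayLabel"],
            # One extra request per application.
            "users": [
                user["id"]
                for user in fetch(f"{base_url}/v2/applications/{app['id']}/users")
            ],
        }
        for app in apps
    ]

# Fake API responses (invented data) so the sketch runs offline.
BASE = "https://example.invalid/api"
FAKE = {
    BASE + "/applications": {"results": [
        {"id": "a1", "displayName": "Wiki", "displayLabel": "Team wiki"},
        {"id": "b2", "displayName": "Mail", "displayLabel": "Webmail"},
    ]},
    BASE + "/v2/applications/a1/users": [{"id": "u1"}, {"id": "u2"}],
    BASE + "/v2/applications/b2/users": [{"id": "u3"}],
}
calls = []

def fake_fetch(url):
    calls.append(url)  # record every request made
    return FAKE[url]

result = list_apps_with_users(fake_fetch, BASE)
print(len(calls))  # number of requests: applications + 1
```

With two fake applications the stub records three requests. If the 30 real requests run one after another, 13 seconds works out to a little over 0.4 seconds each, though I haven't confirmed how the requests are scheduled.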

The (redacted) output looks like this :-

[
  {
    "id": "c160c315088d44ad80ce8464419119c5",
    "label": "platyfish-minionism-obsignatory",
    "name": "idiobiology-reoxidize-dooja",
    "users": [
      "d0e94fd60a884c17ac5ea904a8f71412",
      "27eb539596af466b8e1ff3d183d4b0b7"
    ]
  },
  {
    "id": "65a27cbb50bd473b85e20b700a834225",
    "label": "fibroneuroma-resalvage-pronomination",
    "name": "pseudobasidium-shinglewise-unmortal",
    "users": [
      "ced6d82df7a04ae7ac7768756471a950",
      "3dfd0a2adf2747cfb618f273f9a1410c"
    ]
  }
]