As solutions architects at our company, Sopra Steria, we were involved in a project deploying and administering an OpenShift Container Platform cluster on Red Hat Hyperconverged Infrastructure. Regarding monitoring, the client expressed the need to unify this platform with their corporate monitoring solution.
Our starting point was a Grafana console with Prometheus as the data source for OpenShift metrics. To check infrastructure metrics we had to access the Red Hat Virtualization Manager console, which does not allow us to create monitoring dashboards or to do proactive monitoring.
Additionally, the client had a corporate monitoring suite based on Nagios, so they needed to integrate this new OCP+RHHI infrastructure into their Nagios environment by adding the metrics captured from OCP's Prometheus and the RHV console.
I consider myself a sysadmin, not a developer, but I have always been passionate about solving every challenge I encounter. I have often used Node.js and its massive package catalog to achieve my goals. Once again, it seemed to be the right choice for this scenario.
What it does
This project, written in Node.js, aims to cover a gap in Red Hat OpenShift (3.11, 4.x) solutions deployed on RHHI with RHV: the impossibility of centralizing infrastructure monitoring on a single console.
With RHV2Prometheus, metrics are generated from the objects discovered through the RHV API and imported into OCP's Prometheus. Once imported, you can create dashboards in OCP's Grafana, query the metrics from the official Nagios plugin, or integrate third-party solutions through the REST API, among other possibilities.
It presents an SSO-secured interface to query imported events generated on the RHV console. You can also view the metrics to be scraped by OCP's Prometheus, edit metric thresholds, customize Nagios configuration templates, or test an API query.
It also includes functionality to simulate the response of an oVirt API (static values from a lab or your own). This is ideal for testing if you don't have an API to query. We had to use it in the hackathon because there was no RHV resource available for demonstration purposes.
- Red Hat Hyperconverged infrastructure
- Red Hat Virtualization
- Red Hat Single Sign-On
- Red Hat 3scale
How we built it
As mentioned before, RHV2Prometheus is built in Node.js. The first obstacle was understanding how the RHV API works and how it is structured. In the early stages, we started querying RH lab environments, which were renewed each week, at a very slow pace. Once it was tested long enough, we moved on to the client's OCP cluster running on a high-availability RHHI infrastructure.
The launch iterations are executed on a predefined interval. Each iteration recreates the collected metrics and obtains their values. The process is automated; there is no need to reconfigure after each infrastructure change.
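The iteration loop above can be sketched roughly as follows (function names and the interval value are assumptions for illustration, not the actual RHV2Prometheus code): on every tick, the current objects are rediscovered from the RHV API and all metrics are rebuilt from scratch, which is why infrastructure changes need no reconfiguration.

```javascript
const INTERVAL_MS = 60_000; // example interval, not the app's real default

// One iteration: rediscover objects, then recreate every metric and its value.
async function iterate(discoverObjects, buildMetrics) {
  const objects = await discoverObjects(); // query the RHV API
  return buildMetrics(objects);            // rebuild the metric set from scratch
}

// In the real app this would run on the predefined interval, e.g.:
// setInterval(() => iterate(discoverRhvObjects, rebuildMetrics), INTERVAL_MS);
```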
Challenges we ran into
The first challenge was identifying which metrics were worth importing into Prometheus. Some API objects have a statistics class, but we went further and built metrics from any numeric value in their descriptions; this was the best way to keep metric creation autonomous.
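The idea of deriving metrics from any numeric field might look like this minimal sketch (the function name and metric naming are illustrative, not the real code): walk an API object, emit a Prometheus exposition line for every numeric value, and recurse into nested objects so new fields become metrics automatically.

```javascript
// Turn every numeric field of an API object into a Prometheus metric line.
// Nested objects are flattened into the metric name with underscores.
function objectToMetrics(prefix, obj, labels = '') {
  const lines = [];
  for (const [key, value] of Object.entries(obj)) {
    if (typeof value === 'number') {
      lines.push(`${prefix}_${key}${labels} ${value}`);
    } else if (value && typeof value === 'object') {
      lines.push(...objectToMetrics(`${prefix}_${key}`, value, labels));
    }
  }
  return lines;
}
```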
As the number of queries increased, we sometimes ended up overwhelming RHVM with a denial of service (ouch!). To solve this, we implemented an API call queue that assigns a priority to each call to RHV. This way, no more than one API call is made within the same 100 ms window, and older calls get a higher priority than more recent ones.
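A minimal sketch of such a throttled queue could look like this (class and method names are illustrative, not the actual RHV2Prometheus implementation): at most one queued call is dispatched per 100 ms tick; lower priority values go first, and among equal priorities the oldest entry wins, so older calls are not starved by newer ones.

```javascript
class ApiCallQueue {
  constructor(intervalMs = 100) {
    this.intervalMs = intervalMs;
    this.queue = [];
    this.seq = 0;      // insertion counter, used to prefer older calls
    this.timer = null;
  }

  // Queue an async function; the returned promise resolves with its result
  // once the call is dispatched.
  enqueue(fn, priority = 10) {
    return new Promise((resolve, reject) => {
      this.queue.push({ fn, priority, seq: this.seq++, resolve, reject });
      if (!this.timer) {
        this.timer = setInterval(() => this.dispatch(), this.intervalMs);
      }
    });
  }

  // Pick the next call: lowest priority value first, then oldest entry.
  next() {
    this.queue.sort((a, b) => a.priority - b.priority || a.seq - b.seq);
    return this.queue.shift();
  }

  dispatch() {
    const item = this.next();
    if (!item) {             // queue drained: stop ticking until next enqueue
      clearInterval(this.timer);
      this.timer = null;
      return;
    }
    Promise.resolve().then(item.fn).then(item.resolve, item.reject);
  }
}
```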
To further reduce the number of queries sent to RHV, a cache is built to translate object IDs into names. Since this data is immutable, caching it cut the number of queries by 50%.
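The id-to-name cache amounts to a simple memoized lookup, sketched here with hypothetical names (not the real code): because an object's name never changes for a given id, each id is resolved against the API at most once, and every later lookup is a cache hit.

```javascript
// Map from RHV object id to its human-readable name.
const idNameCache = new Map();

// Resolve an id to a name, hitting the API only on a cache miss.
// `fetchObject` stands in for whatever function queries the RHV API.
async function resolveName(id, fetchObject) {
  if (idNameCache.has(id)) {
    return idNameCache.get(id);      // cache hit: no API call
  }
  const obj = await fetchObject(id); // cache miss: exactly one API call
  idNameCache.set(id, obj.name);
  return obj.name;
}
```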
Another challenge was understanding OpenShift's Operators, in this case the monitoring operator. Once we understood it, we edited Prometheus's default ConfigMap configuration so we could scrape our app.
Regarding the Nagios configuration, I had the experience and knowledge to automate it, since I had been a Nagios administrator in the past.
Accomplishments that we're proud of
Our best accomplishment was building a product based on our client's need. It is my first app under an MIT license, and I would be honored if this functionality were included or considered in future Red Hat solutions.
What we learned
What's next for rhv2prometheus
Our next step is to explore the possibility of building it as an Operator, without the Nagios customization and thresholds. This way it could be deployed on RHHI+OCP infrastructures, and we will try to automate its configuration so the user doesn't need to add configuration parameters.
Depending on where we want to go, the requirements to test RHV2Prometheus vary:
- Node.js allows us to run RHV2Prometheus on a laptop or any Node server.
- OpenShift Container Platform 3.11/4.x lets us deploy it and integrate it with the out-of-the-box Prometheus; only tested on OpenShift 3.11 with the Prometheus Operator.
- Red Hat Virtualization: what we consult is the oVirt API, so it could be integrated with other solutions; only tested on the RHV 4.0 labs from OPENTLC.com and on RHV 4.0 over RHHI.
First, clone the repo locally and edit the .env file:
```
RHEVserver="https://YOURSERVER"
# User example "admin@internal"
# Password example "r3dh4t1!"
# Base64 encoded user@domain:password
RHVCredentials="YWRtaW5AaW50ZXJuYWw6cjNkaDR0MSE="
RHSSO="true"
RHSSOcert=""
```
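The RHVCredentials value is simply the Base64 encoding of user@domain:password, so you can generate your own, for example, with:

```shell
# Encode the RHV user and password for the RHVCredentials variable
echo -n 'admin@internal:r3dh4t1!' | base64
# YWRtaW5AaW50ZXJuYWw6cjNkaDR0MSE=
```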
Then select your option: [ Openshift | Node.js ]
Here are some steps to get it working; obviously you can take your own path.
```shell
# create the project
oc new-project rhv2prometheus --description="Create prometheus metrics from RHV" --display-name="rhv2prometheus"
# create build from source
oc new-build --name rhv2prometheus --strategy=source --binary --image-stream=nodejs:latest
# start the build
oc start-build rhv2prometheus --from-dir=. --follow
# create the app
oc new-app rhv2prometheus
# expose the service
oc expose svc/rhv2prometheus
```
Then you can access http://rhv2prometheus-rhv2prometheus.yourappdomain.com/metrics to check the metrics to be scraped by another Prometheus.
You can also check the events at: http://rhv2prometheus-rhv2prometheus.yourappdomain.com/events
To configure the out-of-the-box prometheus-operator, here are the steps we use; they can surely be optimized, but the following works for us:
*Cluster admin rights are required
- Clone the ConfigMap cluster-monitoring-config to a new one, prometheus-custom-cfg, in the "openshift-monitoring" project, adding our RHV2Prometheus service route to the scrape_configs:
```yaml
- job_name: custom/rhv-monitoring
  static_configs:
  - targets: ['rhv2prometheus-rhv2prometheus.yourappdomain.com']
```
- The StatefulSet "prometheus-k8s" defines 4 containers (prometheus, prometheus-config-reloader, prometheus-proxy, rules-configmap-reloader) inside each Prometheus pod; we just need to edit prometheus-config-reloader:
Mount the new ConfigMap, adding our config_custom mountPath:
in the app definition:
```yaml
volumes:
- configMap:
    defaultMode: 420
    name: prometheus-custom-cfg
  name: prometheus-custom-cfg
```
in the container definition:
```yaml
volumeMounts:
- mountPath: /etc/prometheus/config_custom
  name: prometheus-custom-cfg
  readOnly: true
```
Edit the container's args to use our custom config with the --config-file flag:
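For example, the edited entry might look like this (the file name under config_custom is an assumption; use whatever key your prometheus-custom-cfg ConfigMap actually defines, and leave the container's other args untouched):

```yaml
args:
- --config-file=/etc/prometheus/config_custom/prometheus.yaml
# ...keep the remaining existing args unchanged
```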
Saving the StatefulSet will start a new deployment, and prometheus-config-reloader will send the new config to the prometheus container.
Then we can check our new metrics inside Prometheus.
For the Node.js option, take a look at the scripts definition in package.json:
"start": "set RHEVtest=false&& set RHEVrecord=false&& node server.js", "start-no-sso": "set RHEVtest=false&& set RHEVrecord=false&& node server.js", "test": "node simulator.js&& set RHEVtest=true&& set RHEVrecord=false&& node server.js", "record": "set RHEVtest=false&& set RHEVrecord=true&& node server.js"
To start it connecting to the RHEVserver defined in .env and using SSO, run:
npm start
If you want to record the API output for the simulator, run:
npm run record
If you want to test against the API simulator, run:
npm run test
Then you can access http://localhost:8080/metrics to check the metrics to be scraped by another Prometheus.
You can also check the events at: http://localhost:8080/events