[E-Book] Give Your Tasks To Machines

Operational Process Automation is about getting the right systems and workflows in place to automate repetitive operational tasks, like troubleshooting steps, to improve efficiency and ensure consistency in operations teams. Get this E-Book to learn more.

Key Points Discussed:

  • How to use OPA to detect incidents, diagnose and troubleshoot them, and resolve issues with corrective actions
  • The benefits of OPA for your technical teams
  • Where and how to get a demo of tools that support OPA solutions

Get the E-Book

[E-Book] Understanding the NMIS KPI interface

Opmantek allows you to interpret the health of your nodes from a single metric. This E-Book explains the Key Performance Indicators (KPIs) that were introduced into NMIS to show why the health of a node is getting better or worse.

Key Points Discussed:

  • What is a KPI and why is it relevant for network monitoring?
  • How to configure values and the resulting KPI Score
  • Interpreting Health and KPI Values
  • Where to get insights into your network behaviour & data using opTrend and NMIS

Get the E-Book

Intelligence Redefined, NMIS 9

NMIS consolidates multiple tools into a single system, ready for network engineers to use. Scalable, flexible, open, and easy to implement and maintain, NMIS is the network management system that underpins the operations of more than 100,000 organizations worldwide, making it one of the most widely used open-source network management systems today. Its advantages are detailed below:

  • NMIS monitors the status and performance of an organization's IT environment, helps identify and rectify faults, and provides valuable information for IT departments to plan infrastructure changes and investment.
  • It has been deployed globally on networks ranging from as few as 5 devices to hundreds of thousands, with more than 10,000 vendor device models available.
  • Increase your efficiency with automation through the powerful NMIS 9 engine. Ideal for MSPs, Opmantek's NMIS 9 solves problems of scale through a flexible architecture and integration with existing tools.

Taking advantage of the features and benefits of the fastest NMIS ever, with major improvements ready to boost the management of your network on a single platform, brings benefits to your business such as:

  • Complete architectural flexibility with more nodes per server than ever
  • Consolidation of all your other tools and automation of their operation
  • Big Data support, with MongoDB replacing the NMIS 8 file-system database
  • Greater support for centralized data storage, which means more data availability
  • Delivered as a pre-configured, ready-to-use solution for rapid deployment
  • Fully compatible with commercial modules to extend the platform
  • Full commercial support available

If you are interested in this new version of NMIS, don't hesitate to contact us by clicking here!

7 Steps to Network Management Automation & Engineer Sleep Insurance


Quietly, somewhere in an office downtown, bearings designed to last for 25,000 hours have been running non-stop for over forty-three thousand. The fan was cheaply made by machine from components sourced over several years across a dozen providers. It sat boxed for weeks before it was installed in the router chassis, which itself was boxed up. Two months at sea, packed tight in a shipping container, then more months bounced around and shuffled from truck to warehouse, and back to a parcel delivery. Finally, the device was configured, boxed and shipped to its final installation point. Stuffed into a too-tight closet with no air circulation, this mission-critical router has been running non-stop for the past five years. It’s a miracle really that it worked this long.


Fan speed was the first thing to be affected by the bearing failure.

Building friction on the fan’s impeller shaft caused the amperage draw to increase to compensate and maintain rotational speed. When the amperage draw maxed out, rotations per minute (RPM) dropped. With the slower fan speed came less airflow, and with lower airflow the chassis temperature increased.


Complex devices, like routers, require low operating temperatures. The cooler it is, the easier it is for electrons to move. As the chassis temperature increased, the router experienced issues processing the data packets traversing the interfaces. At first it was an error here or there, then routine traffic routing ran into problems and the router began discarding packets. From there things got much worse.


It’s late Saturday evening and your weekend has been restful so far. A night out with your significant other, a movie and dinner. You’re ready for bed when your phone chirps. The text message is short:


Device: Main Router

Event: Chassis high temperature with high discard output packets

Action Taken: Rerouted traffic by increasing OSPF cost

Action Required: Fan speed low, amperage high. Engineer investigate for repair/replacement.


A fan went bad, what’s next?

The system has responded as you would: it rerouted traffic off the affected interface, preventing a possible impact to system operation. You add a note to your calendar to investigate the router first thing Monday morning and turn in for a good night’s sleep.


Our Senior Engineer in Asia-PAC, Nick Day, likes to refer to Opmantek’s solutions as “engineer sleep insurance”. Coming from a background in managed service providers I can appreciate the situation. Equipment always breaks on your vacation time, often when the on-call engineer is as far away as possible, and with little useful information from the NMS. This was a prime scenario we used when building out our Operational Process Automation (OPA) solution.


Building a Solution

opTrend identifies operational parameters outside of trended norms, and opEvents correlates the resulting events and automates remediation. With the addition of opConfig, configuration changes to network devices can also be automated. Operational Process Automation (OPA) builds on this statistical analysis and rules-based heuristics to automate troubleshooting and remediation of network events. This in turn reduces the negative impact on user experience.



Magicians never reveal their secrets…but we’ll make an exception.

Now let’s see how this was accomplished using the above example. At its root, opTrend is a statistical analysis engine. opTrend collects performance data from NMIS, Opmantek’s fault and performance system, and determines what is normal operation. Looking back over several weeks, usually twenty-six, opTrend determines what is normal for each parameter it processes. It does this hour by hour, considering each day of the week individually. So, Monday morning 9-10am has its own calculation, which is separate from 3-4pm Saturday afternoon. By looking across several weeks, opTrend can normalize things like holidays and vacation time.


Once a mean for each parameter is determined, opTrend calculates the standard deviation for the parameter and creates a window of three standard deviations above and below the mean. Any activity outside this window triggers an opTrend event into NMIS. These events can be in addition to those generated by NMIS’s Thresholding and Alert system, or in place of them.
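
The windowing described above can be sketched in a few lines of Python. This is only an illustration of the idea, not opTrend's actual implementation: the function names, the sample data and the choice of population standard deviation are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev


def build_baselines(samples):
    """Group (timestamp, value) samples by (weekday, hour) and return
    a (low, high) window of mean +/- 3 standard deviations per slot."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[(ts.weekday(), ts.hour)].append(value)
    baselines = {}
    for slot, values in buckets.items():
        mu = mean(values)
        sigma = pstdev(values)  # assumption: population standard deviation
        baselines[slot] = (mu - 3 * sigma, mu + 3 * sigma)
    return baselines


def is_abnormal(baselines, ts, value):
    """True when the value falls outside the window for its weekday/hour slot."""
    low, high = baselines[(ts.weekday(), ts.hour)]
    return value < low or value > high


# Hypothetical data: 26 weeks of chassis temperature readings (degrees C),
# one per week, all taken on Saturday between 22:00 and 23:00.
first_saturday = datetime(2023, 1, 7, 22, 0)
history = [(first_saturday + timedelta(weeks=i), 40.0 + (i % 3)) for i in range(26)]
baselines = build_baselines(history)

tonight = first_saturday + timedelta(weeks=26)
overheated = is_abnormal(baselines, tonight, 55.0)  # far above the usual 40-42 range
```

Because each weekday and hour has its own slot, a reading that is routine on Monday morning can still be flagged on Saturday night.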


In the example above, opTrend would have seen the chassis temperature exceed the normal window of operation. Had fan speed and/or amperage also been processed by opTrend (they are not by default, but opTrend can be configured to process them), these would have been reported as low fan speed and high amperage.


This event from opTrend would have been sent to NMIS, then shared with opEvents for processing. A set of rules, or Event Actions, looks for events that could be caused by high temperature; these are often related to interface packet errors or discards. With wireless devices (WiFi and RF), high temperature may also affect signal strength and connection speed. A similar result could be achieved using a Correlation Rule, which would group multiple events across a window of time into a new parent event. Both methods are relevant and have their own pros and cons.
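
To make the correlation idea concrete, here is a minimal Python sketch that groups events on the same node into a parent event when all required event types occur within a time window. It does not use the opEvents rule syntax or API; the class names, event names and window length are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    node: str
    name: str
    time: float  # seconds since some reference point


@dataclass
class ParentEvent:
    node: str
    name: str
    children: list = field(default_factory=list)


def correlate(events, required, window_seconds, parent_name):
    """Return one ParentEvent per node on which every event name in
    `required` was seen within `window_seconds` of each other."""
    by_node = {}
    for ev in sorted(events, key=lambda e: e.time):
        if ev.name in required:
            by_node.setdefault(ev.node, []).append(ev)

    parents = []
    for node, evs in by_node.items():
        for i, first in enumerate(evs):
            # Events that fall inside the window opened by `first`.
            in_window = [e for e in evs[i:] if e.time - first.time <= window_seconds]
            if {e.name for e in in_window} >= set(required):
                parents.append(ParentEvent(node, parent_name, in_window))
                break
    return parents


# Hypothetical events: names and timings are illustrative only.
events = [
    Event("main-router", "High Temperature", 0),
    Event("main-router", "Output Discards", 120),
    Event("edge-switch", "High Temperature", 50),
]
required = {"High Temperature", "Output Discards"}
parents = correlate(events, required, window_seconds=300, parent_name="Thermal Packet Loss")
```

Only `main-router` produces a parent event here, because `edge-switch` reported a high temperature without any discards.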


opEvents now uses the high temperature / high discards event to start a troubleshooting routine. This may include directing opConfig to connect to the device via SSH and execute CLI commands to collect additional troubleshooting information. The results of these commands can have their own operational life: they can be evaluated for error conditions, fire off new events, and start Event Actions of their own.


Let’s review the process flow:

  1. NMIS collects performance data from the device, including fan speed, temperature and interface performance metrics.
  2. opTrend processes the collected performance data from NMIS and determines what is normal/abnormal behavior for each parameter.
  3. Events are generated by opTrend in NMIS, which are then shared with opEvents.
  4. opEvents receives events from opTrend identifying out-of-normal temperature and interface output discards. These events are then correlated into a single synthetic event, given a higher priority, and evaluated for action.
  5. An Event Action rule matches a performance-impacting event on a Core device running a known OS. This calls opConfig to initiate Hourly and Daily configuration backups, then execute a configuration change to increase the OSPF cost on the interface, forcing traffic to be rerouted off this interface.
  6. opEvents also opens a helpdesk ticket via a RESTful API, then texts the on-call technician with the actions taken, and recommended follow-on activities.
  7. Once traffic across the interface drops the discards error will clear, generating an Up-Notification text to the on-call technician.
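
The decision made in steps 5 and 6 can be sketched as a small rule function in Python. This is a simplified illustration and not how opEvents expresses Event Actions: the field names, the step names and the list of known operating systems are hypothetical, and the sketch only returns the planned steps rather than contacting any device.

```python
# Hypothetical rule data: none of these names come from opEvents itself.
KNOWN_OS = {"IOS", "Junos"}


def plan_actions(event, node):
    """Return the ordered list of steps for a correlated event on a node."""
    steps = []
    matches_rule = (
        event["performance_impacting"]
        and node["role"] == "core"
        and node["os"] in KNOWN_OS
    )
    if matches_rule:
        # Back up the configuration before changing it, then reroute traffic.
        steps += ["backup_config", "increase_ospf_cost"]
    # Assumption: a ticket and a text message are produced for every
    # correlated event, whether or not automated remediation ran.
    steps += ["open_helpdesk_ticket", "text_on_call"]
    return steps


core_steps = plan_actions(
    {"performance_impacting": True},
    {"role": "core", "os": "IOS"},
)
access_steps = plan_actions(
    {"performance_impacting": True},
    {"role": "access", "os": "IOS"},
)
```

Keeping the remediation behind an explicit match means that devices outside the rule still get a ticket and a notification, but no automated configuration change.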


This is an example of what we would consider a medium-complexity automation. It is composed of several Opmantek solutions, each configured (most automatically) to work together. These solutions share and process fault and performance information, correlate the resulting events, and apply a single set of event actions to gather additional information and configure around the event. When applying solution automations, we advocate a crawl-walk-run methodology: start by collecting troubleshooting information (crawl), then automate simple single-step remediations (walk), then slowly deploy multi-path remediations with control points (run).


Contact Us & Start Automating Your Network Management

Contact our team of experts here if you would like to know more about how this solution was developed, or how Operational Process Automation can be leveraged to save man-hours and reduce Mean Time to Resolve (MTTR).