Packet Pushers: Detect, Diagnose, And Act Podcast



Keith Sinclair, CTO and progenitor of NMIS, joins Greg Ferro on Packet Pushers.

They discuss:

  • What NMIS does and how it works
  • Protocol support including SNMP, WMI, SSH, RESTful APIs, and more
  • The persistence of SNMP
  • Opmantek’s approach of detect, diagnose, and act
  • Automation capabilities
  • How NMIS uses dashboards, portals, and maps

Discovery to Monitoring, Automatic & On Your Terms


Introduction

So you have this great discovery and auditing tool called Open-AudIT and you also have an amazing monitoring tool called NMIS. How can you automatically take your discovered devices and have NMIS monitor them…and why would you want to?

With version 4.2.0 of Open-AudIT, we have re-implemented Integrations in an extremely easy-to-use yet extremely configurable way.

Why?

Discovery provides network transparency. Monitoring provides network visibility. Both are essential to good network management, and they go hand-in-hand with diagnosing network performance issues and managing devices and their lifecycle.

You cannot manage something if you don’t know it exists, and you cannot plan for the future if you don’t know the current performance of your devices – be they desktops, servers, switches, or routers.

Why wouldn’t you want the ability to automatically monitor select device types (for example) as they come online? You can set up a scheduled Integration and automatically include all discovered routers and switches.

 

Let that sink in for a moment.

Automatically monitor devices without having to set them up individually in your monitoring solution. From discovery to monitoring automatically, on your terms.

 

Less time spent entering details.

More accurate information, with no opportunity for spelling mistakes, mistyped credentials, etc.

No double handling of information between systems is required.

 

It just works.

Discover it in Open-AudIT, monitor it in NMIS – seamlessly.

 

How does it work?

Integrations take a list of devices from NMIS and a list of devices from Open-AudIT. They match the devices based on selected attributes, combine their attributes according to which system (NMIS or Open-AudIT) should be the point of truth, and update both systems based on any changes.
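
To make the match-and-merge idea concrete, here is a minimal Python sketch. It is an illustration only, not Open-AudIT's actual implementation: the function name `merge_devices`, the sample field names, and the choice of `ip` as the matching attribute are all assumptions. The only detail taken from above is that each field has a priority, stored as "internal" (Open-AudIT) or "external" (NMIS).

```python
def merge_devices(internal, external, match_on, priority):
    """Match two device lists on one attribute and merge each pair field by field.

    internal: devices from Open-AudIT; external: devices from NMIS.
    priority maps a field name to "internal" or "external" (the point of
    truth for that field); unlisted fields default to "internal" here,
    which is an assumption made for this sketch.
    """
    external_by_key = {d[match_on]: d for d in external if match_on in d}
    merged, seen = [], set()
    for dev in internal:
        key = dev.get(match_on)
        other = external_by_key.get(key, {})
        seen.add(key)
        combined = {}
        for field in dev.keys() | other.keys():
            # Pick the preferred side first, fall back to the other side.
            if priority.get(field) == "external":
                first, second = other, dev
            else:
                first, second = dev, other
            combined[field] = first[field] if field in first else second[field]
        merged.append(combined)
    # Devices that only exist on the external side are carried over as-is.
    for key, other in external_by_key.items():
        if key not in seen:
            merged.append(dict(other))
    return merged


# Hypothetical sample data.
open_audit = [
    {"ip": "10.0.0.1", "name": "core-rtr", "community": "old"},
    {"ip": "10.0.0.2", "name": "sw1", "community": "public"},
]
nmis = [
    {"ip": "10.0.0.1", "name": "core-rtr-nmis", "community": "new"},
    {"ip": "10.0.0.3", "name": "edge"},
]
result = merge_devices(open_audit, nmis, "ip", {"community": "external"})
```

In this example the matched device keeps its Open-AudIT name but takes its community string from NMIS, and the device known to only one system is still present in the merged list, so it can be written to the other side.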

The list of devices may actually be empty on either side. We can restrict the device list on either side based on device attributes. We can select attributes to be stored – even if they don’t exist in Open-AudIT. NMIS and Open-AudIT don’t even need to be on the same server. There is so much flexibility!

But with great flexibility comes (potentially) great complexity. This is an area we are particularly proud of. We’ve kept the creation of an Integration as easy as possible. At its simplest, if NMIS and Open-AudIT are installed on the same server, you can click a ‘create’ button and everything is automatically done for you. You don’t need to supply any information. We’ve chosen sensible defaults and the Integration just works.

On the other end of the scale, you might have NMIS running on Debian and Open-AudIT running on Windows. You might wish to only integrate devices that are routers. You might even have some fields in NMIS that don’t exist in Open-AudIT, but that you wish to track and be able to edit in Open-AudIT, which then updates NMIS. It’s all completely achievable with just a few clicks.

That is more than the simple integration above, but still very easy to accomplish.

No code to write, just a simple-to-use web interface. Oh, and there is also the JSON-based RESTful Open-AudIT API.

Questions

Now let’s back up a little bit and set the scene. You’ve been using Open-AudIT for a while and have discovered some devices on your network. You have working credentials for these devices and can see their configuration. You may have computers, switches, printers, routers, firewalls, etc.

How can we easily send some of these devices to NMIS for monitoring?
When you create an Integration in Open-AudIT, by default we include all discovered devices that have working SNMP credentials. However, you might not want every device integrated with NMIS. Some of your servers, for example, may use SNMP, but you don’t need NMIS monitoring them. The Integration has a section to select which devices to include from Open-AudIT. Every device is defaulted to have its “manage_in_nmis” attribute set to “y”. There is also a rule in Open-AudIT that sets this attribute if we talk to the device using SNMP.

 

But in this example, we don’t want every SNMP-talking device; we only want our routers in NMIS.

In this instance, we can simply change the used attribute to “type” (instead of “manage_in_nmis”) and the value of that attribute to “router” (instead of “y”) – then we’re done!
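
The selection itself is just an attribute/value filter. The Python sketch below illustrates the idea only; it is not how Open-AudIT is implemented, and the function name `select_devices` and the sample devices are made up. The attribute names `manage_in_nmis` and `type`, and the values `y` and `router`, come from the text above.

```python
def select_devices(devices, attribute="manage_in_nmis", value="y"):
    """Return only the devices whose attribute equals the given value."""
    return [d for d in devices if d.get(attribute) == value]


# Hypothetical discovered devices.
devices = [
    {"name": "edge-rtr", "type": "router", "manage_in_nmis": "y"},
    {"name": "file-srv", "type": "computer", "manage_in_nmis": "y"},
    {"name": "lab-rtr", "type": "router", "manage_in_nmis": "n"},
]

default_selection = select_devices(devices)             # manage_in_nmis == "y"
routers_only = select_devices(devices, "type", "router")  # type == "router"
```

Note that switching the selection to `type` means the `manage_in_nmis` flag is no longer consulted, so `lab-rtr` is included in the second list even though it is excluded from the first.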

What if I want the SNMP Community string to be defined in NMIS, not Open-AudIT?
An Integration contains a list of the fields used by both systems (NMIS and Open-AudIT). Each field has a flag that defines its ‘priority’. This can be set to either NMIS or Open-AudIT (actually stored as external or internal). Just select NMIS as the priority for the NMIS → configuration.community field. If this value is then changed in NMIS, Open-AudIT will be updated the next time the Integration is run.

How can I automatically run the Integration?
Integrations can be scheduled within Open-AudIT just like discoveries, queries, baselines, etc. You can run an Integration on whatever schedule you choose.

What if I’m an NMIS user, have just installed Open-AudIT, and don’t have any devices in it?
Simply run the default Integration. Your NMIS devices will be sent to Open-AudIT and discovered automatically. Open-AudIT stores more information about the make-up of a device, as opposed to NMIS’s performance data. When you run an Integration, Open-AudIT has the device’s IP and the device’s credentials. You can then run a discovery and retrieve everything Open-AudIT can.

 

Again – this is configurable. You might not wish to run a discovery on the device – that’s up to you! To enable or disable a discovery is a single attribute. Click, done!

Making it Happen
As usual, the Open-AudIT wiki has all the technical details you should need. Check the Integrations page and if you still have questions, please do ask in the Community Forums.

What have we Learnt through Navigating in an Economic Downturn & Pandemic?


Whether your business spent the past year struggling with uncertainty or flourishing, the ability to adapt and move out of dead market space will have made all the difference. Whether you are in a country that is on track for normality or one being hit by a new wave of infections, all businesses have needed to evolve.

 

For those of us operating in the tech industry, we have experienced several significant economic events, especially the dot-com crash of 2000 and the 2008 Global Financial Crisis. We know what happens during an economic slowdown. While there are some unique factors at play in relation to COVID-19, here is what we have learned.

 

Innovation should not be put off

Compressing three years of change into a single month was, and continues to be, a reality for many businesses across the globe. On countless occasions funds have been pulled from investments, particularly those in technology, which is deemed a costly non-essential to cut in order to keep the boat afloat. However, innovation needs to be cultivated and fed; for businesses that do not prioritise technology, future prospects will remain grim. Opmantek’s automated network management tools were built on the premise of empowering companies. These tools give users the flexibility to operate in diverse environments with speed and scale at a fraction of the cost, so you can keep innovating.

 

Having healthy finances is necessary

In the IT industry, an error caused by a triggered event in your network can set off a ripple of expenses. During periods of economic uncertainty, what you don’t know can hurt you. Utilising technology such as Opmantek’s opEvents will reduce the impact of network faults and failures through proactive event management. Adding tools such as these to your arsenal allows you to glean intelligent insights and make educated, data-driven and cost-effective decisions.

 

Optimising your data is the way forward

Your market, no matter which side of it you are on, has changed, so your business needs to change with it. More data, more data, more data: let’s face it, cultivating and finding quality data is a superpower. So how is it possible to see it all? How can it be automatically configured, and how can you keep up with it when it changes? Most organizations cannot give accurate location data for their assets; Open-AudIT gives you this information in seconds. Reduce the degree of uncertainty and make data-driven decisions simply by running tools such as Open-AudIT to develop meaningful reports and resources. Optimising your data is the way forward. To learn how you can audit everything on your network with Open-AudIT, book a demo session with our experts here.

 

Continual agility across all facets of business will be imperative to navigate through the next phase of this economic climate. If you are familiar with nimble project management from the software development world, use similar methods in your financials too. Be very conscious that your ability to plan twelve months ahead is now a lot lower than it used to be, and that you need to undertake agile planning and forecasting. This will be a time of continual change; however, by continually pushing innovation and utilising tools that give you the best possible view of your data to drive your decision-making process, the path forward will be a lot clearer to navigate.

 

How to Manage Complex Event Responses


Managing complex event responses can seem like an overwhelming task, but with the right automated network management software, the process is simpler than ever. Let’s take a look at how an automated system can help you manage complex event responses.

 

What is a Complex Adaptive System (CAS)?

Complex Adaptive Systems (CAS) are made up of components (or agents) in a dynamic network of interactions that are designed to adapt and learn according to changing events. These interactions may be affected by other changes in the system and are non-linear and able to feed back on themselves. In the Australian healthcare system, for example, complex adaptive systems have been used to analyse systematic changes.

The overall behaviour of a CAS is not predicted by the behaviours of the agents individually. The past of CAS systems is partly responsible for their present behaviour and they are designed to evolve over time.

 

Event automation and remediation using opEvents

opEvents is an advanced fault management and operational automation system designed to make event management easier than ever. With opEvents, you can improve your business’s operational efficiency and decrease the workload of your staff by expanding on NMIS’s efforts and improving automated response techniques using scientific methods.

opEvents elevates NMIS’s Notification, Escalation and Thresholding systems by blacklisting and whitelisting events, handling event flap, event storms and event correlation and supporting custom email templates for each of your contacts.

Basic event automation

In order to carry out event automation successfully, there are a few simple steps that you need to take:

1. Network management – identify the top network events you respond to frequently (daily, weekly, etc.)
2. List the steps you take – troubleshooting and remediating – when the issue occurs
3. Identify how these steps can be automated
4. Create an action to respond to the event

Let’s take a look at how opEvents handles events natively:

Event action policy

Event Action policy is a flexible mechanism that dictates how opEvents reacts when an event is created. The policy outlines the order of actions as well as what actions are executed by using nested if/then statements.
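
The nested if/then idea can be sketched in a few lines of Python. To be clear, this is not opEvents' policy syntax or engine. The rule layout, the function `evaluate_policy`, and the action names are all hypothetical, chosen only to show how ordered, nested conditions decide which actions run.

```python
def evaluate_policy(rules, event):
    """Walk the rules in order and return the actions that would run.

    Each rule is {"if": predicate, "then": [steps]}, where a step is either
    an action name (a string) or another nested rule (a dict).
    """
    actions = []
    for rule in rules:
        if rule["if"](event):
            for step in rule["then"]:
                if isinstance(step, dict):   # nested IF/THEN
                    actions.extend(evaluate_policy([step], event))
                else:                        # plain action name
                    actions.append(step)
    return actions


# Hypothetical policy: always log a Node Down, and for high-priority
# events also notify someone and run a troubleshooting script.
policy = [
    {
        "if": lambda e: e["event"] == "Node Down",
        "then": [
            "log_event",
            {
                "if": lambda e: e["priority"] >= 4,
                "then": ["email_noc", "run_ping_script"],
            },
        ],
    },
    {"if": lambda e: e["event"] == "Interface Down", "then": ["log_event"]},
]

urgent = evaluate_policy(policy, {"event": "Node Down", "priority": 5})
routine = evaluate_policy(policy, {"event": "Node Down", "priority": 2})
```

Here `urgent` contains all three actions in the order the policy lists them, while `routine` contains only `log_event`, because the inner condition was not met.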

Event correlation

Setting event correlation helps reduce event storms inside opEvents. opEvents will use rules that are outlined to group events together and create a synthetic event that contains event information from all events that have been correlated.

Event escalation

opEvents allows for custom event escalations for unacknowledged events. You can set custom rules based on your business or customers.

Event scripts

Events can call scripts that can be used to carry out actions such as troubleshooting, integration or remediation.

Event deduplication

All events that are related to stateful entities are automatically checked against the recent history of events and the known previous state of this entity.
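
As a rough illustration of the state check, the Python sketch below remembers the last known state of each entity and treats an event as a duplicate when it reports no change. It is a simplification and not how opEvents is built: the class `StateTracker` and its method are invented for this example, and the check against recent event history is left out.

```python
class StateTracker:
    """Remember the last known state of each stateful entity."""

    def __init__(self):
        self._state = {}  # (node, element) -> last known state

    def is_duplicate(self, node, element, state):
        """Return True if this event repeats the entity's known state."""
        key = (node, element)
        if self._state.get(key) == state:
            return True
        self._state[key] = state  # state changed (or first sighting)
        return False


tracker = StateTracker()
first_down = tracker.is_duplicate("router1", "Gi0/1", "down")   # new state
second_down = tracker.is_duplicate("router1", "Gi0/1", "down")  # repeat
back_up = tracker.is_duplicate("router1", "Gi0/1", "up")        # state change
```

Only the first "down" and the later "up" would go on to be processed; the repeated "down" is suppressed.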

Developing a CAS system

In order to develop a CAS system, it’s essential to complete the following steps:

1. Identify an individual event
2. List the steps you take – troubleshooting and remediating – when the issue occurs
3. Decide what automated action(s) can and should be carried out (data collection, remediation)
4. Identify who needs to be contacted, when (working hours, after hours, weekends) and how (Email, text, service desk)
5. Decide what should happen over time if the event is not acknowledged (remains active)

 

If you would like to learn more about Opmantek’s event management services, don’t hesitate to get in touch with our team or request a demo.

Why Do We Need a Dynamic Baseline and Thresholding Tool?


With the introduction of opCharts v4.2.5, richer and more meaningful data can be used in decision making. “Forewarned is forearmed,” the proverb goes; a quick Google search tells me it means “prior knowledge of possible dangers or problems gives one a tactical advantage”. The reason we want to baseline and threshold our data is so that we can receive alerts forewarning us of issues in our environment, and act to resolve smaller issues before they become bigger. Being proactive increases our Mean Time Between Failures. If you are interested in accessing the Dynamic Baseline and Thresholding Tool, please Contact Us.

Types of Metrics

When analysing time series data, you quickly start to identify a common trend in what you are seeing. Some of the metrics you are monitoring will be “stable”; that is, they will have very repeatable patterns and change in a similar way over time. Other metrics will be more chaotic, with any discernible pattern difficult to identify. Take, for example, two metrics: response time and route number (the number of routes in the routing table). You can see from the charts below that the response time is more chaotic, with some pattern but really little stability in the metric, while the route number metric is solid and unwavering.
[Chart: meatball response time]
[Chart: meatball route number]

Comparing Metrics with Themselves

This router, meatball, is a small office router with little variation in its routing. A WAN distribution router would also be generally stable, but it would have a little more variability. How could I get an alarm from either of these without configuring some complex static thresholds?

The answer is to baseline the metric as it is and compare your current value against the baseline. This method is very useful for values which are very different on different devices, but where you want to know when the metric changes. Examples are route number, number of users logged in, number of processes running on Linux, and response time in general, but especially the response time of a service.

The opCharts Dynamic Baseline and Threshold Tool

Overall, this is what opTrend does. The sophisticated statistical model it builds is very powerful and helps spot these trends with the baseline tool. We have extended opTrend with some additional functionality so that you can quickly get alerts from metrics which are important to you.

What is really key here is that the baseline tool will detect downward changes as well as upward changes, so if your traffic was reducing outside the baseline you would be alerted.

Establishing a Dynamic Baseline

Current Value

Firstly, I want to calculate my current value. I could use the last value collected, but depending on the stability of the metric this might cause false positives. As NMIS has always supported, using a larger threshold period when calculating the current value can give more relevant results.

For very stable metrics, using a small threshold period is no problem, but for wilder values a longer period is advised. For response time alerting, using a threshold period of 15 minutes or greater would be a good idea. That means that there is some sustained issue and not just a one-off internet blip. However, with our route number we might be very happy to use the last value and get warned sooner.
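
To show what the threshold period does to the current value, here is a small Python sketch. It illustrates the averaging idea only and is not the tool's code: the function `current_value`, the 5-minute polling interval, and the sample numbers are assumptions.

```python
from datetime import datetime, timedelta
from statistics import mean


def current_value(samples, now, threshold_minutes):
    """Return the current value of a metric.

    samples is a list of (timestamp, value) pairs. With a threshold period
    of 0 the last collected value is used; otherwise the values collected
    during the last threshold_minutes are averaged.
    """
    if not samples:
        raise ValueError("no samples collected")
    if threshold_minutes <= 0:
        return max(samples, key=lambda s: s[0])[1]
    start = now - timedelta(minutes=threshold_minutes)
    window = [value for ts, value in samples if start < ts <= now]
    if not window:
        raise ValueError("no samples inside the threshold period")
    return mean(window)


# Hypothetical response times (ms), polled every 5 minutes, with one blip.
now = datetime(2020, 6, 1, 16, 40)
samples = [
    (datetime(2020, 6, 1, 16, 20), 20),
    (datetime(2020, 6, 1, 16, 25), 20),
    (datetime(2020, 6, 1, 16, 30), 20),
    (datetime(2020, 6, 1, 16, 35), 20),
    (datetime(2020, 6, 1, 16, 40), 200),
]

last_value = current_value(samples, now, 0)   # reacts to the blip at once
smoothed = current_value(samples, now, 15)    # average of the last 3 polls
```

The last value jumps straight to 200, while the 15 minute average only reaches 80, which is why a longer threshold period suits the wilder metrics.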

Multi-Day Baseline

Currently, two types of baselines are supported by the baseline tool. The first is what I would call opTrend Lite, which is based on the work of Igor Trubin’s SEDS and SEDS lite. This method calculates the average value for a small window of time, looking back the configured number of weeks. So if my baseline was 1 hour for the last 4 weeks and the time now is 16:40 on 1 June 2020, it would look back and gather the following:

  • Week 1: 15:40 to 16:40 on 25 May 2020
  • Week 2: 15:40 to 16:40 on 18 May 2020
  • Week 3: 15:40 to 16:40 on 11 May 2020
  • Week 4: 15:40 to 16:40 on 4 May 2020

With the average of each of these windows of time calculated, I can now build my baseline and compare my current value against that baseline’s value.
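
The look-back windows in the example above are simple date arithmetic. The Python sketch below reproduces them; the function name `baseline_windows` is hypothetical and this is not the tool's own code.

```python
from datetime import datetime, timedelta


def baseline_windows(now, window, weeks):
    """Return (start, end) for the same window of time in each previous week."""
    return [
        (now - timedelta(weeks=w) - window, now - timedelta(weeks=w))
        for w in range(1, weeks + 1)
    ]


now = datetime(2020, 6, 1, 16, 40)
windows = baseline_windows(now, timedelta(hours=1), weeks=4)
for number, (start, end) in enumerate(windows, start=1):
    print(f"Week {number}: {start:%H:%M} to {end:%H:%M} on {end:%d %B %Y}")
```

The average of the metric would then be calculated for each of these four windows before comparing against the current value.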

Same-Day Baseline

Depending on the stability of the metric, it might be preferable to use the data from that day. For example, if you had a rising and falling value, it might be preferable to use just the last 4 to 8 hours of the day for your baseline. Take this interface traffic as an example, the input rate while the output rate is stable with a sudden plateau and is then stable again.

[Chart: asgard bits per second]

If this was a weekly pattern, the multi-day baseline would be a better option. But if this happens more randomly, using the same-day baseline would generate an initial event on the increase, then the event would clear as the ~8Mbps became normal, and then another alert would be generated when the value dropped again.

Delta Baseline

The delta baseline is only concerned with the amount of change from the baseline. For example, from a sample of data from the last 4 hours we might see that the average of a metric is 100. We then take the current value, for example the spike of 145 below, and calculate the change as a percentage, which would be a change of 45%, resulting in a Critical event level.

[Chart: amor number of processes]

The delta baseline configuration then allows for defining the level of the event based on the percentage of change. With the defaults, the 45% change above results in a Critical event. The default configuration can be visualized as the following table:

  • 10 – Warning
  • 20 – Minor
  • 30 – Major
  • 40 – Critical
  • 50 – Fatal

If the change is below 10% the level will be Normal, between 10% and 20% Warning, and so on, up to 50% and over, which will be considered Fatal.

In practice this spike was brief, and using the 15 minute threshold period (current is the average of the last 15 minutes) the value for calculating change would be 136 and the resulting change would be 36%, so a Major event. The threshold period is dampening the spikes to remove brief changes and allow you to see changes which last longer.
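
The arithmetic of the delta baseline can be sketched in Python as follows. The thresholds are the defaults listed above; everything else is an assumption made for illustration, including the function names, treating a change exactly on a threshold as reaching that level, and using the absolute change so that drops are graded the same way as rises.

```python
# Default levels from the table above, highest threshold first.
DEFAULT_LEVELS = [
    (50, "Fatal"),
    (40, "Critical"),
    (30, "Major"),
    (20, "Minor"),
    (10, "Warning"),
]


def percent_change(baseline, current):
    """Absolute change of current against the baseline, as a percentage."""
    return abs(current - baseline) * 100 / baseline


def delta_level(change_pct, levels=DEFAULT_LEVELS):
    """Map a percentage change to an event level."""
    for threshold, name in levels:
        if change_pct >= threshold:
            return name
    return "Normal"


spike = delta_level(percent_change(100, 145))      # 45% change
dampened = delta_level(percent_change(100, 136))   # 36% change
```

With a baseline of 100, the raw spike of 145 maps to Critical, while the 15 minute average of 136 maps to Major, matching the worked example.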

Installing the Baseline Tool

Copy the file to the server and do the following. Upgrading is the same process.

tar xvf Baseline-X.Y.tgz
cd Baseline/
sudo ./install_baseline.sh

Working with the Dynamic Baseline and Thresholding Tool

The Dynamic Baseline and Threshold Tool includes various configuration options so that you can tune the algorithm to learn differently depending on the metric being used. The tool comes with several metrics already configured. It is a requirement of the system that the stats modeling is completed for the metric you require to be baselined; this is how the NMIS API extracts statistical information from the performance database.

Conclusion

The installation and configuration steps required to implement opCharts’ Dynamic Baseline and Thresholding Tool are all detailed in our documentation here.

Why IP Address Management Is Important


Whether you’re a small organization or an enterprise, efficient management of IP addresses can be the difference between a functional network and an inaccessible service.

Increasing complexities, growing device numbers, Cloud Computing, IoT and BYOD continue to heighten the importance of managing your IP address space.

Relying on manual record keeping for network connectivity and core business functions can prove risky, even for the most organized of spreadsheets.

What’s Needed for an Efficient IP Address Management Strategy


Accuracy

Accurate IP delegation and record keeping, ensuring no conflict or associated service outages.
opAddress can allocate and track IP addresses dynamically. Search, view and manage address information, ensuring a critical information baseline is established.


Simplicity

An easy-to-use system that minimizes data entry, making the process for IT teams faster, more efficient, and less tedious.
With powerful out-of-the-box capabilities, opAddress requires little or no configuration. Automatically discover the network addressing of production networks and quickly edit or reallocate addresses as needed.


Security

Accurate, up-to-date data to help identify new devices and ensure only those authorized are on your network.
New data is captured and recorded by opAddress every thirty minutes.
Gain full visibility over IP addresses by device and analyze historical information.


Scalability

Future proofing and capacity planning to accommodate increasing device numbers and network complexities.
opAddress is extensible to grow with your business. Handle complex environments such as multiple tenancies, subdomains and overlapping address spaces with ease.

Ready to see what opAddress and NMIS9 can do for your organization?

Book a Demo