7 Steps to Network Management Automation & Engineer Sleep Insurance


Quietly, somewhere in an office downtown, bearings designed to last for 25k hours have been running non-stop for over forty-three thousand. The fan was cheaply machine-made from components sourced over several years from a dozen suppliers. It sat boxed for weeks before it was installed in the router chassis, which was itself boxed up. Two months at sea, packed tight in a shipping container, then more months bounced around and shuffled from truck to warehouse and back to a parcel delivery. Finally, the device was configured, boxed and shipped to its final installation point. Stuffed into a too-tight closet with no air circulation, this mission-critical router has been running non-stop for the past five years. It's a miracle, really, that it has worked this long.

 

Fan speed was the first thing to be affected by the bearing failure.

Building friction on the fan's impeller shaft caused the amperage draw to increase as the motor compensated to maintain rotational speed. When the amperage draw maxed out, rotations per minute (RPM) dropped. The slower fan speed meant less airflow, and with lower airflow the chassis temperature increased.

 

Complex devices like routers require low operating temperatures: the cooler it is, the easier it is for electrons to move. As the chassis temperature increased, the router began to have trouble processing the data packets traversing its interfaces. At first it was an error here or there; then routine traffic routing ran into problems and the router began discarding packets. From there, things got much worse.

 

It's late Saturday evening and your weekend has been restful so far: a night out with your significant other, a movie and dinner. You're ready for bed when your phone chirps. The text message is short:

 

Device: Main Router

Event: Chassis high temperature with high discard output packets

Action Taken: Rerouted traffic by increasing OSPF cost

Action Required: Fan speed low, amperage high. Engineer investigate for repair/replacement.

 

A fan went bad, what’s next?

The system had responded as you would have: it rerouted traffic off the affected interface, preventing a possible impact on operations. After adding a note to your calendar to investigate the router first thing Monday morning, you turned in for a good night's sleep.

 

Our Senior Engineer in Asia-PAC, Nick Day, likes to refer to Opmantek's solutions as "engineer sleep insurance". Coming from a background in managed service providers, I can appreciate the situation. Equipment always seems to break during your vacation, often when the on-call engineer is as far away as possible and the NMS offers little useful information. This was a prime scenario we used when building out our Operational Process Automation (OPA) solution.

 

Building a Solution

opTrend identifies operational parameters that fall outside trended norms, and opEvents correlates the resulting events and automates remediation. With the addition of opConfig, configuration changes to network devices can also be automated. Operational Process Automation (OPA) builds on this statistical analysis and rules-based heuristics to automate the troubleshooting and remediation of network events, which in turn reduces the negative impact on user experience.

 

 

Magicians never reveal their secrets…but we’ll make an exception.

Now let's see how this was accomplished in the example above. At its root, opTrend is a statistical analysis engine. opTrend collects performance data from NMIS, Opmantek's fault and performance system, and determines what normal operation looks like. Looking back over several weeks, usually twenty-six, opTrend determines what is normal for each parameter it processes. It does this hour by hour, considering each day of the week individually, so Monday morning 9-10am has its own calculation, separate from Saturday afternoon 3-4pm. By looking across several weeks, opTrend can normalize things like holidays and vacation time.
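
opTrend's internals are not described here, so the following is only a minimal Python sketch of the hour-of-week idea. It assumes performance samples arrive as (timestamp, value) pairs. The function names are hypothetical. Each sample is filed under its (weekday, hour) slot, and only samples inside the look-back window are averaged.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def hour_of_week(ts):
    """Map a timestamp to its (weekday, hour) slot, e.g. Monday 09:xx -> (0, 9)."""
    return ts.weekday(), ts.hour


def build_baseline(samples, now, weeks=26):
    """Return the mean value for each (weekday, hour) slot seen in the last `weeks` weeks.

    `samples` is an iterable of (datetime, float) pairs; there are at most 7 * 24 = 168 slots.
    """
    cutoff = now - timedelta(weeks=weeks)
    buckets = defaultdict(list)
    for ts, value in samples:
        if cutoff <= ts <= now:  # ignore data outside the look-back window
            buckets[hour_of_week(ts)].append(value)
    return {slot: mean(values) for slot, values in buckets.items()}


if __name__ == "__main__":
    now = datetime(2020, 3, 30, 12, 0)  # a Monday
    samples = [
        (datetime(2020, 3, 23, 9, 15), 40.0),  # Monday 9-10am
        (datetime(2020, 3, 16, 9, 45), 42.0),  # Monday 9-10am, a week earlier
        (datetime(2020, 3, 28, 15, 5), 30.0),  # Saturday 3-4pm
    ]
    print(build_baseline(samples, now))
```

Because Monday 9-10am and Saturday 3-4pm fall into different slots, each slot gets its own mean, which matches the per-hour, per-weekday behaviour described above.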

 

Once a mean for each parameter is determined, opTrend calculates the standard deviation for the parameter and creates a window extending three standard deviations above and below the mean. Any activity outside this window triggers an opTrend event into NMIS. These events can supplement those generated by NMIS's Thresholding and Alert system, or replace them.
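
The sketch below is a rough illustration of the window check. It assumes the baseline samples for one slot are already available as a list, and it uses the population standard deviation from Python's statistics module. This is not opTrend's code. The function names and the sample temperatures are hypothetical.

```python
from statistics import mean, pstdev


def normal_window(history, k=3.0):
    """Return (low, high): the mean of `history` plus/minus k standard deviations."""
    mu = mean(history)
    sigma = pstdev(history)  # population standard deviation of the baseline samples
    return mu - k * sigma, mu + k * sigma


def is_abnormal(value, history, k=3.0):
    """True when `value` falls outside the k-sigma window built from `history`."""
    low, high = normal_window(history, k)
    return value < low or value > high


if __name__ == "__main__":
    # Hypothetical chassis temperatures (deg C) for one weekday/hour slot.
    history = [40, 42, 41, 39, 43, 41, 40, 42]
    print(normal_window(history))
    print(is_abnormal(55, history))
```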

 

In the example above, opTrend would have seen the chassis temperature exceed its normal window of operation. Had fan speed and amperage also been processed by opTrend (they are not by default, but opTrend can be configured to process them if desired), these would have been reported as low fan speed and high amperage.

 

This event from opTrend would have been sent to NMIS, then shared with opEvents for processing. A set of rules, or Event Actions, looked for events that could be caused by high temperature, which often show up as interface packet errors or discards. With wireless devices (WiFi and RF), high temperature may also affect signal strength and connection speed. A similar result could be achieved using a Correlation Rule, which groups multiple events within a window of time into a new parent event. Both methods are valid, and each has its own pros and cons.
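
The following simplified sketch makes the Correlation Rule idea concrete. It is written in plain Python rather than opEvents' actual rule syntax. The Event and ParentEvent classes, the event names, and the strategy of measuring the window back from the most recent matching event are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Event:
    node: str
    name: str
    time: datetime


@dataclass
class ParentEvent:
    node: str
    name: str
    priority: int
    children: List[Event] = field(default_factory=list)


def correlate(events, node, required, window, parent_name, priority) -> Optional[ParentEvent]:
    """Group events for `node` into a parent if every name in `required`
    occurs within `window` of the most recent matching event."""
    matching = [e for e in events if e.node == node and e.name in required]
    if not matching:
        return None
    newest = max(e.time for e in matching)
    recent = [e for e in matching if newest - e.time <= window]
    if {e.name for e in recent} >= set(required):  # all required events seen
        return ParentEvent(node, parent_name, priority, recent)
    return None


if __name__ == "__main__":
    t0 = datetime(2020, 3, 28, 22, 0)
    events = [
        Event("Main Router", "Chassis high temperature", t0),
        Event("Main Router", "High discard output packets", t0 + timedelta(minutes=3)),
    ]
    required = ["Chassis high temperature", "High discard output packets"]
    print(correlate(events, "Main Router", required, timedelta(minutes=5), "Temperature/discards", 5))
```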

 

opEvents now uses the high temperature / high discards event to start a troubleshooting routine. This may include directing opConfig to connect to the device via SSH and execute CLI commands to collect additional troubleshooting information. The results of these commands can then take on an operational life of their own: they can be evaluated for error conditions, fire off new events, and in turn start further Event Actions.
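
The sketch below shows one way to evaluate command results. It looks for the output-drop counter that Cisco IOS prints in `show interfaces` output. It is not an opConfig plug-in. The function name, the threshold and the condition format are hypothetical.

```python
import re

# Matches the "Total output drops: N" counter in Cisco IOS `show interfaces` output.
OUTPUT_DROPS = re.compile(r"Total output drops:\s*(\d+)")


def evaluate_output_drops(cli_output, threshold):
    """Return a condition dict if the output-drop counter exceeds `threshold`, else None."""
    match = OUTPUT_DROPS.search(cli_output)
    if match is None:
        return None  # counter not present; nothing to evaluate
    drops = int(match.group(1))
    if drops > threshold:
        # Hypothetical condition format that could be raised as a new event.
        return {"condition": "output_drops_high", "drops": drops}
    return None


if __name__ == "__main__":
    sample = (
        "GigabitEthernet0/1 is up, line protocol is up\n"
        "  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 1532\n"
    )
    print(evaluate_output_drops(sample, threshold=1000))
```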

 

Let’s review the process flow:

  1. NMIS collects performance data from the device, including fan speed, temperature and interface performance metrics.
  2. opTrend processes the collected performance data from NMIS and determines what is normal/abnormal behavior for each parameter.
  3. Events are generated by opTrend in NMIS, which are then shared with opEvents.
  4. opEvents receives events from opTrend identifying out-of-normal temperature and interface output discards. These events are then correlated into a single synthetic event, given a higher priority, and evaluated for action.
  5. An Event Action rule matches for a performance-impacting event on a Core device running a known OS. This calls opConfig to initiate Hourly and Daily configuration backups, then execute a configuration change that increases the OSPF cost on the interface, forcing traffic to be rerouted off this interface.
  6. opEvents also opens a helpdesk ticket via a RESTful API, then texts the on-call technician with the actions taken and recommended follow-on activities.
  7. Once traffic across the interface drops, the discards error clears, generating an Up-Notification text to the on-call technician.
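
The Event Action in steps 4 to 6 can be sketched as a rule that returns an ordered list of actions. This is plain Python, not opEvents configuration. The SyntheticEvent fields, the action names and the choice of the maximum OSPF cost are all assumptions. A real deployment would hand the commands to opConfig and the ticket to the helpdesk's REST API.

```python
from dataclasses import dataclass

MAX_OSPF_COST = 65535  # top of the Cisco IOS "ip ospf cost" range (1-65535)


@dataclass
class SyntheticEvent:
    node: str
    role: str        # e.g. "core"
    os: str          # e.g. "IOS"
    interface: str
    performance_impacting: bool


def ospf_cost_commands(interface, cost=MAX_OSPF_COST):
    """IOS CLI lines that raise the OSPF cost on one interface."""
    return ["configure terminal", f"interface {interface}", f"ip ospf cost {cost}", "end"]


def plan_actions(event, known_os=("IOS",)):
    """Return the ordered actions for a matching event, or [] if the rule does not match."""
    if not (event.performance_impacting and event.role == "core" and event.os in known_os):
        return []
    return [
        ("backup", "hourly"),
        ("backup", "daily"),
        ("configure", ospf_cost_commands(event.interface)),
        ("open_ticket", f"{event.node}: traffic rerouted off {event.interface}"),
        ("notify_oncall", "Fan speed low, amperage high. Engineer investigate for repair/replacement."),
    ]


if __name__ == "__main__":
    event = SyntheticEvent("Main Router", "core", "IOS", "GigabitEthernet0/1", True)
    for action in plan_actions(event):
        print(action)
```

The sketch raises the OSPF cost instead of shutting the interface down. The link then stays available as a last-resort path if no alternative route exists.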

 

This is an example of what we would consider a medium-complexity automation. It is made up of several Opmantek solutions, each configured (most of them automatically) to work together. These solutions share and process fault and performance information, correlate the resulting events, and apply a single set of event actions to gather additional information and configure around the event. When applying solution automations, we advocate a crawl-walk-run methodology: start by collecting troubleshooting information (crawl), then automate simple single-step remediations (walk), then slowly deploy multi-path remediations with control points (run).

 

Contact Us & Start Automating Your Network Management

Contact our team of experts here if you would like to know more about how this solution was developed, or how Operational Process Automation can be leveraged to save man-hours and reduce Mean Time to Resolve (MTTR).

COVID-19 Effects On Businesses: Your Business Is Not Dead, But Your Market Probably Is

Over this past week I've received a number of calls from CEOs and founders seeking advice on how to navigate the economic downturn we are facing. All businesses will need to change, some more than others; some will be significantly boosted by this period while others will be significantly harmed. In Australia, we haven't had a recession since 1990/91, so if we do enter one, the majority of people in the workforce will never have experienced a recession. I am 50 years old and was at university when Australia had its last one. Those of us operating in the tech industry have experienced several significant economic events, especially the .com crash of 2000 and the 2008 Global Financial Crisis. We know what happens during an economic slowdown, although there are also some factors unique to COVID-19.

 

I’m going to focus my comments towards entrepreneurs and high growth companies (particularly tech companies), but there will be some relevance to all businesses.

Fundamentally, what we have with the COVID-19 coronavirus is a change in the marketplace and, without doubt, an economic slowdown. There are some factors unique to the pandemic, but there are also common forces at play that relate simply to an economic slowdown. Some businesses will naturally flourish. For example, if you're producing products that help people work from home, you're probably excited at the opportunity; if you're in a business that requires mass gatherings, e.g. an events business, you might be wondering how you're going to get through this. Every business has ways to optimise its outcome: for some it's about minimising losses, for others it's about maximising gains. Let's look at how this fundamentally works.

 

I’m not going to cover off the simple things which we should already know or have read about, but I will talk about some fundamentals I don’t see being written about.

Without oversimplifying it, a business has two fundamental parts: operational costs and incomes (which are typically sales).

Let’s start with the two combined together:

 

Cash Burn and OPEX

Quite simply, as we all know, the difference between cash in and cash out is your burn rate. If you need to reduce your cash burn, do it fast. We will all be implementing travel bans etc., so those cost reductions will happen naturally. If people are working from home, try to sort something out with your landlord to have your rent reduced or stopped. If you need to cut staff, try to do it all at once. You don't want people coming in each day worried; those who are left need to feel safe and positive, and they will not feel that way if a coworker is being retrenched every second week (or day). You need to structure your business for the new reality. Do not hang on to the past; it's not your fault and you haven't failed anyone. Alternatives to cutting staff include reducing all staff salaries and cutting incentive payments, especially for executives. Show some good leadership and cut your own and other executives' salaries and incentives harder than the rest of the staff; it is common to cut executive incentives in full and salaries by as much as 30-40%. Don't forget that an employee's leave will be paid out if you retrench them, so also look at enforced leave (especially leave without pay) or reduced hours in order to keep the team together. Also, remember that you're going to get some good government assistance too, so factor that in.

 

Cash Incomes and Sales/Marketing 

During an economic slowdown, for most businesses, it is harder both to raise capital and to generate sales; however, some businesses will flourish in hard economic times.

In summary, no matter which side of it you are on, your market has changed, so your business needs to change with it.

 

The key on the revenue side (when you have an economic slowdown or any other event that produces significant changes to the market) is that it becomes necessary to realign how you sell your products and, potentially, which market segments you sell them to. In the situation we are in at the moment with COVID-19, we know there are a lot of home workers and pressure on the healthcare system, certain government departments are going to spend a lot more, and there will be boosts to many online businesses.

 

Step 1. Look at who is going to be busier or benefit from this new market – if they are potential clients for you, then target them.

You should also look at how buying processes and decisions change; again, this is predictable. There will be fewer face-to-face meetings and more virtual ones, so adapt to online and virtual sales. For many smaller/high-growth businesses this is fantastic, as you won't have to compete face to face on sales for the time being. One clever tactic is to organise a "virtual tour" for one of your rarer people. Believe me, it works. For example: "our CEO will be conducting Zoom meetings with clients in San Francisco this week, so I'm reaching out to see if I can schedule some time before he moves on to London next week".

 

As the remote workforce grows, it is crucial to stay connected to your business community to enable your growth. Business incubators such as the Gold Coast Innovation Hub are now offering virtual membership options, facilitating continued connections, collaboration, growth, investment opportunities and expansion into global markets.

 

Step 2. Adapt your sales strategies to the new manner in which your potential clients are working.

The other part of the buying process that always changes during an economic slowdown is that more businesses buy things that reduce their costs and fewer businesses buy things that increase their productivity. Look at your potential clients: if they are net beneficiaries in the new market, they will likely keep investing in growth. Most will be losers (that's what a slowdown is: net losers), so most of them will be looking to reduce costs. The "losers" don't actually stop spending; they are happy to spend on products that help streamline their costs and manage their pain. A lot of products have multiple benefits (I'm sure yours do), and you can reproductise and remarket your products so that they realign with new decision making, especially the cost-cutting decisions that are widespread during a significant economic slowdown. In this particular slowdown there are obvious changes: nobody is travelling, more people are working from home and businesses are cutting costs. So it's about realigning and pivoting your marketing and productisation messages, and potentially making some tweaks to your products. But remember, we will get out of this slowdown, so you are likely to want to focus on aligning your marketing and sales with the shift in the buyer's mindset rather than completely changing your products.

 

Step 3. Adapt your productisation and your sales and marketing messages to align with the new market.

Channels to market change with new markets too. Look at your resellers or channels to market: in this new market, if they rely on face-to-face sales or mass gatherings at events, they are likely no longer good channels, and if they work mostly in sectors that are being ravaged (e.g. travel), they may no longer be good channels either. Move your sales to channels that make sense for the new market.

 

Step 4. Secure your current sales channels/sales partners if necessary or move your sales channels and partners to those that align with the new market.

Let's look at a simple example that everyone can relate to (deliberately steering away from a tech business): a bicycle. Please excuse the many assumptions below; you should do market testing when repositioning anything. You're selling bikes, mostly through your shop and bike clubs. You're selling awesome bikes that are light, strong and fast, and that's your main pitch. Your whole market just changed. Bike club memberships are going to drop, as will activity at the clubs, so they aren't the right channel anymore. There is also going to be less foot traffic in the store, so you may need to close it, or maybe it can survive or shrink; watch and act fast if you need to. You will likely want to push more of your sales through online, social media and referrals. Bike ownership is probably not going to tank. In fact, it may well rise as more people work from home and want to take a break and get out on a bike, and as more people look for socially isolated exercise rather than gyms. The benefits of the bike may now also include using it to stay off public transport and avoid the virus. Your whole pitch changes. Looking at the products within your bike range, if you're selling on socially isolated exercise and avoiding public transport, there may not be enough extra benefit in a high-end bike over a low-end one. The benefits of high-end bikes may become secondary marketing items (they don't go away; they just get pushed back). Sales of products like rollers (which let you ride your bike indoors for exercise rather than go to the gym) may increase, and could perhaps be packaged with a bike in a new productisation effort if you expect to see an increase in purchases of home gym equipment, including exercise bikes.

 

 

I’m sure you get the idea – your market is probably dead, but you are not – you are simply in a new market.  

 

These are unprecedented times with COVID-19. We know there are a lot of teleworkers and pressure on the healthcare system, and certain government departments are going to be under pressure to boost online businesses; they will need our expertise to do this.

Every business will need to change, Opmantek want to help our community to optimise their outcomes; this is where access to network management support is critical so that Australians can stay better connected.

As a final note on finances at this time: redo your forecast, especially your cash flow forecast. The economy recovers more slowly than the virus, so look at 12 months to start with. You need to step back, just as you would if you were starting a new company or launching a new division into a new market, look at how the market is reacting, and adapt fast. If you are familiar with agile project management from the software development world, use similar methods in your financials too. Be very conscious that your ability to plan twelve months ahead is now a lot lower than it used to be, and that you need to undertake agile planning and forecasting.

This will be a time of continual change; however, we do know that these things will be constant.

 

Introducing opConfig’s Virtual Operator

Introduction

opConfig's new Virtual Operator can be used to create jobs made up of command sets that run on one or many nodes, to view job results in reports, and to troubleshoot nodes that raise conditions through opConfig's plug-in system. Quick actions are templates the Virtual Operator uses so that you don't have to keep recreating commonly run jobs. The Virtual Operator also gives operators easy access to run commands on remote systems without giving them full access to the machines.

New Virtual Operator Job

To create a new Virtual Operator job, go to the Virtual Operator menu option and click New Virtual Operator Job. Select the nodes you want to run commands on; these are auto-completed from the list of currently activated nodes in opConfig. Next, select the command sets to run on the nodes; these are auto-completed from all command sets opConfig has loaded. You can also use tags to select which command sets should be run. You can schedule the job to run now or at a later time; selecting later brings up a time picker to schedule when the job will run. A name is auto-generated from the data you have already entered, but you can change it to anything you like. The details section is a free-text field for keeping notes about the job. Clicking Schedule adds the job to opConfig's queue and takes you to the job's report.
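
The inputs this screen collects can be summarised as a small data structure. The sketch below is hypothetical because opConfig's internal job format is not documented here. It shows the required nodes and command sets, the optional schedule time and notes, and an auto-generated name that the user may override.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class VirtualOperatorJob:
    nodes: List[str]
    command_sets: List[str]
    run_at: Optional[datetime] = None  # None = run now; otherwise the time picked
    name: str = ""
    details: str = ""  # free-text notes

    def __post_init__(self):
        if not self.nodes or not self.command_sets:
            raise ValueError("a job needs at least one node and one command set")
        if not self.name:
            # Auto-generate a name from the inputs; the user can overwrite it.
            self.name = f"{', '.join(self.command_sets)} on {', '.join(self.nodes)}"


if __name__ == "__main__":
    job = VirtualOperatorJob(["router1", "router2"], ["IOS_DAILY"])
    print(job.name)
```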

[Screenshot: opConfig New Virtual Operator Job]

Virtual Operator Report

A Virtual Operator Report is an aggregation of all the data collected by your Virtual Operator job. The left panel shows metadata about the job: how it was created, by whom, and when it is going to run or when it ran. The commands panel is a paginated table of the successful commands that were run for the current job. If the command set uses a plug-in to show derived data or report conditions, you can view these results inline by clicking the expand icon in the derived column. If a condition has a tag, the tag can be used to filter down command sets when creating linked Virtual Operator jobs from that condition. All operations for the current job are shown to help diagnose any connection or command issues that may have occurred.

[Screenshot: opConfig Virtual Operator Result]

Virtual Operator Troubleshooting

If you click the troubleshoot button from a report condition (the green button in the screenshot above), you are taken to the New Virtual Operator Job screen, with a couple of key differences. The node has already been filled out, and the command sets have been filtered down using a tag; in this example, there are three command sets with the tag disk. This can help create workflows in which conditions are tagged to limit what the operator can select for the next steps in the troubleshooting process. When this job is created, the parent job's ID is also recorded, and the parent job's name is shown in the newly created report.
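
The tag filtering and parent-job linking described above can be modelled roughly as follows. The catalogue structure, the function name and the class name are all hypothetical. The example mirrors the three command sets tagged disk.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class LinkedJobDraft:
    node: str                       # pre-filled from the report condition
    command_set_choices: List[str]  # only command sets carrying the condition's tag
    parent_job_id: str              # recorded so the new report can show its parent


def draft_from_condition(node, condition_tag, parent_job_id, catalogue: Dict[str, Set[str]]):
    """Build a linked-job draft; `catalogue` maps command-set name -> set of tags."""
    choices = sorted(name for name, tags in catalogue.items() if condition_tag in tags)
    return LinkedJobDraft(node, choices, parent_job_id)


if __name__ == "__main__":
    catalogue = {
        "disk_usage": {"disk"},
        "disk_io": {"disk", "performance"},
        "smart_status": {"disk"},
        "ios_daily": {"ios"},
    }
    print(draft_from_condition("server1", "disk", "job-42", catalogue))
```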

[Screenshot: opConfig Create Linked Job]

Virtual Operator Results & Schedules

There are two final new pages: one that shows all scheduled Virtual Operator jobs and one that shows completed Virtual Operator jobs. Scheduled shows user-created jobs that are running or scheduled for the future. Results shows all completed jobs, both user-created and CLI-run.

[Screenshot: opConfig Virtual Operator Results View]

Quick Actions

Quick actions are templates for new Virtual Operator jobs. We ship four sample quick actions, but you can create your own. Clicking a quick action's button takes you to the New Virtual Operator Job screen with the specified fields already filled out. To create your own, create a new JSON file under
/usr/local/omk/conf/table_schemas/opConfig_action-elements.json
{
    "name": "IOS Hourly Collection",
    "description": "Hourly baseline collection for Cisco IOS.",
    "command_sets": ["IOS_DAILY"],
    "buttonLabel": "Collect Now",
    "buttonClass": "btn-primary"
}

 

| Key | Datatype | About |
|---|---|---|
| name | string | Name shown at the top of the quick action element |
| description | string | Text shown under the quick action name; useful for describing what the action does |
| command_sets | array of strings | Keys of the command sets you want to run |
| nodes | array of strings | Names of the nodes you want the command sets to run against |
| buttonLabel | string | Text of the run button |
| buttonClass | string | CSS class applied to the button to colour it: btn-default, btn-primary (default), btn-success, btn-warning, btn-danger |
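
Checking a new quick action definition against the key table can catch mistakes before you put it in place. The following sketch is a hypothetical helper and is not part of opConfig. It checks each value's type against the table. It also assumes that name and command_sets are required, because the table does not say which keys are mandatory.

```python
import json

# Expected type for each key, taken from the key table.
SCHEMA = {
    "name": str,
    "description": str,
    "command_sets": list,
    "nodes": list,
    "buttonLabel": str,
    "buttonClass": str,
}
REQUIRED = {"name", "command_sets"}  # assumption: the table does not mark required keys


def validate_quick_action(text):
    """Parse a quick action definition and return (action, list_of_problems)."""
    action = json.loads(text)
    errors = [f"missing required key: {key}" for key in sorted(REQUIRED - set(action))]
    for key, value in action.items():
        expected = SCHEMA.get(key)
        if expected is None:
            errors.append(f"unknown key: {key}")
        elif not isinstance(value, expected):
            errors.append(f"{key} should be a {expected.__name__}")
        elif expected is list and not all(isinstance(item, str) for item in value):
            errors.append(f"{key} should contain only strings")
    action.setdefault("buttonClass", "btn-primary")  # documented default
    return action, errors


if __name__ == "__main__":
    text = '{"name": "IOS Hourly Collection", "command_sets": ["IOS_DAILY"]}'
    print(validate_quick_action(text))
```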

This is the final result: a dashboard that your organization could use today.

[Screenshot: opConfig Virtual Operator Dashboard Full]


opConfig v3.2.0 New Release

This is a major release for opConfig. It introduces a GUI refresh for the whole application, along with the Virtual Operator tool, which helps staff run troubleshooting commands without giving them control over devices.

The Virtual Operator is a tool that helps create new jobs made up of nodes and commands. You can create a new job with nodes, command sets or tags, schedule it for now or later, and annotate the job with a name and description. See the creation screen below:

[Screenshot: opConfig New Virtual Operator Job]

To access the Virtual Operator dashboard, use the new menu option shown here:

[Screenshot: opConfig New Menu Options]

This takes you to the Virtual Operator dashboard, where you can launch commands or command sets. Of course, you can change which jobs appear on this screen:

[Screenshot: opConfig Virtual Operator Dashboard]

The Virtual Operator Results View shows all run commands, derived data and conditions from Virtual Operator jobs:

[Screenshot: opConfig Virtual Operator Dashboard]

An example of the result screen for a job can be seen here:

[Screenshot: opConfig Virtual Operator Result]

There is also an updated dashboard that gives an operational view of opConfig. It surfaces important information and provides shortcut buttons to execute commands developed in the Virtual Operator.

[Screenshot: opConfig Home Dashboard]

This is quite an extensive update with many new features. If you would like one of our award-winning engineers to demo the new features, fill out the form below.

Book a Demo

A simple and effective CMDB solution

A configuration management database (CMDB) is an important tool for ensuring an organization knows what assets it has, as well as the relationships and interdependencies between them. Despite being at the core of the ITIL process, CMDBs go unimplemented in many organizations, often because of resource limitations: time, knowledge or money.

These perceived limitations are less valid than they seem: implementing a CMDB solution is quite straightforward and cost-effective compared with the risk of operating without one. Opmantek has an extensive CMDB solution that will benefit your organization while easing the resource limitations that other implementations can face.

 

Time Limitations

As IT departments have grown in responsibility, some organizations have seen a decline in resources and staff. These companies don't view IT as an asset; they view it as a liability and a cost they wish to reduce. There are resources showing how managing IT as a business will actually improve net revenue. A major challenge for these organizations is finding the time to implement solutions today that will save them tomorrow; it is hard to think ahead when you are fighting daily fires.

Open-AudIT is the perfect solution for teams in this situation. The application can be installed on Windows, on Linux or as a CentOS 7 VM, and be up and running in under 10 minutes. After you supply credentials, Open-AudIT will automatically discover everything connected to your network and then audit it. In 10 minutes you will have an effective CMDB storing every asset connected to your network.

Further, opConfig acts as a configuration and compliance management tool that monitors your network devices for configuration changes. Once configured, it will alert on changes and can also compare a device's configuration state with its own history or even with other devices. Once you have downloaded opConfig, minimal setup is required before it starts monitoring your network, and you will know of anomalies within minutes.
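
opConfig's comparison engine is not shown here. Python's standard difflib module can still illustrate the idea of comparing a configuration with its previous state, or with another device's configuration. The function names and the sample configurations are hypothetical.

```python
import difflib


def config_changes(previous, current, label="config"):
    """Return unified-diff lines between two configuration snapshots."""
    return list(difflib.unified_diff(
        previous.splitlines(),
        current.splitlines(),
        fromfile=f"{label} (previous)",
        tofile=f"{label} (current)",
        lineterm="",
    ))


def has_changed(previous, current):
    """True when the two snapshots differ line by line."""
    return previous.splitlines() != current.splitlines()


if __name__ == "__main__":
    before = "hostname core1\ninterface Gi0/1\n ip ospf cost 10\n"
    after = "hostname core1\ninterface Gi0/1\n ip ospf cost 65535\n"
    print("\n".join(config_changes(before, after, "core1")))
```

The same function works for device-to-device comparison: pass one device's configuration as `previous` and the other's as `current`.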

 

Knowledge Limitations

The departments mentioned above, which have been given more responsibilities, generally aren't afforded the extra time to thoroughly learn new systems and processes; they have time to learn to fight the fire, not to prevent it. This can lead to applications being partially configured, used other than as intended, or not used at all.

Opmantek offers 30 days of free support to all new customers to help them get our software working and optimised, and even offers on-site training to help all staff quickly become accustomed to our platform.

 

Financial Limitations

The bottom line when implementing new software is to keep the bottom line in mind. A CMDB solution will provide a better change management process while also protecting your organization in disaster scenarios, because you will have the ability to roll back. Finding budget for prevention-focused software is difficult because its ROI is hard to justify: these systems are designed to provide value in a crisis.

However, the combination of opConfig and Open-AudIT offers a wide array of immediate business benefits. Open-AudIT can be used for asset management and to ensure that you are within license parameters, and opConfig can be used to ensure that all of your network devices are configured correctly and compliant with numerous standards. This combination of modules is saving organisations thousands of dollars in licensing fees each year by automating device discovery and audit, storing configurations, monitoring changes and pushing configuration changes out to sets of devices.