Leading the Way: Insights from the Kore-Tek Leadership Team
This is a series of deep-dive Q & A sessions with Kore-Tek Optical Network Specialists providing their insights and perspectives on building, implementing, and operationally maintaining critical network infrastructure. In this edition, we catch up with Kore-Tek’s Director of Managed Services, who discusses our Network Operations Center (NOC), Remote Engineering Services (RES), and Front-Line Maintenance (FLM).
Q. Can you describe the scope of your department’s responsibilities within the organization?
The Managed Services division is dynamic. Our scope is threefold:
We provide monitoring, troubleshooting, and repair for customer networks 24/7/365.
We also provide technical assistance and training for customers. These services suit everyone from customers with no experience to experienced customers who want someone with expertise to assist with a migration or upgrade event.
We provide consulting services to customers that don’t staff optical transport specialists. This includes troubleshooting operational issues, advisory services for best practices, and design and implementation consulting.
Q. Please describe the various programs the Managed Services division offers.
The Network Operations Center (NOC) focuses on alarm monitoring, troubleshooting, customer notifications, and root cause analysis.
Remote Engineering Services (RES) focuses on Method of Procedure (MOP) development, scheduled technical assistance, and advisory services.
Front-Line Maintenance (FLM) focuses on physical site assistance for install, migration, and maintenance events.
Q. What are some of the considerations a customer might have when evaluating these different services and how do they provide value to the end user?
First off, staffing an entire NOC is expensive. Second, even when an organization already has a NOC in place, training existing NOC staff who lack optical or network monitoring experience can lead to inconsistent results and avoidable outages. So, established customers needing more expertise should consider using Kore-Tek as a critical stop-gap while internal expertise is developed. Finally, mid-size organizations new to network monitoring should consider Kore-Tek until the network size (daily traffic) justifies implementing an entire NOC, or until their existing NOC develops enough expertise on the new equipment or network implementation.
Remote Engineering Services is a comprehensive service specifically for customers that don’t have Optical Transport expertise on staff and can’t justify hiring or recruiting a specialist. We also work with experienced customers that want a sounding board or second opinion on designs developed by a specific vendor’s sales team. They seek our expertise to weigh the pros and cons of various vendors and design requirements. Our most common function in presales discussions is to set expectations for a specific type of design or vendor in relation to the sales pitch. We work with the client to understand their expectations before reviewing a network design, to ensure the vendor is addressing the client’s needs without over- or under-engineering or artificially inflating the network. Our most common post-sales function is training and assisting with network changes, typically at the same time. This service covers everything an in-house optical specialist would typically address.
Front Line Maintenance is a service for customers preparing for a large project that may need assistance installing equipment in a data center or over several sites. This service also covers local help, including troubleshooting, maintenance, or migration services. Customers that don’t staff local engineers/techs or customers with projects whose activities will stress existing staffing resources should consider this service.
Q. Can you describe the role of the NOC in particular, and how it fits into the broader landscape of supporting Kore-Tek customers?
The NOC provides technical expertise for all monitoring and failure events within a specified portion of a network. We have the experience and capabilities for 24x7x365 network monitoring. We also specialize in tracking recurring events that are typically lost in response-only monitoring. For clients, this has resulted in fewer impacting events and a reduced Mean-Time-To-Repair (MTTR).
Q. What tools and IT resources does the NOC use to manage the network infrastructure?
First, automated alarm handling and ticketing are at the core of the service and allow us to maintain a Time-To-Ticket of under 15 minutes for initial notifications.
Second, the customer web portal provides customers with near real-time ticket updates and alarm reporting. It also allows customers to review the ticket-handling process, view scheduled maintenance events, or request assistance with network activities.
Q. Can you walk us through the department’s incident response process? How do you prioritize and triage issues?
Each customer network is given equal value for incident response, and Kore-Tek retains a staff capable of working on several incidents simultaneously. Within a customer’s network, each incident is weighted according to several factors that may or may not match those of other customers’ networks. The type and amount of redundancy is the most significant factor.
Q. How do you handle network outages or other major incidents, and what is the role of the NOC in the response?
All events follow the same template: source identification, resource allocation, corrective action, and documentation.
Source Identification: Determine as quickly as possible whether the event is a failure internal to the network devices or an external one in a service provider or fiber network. A quick resolution requires determining who owns the repair activities.
Resource Allocation/Corrective Action: Identifying the possible cause determines the allocation of resources and action taken. Some issues are only correctable by customer procedures and require only notifications, while others require the full attention of a qualified engineer for the duration of the event.
Documentation: All tickets include an event summary, action taken, and duration. Events requiring network configuration changes are documented, baseline network drawings are updated, and interested parties are notified.
Q. Can you discuss any specific examples of how the NOC services helped support our customers during an event that could’ve seriously impacted the customer’s network?
Any alarm event can become service-impacting if its probable cause is left unresolved. Outages are not preventable in most scenarios, but proper network design should result in loss-of-redundancy events rather than node isolations or service interruptions.
Also, general preventative maintenance can reduce the impact of some network events. These include light level evaluations after a fiber cut, monitoring alarm frequency to isolate potentially failing chipsets, and equipment replacements for recurring, short-impact events. Network outages should be limited to equipment failures, physical damage, or controlled maintenance.
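As a rough illustration of the alarm-frequency monitoring mentioned above, here is a minimal Python sketch. The record layout, the device and alarm names, the seven-day window, and the threshold of three are all assumptions made for the example; it is not a description of Kore-Tek's actual tooling.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical alarm records: (timestamp, device, alarm type).
ALARMS = [
    (datetime(2024, 1, 1, 2, 0), "node-a", "LOS"),
    (datetime(2024, 1, 3, 14, 30), "node-a", "LOS"),
    (datetime(2024, 1, 5, 9, 15), "node-a", "LOS"),
    (datetime(2024, 1, 4, 11, 0), "node-b", "HIGH_TEMP"),
    (datetime(2023, 11, 20, 8, 0), "node-b", "HIGH_TEMP"),
]


def recurring_alarms(alarms, now, window=timedelta(days=7), threshold=3):
    """Return (device, alarm_type) pairs raised at least `threshold`
    times in the `window` leading up to `now`."""
    cutoff = now - window
    counts = Counter(
        (device, alarm_type)
        for ts, device, alarm_type in alarms
        if cutoff <= ts <= now
    )
    return sorted(key for key, n in counts.items() if n >= threshold)


# Assumed review time for the example.
flagged = recurring_alarms(ALARMS, now=datetime(2024, 1, 6))
print(flagged)
```

A pair that keeps reappearing in a report like this is a candidate for closer inspection or equipment replacement before it turns into an outage.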
Q. How do you ensure that the NOC team is adequately trained and staffed to handle a range of network issues?
Each member of the NOC team is assigned a mentor for 6-12 months and makes decisions after consulting with their mentor. Mentors typically review after-action reports with each other to determine if alternate actions should have occurred. Education and review are ongoing for all team members.
Q. Can you discuss any long-term strategic goals or plans for the NOC, and how you see the role of network operations evolving in the coming years?
My goal is straightforward. Be the trusted resource to provide our customers with the requested information or knowledge. We want to be the best NOC provider. I want our customers to want to call if they have a question and not dread having to deal with our team.
Q. Why should customers invest in an outside NOC service instead of trying to manage these processes internally?
NOC services are expensive to set up and run. Successful NOCs must include optical and transport specialists. Optical Transport is a very specialized skill, and none of the existing certifications can indicate whether an engineer is viable in a NOC environment or even knowledgeable about a specific network technology. We have several different types of specialists, each with a solid theoretical foundation in how transport networks work.
Remote Engineering Services (RES)
Q. You discussed earlier what RES is. Can you elaborate on the types of support that you offer customers?
RES covers any scenario that isn’t specifically NOC operations or Field Services. We support customers interested in building their own network by providing advice and guidance on how to design and implement the new network, as well as migration planning services for reconfiguration events.
We also provide network audits for network health checks and acquisitions or to review existing NOC service providers. Other services include technical support for upgrades, migrations, and network extensions. We can also provide insight and expertise for vendor selection and RFP publication. RES covers anything a customer needs where they would typically have to hire a specialist/consultant or rely on the honesty of a vendor sales team to resolve.
Q. Who are the engineers within the organization that support the RES services and what type of technical support specialists is the customer getting?
There are different levels. We have Layer 1 specialists for everything under Optical Transport. This includes DWDM (Dense Wavelength Division Multiplexing), RON (Routed Optical Networking), and dark fiber services.
Next are Layer 2/3 specialists, who support more advanced RON infrastructure and switched networks that rely on dark fiber for Transport.
Please note that the entire team handles a variety of challenges outside of typical network design, including advising on power requirements, physical infrastructure, and service provider reviews.
Q. Please describe a recent remote engineering services project that your department led, and how you used RES to support the needs of the customer.
How about two? Let’s compare and contrast.
In the first example, a customer upgraded their network to a fixed-grid design using flex-grid hardware. The vendor’s TAC resisted supporting the network because the design wasn’t clearly documented. Kore-Tek worked as the go-between for the customer to show the TAC team how the network functioned and why the existing documentation supported it. Further, we worked with the vendor product management team to implement critical changes to the documentation to reflect the customer’s design. In this case, we provided the customer with expertise greater than that of the TAC team to handle the challenging issue of a vendor unaware of how their equipment could be used.
In the second example, a customer wanted to install their own equipment but needed more than the vendor training courses. Kore-Tek provided on-demand training on the equipment as they installed it and supported their team when they ran into technical challenges. This turned out to be cheaper for the customer because they didn’t have to pay for the vendor training course or the travel expenses associated with it. Our training was very “hands-on” and specific to their use case (bypassing concepts they weren’t using or didn’t care about). This customer found the hands-on training approach so beneficial that they still keep a few rolling hours active to cover eventualities where they need extra assistance with highly technical problems.
Q. Why should customers invest in an outside RES service instead of trying to manage these processes internally?
In a word, cost. The number of transport specialists has decreased dramatically in the last ten years. Hiring someone knowledgeable about transport design and operations is challenging, and a quality applicant will likely cost more than many expect.
Front-Line Maintenance (FLM)
Q. Please describe Front-Line Maintenance (FLM) and what the services provide.
To keep it simple, consider FLM a service covering anything physical. That includes cabling, power, installation, migrations, and anything that requires moving fibers or cables. It also includes maintenance requiring any physical change to modules or equipment.
Q. How does FLM integrate with other services like NOC and RES?
It is usually the last action taken. RES and NOC define a location’s physical needs, and FLM implements those changes. The controlling engineer for a NOC event or RES case manages the FLM to ensure proper documentation is provided to the local tech, and the engineer performs the quality checks after the work is complete, including closing out notifications or documents after the FLM tech leaves the site.
Q. Who should consider having FLM support added to their managed services contract?
Basically, anyone who needs extra assistance with the physical work involved in a network implementation or migration, and anyone who doesn’t have the familiarity and expertise in-house to support the network equipment (hardware).
Q. How do you measure the performance and effectiveness of your department’s services?
We measure in a couple of ways.
Time on task: Each task has a specific amount of time it should take. Deviations are normal, but we keep an eye on how long tasks take and what kinds of complications are observed to try to prevent them in the future.
Site revisits: Each event that requires FLM should be completed in one visit. Revisits are expensive and typically mean an extended or recurring outage occurred. Revisits are uncommon, but each is reviewed to see whether different initial actions or a different thought process could have prevented it.
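To make these two measures concrete, here is a minimal Python sketch of how time-on-task overruns and the revisit rate could be tallied. The `SiteVisit` record, its field names, and the sample figures are hypothetical and exist only for illustration; they are not taken from Kore-Tek's reporting.

```python
from dataclasses import dataclass


@dataclass
class SiteVisit:
    """Hypothetical record of one FLM site visit."""
    task: str
    expected_hours: float
    actual_hours: float
    revisit: bool  # True if this visit repeats an earlier, incomplete one


def summarize(visits):
    """Return the tasks that overran their expected time and the
    fraction of visits that were revisits."""
    total = len(visits)
    overruns = [v.task for v in visits if v.actual_hours > v.expected_hours]
    revisit_rate = sum(v.revisit for v in visits) / total if total else 0.0
    return {"overruns": overruns, "revisit_rate": revisit_rate}


# Made-up sample data.
visits = [
    SiteVisit("amplifier swap", 2.0, 1.5, False),
    SiteVisit("fiber re-termination", 3.0, 4.5, False),
    SiteVisit("fiber re-termination", 3.0, 2.0, True),
    SiteVisit("shelf install", 6.0, 6.0, False),
]

report = summarize(visits)
print(report)
```

The overrun list points to tasks whose complications deserve a closer look, while the revisit rate gives a single number to track over time.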
Q. Why should customers invest in an outside FLM service instead of trying to manage these processes internally?
Cost. Depending on the number of activities within a network, it may not make sense to staff the number of techs and engineers needed to maintain a network. This is more common when a specific planned activity (migration, augmentation, or equipment refresh) is going to occur, and the project will need an increase in personnel for a short duration.
Q. Any final thoughts or insights that we didn’t touch on today?
Yes. Two things:
Someone needs to watch and maintain a customer’s network. It doesn’t have to be Kore-Tek, but someone needs to do it. Further, the watcher needs to understand the equipment and what the alarms mean when they report. This can be developed internally, but it is time-consuming and expensive. Finding a person with the right mental attitude to research issues until they understand why they occurred can be challenging. It’s critical to have staff with an acute appreciation for detail. Minor problems can turn into big ones at the drop of a hat, and prevention goes a long way to limit those kinds of events. Also, networks never stop working. Whoever is watching the network should understand that problems occur on their own time, not within allotted workdays or outside of vacations.
Founded by optical network specialists, Kore-Tek is widely recognized for its unparalleled expertise in critical network infrastructure, providing everything from fiber network architectures to multi-technology optical networking, routing, and switching implementations.
Trusted and credentialed by major equipment manufacturers and technology service providers, and backed by decades of experience, Kore-Tek engineers have planned, designed, and managed some of the most relied-upon public and private networks in use today, improving network operations and making complex, next-generation digital transformations simple.
Businesses depend on networks. Network professionals depend on Kore-Tek.