Measuring the Effectiveness of Autonomous Network Functions that Use Artificial Intelligence Algorithms

Premnath K. Narayanan1,* and David K. Harrison2

1BDGS SA OSS PDU OSS S&T RESEARCH & PCT LM Ericsson Ltd., Athlone, Ireland

2Glasgow Caledonian University, Glasgow, United Kingdom

E-mail: premnath.k.n@ericsson.com; D.Harrison@gcu.ac.uk

*Corresponding Author

Received 31 July 2020; Accepted 01 September 2020; Publication 30 January 2021

Abstract

Autonomous network functions such as Software Defined Networks (SDN), Self-Organizing Networks (SON), and virtual network function orchestrators play a crucial role in 5G and beyond-5G wireless telecommunication networks. Advancements in Artificial Intelligence (AI) and Machine Learning (ML) algorithms and frameworks have led to the adoption of stochastic algorithms for autonomous network functions, with the aim of performing better than human capability. Measuring the effectiveness of such autonomous network functions is a challenge, since stochastic algorithms are fundamentally generalized models and could potentially make malicious proposals. Traditionally, the effectiveness of a network is measured through network assurance Key Performance Indicators (KPIs). Autonomous network functions are kept active when the KPIs are within acceptable limits and the network is showing improvement over time. This paper introduces

• Factors that are to be considered beyond KPIs for effective measurement of autonomous network functions that use stochastic algorithms.

• Adopting the right scales for measuring the effectiveness of autonomous network functions, using grading systems from medical practice that are used for the treatment of critical illness.

Keywords: Network assurance, AI, ML, counters, stochastic algorithms, and KPIs.

1 Introduction

Autonomous network functions, such as data flow control in an SDN-controlled network, coverage and capacity optimization functions in SON, and auto-scaling of containerized, virtual, or physical network functions in network orchestration, are crucial in the day-to-day operations of LTE, 5G, and beyond-5G wireless telecommunication networks. Such autonomous network functions have started embracing AI and ML algorithms in recent years. Though AI and ML algorithms can outperform humans [1], they can at times make malicious proposals due to the generalization of models and the trade-offs between sensitivity (true positive rate, or recall) and specificity (true negative rate) involved in stochastic algorithms [2]. Also, stochastic algorithms are generally black boxes, and explaining the behaviour of such algorithms is emerging as a new field referred to as explainable AI or interpretable models [3, 4, 5]. Today there is no standard unit of measurement or scale for measuring the effectiveness of autonomous network functions that use stochastic algorithms. This paper introduces a novel approach for measuring the effectiveness of autonomous network functions that considers the factors influencing the behaviour of network functions and adapts scales from medical practice that are used for treating patients.

2 Factors Influencing Autonomous Network Functions

Traditionally, the effectiveness of a network is measured through KPIs of specific network behaviours such as Accessibility, Retainability, Integrity, Mobility, and Availability. Based on autonomous network function deployments across LTE networks in the past decade, the learning from network operators indicates that several other factors need to be considered apart from these network KPIs. Gap Measurement (GM), Trait Progress (TP), Stochastic Algorithm Bias (SAB), External Factors (EF), and Infrastructure Effectiveness (IE) are the additional influencing factors discussed below. These factors are vital in measuring the effectiveness of autonomous network functions.

2.1 Gap Measurement

For a given frequency, bandwidth, and radio access technology, the theoretical performance limits are available through simulations and demonstrations of the best-performing technology. Since multiple parameter configurations are available for a specific radio access technology, it is necessary to configure and run the network optimally. Also, 3GPP standards such as 38.201, 38.214, 38.321, 38.322, and 38.323 provide details on how to calculate bitrates. Based on such standards, the potential throughput of a 5G network can be derived.

Figure 1 Potential theoretical throughput for a 5G network based on network configurations.

Figure 1 demonstrates potential throughput calculation based on general network attributes such as frequency, modulation, bandwidth, and 5G technology-specific attributes like special slot configuration and layout, as defined in 3GPP. Gap measurement can be derived based on the difference between potential throughput calculation and the actual value of the current network throughput KPIs.
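As an illustration of how such a potential throughput could be derived, the sketch below uses the approximate maximum data rate formula from 3GPP TS 38.306. That specification is not among those cited above and is assumed here for illustration only. The function name, its parameters, and the example carrier (100 MHz at 30 kHz subcarrier spacing, 273 resource blocks, 4 layers, 256-QAM) are hypothetical choices and are not taken from Figure 1. The sketch also ignores TDD slot configuration, which would reduce the result further.

```python
def nr_peak_rate_mbps(layers, mod_order, n_prb, numerology,
                      scaling=1.0, overhead=0.14, r_max=948 / 1024):
    """Approximate peak data rate of one NR carrier in Mbit/s.

    Assumed formula shape (TS 38.306):
    layers * Qm * f * Rmax * (12 * N_PRB / T_s) * (1 - OH)
    """
    # Average OFDM symbol duration for the numerology (14 symbols per slot).
    symbol_time_s = 1e-3 / (14 * 2 ** numerology)
    subcarriers = 12 * n_prb
    bits_per_second = (layers * mod_order * scaling * r_max
                       * subcarriers / symbol_time_s * (1 - overhead))
    return bits_per_second / 1e6


# Hypothetical FR1 downlink carrier: 100 MHz, 30 kHz SCS (numerology 1).
potential = nr_peak_rate_mbps(layers=4, mod_order=8, n_prb=273, numerology=1)
print(f"Potential throughput: {potential:.1f} Mbit/s")
```

With these assumed inputs the result is roughly 2.34 Gbit/s; a value of this kind would play the role of the potential value that the current throughput KPI is compared against.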

$GM = \sum_{k=0}^{n} \left(P_k - C_k\right)$ (1)

where $P_k$ is the potential value calculated for a specific network event $k$, and $C_k$ is the current network KPI value for that event.

Examples of specific network events are “Throughput” (bits per second) and “Latency” (total response time for an event such as handover or uplink and downlink re-connections).
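A minimal sketch of the gap measurement in Equation (1) is shown below. The event names and the throughput figures are invented for illustration, and `gap_measurement` is a hypothetical helper rather than part of any existing tool.

```python
def gap_measurement(potential, current):
    """Return the sum of (P_k - C_k) over the events listed in `potential`."""
    missing = set(potential) - set(current)
    if missing:
        raise KeyError(f"no current KPI for events: {sorted(missing)}")
    return sum(potential[event] - current[event] for event in potential)


# Illustrative values only (Mbit/s); not measurements from a real network.
potential_mbps = {"cell_a_dl": 2337.0, "cell_b_dl": 1168.5}
current_mbps = {"cell_a_dl": 1900.0, "cell_b_dl": 1000.0}

gm = gap_measurement(potential_mbps, current_mbps)
print(gm)
```

Because the sum adds raw differences, events with different units (for example throughput and latency) would presumably have to be kept in separate sums or normalized first; the text above does not specify this, so the sketch handles a single unit only.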

Table 1 Validation scores for ML/AI algorithms

| ML Algorithm Type | Validation Score | Brief Description |
|---|---|---|
| Regression | Mean Squared Error (MSE) / Root Mean Squared Error (RMSE) | MSE indicates how close the regression line is to a set of points. The distances from the regression line to the points are calculated and referred to as “errors”; the average of the squared errors is the MSE. RMSE is the standard deviation of the prediction errors (residuals) and measures how spread out these residuals are. [6] |
| Regression | Mean Absolute Error (MAE) | The average magnitude of the errors in a given set of predictions. The direction of the errors is not considered in MAE. [6] |
| Regression | Adjusted R Squared | A measure of how well the independent variables explain the dependent variable. It also penalizes adding independent variables that do not help in predicting the dependent variable. [7] Adjusted R Squared is used for comparing multiple models with different numbers of independent variables and can be used for selecting the significant predictors (independent variables) of a regression model. |
| Regression | Mean Absolute Percent Error (MAPE) / Mean Squared Percentage Error (MSPE) | The accuracy of a forecasting system can be measured using MAPE (in percentage). [8] |
| Classification | Precision-Recall (P-R) [7] | Precision $=$ True Positive / Predicted Positives, where Predicted Positives $=$ True Positive $+$ False Positive. Recall $=$ True Positive / Actual Positives, where Actual Positives $=$ True Positive $+$ False Negative. True Positive: relevant items that are selected, e.g., how many poor performing cells are correctly identified for a condition. False Positive: non-relevant items that are selected, e.g., how many good performing cells are wrongly identified for a condition. True Negative: non-relevant items that are correctly left unselected, e.g., how many good performing cells were not identified for a condition. False Negative: relevant items that are not selected, e.g., how many poor performing cells were wrongly not identified for a condition. |
| Classification | Receiver Operating Characteristic (ROC) - Area Under Curve (AUC) | The performance of a classification algorithm at various threshold settings is measured using the ROC curve and the area under it. |
| Classification | Accuracy and Log-loss [9] | Accuracy is a measure over yes-or-no outcomes: the share of predictions where the predicted value equals the actual value. Log loss captures the uncertainty of a prediction based on how much it varies from the actual label. |
| Unsupervised | Within Cluster Sum of Squares (WCSS) / Between Cluster Sum of Squares (BCSS) [10] | In a cluster, the average squared distance of all the points to the cluster centroid measures the variability of the observations within the cluster. This is referred to as WCSS. The average squared distance between all the centroids is referred to as BCSS. |
| Unsupervised | Mutual Information | Discovering useful representations is one of the core objectives of deep learning. Deep InfoMax (DIM) simultaneously estimates and maximizes the mutual information between input data and learned high-level representations. |
| Unsupervised | Silhouette Co-efficient [11] | Silhouette co-efficient analysis can be used to study the separation distance between the resulting clusters and to decide the number of clusters. When the number of clusters increases, the silhouette co-efficient score typically decreases. |
| Natural Language Processing | Bilingual Evaluation Understudy (BLEU) Score | BLEU is a metric that evaluates a generated sentence against a reference sentence. |
| All types of algorithms | CV Error | Cross-validation is used in model selection to better estimate the test error of a predictive model. It is an efficient data partitioning technique for evaluating validation sets and predicting the performance of the predictive model. |
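To make the regression scores in Table 1 concrete, the following sketch computes MSE, RMSE, MAE, and MAPE with the Python standard library. The sample values are invented, and `regression_scores` is a hypothetical helper name.

```python
import math


def regression_scores(actual, predicted):
    """Compute MSE, RMSE, MAE and MAPE for paired actual/predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - a for a, p in zip(actual, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an actual value is zero; this sketch assumes none are.
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "mape": mape}


# Illustrative KPI values only (e.g., throughput in Mbit/s).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 380.0]
scores = regression_scores(actual, predicted)
print(scores)
```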

2.2 Trait Progress

Network operational goals are defined at specific levels by the operator. Typically, business goals are translated into specific operational directives so that network priorities are achieved effectively. For example, a well-established operator could prioritize capacity and quality over coverage. For such network goals, adding more sites and reducing interference are the obvious steps. Trait progress refers to observing the current KPIs and optimizing the network towards the needed trait until GM (as specified in Equation 1) is close to zero.
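One possible way to track trait progress is to watch how GM evolves over successive observation periods. The sketch below is an assumption-laden illustration: the three labels, the relative `tolerance`, and the rule of comparing only the first and last observations are hypothetical choices, not definitions from this paper.

```python
def trait_progress(gm_history, tolerance=0.05):
    """Classify how the gap measurement (GM) moves towards zero over time.

    gm_history: GM values ordered from oldest to newest.
    tolerance:  share of the initial gap below which the trait counts as reached
                (hypothetical parameter).
    """
    if len(gm_history) < 2:
        raise ValueError("need at least two GM observations")
    first, last = abs(gm_history[0]), abs(gm_history[-1])
    if first == 0 or last <= tolerance * first:
        return "achieved"
    if last < first:
        return "progressing"
    return "absent"


# Illustrative GM series only.
print(trait_progress([600.0, 420.0, 180.0]))
```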

2.3 Stochastic Algorithm Bias

Different stochastic algorithms (e.g., supervised regression and classification algorithms, unsupervised ML/AI algorithms) have their own validation scores, which are briefly described in Table 1.

These validation scores indicate that stochastic algorithms could potentially make malicious proposals (false positive, false negative), and this needs to be considered as part of the algorithm’s measurement.
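As a sketch of how false positives and false negatives enter the measurement, the snippet below derives precision, recall (sensitivity), specificity, and accuracy from confusion-matrix counts. The counts are invented and could, for example, stand for cells flagged as poor performing by an autonomous function; `classification_scores` is a hypothetical helper.

```python
def classification_scores(tp, fp, tn, fn):
    """Derive common validation scores from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        # Share of flagged items that were truly relevant.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Share of relevant items that were flagged (sensitivity).
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Share of non-relevant items that were correctly left alone.
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Illustrative counts only: 100 cells evaluated.
scores = classification_scores(tp=40, fp=10, tn=45, fn=5)
print(scores)
```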

2.4 External Factors

Urban areas have high-rise buildings, rural areas have subscribers distributed across longer distances, and suburban areas have fewer high-rise buildings and sparsely distributed homes and offices. Each of these topographies (urban, rural, and suburban) has different radio propagation characteristics. Additionally, vegetation, water bodies, and land elevation in the area affect radio signal attenuation. These dimensions yield different measurements for the same events, such as mobility, throughput, and latency. It is therefore critical to measure the same events differently based on external factors.

2.5 Infrastructure Effectiveness

An autonomous function can be executed in line with different architectures. SDN and O-RAN are two different architectures for the core network (including cloud data centres) and radio nodes, respectively. Their intentions are similar, but so far their target network elements are different. In the case of SDN, the focus is on core network elements and data centres in the cloud. The focus of O-RAN is on moving away from all-in-one Radio Access Network (RAN) equipment to hardware and software that can be easily procured from several commodity vendors and integrated with ease by a network operator [12]. An autonomous network function can be deployed on the O-RAN RIC (RAN Intelligent Controller) or as one of SDN’s network applications at the management plane layer [13].

Running an autonomous network function with limited time and space complexity is very important for its effectiveness. The architecture and the way the autonomous function uses memory, CPU, and storage to accomplish the use case determine its effectiveness. Key architectural questions are:

1. How modular are the autonomous network functions deployed?

2. How are the dependencies segregated? For example, do multiple functions share the same library, or are they segregated with the right namespace (such as containers)?

All the above factors must be measured for the effectiveness of an autonomous network function.

3 Scale of Measurements for Autonomous Network Functions

As described in Section 2, every factor needs to be considered for measuring the effectiveness of an autonomous network function. The unit of measurement for each factor is different and can be generalized only as a scale factor.

Generally, such diverse units of measurement are combined in the medical field in scales like the Glasgow Coma Scale (GCS), the Apgar score (AS), and the Gleason grading system (GGS). Such scales consider several factors of the human sensory, respiratory, and other systems. This paper proposes to use such scales for measuring the effectiveness of autonomous network functions.

Table 2 Glasgow coma scale [14]

| Tests | Observed Status | Rating | Score |
|---|---|---|---|
| Eyes – Open before stimulus | Yes/No | Spontaneous | 4 |
| Eyes – After spoken or shouted request | Yes/No | To sound | 3 |
| Eyes – After fingertip stimulus | Yes/No | To pressure | 2 |
| Eyes – No opening at any time, no interfering factor | Yes/No | None | 1 |
| Eyes – Closed by local factor | Yes/No | Not Testable | 0 |
| Verbal – Correctly gives name, place, and date | Yes/No | Oriented | 5 |
| Verbal – Not oriented but communicates coherently | Yes/No | Confused | 4 |
| Verbal – Intelligible single words | Yes/No | Words | 3 |
| Verbal – Only moans/groans | Yes/No | Sounds | 2 |
| Verbal – No audible response, no interfering factor | Yes/No | None | 1 |
| Verbal – Factor interfering with communication | Yes/No | Not Testable | 0 |
| Motor – Obeys 2-part request | Yes/No | Obeys commands | 6 |
| Motor – Brings hand above clavicle to stimulus on head or neck | Yes/No | Localizing | 5 |
| Motor – Bends arm at elbow rapidly but features not predominantly abnormal | Yes/No | Normal flexion | 4 |
| Motor – Bends arm at the elbow, features predominantly abnormal | Yes/No | Abnormal flexion | 3 |
| Motor – Extends arm at elbow | Yes/No | Extension | 2 |
| Motor – No movement in arms/legs, no interfering factor | Yes/No | None | 1 |
| Motor – Paralysed or other limiting factors | Yes/No | Not Testable | 0 |

3.1 Glasgow Coma Scale

The GCS (Table 2) is a neurological scale [14, 15]. Its objective is to describe a person’s state of consciousness. It is used as an assessment scale during the treatment of patients, generally in intensive care units for all acute medical and trauma patients. This type of scale can be used for measuring a network element’s current state based on all the autonomous network functions that are applied to the network element.

Summing the scores of all the observations marked as “Yes” in Table 2 gives the overall GCS score.

Such a proven scaling technique can be adopted for measuring the effectiveness of autonomous network functions.

A GCS scale-based proposal for measuring the effectiveness of an autonomous network function is described in Table 3 (introduced in this paper as “Glasgow autonomous network function scale”):

The effectiveness of an autonomous function can be measured using Table 3. A higher score indicates that the autonomous network function is more effective. This scale is well suited to analysing an autonomous network function in offline mode.

Table 3 Glasgow autonomous network function scale

| Factor | Level | Observed Status | Rating | Score |
|---|---|---|---|---|
| Gap measurement (GM) – Accessibility, Mobility, Retainability, Integrity | Low Gap | Yes/No | Almost close or difference is close to zero. | 2 |
| GM – Accessibility, Mobility, Retainability, Integrity | Medium Gap | Yes/No | The difference is close to 50% of the $P_k$ value. | 1 |
| GM – Accessibility, Mobility, Retainability, Integrity | High Gap | Yes/No | The difference is less than 25% of the $P_k$ value. | 0 |
| GM – Availability | Low Gap | Yes/No | 100% available | 2 |
| GM – Availability | Medium Gap | Yes/No | $>$ 98% available | 1 |
| GM – Availability | High Gap | Yes/No | $<$ 98% available | 0 |
| Overall Trait Progress (TP) | Low Gap | Yes/No | Almost close or difference is close to zero between current KPI and business-related KPI. | 2 |
| TP | Medium Gap | Yes/No | Difference is close to 50% of the $P_k$ value between current KPI and business-related KPI. | 1 |
| TP | High Gap | Yes/No | Difference is less than 25% of the $P_k$ value between current KPI and business-related KPI. | 0 |
| Stochastic Algorithm Bias (SAB) | Highly accurate (better than human intelligence or 9 out of 10 predictions are right). | Yes/No | Algorithm specific metric (e.g., RMSE or Precision Recall). If none, then CV error. | 2 |
| SAB | Medium accurate (e.g., equal to human intelligence or 7/10 proposals are right). | Yes/No | Algorithm specific metric (RMSE or Precision Recall). If none, then CV error. | 1 |
| SAB | Low accuracy (e.g., less than human intelligence or 5/10 proposals are right). | Yes/No | Algorithm specific metric (RMSE or Precision Recall). If none, then CV error. | 0 |
| External Factors (EF) | Performs equally regardless of the environment (e.g., urban or rural or suburban). | Yes/No | Output of autonomous function is consistent across environments. | 2 |
| EF | Performs only in a particular environment (e.g., urban or rural or suburban). | Yes/No | The output of the autonomous function is not consistent across environments and consistent in the majority of environments. | 1 |
| EF | Performs only in one environment. | Yes/No | Output of autonomous function is consistent only in one environment. | 0 |
| Infrastructure Effectiveness (IE) | Able to scale linearly as the network grows (scale-up/scale-down). | Yes/No | Able to achieve autonomous network function effectively with a limited set of CPUs, memory, and storage. Scales horizontally and is deployable on any cloud-native architecture. | 2 |
| IE | A pre-defined set of hardware dimensioned for a specific set of network elements. | Yes/No | Able to achieve autonomous network function effectively for the pre-defined hardware dimensioning. | 1 |
| IE | High CPU, memory, and storage. Not linearly scalable. | Yes/No | Not able to use autonomous function for complete network. | 0 |
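The threshold-style rows of Table 3 can be turned into small scoring helpers, as sketched below. This is only one reading of the table: the handling of values that fall between the stated anchor points (for example exactly 98% availability, or 8 out of 10 correct proposals) is an assumption, the function names are hypothetical, and summing the six factor groups follows the GCS convention of adding the scores of the observed ratings.

```python
def availability_score(available_pct):
    """Score the 'GM - Availability' rows: 100% -> 2, above 98% -> 1, else 0."""
    if available_pct >= 100.0:
        return 2
    if available_pct > 98.0:
        return 1
    return 0  # assumption: exactly 98% is treated as a high gap


def bias_score(correct, total):
    """Score the SAB rows from the share of correct proposals."""
    ratio = correct / total
    if ratio >= 0.9:   # "9 out of 10 predictions are right"
        return 2
    if ratio >= 0.7:   # "7/10 proposals are right"
        return 1
    return 0           # assumption: anything below 7/10 counts as low accuracy


def glasgow_anf_score(gm_kpi, gm_availability, tp, sab, ef, ie):
    """Sum the six per-factor scores (each 0, 1 or 2); the maximum is 12."""
    scores = (gm_kpi, gm_availability, tp, sab, ef, ie)
    if any(s not in (0, 1, 2) for s in scores):
        raise ValueError("each factor score must be 0, 1, or 2")
    return sum(scores)


# Illustrative assessment only.
total = glasgow_anf_score(
    gm_kpi=1,
    gm_availability=availability_score(99.2),
    tp=1,
    sab=bias_score(correct=9, total=10),
    ef=2,
    ie=1,
)
print(total)
```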

3.2 Apgar Score

The health of a newborn infant can be quickly summarized using the Apgar score. The Apgar score has survived the test of time [16], and a sample is shown in Table 4.

Table 4 APGAR score

| Criterion (Acronym) | Score of 0 | Score of 1 | Score of 2 |
|---|---|---|---|
| Skin colour (Appearance) | Blue all over | Body pink, blue at extremities | No cyanosis |
| Pulse rate (Pulse) | Absent | $<$ 100 beats per minute | $>$ 100 beats per minute |
| Reflex (Grimace) | No response (even to stimulation) | Grimace on suction or aggressive stimulation | Cry on stimulation |
| Muscle tone (Activity) | None | Some flexion | Flexed arms and legs that resist extension |
| Respiratory effort (Respiration) | Absent | Irregular gasping, weak | Strong, robust cry |

Such a scoring table can be used for measuring the effectiveness of an autonomous function or the network element as such.

Table 5 shows a proposed autonomous function measurement table that is based on the Apgar score and referred to as the MEGABITS score.

Table 5 MEGABITS score

| Acronym | Description | Score of 0 | Score of 1 | Score of 2 |
|---|---|---|---|---|
| Gap Measurement | Gap with theoretical estimate or best performing network element | High gap value. | Close to median. | Low value or close to zero. |
| Trait progress | Trait | Absent (sporadic curve) | Showing signs of progress towards the goal. | Key progress already achieved towards trait and continues to progress. |
| Stochastic Algorithm Bias | Algorithm Learning | High bias (e.g., 5 out of 10 proposals are malicious). | Equal to human score. | Better than human score. |
| External factors | Natural factors | Unknown factors affecting decisions | Known factors affecting decisions | External factors are in control. |
| Infrastructure effectiveness | Computation | Not possible to scale for the complete network. | Possible to scale for complete network (pre-defined hardware setup). | Scales linearly based on network size and scales horizontally. |

The sum of scores of all the 5 factors in the MEGABITS score indicates the effectiveness of the autonomous network function.
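The summation described above can be sketched as a small value type that validates each factor score before adding them up. The class and field names are hypothetical, and the example scores are invented.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class MegabitsScore:
    """One score (0, 1 or 2) per MEGABITS factor; names are illustrative."""
    gap_measurement: int
    trait_progress: int
    stochastic_algorithm_bias: int
    external_factors: int
    infrastructure_effectiveness: int

    def __post_init__(self):
        # Reject anything outside the 0-2 range used by the scoring table.
        for f in fields(self):
            if getattr(self, f.name) not in (0, 1, 2):
                raise ValueError(f"{f.name} must be 0, 1, or 2")

    def total(self):
        """Sum of the five factor scores, between 0 and 10."""
        return sum(getattr(self, f.name) for f in fields(self))


# Illustrative assessment of one autonomous network function.
score = MegabitsScore(2, 1, 1, 0, 2)
print(score.total())
```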

3.3 Gleason Grading System

The prognosis of men with prostate cancer is assessed using the Gleason Grading System. A Gleason score is assigned based on the microscopic appearance of the prostate cancer. The higher the score, the higher the risk of mortality. Figure 2 illustrates the Gleason patterns.

Figure 2 Gleason Pattern [17].

In the current form of the Gleason system, prostate cancer is rarely seen in patterns “one” and “two.” Hence, to make the system more accurate, tumors are assigned primary, secondary, and tertiary grades. The primary grade is assigned to the dominant pattern of the tumor (greater than 50% of the total pattern seen). The secondary grade is assigned to the next most frequent pattern (less than 50% but at least 5% of the total cancer pattern observed). Generally, the more aggressive pattern is marked by the pathologist as the tertiary grade.
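The grade assignment described above could be sketched as follows. This is an illustrative reading, not a clinical procedure: the input format (pattern number mapped to its observed share), the fallback of treating the most frequent pattern as primary even when it does not exceed 50%, and the choice of the highest-numbered remaining pattern as the tertiary grade are all assumptions, and `assign_grades` is a hypothetical helper.

```python
def assign_grades(pattern_share):
    """Return (primary, secondary, tertiary) pattern numbers.

    pattern_share maps a pattern number (1-5) to its share of the
    observed total, expressed as a fraction between 0 and 1.
    """
    if not pattern_share:
        raise ValueError("at least one pattern is required")
    ranked = sorted(pattern_share.items(), key=lambda kv: kv[1], reverse=True)
    primary = ranked[0][0]          # dominant pattern
    rest = ranked[1:]
    secondary = None
    if rest and rest[0][1] >= 0.05:  # next most frequent, at least 5%
        secondary = rest[0][0]
        rest = rest[1:]
    # Assumption: the most aggressive (highest-numbered) remaining pattern
    # is taken as the tertiary grade.
    tertiary = max((p for p, _ in rest), default=None)
    return primary, secondary, tertiary


# Illustrative shares only.
print(assign_grades({3: 0.60, 4: 0.35, 5: 0.05}))
```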

For the case where only two patterns are visible:

Gleason score $=$ primary $+$ secondary.

For the case where three patterns are visible:

Gleason score $=$ primary $+$ (Highest pattern number of secondary or tertiary).
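The two rules above can be sketched as a single function. The name `gleason_score` and the range check on grades one to five are illustrative assumptions.

```python
def gleason_score(primary, secondary, tertiary=None):
    """Combine pattern grades (1-5) into a Gleason score.

    Two patterns:   primary + secondary
    Three patterns: primary + the higher of secondary and tertiary
    """
    for grade in (primary, secondary, tertiary):
        if grade is not None and grade not in range(1, 6):
            raise ValueError("pattern grades must be between 1 and 5")
    if tertiary is None:
        return primary + secondary
    return primary + max(secondary, tertiary)


print(gleason_score(3, 4))     # two visible patterns
print(gleason_score(3, 4, 5))  # three visible patterns
```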

This way, the false-negative rate is minimized, and the Gleason system [17] can detect prostate cancer more accurately.

The Gleason method can be adopted for measuring the effectiveness of autonomous network functions that use stochastic algorithms. Since stochastic algorithms use statistical patterns and mathematical models, patterns can be derived for the influencing factors discussed in Section 2. Deriving different patterns, grading them from “one” to “five,” and adopting the same strategy of primary, secondary, and tertiary grades could potentially indicate the effectiveness of autonomous network functions.

Figure 3 Gleason Pattern-based evaluation [17].

Figure 3 illustrates how the Gleason method can be used for evaluating the effectiveness of stochastic models. As the model degrades and the autonomous network function no longer covers the complete network, the Gleason score becomes higher. The higher the score, the lower the effectiveness of the stochastic algorithm and, in turn, of the autonomous network function. Each autonomous function has its own use case (e.g., coverage and capacity optimization for radio, load balancing across network slices in the core), and for each use case a different stochastic algorithm is evaluated, deployed, and continuously monitored. As part of continuous monitoring, the Gleason score will be a key measure of the effectiveness of the autonomous network function. Regardless of the use case, the evaluation criteria can be adopted as proposed in the Gleason system for the prognosis of prostate cancer.

4 Conclusion

This paper introduces three new scales for measuring the effectiveness of autonomous network functions. The proposed scales can be researched further with real network data from networks that use autonomous network functions and can be proposed to standardization bodies for measuring the effectiveness of autonomous network functions in a telecommunication network.

Acknowledgement

The authors thank their respective organizations (LM Ericsson Ltd and Glasgow Caledonian University) for supporting research that would benefit the telecommunication industry. The authors additionally thank the University of Bolton and Amity [IN] London University for providing an opportunity to work on the thesis that measures the effectiveness of autonomous network functions. The authors thank Dr. M Lakshmi Sudha, MBBS DCP DNB(Path), consultant pathologist, for introducing the authors to the medical scales that are time-proven and widely used in day-to-day medical analysis and the treatment of patients with chronic illness. Further, the scales proposed in this paper are expected to act as a catalyst for introducing more autonomous network functions that could potentially reduce the increasing operational expenditure of a telecommunication network. The authors would like to thank the Wireless World Research Forum (WWRF) for discussing the preliminary work of this paper as AI workgroup presentations at its conferences in London [18] and Denmark [19].

References

[1] Scott Mayer McKinney, Marcin Sieniek, Varun Godbole, Jonathan Godwin, Natasha Antropova, Hutan Ashrafian, Trevor Back, Mary Chesus, Greg C. Corrado, Ara Darzi, et al. “International evaluation of an AI system for breast cancer screening”. Nature, 577.7788 (2020), pp. 89–94. DOI: 10.1038/s41586-019-1799-6.

[2] Premnath K Narayanan and David K Harrison. “Explainable AI for Autonomous Network Functions in Wireless and Mobile Networks”. International Journal of Wireless and Mobile Networks, 12.3 (2020), pp. 31–44. DOI: 10.5121/ijwmn.2020.12303. URL: https://aircconline.com/ijwmn/V12N3/12320ijwmn03.pdf.

[3] Scott M. Lundberg and Su-In Lee. “A Unified Approach to Interpreting Model Predictions”. In: 31st Conference on Neural Information Processing Systems. NIPS, 2017.

[4] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier”. In: 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining. ACM, 2016.

[5] Gregory Plumb, Denali Molitor, and Ameet Talwalkar. “Model Agnostic Supervised Local Explanations”. In: 32nd Conference on Neural Information Processing Systems. NeurIPS, 2018.

[6] CJ Willmott and K Matsuura. “Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance”. In: Climate Research 30 (2005), pp. 79–82. DOI: 10.3354/cr030079.

[7] Lonnie Magee. “R² Measures Based on Wald and Likelihood Ratio Joint Significance Tests”. In: The American Statistician 44.3 (1990), pp. 250–253. DOI: 10.1080/00031305.1990.10475731.

[8] Arnaud de Myttenaere et al. “Mean absolute percentage error for regression models”. Neurocomputing 192 (2016), pp. 38–48. DOI: 10.1016/j.neucom.2015.12.114.

[9] Shen Yi. Loss Functions for Binary Classification and Class Probability Estimation, 2005. URL: http://stat.wharton.upenn.edu/~buja/PAPERS/yi-shen-dissertation.pdf.

[11] Peter J. Rousseeuw. “Silhouettes: A graphical aid to the interpretation and validation of cluster analysis”. Journal of Computational and Applied Mathematics 20 (1987), pp. 53–65. DOI: 10.1016/0377-0427(87)90125-7.

[12] 2019. URL: https://www.o-ran.org/.

[13] Diego Kreutz et al. “Software-Defined Networking: A Comprehensive Survey”. In: Proceedings of the IEEE 103.1 (2015), pp. 14–76. DOI: 10.1109/jproc.2014.2371999.

[14] Graham Teasdale and Bryan Jennett. “Assessment of Coma and Impaired Consciousness”. In: The Lancet 304.7872 (1974), pp. 81–84. DOI: 10.1016/s0140-6736(74)91639-0.

[15] Florence C.M. Reith et al. “Differential effects of the Glasgow Coma Scale score and its components: An analysis of 54,069 patients with traumatic brain injury”. In: Injury 48.9 (2017), pp. 1932–1943. DOI: 10.1016/j.injury.2017.05.038.

[16] Mieczyslaw Finster and Margaret Wood. “The Apgar Score Has Survived the Test of Time”. In: Anesthesiology 102.4 (2005), pp. 855–857. DOI: 10.1097/00000542-200504000-00022.

[17] Anders Bjartell. “The 2005 International Society of Urological Pathology (ISUP) Consensus Conference on Gleason Grading of Prostatic Carcinoma”. In: European Urology 49.4 (2006), pp. 758–759. DOI: 10.1016/j.eururo.2006.02.007.

[18] Premnath K Narayanan and David K Harrison. “BIAS mitigation methods for autonomous network functions in telecommunication networks (a presentation)”. In: Wireless World Research Forum. Wireless World Research Forum, 2019. URL: https://www.wwrf.ch/wwrf43.html.

[19] Premnath K Narayanan and David K Harrison. “Measuring the Effectiveness of Autonomous Network Functions that Use Artificial Intelligence Algorithms (a presentation)”. In: Wireless World Research Forum. Wireless World Research Forum, 2020. URL: http://wwrf.ch/wwrf44.html.

Biographies

Premnath K. Narayanan is a seasoned Software Engineer (system engineering, software architecture, and development) with 22 years of practical experience in realizing commercial off-the-shelf (COTS) and cloud products for the ICT (Information and Communications Technology) industry. He has designed and developed products, trained users, and mentored employees. He works as a master engineer at Ericsson, primarily focused on researching and developing autonomous network functions for telecommunication network products.

David K. Harrison is currently Professor of Design and Manufacturing at Glasgow Caledonian University, where he has held a range of managerial roles. He has spent his working career in the manufacturing industry or in industry-facing academia. A graduate of UMIST, he has edited several books and conference proceedings and has published his work widely. He has supervised 81 PhD students through to graduation. Around half of these students have been based outside the United Kingdom.