#let title = [
Unit 5: Control Layer
]
#let proverb = [
Knowledge not shared, remains unknown.
]
#set text(12pt)
#set page(
header: [
#box()[#proverb]
#h(1fr)
#box()[#title]
],
)
#align(center, text(20pt)[
*#title*
])
#show table.cell.where(y: 0): strong
#outline()
#pagebreak()
= Introduction to Control Layer
_The control layer includes software tools that are responsible for managing and controlling the underlying cloud infrastructure and enables provisioning of IT resources for creating cloud services._\
- Deployed on top of virtual or physical layer.
- Receives requests from service and orchestration layers
- Interacts with virtual and physical layers for provisioning IT resources.
- Exposes the resources and supports the service layer, where cloud service interfaces are exposed to consumers.
- Key functions are:
1. Resource Configuration
2. Resource Provisioning
3. Resource Monitoring
= Control Software
- Ties together underlying physical resources and software abstractions for resource pooling and dynamic allocation of resources.
- Provisions resources for services.
- Provides information about provisioned or consumed resources to cloud portal and billing system.
- Control software discovers all underlying resources to find total available resources.
- Provides complete view of all resources in cloud environment.
- Centralizes management of IT resources.
- There are two types: *element manager* and *unified manager*.
== Element Manager
#figure(image("./assets/eleman.png"))
- Infrastructure component vendors may provide them built in or as external software.
- Required to support initial component configuration such as zoning, RAID levels, LUN masking, firmware updates.
- Required when resource capacity needs to be expanded to meet demands.
- Used for performing security settings and policy configurations.
- Troubleshooting and monitoring may also be performed.
- For large clouds, using element managers alone can become complex.
== Unified Manager
#figure(image("./assets/uniman.png"))
- Provides a single management interface for managing resources and provisioning resources to services.
- Interacts with all standalone infrastructure through native API calls.
- Discovers and collects information on configurations, connectivity, utilization of cloud infrastructure elements.
- Compiles this information and provides consolidated view of infrastructure resources.
- Identifies relationship between virtual and physical elements for easy management.
- Provides a topology of infrastructure.
- This enables administrators to quickly find and understand connections of components and services.
- Exposes APIs to interface with orchestration layer to automate service provisioning.
- Allows dynamic addition and removal of resources without impacting availability.
- Provides a dashboard that shows infrastructure configuration and resource utilization.
- Enforces compliance by creating configuration policies for services.
- Tracks configuration changes and performance compliance checking.
== Key phases for provisioning resources
=== 1. Resource Discovery
- Create an inventory of resources.
- This allows the unified manager to learn what resources are available.
- Provides information about assets, such as:
- Configuration
- Connectivity
- Availability
- Utilization
- Physical-to-Virtual dependencies.
- Provides administrators visibility into each resource.
- Enables centralized monitoring of resources.
- Typically, APIs are used for discovery.
- Scheduled by setting an interval for periodic occurrence.
- Frequency can be changed by administrators.
- Can be initiated by administrators when a change occurs in the infrastructure.
- Captures information such as:
1. *Compute systems*
- Number of blade servers
- CPU speed
- Memory capacity
- CPU and memory pools
- Mapping between physical and virtual compute.
2. *Network components*
- Switch model
- Network adapters
- VLAN IDs
- VSAN IDs
- Physical-to-virtual network mapping
- QoS
- Network topology
- Zones
3. *Storage Systems*
- Type of storage system
- Drive type
- Total capacity
- Free capacity
- Used capacity
- RAID level
- Storage pools
- Physical-to-virtual storage mapping
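The discovery phase above can be sketched as a small routine that polls each resource class and builds a consolidated inventory. All function names and record fields here are illustrative assumptions, not a real vendor API; in practice the unified manager would make native API calls to each component.

```python
# Sketch of resource discovery (hypothetical API names and values).
# A unified manager would call vendor-native APIs; these probes just
# return static records for illustration.

def discover_compute():
    return [{"type": "compute", "blades": 8, "cpu_ghz": 2.4, "memory_gb": 512}]

def discover_network():
    return [{"type": "network", "switch": "model-x", "vlans": [10, 20]}]

def discover_storage():
    return [{"type": "storage", "total_tb": 100, "free_tb": 60, "raid": "RAID 6"}]

def run_discovery():
    """Build a consolidated inventory across all resource classes."""
    inventory = []
    for probe in (discover_compute, discover_network, discover_storage):
        inventory.extend(probe())
    return inventory

inventory = run_discovery()
```

A scheduler would invoke `run_discovery` on the administrator-configured interval and refresh the inventory whenever a change is reported.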
=== 2. Resource Pool Management
- Unified manager allows for management of virtual resources such as VM, virtual volume, virtual network, etc.
- Virtual resources are created from pools and provisioned to services.
- Allows administrator to grade pools.
- Resource grading is a process that categorizes these pools based on criteria like performance, capacity, availability.
- Pools of different grades are used to create service offerings for customers.
- Multiple grades may be defined for compute, storage, and network pools.
- Each grade is marked with a grade name.
- Number of grade levels depends on business requirements.
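Resource grading can be sketched as a simple classification over pool attributes. The grade names, the IOPS and availability criteria, and the pool records are all illustrative assumptions; real grading criteria depend on business requirements.

```python
# Sketch of resource pool grading (criteria and grade names are assumptions).

def grade_pool(pool):
    """Assign a grade based on performance and availability criteria."""
    if pool["iops"] >= 10000 and pool["availability"] >= 99.99:
        return "gold"
    if pool["iops"] >= 5000 and pool["availability"] >= 99.9:
        return "silver"
    return "bronze"

pools = [
    {"name": "ssd-pool", "iops": 20000, "availability": 99.999},
    {"name": "sas-pool", "iops": 6000, "availability": 99.95},
    {"name": "sata-pool", "iops": 1500, "availability": 99.5},
]
graded = {p["name"]: grade_pool(p) for p in pools}
```

Pools of different grades would then back different entries in the service catalog.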
=== 3. Resource Provisioning
_Resource Provisioning involves allocating resources from graded resource pools to service instances._\
- Starts when consumers select cloud services from the service catalog.
- The service template helps consumers understand service capabilities.
- Service template provides guidelines to create workflows for service orchestration.
_The unified manager, on receiving a provisioning request, allocates the resources and integrates as per the service template to create an instance of the service._\
= Software-defined approach
#figure(image("./assets/sda.png"))
_Software-defined approach is the mechanism that helps in creating and implementing an optimized IT infrastructure that can help organizations achieve competitive advantage and higher value through speed and efficiency in delivering services._\
- Abstracts all infrastructure components and pools them into aggregated capacity.
- Separates control or management functions from the underlying components to external software.
- A physical infrastructure component has control path and data path.
- Control path deals with policies and data path deals with actual data transfer.
- Software-defined approach decouples control and data paths.
- This centralizes data provisioning and management tasks through software.
- Runs on centralized compute system called software-defined controller.
== Key Functions of software-defined controller
- Has built-in intelligence that automates provisioning and configuration based on defined policies.
- Enables organizations to dynamically, uniformly, and easily modify and manage infrastructure.
- Discovers the available underlying resources and provides aggregated view of resources.
- Abstracts underlying hardware resources and pools them.
- Enables rapid provisioning of resources from pool that aligns with agreements with customers.
- Enables an administrator to manage resources, node connectivity, traffic flow, behaviour of components, apply policies uniformly across components, enforce security.
- Provides interfaces that enable applications to request resources and access them as services.
== Benefits of software-defined approach
- Enables provisioning resources based on policies in very short time.
- Delivers infrastructure resources to consumers via service catalog.
- Provides on-demand self-service access to consumers.
- Drastically improves business agility.
- Increases the flexibility by abstracting the underlying IT resources.
- This allows providers to use low-cost hardware or reuse existing hardware, reducing capital expenditure.
- Improves utilization of resources reducing capital expenditure.
- Easy to aggregate new hardware into pools, increasing capacity.
- Provides central management of resources to ensure service levels and monitor resource utilization.
- Allows creation of new innovative services that span underlying resources.
- Helps providers deliver the most efficient and scalable cloud solutions.
= Resource Management
_Resource Management is the process of allocating resources effectively to a service instance from a pool of resources and monitoring the resources to help maintain service levels._\
- Multiple consumers share underlying resources.
- Controls utilization of resources.
- Prevents a service instance from monopolizing resources.
- Resources managed from a centralized management server.
- Server defines policies and configures resources.
- Server can also pool resources, allocate, and optimize their utilization.
== Resource Allocation Models
=== Relative resource allocation
- Resource allocation for a service is not defined quantitatively.
- Resource allocation is defined proportionally relative to the resources allocated to other services.
=== Absolute resource allocation
- Resources are allocated based on a defined quantitative bound for each service.
- Lower and upper bounds are defined.
- The lower bound guarantees the minimum amount of resources to a service, and the upper bound defines the maximum.
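The two models can be contrasted in a short sketch: relative allocation splits capacity proportionally to share values, while absolute allocation clamps each service's demand between its guaranteed lower bound and its upper bound. The service names and numbers are illustrative.

```python
# Sketch contrasting the two resource allocation models (values illustrative).

def relative_allocation(shares, total):
    """Split `total` units proportionally to each service's share value."""
    share_sum = sum(shares.values())
    return {svc: total * s / share_sum for svc, s in shares.items()}

def absolute_allocation(demand, lower, upper):
    """Clamp a service's demand between its guaranteed lower bound
    and its defined upper bound."""
    return max(lower, min(demand, upper))

rel = relative_allocation({"svc-a": 2, "svc-b": 1, "svc-c": 1}, total=100)
abs_alloc = absolute_allocation(demand=300, lower=50, upper=200)
```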
== Resource Management Techniques
#table(
columns: (auto, auto, auto),
table.header([ Compute ], [ Storage ], [ Network ]),
[
- Hyper-threading
- Memory Page Sharing
- Dynamic memory allocation
- VM load balancing across hypervisors
- Server flash-cache
], [
- Virtual Storage Provisioning
- Storage pool rebalancing
- Storage space reclamation
- Automated storage tiering
- Cache tiering
- Dynamic VM load balancing across volumes
], [
- Balancing client workload across nodes
- Network storm control
- Quality of Service
- Traffic shaping
- Link aggregation
- NIC teaming
- Multipathing
]
)
== Compute
=== Hyperthreading
#figure(image("./assets/hyperthreading.png"))
- Makes a single processor core appear as two logical processors.
- Allows the OS to schedule two threads simultaneously and avoid idle time.
- Both threads cannot execute at the same instant.
- When core resources are not in use by one thread, they are used to execute the other thread.
- Increases performance of services.
=== Memory Page Sharing
#figure(
image("assets/mempagesharing.png", width: 50%),
)
- VMs may create redundant copies of memory pages.
- This results in increased memory consumption.
- The hypervisor scans memory to find redundant pages.
- If one is found, the VM's memory pointer is updated to point to a shared location.
- This reclaims redundant memory pages.
- Improves memory utilization.
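The scan-and-share step can be sketched as content hashing: pages with identical contents across VMs are mapped to one shared physical page, and the duplicates are reclaimed. This is a minimal model of the idea, not any hypervisor's actual implementation.

```python
# Sketch of memory page sharing via content hashing (illustrative model).
import hashlib

def share_pages(vm_pages):
    """Map each (vm, page index) to a shared page id keyed by content hash.
    Returns the mapping and the number of physical pages reclaimed."""
    shared = {}    # content hash -> shared physical page id
    mapping = {}   # (vm, page index) -> shared page id
    for vm, pages in vm_pages.items():
        for i, content in enumerate(pages):
            h = hashlib.sha256(content).hexdigest()
            if h not in shared:
                shared[h] = len(shared)   # allocate a new physical page
            mapping[(vm, i)] = shared[h]  # point the VM at the shared copy
    total = sum(len(p) for p in vm_pages.values())
    return mapping, total - len(shared)

mapping, reclaimed = share_pages({
    "vm1": [b"zeros", b"os-code"],
    "vm2": [b"zeros", b"app-data"],
})
```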
=== Dynamic Memory Allocation
- Technique used to reclaim memory pages.
- Best to allow guest VM to select memory pages to free.
- Each VM has an agent installed in the guest OS that tells the hypervisor which memory page to free and which to keep.
- When the compute system is not under memory pressure, the agent takes no action.
- When memory is scarce, the hypervisor tells the agents to reclaim memory from the VMs.
- This memory is put back into the memory pool.
- This memory is then given to other VMs that need it.
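The agent/hypervisor interaction above can be sketched as follows; the direct function calls stand in for the real guest-agent protocol, and all sizes are illustrative.

```python
# Sketch of dynamic memory reclamation; the agent/hypervisor interaction
# is simplified into direct function calls (illustrative model).

class VM:
    def __init__(self, name, allocated_mb, used_mb):
        self.name = name
        self.allocated_mb = allocated_mb
        self.used_mb = used_mb

    def reclaim(self, request_mb):
        """Agent in the guest frees up to request_mb of unused memory."""
        freeable = self.allocated_mb - self.used_mb
        released = min(freeable, request_mb)
        self.allocated_mb -= released
        return released

def reclaim_under_pressure(vms, needed_mb):
    """Hypervisor asks each VM's agent for memory until demand is met;
    reclaimed memory goes back into the pool for other VMs."""
    pool = 0
    for vm in vms:
        if pool >= needed_mb:
            break
        pool += vm.reclaim(needed_mb - pool)
    return pool

vms = [VM("vm1", 4096, 3072), VM("vm2", 4096, 2048)]
reclaimed_mb = reclaim_under_pressure(vms, needed_mb=1536)
```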
=== VM load balancing across hypervisors
- For redundancy and load balancing, hypervisors are clustered.
- When a VM is powered on, management server checks the availability of resources in all hypervisors.
- Places VM in hypervisor where resources are available.
- This ensures load is balanced across hypervisors.
- Changes in VM load cause clusters to go out of balance.
- To overcome this, the management server monitors all hypervisors to make rebalancing decisions.
- The management server executes this by moving VMs from over-utilized to under-utilized hypervisors.
- These decisions are made based on configured threshold values.
- The threshold value is the amount of imbalance in hypervisor resources that is acceptable.
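A minimal sketch of threshold-based rebalancing: while the load gap between the hottest and coldest hypervisor exceeds the threshold, one VM is migrated. The load figures and the threshold of 20 are illustrative; a real scheduler would also weigh migration cost and VM placement constraints.

```python
# Sketch of threshold-based VM rebalancing across clustered hypervisors
# (loads and threshold are illustrative).

def rebalance(hypervisors, vm_loads, threshold=20):
    """Move VMs from the most- to the least-loaded hypervisor while
    the load gap exceeds the configured threshold."""
    moves = []
    while True:
        hot = max(hypervisors, key=hypervisors.get)
        cold = min(hypervisors, key=hypervisors.get)
        if hypervisors[hot] - hypervisors[cold] <= threshold:
            break                              # cluster is balanced enough
        vm, load = vm_loads[hot].pop()         # pick a VM to migrate
        vm_loads[cold].append((vm, load))
        hypervisors[hot] -= load
        hypervisors[cold] += load
        moves.append((vm, hot, cold))
    return moves

hypervisors = {"h1": 80, "h2": 30}
vm_loads = {"h1": [("vm-a", 25), ("vm-b", 30)], "h2": [("vm-c", 30)]}
moves = rebalance(hypervisors, vm_loads)
```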
=== Server flash-cache technology
- Flash memory cache card is installed to enhance application performance.
- Uses intelligent caching software and flash card.
- Cache software places most frequently used data on the flash card.
- This puts data closer to the application.
- Improves performance by reducing I/O access latencies.
- Increases performance for read-intensive workloads.
- Copy of the hottest data resides on the flash cache.
- Server flash cache needs warm up time before performance increase is observed.
- Warm up time is the time required to move significant amount of data into server flash cache.
== Storage
=== Virtual Storage Provisioning
#figure(image("./assets/virtstorpro.png", width: 50%))
_Virtual storage provisioning is the process that enables the presentation of a LUN to an application with more storage than is physically allocated to it on the storage system._\
- Administrators often over-anticipate storage requirements.
- This leads to unused space and lower capacity utilization.
- It also leads to excess storage capacity, which brings higher cost, increased power consumption, cooling, and floor space requirements.
- With virtual storage provisioning, physical storage is allocated on demand.
- More efficient storage utilization by reducing allocated but unused storage.
- Provides rapid elasticity by adapting to variations in workloads by dynamically expanding or reducing storage levels.
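Thin provisioning can be modeled as a LUN that presents a large logical size but only backs an extent with physical storage on first write. The extent size and offsets are illustrative assumptions.

```python
# Sketch of virtual (thin) storage provisioning: the LUN presents more
# capacity than is physically backed; extents are allocated on first write.

class ThinLUN:
    EXTENT_MB = 256  # allocation granularity (illustrative)

    def __init__(self, presented_gb):
        self.presented_gb = presented_gb
        self.allocated = set()   # extent indices that have physical backing

    def write(self, offset_mb):
        """Back the extent containing offset_mb on demand."""
        self.allocated.add(offset_mb // self.EXTENT_MB)

    def physical_mb(self):
        """Physical storage actually consumed, vs. the presented size."""
        return len(self.allocated) * self.EXTENT_MB

lun = ThinLUN(presented_gb=1024)   # 1 TB presented to the application
for off in (0, 100, 300, 600):     # writes touch only three distinct extents
    lun.write(off)
```

The application sees 1 TB, but only the written extents consume pool capacity.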
=== Storage pool rebalancing
_Storage pool rebalancing is a technique that allows to automatically rebalance allocated extents on physical disk drives over the entire pool when new drives are added to the pool._\
- When storage pool is expanded, sudden introduction of empty drives combined with old full drives causes data imbalance.
- Also causes performance impact because new data would be added mostly to new drives.
- Restripes data across all drives in the storage pool.
- This enables spreading out data equally on all physical disks within the shared pool.
- This ensures that used capacity of each drive is uniform across the pool and helps in increasing overall pool performance.
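The restriping step can be sketched as redistributing existing extents round-robin over every drive in the expanded pool, so used capacity ends up uniform. The drive names and extent count are illustrative.

```python
# Sketch of storage pool rebalancing: restripe extents across all drives
# after new drives are added (names and counts illustrative).

def restripe(extents, drives):
    """Spread extents evenly across every drive in the pool."""
    layout = {d: [] for d in drives}
    for i, ext in enumerate(extents):
        layout[drives[i % len(drives)]].append(ext)
    return layout

extents = [f"ext{i}" for i in range(8)]
# Pool expanded from 2 old drives to 4; restripe over all of them.
layout = restripe(extents, ["d1", "d2", "d3", "d4"])
```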
=== Storage space reclamation
- Identifies unused space in thin LUNs and returns it to the storage pools.
- Two options for storage reclamation are:
1. *Zero extent reclamation*\
- Commonly implemented at the storage level.
- Provides the ability to free storage extents that only contain zeros.
2. *API based reclamation*\
- Uses APIs to communicate locations of unused space in LUNs to storage system.
- Allows the storage system to reclaim all unused physical storage back into the storage pools.
=== Automated storage tiering
_Automated storage tiering is the technique of establishing a hierarchy of different storage types for different categories of data that enables storing the right data automatically to the right tier, to meet the service level requirements._\
- Many services have predictable spikes in activity with lower activity at other times.
- Automated storage solution addresses these cyclical fluctuations as well as unpredictable spikes.
- It can replace manual storage management and significantly benefit cloud environments.
- Tiers are differentiated based on protection, performance, and cost.
- Example: using tier 0 SSDs to store hot data and tier 1 HDDs to store cold data.
- Optimizes complete use of all kinds of storage.
- Data is moved between tiers based on defined tiering policies.
- These policies are based on file type, frequency of access, etc.
- Data movement between tiers can happen within or between storage arrays.
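A tiering policy can be sketched as a rule over access frequency: data accessed above a threshold is placed on the SSD tier, the rest on the HDD tier. The threshold, tier names, and access counts are illustrative; real policies may also use file type and age.

```python
# Sketch of policy-based automated storage tiering (policy is illustrative).

def apply_tiering(objects, hot_threshold=100):
    """Return tier placement based on access frequency per object."""
    placement = {}
    for name, accesses in objects.items():
        placement[name] = "tier0-ssd" if accesses >= hot_threshold else "tier1-hdd"
    return placement

# Access counts per data object over the policy's evaluation window.
placement = apply_tiering({"db-index": 5000, "archive-log": 3, "web-cache": 250})
```

Re-running the policy periodically moves data between tiers as access patterns change.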
=== Cache tiering
#figure(image("./assets/cachetier.png", width: 50%))
- Tiering can also happen at the cache tier.
- A large cache improves performance by retaining a large amount of frequently accessed data.
- High proportion of reads happen from the cache.
- Configuring large cache can be costly.
- The size of the cache can be increased by using SSDs on the storage system to create a large-capacity secondary cache separate from the compute system's primary cache.
- This enables tiering between DRAM primary cache and SSDs secondary cache.
- Enables storage system to store a lot of hot data on the cache tier.
- Most reads can now happen from cache tier increasing performance.
=== Dynamic VM load balancing across storage volumes
- During provisioning for VMs, volumes are selected randomly.
- This can lead to underutilized volumes.
- Dynamic VM load balancing across storage volumes enables intelligent placement of VMs during creation.
- It does so based on I/O load and available storage capacity in the hypervisor's native FS or NAS FS volumes.
- Implemented in a centralized management server that manages virtualized environments.
- The server performs ongoing dynamic VM load balancing within a cluster of volumes.
_A cluster of volumes is a pool of hypervisor-native FS or NAS FS volumes that are aggregated as a single volume to enable efficient and rapid placement of new virtual machines._\
- User-configurable space utilization and I/O latency thresholds are defined to ensure space efficiency.
- I/O bottlenecks are avoided.
- Thresholds are configured when the cluster of volumes is set up.
== Network
_Network traffic flow in a cloud network infrastructure is controlled to optimize both performance and availability of cloud service._\
- Administrators use several traffic management techniques.
- Some enable distribution of traffic load across nodes or parallel network links.
- This prevents overutilization and underutilization of resources.
- Others enable automatic failover of network traffic from a failed network component.
- Some also ensure guaranteed service levels for a class of traffic contending with other classes for network bandwidth.
=== Balancing client workload across nodes
#figure(image("./assets/bcwn.png"))
- Client connections are balanced across a group of nodes, such as server clusters, that process client requests simultaneously.
- Client workload balancing services are provided by a load balancer.
- Load balancer splits client traffic across multiple nodes.
- The working principle is based on vendor implementation.
- A common technique is to place the load balancer between the node cluster and the Internet.
- This makes all traffic pass through the load balancer.
- Clients use the load balancer's address to send requests.
- The address of the load balancer abstracts the addresses of all the nodes in a cluster.
- The load balancer forwards requests to the required node in a cluster.
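One common forwarding strategy, round-robin, can be sketched as below; the node names are illustrative, and real balancers may instead use least-connections or health-aware policies depending on vendor implementation.

```python
# Sketch of a round-robin load balancer fronting a node cluster;
# clients see only the balancer, never the node addresses.
import itertools

class LoadBalancer:
    def __init__(self, nodes):
        self._cycle = itertools.cycle(nodes)  # endless rotation over nodes

    def route(self, request):
        """Forward the request to the next node in rotation."""
        node = next(self._cycle)
        return node, request

lb = LoadBalancer(["node1", "node2", "node3"])
routed = [lb.route(f"req{i}")[0] for i in range(4)]
```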
=== Network storm control
_Network storm control is a networking technique that prevents regular network traffic on a LAN or VLAN from being disrupted by a network storm. A network storm occurs due to flooding of frames on a LAN or VLAN, creating excessive traffic and resulting in degraded network performance._\
- The causes of a storm include errors in network configuration or a DoS attack.
- Enabled on supported LAN switches.
- Monitors all incoming frames to switch ports over a specific time interval.
- The switch calculates total number of frames of a specific type.
- It then compares the sum with a preconfigured storm control threshold.
- If the threshold is exceeded, the switch port blocks the traffic and filters out subsequent frames until the interval ends.
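The per-interval check can be sketched as follows; the frame counts and the threshold value are illustrative, and a real switch would count frames per type (broadcast, multicast, unicast) per port.

```python
# Sketch of switch-port storm control: compare the frame count for each
# monitoring interval against a preconfigured threshold (values illustrative).

def storm_control(frame_counts, threshold):
    """Return the action taken per interval for one port."""
    actions = []
    for count in frame_counts:         # one count per monitoring interval
        if count > threshold:
            actions.append("block")    # filter frames until interval ends
        else:
            actions.append("forward")  # normal traffic, pass through
    return actions

# Broadcast-frame counts over four intervals; a storm hits in interval 3.
actions = storm_control([120, 950, 15000, 200], threshold=1000)
```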
=== Quality of Service (QoS)
_Quality of service is the capability of a network to prioritize business-critical and latency-sensitive network traffic and to provide better service to such traffic over less critical traffic. QoS enables applications to obtain consistent service levels, in terms of network bandwidth, latency variations, and delay._\
- Performed by raising the priority of critical classes of network traffic over other classes.
- There are two approaches for QoS.
#table(
columns: (auto, auto),
table.header([ Approach ], [ Description ]),
[ Integrated Services ], [
- Applications signal the network to inform network components about required QoS.
- Applications can transmit data through network only after receiving confirmation from network.
], [ Differentiated Services ], [
- Priority specifications are inserted into network packets by the applications or by switches and routers.
- The network uses the priority specification to classify traffic and then manage network bandwidth per traffic class.
]
)
=== Traffic shaping
#figure(image("./assets/trash.png"))
- Limits the traffic rate at a network interface, such as a node or router port.
- Limits the rate of low priority traffic.
- Improves latency and increases available network bandwidth.
- Ensures required service levels are met for business-critical applications.
- Controls traffic rate per client, avoiding network congestion.
- Traffic shaping can be done by a node or interconnecting device.
- Allows administrator to set a limit on traffic rate on a network interface.
- During a traffic burst, traffic shaping retains excess packets in a queue and schedules excess packets for later transmission.
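The queue-and-schedule behavior during a burst can be sketched as a rate limiter: at most a fixed number of packets go out per tick, and the excess waits in a queue for later transmission. The packet names and rate are illustrative.

```python
# Sketch of traffic shaping: excess packets from a burst are queued and
# transmitted in later intervals rather than dropped (values illustrative).
from collections import deque

def shape(packets, rate_per_tick):
    """Transmit at most rate_per_tick packets each tick; queue the rest."""
    queue = deque(packets)
    ticks = []
    while queue:
        n = min(rate_per_tick, len(queue))
        ticks.append([queue.popleft() for _ in range(n)])
    return ticks

# A burst of five packets shaped to two packets per tick.
ticks = shape(["p1", "p2", "p3", "p4", "p5"], rate_per_tick=2)
```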
=== Link aggregation
- Combines two or more parallel network links into a single logical link.
- This link is called port-channel.
- It yields higher bandwidth than a single network link.
- Link aggregation enables distribution of traffic across the links in case of link failure.
- If a link is lost, all traffic of that link is redistributed across the remaining links.
- Can be performed between two switches or a switch and a node.
=== NIC Teaming
- Link aggregation technique that groups NICs so they appear as one logical NIC.
- Distributes traffic across NICs and provides traffic failover in the event of NIC or link failure.
=== Multipathing
- Can perform load balancing by distributing I/O across all active paths.
- Standby paths become active if one or more active paths fail.
- Multipathing process detects the failed path and then redirects I/Os of failed path to another active path.