#let title = [
  Unit 5: Control Layer
]
#let proverb = [
  Knowledge not shared, remains unknown.
]
#set text(12pt)
#set page(
  header: [ 
    #box()[#proverb]
    #h(1fr)
    #box()[#title]
    ],
)
#align(center, text(20pt)[
  *#title*
])
#show table.cell.where(y: 0): strong
#outline()
#pagebreak()
= Introduction to Control Layer
_The control layer includes software tools that are responsible for managing and controlling the underlying cloud infrastructure and enables provisioning of IT resources for creating cloud services._\
- Deployed on top of virtual or physical layer.
- Receives requests from the service and orchestration layers.
- Interacts with virtual and physical layers for provisioning IT resources.
- Exposes the resources and supports the service layer where cloud service interfaces are exposed to the consumers.
- Key functions are:
  1. Resource Configuration
  2. Resource Provisioning
  3. Resource Monitoring
= Control Software
- Ties together underlying physical resources and software abstractions for resource pooling and dynamic allocation of resources.
- Provisions resources for services.
- Provides information about provisioned or consumed resources to cloud portal and billing system.
- Control software discovers all underlying resources to find total available resources.
- Provides complete view of all resources in cloud environment.
- Centralizes management of IT resources.
- There are two types, *element manager* and *unified manager*.
== Element Manager
#figure(image("./assets/eleman.png"))
- Infrastructure component vendors may provide them built in or as external software.
- Required to support initial component configuration such as zoning, RAID levels, LUN masking, and firmware updates.
- Required when resource capacity needs to be expanded to meet demands.
- Used for performing security settings and policy configurations.
- Troubleshooting and monitoring may also be performed.
- For large clouds, using element managers alone can become complex.
== Unified Manager
#figure(image("./assets/uniman.png"))
- Provides a single management interface for managing resources and provisioning resources to services.
- Interacts with all standalone infrastructure through native API calls.
- Discovers and collects information on configurations, connectivity, utilization of cloud infrastructure elements.
- Compiles this information and provides consolidated view of infrastructure resources.
- Identifies relationship between virtual and physical elements for easy management.
- Provides a topology of infrastructure.
- This enables administrators to quickly find and understand connections of components and services.
- Exposes APIs to interface with orchestration layer to automate service provisioning.
- Allows dynamic addition and removal of resources without impacting availability.
- Provides a dashboard that shows infrastructure configuration and resource utilization.
- Enforces compliance by creating configuration policies for services.
- Tracks configuration changes and performs compliance checking.
== Key phases for provisioning resources
=== 1. Resource Discovery
- Create an inventory of resources.
- This allows the unified manager to learn what resources are available.
- Provides information about assets, such as:
  - Configuration
  - Connectivity
  - Availability
  - Utilization
  - Physical-to-Virtual dependencies.
- Provides administrators visibility into each resource.
- Enables centralized monitoring of resources.
- Typically, APIs are used for discovery.
- Scheduled by setting an interval for periodic occurrence.
- Frequency can be changed by administrators.
- Can be initiated by administrators when a change occurs in the infrastructure.
- Captures information such as:
  1. *Compute systems*
    - Number of blade servers
    - CPU speed
    - Memory capacity
    - CPU and memory pools
    - Mapping between physical and virtual compute.
  2. *Network components*
    - Switch model
    - Network adapters
    - VLAN IDs
    - VSAN IDs
    - Physical-to-virtual network mapping
    - QoS
    - Network topology
    - Zones
  3. *Storage Systems*
    - Type of storage system
    - Drive type
    - Total capacity
    - Free capacity
    - Used capacity
    - RAID level
    - Storage pools
    - Physical-to-virtual storage mapping
=== 2. Resource Pool Management
- Unified manager allows for management of virtual resources such as VM, virtual volume, virtual network, etc.
- Virtual resources are created from pools and provisioned to services.
- Allows administrator to grade pools.
- Resource grading is a process that categorizes these pools based on criteria like performance, capacity, availability.
- Pools of different grades are used to create service offerings for customers.
- Multiple grades may be defined for compute, storage, and network pools.
- Each grade is marked with a grade name.
- Number of grade levels depends on business requirements.
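
Resource grading can be sketched as a simple classification step. The grade names (`Gold`, `Silver`, `Bronze`), the metrics, and every threshold below are hypothetical values chosen for illustration, not values from these notes.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    iops: int            # performance criterion (hypothetical metric)
    availability: float  # availability criterion, in percent


# Hypothetical grade definitions, ordered from highest to lowest.
# Each entry: (grade name, minimum IOPS, minimum availability).
GRADES = [
    ("Gold", 50_000, 99.99),
    ("Silver", 10_000, 99.9),
    ("Bronze", 0, 0.0),
]


def grade_pool(pool: Pool) -> str:
    """Return the highest grade whose minimums the pool meets."""
    for grade, min_iops, min_availability in GRADES:
        if pool.iops >= min_iops and pool.availability >= min_availability:
            return grade
    return GRADES[-1][0]


pools = [
    Pool("ssd-pool", 80_000, 99.99),
    Pool("sas-pool", 20_000, 99.9),
    Pool("nl-sas-pool", 2_000, 99.0),
]
graded = {pool.name: grade_pool(pool) for pool in pools}
```

In this sketch the number of entries in `GRADES` plays the role of the number of grade levels, which a provider would set from its business requirements.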
=== 3. Resource Provisioning
_Resource Provisioning involves allocating resources from graded resource pools to a service instance._\
- Starts when consumers select cloud services from the service catalog.
- Service template helps consumers understand service capabilities.
- Service template provides guidelines to create workflows for service orchestration.
_The unified manager, on receiving a provisioning request, allocates the resources and integrates them as per the service template to create an instance of the service._\
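
The provisioning step can be illustrated with a small sketch. It assumes a service template is a plain dictionary that names a grade and a storage size; the class `GradedPool`, the function `provision`, and all field names are hypothetical and do not correspond to any real unified manager API.

```python
class GradedPool:
    """A resource pool with a grade name and remaining capacity."""

    def __init__(self, grade, free_gb):
        self.grade = grade
        self.free_gb = free_gb

    def allocate(self, size_gb):
        # Refuse the request instead of over-committing the pool.
        if size_gb > self.free_gb:
            raise RuntimeError(f"{self.grade} pool has insufficient capacity")
        self.free_gb -= size_gb


def provision(template, pools):
    """Allocate storage for one service instance as described by its template."""
    pool = pools[template["grade"]]
    pool.allocate(template["storage_gb"])
    return {
        "service": template["name"],
        "grade": pool.grade,
        "storage_gb": template["storage_gb"],
    }


pools = {"Gold": GradedPool("Gold", 500), "Silver": GradedPool("Silver", 2000)}
template = {"name": "database-service", "grade": "Gold", "storage_gb": 200}
instance = provision(template, pools)
```

After the call, the `Gold` pool has 300 GB left, and a second request larger than that would be rejected.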
= Software-defined approach
#figure(image("./assets/sda.png"))
_Software-defined approach is the mechanism that helps in creating and implementing an optimized IT infrastructure that can help organizations achieve competitive advantage and higher value through speed and efficiency in delivering services._\
- Abstracts all infrastructure components and pools them into aggregated capacity.
- Separates control or management functions from the underlying components to external software.
- A physical infrastructure component has control path and data path.
- Control path deals with policies and data path deals with actual data transfer.
- Software-defined approach decouples control and data paths.
- This centralizes data provisioning and management tasks through software.
- Runs on centralized compute system called software-defined controller.
== Key Functions of software-defined controller
- Has builtin intelligence that automates provisioning and configuration based on defined policies.
- Enables organizations to dynamically, uniformly, easily modify and manage infrastructure.
- Discovers the available underlying resources and provides aggregated view of resources.
- Abstracts underlying hardware resources and pools them.
- Enables rapid provisioning of resources from pool that aligns with agreements with customers.
- Enables an administrator to manage resources, node connectivity, traffic flow, behaviour of components, apply policies uniformly across components, enforce security.
- Provides interfaces that enable applications to request resources and access them as services.
== Benefits of software-defined approach
- Enables provisioning resources based on policies in very short time.
- Delivers infrastructure resources to consumers via service catalog.
- Provides on-demand self-service access to consumers.
- Drastically improves business agility.
- Increases the flexibility by abstracting the underlying IT resources.
- This allows providers to use low-cost hardware or reuse existing hardware, reducing capital expenditure.
- Improves utilization of resources reducing capital expenditure.
- Easy to aggregate new hardware into pools, increasing capacity.
- Provides central management of resources to ensure service levels and monitor resource utilization.
- Allows creation of new innovative services that span underlying resources.
- Helps providers deliver the most efficient and scalable cloud solutions.
= Resource Management
_Resource Management is the process of allocating resources effectively to a service instance from a pool of resources and monitoring the resources that help maintain service levels._\
- Multiple consumers share underlying resources.
- Controls utilization of resources.
- Prevents service instance from monopolizing resource.
- Resources managed from a centralized management server.
- Server defines policies and configures resources.
- Server can also pool resources, allocate, and optimize their utilization.
== Resource Allocation Models
=== Relative resource allocation
- Resource allocation for a service is not defined quantitatively.
- Resource allocation is defined proportionally relative to the resources allocated to other services.
=== Absolute resource allocation
- Resources are allocated based on a defined quantitative bound for each service.
- Lower and upper bounds are defined.
- Lower bound guarantees the minimum amount of resources to a service and upper bound defines the maximum.
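
The two allocation models can be contrasted with a short sketch. The function names, the share values, and the MHz figures are illustrative assumptions.

```python
def relative_allocation(capacity, shares):
    """Split capacity in proportion to each service's share value."""
    total_shares = sum(shares.values())
    return {name: capacity * share / total_shares for name, share in shares.items()}


def absolute_allocation(demand, lower, upper):
    """Clamp a service's demand between its guaranteed minimum and its maximum."""
    return max(lower, min(demand, upper))


# Relative: 12000 MHz of CPU split 2:1:1 between three services.
cpu = relative_allocation(12000, {"web": 2, "db": 1, "batch": 1})

# Absolute: a service with a 1000 MHz lower bound and a 4000 MHz upper bound.
idle = absolute_allocation(500, lower=1000, upper=4000)
busy = absolute_allocation(9000, lower=1000, upper=4000)
```

With relative allocation a service's amount changes whenever another service's share changes. With absolute allocation it stays within fixed bounds regardless of the other services.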
== Resource Management Techniques
#table(
  columns: (auto, auto, auto),
  table.header([ Compute ], [ Storage ], [ Network ]),
  [
    - Hyper-threading
    - Memory Page Sharing
    - Dynamic memory allocation
    - VM load balancing across hypervisors
    - Server flash-cache
  ], [
    - Virtual Storage Provisioning
    - Storage pool rebalancing
    - Storage space reclamation
    - Automated storage tiering
    - Cache tiering
    - Dynamic VM load balancing across volumes
  ], [
    - Balancing client workload across nodes
    - Network storm control
    - Quality of Service
    - Traffic shaping
    - Link aggregation
    - NIC teaming
    - Multipathing
  ]
)
== Compute
=== Hyperthreading
#figure(image("./assets/hyperthreading.png"))
- Makes a single processor core appear as two logical processors.
- Allows the OS to schedule two threads simultaneously and avoid idle time.
- Both threads cannot be executed at the same time.
- When core resources are not in use, they are used to execute the next thread.
- Increases performance of services.
=== Memory Page Sharing
#figure(
  image("assets/mempagesharing.png", width: 50%),
)
- VMs may create redundant copies of memory pages.
- This results in increased memory consumption.
- The hypervisor scans memory to find redundant memory pages.
- If one is found, the VM memory pointer is updated to point to a shared location.
- This reclaims redundant memory pages.
- Improves memory utilization.
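
A minimal sketch of the idea, assuming pages are compared by a content hash. The function `share_pages` and the data layout are hypothetical. A real hypervisor would also need to confirm that pages are byte-for-byte identical and handle later writes to a shared page, which this sketch omits.

```python
import hashlib


def share_pages(vm_pages):
    """Deduplicate identical pages across VMs.

    vm_pages maps a VM name to its list of pages (bytes objects).
    Returns (store, tables): store keeps one copy per distinct page, and
    tables maps each VM to a list of keys pointing into the store.
    """
    store = {}
    tables = {}
    for vm, pages in vm_pages.items():
        table = []
        for page in pages:
            key = hashlib.sha256(page).hexdigest()
            store.setdefault(key, page)  # keep only the first copy
            table.append(key)            # the VM's "pointer" to the shared copy
        tables[vm] = table
    return store, tables


vm_pages = {
    "vm1": [b"A" * 4096, b"B" * 4096],
    "vm2": [b"A" * 4096, b"C" * 4096],
}
store, tables = share_pages(vm_pages)
```

Four pages were presented but only three distinct copies are kept, so one page of memory is reclaimed.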
=== Dynamic Memory Allocation
- Technique used to reclaim memory pages.
- Best to allow guest VM to select memory pages to free.
- Each VM has an agent installed in the guest OS that tells the hypervisor which memory page to free and which to keep.
- When compute is not under memory pressure, no action is taken by the agent.
- When memory is scarce, the hypervisor tells the agent to reclaim memory from its VM.
- This memory is put back into the memory pool.
- This memory is then given to other VMs that need it.
=== VM load balancing across hypervisors
- For redundancy and load balancing, hypervisors are clustered.
- When a VM is powered on, management server checks the availability of resources in all hypervisors.
- Places VM in hypervisor where resources are available.
- This ensures load is balanced across hypervisors.
- Changes in VM load cause clusters to go out of balance.
- To overcome this, the management server monitors all hypervisors to make rebalancing decisions.
- Management server executes these decisions by moving VMs from over-utilized hypervisors to under-utilized hypervisors.
- These decisions are made based on configured threshold values.
- Threshold value is the amount of imbalance in hypervisor resources that is acceptable.
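
The threshold-driven rebalancing described above can be sketched as a greedy loop. The function `rebalance`, the load units, and the choice of which VM to move are assumptions for illustration. Real management servers use more elaborate cost models.

```python
def rebalance(hosts, threshold):
    """Move VMs from the busiest to the idlest hypervisor until the load gap
    is within the acceptable threshold.

    hosts maps a hypervisor name to a dict of {vm name: load}.
    Returns the list of (vm, source, destination) migrations performed.
    """
    moves = []
    while True:
        load = {host: sum(vms.values()) for host, vms in hosts.items()}
        busiest = max(load, key=load.get)
        idlest = min(load, key=load.get)
        gap = load[busiest] - load[idlest]
        if gap <= threshold:
            break
        # Only VMs with 0 < load < gap reduce the imbalance when moved.
        candidates = {vm: l for vm, l in hosts[busiest].items() if 0 < l < gap}
        if not candidates:
            break
        vm = max(candidates, key=candidates.get)
        hosts[idlest][vm] = hosts[busiest].pop(vm)
        moves.append((vm, busiest, idlest))
    return moves


hosts = {
    "hv1": {"vm1": 40, "vm2": 30, "vm3": 20},
    "hv2": {"vm4": 10},
}
moves = rebalance(hosts, threshold=20)
```

Here `hv1` starts at a load of 90 and `hv2` at 10. Moving `vm1` brings both to 50, which is within the threshold, so the loop stops after one migration.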
=== Server flash-cache technology
- Flash memory cache card is installed to enhance application performance.
- Uses intelligent caching software and flash card.
- Cache software places most frequently used data on the flash card.
- This puts data closer to the application.
- Improves performance by reducing I/O access latencies.
- Increases performance for read-intensive workloads.
- Copy of the hottest data resides on the flash cache.
- Server flash cache needs warm up time before performance increase is observed.
- Warm up time is the time required to move significant amount of data into server flash cache.
== Storage
=== Virtual Storage Provisioning
#figure(image("./assets/virtstorpro.png", width: 50%))
_Virtual storage provisioning is the process that enables the presentation of a LUN to an application with more storage than is physically allocated to it on the storage system._\
- Administrators often over-anticipate storage requirements.
- This leads to unused space and lower capacity utilization.
- It also leads to excess storage capacity, which begets higher cost, increased power consumption, cooling and floor space.
- With virtual storage provisioning, physical storage is allocated on demand.
- More efficient storage utilization by reducing allocated but unused storage.
- Provides rapid elasticity by adapting to variations in workloads by dynamically expanding or reducing storage levels.
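
Allocation on demand can be sketched as follows. The classes `StoragePool` and `ThinLUN` and the extent counts are hypothetical. The point is only that physical extents are consumed on first write, not when the LUN is created.

```python
class StoragePool:
    """Shared pool of physical extents."""

    def __init__(self, physical_extents):
        self.free = physical_extents


class ThinLUN:
    """A LUN that reports a large virtual size but maps extents on first write."""

    def __init__(self, pool, virtual_extents):
        self.pool = pool
        self.virtual_extents = virtual_extents
        self.mapped = {}  # virtual extent index -> data

    def write(self, index, data):
        if not 0 <= index < self.virtual_extents:
            raise IndexError("write beyond the virtual size of the LUN")
        if index not in self.mapped:
            # First write to this extent: take one physical extent from the pool.
            if self.pool.free == 0:
                raise RuntimeError("storage pool exhausted")
            self.pool.free -= 1
        self.mapped[index] = data


pool = StoragePool(physical_extents=10)
lun = ThinLUN(pool, virtual_extents=100)  # presents 10x the physical capacity
lun.write(0, b"boot")
lun.write(42, b"data")
lun.write(42, b"data-v2")  # overwrite: no new physical extent is needed
```

The LUN presents 100 extents to the application, yet only two physical extents have been consumed from the pool.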
=== Storage pool rebalancing
_Storage pool rebalancing is a technique that automatically rebalances allocated extents on physical disk drives over the entire pool when new drives are added to the pool._\
- When storage pool is expanded, sudden introduction of empty drives combined with old full drives causes data imbalance.
- Also causes performance impact because new data would be added mostly to new drives.
- Restripes data across all drives in the storage pool.
- This enables spreading out data equally on all physical disks within the shared pool.
- This ensures that used capacity of each drive is uniform across the pool and helps in increasing overall pool performance.
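
As a rough sketch, the target of a rebalance can be computed by spreading the used extents evenly over all drives. This assumes equally sized drives, and the function name `rebalance_targets` is hypothetical.

```python
def rebalance_targets(used_extents):
    """Return the per-drive extent count after an even restripe.

    used_extents maps a drive name to its number of used extents.
    Any remainder is spread one extent at a time over the first drives
    in name order.
    """
    total = sum(used_extents.values())
    base, extra = divmod(total, len(used_extents))
    return {
        name: base + (1 if i < extra else 0)
        for i, name in enumerate(sorted(used_extents))
    }


# Two old, nearly full drives and two newly added empty drives.
before = {"d1": 90, "d2": 90, "d3": 0, "d4": 0}
after = rebalance_targets(before)
```

After the restripe each drive holds 45 extents, so new writes and reads are spread over four drives instead of landing mostly on the two new ones.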
=== Storage space reclamation
- Identifies unused space in thin LUNs and returns it to the storage pools.
- Two options for storage reclamation are:
  1. *Zero extent reclamation*\
    - Commonly implemented at the storage level.
    - Provides the ability to free storage extents that only contain zeros.
  2. *API based reclamation*\
    - Uses APIs to communicate locations of unused space in LUNs to storage system.
    - Allows the storage system to reclaim all unused physical storage back into the storage pools.
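
Zero extent reclamation can be sketched on a thin LUN modelled as a dictionary of mapped extents. The 8-byte extent size and the function name are assumptions that keep the example small.

```python
EXTENT_SIZE = 8  # bytes; unrealistically small, for demonstration only


def reclaim_zero_extents(mapped):
    """Unmap every extent that contains only zeros.

    mapped maps an extent index to its contents (bytes) and is modified
    in place. Returns the list of reclaimed extent indexes.
    """
    zero_extents = [index for index, data in mapped.items() if not any(data)]
    for index in zero_extents:
        del mapped[index]  # the physical extent goes back to the pool
    return zero_extents


mapped = {
    0: bytes(EXTENT_SIZE),  # all zeros
    1: b"data1234",
    2: b"\x00" * EXTENT_SIZE,  # all zeros
}
reclaimed = reclaim_zero_extents(mapped)
```

Only extent 1 stays mapped. API based reclamation differs in that the host tells the storage system which extents are unused, so extents holding stale non-zero data can be freed as well.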
=== Automated storage tiering
_Automated storage tiering is the technique of establishing a hierarchy of different storage types for different categories of data that enables storing the right data automatically to the right tier, to meet the service level requirements._\
- Many services have predictable spikes in activity with lower activity at other times.
- Automated storage solution addresses these cyclical fluctuations as well as unpredictable spikes.
- It can replace manual storage management and significantly benefit cloud environments.
- Tiers are differentiated based on protection, performance, and cost.
- For example, tier 0 SSDs store hot data and tier 1 HDDs store cold data.
- Optimizes complete use of all kinds of storage.
- Data is moved between tiers based on defined tiering policies.
- These policies are based on file type, frequency of access, etc.
- Data movement between tiers can happen within or between storage arrays.
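
A tiering policy based on frequency of access can be sketched as a ranking. The object names, the access counts, and the assumption that tier 0 has a fixed number of slots are all illustrative.

```python
def plan_tiering(access_counts, tier0_slots):
    """Place the most frequently accessed objects on tier 0, the rest on tier 1.

    access_counts maps an object name to its access count over the policy window.
    Ties are broken by name so that the plan is deterministic.
    """
    ranked = sorted(access_counts, key=lambda name: (-access_counts[name], name))
    hot = set(ranked[:tier0_slots])
    return {name: ("tier0" if name in hot else "tier1") for name in access_counts}


access_counts = {
    "orders.db": 900,
    "logs-2019.tar": 3,
    "index.dat": 450,
    "backup.img": 1,
}
plan = plan_tiering(access_counts, tier0_slots=2)
```

Running the plan again after each policy window moves data up or down as its access pattern changes, which is how cyclical spikes are absorbed.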
=== Cache tiering
#figure(image("./assets/cachetier.png", width: 50%))
- Tiering can also happen at the cache tier.
- A large cache improves performance by retaining large amount of frequently accessed data.
- High proportion of reads happen from the cache.
- Configuring large cache can be costly.
- The size of cache can be increased by using SSDs on the storage system to create a large-capacity secondary cache separate from compute's primary cache.
- This enables tiering between the DRAM primary cache and the SSD secondary cache.
- Enables storage system to store a lot of hot data on the cache tier.
- Most reads can now happen from cache tier increasing performance.
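
The interaction between the two cache tiers can be sketched with two least-recently-used maps. The eviction policy, the class `TieredCache`, and the tiny sizes are assumptions; the notes do not specify how data moves between DRAM and SSD.

```python
from collections import OrderedDict


class TieredCache:
    """Small DRAM cache backed by a larger SSD cache, both LRU ordered."""

    def __init__(self, dram_size, ssd_size):
        self.dram = OrderedDict()
        self.ssd = OrderedDict()
        self.dram_size = dram_size
        self.ssd_size = ssd_size

    def read(self, key, disk):
        """Return (value, source), where source is 'dram', 'ssd' or 'disk'."""
        if key in self.dram:
            self.dram.move_to_end(key)  # mark as most recently used
            return self.dram[key], "dram"
        if key in self.ssd:
            value, source = self.ssd.pop(key), "ssd"
        else:
            value, source = disk[key], "disk"
        self._promote(key, value)
        return value, source

    def _promote(self, key, value):
        self.dram[key] = value
        if len(self.dram) > self.dram_size:
            # Demote the least recently used DRAM entry to the SSD tier.
            old_key, old_value = self.dram.popitem(last=False)
            self.ssd[old_key] = old_value
            if len(self.ssd) > self.ssd_size:
                self.ssd.popitem(last=False)  # falls out of cache entirely


disk = {"a": 1, "b": 2, "c": 3}
cache = TieredCache(dram_size=1, ssd_size=2)
sources = [cache.read(key, disk)[1] for key in ["a", "b", "a", "a"]]
```

The third read of `a` is served from the SSD tier instead of disk, because `a` was demoted from DRAM and not discarded.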
=== Dynamic VM load balancing across storage volumes
- During provisioning for VMs, volumes are selected randomly.
- This can lead to underutilized volumes.
- Dynamic VM load balancing across storage volumes enables intelligent placement of VMs during creation.
- It does so based on I/O load, available storage in hypervisor's native FS or NAS FS.
- Implemented in a centralized management server that manages virtualized environments.
- The server performs ongoing dynamic VM load balancing within a cluster of volumes.
_A cluster volume is a pool of a hypervisor's native FS or NAS FS volumes that are aggregated as a single volume to enable efficient and rapid placement of new virtual machines._\
- User-configurable space utilization and I/O latency thresholds are defined to ensure space efficiency and avoid I/O bottlenecks.
- Thresholds are set when the cluster of volumes is configured.
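
Threshold-based placement of a new VM can be sketched as a filter followed by a choice of the least utilized volume. The function `place_vm`, the dictionary fields, and the default thresholds (80 percent space utilization, 15 ms latency) are hypothetical.

```python
def place_vm(volumes, vm_size_gb, space_threshold=0.8, latency_threshold_ms=15):
    """Pick the volume for a new VM, or None if no volume is eligible.

    volumes maps a volume name to a dict with capacity_gb, used_gb and
    latency_ms. A volume is eligible if it stays within the space threshold
    after placement and its I/O latency is within the latency threshold.
    """
    eligible = []
    for name, vol in volumes.items():
        utilization_after = (vol["used_gb"] + vm_size_gb) / vol["capacity_gb"]
        if (utilization_after <= space_threshold
                and vol["latency_ms"] <= latency_threshold_ms):
            eligible.append((utilization_after, vol["latency_ms"], name))
    if not eligible:
        return None
    # Prefer the lowest utilization, then the lowest latency.
    return min(eligible)[2]


volumes = {
    "vol1": {"capacity_gb": 1000, "used_gb": 600, "latency_ms": 5},
    "vol2": {"capacity_gb": 1000, "used_gb": 300, "latency_ms": 25},
    "vol3": {"capacity_gb": 1000, "used_gb": 400, "latency_ms": 8},
}
chosen = place_vm(volumes, vm_size_gb=100)
```

`vol2` has the most free space but is skipped because its latency exceeds the threshold, so the VM lands on `vol3`.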
== Network
_Network traffic flow in a cloud network infrastructure is controlled to optimize both performance and availability of cloud service._\
- Administrators use several traffic management techniques.
- Some enable distribution of traffic load across nodes or parallel network links.
- This prevents overutilization and underutilization of resources.
- Others enable automatic failover of network traffic from a failed network component.
- Some also ensure guaranteed service levels for a class of traffic contending with other classes for network bandwidth.
=== Balancing client workload across nodes
#figure(image("./assets/bcwn.png"))

=== Network storm control
=== Quality of Service (QoS)
=== Traffic shaping
=== Link aggregation
=== NIC Teaming
=== Multipathing