Cloud Technologies Decoded: Facing Challenges with the Cloud Support Engineer

Unveiling the intricate world of the cloud, and the heroes behind its seamless operations.

Imagine an era where businesses had to rely on physical servers, update software by hand, and regularly grapple with space constraints. That was the pre-cloud era. The cloud removed many of those burdens, but along with its benefits it brings unique challenges. This article dives into the intricacies of cloud technology, the urgent challenges the industry faces, and the critical role played by the Cloud Support Engineer.

The Rise of Cloud Technologies

What is Cloud Computing?
Cloud computing lets individuals and businesses store and process data in third-party data centers over the Internet. It has led to increased agility, scalability, and cost-effectiveness.

Benefits of Cloud Technologies

  1. Scalability: Easily scale up or down based on demand.
  2. Cost Efficiency: Pay only for what you use.
  3. Remote Accessibility: Access data and applications from anywhere.

Urgent Industry Challenges

Security Concerns
The shared responsibility model splits security between the two parties: the cloud provider secures the underlying infrastructure, and the client secures what it deploys and stores on it. Breaches still occur, which keeps security a top concern.

Data Governance and Compliance
With data stored in multiple locations, often across borders, ensuring data sovereignty and compliance becomes a challenge.

Vendor Lock-in
Reliance on a single cloud service provider's tools and technologies can result in a lack of flexibility and potential migration challenges.

The Role of a Cloud Support Engineer

Who is a Cloud Support Engineer?
A Cloud Support Engineer helps businesses navigate the complexities of cloud environments. They're the unsung heroes who keep operations seamless, performance optimal, and potential pitfalls in check.

Key Responsibilities:

  1. Troubleshooting: Identify, diagnose, and fix any issues within the cloud environment.
  2. Optimization: Regularly tweak and optimize cloud resources to ensure efficient performance.
  3. Security: Implement security best practices, regularly patching and updating systems.

Step Into the Shoes of a Support Engineer

For a better understanding of everything we've covered so far, let's look at a real-world example of a support engineer working through a case opened by a customer.

Case Study

Addressing a performance problem on Ubuntu 18.04

Initial Email from the Customer

Subject: Urgent Performance Issue on Ubuntu 18.04 Instance!

Hello Support,

We've been experiencing noticeable performance degradation on our Ubuntu 18.04 instance for the past few days. The slowdown seems to occur during peak times, causing our applications to lag. We haven't made any recent changes, and the logs don't reveal much. Can you help us pinpoint the issue and provide a solution?

Thanks,
John Smith
Lead SysAdmin, Redacted Corp.


Analysis by the Cloud Support Engineer

After assigning the case to herself, the Cloud Support Engineer (let's call her Sarah) begins her diagnostic journey.

First, she checks basic metrics such as CPU usage, RAM utilization, and disk I/O. While CPU and RAM seem fine, there is an irregularity in disk I/O: the disk is being written to continuously, which causes other processes to queue up and is likely the source of the performance problems the customer has described.
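The article doesn't show the exact commands Sarah uses for this first pass. A minimal sketch of such a check, assuming a Linux host and reading the counters straight from /proc, might look like the following. The function names are illustrative, and the device-name pattern is an assumption that covers common SATA, virtio, Xen, and NVMe disks.

```shell
#!/bin/bash
# First-pass triage: CPU load, available memory, and per-disk write totals.
# Each function reads a /proc file by default; another path can be passed in,
# which makes the parsing easy to check against a saved sample.

# 1-minute load average (first field of /proc/loadavg).
load_1min() {
    awk '{ print $1 }' "${1:-/proc/loadavg}"
}

# Memory available to applications, in MiB (MemAvailable is reported in kB).
mem_available_mib() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# MiB written per whole disk since boot. In /proc/diskstats, field 3 is the
# device name and field 10 is sectors written, counted in 512-byte units.
disk_written_mib() {
    awk '$3 ~ /^(sd[a-z]+|vd[a-z]+|xvd[a-z]+|nvme[0-9]+n[0-9]+)$/ {
             printf "%s %d\n", $3, $10 * 512 / 1048576
         }' "${1:-/proc/diskstats}"
}

# Print a snapshot when the /proc files are present (Linux only).
if [[ -r /proc/loadavg && -r /proc/meminfo && -r /proc/diskstats ]]; then
    echo "Load (1 min):     $(load_1min)"
    echo "Available memory: $(mem_available_mib) MiB"
    echo "Written since boot (MiB per disk):"
    disk_written_mib
fi
```

Running it twice a minute or so apart and comparing the per-disk totals gives a rough sense of whether a disk is being written to continuously, as in this case.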

Sarah suspects some rogue processes or tasks might be causing the excessive disk writes. To investigate, she decides to run a BASH script that monitors the disk activity for the top processes.

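#!/bin/bash
# Run as root: without it, /proc/<pid>/io is readable only for your own processes.

echo "Top processes causing high disk I/O:"
# Overall per-device statistics: 10 samples, 2 seconds apart (requires sysstat).
iostat -xz 2 10

echo "Detailed Processes:"
# "pid=" suppresses the header line, so only numeric PIDs are looped over.
for pid in $(ps -eo pid=)
do
    proc_name=$(ps -p "$pid" -o comm=)
    # Anchor the match so cancelled_write_bytes is not picked up as well.
    # The read fails quietly if the process has already exited.
    io_stat=$(awk '/^write_bytes:/ { print $2 }' "/proc/$pid/io" 2>/dev/null)
    # Keep processes that have written at least 1 MiB; emit the total first for sorting.
    if [[ -n $io_stat && $io_stat -ge 1048576 ]]; then
        printf '%s\t%s (%s) - Write Bytes: %s\n' "$io_stat" "$proc_name" "$pid" "$io_stat"
    fi
done | sort -n -r | head -10 | cut -f2-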

BASH Script for Monitoring Disk Activity

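This script first prints the overall disk statistics, then lists up to 10 processes with the highest write totals. Note that write_bytes in /proc/<pid>/io is cumulative since each process started, so a long-running process can rank high without being busy right now.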
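Because write_bytes is a running total, a natural follow-up is to compare two readings taken a few seconds apart and rank processes by the difference. The sketch below is one way to do that, not part of Sarah's original script. The function names are hypothetical, and it assumes root access so that every /proc/<pid>/io file is readable.

```shell
#!/bin/bash
# Sketch: rank processes by bytes written during a sampling window, rather
# than by the cumulative write_bytes total since each process started.

# Print "pid write_bytes" for every process whose io file can be read.
snapshot_writes() {
    local f pid
    for f in /proc/[0-9]*/io; do
        pid=${f#/proc/}
        pid=${pid%/io}
        # Skip processes that exited or whose io file is not readable.
        awk -v pid="$pid" '/^write_bytes:/ { print pid, $2 }' "$f" 2>/dev/null || continue
    done
}

# Given a "before" and an "after" snapshot file, print "delta pid" for every
# pid that appears in both and wrote something in between, largest first.
diff_snapshots() {
    awk 'NR == FNR { before[$1] = $2; next }
         ($1 in before) && ($2 + 0 > before[$1] + 0) {
             printf "%.0f %s\n", $2 - before[$1], $1
         }' "$1" "$2" | sort -n -r
}

# Take two snapshots "interval" seconds apart and show the ten busiest writers.
top_writers() {
    local interval=${1:-5} before after
    before=$(mktemp)
    after=$(mktemp)
    snapshot_writes > "$before"
    sleep "$interval"
    snapshot_writes > "$after"
    diff_snapshots "$before" "$after" | head -10
    rm -f "$before" "$after"
}
```

After sourcing the file, calling `top_writers 5` prints one line per process: the bytes written during the five-second window, followed by the PID. `ps -p <pid> -o comm=` then maps a PID back to a process name.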

Script Output

Top processes causing high disk I/O:

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.50    20.50    2.50   58.50     0.06     1.50    54.48     2.09   34.29    8.00   36.09   2.27  14.00

Detailed Processes:

customLogger (10345) - Write Bytes: 1049002752
rsyslogd (978) - Write Bytes: 25089920
mysqld (1103) - Write Bytes: 20965760
journald (450) - Write Bytes: 15728640
cron (1110) - Write Bytes: 5242880
sshd (1234) - Write Bytes: 1048576
systemd (1) - Write Bytes: 1048576
apache2 (2010) - Write Bytes: 1048576
NetworkManager (980) - Write Bytes: 1048576
polkitd (1056) - Write Bytes: 1048576

Findings

After running the script, Sarah discovers that a particular logging service, customLogger, is writing logs continuously and consuming most of the disk I/O: it has written about 1 GB, roughly 40 times more than the next process, rsyslogd. This is not a default service but something custom that Redacted Corp must have configured.

Resolution

Sarah provides John with her findings and recommends either configuring the logging service to a reduced logging level or directing the logs to an external storage solution. John realizes that verbose logging was set accidentally during a previous troubleshooting exercise. After adjusting the logging level, the performance is restored to its optimal state.
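One simple way to confirm that a change like this took effect is to measure how fast the log file grows before and after the adjustment. The article doesn't say how the result was verified, so the following is only a sketch; the function name `log_growth` is hypothetical.

```shell
#!/bin/bash
# Sketch: report how many bytes a file grows during a sampling window.
# A negative result means the file was truncated or rotated in between.

log_growth() {
    local file=$1 interval=${2:-10} before after
    # wc -c reads the current size in bytes; fail if the file cannot be opened.
    before=$(wc -c < "$file") || return 1
    sleep "$interval"
    after=$(wc -c < "$file") || return 1
    echo $(( after - before ))
}
```

For example, `log_growth /path/to/custom.log 60` (the path is a placeholder, since the article doesn't name the log file) prints the number of bytes appended in one minute. Comparing that figure under verbose logging and under the reduced level shows how much write load the change removed.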


Conclusion

The journey from a manual, physical server-based infrastructure to cloud environments is nothing short of revolutionary. But the array of benefits the cloud brings comes with its own set of challenges. The case study in this post shows the vital role of Cloud Support Engineers in ensuring optimal performance in cloud environments.