  • Enhance reliability through robust security

    Use security controls and design patterns to stop attacks and bugs from overloading the system or locking people out.

    This approach helps keep the system up and running, even if someone tries to take it down with something like a distributed denial of service (DDoS) attack.

    Contoso’s challenge

    • The workload team and the workload’s stakeholders know that this system must be extremely reliable because hotel guests rely on it for both business and leisure travel. If it goes down, hotels can’t run properly.
    • The team has put a lot of effort into testing functional and nonfunctional requirements to make sure the system works well and stays operational, including using safe ways to roll out updates.
    • They’ve focused on keeping things reliable, but they haven’t paid as much attention to security. A recent update had a bug that hackers took advantage of, crashing the system for several hotels. The attack overloaded servers in one region for over four hours, causing major problems for guests and staff.
    • The attacker used the app’s servers to sneak in requests to a regional storage system and pull up fake folio data. One of those fake folios was huge and caused the servers to run out of memory. When users retried their requests, the problem spread to all the servers.

    Applying the approach and outcomes

    • The team changed the design so the app servers no longer handle folio requests directly. Instead, they’re using a Valet Key approach to limit access. This approach wouldn’t have stopped the attack completely, but it would have kept the damage contained.
    • They also added better input checks to clean up anything suspicious before it reaches the system.
    • With stronger input filtering and a smarter design, they’ve reduced the risk of this kind of attack happening again.
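
    Here’s a minimal sketch of the Valet Key idea, using only the Python standard library. The client gets a short-lived, signed token that’s scoped to one resource and presents it directly to the storage layer, so the app servers never carry the data themselves. The names `issue_valet_key`, `verify_valet_key`, and `SECRET` are hypothetical, and a real Azure deployment would more likely rely on the storage service’s own shared access signatures than on a hand-rolled token like this one.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical signing secret; in practice it would come from a secret store.
SECRET = b"example-signing-secret"


def _sign(resource: str, expires_at: int) -> str:
    """Sign the resource name together with its expiry time."""
    message = f"{resource}|{expires_at}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def issue_valet_key(resource: str, ttl_seconds: int,
                    now: Optional[float] = None) -> dict:
    """Return a token that grants access to one resource until it expires."""
    issued = time.time() if now is None else now
    expires_at = int(issued) + ttl_seconds
    return {
        "resource": resource,
        "expires_at": expires_at,
        "signature": _sign(resource, expires_at),
    }


def verify_valet_key(token: dict, resource: str,
                     now: Optional[float] = None) -> bool:
    """Accept the token only for the named resource and only before expiry."""
    current = time.time() if now is None else now
    if token["resource"] != resource or current >= token["expires_at"]:
        return False
    expected = _sign(token["resource"], token["expires_at"])
    # Constant-time comparison avoids leaking how much of the signature matched.
    return hmac.compare_digest(expected, token["signature"])
```

    Because the token names a single resource and expires quickly, a leaked or abused key limits the damage to that one item for a short window.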

  • Proactively limit attack vectors

    Set up controls ahead of time to block common ways that attackers try to break in, like bugs in your code, weak network setups, or missing antivirus.

    Regularly scan your code, install security updates, keep software current, and run antivirus tools. These practices help reduce the ways that attackers can get in, and they help keep things running smoothly.

    Contoso’s challenge

    • The system runs on Azure VMs (virtual machines) that use the latest Ubuntu images from Azure Marketplace. When each VM starts up, it installs some certificates, adjusts a few SSH settings, and loads the app code. But it doesn’t use any antivirus or anti-malware tools.
    • Azure Application Gateway fronts the solution, but it’s only used as an internet gateway. The web application firewall (WAF) function isn’t enabled currently.
    • These choices leave the system exposed to potential risks, like vulnerabilities in the code or accidental malware installs.

    Applying the approach and outcomes

    • After consulting Contoso’s security team, the workload team enrolls the VMs in an enterprise-managed antivirus solution.
    • The team also enables and fine-tunes the WAF function to block risky traffic, like SQL injection attempts, before it even reaches the app.
    • Both the app and its platform now have stronger layered defenses to help keep the system stable and secure.
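
    A WAF is one layer. As a complementary, app-level illustration of layered defense against SQL injection, the sketch below binds user input as a query parameter instead of formatting it into the SQL string. It uses an in-memory SQLite database only so that it runs anywhere; the `reservations` table and the function names are hypothetical and aren’t taken from Contoso’s system.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reservations (id INTEGER PRIMARY KEY, guest_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO reservations (guest_name) VALUES (?)",
        [("Ana",), ("Ben",)],
    )
    return conn


def find_reservations(conn: sqlite3.Connection, guest_name: str) -> list:
    """Look up reservations with a bound parameter, never string formatting."""
    cursor = conn.execute(
        "SELECT id, guest_name FROM reservations WHERE guest_name = ?",
        (guest_name,),
    )
    return cursor.fetchall()
```

    With a bound parameter, an input such as `' OR '1'='1` is treated as a literal name and matches nothing, even if it slips past the WAF.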
  • Do threat modeling to find and resolve potential threats

    Analyze each part of your workflow and consider what could go wrong. Use an industry-standard methodology to classify the identified threats.

    Threat modeling helps you find and fix security threats before they become real problems. Analyzing your workload helps you put together a report that shows which attack paths are the most serious and helps you quickly find weak spots.

    Contoso’s challenge

    • Even though they haven’t had a security problem yet, the workload team doesn’t have a clear way to check if all possible threats are covered by their current security setup.
    • They realize that there’s a gap in their security, and if something goes wrong, they might not be ready.

    Applying the approach and outcomes

    • The team brings in a security consulting specialist to learn how to do threat modeling.
    • After their first exercise, they find that they have well-designed controls for most threat vectors, but there are some gaps:
      • One problem was in a data cleanup task that runs after Apache Spark jobs. It had two insider threat risks for data leaks.
      • An old system used by a race team that’s no longer active still had access to sensitive race data.
    • They’ve scheduled fixes for the next development cycle, including shutting down the old system.
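
    To make the classification and ranking step concrete, here’s a small sketch that tags threats with a STRIDE category and sorts the unmitigated ones by a simple risk score. STRIDE is used here as one example of an industry-standard methodology; the source doesn’t say which one the team chose. The 1 to 5 scoring scale and all names are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Stride(Enum):
    SPOOFING = "Spoofing"
    TAMPERING = "Tampering"
    REPUDIATION = "Repudiation"
    INFORMATION_DISCLOSURE = "Information disclosure"
    DENIAL_OF_SERVICE = "Denial of service"
    ELEVATION_OF_PRIVILEGE = "Elevation of privilege"


@dataclass(frozen=True)
class Threat:
    component: str
    description: str
    category: Stride
    likelihood: int  # 1 (rare) to 5 (expected); the scale is an assumption
    impact: int      # 1 (minor) to 5 (severe); the scale is an assumption
    mitigated: bool = False

    @property
    def risk(self) -> int:
        """Simple likelihood-times-impact score."""
        return self.likelihood * self.impact


def open_threats_by_risk(threats: list) -> list:
    """Return unmitigated threats, highest risk first."""
    return sorted(
        (t for t in threats if not t.mitigated),
        key=lambda t: t.risk,
        reverse=True,
    )
```

    The sorted list is a starting point for the kind of report that shows which attack paths deserve attention first.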
  • Test controls yourself

    Have security experts try to ethically hack your system occasionally to find weak spots. Regularly scan your infrastructure, code, and tools to catch any vulnerabilities before they become real problems.

    Running security tests that mimic real-world attacks, like penetration testing, helps you see if your defenses actually work.

    Threats can sneak in during updates or changes, so it’s smart to build vulnerability scanners right into your deployment process. That way, you can catch problems early and even block risky code from going live until it’s fixed.

    Contoso’s challenge

    • The threat modeling exercise helped the team find some gaps in their security setup. Now they want to make sure their fixes are strong and that nothing was missed.
    • They’ve used open-source tools to test security and found it fun and useful. However, the team and stakeholders want to bring in security professionals to do thorough and rigorous testing regularly.

    Applying the approach and outcomes

    • The team contacts a well-known Microsoft partner that specializes in cloud security to talk about penetration testing.
    • The workload team signs a Statement of Work for quarterly penetration testing, including one white-box test each year for extra confidence.
    • The consulting team also helps the development team install anti-malware on dev boxes and the self-hosted build agents.
    • Now, both the team and stakeholders feel a lot more confident that they’re ready for potential threats.
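
    The idea of blocking risky code from going live can be sketched as a small gate that a deployment pipeline might call after a scan. The `Finding` shape, the severity names, and the default threshold are assumptions; a real scanner defines its own report format.

```python
from dataclasses import dataclass

# Assumed severity scale, lowest to highest.
SEVERITY_ORDER = {"low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass(frozen=True)
class Finding:
    identifier: str
    severity: str
    fixed: bool = False


def blocking_findings(findings: list, threshold: str = "high") -> list:
    """Return unfixed findings at or above the threshold severity."""
    limit = SEVERITY_ORDER[threshold]
    return [
        f for f in findings
        if not f.fixed and SEVERITY_ORDER[f.severity] >= limit
    ]


def can_deploy(findings: list, threshold: str = "high") -> bool:
    """Allow the release only when nothing blocks it."""
    return not blocking_findings(findings, threshold)
```

    A pipeline step could fail the build whenever `can_deploy` returns `False` and print the blocking findings so that the team knows what to fix.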
  • Get current, and stay current

    Ensure that your systems always run the latest updates and security patches. Keep checking how things are working by using audit reports, benchmarks, and test results to spot areas to improve. Consider automation where possible. Use smart threat detection tools that can spot problems as they happen. And every so often, check that your setup still follows Security Development Lifecycle (SDL) best practices.

    Keeping your security strong takes ongoing effort. By learning from real-world attacks and test results, you can stay ahead of attackers who are always finding new ways to break in. Automating repetitive tasks also helps reduce human mistakes that could create risks.

    SDL reviews bring clarity around security features. They also help you keep track of your workload’s assets and their security reports, which cover where they came from, how they’re used, and any weak spots they might have.

    Contoso’s challenge

    • The developers that write the Apache Spark jobs are hesitant to make changes. They don’t think that it’s necessary. But this means that the Python and R packages they bring into the solution are likely to get stale over time.

    Applying the approach and outcomes

    • After the workload team reviews internal processes, they realize that if they don’t keep the Apache Spark jobs up-to-date, they could end up with unpatched components in their system.
    • The teams adopt a new standard for the Apache Spark jobs: all technologies in use must be updated as part of their regular update and patch schedules.
    • This method helps close the security gap and lowers the risk of the entire workload running outdated software. Plus, their PaaS and SaaS services help limit their exposure to this risk because they don’t have to patch underlying infrastructure.
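
    One way to automate part of that standard is a check that flags packages that fall below a minimum patched version. The sketch below is deliberately naive: it only understands plain dotted versions such as `1.4.2`, and the package names in any input are whatever the caller supplies. Real projects would typically lean on their package manager’s own tooling.

```python
def parse_version(text: str) -> tuple:
    """Parse a simple dotted version such as '1.4.2' (no pre-release tags)."""
    return tuple(int(part) for part in text.split("."))


def stale_packages(installed: dict, minimum: dict) -> list:
    """Return names of installed packages below their minimum patched version."""
    stale = []
    for name, required in minimum.items():
        current = installed.get(name)
        if current is not None and parse_version(current) < parse_version(required):
            stale.append(name)
    return sorted(stale)
```

    Comparing tuples of integers matters here: as plain strings, `"1.9.0"` would sort after `"1.10.0"` and the stale package would be missed.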
  • Optimize the security of your backups

    Make sure your backups are encrypted and can’t be changed after they’re saved, especially when they’re being moved or copied.

    When you adopt this approach, if you ever need to recover data, you can trust that the backup wasn’t tampered with, either by accident or on purpose.

    Contoso’s challenge

    • Contoso generates the Environmental Protection Agency emissions report every month, but they only need to submit it three times a year.
    • They store the report in an Azure Storage account as a backup, just in case something goes wrong with the main system.
    • The backup report itself isn’t encrypted, although it’s sent to the storage account over HTTPS.

    Applying the approach and outcomes

    • After doing a security gap analysis, the team realizes that the unencrypted backup is a risk.
    • They now encrypt the report and store it in Azure Blob Storage by using the write-once, read-many (WORM) setting, which keeps the file from being changed.
    • They also add an integrity check. The system now compares a Secure Hash Algorithm (SHA) hash of the backup with the hash of the original report to make sure nothing was altered.
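
    A hash check like this can be sketched with the standard library. The example assumes SHA-256 (the source only says SHA) and assumes that the digest of the original report is recorded when the report is generated, then compared against the restored backup later. The function names are hypothetical.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large reports don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(backup: Path, recorded_digest: str) -> bool:
    """Compare the backup's digest with the one recorded for the original."""
    return hmac.compare_digest(sha256_of_file(backup), recorded_digest)
```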
  • Defend your supply chain

    Make sure your tools, libraries, and build systems are safe from tampering. Scan for vulnerabilities during builds and while things are running.

    Knowing where your software comes from and checking that it’s legitimate throughout the life cycle helps you catch problems early and fix them before they reach production.

    Contoso’s challenge

    • The engineering team is setting up their build and release pipelines, but they haven’t made sure the build system is secure or reliable yet.
    • They’re using some open-source tools in both their firmware and cloud systems.
    • They’ve heard how supply chain attacks or insider threats can sneak in bad code that could mess with systems or leak data. If their customers’ environmental reporting gets compromised, it could be a huge problem for both Contoso and the customers.

    Applying the approach and outcomes

    • The team updates their build processes for both firmware and back-end cloud systems to include security scans for common vulnerabilities and exposures (CVEs) and malware in dependencies, code, and packages.
    • They also look at anti-malware options for their Azure Stack HCI setup, such as Windows Defender Application Control.
    • These steps help make sure the software and firmware that they ship don’t do anything unexpected, and that their customers’ reporting stays accurate and secure.
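
    Part of checking that a dependency is legitimate is confirming that what the build downloaded is exactly what the team approved. The sketch below assumes a hypothetical allow-list that maps artifact names to pinned SHA-256 digests, similar in spirit to a lock file. It rejects anything that’s unpinned or whose content has changed. It doesn’t replace CVE or malware scanning.

```python
import hashlib
import hmac


def rejected_artifacts(artifacts: dict, pinned_digests: dict) -> list:
    """Return names of artifacts that are unpinned or don't match their pin.

    artifacts maps a name to the downloaded bytes; pinned_digests maps a
    name to the approved SHA-256 hex digest (both shapes are assumptions).
    """
    rejected = []
    for name, content in artifacts.items():
        pinned = pinned_digests.get(name)
        actual = hashlib.sha256(content).hexdigest()
        if pinned is None or not hmac.compare_digest(actual, pinned):
            rejected.append(name)
    return sorted(rejected)
```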

  • Employ strong cryptographic mechanisms

    Use strong cryptography, like encryption, certificates, and code signing, to build trust. Protect these mechanisms by making sure that only trusted sources can decrypt the data.

    When you adopt this approach, only trusted sources can access or change your system and data.

    Even if someone intercepts encrypted data, they can’t read it without the right key. And digital signatures help confirm that nothing was tampered with along the way.

    Contoso’s challenge

    • The devices that they chose for sensing and data transfer don’t have enough processing power to support HTTPS or custom encryption.
    • The workload team plans to use network boundaries as their primary isolation technique.
    • A risk review flagged that unencrypted communication between IoT devices and control systems could be a big problem. Just segmenting the network isn’t enough.

    Applying the approach and outcomes

    • They worked with the device manufacturer to upgrade to a more powerful model. The new devices support certificate-based communication and can verify signed firmware before running it.
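
    On the software side, certificate-based communication usually means a TLS configuration that refuses unverified peers. The sketch below builds such a client context with Python’s `ssl` module and optionally loads a device certificate for mutual TLS. The function name and the file path parameters are hypothetical; the source doesn’t describe the devices’ actual software stack.

```python
import ssl
from typing import Optional


def build_device_tls_context(ca_file: Optional[str] = None,
                             cert_file: Optional[str] = None,
                             key_file: Optional[str] = None) -> ssl.SSLContext:
    """Create a client-side TLS context that always verifies the server."""
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # create_default_context already enables both; set explicitly for clarity.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    if cert_file is not None:
        # Present the device's own certificate so the server can verify it too.
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context
```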
  • Apply encryption at every step of the data life cycle

    Use encryption to protect your data, whether it’s in storage, moving across the network, or being processed. Base your encryption strategy on how sensitive the data is.

    By following this approach, even if someone manages to get access, they can’t read anything without the right keys.

    Sensitive data includes configuration information that’s used to gain further access inside the system. Data encryption can help you contain risks.

    Contoso’s challenge

    • Contoso Rise Up backs up each PostgreSQL database by using the built-in point-in-time restores. To be safe, they also make a daily backup that’s consistent and store it separately in a storage account.
    • The disaster recovery storage account is restricted with just-in-time access and only a few Microsoft Entra ID accounts can access it.
    • During a recovery drill, an employee tried to access a backup and accidentally copied the backup to a network share in the Contoso organization.
    • A few months later, this backup was discovered and reported to Contoso’s privacy team. They did a full investigation into how it was accessed and what happened to it up to the time when the incident was discovered. Luckily, no sensitive information was exposed, and the file was deleted after they finished their investigation and audit.

    Applying the approach and outcomes

    • The team now has a clear rule that all backups must be encrypted at rest by using Azure Storage Service Encryption. And the encryption keys must be secured in Azure Key Vault.
    • Even if a backup ends up somewhere it shouldn’t, the data inside it is useless without the decryption key. So a privacy breach is much less likely.
    • The disaster recovery plan now includes standard guidance about how to properly handle backups, including how and when to safely decrypt a backup.
  • Strictly limit access

    Only give access to people who really need it, and only for as long as they need it.

    Even trusted users shouldn’t have open-ended access. Keep permissions tight and time-limited, so the system stays protected from misuse or mistakes.

    Contoso’s challenge

    • Contoso Rise Up is known for great customer support. To help troubleshoot quickly, the support team has full access to customer data.
    • The support team is regularly trained on ethical access.
    • Unfortunately, one upset employee broke that trust. They copied and publicly shared a donor list. The person was fired, but the damage to Contoso Rise Up’s reputation was already done.

    Applying the approach and outcomes

    • Contoso Rise Up strictly grouped users in Microsoft Entra ID and set up role-based access control (RBAC) to control who can access what.
    • All data access now requires approval, is time-limited, and gets logged.
    • These rules apply across the workload and customer support teams, so there’s no more standing access to customer data.
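
    The three properties in the list above (approved, time-limited, logged) can be sketched as a small grant object. Everything here is a hypothetical illustration of the pattern; in Contoso’s case the enforcement would sit in Microsoft Entra ID and RBAC rather than in application code.

```python
import logging
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

logger = logging.getLogger("data_access")


@dataclass(frozen=True)
class AccessGrant:
    user: str
    resource: str
    approved_by: str
    expires_at: datetime


def grant_access(user: str, resource: str, approved_by: str,
                 duration: timedelta, now: datetime) -> AccessGrant:
    """Create a grant only when someone other than the requester approves it."""
    if not approved_by or approved_by == user:
        raise PermissionError("access requires approval from someone else")
    return AccessGrant(user, resource, approved_by, now + duration)


def is_allowed(grant: AccessGrant, user: str, resource: str,
               now: datetime) -> bool:
    """Check the grant and log every decision, allowed or not."""
    allowed = (
        grant.user == user
        and grant.resource == resource
        and now < grant.expires_at
    )
    logger.info("access user=%s resource=%s allowed=%s", user, resource, allowed)
    return allowed
```

    Because every grant carries an expiry, access lapses on its own and nobody has to remember to revoke it.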

  • Identify confidential data through classification

    Figure out what kind of data you have, how sensitive it is, and what could go wrong if it got out. Label the data accordingly so that you can apply the right level of protection where needed.

    This evaluation helps you rightsize security measures. You can also identify high-risk data and components that might affect your workload or be exposed. This exercise helps get everyone on the same page about how to handle different types of data.

    Contoso’s challenge

    • The donor management system stores many different types of data:
      • Internal information like Contoso Rise Up’s customer list
      • Customer-owned data like donor lists
      • Donor-specific data like mailing addresses
      • Nonsensitive data like stock images and document templates
    • The workload team hasn’t classified the data. They’ve applied security broadly across the dataset.

    Applying the approach and outcomes

    • The workload team follows Contoso’s data classification guidelines and flags data stores, columns, storage accounts, and other storage resources with metadata to indicate the type and sensitivity of the data.
    • This activity helps make sure that each level of sensitive data is properly handled throughout the entire system, including logging statements and backups.
    • The team finds relatively sensitive data in a lower security database and nonsensitive data in a higher security database. They’re reorganizing the data to match security levels with the data type.
    • They also plan to use data masking on key fields to better protect data confidentiality, so even authorized users only see what they need.
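
    Here’s a rough sketch of how classification labels can drive masking. The sensitivity levels, the field names, and the rule that unlabeled fields count as the most sensitive are all assumptions for illustration, not Contoso’s actual guidelines.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


# Hypothetical labels for a few fields in the donor management system.
FIELD_LABELS = {
    "template_name": Sensitivity.PUBLIC,
    "customer_name": Sensitivity.INTERNAL,
    "mailing_address": Sensitivity.CONFIDENTIAL,
}


def mask_value(value: str, visible: int = 2) -> str:
    """Hide all but the last few characters of a value."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def view_record(record: dict, clearance: Sensitivity) -> dict:
    """Mask every field whose label exceeds the viewer's clearance."""
    result = {}
    for field, value in record.items():
        # Unlabeled fields are treated as the most sensitive.
        label = FIELD_LABELS.get(field, Sensitivity.CONFIDENTIAL)
        result[field] = value if label <= clearance else mask_value(str(value))
    return result
```

    Defaulting unlabeled fields to the highest level means a newly added column stays masked until someone classifies it.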
  • Respond to incidents efficiently

    Make sure there’s an incident response plan for your workload. Use industry frameworks that define the standard operating procedure for preparedness, detection, containment, mitigation, and post-incident activity.

    During a crisis, avoid confusion by having a clear security incident response plan. Responsible roles can focus on execution without wasting time on uncertain actions. A comprehensive plan helps you meet remediation requirements.

    Contoso’s challenge

    • The workload team is setting up retailer support channels, customer support channels, and technical on-call rotations for support escalations and outages.
    • They haven’t addressed security specifically and don’t know what Contoso offers for support.

    Applying the approach and outcomes

    • The workload team works with the Contoso security team to understand compliance requirements for handling personal data from both an organizational perspective and an external compliance perspective.
    • The team builds a security detection, mitigation, and escalation plan, including communication for incidents.
    • The team now feels just as comfortable with security incident preparedness as they do with their reliability support. They plan to practice handling incidents before they go live.

  • Codify secure operations and development practices

    Set clear team-level security standards for your workload’s life cycle and operations, including how to write code, approve changes, release updates, and handle data.

    Having robust security habits helps avoid mistakes and keeps things running smoothly. When everyone follows the same approach, it’s easier to stay on track and work efficiently.

    Over time, sticking to these standards helps you spot ways to improve and maybe even automate steps to save time and boost consistency.

    Contoso’s challenge

    • After getting ready to handle incidents, the team realized they need to invest in preventing problems before they happen.
    • They don’t have a specific secure development process yet. They plan to reuse processes that they used on past projects.

    Applying the approach and outcomes

    • This workload doesn’t store highly sensitive data like credit card information, but the team still treats their customers’ data with care. They’re aware of local and federal regulations that must be followed for the types of data that they store.
    • The team invests in learning about current industry-standard secure development and operations practices and starts using measures that they hadn’t used before.
    • The team also shares their learnings with the Contoso security team so that everyone across the company can benefit from the improvements.
  • Optimize security through segmentation

    Use segmentation to plan security boundaries in the workload environment, processes, and team structure to isolate access and function.

    Base your segmentation strategy on business needs, like the importance of components, division of labor, privacy concerns, and other factors.

    To reduce operational friction, define roles and responsibilities clearly. This exercise helps you identify the level of access for each role, especially for important accounts.

    Isolation limits exposure of sensitive flows to only roles and assets that need access. Too much exposure can lead to information leaks.

    Contoso’s challenge

    • In the spirit of simplicity, the team has historically favored low overhead approaches. These approaches have included grouping components and organizing individuals into security groups to simplify access management.
    • A QA intern had broad access because of their security group membership. Unfortunately, their account was compromised in a social engineering attack.
    • This attack compromised the confidentiality of that deployment and all other deployments on the same application platform.

    Applying the approach and outcomes

    • Luckily, the compromised environment was just an early test prototype for the new customer loyalty program, so no production systems were affected.
    • The security team plans to invest time and money to isolate components that handle personal data, like addresses and emails, from components that don’t, like coupons. They’ll design access controls that are need-to-know and just-in-time (JIT) where possible, and isolate networks within the workload and back into Contoso to protect the organization.
    • Segmentation helps contain the impact of a compromise.
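
    Network isolation between components can be pictured as an explicit allow-list of flows between segments, with everything else denied. The sketch below uses Python’s `ipaddress` module; the segment names, address ranges, and allowed flows are made up for illustration, and in Azure this kind of rule would normally live in the network configuration rather than in application code.

```python
import ipaddress
from typing import Optional

# Hypothetical segments: components with personal data versus those without.
SEGMENTS = {
    "personal-data": ipaddress.ip_network("10.0.1.0/24"),
    "coupons": ipaddress.ip_network("10.0.2.0/24"),
}

# Allowed (source segment, destination segment) pairs; all others are denied.
ALLOWED_FLOWS = {
    ("personal-data", "personal-data"),
    ("coupons", "coupons"),
}


def segment_of(address: str) -> Optional[str]:
    """Return the name of the segment that contains the address, if any."""
    ip = ipaddress.ip_address(address)
    for name, network in SEGMENTS.items():
        if ip in network:
            return name
    return None


def flow_allowed(source: str, destination: str) -> bool:
    """Deny by default: both ends must be known and the pair must be listed."""
    src, dst = segment_of(source), segment_of(destination)
    if src is None or dst is None:
        return False
    return (src, dst) in ALLOWED_FLOWS
```

    With a deny-by-default rule set, a compromised account in the coupons segment can’t reach the components that hold addresses and emails.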