What planning steps can I take when upgrading my Amazon EMR cluster?

I need to plan an Amazon EMR upgrade to keep pace with rapidly changing technology.

Short description

To keep up with the rapid changes in big data, you must upgrade your version of Amazon EMR. Migrating to a new version of Amazon EMR improves the operational excellence and efficiency of your workload. However, before you upgrade Amazon EMR, you must plan and prepare: review the release information and open issues for the target version, and follow a testing and migration procedure.

Benefits of Amazon EMR version upgrades

Benefits of upgrading Amazon EMR include:

  • Increased productivity and lower costs from the newest features.
  • Faster performance from updated applications.
  • A more stable infrastructure from up-to-date bug fixes.
  • Stronger security from the latest security patches.
  • Access to current open-source software features.

For example, with Amazon EMR version 6.6 and later, Log4j 1.x and Log4j 2.x are upgraded to Log4j 1.2.17 and Log4j 2.17.1 (or higher), respectively. In these later versions, you don't need bootstrap actions to mitigate the Log4j common vulnerabilities and exposures (CVEs).
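
As a rough illustration of the version check described above, the following Python sketch tests whether a release label predates Amazon EMR 6.6. It assumes that release labels follow the "emr-x.y.z" pattern. The function names are hypothetical, and the sketch encodes only the 6.6 threshold mentioned in this article, so it doesn't account for patched releases in other release series.

```python
import re

# Threshold taken from the article text: Amazon EMR 6.6 and later ship the upgraded Log4j.
LOG4J_UPGRADED_FROM = (6, 6)


def parse_release_label(label: str) -> tuple[int, int, int]:
    """Turn a label such as 'emr-6.9.0' into (6, 9, 0)."""
    match = re.fullmatch(r"emr-(\d+)\.(\d+)\.(\d+)", label)
    if match is None:
        raise ValueError(f"Unrecognized release label: {label!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def predates_log4j_upgrade(label: str) -> bool:
    """Return True when the release is older than the 6.6 threshold.

    Hypothetical helper: a True result only means that you should check
    whether your cluster still relies on a Log4j mitigation bootstrap action.
    """
    return parse_release_label(label)[:2] < LOG4J_UPGRADED_FROM
```

For example, predates_log4j_upgrade("emr-6.5.0") returns True, and predates_log4j_upgrade("emr-6.9.0") returns False.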

Resolution

Amazon EMR performance optimization features for open-source applications

Amazon EMR offers performance optimization features for many open-source applications.

Spark:

Delta Lake:

Flink:

Hadoop:

HBase:

HCatalog:

Hive:

Hudi:

Iceberg:

Presto and Trino:

Planning for Amazon EMR version upgrades

Follow these steps to prepare for an Amazon EMR version upgrade:

  1. Research the issues that you're facing in your current Amazon EMR version.
  2. Isolate a small subset of applications or queries that you want to use to test your EMR cluster's performance.
  3. Set up an A/B testing strategy to decide the Amazon EMR version that's best for your solution. In A/B testing for Amazon EMR, you test two different versions of the service to compare how they perform in your environment.
  4. Gradually migrate the workload to the new version of Amazon EMR. If you discover major problems while running production workloads on the new version, you can stop the migration at this step.
  5. After migration is complete, terminate the old Amazon EMR cluster.
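
The A/B comparison in step 3 can be sketched in Python as follows. This is a minimal example, not an AWS-provided tool. It assumes that you have already run the same subset of queries on a cluster for each Amazon EMR version and recorded the runtimes in seconds. The function name, the 10% tolerance, and the query names are all hypothetical.

```python
from statistics import median


def compare_runs(baseline, candidate, tolerance=0.10):
    """Compare per-query runtimes (in seconds) from two clusters.

    baseline and candidate map a query name to a list of measured runtimes.
    Only queries present in both inputs are compared. Returns a dict that
    maps each query to (runtime ratio, verdict), where a ratio above 1
    means the candidate version was slower.
    """
    report = {}
    for query in sorted(baseline.keys() & candidate.keys()):
        old = median(baseline[query])
        new = median(candidate[query])
        ratio = new / old
        if ratio > 1 + tolerance:
            verdict = "regression"
        elif ratio < 1 - tolerance:
            verdict = "improvement"
        else:
            verdict = "unchanged"
        report[query] = (round(ratio, 2), verdict)
    return report


# Illustrative numbers only; they are not measurements from real clusters.
current_version = {"daily_agg": [120.0, 118.0, 125.0], "join_orders": [60.0, 62.0, 61.0]}
new_version = {"daily_agg": [90.0, 95.0, 92.0], "join_orders": [75.0, 74.0, 78.0]}

for query, (ratio, verdict) in compare_runs(current_version, new_version).items():
    print(f"{query}: {ratio}x ({verdict})")
```

The sketch uses the median so that a single slow run doesn't dominate the result. Queries that are flagged as regressions are candidates for the troubleshooting steps in the next section before you continue the migration.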

Fixing issues related to Amazon EMR version upgrades

Follow these steps to fix issues that you encounter when upgrading your Amazon EMR version:

  1. Reconfigure the application. Observe whether the changes improve your application's performance.
  2. Check whether the issues are resolved in a newer version of the application.
  3. Change the application or queries to see whether you can avoid the issues.
  4. Check open defects and workarounds to improve the application. Contact AWS Premium Support to find out if there's a workaround.
  5. Stop the Amazon EMR migration until the issue is fixed or a workaround exists.

Considerations for Amazon EMR version upgrades

When you upgrade your version of Amazon EMR, performance might regress, and applications might run more slowly or fail. Upgrades might also change an application's API, which can prevent your existing code from running against the newer interface.

Before you upgrade your version of Amazon EMR, it's a best practice to read the What's new? section of the release guide. The What's new? section includes information about Amazon EMR release versions and dates, along with solutions to common issues with open-source applications.

Research open-source application changes and outstanding issues

Check the following release notes and open defects before you decide to migrate to a new Amazon EMR version. The following list of applications is based on Amazon EMR version 6.9.

Note: These hyperlinks take you to the third-party application websites, GitHub, or the Apache website.
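
To narrow down which release notes to read, it can help to list the applications whose versions differ between your current release and the target release. The following Python sketch compares two responses shaped like the output of the Amazon EMR DescribeReleaseLabel API (an "Applications" list of "Name" and "Version" entries). The helper names and the default Region are hypothetical, and fetch_release ignores pagination, so treat it as a starting point and not a complete client.

```python
def application_versions(release_description):
    """Map application name to version from a DescribeReleaseLabel-style response."""
    return {
        app["Name"]: app["Version"]
        for app in release_description.get("Applications", [])
    }


def diff_applications(current, target):
    """Return {name: (current_version, target_version)} for every application
    that was added, removed, or changed between the two releases.
    A missing application is reported as None."""
    old = application_versions(current)
    new = application_versions(target)
    changes = {}
    for name in sorted(old.keys() | new.keys()):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


def fetch_release(label, region="us-east-1"):
    """Fetch one release description with boto3 (requires AWS credentials).

    Assumption: a single response page is enough. Long application lists
    can be paginated, which this sketch doesn't handle.
    """
    import boto3  # imported here so the diff logic runs without AWS access

    client = boto3.client("emr", region_name=region)
    return client.describe_release_label(ReleaseLabel=label)
```

For example, diff_applications(fetch_release("emr-6.5.0"), fetch_release("emr-6.9.0")) returns only the applications whose versions changed, which are the ones to research first.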


AWS OFFICIAL
Updated a year ago