How does it work?
EMR ClusterScaler is defined per EMR Job-Flow. It lets you add 1..N Task Nodes from multiple instance types, and lets you define CloudWatch rules for scaling the Task Nodes up and down.
For example, you can connect EMR ClusterScaler to an existing EMR cluster that runs 1 Master Node, 20 Core Nodes, and 5 Task Nodes. Setup consists of:
- Selecting your target EMR Job-Flow Id
- Defining the number of Task Nodes, the instance types and sizes
- Defining the CloudWatch rules for Scale Up and Scale Down
  - A rule can use one of your EMR metrics, such as RemainingMapTasksPerSlot
  - A rule can also use a per-instance metric, for example CPU usage or network bandwidth
Once everything is configured, EMR ClusterScaler launches the desired number of Task Nodes into your Job-Flow and automatically scales them based on the metrics provided.
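To make the flow concrete, here is a minimal Python sketch of what such a configuration and scaling decision could look like. It is not the ClusterScaler API: the class names, fields, thresholds, the Job-Flow Id, and the upper limit on Task Nodes are all assumptions made for illustration. Only the metric name RemainingMapTasksPerSlot and the idea of a user-set minimum come from the description here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScalingRule:
    """One threshold rule on a CloudWatch metric (hypothetical structure)."""
    metric_name: str   # e.g. "RemainingMapTasksPerSlot" or a per-instance metric
    comparison: str    # "gt": fire above the threshold, "lt": fire below it
    threshold: float
    adjustment: int    # Task Nodes to add (positive) or remove (negative)

    def fires(self, value: float) -> bool:
        if self.comparison == "gt":
            return value > self.threshold
        return value < self.threshold


@dataclass
class ScalerConfig:
    """Hypothetical settings for one Job-Flow; field names are assumptions."""
    job_flow_id: str
    instance_types: List[str]
    min_task_nodes: int
    max_task_nodes: int   # assumed upper limit, not mentioned in the post
    scale_up: ScalingRule
    scale_down: ScalingRule


def next_task_node_count(config: ScalerConfig, current: int,
                         metrics: Dict[str, float]) -> int:
    """Return the Task Node count after applying at most one rule."""
    desired = current
    up, down = config.scale_up, config.scale_down
    # Scale Up takes precedence when both rules would fire (an assumption).
    if up.metric_name in metrics and up.fires(metrics[up.metric_name]):
        desired = current + up.adjustment
    elif down.metric_name in metrics and down.fires(metrics[down.metric_name]):
        desired = current + down.adjustment
    # Never leave the configured capacity range.
    return max(config.min_task_nodes, min(config.max_task_nodes, desired))


# Illustrative values only.
config = ScalerConfig(
    job_flow_id="j-EXAMPLE12345",
    instance_types=["m3.xlarge", "c3.2xlarge"],
    min_task_nodes=2,
    max_task_nodes=20,
    scale_up=ScalingRule("RemainingMapTasksPerSlot", "gt", 2.0, 3),
    scale_down=ScalingRule("RemainingMapTasksPerSlot", "lt", 0.5, -2),
)
```

With this configuration, a cluster of 5 Task Nodes reporting RemainingMapTasksPerSlot of 3.5 would grow to 8, while a cluster of 3 Task Nodes reporting 0.1 would shrink only to the minimum of 2.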
ClusterScaler makes efficient use of the Spot Market by launching Spot Instances as much as possible, which lowers cluster costs and improves ROI.
Spotinst ensures that your capacity won’t drop below your user-set minimum by using our in-house prediction algorithm for the Spot Market. When Spotinst scales down resources, it removes “risky” Spot capacity first, and it always scales up from the most available Spot slots in AWS.
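The ordering described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the in-house prediction algorithm: the `interruption_risk` and availability scores are hypothetical inputs that stand in for whatever the prediction produces, and the function names are invented for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaskNode:
    instance_id: str
    instance_type: str
    interruption_risk: float  # hypothetical score in [0, 1]; higher is riskier


def pick_nodes_to_remove(nodes: List[TaskNode], count: int,
                         min_nodes: int) -> List[TaskNode]:
    """Choose up to `count` nodes to remove, riskiest first,
    without dropping below the user-set minimum."""
    removable = max(0, len(nodes) - min_nodes)
    how_many = min(count, removable)
    by_risk = sorted(nodes, key=lambda node: node.interruption_risk, reverse=True)
    return by_risk[:how_many]


def pick_market_for_scale_up(availability: Dict[str, float]) -> str:
    """Choose the Spot market with the highest (hypothetical) availability score."""
    return max(availability, key=lambda market: availability[market])


# Illustrative data only.
nodes = [
    TaskNode("i-aaa", "m3.xlarge", 0.1),
    TaskNode("i-bbb", "c3.2xlarge", 0.9),
    TaskNode("i-ccc", "m3.xlarge", 0.5),
    TaskNode("i-ddd", "c3.2xlarge", 0.7),
]
to_remove = pick_nodes_to_remove(nodes, count=2, min_nodes=1)
print([node.instance_id for node in to_remove])
```

Sorting by risk on scale-down means the nodes most likely to be reclaimed are removed first, so the capacity that remains is the capacity most likely to stay.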
Autoscaling for MapReduce applications has been just around the corner for a long time. We are happy to have the chance to help cloud customers use their resources more efficiently.