
AWS Announces Tools To Manage Machine Learning Workloads
Amazon Web Services (AWS) announced new tools at re:Invent this week aimed at reducing the time, compute resources and cost involved in training machine learning models.
The company is currently preparing new GPU instances for large-scale machine learning training. The upcoming P3dn.24xlarge instances will be powered by eight NVIDIA Tesla V100 GPUs, each with 32GB of memory, and will support up to 100Gbps of networking throughput. The instances will also be equipped with 96 Intel Xeon Skylake vCPUs.
According to AWS, “The faster networking, new processors and doubling of GPU memory allow developers to significantly reduce the time it takes to train their ML model or run more HPC simulations. They can scale out their jobs across multiple instances (e.g. 16 or 32 instances).” The company added, “In addition to increasing the throughput for passing data between instances, the additional network throughput of P3dn.24xlarge can also be used to accelerate access to large amounts of training data by connecting with Amazon S3 and shared file system solutions such as Amazon EFS.”
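To picture that data-access pattern, here is a minimal, hypothetical sketch of streaming TFRecord training shards straight from an S3 bucket with tf.data. The bucket and file names are made up, and reading s3:// paths assumes a TensorFlow build with S3 filesystem support; it is an illustration of the access pattern, not AWS-supplied code.

```python
# Hypothetical sketch: streaming TFRecord training shards from Amazon S3
# with tf.data -- the access pattern the extra network throughput is meant
# to accelerate. Bucket and file names are placeholders, and s3:// paths
# assume a TensorFlow build with S3 filesystem support.
import tensorflow as tf

# List the training shards stored in an S3 bucket (placeholder path).
files = tf.io.gfile.glob("s3://example-training-bucket/train-*.tfrecord")

# Read many shards in parallel and keep the input pipeline ahead of the GPUs.
dataset = (
    tf.data.TFRecordDataset(files, num_parallel_reads=tf.data.AUTOTUNE)
    .shuffle(10_000)
    .batch(256)
    .prefetch(tf.data.AUTOTUNE)
)

for batch in dataset.take(1):
    print("Read one batch of", batch.shape[0], "serialized examples")
```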
AWS also announced improvements to TensorFlow that make the framework more efficient for developers on its platform, allowing them to scale training across multiple GPUs and make machine learning training tasks more resource-efficient. These changes are now generally available.
AWS said the new AWS-Optimized TensorFlow improves how training tasks are distributed across GPUs, allowing close to linear scaling when training multiple types of neural networks (90 percent efficiency across 256 GPUs, compared with the previous norm of 65 percent).
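For context, the snippet below shows ordinary data-parallel training across all visible GPUs using stock TensorFlow's tf.distribute.MirroredStrategy. It is a generic illustration of the kind of multi-GPU scaling being discussed, not the AWS-Optimized TensorFlow build itself, and the tiny model and random data are placeholders.

```python
# Generic illustration of multi-GPU data-parallel training with stock
# TensorFlow's tf.distribute.MirroredStrategy. Not the AWS-Optimized build;
# the tiny model and random data stand in for a real network and dataset.
import numpy as np
import tensorflow as tf

# Replicate the model on every visible GPU and aggregate gradients each step.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

# Placeholder data; the global batch size scales with the replica count.
x = np.random.rand(2048, 784).astype("float32")
y = np.random.randint(0, 10, size=(2048,)).astype("int64")
model.fit(x, y, batch_size=64 * strategy.num_replicas_in_sync, epochs=1)
```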
Amazon Elastic Inference is another new service that is now available. Inference is the process by which a machine learning model makes predictions about new data using what it learned during training. According to AWS, inference can require significantly more resources and cost more than training. Elastic Inference gives developers more options to control the amount of compute power they purchase for it, potentially cutting spending by up to 75 percent.
The company stated that instead of running on a large Amazon EC2 P2 or P3 instance with low utilization, developers can run on a smaller, general-purpose Amazon EC2 instance and provision just the right amount of acceleration from Amazon Elastic Inference. Developers can increase or decrease inference performance by simply adjusting the number of TFLOPS they provision, and they only pay for what they use.
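As a rough sketch of that workflow, the boto3 call below launches a small general-purpose instance with an Elastic Inference accelerator attached at launch time. The AMI ID is a placeholder, “eia1.medium” is just one accelerator size, and the example assumes a boto3 release that supports the ElasticInferenceAccelerators parameter.

```python
# Hypothetical sketch: launching a small general-purpose EC2 instance with an
# Elastic Inference accelerator attached, using boto3. The AMI ID is a
# placeholder and "eia1.medium" is one example accelerator size.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder deep learning AMI
    InstanceType="c5.large",          # modest CPU host for the model server
    MinCount=1,
    MaxCount=1,
    # The accelerator -- and therefore the TFLOPS you pay for -- is sized
    # independently of the instance itself.
    ElasticInferenceAccelerators=[{"Type": "eia1.medium"}],
)
print("Launched:", response["Instances"][0]["InstanceId"])
```

In this setup, resizing inference capacity amounts to swapping the accelerator type rather than moving to a larger GPU instance.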
For heavier workloads, AWS also previewed AWS Inferentia, a dedicated inference processor due in 2019 that promises low latency and high throughput at a lower cost.
According to the AWS product page, “Each chip provides hundreds of TOPS (tera operations per second) of inference throughput that allows complex models to make quick predictions.” Multiple AWS Inferentia chips can be combined to deliver thousands of TOPS of throughput for even greater performance.