Nearly 2.5 petabytes of data from 1 million customers every hour. This is the kind of volume Walmart, the U.S. retail giant, generates. But have you ever wondered how this much information is stored?

Well, the cloud is one approach the retail giant has been using for years now. These massive volumes of data are stored in small data centers that the company uses to boost its business.

In recent years, even small organizations have started generating large volumes of data, so making use of that information has become critical for every organization. This is where data science comes into play.

A 2020 Kaggle survey showed how significant topics like the cloud, cloud computing, and machine learning have become in the data science industry.

We will look further at how popular cloud computing platforms, services, and products are among data science and machine learning (ML) professionals. The results come from the survey conducted by Kaggle. Below are some of the factors the survey covered:

Cloud computing products used

Among cloud computing engines, Amazon EC2 is more popular than Google Compute Engine and Azure Cloud Services. In terms of cloud functions, AWS Lambda is the popular choice. And for cloud container runners, Amazon Elastic Container Service rules them all.

But Google is not far behind: it stands second in both the cloud computing engine and cloud function segments, and third in the cloud container runner segment.

Here’s how the users are grouped by job category:

  • Data scientists
  • Software engineers
  • ML engineers
  • Data analysts

Product usage was also broken down by the programming experience of the respondent.

Image source: Kaggle survey 2020

The image above indicates the types of cloud products a cloud user, such as a data scientist, uses, based on programming experience:

  • 3-5 years’ experience
  • 5-10 years’ experience
  • 10-20 years’ experience

Also, most juniors and the most senior respondents, those with 20+ years of programming experience, seem to have less exposure to these areas.

Cloud platforms used

The modern data science industry requires candidates to have the relevant skillset, and such skills are in short supply. Since most companies have started investing heavily in keeping their data in the cloud, they will also need candidates with strong cloud skills.

Google Cloud AI Platform (also known as Google Cloud ML Engine) leads ML cloud product usage, while Azure Machine Learning Studio and Amazon SageMaker trail it, placing third.

However, for every cloud ML product the survey examined, data scientists are the top users.

How are cloud ML products used by organization size?

Not every organization in every size category uses cloud ML products. However, the breakdown below helps identify the small fraction of organizations that do.

  • 0-49 employees (small organizations) – Google Cloud AI Platform, otherwise known as Google Cloud ML Engine.
  • 50-249 employees (medium-sized organizations) – Amazon SageMaker and Google Cloud ML Engine, otherwise known as Google Cloud AI Platform.
  • More than 250 employees (large organizations) – here the choice is most likely to split by data science team size. For instance, smaller teams stick to working with Google Cloud AI Platform / Google Cloud ML Engine, while teams with more than 20 data scientists are much more comfortable with Amazon SageMaker.
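
The breakdown above can be restated as a small lookup. This is only a minimal sketch: the function names `size_bucket` and `typical_products` are made up for illustration, and treating exactly 250 employees as "large" is an assumption, since the list does not say which bucket that count falls into. It encodes nothing beyond what the list states.

```python
from typing import List

# Product labels as they appear in the list above.
GOOGLE = "Google Cloud AI Platform / Google Cloud ML Engine"
SAGEMAKER = "Amazon SageMaker"


def size_bucket(employees: int) -> str:
    """Map an employee count to the size category used in the list."""
    if employees < 0:
        raise ValueError("employee count cannot be negative")
    if employees < 50:
        return "small"
    if employees < 250:
        return "medium"
    # Assumption: 250 itself is counted as large.
    return "large"


def typical_products(employees: int, data_scientists: int = 0) -> List[str]:
    """Return the cloud ML products the list associates with an organization."""
    bucket = size_bucket(employees)
    if bucket == "small":
        return [GOOGLE]
    if bucket == "medium":
        return [SAGEMAKER, GOOGLE]
    # Large organizations: the choice depends on data science team size.
    return [SAGEMAKER] if data_scientists > 20 else [GOOGLE]


print(typical_products(30))
print(typical_products(120))
print(typical_products(5000, data_scientists=25))
```

A lookup like this is a summary of the reported pattern, not a recommendation; actual product choice depends on far more than headcount.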

The Kaggle survey also provides brief information on big data usage, broken down by factors such as programming experience and user experience.

These trends often signal which tools and technologies are likely to rise in the coming year.