10 Data Science terms every business leader needs to know

April 4, 2018

There is a lot of talk about data science, big data, AI, and IoT these days, but what is the reality behind the hype? What do these terms actually mean, and what impact could they have on your business?

Let’s take a look at 10 of the highest-profile tech terms affecting businesses in 2018:

1. Data Science

Data Science is a very general term used for many modern business applications of data, technology and analytics.

It generally involves collection and processing of a wide range of data: customer, marketing, web, financial, third party data, etc. After collection, the data is either analyzed for meaningful insights or used in recommender algorithms, fraud detection, churn prediction or any of several dozen such applications.

Data scientists generally use techniques borrowed from statistics, numerical optimization or machine learning, implemented using programming languages such as R, Python, Java, SAS, or C/C++.

2. Big Data

Big Data is data that is abnormally large, fast moving, or diverse. In general, it’s data which technologies of the 20th Century were poorly equipped to handle.

Examples of big data include searches across the internet’s websites (1.3 billion and growing), videos stored on YouTube (hundreds of hours of video uploaded each minute), and data processed at the CERN particle physics research laboratory (150 million sensors delivering data from experiments 30 million times per second).

For today’s businesses, the most relevant types of big data might include web analytics (accumulating at gigabytes or even terabytes per day), video data and IoT sensor data (see below).

3. Machine Learning (ML)

Machine Learning (ML) is the headline grabber these days. It refers to programs that improve their own performance by learning from training data rather than by following explicitly programmed rules. An example is an image recognition program trained to recognize cats by being shown pictures labeled as containing or not containing a cat. The more pictures used for training, the more accurate the program (hopefully) becomes.
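To make the idea of learning from labeled examples concrete, here is a minimal sketch in Python. It is a toy, not how real image recognition works: each "picture" is assumed to have been reduced to two invented numbers, and the hypothetical `predict` function simply copies the label of the most similar training example.

```python
# Toy data: each "picture" is two made-up feature values.
# Label 1 = contains a cat, label 0 = does not. All values are invented.
training_data = [
    ((0.9, 0.8), 1),
    ((0.8, 0.9), 1),
    ((0.2, 0.1), 0),
    ((0.1, 0.3), 0),
]


def predict(features, examples):
    """Return the label of the training example closest to `features`."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = min(examples, key=lambda ex: squared_distance(ex[0], features))
    return nearest[1]


borderline_picture = (0.5, 0.45)

# With only the original examples, the closest one is labeled "no cat".
print(predict(borderline_picture, training_data))  # prints 0

# After one more labeled example, the answer changes. The code did not.
more_data = training_data + [((0.55, 0.5), 1)]
print(predict(borderline_picture, more_data))  # prints 1
```

The point of the sketch is that nobody edited the program between the two predictions. Only the training data grew.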

Machine learning has applications in many different areas, including Google’s smart reply feature, which gives Gmail users several recommended replies after each email. These are based on what Gmail has learned by reading millions of earlier responses to similar emails.

Most of the hype around AI these days is related to machine learning. Small businesses can quickly tap into certain advanced ML technologies by using pay-per-use offerings from companies such as Google and Salesforce (with its Einstein product).

4. Artificial intelligence (AI)

Artificial Intelligence (AI) is a general term for a machine that can respond intelligently to its environment. Much of machine learning is also considered to be AI, and many people use the terms interchangeably.

To illustrate the distinction, consider the IBM computer Deep Blue, which in 1997 bested the reigning world chess champion. Deep Blue used a combination of massive computing power and user-supplied playing rules. If the computer seemed to be doing something wrong, the programmers would reprogram its playing strategy between games.

Deep Blue played chess in a way that was considered to be artificial intelligence but not machine learning. It didn’t learn by itself.

However, when the program AlphaGo beat the world champion in the game of Go nineteen years later, it had learned to play so well largely by playing against itself over and over again.

In the end, even its own programmers didn’t understand why it made some of its winning moves. This was machine learning.

5. Cloud Computing

Cloud Computing involves renting storage space or running applications on a remote computer. Amazon has allowed users to rent digital storage space in its data centers since 2006, while Salesforce had been delivering its applications as Software-as-a-Service, a form of cloud computing, since 1999.

Now X-as-a-Service offerings are everywhere, providing anything from hardware to platform to software (including Gmail).

6. Open-Source Software

Open-Source Software is software made freely available for use and modification (subject to some restrictions). One of the largest homes for open-source software projects is the Apache Software Foundation, created in 1999.

Apache maintains much of the big data software used today, including Hadoop, Spark, and Kafka. Open-source software has been extremely valuable in helping companies get up and running with data science and big data applications.

7. Deep Learning

Deep Learning is a powerful machine learning method that extends artificial neural networks, a technique dating back to the 1950s.

It uses carefully constructed networks of simple building blocks, trained on massive amounts of data to do specialized tasks such as labeling images, playing games, or processing natural language.
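The "network of simple building blocks" idea can be sketched in a few lines of Python. This is an illustration only: the weights below are hand-picked rather than learned from data, the function names are hypothetical, and real deep learning networks have far more blocks and layers.

```python
def neuron(inputs, weights, bias):
    """One building block: a weighted sum followed by a simple on/off step."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0


def tiny_network(a, b):
    """Two layers of neurons that together compute 'a or b, but not both'."""
    # First layer: two neurons look at the same inputs.
    h_or = neuron((a, b), (1, 1), -0.5)   # fires if a OR b is 1
    h_and = neuron((a, b), (1, 1), -1.5)  # fires only if a AND b are 1
    # Second layer: combines the first layer's outputs.
    return neuron((h_or, h_and), (1, -2), -0.5)


outputs = [tiny_network(a, b) for a in (0, 1) for b in (0, 1)]
print(outputs)  # prints [0, 1, 1, 0]
```

No single neuron here can compute the final answer on its own. It is the layering that does it. In practice the weights are found automatically during training rather than set by hand.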

8. Web Analytics

Web Analytics is the collection and analysis of the actions of visitors to your websites and mobile applications. Because such a large portion of customer interaction takes place in a digital setting, web analytics plays a very important role in modern data science.

When we step past basic web analytics interfaces and APIs and begin collecting raw clickstream data, we are entering the world of big data and opening new opportunities to draw deeper insights into customer actions and product performance.
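As a rough illustration of what working with raw clickstream data can look like, the Python sketch below counts how many visitor sessions reached each page of interest. The event format, session IDs and page names are all invented for the example. Real clickstream data carries many more fields, such as timestamps and device details.

```python
from collections import defaultdict

# Hypothetical raw clickstream: one (session_id, page) record per click.
events = [
    ("s1", "home"), ("s1", "product"), ("s1", "checkout"),
    ("s2", "home"), ("s2", "product"),
    ("s3", "home"),
    ("s4", "product"), ("s4", "checkout"),
]

pages_of_interest = ["home", "product", "checkout"]

# Group the pages seen in each session.
pages_by_session = defaultdict(set)
for session_id, page in events:
    pages_by_session[session_id].add(page)

# Count the sessions that viewed each page at least once.
sessions_per_page = {
    page: sum(1 for pages in pages_by_session.values() if page in pages)
    for page in pages_of_interest
}
print(sessions_per_page)  # prints {'home': 3, 'product': 3, 'checkout': 2}
```

Even this small summary starts to answer a business question: three sessions looked at a product, but only two went on to checkout.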

9. Data Warehouses

Data warehouses are centralized databases carefully constructed to allow companies to draw the most value from their data.

A data warehouse will collect data from multiple operational databases (e.g. finance systems, marketing efforts, web analytics data, etc.) and make it easy to link this data and ask holistic questions, such as how marketing efforts link to online activity and subsequent sales.
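The kind of holistic question described above can be sketched with Python's built-in `sqlite3` module standing in for a warehouse. The table names, columns and figures are assumptions made up for this example, and a production data warehouse would use a dedicated platform rather than an in-memory database.

```python
import sqlite3

# An in-memory database standing in for a warehouse (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE campaigns (campaign_id TEXT, channel TEXT, cost REAL);
    CREATE TABLE visits (visit_id INTEGER, campaign_id TEXT);
    CREATE TABLE sales (visit_id INTEGER, amount REAL);

    INSERT INTO campaigns VALUES ('c1', 'email', 100.0), ('c2', 'search', 250.0);
    INSERT INTO visits VALUES (1, 'c1'), (2, 'c1'), (3, 'c2'), (4, 'c2'), (5, 'c2');
    INSERT INTO sales VALUES (1, 40.0), (3, 120.0), (5, 60.0);
""")

# Link marketing spend to web visits and to the sales that followed.
rows = conn.execute("""
    SELECT c.channel,
           c.cost,
           COUNT(v.visit_id) AS visits,
           COALESCE(SUM(s.amount), 0) AS revenue
    FROM campaigns AS c
    LEFT JOIN visits AS v ON v.campaign_id = c.campaign_id
    LEFT JOIN sales AS s ON s.visit_id = v.visit_id
    GROUP BY c.campaign_id, c.channel, c.cost
    ORDER BY c.campaign_id
""").fetchall()

for channel, cost, visits, revenue in rows:
    print(channel, cost, visits, revenue)
```

Each output row combines data that would normally sit in three separate systems: what a campaign cost, how many visits it drove, and how much revenue those visits produced.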

10. The Internet of Things (IoT)

The Internet of Things refers to the billions of connected processors and sensors that are spread out in our cars, household devices, field equipment, industrial machines, etc.

The IoT makes it possible for us to gather tremendous amounts of real-time data, harvest insights and improve industrial and business operations.

Two key applications are predictive maintenance of machines and detailed monitoring of customer activity (e.g. for insurance or healthcare applications).
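For a flavour of how sensor data might feed a maintenance decision, here is a deliberately simple Python sketch. It flags a machine when the average of its most recent readings crosses a limit. The function name, window size, threshold and readings are all assumptions for illustration, and real predictive maintenance systems typically rely on more sophisticated models than a fixed threshold.

```python
def needs_maintenance(readings, window=3, threshold=80.0):
    """Flag a machine when the average of its last `window` readings exceeds `threshold`."""
    if len(readings) < window:
        return False  # not enough data yet to judge
    recent = readings[-window:]
    return sum(recent) / window > threshold


# Invented vibration readings from one machine, oldest first.
vibration = [70.0, 72.0, 71.0, 79.0, 84.0, 88.0]

print(needs_maintenance(vibration[:3]))  # prints False
print(needs_maintenance(vibration))      # prints True
```

Averaging over a few readings, rather than reacting to a single spike, is one simple way to reduce false alarms.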

Final thoughts

Which of these ten terms will be most important for your business? For the smallest businesses, perhaps only one or two, with the others retaining their use primarily for cocktail-party conversation.

However, be aware that the rapid growth of cloud offerings, open-source software and easily available AI offerings has enabled almost any business to move into almost any of the above domains with minimal effort.

Once you have identified the relevant application for your business, you can start to move more quickly than you might have realized.

About the author

This guide has been written exclusively for ByteStart by David Stephenson PhD, an internationally recognised expert in data science and big data analytics. He is the author of the new book Big Data Demystified: How to use big data, data science and AI to make better business decisions and gain competitive advantage.
