Big Data Developers


Exploring this Job

There are many ways to learn more about education and careers in big data, including a number of hands-on activities. First, learn the basics of big data by reading books such as Data Science For Dummies, by Lillian Pierson, and by viewing glossaries of big data terms online. Additionally, the data science firm SAS offers a primer about the field at its Web site, www.sas.com/en_us/insights/big-data/what-is-big-data.html. Next, consider taking a class about data science, data analytics, programming, artificial intelligence, or a related field at your high school, local community college, or via an online learning platform such as Coursera, edX, Udacity, or Udemy. Some classes are even free. For example, Coursera offers Introduction to Data Science, a free online class that takes about one month to complete.

Consider participating in computer science–related summer camps that are offered by colleges and universities, high schools, park districts, museums, private tech organizations, and other groups. The National Student Leadership Conference offers a nine-day data science program each summer. Learn more at https://www.nslcleaders.org/dates-locations. The Michigan Institute for Data Science at the University of Michigan at Ann Arbor offers a five-day residential camp for upper-level high school students who are interested in mathematics, programming, social networks, and social media, and who ideally have completed coursework in trigonometry. You can learn more at https://midas.umich.edu/camp.

Participate in open data science, programming, and related competitions for high school students, college students, and professionals. Here are a few competitions to check out:

  • Kaggle: https://www.kaggle.com
  • American Statistical Association (ASA) DataFest: http://ww2.amstat.org/education/datafest
  • Association for Computing Machinery Special Interest Group on Management of Data Student Research Competition: https://www.acm.org/education/student-research-competition

Participating in informational interviews with and/or job shadowing a big data professional is another great way to learn the ins and outs of this career.

The Job

Tweets and other social media posts. Cell phone GPS signals. Clickstreams from an app or a Web site. Information from credit card transactions. Videos. E-mails. Data collected from the Internet of Things. These are just a few examples of the wealth of data that we generate each day and that is collected by companies, government agencies, and nonprofit organizations. Big data consists of amounts of data so large that they cannot easily be collected, analyzed, and managed with traditional data-processing software. It is used in a wide range of fields, including banking and financial services, accounting, health care, medical research, agriculture, consumer products and services, astronomy, transportation, human resources, security, shipping, law enforcement, and the military.

There are two types of big data: structured and unstructured. The U.S. Department of Labor describes structured data as “numbers and words that can be easily categorized and analyzed. These data are generated by things like network sensors embedded in electronic devices, smartphones, and global positioning system devices… [and] sales figures, account balances, and transaction data. Unstructured data include more complex information, such as customer reviews from commercial Web sites, photos and other multimedia, and comments on social networking sites. These data cannot easily be separated into categories or analyzed numerically.”

Big data is often described in terms of five main qualities, known as “the 5 Vs”:

  1. Value: The usefulness of the data
  2. Variety: The various types of data
  3. Velocity: The speed at which the data is created
  4. Veracity: The trustworthiness of the data
  5. Volume: The size of the data

There are two main areas of big data: data analytics and data science. There is no universally agreed-upon definition of either field, but data analytics involves the actual acquisition, organization, and analysis of data to meet a variety of goals, while data science focuses on developing new data analysis methods that tap increased computing power and use algorithms, predictive models, and other techniques.

People with a variety of educational backgrounds and skill sets work in big data. Although they may follow different career paths, all of these professionals can be classified as big data developers.

Data processing technicians collect, clean, and prepare data for analysis. The cleaning step, which involves finding and correcting or removing inaccurate, incomplete, or duplicate records, is known as data cleaning or data cleansing. Many people begin their careers in big data by working as data processing technicians.
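To give a feel for what data cleaning looks like in practice, here is a minimal sketch in Python. It assumes a small set of hypothetical customer records (the field names and values are invented for illustration) and shows three common cleaning steps: trimming stray spaces and normalizing capitalization, dropping records with missing or unusable values, and removing duplicates. Real projects typically use dedicated tools and much larger data sets.

```python
# A minimal data-cleaning sketch using hypothetical customer records.
# The field names ("name", "email", "age") are invented for illustration.

raw_records = [
    {"name": "  Ana Lopez ", "email": "ANA@EXAMPLE.COM", "age": "34"},
    {"name": "Ana Lopez", "email": "ana@example.com", "age": "34"},      # duplicate once cleaned
    {"name": "Ben Ortiz", "email": "", "age": "29"},                     # missing email
    {"name": "Cara Singh", "email": "cara@example.com", "age": "n/a"},   # unusable age
    {"name": "Dev Patel", "email": "dev@example.com", "age": "41"},
]


def clean_record(record):
    """Normalize one record; return None if it is missing required data."""
    name = record["name"].strip()
    email = record["email"].strip().lower()
    age_text = record["age"].strip()
    if not name or not email or not age_text.isdigit():
        return None  # incomplete or invalid record: drop it
    return {"name": name, "email": email, "age": int(age_text)}


def clean_records(records):
    """Clean every record and drop duplicates (same e-mail address)."""
    cleaned = []
    seen_emails = set()
    for record in records:
        result = clean_record(record)
        if result is None or result["email"] in seen_emails:
            continue  # skip bad records and repeats
        seen_emails.add(result["email"])
        cleaned.append(result)
    return cleaned


if __name__ == "__main__":
    for row in clean_records(raw_records):
        print(row)
```

In this made-up example, only two of the five original records survive cleaning: the first "Ana Lopez" entry and "Dev Patel."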

Data analysts study various data sets to provide answers to questions posed by their employers. For example, they may be asked to assess data on customer Web traffic to obtain a better understanding of customer demographics or of the buying preferences of a specific demographic group. Data analyst is often, but not always, an entry-level position. Business intelligence analysts are specialized data analysts who study and identify patterns in data in order to produce financial and market intelligence for companies.

Database administrators, who are also known as data warehousing specialists, manage databases that store large amounts of data. They make sure that databases operate correctly and can easily be accessed by users, back up and restore data to prevent data loss, modify database structures when needed, and otherwise ensure that each database (or group of databases) operates effectively.

Data architects design and construct large relational databases, integrate new databases with existing data warehouse structures, and conduct tests to assess and improve system performance and functionality.

Data engineers build pipelines that transform data into formats that data scientists can use. Their duties vary by employer. They may perform data wrangling (making data easier to use), create algorithms (sets of instructions that allow a computer to perform a specific task or group of tasks) and translate them into prototype code, develop more effective ways to gather and study data, and build automated systems powered by artificial intelligence (AI) to retrieve and analyze data. AI is a field of computer science in which machines are programmed to perform functions and tasks in a “smart” manner that mimics human decision-making processes. One subset of AI is machine learning, in which computers are trained to study data, identify patterns, and make decisions with minimal or no human intervention. Data engineers may also be known as software developers or software engineers.

Data scientists write algorithms that are used to detect and analyze patterns in very large data sets with the goal of solving problems, such as analyzing infection rates during an epidemic or looking for patterns in traffic accident data to help planners prevent or reduce accidents. They also build machine-learning models and make predictions about the future based on past data. Depending on the employer, the duties of data engineers and data scientists often overlap.
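As a very simple illustration of "making predictions about the future based on past data," the sketch below fits a straight trend line to a few weeks of hypothetical infection counts (the numbers are invented) using ordinary least squares, then uses that line to estimate the next week's count. This is an assumed, deliberately basic technique chosen for clarity; working data scientists typically use far richer models and much more data.

```python
# A minimal "predict from past data" sketch: fit a straight line to
# hypothetical weekly counts with ordinary least squares, then extrapolate.
# The data below is invented for illustration only.


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of co-deviations divided by sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Use the fitted line to estimate y for a new x value."""
    return slope * x + intercept


weeks = [1, 2, 3, 4, 5]
cases = [12, 15, 19, 22, 26]  # hypothetical weekly infection counts

slope, intercept = fit_line(weeks, cases)

if __name__ == "__main__":
    print(f"Estimated cases in week 6: {predict(slope, intercept, 6):.1f}")
```

A straight line assumes the trend keeps growing at a steady rate, which real epidemics rarely do; choosing and testing a better-suited model is a large part of a data scientist's job.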