Data Scientists
Requirements
Education and Training Requirements
High School
Take as many computer science classes as possible, especially those in programming, data analytics, and database management. Sign up for English and speech classes to build your communication skills. Mathematics (including statistics), history, social studies, and philosophy will help you develop the analytical and critical-thinking skills you’ll use frequently as a data scientist.
Postsecondary Education
To work as a data scientist, you’ll typically need a minimum of a master’s degree in data analytics, data science, computer science, mathematics, or statistics, although some people have degrees in engineering, economics, econometrics, business, operations research, neuroscience, or computational psychology. Some employers require applicants to have a doctorate in one of the aforementioned fields. In 2019, 94 percent of data scientists who were surveyed by Burtch Works (a data science and analytics recruiting firm) had an advanced degree. Forty-seven percent held a master’s degree, and 47 percent held a doctorate.
Typical classes in a data science or data analytics degree program include Introduction to Data Science, Statistical and Mathematical Methods for Data Science, Machine Learning and Computational Statistics, Big Data, Inference and Representation, and Capstone Project in Data Science.
Certification
Many colleges and universities offer undergraduate and graduate certificates in data science, data analytics, business intelligence, and related fields. For example, Georgetown University awards a certificate in data science to students who have a bachelor’s degree or equivalent; have completed at least two college-level math courses (e.g., statistics, calculus); are familiar with Python, a general-purpose programming language used for data analysis; and complete the following courses: Foundations of Data Analytics and Data Science, Software Engineering for Data, Data Sources & Storage, Data Ingestion & Wrangling, Data Analysis I: Statistics, Data Analysis II: Machine Learning, Visual Analytics, and Applied Data Science. Harvard University, Syracuse University, Indiana University at Bloomington, and other colleges and universities offer similar programs. Contact schools in your area for more information.
Other Education or Training
Professional development opportunities are provided by associations, for-profit and nonprofit schools (such as Global Knowledge Training LLC, Coursera, edX, and Udacity), and information technology companies (such as IBM). For example, the Association for Computing Machinery offers more than 1,000 online courses on topics such as Big Data, business intelligence, data management, data mining, data visualization, data warehousing, database design, SAS, SQL, and business skills. The IEEE Computer Society offers online courses on management, project management, and other topics. The International Web Association provides a variety of classes, including Intro to Programming Concepts, Introduction to C#, and Introduction to JavaScript. The American Mathematical Society, American Statistical Association, INFORMS, and DAMA International also offer continuing education opportunities. Contact these organizations for more information.
Certification, Licensing, and Special Requirements
Certification or Licensing
TDWI, a membership association for data science professionals, provides the certified business intelligence professional credential to applicants who meet the following requirements:
- have a bachelor’s or master’s degree in information systems, computer science, accounting, business administration, engineering, mathematics, sciences, or statistics
- have two or more years of full-time experience in computer information systems, data modeling, data planning, data definitions, metadata systems development, enterprise resource planning, systems analysis, application development and programming, or information technology management
- pass examinations
The Institute for Certification of Computing Professionals offers several credentials for data scientists who meet educational and experience requirements and pass an examination, including certified Big Data professional, business data management professional, and certified data scientist. DAMA International provides the certified data management professional credential (at four skill levels) to those who meet experience and educational requirements and pass an examination. INFORMS offers the associate-certified analytics professional and certified analytics professional credentials to those who meet educational and other requirements and pass an examination. Contact these organizations for more information.
Experience, Skills, and Personality Traits
Entry-level data scientists must have one to three years of experience working with large datasets, using databases, and working with tools such as Hadoop (a software framework for distributed data storage and processing) and statistical modeling languages such as R and SAS. This type of experience can be obtained through internships, summer jobs, or cooperative education experiences at data analytics firms.
Data scientists need excellent communication skills, including the ability to explain technical concepts to executives. They must have intellectual curiosity (because Big Data is always changing), creativity (since wrangling Big Data into usable datasets often takes a lot of ingenuity and imagination), strong problem-solving and analytical skills, and the ability to work well both alone and as a member of a team. Familiarity with both the theoretical and applied technical details of predictive modeling, machine learning, statistical analysis, and data visualization is also important. Other important traits include a detail-oriented personality, time-management skills, a strong work ethic, and a willingness to keep learning throughout one’s career.
Data science professionals also need familiarity with many programming languages and software tools. According to the data-focused vendor Figure Eight, the following technical skills are in strongest demand: SQL (a special-purpose language for managing and querying relational databases), Hadoop (an open-source software framework for storing and processing large datasets across clusters of computers), Python and Java (general-purpose programming languages), and R (a language and environment for statistical computing and graphics). Other in-demand programming languages, platforms, data warehousing systems, and tools include C++, Tableau, Ruby, Clojure, MATLAB, Pig, Hive, Spark, SAS, Stata, and SPSS.