Data Scientists


Exploring this Job

Join the Technology Student Association, which gives students the chance to explore careers in technology, science, engineering, and mathematics; enter academic competitions (including ones that focus on software development and coding); and participate in summer exploration programs. Visit http://www.tsaweb.org for more information.

Participate in open data science, programming, and related competitions for high school students, college students, and professionals. These competitions will allow you to learn more about the field, build your skills, and meet people with similar interests. Some well-known competitions include:

  • Kaggle: https://www.kaggle.com
  • American Statistical Association (ASA) DataFest: http://ww2.amstat.org/education/datafest
  • Consortium for the Advancement of Undergraduate Statistics Education/ASA Undergraduate Statistics Project Competition: https://www.causeweb.org/usproc
  • TopCoder: https://www.topcoder.com
  • Association for Computing Machinery Special Interest Group on Management of Data Student Research Competition: https://sigmod2019.org/student_rs_competition

Finally, check out the following resources to learn more about data science:

  • Computerworld: http://www.computerworld.com
  • Data Science Glossary: http://www.datascienceglossary.org
  • STATtr@k: http://stattrak.amstat.org
  • American Statistical Association journals: http://www.amstat.org/ASA/Publications/Journals.aspx
  • Data Science Association Blog: http://www.datascienceassn.org/blog

The Job

Advances in technology have created mountains of information, including articles and books, blog posts, texts, videos, photographs, maps, electronic medical records, bank records, and census and other survey data. And the mountains grow taller every year. In fact, the global market intelligence firm IDC reports that the amount of data on the planet is expected to grow 10-fold from 2014 to 2020, increasing from 4.4 zettabytes to 44 zettabytes. (One zettabyte is the equivalent of 180 million Libraries of Congress; the Library of Congress holds about 29 million volumes.)
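To put the 10-fold figure in yearly terms, the short Python sketch below works out the compound annual growth rate implied by the two numbers cited above. It assumes steady growth across the six years, which is a simplification for illustration and not something the IDC figures state. Under that assumption, the data volume grows by roughly 47 percent each year.

```python
# Implied compound annual growth rate for a 10-fold increase over six years.
# Assumes steady year-over-year growth (a simplification for illustration).
start_zb = 4.4   # zettabytes in 2014 (figure cited above)
end_zb = 44.0    # zettabytes projected for 2020 (figure cited above)
years = 2020 - 2014

# The yearly multiplier that, applied `years` times, turns start_zb into end_zb.
annual_factor = (end_zb / start_zb) ** (1 / years)
annual_growth_pct = (annual_factor - 1) * 100

print(f"Implied growth: about {annual_growth_pct:.0f}% per year")
```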

These mountains of data (often referred to as Big Data) are like catnip to businesses, which see mountains of dollars in the effective analysis of this information. Such analysis allows them to sell more goods and services to their customers, streamline manufacturing and order fulfillment departments, improve quality control and assurance, create new products and services, manage their brands more effectively, and otherwise better operate their businesses and serve their customers.

Data scientists work with two types of big data: structured and unstructured. The U.S. Department of Labor classifies structured data as “numbers and words that can be easily categorized and analyzed. These data are generated by things like network sensors embedded in electronic devices, smartphones, and global positioning system devices. Structured data also include things like sales figures, account balances, and transaction data. Unstructured data include more complex information, such as customer reviews from commercial Web sites, photos and other multimedia, and comments on social networking sites. These data cannot easily be separated into categories or analyzed numerically.”
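The difference between the two types can be illustrated with a small Python sketch. The sales records and the customer review below are invented for illustration; they are not drawn from any real dataset. The structured records can be totaled directly because every record has the same named fields, while the free-text review has to be broken into words before anything can be counted.

```python
from collections import Counter
import re

# Structured data: every record has the same named fields, so totals are easy.
# (Made-up sales figures for illustration.)
sales = [
    {"region": "East", "amount": 120.0},
    {"region": "West", "amount": 80.0},
    {"region": "East", "amount": 50.0},
]
totals = Counter()
for row in sales:
    totals[row["region"]] += row["amount"]

# Unstructured data: free text has no fields, so structure must be imposed
# first. Here the (made-up) review is split into lowercase words.
review = "Great phone, great battery. The camera is not great in low light."
words = re.findall(r"[a-z']+", review.lower())
word_counts = Counter(words)

print(dict(totals))
print(word_counts.most_common(1))
```

Note that a simple word count reports "great" three times even though one of those uses is "not great." This is one reason unstructured data is harder to analyze numerically than sales figures or account balances.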

Data scientists make sense of structured and unstructured data, developing software to extract and analyze valuable kernels of knowledge from massive datasets. They help companies and other organizations to identify what types of data will be useful, and then develop algorithms that can capture and analyze this data. Data scientists typically work directly with corporate-level executives, providing advice on how best to make use of business intelligence gleaned from data and/or to develop systems that will capture useful data.

Job responsibilities vary for data scientists depending on their employer and other factors, but most perform the following duties:

  • work with organizational leaders to identify data goals and expected outcomes of the use of this data (higher sales, reaching more constituents, etc.)
  • identify what types of data (e.g., text, images, clickstream or metering data) are available and relevant to the organization’s needs
  • identify/create the appropriate algorithms to discover patterns in data and extract useful information from large datasets
  • collect raw data and prepare it for review by data analysts and executives
  • integrate traditional structured data with unstructured data from the Web and social media
  • determine the validity of the data, how long the information will be useful, and how it relates to other information that has already been collected
  • analyze and integrate multiple datasets and, based on their findings, provide recommendations about the implications of the data for products, processes, and decisions
  • research improvements to data collection methods and algorithms   
  • identify new opportunities for enhancing their employer’s products or services, such as adding demographic segments and different data sources
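As a rough illustration of one duty from the list above, integrating traditional structured data with unstructured data, the Python sketch below links purchase records to customer review text. Everything in it is hypothetical: the records, the customer_id field, the NEGATIVE_WORDS list, and the has_complaint helper are invented for this example, and real projects would typically rely on far more sophisticated text analysis.

```python
# Hypothetical records: structured purchases and unstructured review text,
# linked by a made-up customer_id field.
purchases = [
    {"customer_id": 1, "total_spent": 250.0},
    {"customer_id": 2, "total_spent": 40.0},
    {"customer_id": 3, "total_spent": 90.0},
]
reviews = {
    1: "Love the product, fast shipping",
    2: "Arrived broken, very disappointed",
}

NEGATIVE_WORDS = {"broken", "disappointed", "late"}  # illustrative only


def has_complaint(text):
    """Return True if the text contains any word from NEGATIVE_WORDS."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return bool(words & NEGATIVE_WORDS)


# Attach review-derived fields to each structured purchase record.
combined = []
for p in purchases:
    text = reviews.get(p["customer_id"])  # None if the customer left no review
    combined.append({
        **p,
        "has_review": text is not None,
        "complaint": has_complaint(text) if text is not None else False,
    })

for row in combined:
    print(row)
```

Once the review text has been reduced to simple fields such as has_review and complaint, it can be analyzed alongside the spending figures like any other structured column.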