Chapter 1 ‐ What's in it for Librarians? - sarahwsutton/Introduction_to_datascience_for_librarians GitHub Wiki

1.1 What is data science?

"Data science combines math and statistics, specialized programming, advanced analytics, artificial intelligence (AI) and machine learning with specific subject matter expertise to uncover actionable insights hidden in an organization’s data. These insights can be used to guide decision making and strategic planning" (IBM, 2021).

Don't let the first sentence of this definition of data science throw you off. You can begin to practice applying data science techniques without a degree in math, programming, or machine learning. Consider one phrase in particular: "specific subject matter expertise." Librarians' expertise is in information: obtaining, using, and preserving it. Moreover, using data to "guide decision making and strategic planning" is as important to libraries as it is to any other institution.

Think of data science as another tool in the librarian's tool kit.

Figure 1.1 Data Science as a Discipline

(Viterbi School of Engineering, University of Southern California, 2021)

To get a sense of the breadth of data science, consider statistics. Statistics is a body of mathematical knowledge that’s applied in many different disciplines as well as being a discipline itself. Data science is the same: it’s a discipline in its own right, but it’s also a body of knowledge that can be applied in many other disciplines. If you look closely at the visualization in Figure 1.1, you’ll see words like privacy, ontologies, data streaming, and the social sciences included in the box that is meant to depict the breadth of data science. Those should be familiar terms to you because they’re all things that interest us in library and information science.

1.2 Why learn about data science? What's in it for libraries and librarians?

We'll explore what data science is and how to apply it to work in libraries and information centers by taking a two-pronged approach. The first prong is to consider how libraries might apply data science techniques to library problems, such as using data to support public library outreach efforts or to inform user needs analyses. It's about how we can USE data science in libraries.

The second prong is to consider providing patron services to researchers and scholars who work with data. This is often needed in academic libraries, where it’s called data services or research data services. It relates to libraries’ mission to preserve knowledge. Libraries, especially academic libraries, have been providing this kind of service for quite some time. For instance, academic libraries are often closely associated with institutional archives and thus in charge of maintaining the records related to the history of the institution. They’re also very often in charge of institutional repositories, which collect, preserve, and make accessible the products of research and scholarship produced by faculty and researchers of the institution.

As long ago as the mid-2000s, libraries were providing data services. The Texas Digital Library (TDL) was an effort begun by the big research libraries in Texas to create a single space for storing, preserving, and making freely available the results of research and scholarship by faculty and researchers from colleges and universities in the state. The idea was that rather than each institution spending dollars and staff hours on its own repository, they could combine those efforts, share the workload, and make those research results more widely available. TDL's collection included not only the final reports of research, like journal articles, but also other research by-products such as interim progress reports to grant funders, audio and video materials related to research, and even the raw data on which the research was based.

Continue on to Chapter 2 - Python Programming for Data Science.