Advances in Data Mining and Machine Learning for Chat Sentiment and Library Account-Based Recommendations

Library transactional data from chat transactions and subject metadata in checkout clusters represent largely untapped areas for innovation. Two recent projects at a research library have highlighted the applicability of machine learning methods to reveal trends in large sets of library transactional data. This presentation will detail the machine learning methods used in those two projects: an account-based recommender service and the mining of chat transcripts for sentiment analysis. A contention of this talk is that research library systems hold vast stores of use data whose size precludes regular analysis through traditional manual methods or basic search queries. Machine learning offers great potential to routinely analyze library big data and provide new sources of insight into user behavior and needs.

The account-based recommendations start from clusters of items checked out together, which the integrated library system records at checkout. Examples from “consumer data science” (e.g., Netflix) make clear that large corpora receiving millions of ratings daily are part of the strategy for creating compelling recommender algorithms. Topic metadata clusters, collected from transactional checkout data for items that are checked out together, form the basis for generating a rule set. After nearly a year of data stream collection, the system has collected over 250,000 rows of anonymized transactions representing checkouts with topic metadata. The research team used the data mining tool WEKA to run a machine learning process offline.

Chat transcripts were analyzed using methods from sentiment mining of social media data and product reviews to build and test an automated sentiment analyzer. Anonymized transcripts were human-coded for sentiment to produce a gold standard dataset. Freely available natural language processing tools using Python and scikit-learn were then trained and tested on the dataset to develop an automated sentiment classifier. The classifier achieved high levels of precision and accuracy on the test set, and the study revealed a number of fruitful paths for refining the analysis and building it into routine assessment activities.
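As a rough illustration of the sentiment workflow described in the abstract (human-coded transcripts used to train and test a scikit-learn classifier), a minimal sketch might look like the following. The abstract does not say which features or model the project used, so the TF-IDF features and logistic regression here are assumptions, and the example chat lines, labels, and function names are hypothetical stand-ins for the anonymized gold standard data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score
from sklearn.pipeline import make_pipeline

# Hypothetical stand-ins for human-coded chat lines (not real transcript data).
train_texts = [
    "thank you so much, that was exactly what I needed",
    "great, the link works perfectly now",
    "this is really helpful, thanks",
    "I am frustrated, the database still will not load",
    "that did not answer my question at all",
    "this is useless, I have been waiting forever",
]
train_labels = ["positive", "positive", "positive",
                "negative", "negative", "negative"]

# Held-out lines, also hypothetical, used only for evaluation.
test_texts = [
    "thanks, that was helpful",
    "still frustrated, nothing will load",
]
test_labels = ["positive", "negative"]


def build_classifier():
    """Assumed model choice: word/bigram TF-IDF features plus logistic regression."""
    return make_pipeline(
        TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),
    )


def evaluate(model, texts, labels):
    """Report the two measures named in the abstract: accuracy and precision."""
    predicted = model.predict(texts)
    return {
        "accuracy": accuracy_score(labels, predicted),
        "precision": precision_score(
            labels, predicted, pos_label="positive", zero_division=0
        ),
    }


classifier = build_classifier().fit(train_texts, train_labels)
scores = evaluate(classifier, test_texts, test_labels)
print(scores)
```

With a real gold standard dataset, the hand-written lists would be replaced by the coded transcripts and a proper train/test split; scores from a toy sample this small say nothing about real-world performance.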

Speaker(s)

Jim Hahn

David Ward

February 14th

03:40 PM

20 minutes
