Revolutionizing Chinese Text Mining by Olivier Khorasani ’24
Wed, November 8th, 2023
1:10 pm - 1:50 pm
Revolutionizing Chinese Text Mining: The Unsupervised Top-Down Analysis of Chinese Texts by Olivier Khorasani ’24, Wednesday, November 8, 1:10 – 1:50 pm, North Science Building 015, Wachenheim
Abstract:
In our current era of abundant digitized text data, the need for effective computational tools to extract information from Chinese texts is paramount. In this talk, we will dive into a groundbreaking approach known as TopWORDS, an unsupervised method that simultaneously discovers and segments words and phrases in unstructured Chinese texts. TopWORDS addresses the challenge of dealing with unknown vocabularies, making it particularly valuable for mining online and domain-specific content, especially for East Asian languages that do not have an alphabet. When coupled with context analysis tools, it yields results that rival or surpass those of previous supervised segmentation methods, revolutionizing Chinese text analysis as a whole.