The 24th annual ACM SIGKDD Conference on Knowledge Discovery and Data Mining, also known as the KDD conference, took place in London this year. Three members of the Twipe team attended: Hannes Buseyne, Jasmien Lismont, and Joris Gielen. Hear from them which lessons they took away and what the highlights of the conference were.
Which research stood out to you?
“I Know You’ll Be Back”, a paper on churn prediction from Snapchat in collaboration with the University of Illinois. They defined churn as a new user not returning in the second week after registering. Although the context is very different, the publishing sector can definitely apply certain concepts from this research:
- For publishers working with freemium or registered access, it’s interesting to hear that Snapchat focuses on a very short on-boarding period for their churn definition.
- By clustering new users based on their activity behaviour and on social network analysis, they discovered 4 groups: tendrils, outsides, core and disconnected users. Churn behaviour differed between these clusters and, moreover, churn was easier to predict for some groups than for others. The techniques they applied can, to a certain extent, be carried over to the publishing sector, for example through our product EngageReaders, in which activity behaviour can be tracked.
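To make the churn definition above concrete, here is a minimal Python sketch of how a publisher might label new registrants. It is an illustration only, not the method from the paper: the function name, the sample readers and the choice to treat "the second week" as days 7 to 13 after the registration date are all assumptions.

```python
from datetime import date, timedelta


def churned_in_second_week(registered_on, active_days):
    """Return True if the user has no activity in days 7-13 after registering.

    Assumption: "second week" means days 7 through 13, counting the
    registration date as day 0. The paper may delimit the window differently.
    """
    window_start = registered_on + timedelta(days=7)
    window_end = registered_on + timedelta(days=13)
    return not any(window_start <= day <= window_end for day in active_days)


# Hypothetical activity log: reader id -> (registration date, days with activity)
users = {
    "reader_a": (date(2018, 9, 3), [date(2018, 9, 3), date(2018, 9, 4), date(2018, 9, 12)]),
    "reader_b": (date(2018, 9, 3), [date(2018, 9, 3), date(2018, 9, 5)]),
}

# Label every reader: True means churned under the assumed definition.
churn_labels = {
    user_id: churned_in_second_week(registered_on, active_days)
    for user_id, (registered_on, active_days) in users.items()
}
print(churn_labels)
```

Labels like these could then serve as the target variable for a prediction model, with the first week's activity as input features.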
Was there anything that you think was overlooked?
The news sector. This year, there was already a KDD workshop on Data Science, Journalism & Digital Media, a great start! It would be nice to have an AI conference focused on news. Certain characteristics make the sector we work in different from a data point of view. What do you do if you have very few responses, sparse data, and news that changes every day (or even every hour) and is outdated after a few hours?
What’s the key message from KDD?
The conference highlighted the very creative ways people are using data science and deep learning, both for problems I had not thought of and in ways I had never thought of.
Which presentation stood out to you?
David Hand, one of the keynote speakers, gave a very interesting presentation about how data science and machine learning are used in the financial services industry. He described how using some features, such as ‘gender’, is forbidden by law because it would amount to discrimination, and he discussed the implications of that.
What was your main takeaway?
Data science can be used everywhere. A lot of the presented papers proposed applications of data science that will directly affect people's everyday lives (e.g. smart traffic lights).
What do you think was overlooked?
Maybe the cost or practicality of implementing some of the proposed solutions in a production environment. For example, we wanted to implement one of the ideas from a paper that was presented, but it would have taken too much memory and processing power to make it work, which made the idea impractical for us.