Last week, I attended the 25th International World Wide Web Conference (WWW2016) that took place from April 11 to 15, 2016 in Montréal, Canada. The main proceedings and the companion proceedings are both available online. Google was one of the gold sponsors and Google Director of Research Peter Norvig delivered one of the main keynotes. This is my trip report with personal highlights and observations.
Workshops, Day 1
I started the conference with the Making Sense of Microposts workshop that began with an invited talk by Yahoo Research Scientist Mihajlo Grbovic on Leveraging Blogging Activity on Tumblr to Infer Demographics and Interests of Users for Advertising Purposes. As ground truth for their gender prediction, the authors used US Census data on popular baby names. They reached a precision of 0.806 (recall 0.838) for female users and a precision of 0.794 (recall 0.689) for male users. I spent the rest of the day session hopping between the Microposts workshop and the Computational Social Science for the Web tutorial.
My second day was fully dedicated to the Wiki Workshop that started with surprise guest and Wikipedia co-founder Jimmy Wales, which led to a short discussion of, among other topics, payment and reward models for authors on Wikipedia and Wikia.
The workshop had an interesting concept of invited talks that filled the day, and the actual papers being presented at a poster session during lunch. I want to highlight the paper With a Little Help from my Neighbors: Person Name Linking Using the Wikipedia Social Network by J. Geiß and M. Gertz on named entity linking and disambiguation based on their co-occurrence in Wikipedia pages, and the paper Finding Structure in Wikipedia Edit Activity: An Information Cascade Approach by R. Tinati et al. My own paper Wikipedia Tools for Google Spreadsheets introduces a Google Spreadsheets add-on that facilitates working with data from Wikipedia and Wikidata from within a spreadsheet context.
My invited talk at the workshop covered The Wiki(pedia|data) Edit Streams Firehose, which you can see visualized and audiolized in my Wikipedia Screensaver that I have developed for the talk and released as open source.
Main Conference, Day 1
The main conference started with a keynote by Sir Tim Berners-Lee, whose talk touched on mobile Web apps, which he prefers over native apps because "when [one goes] native, [one] become[s] part of a value chain". He also argued that Web apps need to get closer to the capabilities of native apps (he did not mention Service Worker specifically, but it was somewhat clear from the context that he was aiming at this API).
After the keynote, I saw the presentation of a paper titled Immersive Recommendation: News and Event Recommendations Using Personal Digital Traces on an approach to leverage cross-platform user profiles for news and event recommendations. The authors' demo worked very well when I tested it with my YouTube and Twitter accounts.
Next, from the presentation of the paper In a World That Counts: Clustering and Detecting Fake Social Engagement at Scale, I learned how the team at YouTube deals with spammy comments by analyzing the temporal graph of engagement behavior patterns between users and videos.
An interesting idea to prevent online trackers from tracking personally identifiable information was shown in the paper Tracking the Trackers by the makers of the Web browser CLIQZ. Their approach leverages concepts from k-anonymity by—rather than working with fixed block lists—having users collectively identify unsafe tracking elements in the background that have the potential to uniquely identify individual users, and by then removing such information from tracking requests.
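The k-anonymity idea can be illustrated with a minimal sketch. This is not the CLIQZ implementation: the threshold `K`, the `SafeValueRegistry` class, and the `scrub` helper are hypothetical names made up for illustration, and the way users report values is heavily simplified. A parameter value counts as safe only once at least `K` distinct users have reported seeing it; everything else is treated as potentially identifying and dropped from the tracking request.

```python
from collections import defaultdict

K = 3  # hypothetical threshold: a value must be seen by at least K users


class SafeValueRegistry:
    """Toy collective registry of values observed in tracking requests."""

    def __init__(self, k: int = K) -> None:
        self.k = k
        self._seen_by: dict[str, set[str]] = defaultdict(set)

    def report(self, user_id: str, value: str) -> None:
        """Record that a given user has seen this value in a request."""
        self._seen_by[value].add(user_id)

    def is_safe(self, value: str) -> bool:
        """A value shared by at least k users cannot single out one of them."""
        return len(self._seen_by.get(value, ())) >= self.k


def scrub(params: dict[str, str], registry: SafeValueRegistry) -> dict[str, str]:
    """Keep only the parameters whose values are considered safe."""
    return {key: val for key, val in params.items() if registry.is_safe(val)}


registry = SafeValueRegistry()
for user in ("alice", "bob", "carol"):
    registry.report(user, "en")        # a value many users share
registry.report("alice", "uid-8f3a")   # a value only one user ever sends

request = {"lang": "en", "uid": "uid-8f3a"}
print(scrub(request, registry))
```

In this toy run, the shared language value survives while the identifier seen by a single user is removed, without any fixed block list being involved.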
The paper Crowdsourcing Annotations for Websites' Privacy Policies: Can It Really Work? tackles the issue of lengthy and hard-to-read privacy policies and whether crowdsourcing their annotation can help. The authors come to the conclusion that, if carefully deployed, crowdsourcing can indeed result in the generation of non-trivial annotations and can also help identify elements of ambiguity in policies. A demo with annotated privacy policies shows some examples.
From the poster session, I especially liked Visual Positions of Links and Clicks on Wikipedia that looked at the visual positions of clicked links on Wikipedia based on the Wikipedia clickstream dataset and Travel the World: Analyzing and Predicting Booking Behavior using E-Mail Travel Receipts that examined more than 25 million travel receipts from Yahoo Mail users to predict their booking behavior.
Day 2 started with a keynote by Mary Ellen Zurko, Principal Engineer at Cisco Systems, in which she took the audience on a trip down memory lane through security, from S-HTTP to Experimenting At Scale With Google Chrome's SSL Warning.
From the research track, I first want to highlight a Yahoo Labs paper on Predicting Pre-click Quality for Native Advertisements. Native ads are defined as a specific form of online advertising, where ads replicate the look-and-feel of their serving platform. The authors introduce the notion of bad ads that have a high Offensive Feedback Rate (OFR), i.e., the ratio of the number of times an ad was rated offensive to the number of impressions. According to the paper, the OFR metrics are more reliable than the commonly used click-through rate (CTR) metrics.
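As a small illustration of the two ratios mentioned above, here is a sketch that computes OFR and CTR from per-ad counters. The `AdStats` class, its field names, and the example numbers are made up for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class AdStats:
    """Hypothetical per-ad counters; the field names are assumptions."""
    impressions: int
    clicks: int
    offensive_reports: int


def offensive_feedback_rate(stats: AdStats) -> float:
    """OFR: times the ad was rated offensive divided by impressions."""
    if stats.impressions == 0:
        return 0.0  # avoid division by zero for ads that were never shown
    return stats.offensive_reports / stats.impressions


def click_through_rate(stats: AdStats) -> float:
    """CTR: clicks divided by impressions."""
    if stats.impressions == 0:
        return 0.0
    return stats.clicks / stats.impressions


# Invented example numbers, purely for demonstration.
ad = AdStats(impressions=20_000, clicks=300, offensive_reports=8)
print(f"OFR = {offensive_feedback_rate(ad):.4%}")
print(f"CTR = {click_through_rate(ad):.4%}")
```

Both metrics share the same denominator, so they can be compared per ad on the same impression counts.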
One of my favorite papers of the conference was Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes (lighter reading: slides, project homepage) that aims at identifying hoaxes on Wikipedia, i.e., deliberately fabricated falsehoods made to masquerade as truth. Some famous hoaxes survived for more than nine years and were widely cited in the media.
I continued with the presentation of our Industry Track paper From Freebase to Wikidata: The Great Migration, in which we describe our ongoing data transfer project for migrating the (now shut-down) structured knowledge base Freebase to Wikidata. We further report on the data mapping challenges, provide an analysis of the progress so far, and also describe the Primary Sources Tool that aims to facilitate this and future data migrations. The tool has been released as open source.
For me, the day ended with an interesting paper on The QWERTY Effect on the Web: How Typing Shapes the Meaning of Words in Online Human-Computer Interaction. I had never heard of the QWERTY effect before. It is based on the hypothesis that, on average, words typed with more letters from the right side of the keyboard are more positive in meaning than words typed with more letters from the left side. According to the paper, there is some evidence that this hypothesis also holds true for the Web.
Main Conference, Day 3
In the paper Tell Me About Yourself: The Malicious CAPTCHA Attack, the authors show how fake CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart) can be used to trick users into unwillingly disclosing private information, like one's Facebook name displayed in (social widget) iframes embedded in attack pages. Due to the Same Origin Policy, the attack pages themselves do not have access to this private data, so they have users solve such fake CAPTCHAs consisting of many CSS-disguised
Google runs a service called Safe Browsing that alerts users when websites get compromised. In the paper Remedying Web Hijacking: Notification Effectiveness and Webmaster Comprehension, the authors provide a study that captures the life cycle of 760,935 hijacking incidents from July 2014 to June 2015, as identified by Google Safe Browsing and Search Quality. They observe that direct communication with webmasters increases the likelihood of cleanup by over 50% and reduces infection lengths by at least 62%.
Another paper on Wikipedia looked at Growing Wikipedia Across Languages via Recommendation by detecting missing articles, ranking them by local importance, and finally contacting potential Wikipedia editors via email and suggesting that they write the article in question. The authors have deployed the Wikipedia GapFinder that shows the approach in practice.
Other Observations
One thing I noticed at the conference is that we (and I fully include myself here) from time to time still tend to unconsciously use stereotyped, gendered language where it is inadequate in the general case ("so easy my mom or grandma could use it", "to pass the 'mom test'", etc.). I called this out in a tweet. You may want to follow the interesting conversation it has started on Twitter or Facebook (if you are friends with me). This tweet led Christopher Gutteridge to create the imaginary naive Web user Rube.
"Can we stop calling the cliché inexperienced Web user our mom or our grandma? #WWW2016" (Thomas Steiner, @tomayac, April 14, 2016)
Oh, and in the old days, there used to be more bananas… Next conference!