A Look Inside the Think Tank...
Last week, I attended the 25th International World Wide Web Conference (WWW2016) that took place from April 11 to 15, 2016 in Montréal, Canada. The main proceedings and the companion proceedings are both available online. Google was one of the gold sponsors and Google Director of Research Peter Norvig delivered one of the main keynotes. This is my trip report with personal highlights and observations.
Workshops, Day 1
I started the conference with the Making Sense of Microposts workshop, which began with an invited talk by Yahoo Research Scientist Mihajlo Grbovic on Leveraging Blogging Activity on Tumblr to Infer Demographics and Interests of Users for Advertising Purposes. As ground truth for their gender prediction, they used US Census data on popular baby names, reaching a precision of 0.806 (recall 0.838) for female users and a precision of 0.794 (recall 0.689) for male users.
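The idea of using name-frequency tables as (noisy) ground truth can be sketched as follows. This is my own minimal illustration, not the paper's pipeline; the name counts below are invented for demonstration, not actual census figures.

```python
# Sketch: infer likely gender from a first name via a name -> counts table,
# in the spirit of using US Census baby-name statistics as noisy ground truth.
# The counts are made up for illustration.
NAME_COUNTS = {
    "emma": {"female": 9800, "male": 40},
    "liam": {"female": 30, "male": 9500},
    "alex": {"female": 2100, "male": 4300},  # ambiguous name
}

def infer_gender(first_name, min_confidence=0.8):
    """Return (label, confidence), or (None, None) if unknown or too ambiguous."""
    counts = NAME_COUNTS.get(first_name.lower())
    if not counts:
        return None, None
    total = counts["female"] + counts["male"]
    label = max(counts, key=counts.get)
    confidence = counts[label] / total
    if confidence < min_confidence:
        return None, None  # ambiguous names yield no label
    return label, confidence
```

Thresholding on confidence is what keeps ambiguous names like "Alex" out of the ground truth, at the cost of coverage.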
I spent the rest of the day session-hopping between the Microposts workshop and the Computational Social Science for the Web tutorial.
Workshops, Day 2
My second day was fully dedicated to the Wiki Workshop that started with surprise guest and Wikipedia co-founder Jimmy Wales, which led to a short discussion of, among other topics, payment and reward models for authors on Wikipedia and Wikia.
The workshop had an interesting concept: invited talks filled the day, and the actual papers were presented at a poster session during lunch. I want to highlight the paper With a Little Help from my Neighbors: Person Name Linking Using the Wikipedia Social Network by J. Geiß and M. Gertz on named entity linking and disambiguation based on their co-occurrence in Wikipedia pages, and the paper Finding Structure in Wikipedia Edit Activity: An Information Cascade Approach by R. Tinati et al. My own paper Wikipedia Tools for Google Spreadsheets introduces a Google Spreadsheets add-on that facilitates working with data from Wikipedia and Wikidata from within a spreadsheet context.
My invited talk at the workshop covered The Wiki(pedia|data) Edit Streams Firehose, which you can see visualized and audiolized in my Wikipedia Screensaver that I developed for the talk and released as open source.
Main Conference, Day 1
The main conference started with a keynote by Sir Tim Berners-Lee, whose talk touched on mobile Web apps—which he prefers over native apps, because "when [one goes] native, [one] become[s] part of a value chain"—and on the need for Web apps to get closer to the capabilities of native apps (he did not mention Service Worker specifically, but it was fairly clear from the context that he was aiming at this API).
After the keynote, I saw the presentation of a paper titled Immersive Recommendation: News and Event Recommendations Using Personal Digital Traces on an approach to leverage cross-platform user profiles for news and event recommendations. The authors' demo worked very well when I tested it with my YouTube and Twitter accounts.
Next, I learned how the team at YouTube deals with spammy comments by analyzing the temporal graph of engagement behavior between users and videos, from the presentation of the paper In a World That Counts: Clustering and Detecting Fake Social Engagement at Scale.
An interesting idea to prevent online trackers from tracking personally identifiable information was shown in the paper Tracking the Trackers by the makers of the Web browser CLIQZ. Their approach leverages concepts from k-anonymity by—rather than working with fixed block lists—having users collectively identify unsafe tracking elements in the background that have the potential to uniquely identify individual users, and by then removing such information from tracking requests.
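The core idea, as I understood it, can be sketched as follows: a tracking-parameter value is only considered safe if at least k distinct users have observed it; rare values are likely unique identifiers and get stripped. This is my own simplified illustration of the k-anonymity principle, not CLIQZ's actual implementation.

```python
from collections import defaultdict

K = 3  # a value seen by fewer than K distinct users is treated as identifying

# value -> set of (anonymous) user ids that have reported seeing it
observations = defaultdict(set)

def report(user_id, value):
    """Users collectively report observed tracking-parameter values in the background."""
    observations[value].add(user_id)

def sanitize(params):
    """Drop parameter values that fewer than K users have seen (likely unique IDs)."""
    return {key: val for key, val in params.items()
            if len(observations[val]) >= K}

# Many users see the same generic value; only one user ever sees the unique ID.
for uid in ("u1", "u2", "u3"):
    report(uid, "en-US")
report("u1", "uid=8f3a9c")

print(sanitize({"lang": "en-US", "tracker": "uid=8f3a9c"}))
```

Only the k-anonymous value survives sanitization; the singleton identifier is removed before the request leaves the browser.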
The paper Crowdsourcing Annotations for Websites' Privacy Policies: Can It Really Work? tackles the issue of lengthy and hard-to-read privacy policies and whether crowdsourcing their annotation can help. The authors come to the conclusion that, if carefully deployed, crowdsourcing can indeed result in the generation of non-trivial annotations and can also help identify elements of ambiguity in policies. A demo with annotated privacy policies shows some examples.
From the poster session, I especially liked Visual Positions of Links and Clicks on Wikipedia that looked at the visual positions of clicked links on Wikipedia based on the Wikipedia clickstream dataset and Travel the World: Analyzing and Predicting Booking Behavior using E-Mail Travel Receipts that examined more than 25 million travel receipts from Yahoo Mail users to predict their booking behavior.
Main Conference, Day 2
Day 2 started with a keynote by Mary Ellen Zurko, Principal Engineer at Cisco Systems, in which she provided a tour down memory lane through security from S-HTTP to Experimenting At Scale With Google Chrome's SSL Warning.
From the research track, I first want to highlight a Yahoo Labs paper on Predicting Pre-click Quality for Native Advertisements. Native ads are defined as a specific form of online advertising, where ads replicate the look-and-feel of their serving platform. The authors introduce the notion of bad ads that have a high Offensive Feedback Rate (OFR), i.e., the relation between the number of times an ad was rated offensive and the number of impressions. According to the paper, the OFR metrics are more reliable than the commonly used click-through rate (CTR) metrics.
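In code, both metrics are simple per-impression rates; this is just my own restatement of the definitions above, with illustrative numbers of my choosing:

```python
def offensive_feedback_rate(offensive_reports, impressions):
    """OFR: offensive-feedback reports per impression."""
    return offensive_reports / impressions if impressions else 0.0

def click_through_rate(clicks, impressions):
    """CTR: clicks per impression."""
    return clicks / impressions if impressions else 0.0

# An ad can have a healthy CTR and still be a "bad ad" by OFR.
impressions, clicks, reports = 100_000, 1_500, 90
print(click_through_rate(clicks, impressions))        # 0.015
print(offensive_feedback_rate(reports, impressions))  # 0.0009
```

The point of the paper is that the two rates capture different things: clicks measure attention, offensive feedback measures harm.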
One of my favorite papers of the conference was Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes (lighter reading: slides, project homepage) that aims at identifying hoaxes on Wikipedia, i.e., deliberately fabricated falsehood made to masquerade as truth. Some famous hoaxes survived for more than nine years and were widely cited in the media.
I continued with the presentation of our Industry Track paper From Freebase to Wikidata: The Great Migration, in which we describe our ongoing data transfer project for migrating the (now shut-down) structured knowledge base Freebase to Wikidata. We further report on the data mapping challenges, provide an analysis of the progress so far, and also describe the Primary Sources Tool that aims to facilitate this—and future—data migrations. The tool has been released as open source.
For me, the day ended with an interesting paper on The QWERTY Effect on the Web—How Typing Shapes the Meaning of Words in Online Human-Computer Interaction. I had never heard of the QWERTY effect before, but it is based on the hypothesis that on average words typed with more letters from the right side of the keyboard are more positive in meaning than words typed with more letters from the left. According to the paper, there is some evidence that this hypothesis also holds true for the Web.
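A toy way to quantify the effect is to score each word by the share of its letters typed with the right hand on a standard QWERTY layout. The hand split below is the usual convention for this layout; the exact operationalization in the paper may differ.

```python
# Letters by typing hand on a standard QWERTY layout (common convention).
RIGHT_HAND = set("yuiophjklnm")
LEFT_HAND = set("qwertasdfgzxcvb")

def right_side_advantage(word):
    """Fraction of right-hand letters minus fraction of left-hand letters, in [-1, 1]."""
    letters = [c for c in word.lower() if c in RIGHT_HAND | LEFT_HAND]
    if not letters:
        return 0.0
    right = sum(c in RIGHT_HAND for c in letters)
    left = len(letters) - right
    return (right - left) / len(letters)
```

Under the QWERTY-effect hypothesis, words with a higher right-side advantage would on average carry more positive meaning.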
Main Conference, Day 3
In the paper Tell Me About Yourself: The Malicious CAPTCHA Attack, the authors show how fake CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart) can trick users into unwillingly disclosing private information, such as one's Facebook name displayed in (social widget) iframes embedded in attack pages. The attack pages themselves cannot read this private data due to the Same Origin Policy, so instead they have users solve fake CAPTCHAs consisting of many CSS-disguised characters of that very information, which the users then transcribe and thereby hand over to the attacker.
Google runs a service called Safe Browsing that alerts users when websites get compromised. In the paper Remedying Web Hijacking: Notification Effectiveness and Webmaster Comprehension, the authors provide a study that captures the life cycle of 760,935 hijacking incidents from July 2014 to June 2015, as identified by Google Safe Browsing and Search Quality. They observe that direct communication with webmasters increases the likelihood of cleanup by over 50% and reduces infection lengths by at least 62%.
Another paper on Wikipedia, Growing Wikipedia Across Languages via Recommendation, detects missing articles, ranks them by local importance, and finally contacts potential Wikipedia editors via email to suggest that they write the article in question. The authors have deployed the Wikipedia GapFinder, which shows the approach in practice.
The Social Media Research Foundation provides a NodeXL-based visualization of the network of tweets that used the #WWW2016 hashtag, including all my #WWW2016 tweets.
One thing I noticed at the conference is that we (and I fully include myself here) from time to time still tend to unconsciously use stereotyped, gendered language where it is inadequate in the general case ("so easy my mom or grandma could use it", "to pass the 'mom test'", etc.). I called this out in a tweet. You may want to follow the interesting conversation it has started on Twitter or Facebook (if you are friends with me). This tweet led Christopher Gutteridge to create the imaginative naive Web user Rube.
Oh, and in the old days, there used to be more bananas… Next conference!
International Conference on Web Engineering (ICWE2015): Trip Report
Last week, I attended the 15th International Conference on Web Engineering in Rotterdam, the Netherlands. Google was one of the industry sponsors and Google Zurich's Enrique Alfonseca delivered one of the keynotes on news processing at Google and general advances in language understanding. As the name suggests, the focus of the conference was on Web engineering aspects; so below, in my list of personal paper highlights, I have also included a number of demo papers:
Beyond Graph Search: Exploring and Exploiting Rich Connected Data Sets: Discusses open questions and research directions for (knowledge) graph search (by a former Bing intern).
Conflict Resolution in Collaborative User Interface Mashups: Shows how to resolve conflicts in a collaborative iGoogle-like user interface using the operational transformation algorithm.
Collaborative Drawing Annotations on Web Videos: Web Components built with Polymer for WebRTC-based collaborative video annotation.
Tilt-and-Tap: Framework to Support Motion-Based Web Interaction Techniques: Nice demo of mobile device interaction patterns that might be interesting for photo gallery exploration.
SUMMA: A Common API for Linked Data Entity Summaries (best paper candidate): An API interface description for comparing entity summaries (i.e., ranked facts for an entity as presented in knowledge panels on Google, Bing, Yahoo!).
Curtains Up! Lights, Camera, Action! Documenting the Creation of Theater and Opera Productions with Linked Data and Web Technologies (disclosure: my paper): Web Components built with Polymer for the creation of hypervideos and the consumption of Linked Data Fragments.
conference in sunny Barcelona, Spain. All conference-related social network discussions and photos were captured on Eventifier. Below are some interesting links for your reading pleasure:
World Wide Web Conference (WWW2015)—Trip Report
The week before last, I attended the 24th International World Wide Web Conference (WWW2015) in Florence, Italy. Google was a gold sponsor, and Google's Distinguished Scientist Andrei Broder delivered one of the main keynotes. The core proceedings and the companion proceedings are available online. This is my trip report with personal highlights and key take-aways.
Workshops, Day 1
I started the conference on Monday with the Workshop on Web APIs and RESTful Design (WS-REST), which I co-organized together with Ruben Verborgh (Ghent University) and Carlos Pedrinaci (The Open University). We had three main themes in the workshop: testing, hypermedia and semantics, and REST in practice. The day started with a keynote delivered by Erik Wilde (ex-Siemens); one of his main points—which also got identified as a general workshop theme—was that the REST world, despite all self-descriptiveness, still needs service descriptions and better testability. Erik shared his keynote slides on his personal website. The WS-REST proceedings can be found online. Personally, I liked Ronnie Mitra's (CA Technologies) slides and paper on his upcoming API design tool Rápido a lot.
One of the workshop attendees, Michael Petychakis, also wrote a workshop report.
On the same day, I also had an accepted paper in the Ad Targeting at Scale workshop (TargetAd), co-organized by Googler D. Sculley. The title of my paper is AdAlyze Redux: Post-Click and Post-Conversion Text Feature Attribution for Sponsored Search Ads. In the paper, I describe a tool in use in my organization at Google that shows large-scale advertisers which textual features work in their ads. The workshop triggered broad industry interest, with presenters and speakers coming from Twitter, Yahoo!, Etsy, Adobe, eBay, Facebook, and Google (D. Sculley). The TargetAd proceedings are available online.
Workshops, Day 2
I spent the first half of Tuesday morning in the Workshop on Linked Data on the Web (LDOW), and the second half in the Workshop on Web and Data Science for News Publishing (NewsWWW). From LDOW, I want to highlight DBpedia Atlas, an alternative visualization of DBpedia (demo). NewsWWW had an interesting paper on gender bias in news images. In the afternoon, I attended Facebook's Antoine Bordes' and Google's Evgeniy Gabrilovich's tutorial on Constructing and Mining Web-scale Knowledge Graphs (slides from KDD 2014, but similar enough to the ones at WWW).
Main Conference, Day 1
The main conference began with a keynote by Jeanette Hofmann (Berlin University of the Arts), who raised a number of critical points that she named "dilemmas of digitalization". She especially mentioned the Right to be Forgotten and how (personal) data has become the currency we pay our free apps with.
A number of papers from Wednesday morning that I want to highlight are The Dynamics of Micro-Task Crowdsourcing: The Case of Amazon MTurk on crowdsourcing with Amazon's Mechanical Turk, a Facebook study titled The Lifecycles of Apps in a Social Ecosystem that examines, among other things, app sustainability, and finally a Google paper on account recovery secret questions titled Secrets, Lies, and Account Recovery: Lessons From the Use of Personal Knowledge Questions at Google.
In the afternoon, I listened to Philipp Singer's presentation of their paper HypTrails: A Bayesian Approach for Comparing Hypotheses about Human Trails on the Web (best paper award), wherein they present "a general approach called HypTrails for comparing a set of hypotheses about human trails on the Web, where hypotheses represent beliefs about transitions between states". Further of interest to me was a Google paper titled Getting More for Less: Optimized Crowdsourcing with Dynamic Tasks and Goals, where the authors "optimize the crowdsourcing process by jointly maximizing the user longevity in the system and the true value that the system derives from user participation". The Yahoo! paper Evolution of Conversations in the Age of Email Overload looked at 16 billion emails between 2 million users and studied reply times and reply lengths as indicators of how people deal with email overload. The task of reproducibly benchmarking entity annotation systems was addressed in the paper GERBIL - General Entity Annotator Benchmarking Framework.
I follow the privacy implications of Web tracking critically (probably due to my day job), so the paper Cookies That Give You Away: The Surveillance Implications of Web Tracking was of great interest to me. I generally liked the track Security and Privacy 3 – Browsers a lot. Related to my PhD research on breaking news events and their perception in online social networks, I enjoyed the paper Crowdsourcing the Annotation of Rumourous Conversations in Social Media very much.
Main Conference, Day 2
I started Thursday after the keynote with an interesting Yahoo! paper on explorative entity search titled From "Selena Gomez" to "Marlon Brando": Understanding Explorative Entity Search, which identified query patterns that lead to explorative searching. A somewhat emotional paper that certainly raises privacy warning flags was Diagnoses, Decisions, and Outcomes: Web Search as Decision Support for Cancer, which examined the search behavior of patients diagnosed with cancer. The paper Mining Missing Hyperlinks from Human Navigation Traces: A Case Study of Wikipedia looked at identifying missing hyperlinks in Wikipedia.
During the lunch break, I called in an informal meeting of the W3C Media Fragments WG and interested friends in order to discuss extensions to Media Fragments URI by allowing for more than rectangular spatial fragment shapes and dynamic moving spatial fragments. The notes are on the mailing list.
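For context, a Media Fragments URI addresses a rectangular region of a media resource with the xywh dimension (and time spans with t). A minimal parser sketch for the rectangular case, covering only the syntax I needed for the discussion:

```python
import re

# Media Fragments URI 1.0 spatial dimension: xywh=[pixel:|percent:]x,y,w,h
XYWH = re.compile(r"xywh=(?:(pixel|percent):)?(\d+),(\d+),(\d+),(\d+)")

def parse_xywh(uri):
    """Extract the rectangular spatial fragment from a media fragment URI, if any."""
    match = XYWH.search(uri)
    if not match:
        return None
    unit = match.group(1) or "pixel"  # pixel is the default unit per the spec
    x, y, w, h = (int(g) for g in match.groups()[1:])
    return {"unit": unit, "x": x, "y": y, "w": w, "h": h}

print(parse_xywh("http://example.com/video.mp4#t=10,20&xywh=160,120,320,240"))
```

The limitation we discussed is visible right in the grammar: xywh can only express static rectangles, not arbitrary shapes or regions that move over time.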
In the afternoon, I attended the Industry Knowledge Graphs PechaKucha 20×20 and Panel, where Googler Chris Welty presented the Google Knowledge Graph, Yuqing Gao gave an overview of Microsoft's (Bing's) Satori, Paul Groth talked about Elsevier's scholarly publications graph, and Lora Aroyo presented Tagasauris' mediaGraph. This also touched on my 20% project together with Googlers Denny Vrandečić and Sebastian Schaffert around migrating Freebase to Wikidata via a crowdsourcing approach, the Primary Sources Tool.
From the posters and demos session in the evening, I want to highlight whoVIS: Visualizing Editor Interactions and Dynamics in Collaborative Writing Over Time, which deals with visualizing editor interactions in Wikipedia (demo).
Main Conference, Day 3
Friday began with Andrei Broder's excellent keynote How good was the crystal ball? A personal perspective and retrospective on favorite Web research topics, in which he first looked back at search engines and at what worked and what did not (e.g., subscribing to pages to obtain change notifications). I especially liked the outlook he gave on semantic smart agents and how Google Now is just the beginning.
Again driven by my PhD topic, I followed the paper presentation of Enquiring Minds: Early Detection of Rumors in Social Media from Enquiry Posts, which showed how scepticism-driven follow-up queries on questionable news-spreading posts may reveal rumors early on. A fun paper was User Review Sites as a Resource for Large-Scale Sociolinguistic Studies, which, among other things, found that users older than 34 mostly use smileys with a nose (":-)"), while those younger than 34 use them without (":)").
- People start to get tired of PDF proceedings; past WWW conferences even explicitly required HTML submissions, as highlighted by Andrei Broder in his keynote. RASH: Research Articles in Simplified HTML is an attempt to bring the Web back to WWW.
- Larry Page and Sergey Brin were awarded the Test of Time award for their paper The Anatomy of a Large-Scale Hypertextual Web Search Engine.
- Splitting the poster and demos session into two sub-sessions is a great idea that certainly reduced my personal perceptual overload.
- Commonly frowned upon and joked about, the lack of any formal speech or program topic at all during the (not so gala) dinner felt somewhat inadequate.
- Paul Groth wrote a WWW trip report, too, as did Amy Guy with her WWW observations, eXascale with their blog post, and Daniel Garijo with his "first time at WWW" post.
Last week, I attended the 13th International Semantic Web Conference in Riva del Garda, Italy. Google was a gold sponsor, and Vice President Prabhakar Raghavan delivered one of the keynotes. This is my trip report with personal highlights and key take-aways.
I started the conference on Sunday with the Developers Workshop, where I had two papers. The workshop was the first of its kind and was put together by my good friend Ruben Verborgh. It drew more than 70 people into the room, and the workshop was prominently featured during the main conference's opening.
Of personal interest to me were the following works. Knowledge Graphs were a core theme during the conference, and one example based on OpenRefine was shown by Parmesan et al. in the form of Dandelion. Liepiņš et al. showed an ontology visualizer called OWLGrEd. Ebner et al. showed a system called LDcache that deals with caching flaky Linked Data sources. With XSPARQL, Dell'Aglio et al. presented a language and implementation combining XML, SPARQL, and SQL to query heterogeneous data sources. Matteis et al. showed how App Engine or Google Code, among others, can be used as "free" and queryable triple pattern data stores. Ceccarelli showed an entity linking framework called Dexter. My first contribution is titled Comprehensive Wikipedia Monitoring for Global and Realtime Natural Disaster Detection and focuses on natural disaster detection and monitoring with Wikipedia and online social networks. My second contribution is a paper called Self-Contained Semantic Hypervideos Using Web Components and introduces Web Components for the creation of hypervideos.
On Monday morning, I attended the Consuming Linked Data workshop. The most interesting paper for me was by Rula et al., which dealt with the recency of facts in DBpedia. In the afternoon, I switched to the NLP and DBpedia workshop where the highlight was an amazing 300 slides in 30 minutes keynote by Roberto Navigli on BabelNet, Babelfy, Games with a Purpose, and the Wikipedia Bitaxonomy. Further of interest was a paper by Weisenburger et al. on mining historical data for DBpedia via Wikipedia infoboxes.
Tuesday started with Prabhakar's well-received keynote, in which he provided an overview of search engine development over the past years. His book has a nice summary. I then went to the NLP & IE track, where the best-paper-award-winning paper on the AGDISTIS entity disambiguation framework by Usbeck et al. was presented. In the afternoon, I attended the Data Integration and Link Discovery track. I liked a paper by Erxleben et al. that described the integration of Wikidata into the Linked Data Web. From the demos in the evening, I want to especially highlight the best-demo-award-winning paper by my friends Verborgh et al. on Linked Data Fragments on a Raspberry Pi. In general, Linked Data Fragments were one of the themes at this conference, with several works citing them and also the release of the official DBpedia Linked Data Fragments interface.
On Wednesday, I attended the User Interaction and Personalization track, where I want to highlight a paper by Uchida et al. who presented a Chrome extension on browser personalization. I further liked a paper by Khamkham et al. on the CrowdTruth framework for harnessing disagreement in gathering annotated data. In the afternoon, my personal highlight was Verborgh et al.'s full paper on Linked Data Fragments.
I skipped Thursday morning and was back in the afternoon for the Linked Data track. Notable papers include Beek et al.'s LOD Laundromat that provides a solution for streamlining access to Linked Data sources by cleansing and format conversion and Patel-Schneider's analysis of Schema.org and some (author's view) recommendations on how to improve it. Meusel et al. gave an overview of the current state of WebDataCommons project that examines Microdata, RDFa, and Microformats distribution in the CommonCrawl corpus.
The conference was archived by Eventifier and Seen. All papers are available as open access preprints on GitHub.