A Look Inside the Think Tank...

Service Worker Detector Chrome Extension Released

Created on and categorized as Technical.
Written by Thomas Steiner.

I've released a new Chrome extension today that detects Service Workers in modern websites.

Why would you want this? If you aren't into Web development, most probably you wouldn't. However, if you are into Web development, the extension helps you identify (unexpected) Service Worker registrations in the wild and lets you analyze their code and learn from them.

Why would you use this extension and not just the amazing Chrome Developer Tools? The answer is that the extension proactively detects Service Workers before you even have to open the Developer Tools (which you probably eventually will end up doing anyway).

World Wide Web Conference (WWW2016): Trip Report

Created on and categorized as Work.
Written by Thomas Steiner.

Last week, I attended the 25th International World Wide Web Conference (WWW2016) that took place from April 11 to 15, 2016 in Montréal, Canada. The main proceedings and the companion proceedings are both available online. Google was one of the gold sponsors and Google Director of Research Peter Norvig delivered one of the main keynotes. This is my trip report with personal highlights and observations.

Workshops, Day 1

I started the conference with the Making Sense of Microposts workshop that began with an invited talk by Yahoo Research Scientist Mihajlo Grbovic on Leveraging Blogging Activity on Tumblr to Infer Demographics and Interests of Users for Advertising Purposes. As ground truth for their gender prediction, they used US Census data on popular baby names, reaching a precision of 0.806 (recall 0.838) for female users and a precision of 0.794 (recall 0.689) for male users. I spent the rest of the day session hopping between the Microposts workshop and the Computational Social Science for the Web tutorial.
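To put the reported precision and recall figures on a single scale, one can combine them into an F1 score (the harmonic mean of the two). The talk summary above only gives precision and recall, so the F1 values are derived here and are not numbers reported by the speaker; the function and variable names are made up for this sketch.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the paragraph above.
reported = {"female": (0.806, 0.838), "male": (0.794, 0.689)}

# F1 is derived here; it is an assumption that this is a useful summary.
f1_by_gender = {gender: f1_score(p, r) for gender, (p, r) in reported.items()}

for gender, f1 in f1_by_gender.items():
    print(f"{gender}: F1 = {f1:.3f}")
```

With these inputs, the lower recall for male users pulls their combined score noticeably below the one for female users.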

Workshops, Day 2

My second day was fully dedicated to the Wiki Workshop that started with surprise guest and Wikipedia co-founder Jimmy Wales, which led to a short discussion of, among other topics, payment and reward models for authors on Wikipedia and Wikia.

The workshop had an interesting concept of invited talks that filled the day, and the actual papers being presented at a poster session during lunch. I want to highlight the paper With a Little Help from my Neighbors: Person Name Linking Using the Wikipedia Social Network by J. Geiß and M. Gertz on named entity linking and disambiguation based on their co-occurrence in Wikipedia pages, and the paper Finding Structure in Wikipedia Edit Activity: An Information Cascade Approach by R. Tinati et al. My own paper Wikipedia Tools for Google Spreadsheets introduces a Google Spreadsheets add-on that facilitates working with data from Wikipedia and Wikidata from within a spreadsheet context.

My invited talk at the workshop covered The Wiki(pedia|data) Edit Streams Firehose, which you can see visualized and audiolized in my Wikipedia Screensaver that I have developed for the talk and released as open source.

Main Conference, Day 1

The main conference started with a keynote by Sir Tim Berners-Lee. His talk touched on mobile Web apps, which he prefers over native apps because when [one goes] native, [one] become[s] part of a value chain, and he argued that Web apps need to get closer to the capabilities of native apps (he did not mention Service Worker specifically, but it was somewhat clear from the context that he was aiming at this API).

After the keynote, I saw the presentation of a paper titled Immersive Recommendation: News and Event Recommendations Using Personal Digital Traces on an approach to leverage cross-platform user profiles for news and event recommendations. The authors' demo worked very well when I tested it with my YouTube and Twitter accounts.

Next, from the presentation of the paper In a World That Counts: Clustering and Detecting Fake Social Engagement at Scale, I learned how the team at YouTube deals with spammy comments by analyzing the temporal graph of engagement behavior between users and videos.

An interesting idea to prevent online trackers from tracking personally identifiable information was shown in the paper Tracking the Trackers by the makers of the Web browser CLIQZ. Their approach leverages concepts from k-anonymity by—rather than working with fixed block lists—having users collectively identify unsafe tracking elements in the background that have the potential to uniquely identify individual users, and by then removing such information from tracking requests.
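The k-anonymity idea can be illustrated with a toy sketch: a value in a tracking request is treated as safe only if at least k distinct users have been seen sending it, and everything else is stripped. This is not the CLIQZ implementation; the threshold `K`, the function names, and the sample data are all hypothetical and only illustrate the principle.

```python
from collections import defaultdict

K = 3  # hypothetical anonymity threshold, not a value from the paper


def safe_values(observations, k=K):
    """Return the values that at least k distinct users were seen sending.

    observations is an iterable of (user_id, value) pairs.
    """
    users_per_value = defaultdict(set)
    for user_id, value in observations:
        users_per_value[value].add(user_id)
    return {value for value, users in users_per_value.items() if len(users) >= k}


def scrub_request(params, safe):
    """Drop every request parameter whose value is not k-anonymous."""
    return {key: value for key, value in params.items() if value in safe}


# Made-up observations: the language is shared, the IDs are unique per user.
observations = [
    ("u1", "en-US"), ("u2", "en-US"), ("u3", "en-US"),
    ("u1", "uid-9f3a"), ("u2", "uid-77c1"), ("u3", "uid-0b2e"),
]
safe = safe_values(observations)
scrubbed = scrub_request({"lang": "en-US", "id": "uid-9f3a"}, safe)
print(scrubbed)
```

In this sketch the shared language value survives, while the per-user identifier is removed because only one user ever sent it.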

The paper Crowdsourcing Annotations for Websites' Privacy Policies: Can It Really Work? tackles the issue of lengthy and hard-to-read privacy policies and whether crowdsourcing their annotation can help. The authors come to the conclusion that, if carefully deployed, crowdsourcing can indeed result in the generation of non-trivial annotations and can also help identify elements of ambiguity in policies. A demo with annotated privacy policies shows some examples.

From the poster session, I especially liked Visual Positions of Links and Clicks on Wikipedia that looked at the visual positions of clicked links on Wikipedia based on the Wikipedia clickstream dataset and Travel the World: Analyzing and Predicting Booking Behavior using E-Mail Travel Receipts that examined more than 25 million travel receipts from Yahoo Mail users to predict their booking behavior.

Main Conference, Day 2

Day 2 started with a keynote by Mary Ellen Zurko, Principal Engineer at Cisco Systems, in which she provided a tour down memory lane through security from S-HTTP to Experimenting At Scale With Google Chrome's SSL Warning.

From the research track, I first want to highlight a Yahoo Labs Research paper on Predicting Pre-click Quality for Native Advertisements. Native ads are defined as a specific form of online advertising, where ads replicate the look-and-feel of their serving platform. The authors introduce the notion of bad ads that have a high Offensive Feedback Rate (OFR), i.e., the ratio of the number of times an ad was rated offensive to the number of impressions. According to the paper, the OFR metrics are more reliable than the commonly used click-through rate (CTR) metrics.
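Both OFR and CTR are simple ratios over impressions, which a few lines of code make concrete. The counts below are invented for illustration and do not come from the paper; the helper name `rate` is likewise an assumption.

```python
def rate(events: int, impressions: int) -> float:
    """Return the share of impressions that triggered the given event."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return events / impressions


# Hypothetical counts for two ads; not taken from the paper.
ads = {
    "ad_a": {"impressions": 10_000, "clicks": 300, "offensive": 2},
    "ad_b": {"impressions": 10_000, "clicks": 320, "offensive": 45},
}

metrics = {
    name: {
        "ctr": rate(counts["clicks"], counts["impressions"]),
        "ofr": rate(counts["offensive"], counts["impressions"]),
    }
    for name, counts in ads.items()
}

for name, m in metrics.items():
    print(f"{name}: CTR = {m['ctr']:.4f}, OFR = {m['ofr']:.4f}")
```

In this made-up example the two ads have nearly the same CTR, but the second one is flagged as offensive far more often, which is the kind of difference a CTR-only view would miss.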

One of my favorite papers of the conference was Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes (lighter reading: slides, project homepage) that aims at identifying hoaxes on Wikipedia, i.e., deliberately fabricated falsehood made to masquerade as truth. Some famous hoaxes survived for more than nine years and were widely cited in the media.

I continued with the presentation of our Industry Track paper From Freebase to Wikidata: The Great Migration, in which we describe our ongoing data transfer project for migrating the (now shut-down) structured knowledge base Freebase to Wikidata. We further report on the data mapping challenges, provide an analysis of the progress so far, and also describe the Primary Sources Tool that aims to facilitate this—and future—data migrations. The tool has been released as open source.

For me, the day ended with an interesting paper on The QWERTY Effect on the Web—How Typing Shapes the Meaning of Words in Online Human-Computer Interaction. I had never heard of the QWERTY effect before, but it is based on the hypothesis that on average words typed with more letters from the right side of the keyboard are more positive in meaning than words typed with more letters from the left. According to the paper, there is some evidence that this hypothesis also holds true for the Web.
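To make the hypothesis tangible, here is a small sketch that scores a word by counting its right-side letters minus its left-side letters. The split of the keyboard into two halves follows the usual touch-typing hand assignment, and both the split and the simple difference score are assumptions made for this example rather than details taken from the paper.

```python
# Assumed hand assignment on a QWERTY keyboard (touch typing).
LEFT_LETTERS = set("qwertasdfgzxcvb")
RIGHT_LETTERS = set("yuiophjklnm")


def right_side_advantage(word: str) -> int:
    """Return the count of right-side letters minus left-side letters."""
    letters = word.lower()
    right = sum(ch in RIGHT_LETTERS for ch in letters)
    left = sum(ch in LEFT_LETTERS for ch in letters)
    return right - left


for word in ["happy", "pool", "sad", "web"]:
    print(word, right_side_advantage(word))
```

A positive score means the word is typed mostly with the right hand; under the hypothesis, such words would on average carry a more positive meaning.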

Main Conference, Day 3

In the paper Tell Me About Yourself: The Malicious CAPTCHA Attack, the authors show how fake CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart) can trick users into unwittingly disclosing private information, such as their Facebook name displayed in (social widget) iframes embedded in attack pages. Due to the Same Origin Policy, the attack pages themselves do not have access to this private data, so they have users solve fake CAPTCHAs that consist of many CSS-disguised iframes.

Google runs a service called Safe Browsing that alerts users when websites get compromised. In the paper Remedying Web Hijacking: Notification Effectiveness and Webmaster Comprehension, the authors provide a study that captures the life cycle of 760,935 hijacking incidents from July 2014 to June 2015, as identified by Google Safe Browsing and Search Quality. They observe that direct communication with webmasters increases the likelihood of cleanup by over 50% and reduces infection lengths by at least 62%.

Another paper on Wikipedia looked at Growing Wikipedia Across Languages via Recommendation by detecting missing articles, ranking them by local importance, and finally contacting potential Wikipedia editors via email and suggesting that they write the article in question. The authors have deployed the Wikipedia GapFinder that shows the approach in practice.

Other Observations

The Social Media Research Foundation provides a NodeXL-based visualization of the network of tweets that used the #WWW2016 hashtag, including all my #WWW2016 tweets.

One thing I noticed at the conference is that we (and I fully include myself here) from time to time still tend to unconsciously use stereotyped, gendered language where it is inadequate in the general case ("so easy my mom or grandma could use it", "to pass the 'mom test'", etc.). I called this out in a tweet. You may want to follow the interesting conversation it has started on Twitter or Facebook (if you are friends with me). This tweet led Christopher Gutteridge to create the imaginary naive Web user Rube.

Oh, and in the old days, there used to be more bananas… Next conference!

International Conference on Web Engineering (ICWE2015): Trip Report

Created on and categorized as Work.
Written by Thomas Steiner.

Last week, I attended the 15th International Conference on Web Engineering in Rotterdam, the Netherlands. Google was one of the industry sponsors and Google Zurich's Enrique Alfonseca delivered one of the keynotes on news processing at Google and general advances in language understanding. As the name suggests, the focus of the conference was on Web engineering aspects; so below, in my list of personal paper highlights, I have also included a number of demo papers:

Beyond Graph Search: Exploring and Exploiting Rich Connected Data Sets: Discusses open questions and research directions for (knowledge) graph search (by a former Bing intern).

Conflict Resolution in Collaborative User Interface Mashups: Shows how to resolve conflicts in a collaborative iGoogle-like user interface using the operational transformation algorithm.

Collaborative Drawing Annotations on Web Videos: Web Components built with Polymer for WebRTC-based collaborative video annotation.

Tilt-and-Tap: Framework to Support Motion-Based Web Interaction Techniques: Nice demo of mobile device interaction patterns that might be interesting for photo gallery exploration.

SUMMA: A Common API for Linked Data Entity Summaries (best paper candidate): An API description for comparing entity summaries (i.e., ranked facts for an entity as presented in knowledge panels on Google, Bing, Yahoo!).

Curtains Up! Lights, Camera, Action! Documenting the Creation of Theater and Opera Productions with Linked Data and Web Technologies (disclosure: my paper): Web Components built with Polymer for the creation of hypervideos and the consumption of Linked Data Fragments.

Mediterranea.JS: Trip Report

Created on and categorized as Technical.
Written by Thomas Steiner.

This week, I attended and spoke at Mediterranea.JS, a JavaScript developer conference in sunny Barcelona, Spain. All conference-related social network discussions and photos were captured on Eventifier. Below are some interesting links for your reading pleasure:

World Wide Web Conference (WWW2015): Trip Report

Created on and categorized as Work.
Written by Thomas Steiner.

The week before last, I attended the 24th International World Wide Web Conference (WWW2015) in Florence, Italy. Google was a gold sponsor, and Google's Distinguished Scientist Andrei Broder delivered one of the main keynotes. The core proceedings and the companion proceedings are available online. This is my trip report with personal highlights and key take-aways.

Workshops, Day 1

I started the conference on Monday with the Workshop on Web APIs and RESTful Design (WS-REST) that I co-organized with Ruben Verborgh (Ghent University) and Carlos Pedrinaci (The Open University). We had three main themes in the workshop: testing, hypermedia and semantics, and REST in practice. The day started with a keynote delivered by Erik Wilde (ex-Siemens); one of his main points, which also emerged as a general workshop theme, was that the REST world, despite all self-descriptiveness, still needs service descriptions and better testability. Erik shared his keynote slides on his personal website. The WS-REST proceedings can be found online. Personally, I liked Ronnie Mitra's (CA Technologies) slides and paper on his upcoming API design tool Rápido a lot.

One of the workshop attendees, Michael Petychakis, also wrote a workshop report.

On the same day, I also had an accepted paper in the Workshop Ad Targeting at Scale (TargetAd), co-organized by Googler D. Sculley. The title of my paper is AdAlyze Redux: Post-Click and Post-Conversion Text Feature Attribution for Sponsored Search Ads. In the paper, I describe a tool in use in my organization at Google to show large-scale advertisers what textual features work in their ads. The workshop triggered broad industry interest with presenters and speakers coming from Twitter, Yahoo!, Etsy, Adobe, eBay, Facebook, and Google (D. Sculley). The TargetAd proceedings are available online.

Workshops, Day 2

I spent the first half of Tuesday morning in the Workshop Linked Data on the Web (LDOW), and the second half in the Workshop on Web and Data Science for News Publishing (NewsWWW). From LDOW, I want to highlight DBpedia Atlas, an alternative visualization of DBpedia (demo). NewsWWW had an interesting paper on gender bias in news images. In the afternoon, I attended Facebook's Antoine Bordes' and Google's Evgeniy Gabrilovich's tutorial on Constructing and Mining Web-scale Knowledge Graphs (slides from KDD 2014, but similar enough to the ones at WWW).

Main Conference, Day 1

The main conference began with a keynote by Jeanette Hofmann (Berlin University of the Arts), who raised a number of critical points that she named "dilemmas of digitalization". She especially mentioned the Right to be Forgotten and how (personal) data has become the currency we pay our free apps with.

A number of papers from Wednesday morning that I want to highlight are The Dynamics of Micro-Task Crowdsourcing: The Case of Amazon MTurk on crowdsourcing with Amazon's Mechanical Turk, a Facebook study on The Lifecycles of Apps in a Social Ecosystem where they study, among other things, app sustainability, and finally a Google paper on account recovery secret questions titled Secrets, Lies, and Account Recovery: Lessons From the Use of Personal Knowledge Questions at Google.

In the afternoon, I listened to Philipp Singer's presentation of their paper HypTrails: A Bayesian Approach for Comparing Hypotheses about Human Trails on the Web (best paper award), wherein they present "a general approach called HypTrails for comparing a set of hypotheses about human trails on the Web, where hypotheses represent beliefs about transitions between states". Further of interest to me was a Google paper titled Getting More for Less: Optimized Crowdsourcing with Dynamic Tasks and Goals where the authors "optimize the crowdsourcing process by jointly maximizing the user longevity in the system and the true value that the system derives from user participation". The Yahoo! paper Evolution of Conversations in the Age of Email Overload looked at 16 billion emails between 2 million users and studied the reply times and reply lengths as indicators of how people deal with email overload. The task of benchmarking entity annotation systems reproducibly was addressed in the paper GERBIL - General Entity Annotator Benchmarking Framework.

I follow privacy implications of Web tracking critically (probably due to my day job), so the paper Cookies That Give You Away: The Surveillance Implications of Web Tracking was of great interest to me. I generally liked the track Security and Privacy 3 – Browsers a lot. Related to my PhD research on breaking news events and their perception in online social networks, I enjoyed the paper Crowdsourcing the Annotation of Rumourous Conversations in Social Media very much.

Main Conference, Day 2

I started Thursday after the keynote with an interesting Yahoo! paper on explorative entity search titled From "Selena Gomez" to "Marlon Brando": Understanding Explorative Entity Search that identified query patterns that lead to explorative searching. A somewhat emotional paper that certainly raises privacy warning flags was Diagnoses, Decisions, and Outcomes: Web Search as Decision Support for Cancer, which examined the search behavior of patients diagnosed with cancer. The paper Mining Missing Hyperlinks from Human Navigation Traces: A Case Study of Wikipedia looked at identifying missing hyperlinks in Wikipedia.

During the lunch break, I convened an informal meeting of the W3C Media Fragments WG and interested friends in order to discuss extensions to Media Fragments URI that would allow for spatial fragment shapes other than rectangles and for dynamically moving spatial fragments. The notes are on the mailing list.
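For context, Media Fragments URI currently expresses temporal fragments as `#t=start,end` and rectangular spatial fragments as `#xywh=x,y,w,h` (optionally prefixed with `pixel:` or `percent:`). The following is a minimal parsing sketch for just these two dimensions; it only handles plain seconds for `t` and integer coordinates for `xywh`, the function name is made up, and the example URL is hypothetical. It is not a complete implementation of the specification.

```python
from urllib.parse import parse_qsl, urlsplit


def parse_media_fragment(url: str) -> dict:
    """Parse the t= and xywh= dimensions of a media fragment URI.

    Simplified sketch: times must be plain seconds (no hh:mm:ss),
    and spatial coordinates must be integers.
    """
    result = {}
    for name, value in parse_qsl(urlsplit(url).fragment):
        if name == "t":
            if value.startswith("npt:"):
                value = value[len("npt:"):]
            start, _, end = value.partition(",")
            result["t"] = (
                float(start) if start else 0.0,  # a missing start means 0
                float(end) if end else None,     # a missing end means "until the end"
            )
        elif name == "xywh":
            unit = "pixel"  # default unit when no prefix is given
            if ":" in value:
                unit, value = value.split(":", 1)
            x, y, w, h = (int(part) for part in value.split(","))
            result["xywh"] = {"unit": unit, "x": x, "y": y, "w": w, "h": h}
    return result


# Hypothetical example URL.
fragment = parse_media_fragment(
    "http://example.com/video.mp4#t=10,20&xywh=160,120,320,240"
)
print(fragment)
```

The `xywh` dimension can only describe a static rectangle, which is the limitation the discussed extensions would address.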

In the afternoon, I attended the Industry Knowledge Graphs PechaKucha 20×20 and Panel where Googler Chris Welty presented the Google Knowledge Graph, Yuqing Gao gave an overview of Microsoft's (Bing's) Satori, Paul Groth talked about Elsevier's scholarly publications graph, and Lora Aroyo presented Tagasauris' mediaGraph. This also touched on my 20% project together with Googlers Denny Vrandečić and Sebastian Schaffert around migrating Freebase to Wikidata via a crowdsourcing approach called the Primary Sources Tool.

From the posters and demos session in the evening, I want to highlight whoVIS: Visualizing Editor Interactions and Dynamics in Collaborative Writing Over Time, which deals with visualizing editor interactions in Wikipedia (demo).

Main Conference, Day 3

Friday began with Andrei Broder's excellent keynote How good was the crystal ball? A personal perspective and retrospective on favorite Web research topics where he first looked back at search engines and what worked and what did not work (subscribing to pages for obtaining change notifications). I especially liked the outlook he gave for semantic smart agents and how Google Now is just the beginning.

Again driven by my PhD topic, I followed the presentation of the paper Enquiring Minds: Early Detection of Rumors in Social Media from Enquiry Posts, whose authors showed how scepticism-driven follow-up queries on questionable news-spreading posts may reveal rumors early on. A fun paper was User Review Sites as a Resource for Large-Scale Sociolinguistic Studies that, among other things, found that users older than 34 mostly use smileys with a nose (":-)"), whereas those younger than 34 mostly use smileys without a nose (":)").

General observations

  • People are starting to get tired of PDF proceedings, given that past WWW conferences explicitly required HTML submissions, as highlighted by Andrei Broder in his keynote. RASH: Research Articles in Simplified HTML is an attempt to bring the Web back to WWW.
  • Larry Page and Sergey Brin were awarded the Test of Time award for their paper The Anatomy of a Large-Scale Hypertextual Web Search Engine.
  • Splitting the poster and demos session into two sub-sessions is a great idea that certainly reduced my personal perceptual overload.
  • Although formal dinner speeches are commonly frowned upon and joked about, the complete lack of any formal speech or program item during the (not so gala) dinner felt somewhat inadequate.
  • Paul Groth wrote a WWW trip report, too, as did Amy Guy with her WWW observations, eXascale with their blog post, and Daniel Garijo with his "first time at WWW" post.