A Look Inside the Think Tank...
I dug into the AMP Toolbox Optimizer a bit and realized that it replaces the inline boilerplate CSS with an externally referenced file called v0.css, and was like 🤔 hmmm, why is that? Here’s a real-world example; check the 2nd request, v0.css. So I tried to understand what was going on. If you care, read on.
Initially, the browser doesn’t know what AMP components like <amp-something> are. It only knows after the AMP runtime (and each component’s library) is loaded. The problem is that browsers are forgiving: they assume unknown tags were written in error, ignore the tags themselves, and still render their contents:
Note the error? I wrote <dif> instead of <div>, and thereby created an unknown tag. The browser still displays my Yeah!, and ignores everything else.
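The example boils down to markup like this (a minimal reconstruction of the example file, not its verbatim source):

```html
<!DOCTYPE html>
<html>
  <body>
    <!-- <dif> is a typo for <div>: the browser has no idea what it is -->
    <dif>Yeah!</dif>
  </body>
</html>
```

The browser skips over the unknown <dif> tags but still renders the text between them.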
It gets worse when you add AMP, as some AMP components alter the box model of elements. You can see the effect in this example file (view source):
The fake <amp-karussell> I have created simulates this issue: you can’t visually differentiate the two side-by-side AMP images from my fake carousel.
The AMP boilerplate essentially makes sure that for a grace period nothing is shown (white screen). You can simulate this by request-blocking the AMP runtime on a real AMP page: you will notice that after the grace period of 8 seconds is over, the layout is messed up, and the <div>s from the last example show up as block elements, just as HTML’s creators intended them to appear:
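For reference, this is roughly what the boilerplate looks like (reformatted and abridged here; the real snippet additionally carries vendor-prefixed copies of the animation). Note the 8s animation, which is exactly the grace period mentioned above; the AMP runtime, once loaded, cancels it and reveals the page as soon as the layout is ready:

```html
<style amp-boilerplate>
  body {
    animation: -amp-start 8s steps(1, end) 0s 1 normal both;
  }
  @keyframes -amp-start {
    from { visibility: hidden }
    to   { visibility: visible }
  }
</style>
<noscript>
  <style amp-boilerplate>
    body { animation: none }
  </style>
</noscript>
```

The <noscript> variant makes sure the page is never permanently blank for users without JavaScript.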
Now to the actual question: why the CSS file? It’s not the boilerplate, but sort of an AMP CSS Normalizer (more on that in the next section). With the AMP Toolbox, you can apply the optimizations to your own site that the AMP Cache would otherwise apply on its CDN. I’m quoting directly from the documentation:
“In order to avoid Flash of Unstyled Content (FOUC) and reflows resulting from the usage of web components, AMP requires websites to add the amp-boilerplate in the header.
The amp-boilerplate renders the page invisible by changing its opacity, while the fonts and the AMP Runtime load. Once the AMP Runtime is loaded, it is able to correctly set the sizes of the custom elements, and once that happens, the runtime makes the page visible again.
As a consequence, the first render of the page doesn’t happen until the AMP Runtime is loaded.
To improve this, AMP server-side rendering applies the same rules as the AMP Runtime on the server. This ensures that the reflow will not happen and the AMP boilerplate is no longer needed. The first render no longer depends on the AMP Runtime being loaded, which improves load times.
Caveats: it’s important to note that, even though the text content and layout will show faster, content that depends on the custom AMP elements (eg: any element in the page that starts with ’amp-’) will only be visible after the AMP Runtime is loaded.”
Looking into the CSS File
So now what does the CSS file that I called AMP CSS Normalizer do? If we look at the beautified source code here, we can see this beauty:
The toolbox optimizes this:

<amp-img width=360 height=200 layout=responsive src=image.png></amp-img>

into this:

<amp-img width="360" height="200" layout="responsive" src="image.png" class="i-amphtml-layout-responsive i-amphtml-layout-size-defined" i-amphtml-layout="responsive"></amp-img>
What this means is that when we have a responsive <amp-img>, the layout classes are already attached to the markup, so even if the browser has no clue what an <amp-img> is, it would still display it as a block element.
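This works because v0.css ships rules that match exactly those injected classes. Conceptually (paraphrased here, not the verbatim file contents), they look along these lines:

```css
/* Paraphrased: v0.css styles the classes the optimizer injects,
   so sizing works before the AMP runtime has loaded. */
.i-amphtml-layout-responsive {
  display: block;
  position: relative;
}
.i-amphtml-layout-size-defined {
  overflow: hidden !important;
}
```

In other words, the CSS file does for unknown AMP elements what the runtime would otherwise have to do with JavaScript after loading.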
Styling actually (and maybe surprisingly) still works, even though the browser otherwise ignores the unknown tag:
beer { border: solid red 1px; }
Wooohoo, <beer>beer!</beer> Cheers!
This makes sure that the <beer> tag does what I told it to do:
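Put together as a complete page, the example looks roughly like this (my reconstruction of the demo):

```html
<!DOCTYPE html>
<html>
  <head>
    <style>
      /* The browser applies CSS selectors even to unknown elements */
      beer { border: solid red 1px; }
    </style>
  </head>
  <body>
    Wooohoo, <beer>beer!</beer> Cheers!
  </body>
</html>
```

Since unknown elements default to display: inline, the red border wraps just the word "beer!", exactly like a styled <span> would.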
Hope this was helpful.
I've released a new Chrome extension today that detects Service Workers in modern websites.
Why would you want this? If you aren't into Web development, most probably you wouldn't. However, if you are into Web development, the extension helps you identify (unexpected) Service Worker registrations in the wild and lets you analyze their code and learn from them.
Why would you use this extension and not just the amazing Chrome Developer Tools? The answer is that the extension proactively detects Service Workers before you even have to open the Developer Tools (which you probably eventually will end up doing anyway).
Last week, I attended the 25th International World Wide Web Conference (WWW2016) that took place from April 11 to 15, 2016 in Montréal, Canada. The main proceedings and the companion proceedings are both available online. Google was one of the gold sponsors and Google Director of Research Peter Norvig delivered one of the main keynotes. This is my trip report with personal highlights and observations.
Workshops, Day 1
I started the conference with the Making Sense of Microposts workshop that began with an invited talk by Yahoo Research Scientist Mihajlo Grbovic on Leveraging Blogging Activity on Tumblr to Infer Demographics and Interests of Users for Advertising Purposes. As ground truth for their gender prediction, they used US Census data on popular baby names, reaching a precision of 0.806 (recall 0.838) for females and a precision of 0.794 (recall 0.689) for males.
I spent the rest of the day session-hopping between the Microposts workshop and the Computational Social Science for the Web tutorial.
Workshops, Day 2
My second day was fully dedicated to the Wiki Workshop that started with surprise guest and Wikipedia co-founder Jimmy Wales, which led to a short discussion of, among other topics, payment and reward models for authors on Wikipedia and Wikia.
The workshop had an interesting concept: invited talks filled the day, while the actual papers were presented at a poster session during lunch. I want to highlight the paper With a Little Help from my Neighbors: Person Name Linking Using the Wikipedia Social Network by J. Geiß and M. Gertz on named entity linking and disambiguation based on co-occurrence in Wikipedia pages, and the paper Finding Structure in Wikipedia Edit Activity: An Information Cascade Approach by R. Tinati et al. My own paper Wikipedia Tools for Google Spreadsheets introduces a Google Spreadsheets add-on that facilitates working with data from Wikipedia and Wikidata from within a spreadsheet context.
My invited talk at the workshop covered The Wiki(pedia|data) Edit Streams Firehose, which you can see visualized and audiolized in my Wikipedia Screensaver that I have developed for the talk and released as open source.
Main Conference, Day 1
The main conference started with a keynote by Sir Tim Berners-Lee, whose talk touched on the topic of mobile Web apps—which he prefers over native apps, because “when [one goes] native, [one] become[s] part of a value chain”—and on the need for Web apps to get closer to the capabilities of native apps (he did not mention Service Worker specifically, but it was somewhat clear from the context that he was aiming at this API).
After the keynote, I saw the presentation of a paper titled Immersive Recommendation: News and Event Recommendations Using Personal Digital Traces on an approach to leverage cross-platform user profiles for news and event recommendations. The authors' demo worked very well when I tested it with my YouTube and Twitter accounts.
Next, I learned how the team at YouTube deals with spammy comments by analyzing the temporal graph of engagement behavior patterns between users and videos, from the paper presentation of In a World That Counts: Clustering and Detecting Fake Social Engagement at Scale.
An interesting idea to prevent online trackers from tracking personally identifiable information was shown in the paper Tracking the Trackers by the makers of the CLIQZ Web browser. Rather than working with fixed block lists, their approach leverages concepts from k-anonymity: users collectively identify, in the background, unsafe tracking elements that have the potential to uniquely identify individual users, and such information is then removed from tracking requests.
The paper Crowdsourcing Annotations for Websites' Privacy Policies: Can It Really Work? tackles the issue of lengthy and hard-to-read privacy policies and whether crowdsourcing their annotation can help. The authors come to the conclusion that, if carefully deployed, crowdsourcing can indeed result in the generation of non-trivial annotations and can also help identify elements of ambiguity in policies. A demo with annotated privacy policies shows some examples.
From the poster session, I especially liked Visual Positions of Links and Clicks on Wikipedia that looked at the visual positions of clicked links on Wikipedia based on the Wikipedia clickstream dataset and Travel the World: Analyzing and Predicting Booking Behavior using E-Mail Travel Receipts that examined more than 25 million travel receipts from Yahoo Mail users to predict their booking behavior.
Main Conference, Day 2
Day 2 started with a keynote by Mary Ellen Zurko, Principal Engineer at Cisco Systems, in which she provided a tour down memory lane through security from S-HTTP to Experimenting At Scale With Google Chrome's SSL Warning.
From the research track, I first want to highlight a Yahoo Labs paper on Predicting Pre-click Quality for Native Advertisements. Native ads are a specific form of online advertising in which ads replicate the look and feel of their serving platform. The authors introduce the notion of bad ads, which have a high Offensive Feedback Rate (OFR), i.e., the ratio between the number of times an ad was rated offensive and its number of impressions. According to the paper, the OFR metric is more reliable than the commonly used click-through rate (CTR) metric.
One of my favorite papers of the conference was Disinformation on the Web: Impact, Characteristics, and Detection of Wikipedia Hoaxes (lighter reading: slides, project homepage) that aims at identifying hoaxes on Wikipedia, i.e., deliberately fabricated falsehood made to masquerade as truth. Some famous hoaxes survived for more than nine years and were widely cited in the media.
I continued with the presentation of our Industry Track paper From Freebase to Wikidata: The Great Migration, in which we describe our ongoing data transfer project for migrating the (now shut-down) structured knowledge base Freebase to Wikidata. We further report on the data mapping challenges, provide an analysis of the progress so far, and also describe the Primary Sources Tool that aims to facilitate this—and future—data migrations. The tool has been released as open source.
For me, the day ended with an interesting paper on The QWERTY Effect on the Web—How Typing Shapes the Meaning of Words in Online Human-Computer Interaction. I had never heard of the QWERTY effect before, but it is based on the hypothesis that on average words typed with more letters from the right side of the keyboard are more positive in meaning than words typed with more letters from the left. According to the paper, there is some evidence that this hypothesis also holds true for the Web.
Main Conference, Day 3
In the paper Tell Me About Yourself: The Malicious CAPTCHA Attack, the authors show how fake CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart) can be used to trick users into unwillingly disclosing private information. The attack page embeds (social widget) iframes whose contents it cannot read itself due to the Same-Origin Policy, disguises them via CSS as CAPTCHA characters, and then has users "solve" the fake CAPTCHA, thereby typing out private data such as their own Facebook name.
Google runs a service called Safe Browsing that alerts users when websites get compromised. In the paper Remedying Web Hijacking: Notification Effectiveness and Webmaster Comprehension, the authors provide a study that captures the life cycle of 760,935 hijacking incidents from July 2014 to June 2015, as identified by Google Safe Browsing and Search Quality. They observe that direct communication with webmasters increases the likelihood of cleanup by over 50% and reduces infection lengths by at least 62%.
Another paper on Wikipedia looked at Growing Wikipedia Across Languages via Recommendation: detecting missing articles, ranking them by local importance, and finally contacting potential Wikipedia editors via email to suggest that they write the article in question. The authors have deployed the Wikipedia GapFinder, which shows the approach in practice.
The Social Media Research Foundation provides a NodeXL-based visualization of the network of tweets that used the #WWW2016 hashtag, including all my #WWW2016 tweets.
One thing I noticed at the conference is that we (and I fully include myself here) from time to time still tend to unconsciously use stereotyped, gendered language where it is inappropriate ("so easy my mom or grandma could use it", "to pass the 'mom test'", etc.). I called this out in a tweet. You may want to follow the interesting conversation it started on Twitter or Facebook (if you are friends with me). This tweet led Christopher Gutteridge to create the imaginative naive Web user Rube.
Oh, and in the old days, there used to be more bananas… Next conference!
International Conference on Web Engineering (ICWE2015): Trip Report
Last week, I attended the 15th International Conference on Web Engineering in Rotterdam, the Netherlands. Google was one of the industry sponsors and Google Zurich's Enrique Alfonseca delivered one of the keynotes on news processing at Google and general advances in language understanding. As the name suggests, the focus of the conference was on Web engineering aspects; so below, in my list of personal paper highlights, I have also included a number of demo papers:
Beyond Graph Search: Exploring and Exploiting Rich Connected Data Sets: Discusses open questions and research directions for (knowledge) graph search (by a former Bing intern).
Conflict Resolution in Collaborative User Interface Mashups: Shows how to resolve conflicts in a collaborative iGoogle-like user interface using the operational transformation algorithm.
Collaborative Drawing Annotations on Web Videos: Web Components built with Polymer for WebRTC-based collaborative video annotation.
Tilt-and-Tap: Framework to Support Motion-Based Web Interaction Techniques: Nice demo of mobile device interaction patterns that might be interesting for photo gallery exploration.
SUMMA: A Common API for Linked Data Entity Summaries (best paper candidate): An API interface description for comparing entity summaries (i.e., ranked facts for an entity as presented in knowledge panels on Google, Bing, Yahoo!).
Curtains Up! Lights, Camera, Action! Documenting the Creation of Theater and Opera Productions with Linked Data and Web Technologies (disclosure: my paper): Web Components built with Polymer for the creation of hypervideos and the consumption of Linked Data Fragments.
conference in sunny Barcelona, Spain. The whole conference-related social network discussions and photos were captured on Eventifier. Below are some interesting links for your reading pleasure: