Best Practices for Hacking Third-Party Sites

While customizing vendor web services is not the most glamorous task, it’s something almost every library does. Whether we have full access to a templating system, as with LibGuides 2, or merely the ability to insert an HTML header or footer, as on many database platforms, we are tested by platform limitations and a desire to make our organization’s fractured web presence cohesive and usable.

What does customizing a vendor site look like? Let’s look at one example before going into best practices. Many libraries subscribe to EBSCO databases, which have a corresponding administrative interface, “EBSCOadmin”. Electronic Resources and Web Librarians commonly have credentials for these admin sites. When we sign into EBSCOadmin, there are numerous configuration options for our database subscriptions, including a “branding” tab under the “Customize Services” section.

While EBSCO’s branding options include specifying the primary and secondary colors of their databases, there’s also a “bottom branding” section which allows us to inject custom HTML. Branding colors can be important, but this post focuses on effectively injecting markup onto vendor web pages. The steps for doing so in EBSCOadmin are numerous and not informative for any other system, but the point is that, given custom HTML access, we can make many modifications: inserting text on the page, adding an entirely new stylesheet, or modifying user interface behavior with JavaScript. Below, I’ve turned footer links orange and written a message to my browser’s JavaScript console using the custom HTML options in EBSCOadmin.

[Image: customized EBSCO database]

These opportunities for customization come in many flavors. We might have access only to a section of HTML in the header or footer of a page. We might be customizing the appearance of our link resolver, subscription databases, or catalog. Regardless, a few best practices can help us make effective modifications.

General Best Practices

Ditch best practices when they become obstacles

It’s too tempting; I have to start this post about best practices by noting their inherent limitations. When we’re working with a site designed by someone else, the quality of our own code is restricted by decisions they made for unknown reasons. Commonly spouted wisdom (reduce HTTP requests! don’t use eval! ID selectors should be avoided!) may be unusable or even counterproductive.

To note but one shining example: CSS specificity. If you’ve worked long enough with CSS, then you know that it’s easy to back yourself into a corner by using overly powerful selectors like IDs or, the horror, inline style attributes. These methods of applying CSS have high specificity, which means that CSS written later in a stylesheet or loaded later in the HTML document might not override them as anticipated, a seeming contradiction in the “cascade” part of CSS. The hydrogen bomb (technically not part of specificity at all, but a separate step of the cascade) is the !important flag, which overrides every normal declaration, inline styles included; only another !important declaration with higher specificity, or with equal specificity appearing later, can beat it.
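To make “high specificity” concrete, here is a rough specificity calculator for simple selectors. It is a sketch, not a full parser: following the (a, b, c) scheme of the Selectors specification, it counts IDs, then classes, attribute selectors, and pseudo-classes, then element names and pseudo-elements, and it ignores the special rules for :not(), :is(), and :where(). Inline styles and !important never appear in it because they aren’t selectors; other steps of the cascade handle them.

```javascript
// Rough (a, b, c) specificity of a simple selector:
// a = IDs; b = classes, attribute selectors, pseudo-classes;
// c = element names and pseudo-elements.
// A sketch: no support for :not()/:is()/:where(), escapes, or the
// legacy single-colon :before (which it would count as a pseudo-class).
function specificity(selector) {
  var a = 0, b = 0, c = 0;
  var s = selector;
  // Attribute selectors first, so quoted values like "http://" aren't misread.
  s = s.replace(/\[[^\]]*\]/g, function () { b++; return " "; });
  // Pseudo-elements (::before) count like element names.
  s = s.replace(/::[\w-]+/g, function () { c++; return " "; });
  // Pseudo-classes, with an optional argument: :hover, :nth-child(2n+1).
  s = s.replace(/:[\w-]+(\([^)]*\))?/g, function () { b++; return " "; });
  s = s.replace(/#[\w-]+/g, function () { a++; return " "; });  // IDs
  s = s.replace(/\.[\w-]+/g, function () { b++; return " "; }); // classes
  // Remaining identifiers are element names; "*" and combinators add nothing.
  var names = s.match(/[a-zA-Z][\w-]*/g);
  c += names ? names.length : 0;
  return [a, b, c];
}

// Compare two specificities left to right; a positive result means x wins.
function compareSpecificity(x, y) {
  for (var i = 0; i < 3; i++) {
    if (x[i] !== y[i]) return x[i] - y[i];
  }
  return 0;
}

console.log(specificity("#tools > ul > li > span.icon.icon-hat")); // [ 1, 2, 3 ]
console.log(specificity("#tools .icon.icon-hat"));                 // [ 1, 2, 0 ]
console.log(specificity('a[href^="http://"]'));                    // [ 0, 1, 1 ]
```

A single ID outweighs any number of classes, which is why ID-heavy vendor stylesheets are so hard to override with ordinary class selectors.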

So it’s best practice to avoid inline style attributes, ID selectors, and especially !important. Except that when hacking on vendor sites, they are often unavoidable. What if we need to override an inline style? Suddenly, !important is the only tool a stylesheet has left. So let’s not get caught up following rules written for people in greener pastures; we’re already in the swamp, so throwing some mud around may be called for.
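The ordering described above can be modeled in a few lines. This sketch covers author styles only (no browser defaults, user stylesheets, animations, or cascade layers) and compares declarations by importance first, then inline style versus stylesheet rule, then specificity, then source order. The declaration objects, their field names, and the vendor markup in the comments are invented for illustration.

```javascript
// Each declaration: { value, important, inline, specificity: [a, b, c], order }.
// A larger "order" means the declaration appears later in the document.
function cascadeKey(d) {
  return [d.important ? 1 : 0, d.inline ? 1 : 0].concat(d.specificity, [d.order]);
}

// Pick the winning declaration for one property on one element.
function winner(declarations) {
  return declarations.reduce(function (best, d) {
    var kb = cascadeKey(best), kd = cascadeKey(d);
    for (var i = 0; i < kb.length; i++) {
      if (kd[i] !== kb[i]) return kd[i] > kb[i] ? d : best;
    }
    return best; // full tie: keep the earlier-listed declaration
  });
}

// Hypothetical vendor markup: <a id="more" style="color: blue">, plus "#more { color: green }".
var vendorInline = { value: "blue", important: false, inline: true, specificity: [0, 0, 0], order: 1 };
var vendorRule = { value: "green", important: false, inline: false, specificity: [1, 0, 1], order: 0 };
// Our override, loaded last, e.g. "#footer a#more { color: orange }".
var ours = { value: "orange", important: false, inline: false, specificity: [1, 1, 1], order: 2 };
var oursImportant = { value: "orange", important: true, inline: false, specificity: [1, 1, 1], order: 2 };

console.log(winner([vendorRule, vendorInline, ours]).value);          // blue
console.log(winner([vendorRule, vendorInline, oursImportant]).value); // orange
```

Without !important, even our later and more specific rule loses to the vendor’s inline color; with it, ours wins.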

There are dozens of other examples that come to mind. For instance, in serving content from a vendor site where we have no server-side control, we may be forced to violate web performance best practices such as sending assets with caching headers and utilizing compression. While minifying code is another performance best practice, for small customizations it adds little but obfuscates our work for other staff. Keeping a small script or style tweak human-readable might be more prudent. Overall, understanding why certain practices are recommended, and when it’s appropriate to sacrifice them, can aid our decision-making.

Test. Test. Test. When you’re done testing, test again

Whenever we’re creating an experience on the web it’s good to test. To test with Chrome, with Firefox, with Internet Explorer. To test on an iPhone, a Galaxy S4, a Chromebook. To test on our university’s wired network, on wireless, on 3G. Our users are vast; they contain multitudes. We try to represent their experiences as well as we can in the testing environment, knowing that we won’t emulate every possibility.

Testing is important, sure. But when hacking a third-party site, the variance is more than doubled. The vendor has likely done their own testing. They’ve likely introduced their own hacks that work around issues with specific browsers, devices, or connectivity conditions. They may be using server-side device detection to send out subtly different versions of the site to different users; they may not offer the same functionality in all situations. All of these circumstances mean that testing is vitally important and unending. We will never cover enough ground to be sure our hacks are foolproof, but we had better try, or they may not work at all.

Analytics and error reporting

Speaking of testing, how will we know when something goes wrong? Surely, our users will send us a detailed error report, complete with screenshots and the full schematics of every piece of hardware and software involved. After all, they do not have lives or obligations of their own. They exist merely to make our code more error-proof.

If, however, for some odd reason someone does not report an error, we may still want to know that one occurred. It’s good to set up unobtrusive analytics that record errors or other measures of interaction. Did we revamp a form to add additional validation? Try tracking what proportion of visitors successfully submit the form, how often the validation is violated, how often users submit invalid data multiple times in a row, and how often our code encounters an error. There are some intriguing client-side error reporting services out there that can catch JavaScript errors and detail them for our perusal later. But even a little work with events in Google Analytics can log errors, successes, and everything in between. With the mere information that problems are occurring, we may be able to identify patterns, focus our testing, and ultimately improve our customizations and end-user experience.
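A minimal sketch of the “little work with events” idea: wrap each customization so that its success or failure is reported instead of vanishing silently. The `send(category, action, label)` callback is an assumption, a stand-in for whatever analytics tool the page actually loads (with the older analytics.js library, for example, it could forward to `ga('send', 'event', category, action, label)`; check your own tool’s documentation). The names `makeRunner`, `runCustomization`, and the event labels are hypothetical.

```javascript
// Wrap each customization so that its outcome is recorded instead of
// failing silently. `send` is an assumption: any function that records
// a (category, action, label) triple, e.g. a thin wrapper around your
// analytics library's event call.
function makeRunner(send) {
  return function runCustomization(name, fn) {
    try {
      fn();
      send("customization", "success", name);
      return true;
    } catch (err) {
      // Report, but don't rethrow: one broken tweak shouldn't stop the rest.
      var message = (err && err.message) || String(err);
      send("customization", "error", name + ": " + message);
      return false;
    }
  };
}

// Demo with a stand-in sender that just collects the events.
var events = [];
var run = makeRunner(function (category, action, label) {
  events.push([category, action, label].join(" | "));
});

run("move-save-button", function () { /* pretend this worked */ });
run("relabel-search", function () { throw new Error("search box not found"); });

events.forEach(function (e) { console.log(e); });
// customization | success | move-save-button
// customization | error | relabel-search: search box not found
```

Errors thrown outside these wrappers could be caught the same way with a listener on the window’s error event (`window.addEventListener("error", ...)`).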

Know when to cut your losses

Some aspects of a vendor site are difficult to customize. I don’t want to say impossible, since one can do an awful lot with only a single <script> tag, but infeasible. Sometimes it’s best to know when sinking more time and effort into a customization isn’t worth it.

For instance, our repository has a “hierarchy browse” feature which allows us to present filtered subsets of items to users. We often get requests to customize the hierarchies for specific departments or purposes—can we change the default sort, can we hide certain info here but not there, can we use grid instead of list-based results? We probably can, because the hierarchy browse allows us to inject arbitrary custom HTML at the top of each section. But the interface for doing so is a bit clumsy and would need to be repeated everywhere a customization is made, sometimes across dozens of places simply to cover a single department’s work. So while many of these change requests are technically possible, they’re unwise. Updates would be difficult and impossible to automate, virtually ensuring errors are introduced over time as I forget to update one section or make a manual mistake somewhere. Instead, I can focus on customizing the site-wide theme to fix other, potentially larger issues with more maintainable solutions.

A good alternative to tricky and unmaintainable customizations is to submit a feature request to the vendor. Some vendors have specific sites where we can submit ideas for new features and put our support behind others’ ideas. For instance, the Innovative Users Group hosts an annual vote where members can select their most desired enhancement requests. Remember that vendors want to make a better product, after all; our feedback is valued. Even if there’s no formal system for submitting feature requests, a simple email to our sales representative or customer support can help.

CSS Best Practices

While the above section spoke to general advice, CSS and JavaScript have a few specific peculiarities to keep in mind while working within a hostile host environment.

Don’t write brittle, overly-specific selectors

There are two unifying characteristics of hacking on third-party sites: 1) we’re unfamiliar with the underlying logic of why the site is constructed in a particular way and 2) everything is subject to change without notice. Both of these make targeting HTML elements, whether with CSS or JavaScript, challenging. We want our selectors to be as flexible as possible, to withstand as much change as possible without breaking. Say we have the following list of helpful tools in a sidebar:

<div id="tools">
    <ul>
        <li><span class="icon icon-hat"></span><a href="#">Email a Librarian</a></li>
        <li><span class="icon icon-turtle"></span><a href="#">Citations</a></li>
        <li><span class="icon icon-unicorn"></span><a href="#">Catalog</a></li>
    </ul>
</div>

We can modify the icons listed with a selector like #tools > ul > li > span.icon.icon-hat. But many small changes could break this style: a wrapper layer injected in between the #tools div and the unordered list, a switch from unordered to ordered list, moving from <span>s for icons to another tag such as <i>. Instead, a selector like #tools .icon.icon-hat assumes that little will stay the same; it thinks there’ll be icons inside the #tools section, but doesn’t care about anything in between. Some assumptions have to stay, that’s the nature of customizing someone else’s site, but it’s pretty safe to bet that the icon classes will remain.

In general, sibling and child combinators make poor choices for vendor sites. We’re suddenly relying not just on tags, classes, and IDs staying the same, but also on the particular order in which elements appear. I’d also argue that pseudo-classes like :first-child, :last-child, and :nth-child() are dangerous for the same reason.
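To see why position-based selection is fragile, here is a toy model in plain JavaScript, with no browser involved. Each item from the #tools list is reduced to a label and its icon classes, and we compare picking “the first item” (roughly what li:first-child does) with picking “the item whose icon has icon-hat” (roughly what .icon-hat does), before and after a hypothetical vendor update that adds a new item at the top. The “Chat” item and its icon-bubble class are invented for the illustration.

```javascript
// Toy stand-in for the #tools list: each item is a label plus its icon's classes.
var before = [
  { label: "Email a Librarian", classes: ["icon", "icon-hat"] },
  { label: "Citations", classes: ["icon", "icon-turtle"] },
  { label: "Catalog", classes: ["icon", "icon-unicorn"] }
];

// Hypothetical vendor update: a new item is prepended to the list.
var after = [{ label: "Chat", classes: ["icon", "icon-bubble"] }].concat(before);

// Roughly what li:first-child does: pick by position.
function firstItem(items) {
  return items[0];
}

// Roughly what .icon-hat does: pick by class, wherever the item sits.
function itemWithClass(items, className) {
  return items.find(function (item) {
    return item.classes.indexOf(className) !== -1;
  });
}

console.log(firstItem(before).label);                // Email a Librarian
console.log(firstItem(after).label);                 // Chat (our style now hits the wrong item)
console.log(itemWithClass(after, "icon-hat").label); // Email a Librarian
```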

Avoid positioning if possible

Positioning and layout can be tricky to get right on a vendor site. Unless we’re confident in our tests and have covered all the edge cases, try to avoid properties like position and float. In my experience, many poorly structured vendor sites employ ad hoc box-sizing measurements and float-based layout, and lack a grid system. These are all a recipe for weird interconnections between disparate parts: we try to give a call-out box a bit more padding and end up sending the secondary navigation flying a thousand pixels to the right, offscreen.

display: none is your friend

display: none is easily my most frequently used CSS property when I customize vendor sites. Can’t turn off a feature in the admin options? Hide it from the interface entirely. A particular feature is broken on mobile? Hide it. A feature is of niche appeal and adds more clutter than it’s worth? Hide it. The footer? Yeah, it’s a useless advertisement, let’s get rid of it. display: none is great, but remember that it affects a site’s layout: the hidden element collapses and no longer takes up space, so be careful when hiding structural elements that are presented as menus or columns.

Attribute selectors are excellent

Attribute selectors, which enable us to target an element by the value of any of its HTML attributes, are incredibly powerful. They aren’t very common, so here’s a quick refresher on what they look like. Say we have the following HTML element:

<a href="http://example.com" title="the best site, seriously" target="_blank">

This is an anchor tag with three attributes: href, title, and target. Attribute selectors allow us to target an element by whether it has an attribute or an attribute with a particular value, like so:

/* applies to <a> tags with a "target" attribute */
a[target] {
    background: red;
}
/* applies to <a> tags whose "href" begins with "http://"
(links to "https://" addresses need a second selector);
this is a great way to style links pointed at external websites
or one particular external website! */
a[href^="http://"] {
    cursor: help;
}
/* applies to <a> tags with the text "best" anywhere in their "title" attribute */
a[title*="best"] {
    font-variant: small-caps;
}
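For reference, the matching rules behind each attribute operator can be written out as a plain function. This sketch follows the Selectors specification’s definitions of `[attr]`, `=`, `~=`, `|=`, `^=`, `$=`, and `*=`, and assumes case-sensitive comparison (browsers also support an `i` flag for case-insensitive matching and treat some HTML attribute values case-insensitively, which this ignores). `attrMatches` is a made-up name; browsers do all of this natively.

```javascript
// Does an element's attribute `value` (null if the attribute is absent)
// satisfy `[attr <operator> "target"]`? Use operator "" for plain [attr].
function attrMatches(value, operator, target) {
  if (value === null || value === undefined) return false; // attribute missing
  switch (operator) {
    case "":   // [attr]: present, with any value
      return true;
    case "=":  // exact match
      return value === target;
    case "~=": // one word in a whitespace-separated list
      if (target === "" || /\s/.test(target)) return false;
      return value.split(/\s+/).indexOf(target) !== -1;
    case "|=": // exactly target, or target followed by "-" (e.g. lang|="en")
      return value === target || value.startsWith(target + "-");
    case "^=": // starts with; an empty target never matches
      return target !== "" && value.startsWith(target);
    case "$=": // ends with
      return target !== "" && value.endsWith(target);
    case "*=": // contains anywhere
      return target !== "" && value.includes(target);
    default:
      throw new Error("Unknown operator: " + operator);
  }
}

var title = "the best site, seriously";
console.log(attrMatches("_blank", "", ""));                      // true:  a[target]
console.log(attrMatches("http://example.com", "^=", "http://")); // true:  a[href^="http://"]
console.log(attrMatches(title, "*=", "best"));                   // true:  a[title*="best"]
console.log(attrMatches(title, "~=", "site"));                   // false: the word is "site," with a comma
```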

Why is this useful among the many ways we can select elements in CSS? Vendor sites often aren’t anticipating all the customizations we want to employ; they may not provide handy class and ID styling hooks where we need them. Or, as noted above, the structure of the document may be subject to change either over time or across different pieces of the site. Attribute selectors can help mitigate this by making style bindings more explicit. Instead of saying “change the background icon for some random span inside a list inside a div”, we can say “change the background icon for the link that points at our citation management tool”.

If that’s unclear, let me give another example from our institutional repository. While we have the ability to list custom links in the main left-hand navigation of our site, we cannot control the icons that appear with them. What’s worse, there are virtually no styling hooks available; we have an unadorned anchor tag to work with. But that turns out to be plenty for a selector of the form a[href$=hierarchy] to target all <a>s with an href ending in “hierarchy”; suddenly we can define icon styles based on the URLs we’re pointing at, which is exactly what we want to base them on anyways.

Attribute selectors are brittle in their own ways—when our URLs change, these icons will break. But they’re a handy tool to have.

JavaScript Best Practices

Avoid the global scope

JavaScript has a notorious problem with global variables. By default (that is, outside strict mode), assigning a value to a variable that was never declared with var, let, or const silently creates a global. Furthermore, variables declared outside of any function in an ordinary script are also global. Global variables are considered harmful because they too easily allow unrelated pieces of code to interact; when everything’s sharing the same namespace, the chance that common names like i for index or count are used in two conflicting contexts increases greatly.

To avoid polluting the global scope with our own code, we wrap all of our script customizations in an immediately-invoked function expression (IIFE):

(function() {
    // do stuff here
}());

Wrapping our code in this hideous-looking construction gives it its own scope, so we can define variables without fear of overwriting ones in the global scope. As a bonus, our code still has access to global variables like window and navigator. However, global variables defined by the vendor site itself are best avoided; it is possible they will change or are subject to strange conditions that we can’t determine. Again, the fewer assumptions our code makes about how the vendor’s site works, the more resilient it will be.
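A slightly fuller version of the wrapper, as a sketch: adding the "use strict" directive as the first statement inside the IIFE (a standard JavaScript feature, not something the advice above requires) turns an accidental assignment to an undeclared variable into a ReferenceError instead of a silently created global. The variable names, including the deliberate typo `cuont`, are illustrative.

```javascript
(function () {
  "use strict"; // applies only inside this function, not to the vendor's scripts

  // Local to the IIFE: no clash with any vendor global named `count` or `i`.
  var count = 0;
  for (var i = 0; i < 3; i++) {
    count += i;
  }

  try {
    cuont = count; // typo for `count`; without strict mode this would create a global
  } catch (err) {
    console.log("Caught " + err.name); // Caught ReferenceError
  }

  console.log("count = " + count); // count = 3
}());
```

After the IIFE finishes, none of `count`, `i`, or `cuont` exists in the global scope.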

Avoid calling vendor-provided functions

Oftentimes the vendor site itself will put important functions in the global scope, functions like submitForm or validate whose intention seems quite obvious. We may even be able to reverse engineer their code a bit, determining which parameters we should pass to these functions. But we must not succumb to the temptation to actually reference their code within our own!

Even if we have a decent handle on the vendor’s current code, it is far too subject to change. Instead, we should seek to add or modify site functionality in a more macro-like way; instead of calling vendor functions in our code, we can automate interactions with the user interface. For instance, say the “save” button is in an inconvenient place on a form and has the following code:

<button type="submit" class="btn btn-primary" onclick="submitForm(0)">Save</button>

We can see that the button saves the form by calling the submitForm function with a value of 0 when it’s clicked. Maybe we even figure out that 0 means “no errors” whereas 1 means “error”.[1] So we could create another button somewhere which calls this same submitForm function. But many changes could break our code: the meaning of the “0” might change, the function name might change, or clicking the save button might trigger something else that isn’t evident in the markup. Instead, we can have our new button trigger the click event on the original save button exactly as a user interacting with the site would. In this way, our new save button should emulate the behavior of the old one through many types of changes.
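A sketch of that approach follows. It finds the original button with a deliberately loose, hypothetical selector (`form button[type="submit"]`; adjust it to the real page), creates a second button at the top of the form, and has the new button call the standard DOM `click()` method on the original, which fires the vendor’s onclick handler and the button’s normal submit behavior just as a user’s click would. The function name `addTopSaveButton` is made up, and it takes the document as a parameter so it can be exercised outside a browser; on the vendor page it would be called as `addTopSaveButton(document)` from inside the IIFE wrapper.

```javascript
// Hypothetical selector: kept loose so small markup changes don't break it.
var SAVE_SELECTOR = 'form button[type="submit"]';

function addTopSaveButton(doc) {
  var original = doc.querySelector(SAVE_SELECTOR);
  if (!original) {
    return null; // markup changed; do nothing rather than throw
  }

  var proxy = doc.createElement("button");
  proxy.type = "button"; // a plain button: it must not submit the form by itself
  proxy.textContent = "Save";
  proxy.className = "btn btn-primary"; // reuse the vendor's classes from the markup above

  // Delegate to the real button instead of calling submitForm(0) ourselves.
  proxy.addEventListener("click", function () {
    original.click();
  });

  // Insert the copy at the top of the form (or of the button's parent, if no form).
  var container = original.form || original.parentNode;
  container.insertBefore(proxy, container.firstChild);
  return proxy;
}
```

Because the proxy only triggers the original button, a change in what submitForm’s argument means, or an extra step the vendor adds on click, carries over automatically.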

{{Insert Your Best Practices Here}}

Web-savvy librarians of the world, what are the practices you stick to when modifying your LibGuides, catalog, discovery layer, databases, etc.? It’s actually been a while since I did customization outside of my college’s IR, so the ideas in this post are more opinion than practice. If you have your own techniques—or disagree with the ones in this post!—we’d love to hear about it in the comments.

Notes

  1. True story: I reverse-engineered a vendor form where this appeared to be the case.

One thought on “Best Practices for Hacking Third-Party Sites”

  1. Most recently, when customizing a fines payment screen through our university’s TouchNet (think PayPal), the most useful trick was the :before and :after pseudo-elements. If the vendor’s template is predictable, you can latch on to an element through a class or an identifier, hide it or change its positioning, then insert your own custom content before or after it. This is a dodgy trick for changing what the page displays when vendors allow CSS but no JavaScript.
