Intuition and Epistemology of High-Dimensional Vector Space 2: Empirical Visualizations

Posted by Fabian Offert on March 24, 2017.

This is the second of two blog posts exploring the intuition and epistemology of high-dimensional vector space and the notion of visualization. While the previous post proposed that “solving is visualizing”, this post looks at generative methods as another extension of the epistemology of visualization. The post is partly based upon a very inspiring discussion with Leonardo Impett at the Coding Dürer digital art history “hackathon” that I attended in Munich this March, and a conversation on the “unreasonable effectiveness of RNNs” I had with Teddy Roland a while back.

In the digital humanities, scatterplots and similar visualizations have become such a common sight that they begin to look like the one “natural” way of translating high-dimensional data into Euclidean space, with the dimensionality reduction techniques behind them (in most cases t-SNE1) sharing this appearance of universality.

And there are good reasons for this status quo. As argued in the previous post, the only geometrically intuitive space is Euclidean space, so there is no way around dimensionality reduction. Furthermore, while t-SNE and other dimensionality reduction techniques necessarily distort at least some portion of the high-dimensional data, it is (even mathematically) safe to assume that, for most practical purposes, these distortions are negligible.
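
For concreteness, the typical pipeline behind such a scatterplot can be sketched in a few lines of Python, using scikit-learn and matplotlib; the 300-dimensional vectors below are random stand-ins for real document or word vectors:

```python
# A minimal sketch of the standard pipeline: reduce high-dimensional
# vectors to two dimensions with t-SNE and plot them as a scatterplot.
# The 300-dimensional vectors here are random stand-ins for real data.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

vectors = np.random.rand(500, 300)                       # e.g. 500 document or word vectors
projection = TSNE(n_components=2, perplexity=30).fit_transform(vectors)

plt.scatter(projection[:, 0], projection[:, 1], s=5)
plt.title("t-SNE projection into Euclidean 2-space")
plt.show()
```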

However, there is a big caveat to these types of visualization, or rather to these specific combinations of numerical and geometric visualizations, that is often overlooked: they tell us nothing about the way in which they were obtained. In other words: the semantic structure of the visualization (relations, clusters, transformations etc.) bears no resemblance to, and provides no information about, the semantic structure of the algorithms used to create it. The tool is absorbed into the result, and thus the mediation of the result becomes invisible, or at least unattainable for any critical analysis.

Visualizing Data, Visualizing Neural Networks

This is a general problem of neural-network-based machine learning: as all the complexity resides in the data (which is why neural networks only began to re-appear with the advent of big, i.e. “web-scale”, data), it becomes difficult to visualize what exactly a network learns. Intuitively, this should only hold for “deep” (multi-layer) networks, not for “shallow” (single-layer) ones like word embedding models. In fact, however, word embedding models suffer from the same problem: embeddings are essentially non-linear mappings (technically, word embedding models are autoencoders), and it is exactly this “ambiguity” (in linguistic terms) or nonlinearity (in mathematical terms) that makes them work so well.

Generally, the observation that the problem of visualizing high-dimensional data is closely related to the problem of visualizing neural networks is a trivial one, given the reliance of neural networks on high-dimensional feature spaces. It becomes more interesting, however, if we turn to recent solutions to the latter problem and try to apply them to the former.

Unlike visualizations in the digital humanities, visualizations in computer science are often simple illustrations rather than tools for exploration and analysis. This is notably different for neural networks, where understanding what exactly the network learns is crucial to its further development. Often, heat maps are used to visualize the activation of neurons for certain inputs. Zeiler et al. were the first to propose patterns that maximize a specific neuron’s activation as a means of visualizing learned features2:

Visualization of CNN layers by means of maximum-activation patterns3

By mathematically reversing the convolution operations in the network (“deconvolution”), Zeiler et al. map activations back to the “input pixel space”, i.e. the image level. Concretely: to “examine a given convnet activation, we set all other activations in the layer to zero and pass the feature maps as input to the attached deconvnet layer. Then we successively (i) unpool, (ii) rectify and (iii) filter to reconstruct the activity in the layer beneath that gave rise to the chosen activation. This is then repeated until input pixel space is reached.”4
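
The simpler heat-map approach mentioned above (not Zeiler et al.’s deconvolution) can be sketched in PyTorch: a forward hook records the activations of one convolutional layer, and a single feature map is displayed as a heat map. The pretrained VGG16, the chosen layer index, and the random input are assumptions of this sketch:

```python
# A sketch of the heat-map approach (not Zeiler et al.'s deconvnet):
# a forward hook records the activations of one convolutional layer,
# and a single feature map is shown as a heat map.
import torch
import torchvision.models as models
import matplotlib.pyplot as plt

model = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()
activations = {}

def hook(module, inputs, output):
    activations["conv"] = output.detach()                # store the layer's output

model.features[10].register_forward_hook(hook)           # an arbitrary mid-level conv layer

image = torch.rand(1, 3, 224, 224)                       # stand-in for a preprocessed input image
with torch.no_grad():
    model(image)

plt.imshow(activations["conv"][0, 0], cmap="hot")        # heat map of the first feature map
plt.show()
```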

The resulting patterns, however, are still patterns that are “hidden” in the original data, i.e. the visualization simply “points to” parts of the data. This, of course, closely ties its readability to the readability of the data. To overcome this limitation, we can introduce generative methods: we simply convert a “classifier” network into a “generator” network that mathematically produces completely new images – fictions, if you will.
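
A minimal sketch of this idea, again assuming a pretrained torchvision classifier: starting from noise, gradient ascent on the input pixels increases the score of an (arbitrarily chosen) class until the input becomes an image that exists nowhere in the training data. Real feature visualizations add regularization and preprocessing, which are omitted here.

```python
# A sketch of "running the classifier backwards": gradient ascent on the
# input pixels, starting from noise, to maximize the score of one class.
# Regularization and preprocessing, which real visualizations need, are omitted.
import torch
import torchvision.models as models

model = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()
target_class = 845                                       # an arbitrary ImageNet class index

image = torch.rand(1, 3, 224, 224, requires_grad=True)   # start from random noise
optimizer = torch.optim.Adam([image], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    score = model(image)[0, target_class]
    (-score).backward()                                  # ascend on the class score
    optimizer.step()

# `image` now approximates what the network "thinks" the target class looks like.
```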

Generative Visualizations

Since the most recent comeback of “artificial intelligence”5, generative neural networks have time and again been dragged into the limelight. Google’s trippy-kitschy “Deep Dream”6, for instance, enjoyed a brief spell of popularity during the summer of 2015, with Google even sponsoring an “exhibition/auction” of “art”7 created with Deep Dream. Andrej Karpathy’s (now famous) blog post on the “Unreasonable Effectiveness of RNNs”8 describes several examples of generative RNNs that produce everything from Shakespeare sonnets to mock Linux kernel code9 (the accompanying paper also provides a good example of heat map type visualizations for text10). Finally, generative adversarial networks11 have been used to create almost photorealistic sets of images. In a generative adversarial network, two neural networks compete with each other: a generator and a discriminator. The discriminator tries to learn whether an image was produced by the generator or whether it is part of the original training set, thereby constantly “encouraging” the generator to produce better images. Eventually, for well-trained networks and large data sets, it is usually just a few odd-looking outliers that give away their artificial nature:

Generated bedrooms from Alec Radford, Luke Metz, and Soumith Chintala;12 notice the highly distorted outlier in the lower left corner
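
The adversarial setup itself can be sketched in a few lines of PyTorch. The tiny fully connected generator and discriminator below, as well as the random “training images”, are stand-ins for a real architecture and data set such as Radford, Metz, and Chintala’s DCGAN trained on bedrooms:

```python
# A sketch of the adversarial setup: a tiny fully connected generator and
# discriminator, with random "images" standing in for a real training set.
import torch
import torch.nn as nn

latent_dim, image_dim = 100, 64 * 64
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, image_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(image_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real_images = torch.rand(32, image_dim) * 2 - 1          # stand-in for a batch of real images

for step in range(1000):
    z = torch.randn(32, latent_dim)                      # latent noise vectors

    # Discriminator: distinguish real images from generated ones.
    opt_d.zero_grad()
    d_loss = (bce(D(real_images), torch.ones(32, 1))
              + bce(D(G(z).detach()), torch.zeros(32, 1)))
    d_loss.backward()
    opt_d.step()

    # Generator: produce images the discriminator classifies as "real".
    opt_g.zero_grad()
    g_loss = bce(D(G(z)), torch.ones(32, 1))
    g_loss.backward()
    opt_g.step()
```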

But what, if any, are the use cases for such generated artifacts, beyond being impressive (and sometimes surprisingly witty) examples of the power of neural networks? It does not take much to imagine them soon being used for the most dismal dissipation of human creativity: advertising. Much like generated music today is created mainly to avoid having to use copyrighted music, I suspect it will not be long before fake bedrooms like those in Radford et al. appear in IKEA catalogs and as backgrounds in cheap children’s TV shows, replacing the already “blue-collared” job of 3D modeling the mundane entirely.

But I claim that there is another use case that has been somewhat overlooked: we can treat generated images as “empirical visualizations”13. Instead of looking for a concise symbolic representation of the semantic structures within the algorithm and the data, we produce an endless stream of material artifacts. Or, with Bruno Latour: we are temporarily reversing the chain of reference underlying the epistemology of (scientific) discovery.

The Epistemology of Representation for Generative Methods

The idea of a “chain of reference”, put forward by Latour in “Circulating Reference: Sampling the Soil in the Amazon Forest”, is simple: we commonly understand the process of (scientific) discovery as an epistemological operation that bridges a very large gap between the material and the symbolic realms by generating a language-based description of a material object, then subjecting this description to logical operations which in turn yield new information about the material object. Latour argues that, instead, we should look at this process as a continuous oscillation between the material and the symbolic, a set of multiple tiny jumps in which each symbolic operation on a material object provides the “raw material”14 for the next, itself becoming material again: “Reference is our way of keeping something constant through a series of transformations”15.

Bruno Latour’s original chain of reference16

In our version, the chain of reference’s “upstream” direction now briefly describes a series of jumps from the symbolic to the material, while its “downstream” direction now describes a series of jumps from the material to the symbolic. Our “raw material” is now a symbolic object, and its description is a material object. However, this does not change the functionality of the chain of reference at all: “Truth-value” still “circulates in this chain, like electricity in a wire.”17

But why is this even possible? How can we possibly artificially generate “material truth”? The answer is simple: all this happens within the realm of the digital, where the only material is the flow of electrons in the integrated circuits. At least for the computer as an epistemological object in the real world, there truly “is no software”, as Kittler famously states18. Nevertheless, and this is where Kittler’s dogma needs to be softened, there is an ontology to computational processes which allows us to understand them in terms of the material and the symbolic, as a system of objects and properties related to one another in hierarchies and networks that are clear or opaque, small or large, permanent or temporary. In other words, the ontology of computational processes is very much spatial, and as such can be approximated through the concept of representation.

By generating images, we are thus visualizing the vector space encoded by the trained neural network: not by reducing its dimensionality but by manufacturing lots of imperfect “prints” with the hope that they will, as a set, provide an empirical approximation of the vector space’s semantic structure, and thus of the semantic structure of the original data as well.
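
Assuming a trained generator like the one sketched above, manufacturing these “prints” is almost trivial: sample a large batch of latent vectors and keep the decoded images as empirical material.

```python
# Assuming G is a trained generator with a 100-dimensional latent space
# (as in the sketch above): sample many latent vectors and keep the images.
import torch

with torch.no_grad():
    prints = G(torch.randn(1000, 100))                   # 1000 imperfect "prints" of the vector space
```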

“I Know It When I See It!”

A very interesting example is “Image Synthesis from Yahoo’s open_nsfw” (warning: the website behind this link features some artificial pornographic imagery), a project by Gabriel Goh at UC Davis. Goh built a generative adversarial network out of a classifier network called “open_nsfw”, created by Yahoo to distinguish workplace-safe (“sfw”) from “not-safe-for-work” (“nsfw”) imagery: a literal mathematical model of the “I Know It When I See It” dogma. By generating sets of images ranging from most to least pornographic, Goh produces some interesting insights into Yahoo’s specific interpretation of “nsfw”. First of all, and not surprisingly, in Yahoo’s definition “nsfw” exclusively means pornographic images, not, for instance, images of violence (I will leave the analysis of the implicit cultural logic behind this as an exercise to the reader). Most interesting, however, are the “least pornographic” images generated by the network:

Most “safe-for-work” images from open_nsfw

Goh notes that these images “all have a distinct pastoral quality – depictions of hills, streams and generally pleasant scenery. This is likely an artifact of the negative examples used in the training set.” Apparently, these “landscapes” tell us something about the very specific idea of “safe” images that the creators of open_nsfw had in mind. Because, if we think about it, what really is the “opposite” of pornography? The range of possibilities here is very broad (one obvious candidate would be fully clothed people), but the creators of open_nsfw nevertheless picked a very specific candidate: pastoral landscapes.

The project itself is of course satirical but nevertheless provides an important argument for the use of generative methods as a tool of interpretation: if, for any reason, we do not have a clear concept to describe a semantic structure, and cannot arrive at one, generative methods can provide a means to empirically approximate, even “reverse-engineer”, this concept in terms of what it not quite is.
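
The general mechanism behind this kind of “reverse-engineering” can be sketched as follows. This is not Goh’s actual pipeline: G stands in for a trained generator (cf. the sketch above), and the scoring function is a random placeholder for a real classifier such as open_nsfw. The point is simply that generated images can be ranked against a learned concept and inspected at both extremes:

```python
# Not Goh's actual pipeline: G stands in for a trained generator, and
# nsfw_score is a random placeholder for a real classifier such as open_nsfw.
import torch

def nsfw_score(images):
    return torch.rand(images.size(0))                    # placeholder: one "nsfw" probability per image

with torch.no_grad():
    images = G(torch.randn(1000, 100))                   # a large set of generated images
    scores = nsfw_score(images)

ranking = torch.argsort(scores)
most_sfw = images[ranking[:16]]                          # the most "safe-for-work" fictions
most_nsfw = images[ranking[-16:]]                        # the least "safe-for-work" ones
```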

Algorithmic Interpretation 2

This, of course, brings us back to the question of algorithmic interpretation raised in the previous post, where I suggested that, for word embeddings, every computational solution to an analogy task is itself an analogy, a description of something in terms of (a hierarchy of) something else. Not surprisingly, generative adversarial networks and word embeddings are based on the same idea of embedding, or “autoencoding” words or images.

Unlike word2vec, however, where the word embedding layer provides only an approximation of the “exact analogy” in terms of a hierarchy of closest points, GANs do provide the exact solution directly, since, unlike intermediate words, intermediate images do exist and do make sense. As Radford, Metz, and Chintala19 write on the GitHub page accompanying their paper, “Smiling woman - neutral woman + neutral man = smiling man. Whuttttt!”; in other words, the image-based embedding space created by the network can be described arithmetically much like a word-based embedding space.

Smiling woman - neutral woman + neutral man = smiling man20
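
Both sides of this comparison can be sketched briefly. For word2vec (via gensim, assuming the pretrained Google News vectors are available locally), the “solution” to an analogy is a ranked list of nearest neighbors; for a GAN, the arithmetic is performed directly on latent vectors and the result is decoded into a new image. The three latent vectors below are random stand-ins for the averaged category vectors Radford, Metz, and Chintala use, and G again stands in for a trained generator:

```python
import torch
from gensim.models import KeyedVectors

# word2vec: the "solution" is a ranked list of closest points, not an exact vector.
# Assumes the pretrained Google News vectors are available locally.
vectors = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=5))

# GAN: the arithmetic result is itself a valid point in the latent space and
# can be decoded directly. Random stand-ins for the averaged latent vectors;
# G again stands in for a trained generator.
z_smiling_woman, z_neutral_woman, z_neutral_man = (torch.randn(1, 100) for _ in range(3))
with torch.no_grad():
    smiling_man = G(z_smiling_woman - z_neutral_woman + z_neutral_man)
```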

For the digital humanities, this opens up a wealth of possibilities – not so much in regard to new tools as in regard to a whole class of new cultural artifacts – neural networks – that need to be interpreted to uncover the epistemological – and, for that matter, explicitly ideological – assumptions hidden in them (for instance by the intelligent application of empirical visualizations). This is a task, however, that is only manageable if we as digital humanists are aware of the existence of such hidden assumptions and, more importantly, do not shy away from readings that go beyond the conveniently accessible interface and into the algorithmic constitution of the technologies that surround us.

  1. Laurens van der Maaten and Geoffrey Hinton, “Visualizing Data Using t-SNE,” Journal of Machine Learning Research, no. 9 (2008): 2579–2605.

  2. Matthew D Zeiler and Rob Fergus, “Visualizing and Understanding Convolutional Networks,” in European Conference on Computer Vision (Springer, 2014), 818–33.

  3. Ibid.

  4. Ibid.

  5. Without claiming to make a contribution to the debate behind it, the quotes around the term are intended here to convey that my own take on it (and subsequently on the question of “singularity”) is definitely a skeptical one. Generally, I agree with Scott Aaronson that “one can divide everything that’s been said about artificial intelligence into two categories: the 70% that’s somewhere in Turing’s paper from 1950, and the 30% that’s emerged from a half-century of research since then.”, Scott Aaronson, Quantum Computing Since Democritus (Cambridge University Press, 2013), 34. See also Maciej Cegłowski, “Superintelligence. The Idea That Eats Smart People,” Blog, Idle Words, (2016).

  6. Alexander Mordvintsev, Christopher Olah, and Mike Tyka, “Inceptionism: Going Deeper into Neural Networks,” Blog, Google Research Blog, (2015).

  7. It can never be emphasized enough how terrible Google’s track record in regard to media art really is. We should never forget, for instance, the “Dev Art” controversy of 2014 and the brilliant reaction of a handful of artists to it.

  8. Andrej Karpathy, “The Unreasonable Effectiveness of Recurrent Neural Networks,” Blog, Andrej Karpathy Blog, (2015).

  9. Meanwhile it has been shown that it is possible to train an RNN to understand the meaning behind the code as well, not only its characteristic character distribution, essentially making the first step towards an RNN-based compiler: Wojciech Zaremba and Ilya Sutskever, “Learning to Execute,” arXiv Preprint arXiv:1410.4615, 2014.

  10. Andrej Karpathy, Justin Johnson, and Li Fei-Fei, “Visualizing and Understanding Recurrent Networks,” arXiv Preprint arXiv:1506.02078, 2015.

  11. Ian Goodfellow et al., “Generative Adversarial Nets,” in Advances in Neural Information Processing Systems, 2014, 2672–80.

  12. Alec Radford, Luke Metz, and Soumith Chintala, “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks,” arXiv Preprint arXiv:1511.06434, 2015.

  13. This blog post from OpenAI quotes Richard Feynman to drive a similar point home: “What I cannot create, I do not understand.”

  14. This of course alludes to Claude Lévi-Strauss’ discovery that even the mythological antipodes of raw and cooked translate to well-structured abstract knowledge, see Claude Lévi-Strauss, Mythologiques I: Le Cru et Le Cuit (Paris: Plon, 1964). Lisa Gitelman has recently examined the concept of “raw data” again in the light of big data, see Lisa Gitelman, Raw Data Is an Oxymoron (Cambridge, MA: MIT Press, 2013).

  15. Bruno Latour, “Circulating Reference: Sampling the Soil in the Amazon Forest,” in Pandora’s Hope: Essays on the Reality of Science Studies (Cambridge, MA: Harvard University Press, 1999).

  16. Ibid.

  17. Ibid.

  18. Friedrich A. Kittler, Die Wahrheit Der Technischen Welt. Essays Zur Genealogie Der Gegenwart, ed. Hans Ulrich Gumbrecht (Frankfurt am Main: Suhrkamp, 2013).

  19. “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks.”

  20. Ibid.