Introduction

Whether you’ve heard of them or not, Scalable Vector Graphics (SVGs) are an essential component of modern UIs everywhere. In short, SVG is an image format for vector graphics, as the name suggests, which means the images can be scaled to any size without losing quality. That makes them perfect for icons and images that need to be versatile and look good on all sorts of screen sizes.

I have been working a lot with SVGs lately and have had to think about the different ways to use them, so this post is dedicated to that: not so much how they work, but how to work with them and some of the interesting things that can be done with them.

I’ll start with a short introduction to SVGs. If you’re already familiar with SVGs and what they are, feel free to skip ahead using the table of contents.

A Summary of SVGs

Basic Concept

To achieve the desired scalability, SVGs need to be fundamentally different from raster (pixel) images. A raster image is a fixed grid of numerical values (pixels), which works pretty well for most applications, but when you enlarge the image there is no exact way to derive the values of the new pixels; they can only be estimated from their neighbours. That’s why SVGs use a form of data representation that can be evaluated at any resolution, namely functions.

For example, one of the main commands SVGs use is LineTo, which draws a straight line from the current point to a given end point. If you remember basic school geometry, you know that two points determine a line, namely $y = mx + b$, and that arbitrary new points on this line can then be sampled.
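To make that concrete, here is a minimal Python sketch of the idea. The function names are my own, and real renderers work with a parametric form that also handles vertical lines, so treat this as an illustration rather than how an SVG engine is implemented.

```python
def line_through(p0, p1):
    """Return slope m and intercept b of the line y = m*x + b through p0 and p1."""
    (x0, y0), (x1, y1) = p0, p1
    if x0 == x1:
        raise ValueError("vertical line: y = m*x + b does not apply")
    m = (y1 - y0) / (x1 - x0)
    b = y0 - m * x0
    return m, b


def sample_line(p0, p1, n):
    """Sample n evenly spaced points (n >= 2) on the segment from p0 to p1."""
    m, b = line_through(p0, p1)
    x0, x1 = p0[0], p1[0]
    xs = [x0 + (x1 - x0) * i / (n - 1) for i in range(n)]
    return [(x, m * x + b) for x in xs]


# The same two points yield as few or as many in-between points as needed.
coarse = sample_line((10, 30), (50, 90), 3)
fine = sample_line((10, 30), (50, 90), 101)
```

The stored data never changes (two points), only how densely we choose to sample it, which is exactly what a raster image cannot offer.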

Representation

Now we can take a look at how SVGs are actually defined. So far we have only talked about the mathematical background of SVGs, but the files themselves are written as markup, more specifically XML. It is very similar to HTML, if you’re more familiar with that.

<svg xmlns="http://www.w3.org/2000/svg">
  <path
    fill="none"
    stroke="red"
    d="M 10,30
       A 20,20 0,0,1 50,30
       A 20,20 0,0,1 90,30
       Q 90,60 50,90
       Q 10,60 10,30 z
       " />
</svg>

As you can see, SVGs consist of tags and attributes. The outer <svg> tag denotes the content as an SVG, while the <path> tag defines the actual look. So an SVG file can be thought of as instructions on how to draw the desired shape, a blueprint to be interpreted by an SVG rendering engine. <path> tells the rendering engine what to draw, most importantly through the attribute ’d’, which contains the draw commands. You can think of the commands as something similar to the functions I described earlier, while the numbers after each command are its parameters, mostly $x,y$ coordinates.
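If you want to see that structure for yourself, the ’d’ attribute can be split into commands and their numbers with a few lines of Python. This is a simplified sketch of my own (the `parse_path` name is made up), not a full parser: it ignores corner cases of the path grammar such as arc flags written without separators.

```python
import re

# A command letter, or a (possibly signed, possibly decimal) number.
TOKEN = re.compile(r"([MmLlHhVvCcSsQqTtAaZz])|([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)")


def parse_path(d):
    """Split a path 'd' string into (command, [numbers]) pairs."""
    commands = []
    for letter, number in TOKEN.findall(d):
        if letter:
            commands.append((letter, []))
        elif commands:
            # Numbers belong to the most recent command letter.
            commands[-1][1].append(float(number))
    return commands


d = """M 10,30
       A 20,20 0,0,1 50,30
       A 20,20 0,0,1 90,30
       Q 90,60 50,90
       Q 10,60 10,30 z"""

for command, args in parse_path(d):
    print(command, args)
```

Note that the two A (arc) commands carry seven numbers each, and only the last two are an end point; the others are radii, a rotation and two flags.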

So what is the result of the above code? Take a look:

For a list of all the commands and what they do, look here. The same Mozilla docs probably contain everything you want to know about the properties of SVGs, but for the purposes of what I’m going to show next you don’t need more than what I have already explained.

Operations on SVGs

Embeddings

Probably the most common use case for SVGs is in user interfaces, specifically on the web. A lot goes into that; for an excellent journey into their use on the web for displaying icons, look at this post. But I didn’t write this post just to tell you that, that’d be boring. I’m here for the more interesting things we can do with SVGs.

As you may know, image processing and manipulation are basically all mathematical operations on the pixel values of an image. We might want to use such well-developed operations on our SVGs to get more out of the SVG format, but that presents us with the first challenge already: SVGs consist of code and not numbers. That means we need a way of translating from the SVG space to a numerical space, to a form we can apply mathematical operations on, and then a way to revert this translation, preferably without information loss. Let’s call this translation an SVG embedding.

This is trickier than you might think, though. First of all, we need a way to accurately differentiate between the discrete SVG tags. Then we need a way to accurately describe the attributes associated with each tag, which are often discrete classes themselves. It goes without saying that the SVG embedding also needs to be bijective, so that every embedding maps back to exactly one SVG.
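To illustrate what these requirements mean in practice, here is a toy embedding for single path commands: a one-hot vector separates the discrete command types, the parameters follow as plain numbers, and a decoder reverts the translation exactly. The vocabulary, the padding value and the function names are all assumptions of mine for illustration; none of the approaches below work exactly like this.

```python
COMMANDS = ["M", "L", "Q", "A", "Z"]              # toy vocabulary (assumption)
ARITY = {"M": 2, "L": 2, "Q": 4, "A": 7, "Z": 0}  # parameters per command
MAX_ARGS = max(ARITY.values())
PAD = -1.0                                        # filler for unused slots


def encode(command, args):
    """Map one command to a fixed-length numerical vector."""
    if len(args) != ARITY[command]:
        raise ValueError(f"{command} expects {ARITY[command]} parameters")
    one_hot = [1.0 if c == command else 0.0 for c in COMMANDS]
    padded = [float(a) for a in args] + [PAD] * (MAX_ARGS - len(args))
    return one_hot + padded


def decode(vector):
    """Invert encode(): recover the command and its parameters."""
    one_hot = vector[:len(COMMANDS)]
    command = COMMANDS[one_hot.index(max(one_hot))]
    start = len(COMMANDS)
    return command, vector[start:start + ARITY[command]]


vector = encode("Q", [90, 60, 50, 90])
restored = decode(vector)
```

Every vector has the same length, and `decode(encode(...))` gives back the original command, so the mapping is reversible. What it does not give us is a space where arithmetic on the vectors produces meaningful new shapes, which is the harder part.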

Let’s explore a few ideas.

Redrawing

One of my first thoughts when considering this problem was maybe redrawing the same shape of the SVG using mathematical equations, and I remembered this video on drawing arbitrary shapes using Fourier series. Sure enough, after some googling I found this interesting implementation of exactly what I was thinking. However, while this way of embedding is definitely impressive and the math involved is quite interesting, it has a lot of shortcomings that disqualify it.

First of all, using functions means getting a single continuous drawing, unlike a lot of SVGs, which consist of separate, disconnected parts. Moreover, using Fourier series relies on sampling points along the SVG paths, akin to analog-to-digital audio conversion, which means, just like with digital audio, that this is a lossy conversion. It can be argued that for large enough sampling rates the resulting transformation won’t have noticeable differences from the original, but the lossy nature of the transformation can’t be denied.

Leaving Fourier series aside as a backup, we should keep looking to see if we can do better.
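The lossiness is easy to see in a small experiment. The sketch below is my own and is not taken from the implementation linked above: it samples the outline of a unit square as complex numbers, computes a plain discrete Fourier transform, and rebuilds the outline from only the lowest frequencies.

```python
import cmath
import math


def dft(points):
    """Discrete Fourier transform of a closed outline given as complex samples."""
    n = len(points)
    return [sum(p * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, p in enumerate(points)) / n
            for k in range(n)]


def reconstruct(coeffs, keep):
    """Rebuild the samples from frequencies whose absolute value is <= keep."""
    n = len(coeffs)
    kept = [k for k in range(n) if min(k, n - k) <= keep]
    return [sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in kept)
            for t in range(n)]


def square_outline(per_side):
    """Evenly spaced samples along the outline of the unit square."""
    corners = [0 + 0j, 1 + 0j, 1 + 1j, 0 + 1j]
    points = []
    for a, b in zip(corners, corners[1:] + corners[:1]):
        points += [a + (b - a) * i / per_side for i in range(per_side)]
    return points


def rms_error(original, rebuilt):
    """Root-mean-square distance between two sample lists."""
    return math.sqrt(sum(abs(o - r) ** 2 for o, r in zip(original, rebuilt))
                     / len(original))


points = square_outline(16)            # 64 samples in total
coeffs = dft(points)
rough = rms_error(points, reconstruct(coeffs, 3))
better = rms_error(points, reconstruct(coeffs, 10))
full = rms_error(points, reconstruct(coeffs, 32))
```

More frequencies give a smaller error, and keeping all of them reproduces the 64 samples up to floating-point rounding. But that only holds at the sampled points; everything between them is still an approximation of the original path.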

Autoencoders

Autoencoders are some of the most widely used machine learning model architectures out there, and you can find a thorough explanation of them with a quick Google search. In short, they learn to create a reversible latent representation of a given data set, one that’s more useful in a certain way (e.g. a compressed representation), and they are mostly used with images.

Source. A simplified diagram of what an autoencoder looks like. The compressed data is in the latent space we’re looking for.

Again, SVGs aren’t images per se, but the latent representation of an autoencoder is a numerical one, which is exactly what we’re looking for. So, if we can find a way to create an autoencoder model that takes an SVG and returns a latent representation, we can then do whatever operations we fancy on this representation, revert it back to an SVG using our autoencoder, and hopefully get an SVG with the desired modifications. And again, this is not something we need to do ourselves, because someone already did it.

The implementation I’m alluding to here is DeepSVG, which is quite fascinating. There’s a lot going on in DeepSVG, but I’ll focus on the relevant parts for now. First of all, we need a unified and simplified representation of SVGs that we can feed to our autoencoder, since SVGs can have a wide variety of commands and tags. Quoting from the DeepSVG paper:

… this does not significantly reduce the expressivity [of SVGs] since other basic shapes can be converted into a sequence of Bézier curves and lines. We consider a Vector graphics image $V = \{P_1, \ldots, P_{N_P} \}$ to be a set of $N_P$ paths $P_i$. Each path is itself defined as a triplet $P_i =(S_i , f_i, v_i)$, where $v_i ∈ \{0, 1\}$ indicates the visibility of the path and $f_i ∈ \{0, 1, 2\}$ determines the fill property. Each $S_i = (C^1_i, \ldots, C^{N_C}_i)$ contains a sequence of $N_C$ commands $C^j_i$. The command $C^j_i = (c^j_i, X^j_i)$ itself is defined by its type $c^j_i ∈ \{\text{\textless}SOS\text{\textgreater}, m, l, c, z, \text{\textless}EOS\text{\textgreater} \}$ and arguments.
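If you prefer code to notation, the quoted definition maps quite directly onto a few data classes. The class and field names below are mine and not DeepSVG’s; this is only a sketch of the structure the quote describes.

```python
from dataclasses import dataclass, field
from typing import List

COMMAND_TYPES = ("<SOS>", "m", "l", "c", "z", "<EOS>")


@dataclass
class Command:
    """One command C = (c, X): a type c and its arguments X."""
    kind: str
    args: List[float] = field(default_factory=list)

    def __post_init__(self):
        if self.kind not in COMMAND_TYPES:
            raise ValueError(f"unsupported command type: {self.kind!r}")


@dataclass
class Path:
    """One path P = (S, f, v)."""
    commands: List[Command]   # S: the command sequence
    fill: int = 0             # f in {0, 1, 2}
    visible: int = 1          # v in {0, 1}


@dataclass
class VectorImage:
    """A vector graphics image V: a set of paths."""
    paths: List[Path]


# A triangle written with the reduced command set (coordinates are made up).
triangle = Path(commands=[
    Command("<SOS>"),
    Command("m", [10, 10]),
    Command("l", [50, 10]),
    Command("l", [30, 40]),
    Command("z"),
    Command("<EOS>"),
])
image = VectorImage(paths=[triangle])
```

Anything outside the reduced command set, such as the arcs and quadratic curves from the heart example earlier, has to be converted to lines and cubic Bézier curves before it fits this structure.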

This, however, is still a discrete representation, and we need a numerical embedding the autoencoder can work with. The authors of the DeepSVG paper do this by embedding each command $C^j_i$ as the vector $e^j_i = e^j_{cmd,i} + e^j_{coord,i} + e^j_{ind,i}$, which is the sum of three separate embeddings: the command embedding $e^j_{cmd,i}$, the coordinates embedding $e^j_{coord,i}$ and the index embedding $e^j_{ind,i}$. The detailed calculations for each embedding and how they are merged into a single vector are a little beyond the scope of this post and can be found in the DeepSVG paper on page 5 under “Embedding”.
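The general shape of that sum can still be sketched in a few lines. Please read this as a toy under loud assumptions: the lookup tables are filled with random numbers instead of learned weights, the vector size is tiny, and the coordinate part (one lookup per quantized value, summed) is my simplification and not the paper’s exact formulation. None of the names come from the DeepSVG code.

```python
import random

COMMAND_TYPES = ("<SOS>", "m", "l", "c", "z", "<EOS>")
DIM = 8            # toy embedding size (assumption)
MAX_COMMANDS = 32  # toy maximum sequence length (assumption)
LEVELS = 256       # coordinates assumed to be quantized to 0..255

rng = random.Random(0)


def make_table(rows):
    """Stand-in for a learned embedding table: one random vector per row."""
    return [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(rows)]


CMD_TABLE = make_table(len(COMMAND_TYPES))
COORD_TABLE = make_table(LEVELS)
INDEX_TABLE = make_table(MAX_COMMANDS)


def embed(kind, coords, index):
    """Return e = e_cmd + e_coord + e_ind for one command."""
    e_cmd = CMD_TABLE[COMMAND_TYPES.index(kind)]
    e_coord = [0.0] * DIM
    for q in coords:
        if not 0 <= q < LEVELS:
            raise ValueError("coordinates must be quantized to 0..255")
        # Simplification: sum one lookup per value (this ignores their order).
        e_coord = [acc + v for acc, v in zip(e_coord, COORD_TABLE[q])]
    e_ind = INDEX_TABLE[index]
    return [a + b + c for a, b, c in zip(e_cmd, e_coord, e_ind)]


e = embed("l", [10, 30], index=2)
```

The point to take away is that the command type, its coordinates and its position in the sequence each contribute one vector, and the model only ever sees their sum.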

Now, the result of all of that is our desired latent vector of 256 dimensions, which we can modify at will. For example, adding and subtracting values to and from this vector has all sorts of effects. Take a look:
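The operations themselves are plain vector arithmetic. In the sketch below the latent vectors are random placeholders; in a real experiment they would come from the model’s encoder, and every result would be passed through the decoder to get an SVG back. The helper names are hypothetical and not part of DeepSVG.

```python
import random

LATENT_DIM = 256


def shift(z, direction, amount):
    """Move a latent vector along a direction by the given amount."""
    return [v + amount * d for v, d in zip(z, direction)]


def lerp(z_a, z_b, t):
    """Blend two latent vectors: t=0 gives z_a, t=1 gives z_b."""
    return [(1 - t) * a + t * b for a, b in zip(z_a, z_b)]


rng = random.Random(0)
z_a = [rng.gauss(0, 1) for _ in range(LATENT_DIM)]  # placeholder for encoder output
z_b = [rng.gauss(0, 1) for _ in range(LATENT_DIM)]  # placeholder for encoder output

# Ten in-between vectors; decoding each one would yield one animation frame.
frames = [lerp(z_a, z_b, i / 9) for i in range(10)]

# Nudging only the first latent dimension of z_a.
direction = [1.0] + [0.0] * (LATENT_DIM - 1)
nudged = shift(z_a, direction, 0.5)
```

Whether such a change produces a sensible SVG depends entirely on how well the trained model has organised its latent space.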

Source. Those are all GIFs of different versions of the same SVGs.

As for my own experiments with this model, I feel like the encoding is a bit rough, and maybe the model can be improved. Either way, it’s enough for doing a lot of cool things with SVGs. One of the things I tried was adding random noise to SVGs, which is quite easy using the DeepSVG library. Let’s use this Euro SVG as an example and take its simplified representation on the coordinate system as a starting point.

And do the magic:

The dots in the bottom-left corner are unused -1 values, don’t worry about them.
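The noise step on its own boils down to something like the following. This is a plain-Python sketch of the idea and not a DeepSVG call; the coordinate range of 0 to 255, the convention that (-1, -1) marks unused padding, and the function name are assumptions of mine.

```python
import random

PAD = -1  # marker for unused slots (assumption)


def add_noise(points, sigma, seed=None, low=0.0, high=255.0):
    """Jitter each (x, y) point with Gaussian noise, leaving padding untouched."""
    rng = random.Random(seed)
    noisy = []
    for x, y in points:
        if x == PAD and y == PAD:
            noisy.append((x, y))  # keep padding as it is
            continue
        # Clamp so that the points stay inside the drawing area.
        nx = min(max(x + rng.gauss(0, sigma), low), high)
        ny = min(max(y + rng.gauss(0, sigma), low), high)
        noisy.append((nx, ny))
    return noisy


points = [(10, 30), (50, 30), (90, 30), (50, 90), (PAD, PAD)]
# Increasing sigma step by step gives the frames of a "heating up" animation.
frames = [add_noise(points, sigma=s, seed=0) for s in (0, 2, 4, 8)]
```

With a sigma of 0 the points stay where they are, and larger values shake them further and further away from the original shape.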

The result looks basically like the random motion of particles in a fluid as you heat it up, which is actually the inspiration for the first diffusion models if I’m not mistaken, but that’s a story for another time. This opens up the possibility of real SVG image editing, akin to what we have for raster images, of creating SVG animations, and all sorts of cool stuff! I’ll close this lengthy post with a few more examples.


See you later and I hope you learned something new.