Simple and secure data encryption via molecular weight distribution fingerprints

Jeroen H. Vrijsen (a), Maarten Rubens (a,b) and Tanja Junkers* (a,b)
(a) Organic and (Bio-)Polymer Chemistry, Institute for Materials Research, Hasselt University, Agoralaan D, 3590 Diepenbeek, Belgium
(b) Polymer Reaction Design Group, School of Chemistry, Monash University, Clayton, Victoria 3800, Australia. E-mail: tanja.junkers@monash.edu

Received 26th July 2020, Accepted 8th September 2020

First published on 9th September 2020


A method for encryption and safe transmission of data in the shape of molecular weight distributions (MWD) is presented. Relatively simple individual distributions serve as the key to encode information by mixing those in known amounts to an overlapped MWD. Increasing molecular weights are used to denote the position of these individual sub-distributions, and the relative quantity per sub-distribution is used to store data. The concept is demonstrated by a series of messages, whereby two-letter codes can be transmitted per overall MWD. In the final step, the encryption safety is confirmed. Due to the inherent differences in SEC calibrations, samples sent from Australia to Belgium could only be deconvoluted when the recipient of the message owned the physical sub-distributions, which then act as the encryption key.


Introduction

Nature has, without a doubt, developed an admirable wealth of biopolymers for various purposes. Chemists have strived for decades to create synthetic materials to the same level of sophistication, or to even come close to them in complexity and tailored properties. Biopolymers or macromolecules take a special place in nature's library of compounds. One central role of biomacromolecules is thereby information storage. In DNA, the blueprint for life is stored in a chain of nucleotides using a 4-letter chemical alphabet. In proteins, information is encoded by amino acids, with 21 different chemical groups serving as letters. The information density that can be reached by such molecules is spectacular, in principle, allowing for the storage of petabytes of data per gram of material.1,2 Yet, synthesis of such molecules remains challenging, and the readout requires thorough sequencing, which is time and cost-intensive. Various strategies have been introduced to create simpler readouts, yet methods remain complex.3 In the past decade, several approaches have been introduced to translate the principle of molecular encoding to artificial materials.4–6 Lutz and co-workers demonstrated molecular binary codes and molecular bar codes.7,8 Meier and co-workers showed how such sequence-defined oligomers could be used as an encryption tool.9,10 Recently, Du Prez and co-workers demonstrated how the concept could be employed in conjunction with synthesis robots to produce molecular QR codes, marking a record in information density for any non-biological sequence.11,12 Synthesis methods are developing rapidly, and readout methods are catching up, making the available concepts more and more practically viable. In terms of data encoding, the use of macromolecules as encryption keys is intriguing. Macromolecules can be implemented in low concentrations in diverse media, are relatively safe, and are – compared to relatively volatile digital media – stable. 
They can be sent over long distances and provide a convenient way for concealed information transmission. The current limitations in sequence lengths that can be safely reached in synthetic materials point to strings of short oligomers rather than large single molecules for the use of actual data storage. As an alternative to encoding information on a chain, surfaces can be used to store data in chemical functionalities, which simplifies the readout significantly but also limits the achievable data depth.13

A different problem that we wish to address here is that, despite all their advantages, sequence-defined oligomers can typically be read out and directly copied by outsiders. The readout is usually carried out via mass spectrometric sequencing, which is available to any well-equipped researcher. Even though retro-engineering is not necessarily simple, it is entirely possible when enough manpower is invested. An ideal molecular key will only be readable if extra information is available, enabling secure data exchange. Hence, knowing the structure of a polymer (or polymer distribution, see discussion below) should ideally not suffice for a readout. Independent information needs to be provided to allow decoding. On a chemical level, this means that analysis of a macromolecule would ideally only be possible if additional information were obtained independently. The transmitted message should thus consist of encoded information (safe from copying) and an independent key that is used to decipher the message. Furthermore, an ideal code will be safe from interference during transmission of the message. For a molecular/chemical compound, this means that the message should ideally be destroyed when an analysis is attempted or a sample is taken. Since most analysis techniques either require only very small amounts of material or are non-invasive, this is not easily realized, and transmission safety can only be approximated. Yet this is also a common problem in conventional data encryption, with quantum cryptography providing the only truly safe transmission method. All macromolecular data coding strategies developed so far rely on mass spectrometry, which only requires micrograms of material. A strategy that uses more sample would thus, counterintuitively for most chemical research, in fact be advantageous, as it would make the removal of a sample during transmission easier to detect.

In the following, we propose an encryption method that is based on the shape of molecular weight distributions rather than sequence information. This method fulfils the criteria outlined above. Data storage is reliable, yet readout of the information is only possible if two independent pieces of information are provided, and these can easily be transmitted independently. The method relies on a comparatively simple analysis method, size exclusion chromatography (SEC). SEC requires several milligrams of sample for analysis, hence the removal of a sample during transmission, while not impossible, is relatively easy to detect. Yet, the strength of the method is undoubtedly rooted in the independent transmission of information and key, and in the fact that retro-engineering of the samples (and hence the embedded information) is virtually impossible. Various ideas come to mind for applying this concept to real-world scenarios. Relevant examples are mixing the polymer blends into plastic formulations to validate correct mixing of the individual components, or as a means to verify the authenticity of the end product. The method is, therefore, not only usable for encryption, but also for anti-counterfeiting. As mentioned above, the readout of the shape of a molecular weight distribution is, in principle, simple and requires nothing more than standard SEC equipment with a concentration-sensitive detector (e.g., refractive index). Duplication (or copying) of encrypted messages is practically impossible, as will be discussed below. Also, the synthesis of these molecular weight distributions is comparatively easy and does not require sophisticated lab equipment or complicated multi-step synthesis. Any polymer chemistry lab can make use of this method in a few quick steps, which is in stark contrast to sequence-coding methods, which require time-consuming chemistry, automatic sequencers, and expensive readout equipment.

Results and discussion

Principle of information encoding

Molecular weight distributions (MWD) are synthetically complex and often consist of more than one sub-distribution. The deconvolution of these sub-distributions is far from trivial,14 and often sophisticated methods are used to separate chemically inhomogeneous samples from each other. Notwithstanding, the exact shape of such overlays is very well defined and, despite calibration issues with standard size exclusion chromatography, very reliably measured. SEC occupies a very particular position among analysis methods. While the method intrinsically performs very well with respect to repeatability (the measurement of the same distribution on a given set of columns and pumps is very reliable), it performs poorly with respect to absolute accuracy. This means a machine can measure a particular shape (and average molecular weight) with high precision and repeatability on one device. Yet, another SEC apparatus, even when featuring the same overall configuration, will yield a different result due to the uniqueness of chromatographic column sets and extrinsic influences on the measurement. Even with a perfect instrument calibration, differences in molecular weights between two laboratories can be as high as 20%. While this is generally a shortcoming in SEC analysis, it is of paramount importance for our coding method. Coming from our experience in additive creation of multimodal MWDs,15,16 we realized that with knowledge of the underlying distributions, complex shapes could be deconvoluted to very high precision. Using any type of polymerization that yields reasonably narrow molecular weight distributions, a series of standards can be synthesized, which are then overlapped by mixing in precise quantities. When these overlaps are measured, it is practically impossible to deconvolute the distributions with precision if the single underlying distributions are not known. This holds particularly true if the standards used are not entirely ideal in their shape and feature various dispersities or even modalities. Again, this is counterintuitive to most synthetic polymer work, where researchers typically strive for materials that are as well defined as possible. An infinite number of possible combinations of sub-distributions could exist for a given overall shape. Due to the high repeatability, but low reproducibility when changing SEC equipment, single distributions and overlays must thus be measured on the same system to allow for an accurate deconvolution of a molecular weight distribution shape into its single sub-distributions.17 Hence, to read an encoded overlapped distribution correctly, one also needs to own the individual sub-distributions that were used to construct the message. To summarize, from an encryption point of view, the overlapped distributions represent the encrypted message, while the samples for the individual distributions represent the encryption key (see Fig. 1).
Fig. 1 Schematic representation of the proposed encryption concept.

To deconvolute the overlapped distributions, the individual distributions must be relatively narrow and spaced sufficiently in average molecular weight. Since molecular weights are measured on a logarithmic scale in SEC, we chose to increase the gap between distributions successively towards higher distributions. As a demonstrator, we chose polystyrene (PS) samples with a degree of polymerization (DP) of 40, 60, 100, 150, and 225. While a gap in molecular weight is required, one also does not want to space distributions too widely, as this would allow deconvoluting single distributions directly from the overlaps without the knowledge of the exact encryption key. Each individual sub-distribution is approximated by a combination of five Gaussian distributions. This accounts for most asymmetries in experimental SEC traces. Next, these series of Gaussians are then fitted (with a fixed increment per five distributions) to the coded molecular weight distribution. With the result of this first fit as an initial guess, the experimental distributions are then fitted directly to the overlay in a least-square fit procedure using the mean square error (MSE) as cost function. After extensive optimization and testing, we found this approach to be most viable and the least time-intensive in computation time (a typical deconvolution by this method takes about 30 s on an average computer) yet achieving the required accuracy. The code for this procedure has been made available online free-of-charge (see the ESI for details).
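The core of the readout step can be illustrated with a minimal sketch. It is not the authors' published code: it skips the five-Gaussian pre-fit and simply assumes that the overlay is a linear combination of the key traces measured on the same SEC system, so that the mixing weights follow from an ordinary least-squares fit (which minimizes the MSE). The synthetic Gaussian traces, the peak width of 0.08 in log10(M), and all function names are assumptions made for illustration.

```python
import math

def gaussian(x, mu, sigma):
    """Normalized Gaussian, used here only as a synthetic stand-in for an SEC trace."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def solve_linear(a, b):
    """Solve a.w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w

def deconvolute(basis, overlay):
    """Weights w that minimize the MSE between overlay and sum_i w[i] * basis[i]."""
    n = len(basis)
    ata = [[sum(p * q for p, q in zip(basis[i], basis[j])) for j in range(n)]
           for i in range(n)]
    atb = [sum(p * q for p, q in zip(basis[i], overlay)) for i in range(n)]
    return solve_linear(ata, atb)

# Synthetic key: one trace per PS sample on a log10(M) grid (end groups ignored).
STYRENE_G_PER_MOL = 104.15
grid = [3.0 + 0.005 * k for k in range(400)]
centres = [math.log10(dp * STYRENE_G_PER_MOL) for dp in (40, 60, 100, 150, 225)]
basis = [[gaussian(x, mu, 0.08) for x in grid] for mu in centres]

# Synthetic message: standard at 100 %, then 105 %, 135 %, 105 %, 195 %.
true_w = [1.00, 1.05, 1.35, 1.05, 1.95]
overlay = [sum(w * trace[k] for w, trace in zip(true_w, basis))
           for k in range(len(grid))]

fitted = deconvolute(basis, overlay)
relative = [w / fitted[0] for w in fitted]  # amounts relative to the DP40 standard
```

With noise-free synthetic data the relative weights are recovered essentially exactly. Real traces carry noise and baseline drift, which is presumably why the procedure described above uses a staged fit with an initial guess.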

Achievable data depth

Mass fraction increments are associated with a certain error in analysis, and thus increment intervals are defined. With x different individual DP samples and y distinguishable mass concentration increments, $y^x$ possible combinations exist. Thus, for 4 different polymers mixed in 6 concentration increments, close to 1300 possible combinations ($6^4 = 1296$) already exist. In the 4-letter DNA alphabet, this would be equivalent to a sequence of 5–6 nucleotides in a chain. In digital quantities, this represents about 10 bits. Current cryptography keys typically comprise 192 bits, which could be generated in molecular weight distributions either by increasing the number of overlapping distributions or by providing a series of samples that each represent a different part of the key. Splitting information into different fractions that can be analysed independently is a common approach, as was also used by Du Prez and co-workers. In that sense, longer codes are also achievable and allow for embedding more than 192 bits of information. The complex distributions discussed herein can be seen as letters in an alphabet that allow for any composition of words or sentences. The only limiting factor for reaching sufficient data depth is that the polymers used must be well defined and well spaced in molecular weight, and that the readout of data is reliable. Furthermore, one sub-distribution needs to be used as an internal mass standard. Hence, when mixing five distributions, only 4 of those can be used for encryption.
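The capacity figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function names are hypothetical, and the sample count for a 192-bit key assumes that only whole bits per sample are used.

```python
import math

def combinations(levels, polymers):
    """Number of distinct mixtures: levels ** polymers."""
    return levels ** polymers

def bits_per_sample(levels, polymers):
    """Information content of one mixture in bits."""
    return polymers * math.log2(levels)

def samples_needed(key_bits, levels, polymers):
    """Samples required for a key, counting only whole bits per sample (assumption)."""
    whole_bits = math.floor(bits_per_sample(levels, polymers))
    return math.ceil(key_bits / whole_bits)

n_six_levels = combinations(6, 4)           # four coding polymers, six levels
n_base_seven = combinations(7, 4)           # values 0..6 per polymer
bits = bits_per_sample(6, 4)                # a little over 10 bits
nucleotide_equivalent = bits / 2            # each DNA base carries log2(4) = 2 bits
samples_192 = samples_needed(192, 6, 4)
```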

Here, we use a direct look-up table to translate bits into letters, not unlike the original 1963 ASCII code, to present the concept of our method. Yet, it is possible to achieve a greater variety by following a Unicode coding system, as in the UTF-32 code that gives access to a much broader range of letters without the need to encode more information. Based on a Unicode system, up to 2400 possible characters can be stored in one mix of 4 sub-distributions (Fig. 2).


Fig. 2 Coding table used to translate the relative six increments of two polymer sub distributions into a letter (the outer ring denotes the encoded letter).

The code trace would here be composed of 4 sub-distributions (aside from the reference distribution), each of which has a value between 0 and 6 (i.e., a positional system with base 7). The relative position of each distribution (i.e., the index) is tied to a place value ($7^0$ for the first, $7^1$ for the second, $7^2$ for the third, and $7^3$ for the fourth) by which the mass increment value is multiplied. The sum of all values is then the final stored decimal value. This decimal value per code trace can then be used in a look-up table to identify the final character(s) that is/are stored. Take, for example, the code trace in Fig. 3 in which the character set “PR” was stored. The mass increment values were “3, 4, 3, and 6” for the respective sub-distributions. In the positional system with base seven, the stored decimal value would be read as follows:

$$3\cdot 7^{0} + 4\cdot 7^{1} + 3\cdot 7^{2} + 6\cdot 7^{3} = 3\cdot 1 + 4\cdot 7 + 3\cdot 49 + 6\cdot 343 = 2236$$
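The positional readout above maps directly onto a pair of small helper functions. This is a sketch with hypothetical names; the digit order (first sub-distribution is the least significant digit) follows the worked example in the text.

```python
def encode(levels, base=7):
    """Decimal value of mass increment values; the first entry has place value base**0."""
    return sum(v * base ** i for i, v in enumerate(levels))

def decode(value, n_positions=4, base=7):
    """Inverse of encode: recover the mass increment values from the decimal value."""
    levels = []
    for _ in range(n_positions):
        value, digit = divmod(value, base)
        levels.append(digit)
    return levels

stored = encode([3, 4, 3, 6])    # the increments from the worked example
recovered = decode(stored)
largest = encode([6, 6, 6, 6])   # highest value that four base-7 positions can hold
```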


Fig. 3 Overall distributions and deconvolution of the sub-distributions resulting in the letter sequence ‘PRD!’ following the coding table given in Fig. 2.

In the more extended UTF-32 Unicode text system, this decimal value would translate to an Arabic letter. The usefulness of this letter for encryption purposes can be argued. However, the base system substantially extends the number of characters to be encoded. This possibility of choosing different systems and look-up tables allows us to increase data transmission security further.

The sender and receiver can still agree on their look-up table if desired. Alternatively, the code trace could be composed of 2 by 2 sub-distributions using the positional system in which two different decimal values (from 0 to 48) could be stored.

Theoretical validation of the method

The limits of the decryption concept using the custom-made deconvolution algorithm must be identified. Therefore, we first calculated theoretical overlaps of distributions based on the framework presented earlier.15 We assumed overlaps of polymer distributions with intermediate differences in the number-average degree of polymerization (DP) of 10, 15, and 20. All components were simulated to be mixed with the same mass fraction. These mixtures were then computationally deconvoluted into their components and mass fractions. This test showed that even at lower DP, a difference of at least 20 repeat units between the DP of separate polymer distributions should be maintained to ensure accurate deconvolution of the SEC trace of the mixture (see the ESI). While we did not increase the molecular weight beyond DP225 for practical reasons, more distributions could be overlapped with larger spacing. Yet, we refrained from doing so, as creating multiple overlaps (and in this way increasing the amount of stored information) appeared more convenient than adding more distributions (which complicates synthesis and increases the deconvolution time per distribution). Another restriction is the maximum number of mass concentration increments that can be distinguished. To estimate this, the difference between the initial theoretical mass fraction of the individual components and the deconvoluted solution was determined. For the simulations with a target DP difference of 20, this error was around 2% for the initial guess and was further reduced to 0% after fine tuning. The algorithm can thus successfully deconvolute a complicated SEC trace back to its original components if the initial traces are provided. Thorough testing of the algorithm on experimental MWDs revealed that deconvolution is reliably accurate when six increment levels are assumed. More increments occasionally yielded readout errors, which we wanted to avoid. Generally, weighing samples for mixing must be done with the highest precision, yet the experimental error in fitting appears to be higher than the error made while weighing.

Experimental validation

For the sub-distributions, poly(styrene) samples were made by RAFT polymerization (see the ESI for experimental details). PS is commonly polymerized via both anionic and (controlled) radical polymerization. Anionic polymerization gives narrower MWDs (Đ < 1.10), which would increase the accuracy of the deconvolution. But, to showcase the wide-spread applicability of our approach – and its robustness – we decided to use the less ideal RAFT approach. The thermal RAFT polymerization of styrene results in more complex MWDs (Đ > 1.20) due to termination and transfer events. As mentioned above, each individual distribution is modelled by a blend of five Gaussian distributions to describe the tailing of the complex MWDs adequately. In addition, PS was picked as a polymer system because of its ease of synthesis, longevity, and chemical inertness, which allows the data to be stored quickly and safely.

To facilitate high accuracy in mixing, a stock solution for each PS sample is made by weighing the polymer (typically 30–40 mg) and dissolving it in a known quantity of THF (5 mL, 4.45 g). The mass concentration of the stock solution is used to add the correct amount of polymer to each code mass-wise. Starting from the stock solutions, each code is by this method generated in a few minutes. This is significantly faster than comparative methods involving chemical synthesis whereby the data encoding can take days even under automated conditions.
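As a rough illustration of the mass-wise dosing described above, the following sketch computes how much stock solution has to be weighed out to deliver a given polymer mass. The 35 mg polymer mass is an assumed value within the stated 30–40 mg range, and the function name is hypothetical.

```python
def stock_solution_mass_g(target_polymer_mg, polymer_mg, solvent_g):
    """Grams of stock solution that contain target_polymer_mg of polymer."""
    polymer_g = polymer_mg / 1000.0
    mass_fraction = polymer_g / (polymer_g + solvent_g)  # g polymer per g solution
    return (target_polymer_mg / 1000.0) / mass_fraction

# Assumed stock: 35 mg PS dissolved in 4.45 g (5 mL) THF.
for_standard = stock_solution_mass_g(5.00, 35.0, 4.45)   # 100 % level, 5.0 mg
for_level_3 = stock_solution_mass_g(5.25, 35.0, 4.45)    # 105 % level, 5.25 mg
```

Dosing by mass rather than by volume avoids errors from the volatility and thermal expansion of THF, which is one plausible reason for working with a gravimetric stock concentration.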

Within a polymer distribution comprising 5 individual sub-distributions, one character is thus encoded per two distinct polymers. Six discrete mass levels result in a space to encode 36 different characters. We decided to encode the Latin alphabet (26 characters) and some commonly used punctuation marks (see Fig. 2). The discrete mass levels are defined against a standard, which is the first polymer in the distribution (PS, DP40). The standard is always set as 100%. The absolute amount of material can be set arbitrarily but needs to take into account the concentration ranges in which the SEC operates (ideally, the transmitted code amounts to exactly what is needed for a standard SEC analysis, so that any analysis attempted during sending would be directly detected). All six levels are mixed and measured against this standard. This allows encoding two letters per overall distribution.

For instance, to write and read the character ‘P’, two polymers are required to represent the levels 3 and 4 (see Fig. 2), plus one additional polymer as the standard. In practical terms, 100% (5.0 mg) of the first polymer is used as the standard. Next, the second polymer is mixed as 105% (level 3, range 90–120%, 5.25 mg) and the third polymer as 135% (level 4, range 120–150%, 6.75 mg, see the ESI for further details). Measuring this mixture and deconvoluting it would then return the original components within the concentration ranges of levels 3 and 4, so that the encoded data is read out as the character ‘P’. Note that the key used allows us to encode more characters than the alphabet contains, hence we added some punctuation marks and the classical descriptors for molecular weight distributions to the encryption scheme.
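The mapping between a pair of levels and a letter can be sketched as a simple table lookup. The actual table is the one in Fig. 2, which is not reproduced here; the row-major layout below is an assumption that happens to agree with the two assignments stated in the text (‘P’ as levels 3 and 4, ‘R’ as levels 3 and 6). The ten remaining positions (punctuation and descriptors) are deliberately left undefined.

```python
import string

LEVELS = 6  # discrete mass levels per polymer

def letter_to_levels(letter):
    """Level pair for a letter, assuming a row-major 6 x 6 table starting at 'A'."""
    index = string.ascii_uppercase.index(letter)
    return (index // LEVELS + 1, index % LEVELS + 1)

def levels_to_letter(first, second):
    """Letter for a level pair under the same assumed layout."""
    index = (first - 1) * LEVELS + (second - 1)
    if not 0 <= index < 26:
        raise ValueError("position not covered by this sketch (see Fig. 2)")
    return string.ascii_uppercase[index]

p_levels = letter_to_levels("P")
r_levels = letter_to_levels("R")
```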

As a first proof-of-principle, the sequence ‘PRD!’ was encoded by two MWDs, each consisting of five PS polymers (Fig. 3). The first MWD contains the letters ‘PR’, and the second the characters ‘D!’. Afterward, both codes and the original polymer samples were measured, and the message was read out successfully. Each deconvolution found the original polymers well within their correct relative ranges. Fig. 3 shows the measured encoded MWDs and their deconvolution into the respective sub-distributions (see the ESI for details).

To apply the concept in further tests, we also encoded longer character sequences. To capture the spirit of the method, we opted to encode the names ‘TURING’ and ‘STAUDINGER’, respectively one of the key figures of code deciphering and the pioneer of polymer chemistry, the latter on the occasion of the 100-year anniversary of his historic paper on polymers.18,19 As seen in Fig. 4 and Fig. S7, in each case, the names were read back successfully by the deconvolution algorithm. Again, each analysis found the original polymers well within their correct relative mass ranges, extending the method to longer sequences and even full sentences.


Fig. 4 Overall distributions and deconvolution of the sub-distributions resulting in the letter sequence ‘STAUDINGER’ following the coding table given in Fig. 2.

Communication across borders

In the final step, we tested if codes can indeed be sent over distances. This is not a trivial task. Polymers in solution tend to change over time, and hence samples need to be dried before sending and redissolved and filtered upon arrival. Furthermore, degradation could potentially occur during sending. To this end, we created a code based on the same polymers as described above and sent the code and the individual distributions from Melbourne, Australia, to Diepenbeek, Belgium. This permitted testing (i) whether mere possession of the overlapped distributions allows for deconvolution or whether the individual distribution samples are also required as a key to read the code, and (ii) whether the method is reliable in a real-life scenario, that is, whether deconvolutions would also be achievable on a very different SEC machine after drying, shipping over intercontinental distances, and dissolving again. Fig. 5 shows a comparison of the sent encoded distributions between the Australian and Belgian laboratories. As is directly evident, quite different distributions are obtained, as the two SEC systems were supplied by different manufacturers and different column sets were used in the analysis. The deconvolution of the encoded distribution in Belgium with the single sub-distributions measured in Australia fails entirely, as expected. Likewise, an a-priori deconvolution that allows the algorithm to fit random distributions must fail.
Fig. 5 Overall distributions resulting in the letter sequence ‘WORLD!’, measured on two separate SEC systems in Australia and Belgium.

However, when the encryption key sub-distributions were also measured on the receiving SEC in Belgium, deconvolution was immediately successful. Three distributions were read out, yielding back the letter sequence “WORLD!”. Thus, both aspects, transmission of the message without loss of information and the safety of the procedure, are confirmed in this experiment. Knowledge (or in this case, possession) of the key is essential to decoding information. Too many variations are possible to create the overlapped distributions if the original distributions are not known in detail. Even possession of SEC chromatograms of the sub-distributions alone is not sufficient, as they need to be measured on the same chromatography system.

As shown in Fig. 5, the shape of the overall distributions can be different depending on the chromatography system used. It should be noted that the Belgian system consisted of three analytical high-resolution columns and still provided a comparatively good resolution (although somewhat lower than the original SEC). A hypothetical SEC system with very high accuracy could in principle allow for deconvolution of the message without knowledge of the key. Yet, even if such an instrument were available, this could be easily counteracted by narrowing the space between two adjacent distributions (which is limited exactly by the resolution of the employed SEC). One may argue that knowledge of Mn and the synthesis method of the sub-distributions suffices to re-create the encryption key. Yet, this is not readily achievable due to the typical batch-to-batch variation of the underlying RAFT process. Even a variation of a few percent in average molecular weight or dispersity will make the deconvolution inaccurate (furthermore, the synthesis is not necessarily known to an intercepting person). While it is, of course, in principle possible for an intruder to steal samples of both key and message during the sending of samples, it would be quite easy for the recipient to see if the message transmission was compromised, since the intruder would need to take a considerable mass of the sample to carry out their analysis. Analytical balances with up to μg precision are available and should be used to verify the mass of the sample before sending and after receiving. Additionally, one should use tamper-proof containers for transport of the messages, which adds another layer of protection. SEC does, in principle, allow us to recover the analysed material, but in our experience, repeated dissolution and drying of samples tends to destroy the message via degradation. Moreover, it is, in principle, possible to design polymers that would degrade faster in solution, and hence make readout during transmission entirely impossible without the destruction of the message. Regardless, knowledge of the mass of the samples is sufficient to know if a middle person has taken a sample. A direct copy of the samples is not achievable, as this again requires knowledge of the synthesis procedure of the individual sub-distributions, and additionally the ability to precisely re-engineer the polymer, which is close to impossible given the batch-to-batch variation that prevents exact reproduction even if the required synthesis method is known. Our presented encryption process is successful at the proof-of-concept stage; nevertheless, there are current limitations, such as the relatively long measurement/reading process and the limited amount of data that can be stored per polymer sample. Measurement of the samples can also be performed on rapid SEC instruments, reducing the analysis time to as little as 10 minutes. Future work is underway in our labs focussing on immediate improvements by fine tuning our read-out algorithm as well as including additional detectors (e.g., a UV detector), which should effectively double the data depth (i.e., four letters instead of two) and halve the number of samples needed to store a specific message.

Conclusion

A new method for encryption and safe transmission of data in the shape of MWDs is presented. It should be noted that the encryption system, as demonstrated herein, only provides a proof-of-concept. Of course, more complex polymers can be used in the next step as the encryption key, then allowing for the use of more than five individual sub-distributions. Furthermore, even with the method used, the available number of letters can be significantly increased by using Unicode-based look-up tables. Again, this allows us to improve the length of messages to be transmitted per sample. It should, however, be clearly noted that this study aims at introducing the concept of data transmission, not at providing the longest message possible.

Next to the refinement of the coding method used, variation of the type of polymerization employed in making the sub-distributions, as well as changing the polymer type, would add a second dimension to the method and chiefly increase data density. Also, improvements in the deconvolution code or the introduction of an error correction algorithm would allow increasing the information density significantly. Yet, even with the current data depth and simple coding, it is already possible to transmit fairly complex information in a series of distributions, which are all simple to make, fast to process, and technology-wise accessible to any lab in the world having access to basic polymer characterization equipment. Next to encryption, the method can also be used for anti-counterfeiting. Codes can be mixed with any polymer and hence leave a unique fingerprint that is almost impossible to reproduce accurately without additional information that only the person who created the encryption key possesses. Anti-counterfeiting is even more secure than data transmission, as in this case the key can always be kept fully confidential and does not need to be transmitted. Data density is likely to improve with further optimization of the concept, yet the power of the method lies in its relative simplicity combined with high safety. The individual polymers are made effortlessly at scale, and neither mixing nor readout of the messages requires any high-level know-how or technologies. This method of encoding and encryption can be used by any chemist almost immediately.

Methods

Poly(styrene) polymerization

In a typical polymerization, styrene monomer (4 M, eq. degree of polymerization) is weighed into a septum-closed 25 mL round-bottom flask together with the corresponding equivalent of RAFT agent (DoPAT, 1 eq.) and the thermal initiator AIBN (0.1 eq.). Afterward, the solvent, toluene, is added to obtain a final volume of 15 mL. Next, the solution is flushed with a stream of argon (Ar) for 15 minutes to remove all residual oxygen. Finally, the flask is transferred to a 100 °C oil bath for 6 hours. Afterward, pure poly(styrene) is obtained by precipitation in ice-cold methanol.

Size-exclusion chromatography

Analysis of the molar mass (distributions) of the oligomer samples at Monash University in Melbourne, Australia, was performed on a PSS SECcurity2 GPC system operated by PSS WinGPC software, equipped with an SDV 5.0 μm guard column (50 × 8 mm), followed by three SDV analytical 5.0 μm columns with varying porosity (1000 Å, 100 000 Å, and 1 000 000 Å) (50 × 8 mm) coupled to a differential refractive index (RI) detector and viscosity detector DVD1260 using THF as the eluent at 40 °C with a flow rate of 1 mL min⁻¹ delivered by an isocratic pump. The SEC system was calibrated using narrow linear polystyrene standards ranging from 474 to 7.5 × 10⁶ g mol⁻¹.

For analysis of the molar mass distributions at Hasselt University in Diepenbeek, Belgium, a Tosoh EcoSEC HLC-8320GPC was employed, consisting of an autosampler, a PSS SDV guard column (50 × 7.5 mm), three PSS SDV analytical linear XL columns (5 μm, 300 × 7.5 mm), and a differential refractive index detector (Tosoh EcoSEC RI). The column temperature was maintained at 40 °C, and high-performance liquid chromatography (HPLC) grade tetrahydrofuran (THF) was used as the eluent at a flow rate of 1 mL min⁻¹, with toluene as a flow marker. Calibration was performed using narrow linear polystyrene (PS) standards from PSS Laboratories in the range of 470 to 7.5 × 10⁶ g mol⁻¹.

Encryption principle

In the current work, information is coded as variations in polymer mass, relative to a standard, within a mixture of polymers of differing number-average degree of polymerization (DP). Six mass levels were chosen, and a combination of two mass levels (two polymers) translates into one letter, as demonstrated in Fig. S2A. A combination of level 3 and level 4 translates to the letter “P”, while levels 3 and 6 code the letter “R”. These mass levels are defined relative to a polymer standard that is arbitrarily set at 100% as a reference, as shown in Fig. S2B. For example, level 1 corresponds to a mass fraction of 45% relative to the standard (e.g., 5 mg of the standard polymer mixed with 2.25 mg of the polymer coding the mass level), and level 2 is set at 75% (5 mg of standard mixed with 3.75 mg of the coding polymer). Each level spans an acceptable concentration window of 30 percentage points (e.g., 90–120% for level 3). To code two letters in one molecular weight distribution, five polymers are needed: one polymer used as a standard and two pairs of polymers, each pair defining one letter. Fig. S2C shows the theoretical mass increment targets. In practical terms, 100% (5.0 mg) of the first polymer is used as the standard. Next, the second polymer is mixed in at 105% (level 3, range 90–120%, 5.25 mg), the third polymer at 135% (level 4, range 120–150%, 6.75 mg), the fourth polymer at 105% (level 3, range 90–120%, 5.25 mg), and the fifth at 195% (level 6, range 180–210%, 9.75 mg). Fig. S2D shows the theoretical mixture of these polymers.
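The arithmetic of this scheme can be sketched in a few lines of Python. The snippet below is a minimal illustration rather than the authors' code: it assumes that the level targets follow the regular 30% spacing implied by the quoted values (45%, 75%, 105%, 135%, 195%), and it includes only the two letter assignments quoted in the text, since the full table is given in Fig. S2A. The names `level_target`, `level_range`, `LETTER_LEVELS`, and `encode` are hypothetical.

```python
FIRST_TARGET = 45.0  # level 1 target, in % of the standard mass
LEVEL_STEP = 30.0    # spacing between levels, in % of the standard mass


def level_target(level):
    """Target mass of a level (1-6) as % of the standard (assumed regular spacing)."""
    return FIRST_TARGET + LEVEL_STEP * (level - 1)


def level_range(level):
    """Acceptance window of a level as (low %, high %)."""
    target = level_target(level)
    return (target - LEVEL_STEP / 2, target + LEVEL_STEP / 2)


# Only the assignments quoted in the text; the full table is in Fig. S2A.
LETTER_LEVELS = {"P": (3, 4), "R": (3, 6)}


def encode(message, standard_mg=5.0):
    """Return the masses (mg) to weigh out: the standard first, then two polymers per letter."""
    masses = [standard_mg]
    for letter in message:
        for level in LETTER_LEVELS[letter]:
            masses.append(round(standard_mg * level_target(level) / 100.0, 2))
    return masses


print(encode("PR"))    # [5.0, 5.25, 6.75, 5.25, 9.75]
print(level_range(3))  # (90.0, 120.0)
```

For the two-letter message "PR" this reproduces the five masses listed in the worked example above.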

Decryption principle

A refractive index detector (as used in the current work) is a concentration-sensitive detector often used in SEC. The concentration of a particular (polymer) material is directly proportional to its integrated peak area. When the refractive index increment (dn/dc) of the material is also known, the absolute concentration of the material in the measured solution can even be calculated. For this reason, a complicated SEC trace can be deconvoluted back into its original sub-distributions, and their relative quantities within the mixture can be derived.

To achieve this, a custom Python script was developed, which is available free of charge as part of the ESI. For the deconvolution, a number distribution overlay of the individual composing traces with the mixture trace is loaded into the program. The individual composing traces are randomly fitted with five Gaussians per trace using the “Lmfit” package for Python and the Nelder–Mead fitting method. Using five Gaussians per trace makes it possible to compensate for SEC effects or an unusual shape of the composing trace. These fits of the sub-distributions are then used to fit the code trace with a similar fitting method, allowing only the amplitude of the separate composing traces to vary. From these best fits, the ratios of all contributing traces are calculated (by integration). These ratios serve as an initial guess, which is then fine-tuned within allowed boundaries using the experimental contributing SEC traces instead of the fits. This step increases the accuracy of the deconvolution, since any additional SEC inaccuracy is included in the deconvolution. For each possible combination, the root mean square error (RMSE) is calculated between the experimental code trace and the calculated combination. The variation with the lowest RMSE (usually between 0.5 and 2%) is used to derive the coded information by again calculating the ratio of each peak integration to the integration of the standard peak (the first peak). From these ratios, the mass level of each contributing distribution is determined using Table S2B. Finally, these levels are used to decode the letters present in the mixture.
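The core idea of the amplitude fit can be illustrated with a simplified, self-contained Python sketch. This is not the authors' script: it replaces the Lmfit/Nelder–Mead fit with an ordinary linear least-squares solve, skips the five-Gaussian smoothing and the fine-tuning stage, and uses synthetic single-Gaussian traces in place of measured SEC data. All function names, peak positions, and widths are illustrative assumptions.

```python
import math


def gaussian(x, center, width):
    return math.exp(-0.5 * ((x - center) / width) ** 2)


def normalise(trace):
    """Scale a trace to unit area so that fitted amplitudes are mass ratios."""
    total = sum(trace)
    return [y / total for y in trace]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_amplitudes(refs, mixture):
    """Least-squares amplitudes a_i such that sum(a_i * refs[i]) approximates mixture."""
    n = len(refs)
    ata = [[sum(p * q for p, q in zip(refs[i], refs[j])) for j in range(n)]
           for i in range(n)]
    atb = [sum(p * q for p, q in zip(refs[i], mixture)) for i in range(n)]
    return solve(ata, atb)


def rmse(refs, amps, mixture):
    model = [sum(a * r[k] for a, r in zip(amps, refs)) for k in range(len(mixture))]
    return math.sqrt(sum((m - y) ** 2 for m, y in zip(model, mixture)) / len(mixture))


def to_level(percent):
    """Map a mass ratio (% of standard) to level 1-6 using 30%-wide windows from 30%."""
    level = int((percent - 30.0) // 30.0) + 1
    return level if 1 <= level <= 6 else None


# Synthetic demonstration: five sub-distributions on a log(M) axis (assumed values).
grid = [2.5 + 0.01 * k for k in range(261)]
centers = [3.0, 3.4, 3.8, 4.2, 4.6]
refs = [normalise([gaussian(x, c, 0.12) for x in grid]) for c in centers]
true_percent = [100.0, 105.0, 135.0, 105.0, 195.0]  # standard, then levels 3, 4, 3, 6
mixture = [sum(p / 100.0 * r[k] for p, r in zip(true_percent, refs))
           for k in range(len(grid))]

amps = fit_amplitudes(refs, mixture)
percent = [100.0 * a / amps[0] for a in amps[1:]]  # ratios relative to the standard
levels = [to_level(p) for p in percent]
print(levels, rmse(refs, amps, mixture))
```

Because the reference traces are normalised to unit area, the fitted amplitudes divided by the amplitude of the standard give the mass ratios directly. With real SEC data the traces are noisy and overlap less ideally, which is presumably why the workflow described above adds Gaussian smoothing and a bounded fine-tuning step.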

Author contributions

M. R. and J. H. V. contributed equally to the full extent of this work, from the experimental setups, data acquisition, interpretation, analysis, and validation to the writing of the manuscript. T. J. was involved in the discussions and was responsible for funding acquisition and project administration.

Conflicts of interest

There are no conflicts to declare.

Acknowledgements

The authors are grateful to the Fonds Wetenschappelijk Onderzoek (FWO) for providing scholarships for MR & JHV. Funding from Monash University is also kindly acknowledged.

Notes and references

  1. G. M. Church, Y. Gao and S. Kosuri, Science, 2012, 337, 1628–1628.
  2. N. Goldman, P. Bertone, S. Chen, C. Dessimoz, E. M. LeProust, B. Sipos and E. Birney, Nature, 2013, 494, 77–80.
  3. K. W. Kim, V. Bocharova, J. Halámek, M. K. Oh and E. Katz, Biotechnol. Bioeng., 2011, 108, 1100–1107.
  4. J. F. Lutz, J. M. Lehn, E. Meijer and K. Matyjaszewski, Nat. Rev. Mater., 2016, 1, 1–14.
  5. S. C. Solleder, R. V. Schneider, K. S. Wetzel, A. C. Boukis and M. A. Meier, Macromol. Rapid Commun., 2017, 38, 1600711.
  6. J. F. Lutz, Macromol. Rapid Commun., 2017, 38, 1700582.
  7. R. K. Roy, A. Meszynska, C. Laure, L. Charles, C. Verchin and J. F. Lutz, Nat. Commun., 2015, 6, 1–8.
  8. C. Laure, D. Karamessini, O. Milenkovic, L. Charles and J. F. Lutz, Angew. Chem., Int. Ed., 2016, 55, 10722–10725.
  9. S. C. Solleder, D. Zengel, K. Wetzel and M. A. Meier, Angew. Chem., Int. Ed., 2016, 55, 1204–1207.
  10. A. C. Boukis, K. Reiter, M. Frölich, D. Hofheinz and M. A. R. Meier, Nat. Commun., 2018, 9, 1439.
  11. S. Martens, J. Van den Begin, A. Madder, F. E. Du Prez and P. Espeel, J. Am. Chem. Soc., 2016, 138, 14182–14185.
  12. S. Martens, A. Landuyt, P. Espeel, B. Devreese, P. Dawyndt and F. E. Du Prez, Nat. Commun., 2018, 9, 1–8.
  13. I. B. Burgess, L. Mishchenko, B. D. Hatton, M. Kolle, M. Lončar and J. Aizenberg, J. Am. Chem. Soc., 2011, 133, 12430–12432.
  14. T. Van Hoeylandt, K. Chen, F. Du Prez and F. Lynen, J. Chromatogr. A, 2014, 1342, 63–69.
  15. M. Rubens and T. Junkers, Polym. Chem., 2019, 10, 5721–5725.
  16. M. Rubens and T. Junkers, Polym. Chem., 2019, 10, 6315–6323.
  17. R. J. Bruessau, Macromol. Symp., 1996, 110, 15–32.
  18. H. Staudinger, Chem. Ber., 1920, 53, 1073–1085.
  19. H. Frey and T. Johann, Polym. Chem., 2020, 11, 8–14.

Footnotes

Electronic supplementary information (ESI) available: Detailed materials and methods as well as supporting results. The Python script used for deconvolution of SEC traces is available at http://www.polymatter.net under a Creative Commons license. See DOI: 10.1039/d0py01071e
These authors contributed equally to this work.

This journal is © The Royal Society of Chemistry 2020