Abstract
Resolving the structures of porous covalent organic frameworks (COFs) with atomic precision
is the key to understanding the relationship between their structures and properties, and
to developing new materials with superior performance. Yet determining their atomic
structures has remained a challenge since the first COFs were reported
seventeen years ago. Here, we present a universal method for ab initio structure determination
of polycrystalline three-dimensional (3D) COFs at atomic level using enhanced cryo-continuous
rotation electron diffraction (cryo-cRED), which combines hierarchical cluster analysis
with cryo-EM technique. The high-quality datasets possess not only up to 0.79-angstrom
resolution but more than 90% completeness, leading to unambiguous solution and precise
refinement with anisotropic temperature factors. With such a powerful method, the
dynamic structures with flexible linkers, degree of interpenetration, position of
functional groups, and arrangement of ordered guest molecules are successfully revealed
with atomic precision in five 3D COFs; these details are almost impossible to obtain
without atomic-resolution structure solution. This study demonstrates a practicable
strategy for determining the structures of polycrystalline COFs and other beam-sensitive
materials, and should help in the future discovery of novel materials.
Abstract
Structure determination of covalent organic frameworks (COFs) is the key to pushing
the development of COF-based materials further but precise determination of the structure
of COFs is challenging. Here, the authors develop a universal ab initio structure
determination method for polycrystalline 3D COFs using cryo-cRED by combining hierarchical
cluster analysis with cryo-EM technique and demonstrate COF structures with atomic
precision and up to 0.79-angstrom resolution.
1. Introduction Although crystal structure determination by means of X-ray diffraction has had a major scientific impact for the last 100 years, it still requires the solution of the crystallographic phase problem. This problem arises because although methods for measuring the intensities of the diffracted X-rays have made considerable progress during that time, the direct experimental measurement of their relative phases is still only rarely practicable. Small-molecule crystal structures are usually solved by the use of probability relationships involving the phases of the stronger reflections, the so-called direct methods (Sheldrick et al., 2001 ▶; Giacovazzo, 2014 ▶) or more recently by the iterative use of Fourier transforms, e.g. dual-space methods such as charge flipping (Oszlányi & Sütő, 2004 ▶; Palatinus, 2013 ▶), in which the phases are constrained by the observed reflection intensities in reciprocal space and by the properties of the electron density in real space. Before the phase problem can be solved, the usual procedure is to determine the space group of the crystal with the help of the Laue symmetry of the diffraction pattern, the presence or absence of certain reflections (the systematic absences) and statistical tests (e.g. to distinguish between centrosymmetric and non-centrosymmetric structures). This space-group determination may be upset by the presence of dominant heavy atoms or by pseudo-symmetry affecting the intensities of certain classes of reflections, and in some cases the space group is ambiguous. For example, the space groups I222 and I212121 have the same systematic absences, as do Pmmn and two different orientations of Pmn21. Many dual-space methods perform at least as well when the data are first expanded to the nominal space group P1 (Sheldrick & Gould, 1995 ▶). 
In this paper ‘P1’ will be used to cover the centred triclinic non-centrosymmetric space-group settings such as C1 as well; the data do not need to be re-indexed for the primitive cell. After solving the phase problem in P1, the space group can be determined using the P1 phases (Burla et al., 2000; Palatinus & van der Lee, 2008) and this turns out to be a very robust general approach. SHELXT also employs this strategy. The systematic absences are not then used for the space-group determination, but all the weak reflections are still useful for identifying the best solution. Fig. 1 summarizes the course of structure determination using SHELXT. The individual stages will now be discussed in detail. The current version of SHELXT is intended for single-crystal X-ray data and is not suitable for neutron diffraction data.
2. Solving the phase problem for data expanded to space group P1
SHELXT reads standard SHELX format .ins and .hkl files. It extracts the unit cell, Laue group (but not space group) and the elements that are expected to be present (but not how many atoms of each). A number of options, e.g. that all trigonal and hexagonal Laue groups should be considered, may be specified by command-line switches. A summary of the possible options is output when no filename is given on the SHELXT command line and further details are available on the SHELX home page. The data are first merged according to the specified Laue group and then expanded to P1. In theory, SHELXT could also have been programmed to determine the Laue group, e.g. by calculating the R values or correlation coefficients when the equivalent reflections are merged. However, the Laue group has to be known to scale the data, which is an essential step for the highly focused beams now common for synchrotrons and laboratory microsources, because the effective volume of the crystal irradiated is different for different reflections and needs to be corrected for.
So in practice it is best to determine the Laue group first anyway. Even though programs such as XPREP (Bruker AXS, Madison, WI 53711, USA) are no longer required to determine the space group, it is still necessary to identify the correct unit cell and metric symmetry. 2.1. Dual-space iteration starting from a Patterson superposition The P1 dual-space recycling in SHELXT may start with random phases, but the default option of starting from a Patterson superposition minimum function (Buerger, 1959 ▶; Sheldrick, 1997 ▶) is usually more effective. Two copies of the sharpened Patterson function, displaced from each other by a strong Patterson vector, are superimposed and the minimum value of the two is calculated at each grid point. The resulting map is used as the initial electron density for the dual-space recycling. In an ideal case it is a double image of the structure consisting of 2N peaks, where N is the number of unique atoms, but the space-group symmetry has been lost. Since the dual-space recycling is being performed in P1 anyway, this is a good start and 2N is a significant reduction from the N 2 peaks in the original Patterson. The subsequent dual-space recycling is performed using the modified structure factors where E is the normalized structure factor, and a new density map is calculated by a hybrid difference Fourier synthesis with phases and coefficients where and G c are obtained by Fourier transformation of the current map. The default values for m and q are 3 and 0.5, respectively, but may be changed by the user. Based on experience with other structure-solution programs, q should probably be larger for large equal-atom structures and smaller for structures involving heavy atoms (to reduce Fourier ripples), but in practice it is rarely necessary to change the default values. SHELXT adds unmeasured data above and below the resolution limit of the data in the file similar to the free lunch method described by Caliandro et al. (2005 ▶). 
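The Patterson superposition minimum function described above can be illustrated with a one-dimensional toy model. This is only a sketch under stated assumptions: the grid size, atom positions and the helper names `patterson` and `superposition_minimum` are hypothetical, point atoms stand in for a sharpened Patterson, and nothing here reproduces SHELXT's actual implementation.

```python
# Toy 1D illustration of a Patterson superposition minimum function.
# All names and numbers are illustrative assumptions, not SHELXT internals.

def patterson(density):
    """Periodic autocorrelation P(u) = sum_x rho(x) * rho(x + u)."""
    n = len(density)
    return [sum(density[x] * density[(x + u) % n] for x in range(n))
            for u in range(n)]

def superposition_minimum(patt, shift):
    """Point-wise minimum of the Patterson and a copy displaced by `shift`."""
    n = len(patt)
    return [min(patt[x], patt[(x + shift) % n]) for x in range(n)]

n = 64
atoms = [5, 12, 30]            # hypothetical point atoms on a periodic grid
density = [0.0] * n
for a in atoms:
    density[a] = 1.0

patt = patterson(density)
shift = 12 - 5                 # a strong Patterson vector (between two atoms)
smf = superposition_minimum(patt, shift)
peaks = [x for x, value in enumerate(smf) if value > 0]
print(peaks)                   # -> [0, 18, 39, 57]
```

In this toy case the surviving peaks are the structure shifted so that the atom at 12 sits on the origin (57, 0, 18) together with an inverted image (0, 57, 39); the two images share the pair of atoms that define the superposition vector. The Patterson itself has seven peaks here, so the minimum function already removes several spurious vectors before any dual-space recycling.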
This enables structures to be solved at an earlier stage in the data collection and is particularly useful for data collected with diamond-anvil high-pressure cells, with which it is not always possible to collect complete data. It reduces the effects of series-termination errors in the Fourier syntheses, but tends to make the electron-density integration used to assign the element types less reliable.
2.2. The random omit procedure
Omit maps are frequently used in macromolecular crystallography to reduce model bias. A small part of the structure is deleted and the rest is refined to reduce memory effects, then a new difference-density map is generated and interpreted. This concept plays an important role in SHELXT, but because no model is available at the P1 dual-space stage, it is implemented differently. The following density modification is performed unless otherwise specified by the user. A mask M(x) is constructed consisting of Gaussian-shaped peaks of unit volume at the positions of the maxima in the electron-density map. A small number of these Gaussian peaks are then deleted from the mask at random, usually every third dual-space cycle, and the new density ρ′(x) is obtained by multiplying the original density ρ(x) with the mask, ρ′(x) = ρ(x)M(x), at each grid point x in the unit cell. This allows the random omit method to be implemented efficiently using fast Fourier transforms (FFTs) in both directions. Imposing a shape function in this way improves the atomicity of the map. Negative density is truncated to zero, a common theme in phase improvement by density modification (Shiono & Woolfson, 1992). Compared with charge flipping, the stronger imposition of atomicity probably allows the resolution requirements to be relaxed. On the other hand, charge flipping should be better for the solution of severely disordered or modulated structures, precisely because they are not atomistic!
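The random omit step of Section 2.2 (build a mask of Gaussian peaks at the density maxima, delete a few at random, multiply the density by the mask and truncate negative density) can be sketched on a one-dimensional periodic grid. This is a minimal illustration, not SHELXT code: the function names, the omit fraction and the Gaussian width are assumptions, and the Gaussians have unit height rather than unit volume to keep the example short.

```python
import math
import random

def local_maxima(rho):
    """Indices of positive local maxima on a periodic 1D grid."""
    n = len(rho)
    return [i for i in range(n)
            if rho[i] > 0 and rho[i] > rho[i - 1] and rho[i] >= rho[(i + 1) % n]]

def gaussian_mask(n, centres, sigma=1.0):
    """Sum of unit-height Gaussians centred on the kept peaks."""
    mask = [0.0] * n
    for c in centres:
        for i in range(n):
            d = min((i - c) % n, (c - i) % n)      # periodic distance
            mask[i] += math.exp(-0.5 * (d / sigma) ** 2)
    return mask

def random_omit(rho, omit_fraction=0.3, sigma=1.0, rng=None):
    """Return (modified density, omitted peak indices)."""
    rng = rng or random.Random()
    peaks = local_maxima(rho)
    n_omit = int(round(omit_fraction * len(peaks)))
    omitted = set(rng.sample(peaks, n_omit))
    kept = [p for p in peaks if p not in omitted]
    mask = gaussian_mask(len(rho), kept, sigma)
    # Negative density is truncated to zero before applying the mask.
    new_rho = [max(r, 0.0) * m for r, m in zip(rho, mask)]
    return new_rho, sorted(omitted)

# Hypothetical density: three peaks and one negative ripple.
rho = [0.0] * 24
rho[4], rho[12], rho[20], rho[8] = 1.0, 0.8, 0.6, -0.2
new_rho, omitted = random_omit(rho, omit_fraction=1 / 3, rng=random.Random(0))
print("omitted peak(s):", omitted)
```

Because the mask is simply multiplied into the map in real space, the modified density can be transformed back to reciprocal space with an ordinary FFT, which is the efficiency argument made in the text.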
To decide which P1 solution is best, three criteria are considered: (a) The correlation coefficient CC between G o and G c, where G c are the amplitudes obtained by Fourier back-transformation of the modified electron density. (b) The structure factors G c are normalized to give E c and R weak is calculated as the average value of for the 10% of unique reflections (including systematic absences) with the smallest observed normalized structure factors E (Burla et al., 2013 ▶). In this way, the weak reflections can still play a decisive role in the structure solution even though they were not used directly to determine the space group. (c) The chemical figure of merit CHEM is calculated by performing a peak search and calculating all bond angles involving two distances in the range 1.1 to 1.8 Å. CHEM is the fraction of these angles that lie between 95 and 135° (Langs & Hauptman, 2011 ▶). The combined figure of merit CFOM is given by where X is 1.0 unless reset by the user. For organic or organometallic structures, especially for low resolution or incomplete data, the alternative, is sometimes better, but this is not the default option because it is not appropriate for inorganic and mineral structures. If CFOM is less than a preset threshold, the program refines further sets of starting phases, increasing the number of iterations each time this is done. 3. Using phases to find the origin shift and space group The idea of trying all possible space groups in a specified Laue group is also sometimes used in macromolecular crystal structure determination. For example, if the crystal is orthorhombic P, Laue group mmm, and only the Sohncke space groups need to be considered, a molecular-replacement program can be asked to test all eight possibilities. If only one of the eight gives a solution with good figures of merit, both the crystal structure and the space group have been determined! 
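The chemical figure of merit CHEM from criterion (c) above lends itself to a short sketch: collect every angle formed by two distances of 1.1 to 1.8 Å that share a central peak, and report the fraction lying between 95 and 135°. The function name `chem_fom` and the use of plain Cartesian coordinates without symmetry or periodic images are simplifying assumptions made for this illustration.

```python
import math
from itertools import combinations

def chem_fom(coords, dmin=1.1, dmax=1.8, amin=95.0, amax=135.0):
    """Fraction of bond angles (two distances in [dmin, dmax]) within [amin, amax] degrees."""
    n_angles = 0
    n_good = 0
    for i, centre in enumerate(coords):
        neighbours = [p for j, p in enumerate(coords)
                      if j != i and dmin <= math.dist(centre, p) <= dmax]
        for a, b in combinations(neighbours, 2):
            va = [x - c for x, c in zip(a, centre)]
            vb = [x - c for x, c in zip(b, centre)]
            dot = sum(x * y for x, y in zip(va, vb))
            cos_angle = dot / (math.hypot(*va) * math.hypot(*vb))
            angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
            n_angles += 1
            if amin <= angle <= amax:
                n_good += 1
    return n_good / n_angles if n_angles else 0.0

# A planar six-membered ring with 1.39 Å sides: every angle is 120 degrees.
ring = [(1.39 * math.cos(math.radians(60 * k)),
         1.39 * math.sin(math.radians(60 * k)), 0.0) for k in range(6)]
# Three collinear peaks 1.5 Å apart: the only angle is 180 degrees.
line = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(chem_fom(ring), chem_fom(line))
```

A map full of chemically sensible fragments scores close to 1, whereas random peak positions give a much lower value, which is why the criterion helps to pick out the best P1 solution for organic structures.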
For chemical problems the situation is more interesting, because there are 30 possible orthorhombic P space groups and a total of 120 possibilities when different orientations of the axes are taken into account (as in SHELXT). The procedure used in SHELXT to find space groups and origin shifts that are consistent with the P1 phases is based closely on the methods proposed by Burla et al. (2000 ▶) and Palatinus & van der Lee (2008 ▶), so it only needs to be summarized here. For a reflection h with P1 phase ψ and its mth symmetry equivalent h m = hR m with P1 phase ψ m , where R m is a 3 × 3 rotation matrix and t m is the corresponding translation vector, we define For the correct space group and the correct origin shift Δx, η should be close to zero. To facilitate comparisons, the figure of merit α is defined as the F 2-weighted sum of η2 over all pairs of equivalents for all reflections, normalized so that it should be unity for random phases. α should be as small as possible for the correct combination of space group and origin shift. SHELXT first calculates α for the space group ; this value is referred to as α0. If α0 is less than about 0.3, the space group is probably centrosymmetric. For centrosymmetric space groups, the origin shift may be used to place a centre of symmetry on the origin; however, SHELXT has to take into account that the space group may possess more than one non-equivalent centre of symmetry. For , η is calculated with a FFT and for non-centrosymmetric, non-polar space groups a two-dimensional grid search followed by a one-dimensional search is performed to speed up the calculation. The space-group search is performed in parallel for all space groups that need to be tested. Although the solution with the lowest α value is often the correct one, only unlikely solutions with α greater than a specified value (default 0.3) are eliminated before going on to the next stage. 4. 
Assigning chemical elements to the electron-density peaks Each solution with a reasonable α value is first subject to ten cycles of density modification in the chosen space group after applying the origin shift. This density modification consists only of averaging the phases of equivalent reflections taking the space-group symmetry into account and resetting negative density to zero. A peak search is then performed, and the density inside a sphere (default radius 0.7 Å) about each peak is summed. It is better to use integrated densities rather than peak heights because the atoms may have different atomic displacement parameters. However, these integrated densities are not on an absolute scale, so the problem is how to set the scale so that they correspond to atomic numbers and the elements can be assigned. SHELXT attempts to set the scale as follows, going on to the next test only if the previous tests are negative: (a) If carbon is specified as one of the elements present, the program searches for peaks with similar integrated densities separated from each other by typical C—C distances (i.e. between 1.25 and 1.65 Å). If enough are found, the scale is set so that they will have average atomic numbers of 6. (b) If boron is expected, boron cages with distances between 1.65 and 1.8 Å are searched for. (c) A search is made for oxyanions. The oxygen atoms should have similar integrated densities to each other and similar distances to a central atom. (d) If the above tests are negative, it is assumed that the heaviest atom expected corresponds to the peak with the highest integrated density. This can run into trouble if, for example, there is an unexpected bromide or iodide ion in the structure and it has not been possible to fix the scale by one of the above methods. When the density scale has been found, it is used to assign elements to the remaining atoms. 
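The scale-setting idea in test (a) can be sketched as follows: once a set of carbon-like peaks has been identified, the integrated densities are scaled so that those peaks average an atomic number of 6, and every peak is then assigned the expected element with the closest atomic number. The function `assign_elements`, the small lookup table and the example densities are hypothetical; the C—C distance search that selects the carbon candidates is assumed to have been done already.

```python
# Hypothetical lookup table; only the elements used in the example are listed.
ATOMIC_NUMBERS = {"C": 6, "N": 7, "O": 8, "S": 16, "Cl": 17, "Se": 34, "Br": 35}

def assign_elements(densities, carbon_idx, expected):
    """Scale integrated densities so carbon peaks average Z = 6, then pick the nearest element."""
    mean_carbon = sum(densities[i] for i in carbon_idx) / len(carbon_idx)
    scale = 6.0 / mean_carbon
    assigned = []
    for d in densities:
        z_estimate = d * scale
        element = min(expected, key=lambda e: abs(ATOMIC_NUMBERS[e] - z_estimate))
        assigned.append(element)
    return assigned

# Made-up integrated densities on an arbitrary scale.
densities = [3.0, 3.1, 2.9, 4.0, 17.2]
carbon_like = [0, 1, 2]          # peaks linked by C-C-like distances (assumed given)
print(assign_elements(densities, carbon_like, ["C", "O", "Se"]))
# -> ['C', 'C', 'C', 'O', 'Se']
```

The sketch also shows why the fallback in test (d) is fragile: without a trusted reference such as carbon, a single unexpected heavy atom would shift the whole scale and with it every assignment.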
If it then appears that there are high-density peaks that cannot be assigned because only light atoms were expected, chlorine, bromine or iodine atoms are added. Some rudimentary checks are made to ensure that the element assignments are chemically reasonable.
5. Isotropic refinement and absolute structure determination
After the atoms have been assigned, an isotropic refinement is performed using a conjugate-gradient solution of the least-squares normal equations. This is similar to the CGLS refinement in SHELXL (Sheldrick, 2008, 2015) and is performed in parallel. For non-centrosymmetric space groups this is followed by the determination of the Flack parameter (Flack, 1983) by the quotient method (Parsons et al., 2013) and inversion of the structure if the value of the Flack parameter is greater than 0.5. It is thus very likely that the structure determined by SHELXT will correspond to the correct absolute structure (so far no examples to the contrary have been reported). If α0 is below 0.3 and no atom heavier than scandium is expected, the program stops after finding a plausible centrosymmetric solution. A command-line switch may be used to force the program to test all space groups in the assumed Laue group.
6. Building the structure
The following algorithm used to assemble the structure is diabolically simple but almost always builds and clusters the molecules in a way that is instantly recognizable. No covalent radii etc. are used, so the algorithm is independent of the element assignments.
(a) Generate the SDM (shortest-distance matrix). This is a triangular matrix of the shortest distances between unique atoms, taking symmetry into account.
(b) Set a flag to −1 for each unique atom, then change it to +1 for one atom (it does not matter which).
(c) Search the SDM for the shortest distance for which the product of the two flags is −1. If none, exit.
(d) Symmetry transform the atom with flag −1 corresponding to this distance so that it is as near as possible to the atom with flag +1, then set its flag to +1.
(e) Go to (c).
The next stage is to centre the cluster of molecules optimally in the unit cell. This is complicated, but makes extensive use of the tables of alternative origins for the different space groups given in Chapter 3 of Giacovazzo (2014). For example, one body-centred space group has four alternative origins (0, 0, 0; 0, 0, ½; ½, 0, ¼; ½, 0, ¾), whereas another has only two (0, 0, 0; 0, 0, ½). These are combined with the lattice centring (in this case 0, 0, 0; ½, ½, ½). For polar space groups the optimal position along the polar direction(s) (e.g. along the body diagonal of the unit cell for space group R3 indexed on a primitive rhombohedral lattice) that minimizes the maximum distance of any atom from the centre of the unit cell is determined.
7. Examples
The first example is an organoselenium compound (Clegg et al., 1980) for which an extract from the listing file from SHELXT is shown in Fig. 2. Four different Patterson superposition vectors were used by default to start four dual-space structure solution attempts in parallel. This was a good choice because the computer had an Intel i7 processor with four cores. On the evidence of the combined figure of merit CFOM, one of the four (try 1) is a good P1 solution. The correlation coefficient CC and the chemical figure of merit CHEM clearly indicate the correct solution, but R weak is less clear. N is the number of peaks used in the density modification, Sig(min) is the height of peak N divided by the r.m.s. (root-mean-square) Fourier map density and Vol/N is the volume per peak in Å³. The best phase set was then used to search for the space group and three space groups are reported (Fig. 3); the other 11 space groups tested were rejected because one or more figures of merit were too high.
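Returning briefly to the cluster-assembly steps (a) to (e) of Section 6, the procedure can be sketched as a greedy loop over a shortest-distance matrix. The sketch makes several assumptions that are not taken from SHELXT: only lattice translations are considered (space group P1), the cell is orthogonal so that the minimum-image convention applies per axis, the flags use −1 for unplaced and +1 for placed atoms, and the names `min_image_delta` and `assemble` are hypothetical.

```python
import math

def min_image_delta(xi, xj):
    """Fractional vector from xi to the nearest lattice image of xj."""
    return [(b - a) - round(b - a) for a, b in zip(xi, xj)]

def assemble(frac_coords, cell_lengths):
    """Greedily move atoms by lattice translations so that they form a connected cluster."""
    n = len(frac_coords)
    coords = [list(p) for p in frac_coords]

    def shortest_distance(i, j):
        delta = min_image_delta(coords[i], coords[j])
        return math.sqrt(sum((d * length) ** 2 for d, length in zip(delta, cell_lengths)))

    # (a) shortest-distance matrix, stored as a dict over the upper triangle
    sdm = {(i, j): shortest_distance(i, j) for i in range(n) for j in range(i + 1, n)}
    # (b) flag every atom -1 (unplaced), then mark one atom +1 (placed)
    flags = [-1] * n
    flags[0] = +1
    while True:
        # (c) shortest distance linking a placed atom to an unplaced one
        candidates = [(d, i, j) for (i, j), d in sdm.items() if flags[i] * flags[j] == -1]
        if not candidates:
            break
        _, i, j = min(candidates)
        placed, moving = (i, j) if flags[i] == +1 else (j, i)
        # (d) translate the unplaced atom next to the placed one and flag it +1
        delta = min_image_delta(coords[placed], coords[moving])
        coords[moving] = [p + d for p, d in zip(coords[placed], delta)]
        flags[moving] = +1
        # (e) loop back to (c)
    return coords

# A three-atom chain that is split by the cell boundary along x (10 Å cubic cell).
chain = [(0.95, 0.5, 0.5), (0.05, 0.5, 0.5), (0.15, 0.5, 0.5)]
print(assemble(chain, (10.0, 10.0, 10.0)))
```

The loop has the same structure as Prim's minimum-spanning-tree algorithm, which may explain why it clusters molecules sensibly without any knowledge of covalent radii or element types.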
The space group P21 is clearly indicated by the values of R1, R weak, α and the Flack parameter, so there can be little doubt that it is correct, and in fact all the atoms are assigned to the correct elements. Note that although α0 is less than 0.3, the non-centrosymmetric space groups were searched as well because an atom (Se) heavier than scandium was specified on the instruction. The second example (Müller et al., 2006 ▶) involves a reorientation of the unit cell. Since two orientations of Pmn21 have the same systematic absences, both (and possibly also the centrosymmetric Pmmn) would have had to be tried for a conventional structure solution. SHELXT finds only one solution and all atoms are correct (Fig. 4 ▶). The Flack parameter is still rather approximate but is sufficient to indicate the correct absolute structure; it improves on anisotropic refinement including the hydrogen atoms. The third example (Walker et al., 1999 ▶) contains a bromine atom and so the non-centrosymmetric space group P1 is also tested, despite the good R1 and α values for the centrosymmetric solution (Fig. 5 ▶). In fact, this structure is pseudo-centrosymmetric and contains a mixture of diastereoisomers that imitates a centre of symmetry. The P1 solution is completely correct. Both solutions have similar figures of merit because the main difference is the position of one carbon atom that appears to be disordered in but not P1, but the Flack parameter strongly indicates P1. The last example shows what can go wrong. This structure was published by Barkley et al. (2011 ▶) in the non-centrosymmetric space group , but there are two warning signs: checkCIF (Spek, 2009 ▶) detects an inversion centre (a B alert) and the Flack parameter is dubious: the current SHELXL (Sheldrick, 2015 ▶) gives a value of 0.46 (11). Often a value close to 0.5 indicates a centrosymmetric structure. At first glance, SHELXT appears to indicate because of a significantly lower R1 value. 
Unfortunately, the Flack parameter cannot be determined by SHELXT for this space group because the deposited data had been merged in a different non-centrosymmetric point group (hence ‘ ’ in Fig. 6 ▶). However, neither nor are correct! Basically all the solutions are the same structure and the correct space group is the centrosymmetric P63/mmc of which all the other space groups are subgroups. The cause of the debacle is that only for were the elements assigned completely correctly and hence this space group has a lower R1 value. For the correct space group P63/mmc the manganese atom has been incorrectly assigned as calcium. With the correct element assignments all the figures of merit would have been very similar for all the space groups. In such cases the highest-symmetry (centrosymmetric) space group is almost always correct. 8. Program development and distribution SHELXT is compiled with the Intel ifort Fortran compiler using the statically linked MKL library and is particularly suitable for multi-CPU computers. It is available free to academics for the 32- or 64-bit Windows, 32- or 64-bit Linux and 64-bit Mac OS X operating systems. The program may be downloaded as part of the SHELX system via the SHELX home page (http://shelx.uni-ac.gwdg.de/SHELX/), which also provides documentation and other useful information. Users are recommended to view the ‘recent changes’ section on the home page from time to time. The initial development of SHELXT was based on a test databank of about 650 structures, mostly determined in Göttingen, covering a wide range of problems. It has also been tested by more than 200 beta-testers for up to three years, in the course of which several thousand structures were solved (and a few not solved). It is difficult to generalize, but the correct space group was identified in about 97% of cases, and for about half of the structures every atom was located and assigned to the correct element. 
Most of the remaining structures were basically correct, the most common errors being carbon assigned as nitrogen or vice versa. Poor solutions were sometimes obtained when the heavy atoms corresponded to a centrosymmetric substructure but the full structure possessed a lower symmetry. It is always essential to check the element assignments, especially if the program has added extra elements, and also to check for the presence of disordered solvent molecules that may have been missed. The biggest danger is that inexperienced users may assume that the program is always right!
1. Introduction In the late 1960s, only 40 years ago, a routine small-molecule crystal structure determination in the setting of a well equipped crystallography laboratory would take several months. The bottlenecks were the data-collection, structure-solution and structure-refinement stages. Since then, data collection has advanced from a time-consuming film-based and serial detector-based technique to the current area detector-based systems, thus speeding up this stage by at least an order of magnitude. Modern CCD detector-based systems can easily collect 1000 small-molecule data sets in a year. The currently available direct methods for structure solution have essentially solved the long-standing phase problem in small-molecule crystallography given crystals of sufficient quality. Easy-to-use structure-determination software is now widely available and often comes with the data-collection hardware. The computing power needed for data processing, structure solution and refinement, once expensive and a monopoly of the University Computer Centre, is nowadays ubiquitous, cheap and fast on the personal computer platform. Therefore, given a routine structure determination, it is now quite possible to collect diffraction data, solve and refine the structure and send off a structure report for publication in Acta Crystallographica Section E within a day. This development is clearly demonstrated by the growth in the number of small-molecule structures that are published each year. This number has increased exponentially over the past 40 years from about 1000 in 1967 to over 35 000 in 2007. It should be noted that this last figure is a lower bound of the actual number of small-molecule structure determinations that are carried out each year. It is likely that a similar number of studies never reach the literature. 
The publication of a crystal structure as part of a research paper is still a time-consuming activity and remains a bottleneck, often together with the problems of obtaining publication-quality crystals. Nowadays, the majority of small-molecule crystal structures are determined to ‘confirm’ the outcome of synthetic chemical work. The confirmation of a newly prepared compound by a crystal structure is generally a requirement for the publication of the associated chemistry in major chemical journals. Seeing is believing. Crystallography is in this sense often used as an analytical tool. However, there is a problem. The number of experienced crystallographers dedicated to single-crystal studies has certainly not increased in proportion to the number of reported studies. Many single-crystal structure analyses are currently carried out by non-experts using the available black-box software. Often, for understandable reasons, such investigators lack sufficient experience to avoid the many possible pitfalls, such as an incorrect atom-type assignment, that may be obvious to an expert. In the past, all unusual aspects of a structure analysis were supposed to be discussed in a publication with sufficient detail for both the reader and referee to make their own judgment about a claimed result. Nowadays, crystallography is considered by many chemical journals as routine and the crystallographic information is, at best, supplied in a footnote or as supplementary material with very limited details, if any, given in the published text. The chances are therefore high that papers are accepted for publication without crystallographic referees ever having looked at the supporting material. Unfortunately, the number of experienced crystallographic referees has decreased dramatically. As a result, the literature and databases, such as the Cambridge Structural Database (CSD; Allen, 2002 ▶), include obviously incorrect structures associated with formally refereed papers. 
About 12 years ago (Linden, 2007 ▶), a crystal structure-validation project was started in the context of the journals of the International Union of Crystallography in order to address the refereeing issue and the time-consuming work that went into the checking of the supplied data for completeness and consistency. Its initial implementation was used to evaluate papers submitted to Acta Crystallographica Section C. At that time, it was already a requirement of the journal that the crystallographic data had to be provided in the computer-readable CIF format (Hall et al., 1991 ▶). The submission of electronic data files allowed the validation software to perform a number of quality and validity checks and to create a report in the form of ALERTS on issues to be addressed by authors and referees. Soon afterwards, further validation tests on structural issues were added. These tests are incorporated as part of the structure-analysis tools that are available in the PLATON package (Spek, 2003 ▶; Müller et al., 2006 ▶). The official IUCr structure-validation suite (checkCIF/PLATON) is currently available as an IUCr web service (http://journals.iucr.org/services/cif/checking/checkfull.html). Its use is required for every small-molecule crystal structure submitted for publication in the IUCr journals. Many major journals currently have similar requirements, as stated in their Notes for Authors. This paper reports on the current status of the IUCr validation project. 2. Structure validation Structure validation addresses three simple but important questions: (i) Is the reported information complete? (ii) What is the quality of the analysis? (iii) Is the structure correct? The answer to the first question involves the use of a computerized checklist. The answers to the other questions are obviously less straightforward. The quality of a single-crystal study can be classified into one of four classes. 
Class I consists of high-quality structure determinations that were carried out using data collected from a near-perfect crystal and under optimal experimental conditions. This will generally be data collection at a sufficiently low temperature and to a sufficiently high resolution. Such conditions are not always attainable. Inherently poor-quality crystals, disorder or a phase transition can be reasons why this goal cannot be reached. Class II structures are good structures that were determined under routine conditions or with experimental restrictions that are sufficient for the purpose of their study but not necessarily to the highest attainable quality. This class includes structures from data collected at room temperature or with high-pressure cells. Class III structures are poor structures that are essentially correct as far as the associated chemistry is concerned but for various reasons have limited accuracy. Reasons can be poor crystals, incomplete or weak and noisy diffraction data. Severe disorder that is difficult to model can be another reason. Class IV structures are incorrect. Important examples are those in which some of the element-type assignments are wrong or models with too few or too many H atoms. The impact of an incorrect published structure may be disastrous for research that builds on it. Examples include attempts to synthesize complex natural products on the basis of an incorrectly reported crystal structure (for an example, see Li, Burgett et al., 2001 ▶; Li, Jeong et al., 2001 ▶). Ideally, most issues reported by the validation software should already have been corrected at an early stage of the analysis and thus should never appear in published structures. Correction at the publication stage may be laborious or even impossible for unique crystalline samples. Clearly, structure validation is particularly important for addressing Class IV structures. 
Class III structures may be useful to direct further research, but are generally not suitable for publication unless supported by an in-depth analysis. Crystallographic journals will aim at Class I structures, while noncrystallographic referees of chemical journals may even be satisfied with Class III structures. Validation should avoid having Class IV structures ever appear in print. The holy grail of structure validation is a tool that unequivocally assigns one of the above four quality classes to a given structure report. This would be performed on the basis of the application of objective criteria to the supplied structural and experimental data. The currently available IUCr tool, checkCIF/PLATON, is in this sense still far from that ideal. Instead, a list of ALERTS is produced that are classified according to their level of seriousness. These should be addressed by the investigator and those remaining evaluated by experts. The validation criteria currently in use are in many cases empirical and based on experience and tradition rather than based on science. Some criteria have changed over time. There is an obvious trade-off between being too critical, leading to too many false ALERTS, and being less sensitive and thus missing multiple weak indications of a serious problem. Eventually, a scientifically sound underpinning of the validation criteria will be sought. Automated structure validation as it is today has its origin in the definition of the CIF standard for the exchange and archival of structural and experimental data (Hall et al., 1991 ▶). CIF became ‘the standard’ in small-molecule crystallography with its adoption by the widely used SHELXL refinement-software package (Sheldrick, 2008 ▶). Acta Crystallographica Section C made CIF the required data-submission format for publication and it is currently the only way to submit a structural report to Acta Crystallographica Sections C and E. 
Initially, software was developed to check the completeness of the supplied data, its consistency and its validity. It was soon realised that the availability of coordinate data also made it possible to base geometry and other calculations on these data. Examples are the detection of solvent-accessible voids in a structure that were missed by the investigators and the search for missed higher symmetry. This can be achieved by the use of readily available tools in the PLATON package (Spek, 2003).

Validation issues are subdivided into four categories:
(i) Missing or inconsistent data.
(ii) Indicators that the structure model may be wrong or deficient.
(iii) Indicators that the quality of the results of the study may be low.
(iv) Cosmetic improvements, queries and suggestions.

The validation software assigns one of four severity levels (A, B, C and G) to reported issues. Level A ALERTS usually indicate that corrective action is imperative or there has to be a scientifically acceptable explanation for the case at hand. Level G ALERTS concern issues that may be correct but should be checked. They can still point to serious problems that could not be analyzed in detail on the basis of the available data. Currently, about 400 validation tests have been implemented. Most tests result in a one-line ALERT message. Each test is associated with some documentation explaining the problem and possible options to address it.

3. Validation of the diffraction data

Most problems with and questions related to a structure report can be resolved just using the data available in the CIF. However, reflection data in computer-readable format will sometimes be needed in borderline cases for a detailed analysis of issues such as the correct symmetry description. Some problems, such as missed or ignored twinning as an explanation for an unsatisfactory refinement result, may only show up in an analysis of the reflection data.
The submission of reflection data as a structure-factor file (Fo/Fc data in CIF format) is required for a structural publication in Acta Crystallographica. This allows automatic checking for missed twinning. Absolute structure assignments are generally inferred from the value of the Flack parameter that is reported in the CIF (Flack, 1983). This value can be erroneous (Flack et al., 2006) and lead to false conclusions about enantiopurity. The availability of the reflection file allows software to check the reported value independently. This is performed by a comparison of the value of the reported Flack parameter with the value of the Hooft parameter (Hooft et al., 2008), which is calculated from the Bijvoet differences. The availability of reflection data also allows an independent structure determination and inspection of difference density Fourier maps for special features such as missing or incorrectly positioned H atoms.

Unfortunately, the referees of chemical journals have no easy access to the reflection data since there is no deposition requirement by non-IUCr journals. Consequently, those primary data are also not archived. The Cambridge Structural Database does not archive reflection data either. The validation of Fo/Fc data is available with the standalone PLATON/VALIDATION software (http://www.cryst.chem.uu.nl), and will be available shortly through the IUCr checkCIF/PLATON web service. Validation utilizing the reflection data is currently implemented for papers submitted to Acta Crystallographica Sections C and E.

4. Examples

This section reviews a number of published structure reports that have been shown to be erroneous and for which a formal correction has appeared in the literature. There are many more (largely undocumented) examples of troublesome reports. Any analysis of the data for a subset of structures taken from the nearly 500 000 structures in the CSD will show outliers.
Most of these outliers point under close inspection to unresolved problems or errors of some sort rather than being of scientific interest. Unfortunately, in most cases the primary data (reflection data) are unavailable for a proper objective and definitive analysis.

4.1. Missed symmetry

The assignment of the correct space group of a structure to one of the possible 230 space groups can at times be problematic. The effective space group cannot always be assigned uniquely at the start of the structure analysis on the basis of the observed systematic absences alone. Often, preliminary structure solution only succeeds in a space group that turns out to be a subgroup of the real one. In fact, difficult structures can often only be solved in the lowest symmetry space group P1, leaving the transformation to the correct space group to be performed afterwards. Unfortunately, many examples in the literature (see Marsh & Spek, 2001) show that this goal is not always achieved. The required transformation is not always trivial. Software that suggests the real symmetry and performs the associated transformation is readily available (e.g. PLATON/ADDSYM), but is not always part of the refinement software suite being used.

Some missed symmetry cases are relatively harmless in that this error does not seriously affect the structure and its interpretation (e.g. wrong Laue group), such as Example 1 below. On the other hand, overlooking an inversion centre is generally serious. This last problem can be hidden when structure refinement is performed by using constraints and restraints to secure the stability of the least-squares refinement. There are many borderline cases for which the reflection data are needed for a definitive space-group assignment.

4.1.1. Missed symmetry: Example 1

Fig. 1 illustrates an example of a structure that was published with one crystallographically independent molecule in the orthorhombic space group Pbca (Azumaya et al., 1995).
A program that displays a structure perpendicular to the main molecular plane by default will immediately show that this molecule has at least pseudo-threefold axial symmetry. Such an axis may or may not coincide with a crystallographic axis. The existence of crystallographic threefold symmetry was shown to be the case by Herbstein (1999). The correct cubic space-group assignment, Pa-3, would have been indicated by the current validation software.

4.1.2. Missed symmetry: Example 2

Fig. 2(a) illustrates the dramatic effect of the solution and erroneous refinement of a centrosymmetric structure in a noncentrosymmetric space group (Kahn et al., 2000a). Even just the published displacement ellipsoid plot of this structure, which has been refined in space group P1, should have aroused serious suspicion with the referees of the paper about the quality and correctness of the structure. This structure would have been a perfect candidate for the ‘ORTEP of the Year’ award (Harlow, 1996). It was only on the basis of a suggestion from a reader of the journal that this structure was re-refined in the centrosymmetric space group P-1. The correctly refined structure, shown in Fig. 2(b), clearly looks quite normal (Kahn et al., 2000b). Thus, what might have looked like a structure report based on very poor data turned out to be a good-quality structure after all. In this context, it is interesting that the detailed discussions in the original paper about the unusual differences in bond distances turned out in hindsight to be based on incorrectly interpreted refinement artifacts. The checkCIF/PLATON validation report (using the downloadable CIF) for the original P1 structure cites the space-group problem and numerous other issues.

4.2. Missing or incorrectly placed H atoms

Missing H atoms or too many H atoms in a reported molecular structure may have a significant impact on the interpretation of the chemistry or the nature of the compound.
H atoms are often introduced to the model at calculated positions without checking whether there is significant electron density at that location or are erroneously left out. Hydroxyl moieties generally have their H atom on a cone and pointing to a hydrogen-bond acceptor in the structure. Exceptions are rare and are generally the consequence of misplaced H-atom positioning, incomplete structures or wrong atom-type assignment.

4.2.1. Missing H atoms

Fig. 3 shows a structure that was published as a synthetic breakthrough with the title ‘The stable pentamethylcyclopentadienyl cation’ (Lambert et al., 2002). Interesting chemistry building upon this result was envisioned. ‘Packing effects’ were offered as an explanation for the unusual nonplanarity of two substituents on the five-membered ring. It was rapidly shown by Otto et al. (2002) that the reported structure obviously needed two additional H atoms at sp3 positions on the five-membered ring and that the reported structure was actually the less interesting pentamethylcyclopentenyl cation. Given the availability of reflection data, it was easy to verify the presence of the two additional H atoms in a difference density map.

4.2.2. Wrongly placed H atom

Fig. 4(a) shows a structure with an incorrectly positioned hydroxyl H atom (Körner et al., 2000a). The problem cannot be seen in a published single-molecule ORTEP illustration. What is needed is an analysis of the intermolecular interactions. Fig. 4(b) illustrates the problem that was detected in a retrospective validation run. The correct hydrogen-bond network shown in Fig. 4(c) makes more sense (Körner et al., 2000b). Contoured difference electron-density maps can be very helpful in analyzing this type of problem. A misplaced H atom will show up as a negative density peak in its false location and the correct location will appear as a positive peak.

4.3. Incorrect atom-type assignments

The result of a crystal structure determination is not always the expected one. In such cases, atom-type assignments may be biased by preconceived ideas and assumptions. Linden (2007) reports several cases in which the reported chemical species is nearly certain to be wrong. Structures published as possessing -C=N-H groups may sometimes have resulted from a misinterpretation of -C=O groups. Zhong et al. (2007, 2008) report the retraction of a coordination complex with a missing H atom on an N atom and a central Sn(IV) atom that is most likely the cation of a lanthanide(III) coordination complex. Below are two further examples in which the reported chemistry was incorrect.

4.3.1. Withdrawn misinterpreted structure

Fig. 5 is an example of a structure report (Fang et al., 2007) on a ‘novel heterocyclic’ compound, crystals of which were obviously obtained unexpectedly from a reaction mixture. A reader (an Acta Crystallographica Section C Co-editor) recognized this structure as being at least isomorphous with the well known structure of the mineral borax. Closer inspection revealed that the two compounds were indeed identical. The displacement ellipsoids of the N and C atoms clearly suggested that they should be interpreted as the atom types O and B, respectively. Hirshfeld (1976) rigid-bond test ALERTS sent out similar signals. The structure report was subsequently retracted (Fang et al., 2008).

4.3.2. Charge-balance problem

Fig. 6 shows a published network structure (Sadiq-ur-Rehman et al., 2007) that was obtained unexpectedly. It is not clear from the reaction conditions where the NO3− anion in the proposed structure is supposed to come from. In addition, there is also a charge-balance problem that was obviously overlooked by both the authors and the referees of the paper. An anion with a −2 charge is needed.
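The charge-balance argument above is, at its core, simple bookkeeping: the formal charges in a formula unit must sum to zero. The following is a minimal Python sketch of that arithmetic only. The function name `net_charge`, the species labels and the charge of +2 on the framework are illustrative assumptions (the +2 is inferred from the statement that an anion with a −2 charge is needed); none of this is taken from checkCIF/PLATON.

```python
def net_charge(species):
    """Sum formal charges over one formula unit.

    `species` maps a label to (formal_charge, count_per_formula_unit).
    """
    return sum(charge * count for charge, count in species.values())


# Hypothetical formula units: the cationic framework is ASSUMED to carry +2
# per anion site, which is what the need for a -2 anion implies.
with_nitrate = {"framework": (+2, 1), "NO3": (-1, 1)}
with_carbonate = {"framework": (+2, 1), "CO3": (-2, 1)}

print(net_charge(with_nitrate))    # nonzero: the charges do not balance
print(net_charge(with_carbonate))  # zero: the charges balance
```

The sketch presupposes that the formal charge of every species is already known. Assigning those charges automatically from a refined model is a separate and much harder problem, which this example does not attempt.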
The same authors (Sadiq-ur-Rehman et al., 2008) have now corrected the structure in view of the charge-balance problem. The NO3− anion was replaced by CO3(2−), as suggested by the unusual size of the displacement ellipsoid of N in the NO3− version. Generally, such a change of atom type would result in significantly better displacement parameters and refinement results. In this case, no significant improvement was observed. Interestingly, the revised report also does not mention that the reflection data were from a merohedrally twinned crystal. Part of the reason for this might be that the current CIF file definition (and for that reason software such as SHELXL) does not yet offer a standard means of recording twinning in a CIF. The twinning correction that was correctly applied was detected as part of the validation of the reflection file. On the other hand, the general implementation of a check for charge balance is a challenging validation issue.

5. Evaluation and discussion

An analysis of the ALERTS generated for the 35 760 entries added to the CSD from 2006 and early 2007 indicates that validation and the provision of adequate responses to the issues raised still has room for improvement. 384 space-group changes were indicated. Other frequently reported problems are unaccounted-for solvent-accessible voids and numerous problems with H atoms. Some ALERTS require an in-depth analysis by experts. Investigators not trained in crystallography may have no clue as to what to do with ALERTS about symmetry issues, as may be gleaned from queries such as ‘What does it mean: space group incorrect’.

A recent example of a structure with a space-group-related ALERT is the structure report of a small organic molecule that is correctly reported by Portilla et al. (2008) in space group P-1 (Fig. 7).
Validation suggests space group C2/m within default error tolerances as a higher symmetry alternative, which makes sense since the basic molecule has an approximate mirror plane. In fact, this structure easily solves and refines in C2/m when instructed to do so, although with a higher R factor. The evidence against C2/m is that the atomic displacement parameters in the t-butyl moiety are high. In addition, the proposed transformation from triclinic to monoclinic symmetry leads to α and γ angles that differ by 0.3° from the 90° required for monoclinic symmetry. The published structure is based on 120 K data and may well have exact C2/m symmetry at higher temperature.

The Hirshfeld rigid-bond test (Hirshfeld, 1976) has proved to be very effective in revealing problems in a structure. It is assumed in this test that two bonded atoms vibrate along the bond with approximately equal amplitude. Significant differences, i.e. those which deviate by more than a few standard uncertainties from zero, need close examination. Notorious exceptions are metal-to-carbonyl bonds, which generally show much larger differences (Braga & Koetzle, 1988).

6. What next?

Crystallographic procedures evolve. This also has an impact on structure-validation procedures. A number of currently implemented validation issues are related to data-collection techniques that are based on serial detectors. Those detectors have now largely been superseded by image-plate or CCD-based instruments, which may themselves become obsolete with the arrival of a new generation of (pixel) detectors that allow shutterless data collection. Before the introduction of two-dimensional detectors, corrections for absorption were performed using a multitude of techniques that ranged from purely empirical to an exact calculation based on a description of the crystal shape. Tests were implemented to validate the appropriate use of the chosen method.
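As an aside on the Hirshfeld rigid-bond test described in Section 5, the quantity being compared can be sketched in a few lines of Python. The sketch assumes that the anisotropic displacement tensors are already expressed in a Cartesian frame in Å² (in a CIF they refer to the reciprocal axes and would first need conversion for non-orthogonal cells). The function names, the example values and the default cutoff of three standard uncertainties are illustrative assumptions, not the criteria used by checkCIF/PLATON.

```python
import math


def msda_along(u, direction):
    """Mean-square displacement amplitude n^T U n along the unit vector n."""
    return sum(direction[i] * u[i][j] * direction[j]
               for i in range(3) for j in range(3))


def rigid_bond_delta(pos_a, u_a, pos_b, u_b):
    """Absolute difference of the two atoms' amplitudes along the A-B bond."""
    bond = [b - a for a, b in zip(pos_a, pos_b)]
    length = math.sqrt(sum(c * c for c in bond))
    unit = [c / length for c in bond]
    return abs(msda_along(u_a, unit) - msda_along(u_b, unit))


def is_suspicious(delta, su, n_sigma=3.0):
    """Flag a difference exceeding n_sigma standard uncertainties (assumed cutoff)."""
    return delta > n_sigma * su


# Hypothetical bond along x; Cartesian U tensors in square angstroms.
u_a = [[0.020, 0.0, 0.0], [0.0, 0.030, 0.0], [0.0, 0.0, 0.025]]
u_b = [[0.021, 0.0, 0.0], [0.0, 0.050, 0.0], [0.0, 0.0, 0.040]]
delta = rigid_bond_delta((0.0, 0.0, 0.0), u_a, (1.5, 0.0, 0.0), u_b)
print(delta, is_suspicious(delta, su=0.002))
```

Only the tensor components along the bond enter the comparison, so the two atoms in the example pass the test even though their displacement perpendicular to the bond differs considerably.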
Nowadays, with two-dimensional detector data, a correction for absorption is mostly of the multi-scan type (e.g. SADABS; Sheldrick, 2008) convoluted with inter-image scaling and optionally preceded by a numerical correction for absorption on the basis of a description of the crystal shape. New up-to-date validation tests for this are needed. Current validation does not yet validate the results of powder diffraction, incommensurate structures and charge-density studies. The same applies to the more involved issues with inorganic compounds. The geometry of a newly determined structure can be validated against similar structures in the CSD (Allen, 2002; Bruno et al., 2004). This is easily performed manually but is not easy to automate.

An interesting development is the arrival on the market of automated bench-top ‘crystal-to-structure’ instruments. This might pose an interesting challenge to journals and validation software when structure reports from such machines run in black-box mode arrive on editors’ desks. Formal crystallographic training has disappeared in many places, so inexperienced authors might be confronted with difficult-to-answer ALERT queries. Regular crystallographic training courses are still organized on a national or international basis and should be strongly supported.

7. Concluding remarks

Structure validation has become a standard procedure in small-molecule crystallography. It sets a quality standard that is not just based on low final R factors and can save a lot of time for both the investigator and the referees of a paper. A short or zero-length list of minor ALERTS may indicate a good structure. Some ALERTS may even point to interesting structural features that would otherwise have gone unnoticed and are worth discussing in a publication. Examples are pseudo-symmetry and short intermolecular contacts. Some ALERTS reveal issues that can only be addressed by experienced crystallographers.
An example is whether a given structure is best described as disordered in a centrosymmetric space group or as ordered in a noncentrosymmetric space group (Flack et al., 2006).

The scope of the currently implemented checkCIF/PLATON validation procedures is high-resolution small-molecule crystal structures. Extension to large or low-resolution protein structures is not envisioned. As an example, the PLATON/ADDSYM algorithm that is used to detect missing symmetry requires atomic resolution data.

The automated structure-validation techniques that are currently applied to submissions to Acta Crystallographica have essentially eliminated long-standing errors, such as missed higher symmetry, in Acta Crystallographica Sections B, C and E. This is unfortunately not yet the case for many other journals. Class IV structures still appear in the chemical literature. Structures are still published in a too low-symmetry space group despite the many papers on this issue by Dick Marsh entitled ‘More space group changes’ (see, for example, Marsh & Herbstein, 1988).

Most major journals state structure validation as a requirement in their Notes for Authors. However, in practice it appears that many structures are published without serious inspection of the crystallographic data by an expert. An often-heard comment is ‘addressing crystallographic details holds up the publication of important chemistry’. In many cases, these crystallographic details are just trivial pieces of information that should already have been included as a standard protocol in the CIF at the end of the structure analysis. Database services, such as the Cambridge Crystallographic Data Centre (CCDC; Allen, 2002), attempt to sort out some of the obvious problems by consultation with the authors, but the CCDC staff cannot add any judgment or correction without the consent of the authors.
Covalent organic frameworks (COFs) have been designed and successfully synthesized by condensation reactions of phenyl diboronic acid {C6H4[B(OH)2]2} and hexahydroxytriphenylene [C18H6(OH)6]. Powder X-ray diffraction studies of the highly crystalline products (C3H2BO)6·(C9H12)1 (COF-1) and C9H4BO2 (COF-5) revealed expanded porous graphitic layers that are either staggered (COF-1, P6(3)/mmc) or eclipsed (COF-5, P6/mmm). Their crystal structures are held together entirely by strong bonds between B, C, and O atoms to form rigid porous architectures with pore sizes ranging from 7 to 27 angstroms. COF-1 and COF-5 exhibit high thermal stability (up to 500 to 600 °C), permanent porosity, and high surface areas (711 and 1590 square meters per gram, respectively).
[1] GRID grid.11135.37, ISNI 0000 0001 2256 9319, College of Chemistry and Molecular Engineering, Beijing National Laboratory for Molecular Sciences, Peking University, 100871 Beijing, China
[2] GRID grid.10548.38, ISNI 0000 0004 1936 9377, Berzelii Center EXSELENT on Porous Materials, Department of Materials and Environmental Chemistry, Stockholm University, 10691 Stockholm, Sweden
[3] GRID grid.16890.36, ISNI 0000 0004 1764 6123, Department of Mechanical Engineering, The Hong Kong Polytechnic University, 999077 Hong Kong, China
[4] GRID grid.5037.1, ISNI 0000 0001 2158 1746, Present Address: Department of Fibre and Polymer Technology, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, Teknikringen 56-58, SE-100 44 Stockholm, Sweden
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License,
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license, and indicate if changes were
made. The images or other third party material in this article are included in the
article’s Creative Commons license, unless indicated otherwise in a credit line to
the material. If material is not included in the article’s Creative Commons license
and your intended use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright holder. To view
a copy of this license, visit
http://creativecommons.org/licenses/by/4.0/.
History
Date received: 29 April 2022
Date accepted: 21 June 2022
Funding
Funded by: FundRef https://doi.org/10.13039/501100001809, National Natural Science Foundation of China (National Science Foundation of China);