Reconstructing a 2D Structure From X-ray Data

In this project, you'll learn some of the basics of how to reconstruct the position of molecules from x-ray diffraction data. For simplicity, we'll work with a strictly two dimensional system. You see applications of things like this at surfaces such as membranes where molecules, such as some proteins, live.

X-ray diffraction data gives you information about the fourier transform of the electron density. It doesn't give you the fourier transform because it lacks phase information. Instead what you obtain is the absolute value, that is, the amplitude of the 2d fourier transform.

From this alone, it is tricky and, in many experimental situations, impossible to get the underlying density because all that lost phase information is very important. One way around this is to employ heavy atom substitution. You substitute one atom for another that gives a different amount of scattering. You get the intensities for this case as well. The two x-ray patterns can be combined often to solve the structure for you.

What we give here is a simple version of this. We have the native amplitudes in the file nat.txt Download nat.txt

. We have another file where an extra atom has been added at the origin of the coordinate system. The scattering from this is point-like so that the amplitude at all points is the same, in this case, it has been determined that it is 100. The means that the diffraction pattern is now different because of this extra contribution from the additional atom. The data from this is in the file hev.txt Download hev.txt.

The diffraction pattern data should be read in and each should be a 360x360 array. You can see this by editing the file directly. There are 360 lines containing 360 numbers.

As you write a script in an editor, you can check out what it's doing by going into ipython.

The first step is to load in the data. If you are in a terminal window, you can start ipython with:

ipython -pylab

You should eventually get a prompt like:

In [1]

and at that prompt type:

a_f = loadtxt("native.txt")

You are loading in the data as an array and giving it the array name "a_f". You can display a picture of it by typing imshow(a_f). (a stands for "amplitude"). This is the absolute value amplitude data squared. I could have given you the square root of this instead, the difference is trivial, you can promptly take the square root of this array in scipy if you want, e.g. square_root_array = sqrt(your_array). You can write: shape(a_f) to get the dimensions of that array.

So the same thing for "hev.txt". I gave it the name a_g.

Now it is your job to figure out how to use this data to get the underlying density for this system.

The math of this is not too complicated. You'd like to know the fourier transform of the data, so you can invert it, but you're only give the absolute value squared. How is the fourier transform for heavy atom system related to that of the native system? Hint: what's the fourier transform of a delta function? Once you see how the two fourier transforms are related, you take the absolute value of both sides of the equation and play around with it to solve for the fourier transform of the native data. Do not go off into trig land and start worrying about angles!. Your experience with complex numbers has taught you that it is easier just to use algebra, things like complex conjugate, and the real part to do these kinds of problems than dealing with trigonometric function. You should not have a single sine or cosine in your script. If you do it as I suggest, your formula with be really simple.

One useful bit of information, is that the scattering data you have been given is for a region a lot bigger than the system itself. You might want to use this as follows. When you get the phase information from this system, it can be hard to determine if it's positive theta, or negative theta, e.g. +0.1 or -0.1. However the cos(kx) = cos(-kx) = Re(exp(ikx)). What bad things would ignoring the imaginary part of the fourier transform do? (Hint: remember the first homework assignment). Once you've figured that out, you'll see there's a good reason why we want to make the system being smallish compared to the region we're looking at.

Other things that are useful to know.

Example of writing a function in python:

def add(a,b):
      return a+b

a and b can be regular numbers, or even arrays! White space is important in python.

so if a and b are both 8x8 arrays,

c = add(a,b)

makes c an 8x8 array.

The inverse 2d fourier transform of array b is inv_ft_b = ifft2(b)

where I've called the result inv_ft_b. This will give a bunch of complex numbers. If you want just the real part do

real(inv_ft_b)

real(ifft2(b))

To make this into a script, put commands in a file put add the first two "from" lines you see at the top and then your script.

from scipy import *

from pylab import * l=100 ... ... imshow(result)

Then when you're in ipython, type

 
run script_name

If you have graphics in there, (like imshow) you have to type show() to actually see the results.

Another thing you should know, is that the script you need to write less than 16 lines long!

Further Questions:

1. Is it necessary that you be supplied with both native.txt and heavy.txt to find the structure or could you do it from knowing only native.txt?

Hint: http://www.nature.com.oca.ucsc.edu/nature/journal/v400/n6742/full/400342a0.html

2. Find an experimental problem where you don't have a single crystal and have to find the structure of a system from diffraction data. Describe the difference between diffraction experiments from a single crystal and powder diffraction.

3. How much is known about the structure of membrane proteins, and why is knowing the structure important? Is it easy to crystallize a membrane protein? How is the structure determined? Briefly describe a recent experiment or research project on membrane protein crystallography.