A3: Photo to 3D Face

We want to reconstruct 3D face(s) from 2D photo(s) in this assignment. Any algorithm must estimate the pose of the face relative to the camera as well as the 3D geometry. This problem is underconstrained, and hundreds of papers have been written about it. Some papers use multiple viewpoints to add constraints, while others work with a single input image. With the success of deep learning on many computer vision problems, Convolutional Neural Networks (CNNs) have been used to recover 3D information from a single 2D facial image, providing both dense face alignment and 3D face reconstruction. Example applications of photo-to-3D-face include image-based face recognition, transferring a real actor's expression to a facial rig or avatar in CGI or animated movies, and medical applications (planning aesthetic operations and plastic surgeries, designing personalized masks, and bio-printing facial organs).

For the purposes of this assignment, we are all going to start from the same 'test dataset' and run different algorithms. As a whole class, we hope to see lots of examples so we can understand what the state of the art truly is, versus what the papers claim. We are explicitly being skeptical of the claims in the papers. We want to know: does it *really* work? And how well?

Think of this as one assignment. Parts 1 and 2 are really checkpoints to make sure you get started, so your goal in the checkpoints is to demonstrate progress toward the final goal. Our main interest is sharing our findings as a class after Part 3 is done.

teaser.gif

Figure: An example input image, the aligned reconstruction, and animation with various poses & expressions (Source).

Part 1: Get some code running and calculate some faces

  1. Download the dataset of 2D photos and the ground-truth 3D face from this link. The 2D photos are 60 still pictures of the same person taken from different angles. The ground truth is a 3D scan of the same person, given as a .ply file. Check out this data so you know what we are working with; a minimal Python sketch for loading and viewing the .ply appears after this list.
  2. Pick a paper/software of your own choice, or from the lists given below, that gets 3D faces from 2D photos. There are various techniques to achieve this. For example:
    • Photogrammetry, SfM, Multi-view stereo from multiple images to 3D, etc.
    • Deep learning based 2D to 3D.
    • Here is a paper with zillions of references (although it only covers single-view techniques).
  3. Here is a list of some recent papers for creating 3D faces from photo(s), with source code available to download. Download the code for your selected paper (or implement it). These were not carefully picked; they were just googled. You are encouraged to consider other papers/software/code.
  4. Get the code to work and create some 3D faces using the paper/tool you selected:
    • Try their provided sample images first, since those should probably work.
      • Are the results you get the same as what they report in the paper?
    • Try the images from our dataset (they might not work well yet; just try it).
      • Are the results as good as we would expect?
      • Does it actually look like the person?

  5. You will do this more carefully in future weeks; for now, just explore the following questions:

    • What is the effect of trying different input angles? Does the angle matter for the 3D reconstruction?
    • For multi-image reconstruction, does the number of images matter?
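
To get familiar with the data, it helps to open the ground-truth scan programmatically as well as in a viewer. Below is a minimal sketch, assuming Python with open3d installed (pip install open3d); "ground_truth.ply" is a placeholder for the actual file name in the Google Drive folder.

```python
# Minimal sketch for inspecting the ground-truth scan, assuming Python with
# open3d installed. "ground_truth.ply" is a placeholder file name -- use the
# actual path from the downloaded dataset.
import open3d as o3d

mesh = o3d.io.read_triangle_mesh("ground_truth.ply")
mesh.compute_vertex_normals()  # needed for shaded rendering in the viewer
print(f"vertices: {len(mesh.vertices)}, triangles: {len(mesh.triangles)}")

# Open an interactive window to orbit around the scan.
o3d.visualization.draw_geometries([mesh])
```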


What to turn in:

  1. A single zip file containing a pdf report (with pictures), the generated 3D meshes, and code/files (if any) with clear instructions to run.
  2. Add at least one slide with your progress/results/findings to this Google Slides deck. All the students will present from this deck after Part 3 to share our findings as a class.

Submission deadline: Feb 14, 2021 at 11:59pm


Part 2: Get it working as well as possible on our data, and calculate error to see how good it is

  1. Tweak parameters as necessary to get results that are as good as possible on our dataset.
  2. Use ICP to align your output to the ground-truth face (available here in the same Google Drive link). The use of ICP is similar to the way you used it in the previous assignment. (You don't need to use ICP, but you do need to get the faces aligned somehow, or you can't calculate error.) A minimal Python sketch of alignment and error computation appears after this list.
  3. Calculate the error between the ground-truth 3D face and the output of your selected algorithm; you will have error numbers for a few specific faces you tried. We want the distance from each vertex in the ground truth to the closest vertex in the test mesh. We created a MATLAB script to do this, available in the Google Drive link. (UCSC has a campus-wide license for MATLAB, and you can also use the online version of MATLAB with your CruzID if you don't want to install it.)
  4. There will be errors caused by edge cases (your reconstruction covers only the front of the face, but the ground truth includes the back of the head; does that count as lots of error?). See if you can figure out how to calculate error in a way that is "fair" and doesn't include crazy numbers. You may use only the front of your generated face and compare it to the "ground_truth_frontface.ply" given in the Google Drive.
  5. Turn in a report on what you did (include pictures), with parameter settings, etc., plus the 3D meshes you generated (saved after alignment to the ground truth) and the errors you calculated.
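
The provided MATLAB script is the reference for the error computation, but for anyone working in Python, here is a minimal sketch of steps 2-4, assuming open3d and scipy (both pip-installable). The file names, the ICP correspondence threshold, and the millimeter units are assumptions to adjust for your setup; using the provided ground_truth_frontface.ply is one way to keep the comparison fair per step 4.

```python
# A minimal sketch of alignment + error, assuming Python with open3d and
# scipy installed. File names, the ICP correspondence threshold, and the
# assumption that the ground truth is in millimeters are placeholders.
import numpy as np
import open3d as o3d
from scipy.spatial import cKDTree

def to_point_cloud(mesh):
    """Wrap a mesh's vertices in a point cloud, which ICP operates on."""
    pcd = o3d.geometry.PointCloud()
    pcd.points = mesh.vertices
    return pcd

ground_truth = o3d.io.read_triangle_mesh("ground_truth_frontface.ply")
recon = o3d.io.read_triangle_mesh("my_reconstruction.ply")  # your output

# ICP only refines an alignment: if your output starts far from the ground
# truth, align it coarsely first (e.g. by hand in MeshLab). with_scaling=True
# also solves for scale, since learned reconstructions rarely come out in mm.
result = o3d.pipelines.registration.registration_icp(
    to_point_cloud(recon), to_point_cloud(ground_truth),
    10.0,        # max correspondence distance (assumed mm); tune as needed
    np.eye(4),   # initial transform: identity, i.e. "already roughly aligned"
    o3d.pipelines.registration.TransformationEstimationPointToPoint(
        with_scaling=True))
recon.transform(result.transformation)
o3d.io.write_triangle_mesh("my_reconstruction_aligned.ply", recon)

# Error as specified in step 3: distance from each ground-truth vertex to the
# closest vertex of the aligned reconstruction.
tree = cKDTree(np.asarray(recon.vertices))
dists, _ = tree.query(np.asarray(ground_truth.vertices))
print(f"MAE {dists.mean():.3f}  median {np.median(dists):.3f}  "
      f"max {dists.max():.3f}")
```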


What to turn in:

  1. A single zip file containing a pdf report (with pictures), the generated 3D meshes, and code/files (if any) with clear instructions to run.
  2. Add at least one more slide, after your slides from Part 1, with your progress/results/findings in the same Google Slides deck. All the students will present from this deck after Part 3 to share our findings as a class.

Submission deadline: Feb 21, 2021 at 11:59pm


Part 3: Test rigorously across more input data

Note: Updated MATLAB code (Error_Vis_1.1.zip) for the error-visualization color map and CDF plot is provided in the Google Drive link. Read more about it here.

  • Produce a results table, figure, or plot worthy of being in a paper.
    • That is, we want to make a statement about the accuracy of the method from your chosen paper: not just a single trial, but a more careful analysis.
    • Try all the angles. Can you plot error as a function of the input image's angle? Does it matter?
    • If you have a multiview method, can you plot accuracy as a function of the number of input views?
    • Are there manual steps in your process? If I asked you to run your analysis on 100 faces instead of 1, could you generate all the numbers? Can you remove or minimize those manual steps? (If you get interesting results, I may actually ask you to do this, since we are working on a related paper.) A hypothetical batch-driving sketch appears after this list.
  • There are multiple levels of analysis that might exist in a paper, which I list as steps below. I am specifying some specific viewpoints and error thresholds to facilitate comparison as a class.
    • ML-based methods from a single photo
      • Step A0 - Show mesh results without texture for Front and Profile comparing:
        • Images of Ground Truth
        • Images of the recovered mesh for viewpoints #X, #Y, #Z = 40, 44, 22 (the viewpoint numbers in the dataset)
        • I might not have chosen the viewpoints wisely. Also include images of the recovered mesh for the viewpoint that best shows an interesting failure case (#F) and success (#S), if these differ from X, Y, Z
      • Step A - Generate error visualization plots, making sure they are fair comparisons. This is still mostly visual, but it allows us to see where the error is occurring.
        • Align meshes X, Y, Z individually to the ground truth (keep the ground truth fixed)
        • Trim the ground truth as needed so that it only covers where your method actually created a mesh
        • Create the error-visualization color map, using each of 2mm and 5mm as the maximum
        • I might not have chosen 2mm and 5mm wisely; also include whichever threshold you think is actually best
      • Step B - We want to understand the error distribution quantitatively as well. In addition to a single number like MAE (Mean Absolute Error), we can look at the distribution of all errors in a reconstruction. (A Python sketch of the color map and CDF plot appears after this list.)
        • Plot each of X, Y, Z on the same CDF plot, with labels, using xAxisMax = 5mm
        • I might not have chosen 5mm wisely; change the scale if you think you can make a more informative plot
    • Agisoft
      • Step A0 - Show mesh results without texture for Front and Profile comparing:
        • Images of Ground Truth
        • Images of recovered mesh using as many viewpoints as possible (around 60 or so)
        • Images of the recovered mesh using two viewpoints: #40, #36
          • Or if this doesn't work, choose two others
        • Images of the recovered mesh using three viewpoints: #40, #36, #48
          • Or if this doesn't work, choose three others
        • I might not have chosen the viewpoints wisely. Also include images of the recovered mesh for the viewpoint set that best shows an interesting failure case and success, if this differs from the three conditions I suggested
      • Step A - Generate error visualization plots, making sure they are fair comparisons. This is still mostly visual, but it allows us to see where the error is occurring.
        • Align each recovered mesh (one per view set) individually to the ground truth (keep the ground truth fixed)
        • Trim the ground truth as needed so that it only covers where your method actually created a mesh
        • Create the error-visualization color map, using each of 2mm and 5mm as the maximum
        • I might not have chosen 2mm and 5mm wisely; also include whichever threshold you think is actually best
      • Step B - We want to understand the error distribution quantitatively as well. In addition to a single number like MAE (Mean Absolute Error), we can look at the distribution of all errors in a reconstruction.
        • Plot each of the view-set results on the same CDF plot, with labels, using xAxisMax = 5mm
        • I might not have chosen 5mm wisely; change the scale if you think you can make a more informative plot
    • Extra: Brainstorm what we should do next if we really want to compare these methods and understand what works and what doesn't
      • Step C - Make a recommendation for a new figure, or a change to an existing figure, that you think is possible to get and that would make the results 'better science' or 'more believable'. Think carefully about what is actually possible to do.
        • In terms of resources you don't have now, but could have:
          • We have access to more test data of 100 other people with exactly the same camera setup and views and groundtruth format
          • We can probably figure out the camera positions
      • Step D - Create a draft figure showing what this would look like, with labels but possibly fake data. Try to guess what you will show, and make the fake data resemble what you expect to see.
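
For those working in Python rather than the provided Error_Vis MATLAB code, here is a hedged sketch of the Step A color map and Step B CDF plot. It assumes the per-ground-truth-vertex `dists` arrays and aligned meshes from the Part 2 sketch; the jet colormap is an assumption, not necessarily what Error_Vis uses.

```python
# A sketch of the Step A color map and Step B CDF plot, assuming the `dists`
# arrays and aligned meshes from the Part 2 sketch above. This approximates,
# but is not, the provided Error_Vis MATLAB code.
import numpy as np
import matplotlib.pyplot as plt
import open3d as o3d

def error_color_map(ground_truth, dists, max_error_mm):
    """Step A: color each ground-truth vertex by error, saturating at max."""
    t = np.clip(dists / max_error_mm, 0.0, 1.0)
    colors = plt.cm.jet(t)[:, :3]                   # drop the alpha channel
    ground_truth.vertex_colors = o3d.utility.Vector3dVector(colors)
    return ground_truth

def plot_error_cdf(dists_by_label, x_axis_max_mm=5.0):
    """Step B: overlay one empirical CDF per condition on a single plot."""
    for label, dists in dists_by_label.items():
        d = np.sort(dists)
        frac = np.arange(1, len(d) + 1) / len(d)    # fraction of vertices <= d
        plt.plot(d, frac, label=label)
    plt.xlim(0, x_axis_max_mm)
    plt.xlabel("vertex error (mm)")
    plt.ylabel("fraction of vertices")
    plt.legend()
    plt.savefig("error_cdf.png", dpi=200)
```

For example, error_color_map(ground_truth, dists, 2.0) followed by o3d.io.write_triangle_mesh("error_2mm.ply", ground_truth) saves the 2mm-max color map, and plot_error_cdf({"viewpoint 40": dists_40, "viewpoint 44": dists_44, "viewpoint 22": dists_22}) produces the combined Step B plot.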
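
Regarding the manual-steps bullet above: if your chosen tool exposes a command-line interface, a loop like the following hypothetical sketch can drive it over every viewpoint. The `recon_tool` command, its flags, and the dataset layout are invented placeholders; substitute your method's actual entry point.

```python
# A hypothetical sketch of batch-running a reconstruction tool over every
# photo. `recon_tool` and its flags are placeholders, not a real CLI.
import subprocess
from pathlib import Path

photos = Path("dataset/photos")                  # assumed dataset layout
out_dir = Path("results")
out_dir.mkdir(exist_ok=True)

for photo in sorted(photos.glob("*.jpg")):
    out_mesh = out_dir / f"{photo.stem}.ply"
    # One reconstruction per viewpoint; check=True aborts on the first failure.
    subprocess.run(["recon_tool", "--input", str(photo),
                    "--output", str(out_mesh)], check=True)
```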

What to turn in:

  1. A single zip file containing a pdf report, the generated 3D meshes (after alignment to groundtruth), and code/files (if any) with clear instructions to run.
  2. Add a slide that looks something like the sample slide I added to the front of the deck. If we follow a similar layout and the specified threshold levels, we will be able to compare results between different students more easily.
  3. Add additional slides with more results and interesting cases the class might like to see (optional, depending on whether you have something to show).
  4. Leave your old slides there, and make sure all your slides are labeled obviously with your name
  5. Label slides obviously in the top left corner with "Previous" if they were previously shown.
  6. There will not be a "formal" presentation, but we will go through the slides on Thursday to see what we can learn as a whole class.

Submission deadline: Mar 3, 2021 at 11:59pm