diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md new file mode 100644 index 0000000000000000000000000000000000000000..08b500a221857ec3f451338e80b4a9ab1173a1af --- /dev/null +++ b/CODE_OF_CONDUCT.md @@ -0,0 +1,80 @@ +# Code of Conduct + +## Our Pledge + +In the interest of fostering an open and welcoming environment, we as +contributors and maintainers pledge to make participation in our project and +our community a harassment-free experience for everyone, regardless of age, body +size, disability, ethnicity, sex characteristics, gender identity and expression, +level of experience, education, socio-economic status, nationality, personal +appearance, race, religion, or sexual identity and orientation. + +## Our Standards + +Examples of behavior that contributes to creating a positive environment +include: + +* Using welcoming and inclusive language +* Being respectful of differing viewpoints and experiences +* Gracefully accepting constructive criticism +* Focusing on what is best for the community +* Showing empathy towards other community members + +Examples of unacceptable behavior by participants include: + +* The use of sexualized language or imagery and unwelcome sexual attention or + advances +* Trolling, insulting/derogatory comments, and personal or political attacks +* Public or private harassment +* Publishing others' private information, such as a physical or electronic + address, without explicit permission +* Other conduct which could reasonably be considered inappropriate in a + professional setting + +## Our Responsibilities + +Project maintainers are responsible for clarifying the standards of acceptable +behavior and are expected to take appropriate and fair corrective action in +response to any instances of unacceptable behavior. + +Project maintainers have the right and responsibility to remove, edit, or +reject comments, commits, code, wiki edits, issues, and other contributions +that are not aligned to this Code of Conduct, or to ban temporarily or +permanently any contributor for other behaviors that they deem inappropriate, +threatening, offensive, or harmful. + +## Scope + +This Code of Conduct applies within all project spaces, and it also applies when +an individual is representing the project or its community in public spaces. +Examples of representing a project or community include using an official +project e-mail address, posting via an official social media account, or acting +as an appointed representative at an online or offline event. Representation of +a project may be further defined and clarified by project maintainers. + +This Code of Conduct also applies outside the project spaces when there is a +reasonable belief that an individual's behavior may have a negative impact on +the project or its community. + +## Enforcement + +Instances of abusive, harassing, or otherwise unacceptable behavior may be +reported by contacting the project team at . All +complaints will be reviewed and investigated and will result in a response that +is deemed necessary and appropriate to the circumstances. The project team is +obligated to maintain confidentiality with regard to the reporter of an incident. +Further details of specific enforcement policies may be posted separately. + +Project maintainers who do not follow or enforce the Code of Conduct in good +faith may face temporary or permanent repercussions as determined by other +members of the project's leadership. 
+ +## Attribution + +This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, +available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html + +[homepage]: https://www.contributor-covenant.org + +For answers to common questions about this code of conduct, see +https://www.contributor-covenant.org/faq diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 0000000000000000000000000000000000000000..c88cc4f734d301267f3e7c00f6cfe4baf9a8222c --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,31 @@ +# Contributing to PoseDiffusion +We want to make contributing to this project as easy and transparent as +possible. + +## Pull Requests +We actively welcome your pull requests. + +1. Fork the repo and create your branch from `main`. +2. If you've added code that should be tested, add tests. +3. If you've changed APIs, update the documentation. +4. Ensure the test suite passes. +5. Make sure your code lints. +6. If you haven't already, complete the Contributor License Agreement ("CLA"). + +## Contributor License Agreement ("CLA") +In order to accept your pull request, we need you to submit a CLA. You only need +to do this once to work on any of Facebook's open source projects. + +Complete your CLA here: + +## Issues +We use GitHub issues to track public bugs. Please ensure your description is +clear and has sufficient instructions to be able to reproduce the issue. + +Facebook has a [bounty program](https://www.facebook.com/whitehat/) for the safe +disclosure of security bugs. In those cases, please go through the process +outlined on that page and do not file a public issue. + +## License +By contributing to PoseDiffusion, you agree that your contributions will be licensed +under the LICENSE file in the root directory of this source tree. \ No newline at end of file diff --git a/LICENSE b/LICENSE new file mode 100644 index 0000000000000000000000000000000000000000..e395ca3e2cdebf48a6375a3c1022d10caabba7db --- /dev/null +++ b/LICENSE @@ -0,0 +1,399 @@ +Attribution-NonCommercial 4.0 International + +======================================================================= + +Creative Commons Corporation ("Creative Commons") is not a law firm and +does not provide legal services or legal advice. Distribution of +Creative Commons public licenses does not create a lawyer-client or +other relationship. Creative Commons makes its licenses and related +information available on an "as-is" basis. Creative Commons gives no +warranties regarding its licenses, any material licensed under their +terms and conditions, or any related information. Creative Commons +disclaims all liability for damages resulting from their use to the +fullest extent possible. + +Using Creative Commons Public Licenses + +Creative Commons public licenses provide a standard set of terms and +conditions that creators and other rights holders may use to share +original works of authorship and other material subject to copyright +and certain other rights specified in the public license below. The +following considerations are for informational purposes only, are not +exhaustive, and do not form part of our licenses. + + Considerations for licensors: Our public licenses are + intended for use by those authorized to give the public + permission to use material in ways otherwise restricted by + copyright and certain other rights. Our licenses are + irrevocable. Licensors should read and understand the terms + and conditions of the license they choose before applying it. 
+ Licensors should also secure all rights necessary before + applying our licenses so that the public can reuse the + material as expected. Licensors should clearly mark any + material not subject to the license. This includes other CC- + licensed material, or material used under an exception or + limitation to copyright. More considerations for licensors: + wiki.creativecommons.org/Considerations_for_licensors + + Considerations for the public: By using one of our public + licenses, a licensor grants the public permission to use the + licensed material under specified terms and conditions. If + the licensor's permission is not necessary for any reason--for + example, because of any applicable exception or limitation to + copyright--then that use is not regulated by the license. Our + licenses grant only permissions under copyright and certain + other rights that a licensor has authority to grant. Use of + the licensed material may still be restricted for other + reasons, including because others have copyright or other + rights in the material. A licensor may make special requests, + such as asking that all changes be marked or described. + Although not required by our licenses, you are encouraged to + respect those requests where reasonable. More_considerations + for the public: + wiki.creativecommons.org/Considerations_for_licensees + +======================================================================= + +Creative Commons Attribution-NonCommercial 4.0 International Public +License + +By exercising the Licensed Rights (defined below), You accept and agree +to be bound by the terms and conditions of this Creative Commons +Attribution-NonCommercial 4.0 International Public License ("Public +License"). To the extent this Public License may be interpreted as a +contract, You are granted the Licensed Rights in consideration of Your +acceptance of these terms and conditions, and the Licensor grants You +such rights in consideration of benefits the Licensor receives from +making the Licensed Material available under these terms and +conditions. + +Section 1 -- Definitions. + + a. Adapted Material means material subject to Copyright and Similar + Rights that is derived from or based upon the Licensed Material + and in which the Licensed Material is translated, altered, + arranged, transformed, or otherwise modified in a manner requiring + permission under the Copyright and Similar Rights held by the + Licensor. For purposes of this Public License, where the Licensed + Material is a musical work, performance, or sound recording, + Adapted Material is always produced where the Licensed Material is + synched in timed relation with a moving image. + + b. Adapter's License means the license You apply to Your Copyright + and Similar Rights in Your contributions to Adapted Material in + accordance with the terms and conditions of this Public License. + + c. Copyright and Similar Rights means copyright and/or similar rights + closely related to copyright including, without limitation, + performance, broadcast, sound recording, and Sui Generis Database + Rights, without regard to how the rights are labeled or + categorized. For purposes of this Public License, the rights + specified in Section 2(b)(1)-(2) are not Copyright and Similar + Rights. + d. 
Effective Technological Measures means those measures that, in the + absence of proper authority, may not be circumvented under laws + fulfilling obligations under Article 11 of the WIPO Copyright + Treaty adopted on December 20, 1996, and/or similar international + agreements. + + e. Exceptions and Limitations means fair use, fair dealing, and/or + any other exception or limitation to Copyright and Similar Rights + that applies to Your use of the Licensed Material. + + f. Licensed Material means the artistic or literary work, database, + or other material to which the Licensor applied this Public + License. + + g. Licensed Rights means the rights granted to You subject to the + terms and conditions of this Public License, which are limited to + all Copyright and Similar Rights that apply to Your use of the + Licensed Material and that the Licensor has authority to license. + + h. Licensor means the individual(s) or entity(ies) granting rights + under this Public License. + + i. NonCommercial means not primarily intended for or directed towards + commercial advantage or monetary compensation. For purposes of + this Public License, the exchange of the Licensed Material for + other material subject to Copyright and Similar Rights by digital + file-sharing or similar means is NonCommercial provided there is + no payment of monetary compensation in connection with the + exchange. + + j. Share means to provide material to the public by any means or + process that requires permission under the Licensed Rights, such + as reproduction, public display, public performance, distribution, + dissemination, communication, or importation, and to make material + available to the public including in ways that members of the + public may access the material from a place and at a time + individually chosen by them. + + k. Sui Generis Database Rights means rights other than copyright + resulting from Directive 96/9/EC of the European Parliament and of + the Council of 11 March 1996 on the legal protection of databases, + as amended and/or succeeded, as well as other essentially + equivalent rights anywhere in the world. + + l. You means the individual or entity exercising the Licensed Rights + under this Public License. Your has a corresponding meaning. + +Section 2 -- Scope. + + a. License grant. + + 1. Subject to the terms and conditions of this Public License, + the Licensor hereby grants You a worldwide, royalty-free, + non-sublicensable, non-exclusive, irrevocable license to + exercise the Licensed Rights in the Licensed Material to: + + a. reproduce and Share the Licensed Material, in whole or + in part, for NonCommercial purposes only; and + + b. produce, reproduce, and Share Adapted Material for + NonCommercial purposes only. + + 2. Exceptions and Limitations. For the avoidance of doubt, where + Exceptions and Limitations apply to Your use, this Public + License does not apply, and You do not need to comply with + its terms and conditions. + + 3. Term. The term of this Public License is specified in Section + 6(a). + + 4. Media and formats; technical modifications allowed. The + Licensor authorizes You to exercise the Licensed Rights in + all media and formats whether now known or hereafter created, + and to make technical modifications necessary to do so. 
The + Licensor waives and/or agrees not to assert any right or + authority to forbid You from making technical modifications + necessary to exercise the Licensed Rights, including + technical modifications necessary to circumvent Effective + Technological Measures. For purposes of this Public License, + simply making modifications authorized by this Section 2(a) + (4) never produces Adapted Material. + + 5. Downstream recipients. + + a. Offer from the Licensor -- Licensed Material. Every + recipient of the Licensed Material automatically + receives an offer from the Licensor to exercise the + Licensed Rights under the terms and conditions of this + Public License. + + b. No downstream restrictions. You may not offer or impose + any additional or different terms or conditions on, or + apply any Effective Technological Measures to, the + Licensed Material if doing so restricts exercise of the + Licensed Rights by any recipient of the Licensed + Material. + + 6. No endorsement. Nothing in this Public License constitutes or + may be construed as permission to assert or imply that You + are, or that Your use of the Licensed Material is, connected + with, or sponsored, endorsed, or granted official status by, + the Licensor or others designated to receive attribution as + provided in Section 3(a)(1)(A)(i). + + b. Other rights. + + 1. Moral rights, such as the right of integrity, are not + licensed under this Public License, nor are publicity, + privacy, and/or other similar personality rights; however, to + the extent possible, the Licensor waives and/or agrees not to + assert any such rights held by the Licensor to the limited + extent necessary to allow You to exercise the Licensed + Rights, but not otherwise. + + 2. Patent and trademark rights are not licensed under this + Public License. + + 3. To the extent possible, the Licensor waives any right to + collect royalties from You for the exercise of the Licensed + Rights, whether directly or through a collecting society + under any voluntary or waivable statutory or compulsory + licensing scheme. In all other cases the Licensor expressly + reserves any right to collect such royalties, including when + the Licensed Material is used other than for NonCommercial + purposes. + +Section 3 -- License Conditions. + +Your exercise of the Licensed Rights is expressly made subject to the +following conditions. + + a. Attribution. + + 1. If You Share the Licensed Material (including in modified + form), You must: + + a. retain the following if it is supplied by the Licensor + with the Licensed Material: + + i. identification of the creator(s) of the Licensed + Material and any others designated to receive + attribution, in any reasonable manner requested by + the Licensor (including by pseudonym if + designated); + + ii. a copyright notice; + + iii. a notice that refers to this Public License; + + iv. a notice that refers to the disclaimer of + warranties; + + v. a URI or hyperlink to the Licensed Material to the + extent reasonably practicable; + + b. indicate if You modified the Licensed Material and + retain an indication of any previous modifications; and + + c. indicate the Licensed Material is licensed under this + Public License, and include the text of, or the URI or + hyperlink to, this Public License. + + 2. You may satisfy the conditions in Section 3(a)(1) in any + reasonable manner based on the medium, means, and context in + which You Share the Licensed Material. 
For example, it may be + reasonable to satisfy the conditions by providing a URI or + hyperlink to a resource that includes the required + information. + + 3. If requested by the Licensor, You must remove any of the + information required by Section 3(a)(1)(A) to the extent + reasonably practicable. + + 4. If You Share Adapted Material You produce, the Adapter's + License You apply must not prevent recipients of the Adapted + Material from complying with this Public License. + +Section 4 -- Sui Generis Database Rights. + +Where the Licensed Rights include Sui Generis Database Rights that +apply to Your use of the Licensed Material: + + a. for the avoidance of doubt, Section 2(a)(1) grants You the right + to extract, reuse, reproduce, and Share all or a substantial + portion of the contents of the database for NonCommercial purposes + only; + + b. if You include all or a substantial portion of the database + contents in a database in which You have Sui Generis Database + Rights, then the database in which You have Sui Generis Database + Rights (but not its individual contents) is Adapted Material; and + + c. You must comply with the conditions in Section 3(a) if You Share + all or a substantial portion of the contents of the database. + +For the avoidance of doubt, this Section 4 supplements and does not +replace Your obligations under this Public License where the Licensed +Rights include other Copyright and Similar Rights. + +Section 5 -- Disclaimer of Warranties and Limitation of Liability. + + a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE + EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS + AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF + ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, + IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION, + WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR + PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, + ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT + KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT + ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU. + + b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE + TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, + NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, + INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, + COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR + USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN + ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR + DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR + IN PART, THIS LIMITATION MAY NOT APPLY TO YOU. + + c. The disclaimer of warranties and limitation of liability provided + above shall be interpreted in a manner that, to the extent + possible, most closely approximates an absolute disclaimer and + waiver of all liability. + +Section 6 -- Term and Termination. + + a. This Public License applies for the term of the Copyright and + Similar Rights licensed here. However, if You fail to comply with + this Public License, then Your rights under this Public License + terminate automatically. + + b. Where Your right to use the Licensed Material has terminated under + Section 6(a), it reinstates: + + 1. automatically as of the date the violation is cured, provided + it is cured within 30 days of Your discovery of the + violation; or + + 2. upon express reinstatement by the Licensor. 
+ + For the avoidance of doubt, this Section 6(b) does not affect any + right the Licensor may have to seek remedies for Your violations + of this Public License. + + c. For the avoidance of doubt, the Licensor may also offer the + Licensed Material under separate terms or conditions or stop + distributing the Licensed Material at any time; however, doing so + will not terminate this Public License. + + d. Sections 1, 5, 6, 7, and 8 survive termination of this Public + License. + +Section 7 -- Other Terms and Conditions. + + a. The Licensor shall not be bound by any additional or different + terms or conditions communicated by You unless expressly agreed. + + b. Any arrangements, understandings, or agreements regarding the + Licensed Material not stated herein are separate from and + independent of the terms and conditions of this Public License. + +Section 8 -- Interpretation. + + a. For the avoidance of doubt, this Public License does not, and + shall not be interpreted to, reduce, limit, restrict, or impose + conditions on any use of the Licensed Material that could lawfully + be made without permission under this Public License. + + b. To the extent possible, if any provision of this Public License is + deemed unenforceable, it shall be automatically reformed to the + minimum extent necessary to make it enforceable. If the provision + cannot be reformed, it shall be severed from this Public License + without affecting the enforceability of the remaining terms and + conditions. + + c. No term or condition of this Public License will be waived and no + failure to comply consented to unless expressly agreed to by the + Licensor. + + d. Nothing in this Public License constitutes or may be interpreted + as a limitation upon, or waiver of, any privileges and immunities + that apply to the Licensor or You, including from the legal + processes of any jurisdiction or authority. + +======================================================================= + +Creative Commons is not a party to its public +licenses. Notwithstanding, Creative Commons may elect to apply one of +its public licenses to material it publishes and in those instances +will be considered the “Licensor.” The text of the Creative Commons +public licenses is dedicated to the public domain under the CC0 Public +Domain Dedication. Except for the limited purpose of indicating that +material is shared under a Creative Commons public license or as +otherwise permitted by the Creative Commons policies published at +creativecommons.org/policies, Creative Commons does not authorize the +use of the trademark "Creative Commons" or any other trademark or logo +of Creative Commons without its prior written consent including, +without limitation, in connection with any unauthorized modifications +to any of its public licenses or any other arrangements, +understandings, or agreements concerning use of licensed material. For +the avoidance of doubt, this paragraph does not form part of the +public licenses. + +Creative Commons may be contacted at creativecommons.org. 
\ No newline at end of file diff --git a/README.md b/README.md index bc5f30d6632ac0efdc7be2e9095e9e9579af2e33..08ef81ec7ca35be21752575a0c040a3d64947cad 100644 --- a/README.md +++ b/README.md @@ -1,199 +1,89 @@ ---- -library_name: transformers -tags: [] ---- +# [ECCV 2024] VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models -# Model Card for Model ID +[Project page](https://junlinhan.github.io/projects/vfusion3d.html), [Paper link](https://arxiv.org/abs/2403.12034) - +VFusion3D is a large, feed-forward 3D generative model trained with a small amount of 3D data and a large volume of synthetic multi-view data. It is the first work exploring scalable 3D generative/reconstruction models as a step towards a 3D foundation model. +[VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models](https://junlinhan.github.io/projects/vfusion3d.html)
+[Junlin Han](https://junlinhan.github.io/), [Filippos Kokkinos](https://www.fkokkinos.com/), [Philip Torr](https://www.robots.ox.ac.uk/~phst/)
+GenAI, Meta and TVG, University of Oxford
+European Conference on Computer Vision (ECCV), 2024 -## Model Details +## News -### Model Description +- [25.07.2024] Release weights and inference code for VFusion3D. - +## Results and Comparisons -This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated. +### 3D Generation Results + -- **Developed by:** [More Information Needed] -- **Funded by [optional]:** [More Information Needed] -- **Shared by [optional]:** [More Information Needed] -- **Model type:** [More Information Needed] -- **Language(s) (NLP):** [More Information Needed] -- **License:** [More Information Needed] -- **Finetuned from model [optional]:** [More Information Needed] + -### Model Sources [optional] +### User Study Results + - -- **Repository:** [More Information Needed] -- **Paper [optional]:** [More Information Needed] -- **Demo [optional]:** [More Information Needed] +## Setup -## Uses +### Installation +``` +git clone https://github.com/facebookresearch/vfusion3d +cd vfusion3d +``` - +### Environment +We provide a simple installation script that, by default, sets up a conda environment with Python 3.8.19, PyTorch 2.3, and CUDA 12.1. Similar package versions should also work. -### Direct Use +``` +source install.sh +``` - +## Quick Start -[More Information Needed] +### Pretrained Models -### Downstream Use [optional] +- Model weights are available here [Google Drive](https://drive.google.com/file/d/1b-KKSh9VquJdzmXzZBE4nKbXnbeua42X/view?usp=sharing). Please download it and put it inside ./checkpoints/ - -[More Information Needed] +### Prepare Images +- We put some sample inputs under `assets/40_prompt_images`, which is the 40 MVDream prompt images used in the paper. Results of them are also provided under `results/40_prompt_images_provided`. -### Out-of-Scope Use +### Inference +- Run the inference script to get 3D assets. +- You may specify which form of output to generate by setting the flags `--export_video` and `--export_mesh`. +- Change `--source_path` and `--dump_path` if you want to run it on other image folders. - + ``` + # Example usages + # Render a video + python -m lrm.inferrer --export_video --resume ./checkpoints/vfusion3dckpt + + # Export mesh + python -m lrm.inferrer --export_mesh --resume ./checkpoints/vfusion3dckpt + ``` -[More Information Needed] -## Bias, Risks, and Limitations +## Acknowledgement - +- This inference code of VFusion3D heavily borrows from [OpenLRM](https://github.com/3DTopia/OpenLRM). -[More Information Needed] +## Citation -### Recommendations +If you find this work useful, please cite us: - -Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. +``` +@article{han2024vfusion3d, + title={VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models}, + author={Junlin Han and Filippos Kokkinos and Philip Torr}, + journal={European Conference on Computer Vision (ECCV)}, + year={2024} +} +``` -## How to Get Started with the Model +## License -Use the code below to get started with the model. 
- -[More Information Needed] - -## Training Details - -### Training Data - - - -[More Information Needed] - -### Training Procedure - - - -#### Preprocessing [optional] - -[More Information Needed] - - -#### Training Hyperparameters - -- **Training regime:** [More Information Needed] - -#### Speeds, Sizes, Times [optional] - - - -[More Information Needed] - -## Evaluation - - - -### Testing Data, Factors & Metrics - -#### Testing Data - - - -[More Information Needed] - -#### Factors - - - -[More Information Needed] - -#### Metrics - - - -[More Information Needed] - -### Results - -[More Information Needed] - -#### Summary - - - -## Model Examination [optional] - - - -[More Information Needed] - -## Environmental Impact - - - -Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). - -- **Hardware Type:** [More Information Needed] -- **Hours used:** [More Information Needed] -- **Cloud Provider:** [More Information Needed] -- **Compute Region:** [More Information Needed] -- **Carbon Emitted:** [More Information Needed] - -## Technical Specifications [optional] - -### Model Architecture and Objective - -[More Information Needed] - -### Compute Infrastructure - -[More Information Needed] - -#### Hardware - -[More Information Needed] - -#### Software - -[More Information Needed] - -## Citation [optional] - - - -**BibTeX:** - -[More Information Needed] - -**APA:** - -[More Information Needed] - -## Glossary [optional] - - - -[More Information Needed] - -## More Information [optional] - -[More Information Needed] - -## Model Card Authors [optional] - -[More Information Needed] - -## Model Card Contact - -[More Information Needed] \ No newline at end of file +- The majority of VFusion3D is licensed under CC-BY-NC, however portions of the project are available under separate license terms: OpenLRM as a whole is licensed under the Apache License, Version 2.0, while certain components are covered by NVIDIA's proprietary license. +- The model weights of VFusion3D is also licensed under CC-BY-NC. 
diff --git a/__init__.py b/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ff1adc86a447203216b139815a88c4356b4cab73 --- /dev/null +++ b/__init__.py @@ -0,0 +1 @@ +from .modeling import LRMGenerator, LRMGeneratorConfig diff --git a/assets/40_prompt_images/A 3D scan of AK47, weapon.jpeg b/assets/40_prompt_images/A 3D scan of AK47, weapon.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..30d3c272af90e3b2f15fa11e6942de3e9033bad1 Binary files /dev/null and b/assets/40_prompt_images/A 3D scan of AK47, weapon.jpeg differ diff --git a/assets/40_prompt_images/A DSLR photo of Sydney Opera House.jpg b/assets/40_prompt_images/A DSLR photo of Sydney Opera House.jpg new file mode 100644 index 0000000000000000000000000000000000000000..a9813a060c4aacaa9629cc96007a8caebfa9f95b Binary files /dev/null and b/assets/40_prompt_images/A DSLR photo of Sydney Opera House.jpg differ diff --git a/assets/40_prompt_images/A bald eagle carved out of wood.jpg b/assets/40_prompt_images/A bald eagle carved out of wood.jpg new file mode 100644 index 0000000000000000000000000000000000000000..c399dca0716fe69c0f8983649cf2b2cf5b011cb4 Binary files /dev/null and b/assets/40_prompt_images/A bald eagle carved out of wood.jpg differ diff --git a/assets/40_prompt_images/A bulldog wearing a black pirate hat.jpeg b/assets/40_prompt_images/A bulldog wearing a black pirate hat.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..3bf3cc0ae812bed719c8b4397dd42aed602f7000 Binary files /dev/null and b/assets/40_prompt_images/A bulldog wearing a black pirate hat.jpeg differ diff --git a/assets/40_prompt_images/A crab, low poly.jpg b/assets/40_prompt_images/A crab, low poly.jpg new file mode 100644 index 0000000000000000000000000000000000000000..4e67107ce5c28ad7e9ac73162fe9dad95ec074ee Binary files /dev/null and b/assets/40_prompt_images/A crab, low poly.jpg differ diff --git a/assets/40_prompt_images/A photo of a horse walking.jpeg b/assets/40_prompt_images/A photo of a horse walking.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..86ec571909e0ab82ac1190164bd69fcdd56960d2 Binary files /dev/null and b/assets/40_prompt_images/A photo of a horse walking.jpeg differ diff --git a/assets/40_prompt_images/A pig wearing a backpack.jpeg b/assets/40_prompt_images/A pig wearing a backpack.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..075e8238fd8b467d8fde6c495c3d762e31c952ff Binary files /dev/null and b/assets/40_prompt_images/A pig wearing a backpack.jpeg differ diff --git a/assets/40_prompt_images/A product photo of a toy tank.jpg b/assets/40_prompt_images/A product photo of a toy tank.jpg new file mode 100644 index 0000000000000000000000000000000000000000..67b0fbc039b0e338d16f511f8182475139584b44 Binary files /dev/null and b/assets/40_prompt_images/A product photo of a toy tank.jpg differ diff --git a/assets/40_prompt_images/A see no evil monkey on a kick drum.jpg b/assets/40_prompt_images/A see no evil monkey on a kick drum.jpg new file mode 100644 index 0000000000000000000000000000000000000000..4e8f46a233d98b54ea3d74dd805d57eb0ee4998c Binary files /dev/null and b/assets/40_prompt_images/A see no evil monkey on a kick drum.jpg differ diff --git a/assets/40_prompt_images/A statue of angel, blender.jpg b/assets/40_prompt_images/A statue of angel, blender.jpg new file mode 100644 index 0000000000000000000000000000000000000000..ffa76760f799fd4f11f6aa79e54d18b3c5b49b3d Binary files /dev/null and b/assets/40_prompt_images/A 
statue of angel, blender.jpg differ diff --git a/assets/40_prompt_images/Corgi riding a rocket.jpeg b/assets/40_prompt_images/Corgi riding a rocket.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..c21f3911c84b84f71170e732538a4cd891138d5e Binary files /dev/null and b/assets/40_prompt_images/Corgi riding a rocket.jpeg differ diff --git a/assets/40_prompt_images/Daenerys Targaryen from game of throne.jpg b/assets/40_prompt_images/Daenerys Targaryen from game of throne.jpg new file mode 100644 index 0000000000000000000000000000000000000000..700073d917d2ce80e42ec6212284fa43679ea01a Binary files /dev/null and b/assets/40_prompt_images/Daenerys Targaryen from game of throne.jpg differ diff --git a/assets/40_prompt_images/Darth Vader helmet,g highly detailed.jpg b/assets/40_prompt_images/Darth Vader helmet,g highly detailed.jpg new file mode 100644 index 0000000000000000000000000000000000000000..a55926dc6d315ae8649b450bca32b78f30a93325 Binary files /dev/null and b/assets/40_prompt_images/Darth Vader helmet,g highly detailed.jpg differ diff --git a/assets/40_prompt_images/Dragon armor.jpeg b/assets/40_prompt_images/Dragon armor.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..980390085b60b2d91cb92f5478bef1170ec95623 Binary files /dev/null and b/assets/40_prompt_images/Dragon armor.jpeg differ diff --git a/assets/40_prompt_images/Fisherman House, cute, cartoon, blender, stylized.jpg b/assets/40_prompt_images/Fisherman House, cute, cartoon, blender, stylized.jpg new file mode 100644 index 0000000000000000000000000000000000000000..5382c16b5586ac34e5e4e20121747c1c6f6d11c8 Binary files /dev/null and b/assets/40_prompt_images/Fisherman House, cute, cartoon, blender, stylized.jpg differ diff --git a/assets/40_prompt_images/Flying Dragon, highly detailed, breathing fire.jpeg b/assets/40_prompt_images/Flying Dragon, highly detailed, breathing fire.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..a6dd85b38483f0048fe41eb1828a12f0ccc1edfc Binary files /dev/null and b/assets/40_prompt_images/Flying Dragon, highly detailed, breathing fire.jpeg differ diff --git a/assets/40_prompt_images/Handpainted watercolor windmill, hand-painted.jpg b/assets/40_prompt_images/Handpainted watercolor windmill, hand-painted.jpg new file mode 100644 index 0000000000000000000000000000000000000000..ada1875148b962387133b495baea884198c03217 Binary files /dev/null and b/assets/40_prompt_images/Handpainted watercolor windmill, hand-painted.jpg differ diff --git a/assets/40_prompt_images/Katana.jpeg b/assets/40_prompt_images/Katana.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..6e0518179f7a671144f817b6a891d464c27d5979 Binary files /dev/null and b/assets/40_prompt_images/Katana.jpeg differ diff --git a/assets/40_prompt_images/Little italian town, hand-painted style.jpg b/assets/40_prompt_images/Little italian town, hand-painted style.jpg new file mode 100644 index 0000000000000000000000000000000000000000..e80e9556db92d72ea548d453d705f9d98d561115 Binary files /dev/null and b/assets/40_prompt_images/Little italian town, hand-painted style.jpg differ diff --git a/assets/40_prompt_images/Mr Bean Cartoon doing a T Pose.jpg b/assets/40_prompt_images/Mr Bean Cartoon doing a T Pose.jpg new file mode 100644 index 0000000000000000000000000000000000000000..0c4efed01a6a93336f5b63fa6e34c5f84edf61ee Binary files /dev/null and b/assets/40_prompt_images/Mr Bean Cartoon doing a T Pose.jpg differ diff --git a/assets/40_prompt_images/Pedestal Fan 
(White).jpeg b/assets/40_prompt_images/Pedestal Fan (White).jpeg new file mode 100644 index 0000000000000000000000000000000000000000..0a9e6d9122fed02f81eddcf743e859183cf7a6a2 Binary files /dev/null and b/assets/40_prompt_images/Pedestal Fan (White).jpeg differ diff --git a/assets/40_prompt_images/Pikachu with hat.jpg b/assets/40_prompt_images/Pikachu with hat.jpg new file mode 100644 index 0000000000000000000000000000000000000000..209e659bbb762fc4eead32da3f364f40cc47ee80 Binary files /dev/null and b/assets/40_prompt_images/Pikachu with hat.jpg differ diff --git a/assets/40_prompt_images/Samurai koala bear.jpg b/assets/40_prompt_images/Samurai koala bear.jpg new file mode 100644 index 0000000000000000000000000000000000000000..e8528fcd3ab8db90ab142673db7c2d244a6462bf Binary files /dev/null and b/assets/40_prompt_images/Samurai koala bear.jpg differ diff --git a/assets/40_prompt_images/TRUMP figure.jpg b/assets/40_prompt_images/TRUMP figure.jpg new file mode 100644 index 0000000000000000000000000000000000000000..6506341e7d644ad7be8a8c0d7569e543029a94f2 Binary files /dev/null and b/assets/40_prompt_images/TRUMP figure.jpg differ diff --git a/assets/40_prompt_images/Viking axe, fantasy, weapon, blender, 8k, HD.jpg b/assets/40_prompt_images/Viking axe, fantasy, weapon, blender, 8k, HD.jpg new file mode 100644 index 0000000000000000000000000000000000000000..b386bbe9e771e6763b0f044c87f6d0afb27a5991 Binary files /dev/null and b/assets/40_prompt_images/Viking axe, fantasy, weapon, blender, 8k, HD.jpg differ diff --git a/assets/40_prompt_images/a DSLR photo of a frog wearing a sweater.jpg b/assets/40_prompt_images/a DSLR photo of a frog wearing a sweater.jpg new file mode 100644 index 0000000000000000000000000000000000000000..e7e2da782249bd67b6e638926248ecdbc1edda2a Binary files /dev/null and b/assets/40_prompt_images/a DSLR photo of a frog wearing a sweater.jpg differ diff --git a/assets/40_prompt_images/a DSLR photo of a ghost eating a hamburger.jpg b/assets/40_prompt_images/a DSLR photo of a ghost eating a hamburger.jpg new file mode 100644 index 0000000000000000000000000000000000000000..0b5b72f32bae9b399f4a13ad9de4b9f18fd7d4a0 Binary files /dev/null and b/assets/40_prompt_images/a DSLR photo of a ghost eating a hamburger.jpg differ diff --git a/assets/40_prompt_images/a DSLR photo of a peacock on a surfboard.jpeg b/assets/40_prompt_images/a DSLR photo of a peacock on a surfboard.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..9b3345c1aa637c434f1c03448b79bb367bc5415c Binary files /dev/null and b/assets/40_prompt_images/a DSLR photo of a peacock on a surfboard.jpeg differ diff --git a/assets/40_prompt_images/a DSLR photo of a squirrel playing guitar.jpg b/assets/40_prompt_images/a DSLR photo of a squirrel playing guitar.jpg new file mode 100644 index 0000000000000000000000000000000000000000..edd12b7ec3eb99eb636f3541e3bcae4858de17bd Binary files /dev/null and b/assets/40_prompt_images/a DSLR photo of a squirrel playing guitar.jpg differ diff --git a/assets/40_prompt_images/a DSLR photo of an eggshell broken in two with an adorable chick standing next to it.jpeg b/assets/40_prompt_images/a DSLR photo of an eggshell broken in two with an adorable chick standing next to it.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..3cdc1a155f3f01bcfcab843fc0b85e9586b5d31c Binary files /dev/null and b/assets/40_prompt_images/a DSLR photo of an eggshell broken in two with an adorable chick standing next to it.jpeg differ diff --git a/assets/40_prompt_images/an 
astronaut riding a horse.jpeg b/assets/40_prompt_images/an astronaut riding a horse.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..23af0cf9eec13cca0db613eae90e9ed30b696eef Binary files /dev/null and b/assets/40_prompt_images/an astronaut riding a horse.jpeg differ diff --git a/assets/40_prompt_images/animal skull pile.jpg b/assets/40_prompt_images/animal skull pile.jpg new file mode 100644 index 0000000000000000000000000000000000000000..bebef28be9075ef2f8816bb1228d900b9168fe7a Binary files /dev/null and b/assets/40_prompt_images/animal skull pile.jpg differ diff --git a/assets/40_prompt_images/army Jacket, 3D scan.jpg b/assets/40_prompt_images/army Jacket, 3D scan.jpg new file mode 100644 index 0000000000000000000000000000000000000000..4b6e1c5e8287c2d82abc8502c8cef214f4e13f03 Binary files /dev/null and b/assets/40_prompt_images/army Jacket, 3D scan.jpg differ diff --git a/assets/40_prompt_images/baby yoda in the style of Mormookiee.jpg b/assets/40_prompt_images/baby yoda in the style of Mormookiee.jpg new file mode 100644 index 0000000000000000000000000000000000000000..44b855965e758586ccbb0b0305cc54ca12abdd0e Binary files /dev/null and b/assets/40_prompt_images/baby yoda in the style of Mormookiee.jpg differ diff --git a/assets/40_prompt_images/beautiful, intricate butterfly.jpg b/assets/40_prompt_images/beautiful, intricate butterfly.jpg new file mode 100644 index 0000000000000000000000000000000000000000..035746d20194888b202795ebab053d3b131e830a Binary files /dev/null and b/assets/40_prompt_images/beautiful, intricate butterfly.jpg differ diff --git a/assets/40_prompt_images/girl riding wolf, cute, cartoon, blender.jpg b/assets/40_prompt_images/girl riding wolf, cute, cartoon, blender.jpg new file mode 100644 index 0000000000000000000000000000000000000000..3374d4572e3d6872e1669d9dabf0065ec3fa0b78 Binary files /dev/null and b/assets/40_prompt_images/girl riding wolf, cute, cartoon, blender.jpg differ diff --git a/assets/40_prompt_images/mecha vampire girl chibi.jpg b/assets/40_prompt_images/mecha vampire girl chibi.jpg new file mode 100644 index 0000000000000000000000000000000000000000..661e0f81b7973ce7ed47187b7d038bf6687531f7 Binary files /dev/null and b/assets/40_prompt_images/mecha vampire girl chibi.jpg differ diff --git a/assets/40_prompt_images/military Mech, future, scifi.jpg b/assets/40_prompt_images/military Mech, future, scifi.jpg new file mode 100644 index 0000000000000000000000000000000000000000..d5a709490519af18909bac394244dd9cead33fcf Binary files /dev/null and b/assets/40_prompt_images/military Mech, future, scifi.jpg differ diff --git a/assets/40_prompt_images/motorcycle, scifi, blender.jpeg b/assets/40_prompt_images/motorcycle, scifi, blender.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..5f54a0ede7c417feb56706f2b773aacc54c101b4 Binary files /dev/null and b/assets/40_prompt_images/motorcycle, scifi, blender.jpeg differ diff --git a/assets/40_prompt_images/saber from fate stay night, 3D, girl, anime.jpeg b/assets/40_prompt_images/saber from fate stay night, 3D, girl, anime.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..b449e2c3bc95a18d08d86db8d49d6c4ee2d74434 Binary files /dev/null and b/assets/40_prompt_images/saber from fate stay night, 3D, girl, anime.jpeg differ diff --git a/install.sh b/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..2c54722629fb7d5ce802359113b8632cff1a2aef --- /dev/null +++ b/install.sh @@ -0,0 +1,25 @@ +# Copyright (c) Meta Platforms, Inc. 
and affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + +# This Script Assumes Python 3.8.19, CUDA 12.1. Similar package versions might still work but they are not tested. + +conda deactivate + +# Set environment variables +export ENV_NAME=vfusion3d +export PYTHON_VERSION=3.8.19 +export CUDA_VERSION=12.1 + +# Create a new conda environment and activate it +conda create -n $ENV_NAME python=$PYTHON_VERSION +conda activate $ENV_NAME +conda install pytorch=2.3.0 torchvision==0.18.0 pytorch-cuda=$CUDA_VERSION -c pytorch -c nvidia +pip install transformers +pip install imageio[ffmpeg] +pip install PyMCubes +pip install trimesh +pip install rembg[gpu,cli] +pip install kiui \ No newline at end of file diff --git a/lrm/__init__.py b/lrm/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..75ec8635435eb80f60bbe4cfe48c7c3239b3466e --- /dev/null +++ b/lrm/__init__.py @@ -0,0 +1,5 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. \ No newline at end of file diff --git a/lrm/cam_utils.py b/lrm/cam_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..1f11c7a909dcd2f5afe8c7ff7bffc063f7ffeafd --- /dev/null +++ b/lrm/cam_utils.py @@ -0,0 +1,138 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + + +import torch +import numpy as np +import math + +""" +R: (N, 3, 3) +T: (N, 3) +E: (N, 4, 4) +vector: (N, 3) +""" + + +def compose_extrinsic_R_T(R: torch.Tensor, T: torch.Tensor): + """ + Compose the standard form extrinsic matrix from R and T. + Batched I/O. + """ + RT = torch.cat((R, T.unsqueeze(-1)), dim=-1) + return compose_extrinsic_RT(RT) + + +def compose_extrinsic_RT(RT: torch.Tensor): + """ + Compose the standard form extrinsic matrix from RT. + Batched I/O. + """ + return torch.cat([ + RT, + torch.tensor([[[0, 0, 0, 1]]], dtype=torch.float32).repeat(RT.shape[0], 1, 1).to(RT.device) + ], dim=1) + + +def decompose_extrinsic_R_T(E: torch.Tensor): + """ + Decompose the standard extrinsic matrix into R and T. + Batched I/O. + """ + RT = decompose_extrinsic_RT(E) + return RT[:, :, :3], RT[:, :, 3] + + +def decompose_extrinsic_RT(E: torch.Tensor): + """ + Decompose the standard extrinsic matrix into RT. + Batched I/O. 
+ """ + return E[:, :3, :] + + +def get_normalized_camera_intrinsics(intrinsics: torch.Tensor): + """ + intrinsics: (N, 3, 2), [[fx, fy], [cx, cy], [width, height]] + Return batched fx, fy, cx, cy + """ + fx, fy = intrinsics[:, 0, 0], intrinsics[:, 0, 1] + cx, cy = intrinsics[:, 1, 0], intrinsics[:, 1, 1] + width, height = intrinsics[:, 2, 0], intrinsics[:, 2, 1] + fx, fy = fx / width, fy / height + cx, cy = cx / width, cy / height + return fx, fy, cx, cy + + +def build_camera_principle(RT: torch.Tensor, intrinsics: torch.Tensor): + """ + RT: (N, 3, 4) + intrinsics: (N, 3, 2), [[fx, fy], [cx, cy], [width, height]] + """ + fx, fy, cx, cy = get_normalized_camera_intrinsics(intrinsics) + return torch.cat([ + RT.reshape(-1, 12), + fx.unsqueeze(-1), fy.unsqueeze(-1), cx.unsqueeze(-1), cy.unsqueeze(-1), + ], dim=-1) + + +def build_camera_standard(RT: torch.Tensor, intrinsics: torch.Tensor): + """ + RT: (N, 3, 4) + intrinsics: (N, 3, 2), [[fx, fy], [cx, cy], [width, height]] + """ + E = compose_extrinsic_RT(RT) + fx, fy, cx, cy = get_normalized_camera_intrinsics(intrinsics) + I = torch.stack([ + torch.stack([fx, torch.zeros_like(fx), cx], dim=-1), + torch.stack([torch.zeros_like(fy), fy, cy], dim=-1), + torch.tensor([[0, 0, 1]], dtype=torch.float32, device=RT.device).repeat(RT.shape[0], 1), + ], dim=1) + return torch.cat([ + E.reshape(-1, 16), + I.reshape(-1, 9), + ], dim=-1) + + +def center_looking_at_camera_pose(camera_position: torch.Tensor, look_at: torch.Tensor = None, up_world: torch.Tensor = None): + """ + camera_position: (M, 3) + look_at: (3) + up_world: (3) + return: (M, 3, 4) + """ + # by default, looking at the origin and world up is pos-z + if look_at is None: + look_at = torch.tensor([0, 0, 0], dtype=torch.float32) + if up_world is None: + up_world = torch.tensor([0, 0, 1], dtype=torch.float32) + look_at = look_at.unsqueeze(0).repeat(camera_position.shape[0], 1) + up_world = up_world.unsqueeze(0).repeat(camera_position.shape[0], 1) + + z_axis = camera_position - look_at + z_axis = z_axis / z_axis.norm(dim=-1, keepdim=True) + x_axis = torch.cross(up_world, z_axis) + x_axis = x_axis / x_axis.norm(dim=-1, keepdim=True) + y_axis = torch.cross(z_axis, x_axis) + y_axis = y_axis / y_axis.norm(dim=-1, keepdim=True) + extrinsics = torch.stack([x_axis, y_axis, z_axis, camera_position], dim=-1) + return extrinsics + +def get_surrounding_views(M, radius, elevation): +# convert spherical coordinates (radius, azimuth, elevation) to Cartesian coordinates (x, y, z). 
+ camera_positions = [] + rand_theta= np.random.uniform(0, np.pi/180) + elevation = math.radians(elevation) + for i in range(M): + theta = 2 * math.pi * i / M + rand_theta + x = radius * math.cos(theta) * math.cos(elevation) + y = radius * math.sin(theta) * math.cos(elevation) + z = radius * math.sin(elevation) + camera_positions.append([x, y, z]) + camera_positions = torch.tensor(camera_positions, dtype=torch.float32) + extrinsics = center_looking_at_camera_pose(camera_positions) + + return extrinsics diff --git a/lrm/inferrer.py b/lrm/inferrer.py new file mode 100644 index 0000000000000000000000000000000000000000..b9a0b39ea60b0f33e261feceab2b62ca924508bb --- /dev/null +++ b/lrm/inferrer.py @@ -0,0 +1,232 @@ +import torch +import math +import os +import imageio +import mcubes +import trimesh +import numpy as np +import argparse +from torchvision.utils import save_image +from PIL import Image +import glob +from .models.generator import LRMGenerator # Make sure this import is correct +from .cam_utils import build_camera_principle, build_camera_standard, center_looking_at_camera_pose # Make sure this import is correct +from functools import partial +from rembg import remove, new_session +from kiui.op import recenter +import kiui + +class LRMInferrer: + def __init__(self, model_name: str, resume: str): + print("Initializing LRMInferrer") + self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') + _model_kwargs = {'camera_embed_dim': 1024, 'rendering_samples_per_ray': 128, 'transformer_dim': 1024, 'transformer_layers': 16, 'transformer_heads': 16, 'triplane_low_res': 32, 'triplane_high_res': 64, 'triplane_dim': 80, 'encoder_freeze': False} + + self.model = self._build_model(_model_kwargs).eval().to(self.device) + checkpoint = torch.load(resume, map_location='cpu') + state_dict = checkpoint['model_state_dict'] + self.model.load_state_dict(state_dict) + del checkpoint, state_dict + torch.cuda.empty_cache() + + def __enter__(self): + print("Entering context") + return self + + def __exit__(self, exc_type, exc_val, exc_tb): + print("Exiting context") + if exc_type: + print(f"Exception type: {exc_type}") + print(f"Exception value: {exc_val}") + print(f"Traceback: {exc_tb}") + + def _build_model(self, model_kwargs): + print("Building model") + model = LRMGenerator(**model_kwargs).to(self.device) + print("Loaded model from checkpoint") + return model + + @staticmethod + def get_surrounding_views(M, radius, elevation): + camera_positions = [] + rand_theta = np.random.uniform(0, np.pi/180) + elevation = math.radians(elevation) + for i in range(M): + theta = 2 * math.pi * i / M + rand_theta + x = radius * math.cos(theta) * math.cos(elevation) + y = radius * math.sin(theta) * math.cos(elevation) + z = radius * math.sin(elevation) + camera_positions.append([x, y, z]) + camera_positions = torch.tensor(camera_positions, dtype=torch.float32) + extrinsics = center_looking_at_camera_pose(camera_positions) + return extrinsics + + @staticmethod + def _default_intrinsics(): + fx = fy = 384 + cx = cy = 256 + w = h = 512 + intrinsics = torch.tensor([ + [fx, fy], + [cx, cy], + [w, h], + ], dtype=torch.float32) + return intrinsics + + def _default_source_camera(self, batch_size: int = 1): + dist_to_center = 1.5 + canonical_camera_extrinsics = torch.tensor([[ + [0, 0, 1, 1], + [1, 0, 0, 0], + [0, 1, 0, 0], + ]], dtype=torch.float32) + canonical_camera_intrinsics = self._default_intrinsics().unsqueeze(0) + source_camera = build_camera_principle(canonical_camera_extrinsics, 
canonical_camera_intrinsics) + return source_camera.repeat(batch_size, 1) + + def _default_render_cameras(self, batch_size: int = 1): + render_camera_extrinsics = self.get_surrounding_views(160, 1.5, 0) + render_camera_intrinsics = self._default_intrinsics().unsqueeze(0).repeat(render_camera_extrinsics.shape[0], 1, 1) + render_cameras = build_camera_standard(render_camera_extrinsics, render_camera_intrinsics) + return render_cameras.unsqueeze(0).repeat(batch_size, 1, 1) + + @staticmethod + def images_to_video(images, output_path, fps, verbose=False): + os.makedirs(os.path.dirname(output_path), exist_ok=True) + frames = [] + for i in range(images.shape[0]): + frame = (images[i].permute(1, 2, 0).cpu().numpy() * 255).astype(np.uint8) + assert frame.shape[0] == images.shape[2] and frame.shape[1] == images.shape[3], \ + f"Frame shape mismatch: {frame.shape} vs {images.shape}" + assert frame.min() >= 0 and frame.max() <= 255, \ + f"Frame value out of range: {frame.min()} ~ {frame.max()}" + frames.append(frame) + imageio.mimwrite(output_path, np.stack(frames), fps=fps) + if verbose: + print(f"Saved video to {output_path}") + + def infer_single(self, image: torch.Tensor, render_size: int, mesh_size: int, export_video: bool, export_mesh: bool): + print("infer_single called") + mesh_thres = 1.0 + chunk_size = 2 + batch_size = 1 + + source_camera = self._default_source_camera(batch_size).to(self.device) + render_cameras = self._default_render_cameras(batch_size).to(self.device) + + with torch.no_grad(): + planes = self.model.forward(image, source_camera) + results = {} + + if export_video: + print("Starting export_video") + frames = [] + for i in range(0, render_cameras.shape[1], chunk_size): + print(f"Processing chunk {i} to {i + chunk_size}") + frames.append( + self.model.synthesizer( + planes, + render_cameras[:, i:i+chunk_size], + render_size, + render_size, + 0, + 0 + ) + ) + frames = { + k: torch.cat([r[k] for r in frames], dim=1) + for k in frames[0].keys() + } + results.update({ + 'frames': frames, + }) + print("Finished export_video") + + if export_mesh: + print("Starting export_mesh") + grid_out = self.model.synthesizer.forward_grid( + planes=planes, + grid_size=mesh_size, + ) + vtx, faces = mcubes.marching_cubes(grid_out['sigma'].float().squeeze(0).squeeze(-1).cpu().numpy(), mesh_thres) + vtx = vtx / (mesh_size - 1) * 2 - 1 + vtx_tensor = torch.tensor(vtx, dtype=torch.float32, device=self.device).unsqueeze(0) + vtx_colors = self.model.synthesizer.forward_points(planes, vtx_tensor)['rgb'].float().squeeze(0).cpu().numpy() + vtx_colors = (vtx_colors * 255).astype(np.uint8) + mesh = trimesh.Trimesh(vertices=vtx, faces=faces, vertex_colors=vtx_colors) + results.update({ + 'mesh': mesh, + }) + print("Finished export_mesh") + + return results + + def infer(self, source_image: str, dump_path: str, source_size: int, render_size: int, mesh_size: int, export_video: bool, export_mesh: bool): + print("infer called") + session = new_session("isnet-general-use") + rembg_remove = partial(remove, session=session) + image_name = os.path.basename(source_image) + uid = image_name.split('.')[0] + + image = kiui.read_image(source_image, mode='uint8') + image = rembg_remove(image) + mask = rembg_remove(image, only_mask=True) + image = recenter(image, mask, border_ratio=0.20) + os.makedirs(dump_path, exist_ok=True) + + image = torch.tensor(np.array(image)).permute(2, 0, 1).unsqueeze(0) / 255.0 + if image.shape[1] == 4: + image = image[:, :3, ...] * image[:, 3:, ...] 
+ (1 - image[:, 3:, ...]) + image = torch.nn.functional.interpolate(image, size=(source_size, source_size), mode='bicubic', align_corners=True) + image = torch.clamp(image, 0, 1) + save_image(image, os.path.join(dump_path, f'{uid}.png')) + + results = self.infer_single( + image.cuda(), + render_size=render_size, + mesh_size=mesh_size, + export_video=export_video, + export_mesh=export_mesh, + ) + + if 'frames' in results: + renderings = results['frames'] + for k, v in renderings.items(): + if k == 'images_rgb': + self.images_to_video( + v[0], + os.path.join(dump_path, f'{uid}.mp4'), + fps=40, + ) + print(f"Export video success to {dump_path}") + + if 'mesh' in results: + mesh = results['mesh'] + mesh.export(os.path.join(dump_path, f'{uid}.obj'), 'obj') + +if __name__ == '__main__': + parser = argparse.ArgumentParser() + parser.add_argument('--model_name', type=str, default='lrm-base-obj-v1') + parser.add_argument('--source_path', type=str, default='./assets/cat.png') + parser.add_argument('--dump_path', type=str, default='./results/single_image') + parser.add_argument('--source_size', type=int, default=512) + parser.add_argument('--render_size', type=int, default=384) + parser.add_argument('--mesh_size', type=int, default=512) + parser.add_argument('--export_video', action='store_true') + parser.add_argument('--export_mesh', action='store_true') + parser.add_argument('--resume', type=str, required=True, help='Path to a checkpoint to resume training from') + args = parser.parse_args() + + with LRMInferrer(model_name=args.model_name, resume=args.resume) as inferrer: + with torch.autocast(device_type="cuda", cache_enabled=False, dtype=torch.float32): + print("Start inference for image:", args.source_path) + inferrer.infer( + source_image=args.source_path, + dump_path=args.dump_path, + source_size=args.source_size, + render_size=args.render_size, + mesh_size=args.mesh_size, + export_video=args.export_video, + export_mesh=args.export_mesh, + ) + print("Finished inference for image:", args.source_path) diff --git a/lrm/models/__init__.py b/lrm/models/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..75ec8635435eb80f60bbe4cfe48c7c3239b3466e --- /dev/null +++ b/lrm/models/__init__.py @@ -0,0 +1,5 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. \ No newline at end of file diff --git a/lrm/models/encoders/__init__.py b/lrm/models/encoders/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..75ec8635435eb80f60bbe4cfe48c7c3239b3466e --- /dev/null +++ b/lrm/models/encoders/__init__.py @@ -0,0 +1,5 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. \ No newline at end of file diff --git a/lrm/models/encoders/dino_wrapper2.py b/lrm/models/encoders/dino_wrapper2.py new file mode 100644 index 0000000000000000000000000000000000000000..0930568aaeae919551686c361c1446217f3892e8 --- /dev/null +++ b/lrm/models/encoders/dino_wrapper2.py @@ -0,0 +1,51 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. 
+ + +import torch.nn as nn +from transformers import ViTImageProcessor, ViTModel, AutoImageProcessor, AutoModel, Dinov2Model + +class DinoWrapper(nn.Module): + """ + Dino v1 wrapper using huggingface transformer implementation. + """ + def __init__(self, model_name: str, freeze: bool = True): + super().__init__() + self.model, self.processor = self._build_dino(model_name) + if freeze: + self._freeze() + + def forward(self, image): + # image: [N, C, H, W], on cpu + # RGB image with [0,1] scale and properly sized + inputs = self.processor(images=image.float(), return_tensors="pt", do_rescale=False, do_resize=False).to(self.model.device) + # This resampling of positional embedding uses bicubic interpolation + outputs = self.model(**inputs) + last_hidden_states = outputs.last_hidden_state + return last_hidden_states + + def _freeze(self): + print(f"======== Freezing DinoWrapper ========") + self.model.eval() + for name, param in self.model.named_parameters(): + param.requires_grad = False + + @staticmethod + def _build_dino(model_name: str, proxy_error_retries: int = 3, proxy_error_cooldown: int = 5): + import requests + try: + processor = AutoImageProcessor.from_pretrained('facebook/dinov2-base') + processor.do_center_crop = False + model = AutoModel.from_pretrained('facebook/dinov2-base') + return model, processor + except requests.exceptions.ProxyError as err: + if proxy_error_retries > 0: + print(f"Huggingface ProxyError: Retrying in {proxy_error_cooldown} seconds...") + import time + time.sleep(proxy_error_cooldown) + return DinoWrapper._build_dino(model_name, proxy_error_retries - 1, proxy_error_cooldown) + else: + raise err diff --git a/lrm/models/generator.py b/lrm/models/generator.py new file mode 100644 index 0000000000000000000000000000000000000000..e2bafb574a05ca5f380e8b509fd915faddd40607 --- /dev/null +++ b/lrm/models/generator.py @@ -0,0 +1,87 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + + +import torch.nn as nn + +from .encoders.dino_wrapper2 import DinoWrapper +from .transformer import TriplaneTransformer +from .rendering.synthesizer_part import TriplaneSynthesizer + + +class CameraEmbedder(nn.Module): + """ + Embed camera features to a high-dimensional vector. + + Reference: + DiT: https://github.com/facebookresearch/DiT/blob/main/models.py#L27 + """ + def __init__(self, raw_dim: int, embed_dim: int): + super().__init__() + self.mlp = nn.Sequential( + nn.Linear(raw_dim, embed_dim), + nn.SiLU(), + nn.Linear(embed_dim, embed_dim), + ) + + def forward(self, x): + return self.mlp(x) + + +class LRMGenerator(nn.Module): + """ + Full model of the large reconstruction model. 
+ """ + def __init__(self, camera_embed_dim: int, rendering_samples_per_ray: int, + transformer_dim: int, transformer_layers: int, transformer_heads: int, + triplane_low_res: int, triplane_high_res: int, triplane_dim: int, + encoder_freeze: bool = True, encoder_model_name: str = 'facebook/dinov2-base', encoder_feat_dim: int = 768): + super().__init__() + + # attributes + self.encoder_feat_dim = encoder_feat_dim + self.camera_embed_dim = camera_embed_dim + + # modules + self.encoder = DinoWrapper( + model_name=encoder_model_name, + freeze=encoder_freeze, + ) + self.camera_embedder = CameraEmbedder( + raw_dim=12+4, embed_dim=camera_embed_dim, + ) + self.transformer = TriplaneTransformer( + inner_dim=transformer_dim, num_layers=transformer_layers, num_heads=transformer_heads, + image_feat_dim=encoder_feat_dim, + camera_embed_dim=camera_embed_dim, + triplane_low_res=triplane_low_res, triplane_high_res=triplane_high_res, triplane_dim=triplane_dim, + ) + self.synthesizer = TriplaneSynthesizer( + triplane_dim=triplane_dim, samples_per_ray=rendering_samples_per_ray, + ) + + def forward(self, image, camera): + # image: [N, C_img, H_img, W_img] + # camera: [N, D_cam_raw] + assert image.shape[0] == camera.shape[0], "Batch size mismatch for image and camera" + N = image.shape[0] + + # encode image + image_feats = self.encoder(image) + assert image_feats.shape[-1] == self.encoder_feat_dim, \ + f"Feature dimension mismatch: {image_feats.shape[-1]} vs {self.encoder_feat_dim}" + + # embed camera + camera_embeddings = self.camera_embedder(camera) + assert camera_embeddings.shape[-1] == self.camera_embed_dim, \ + f"Feature dimension mismatch: {camera_embeddings.shape[-1]} vs {self.camera_embed_dim}" + + # transformer generating planes + planes = self.transformer(image_feats, camera_embeddings) + assert planes.shape[0] == N, "Batch size mismatch for planes" + assert planes.shape[1] == 3, "Planes should have 3 channels" + return planes + diff --git a/lrm/models/rendering/__init__.py b/lrm/models/rendering/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..75ec8635435eb80f60bbe4cfe48c7c3239b3466e --- /dev/null +++ b/lrm/models/rendering/__init__.py @@ -0,0 +1,5 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. \ No newline at end of file diff --git a/lrm/models/rendering/synthesizer_part.py b/lrm/models/rendering/synthesizer_part.py new file mode 100644 index 0000000000000000000000000000000000000000..96f1c9b10e7304a09b919d8baee72beb8d1f70c1 --- /dev/null +++ b/lrm/models/rendering/synthesizer_part.py @@ -0,0 +1,194 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + + +import itertools +import torch +import torch.nn as nn + +from .utils.renderer import ImportanceRenderer +from .utils.ray_sampler_part import RaySampler + + +class OSGDecoder(nn.Module): + """ + Triplane decoder that gives RGB and sigma values from sampled features. + Using ReLU here instead of Softplus in the original implementation. 
+ + Reference: + EG3D: https://github.com/NVlabs/eg3d/blob/main/eg3d/training/triplane.py#L112 + """ + def __init__(self, n_features: int, + hidden_dim: int = 64, num_layers: int = 4, activation: nn.Module = nn.ReLU): + super().__init__() + self.net = nn.Sequential( + nn.Linear(3 * n_features, hidden_dim), + activation(), + *itertools.chain(*[[ + nn.Linear(hidden_dim, hidden_dim), + activation(), + ] for _ in range(num_layers - 2)]), + nn.Linear(hidden_dim, 1 + 3), + ) + # init all bias to zero + for m in self.modules(): + if isinstance(m, nn.Linear): + nn.init.zeros_(m.bias) + + def forward(self, sampled_features, ray_directions): + # Aggregate features by mean + # sampled_features = sampled_features.mean(1) + # Aggregate features by concatenation + _N, n_planes, _M, _C = sampled_features.shape + sampled_features = sampled_features.permute(0, 2, 1, 3).reshape(_N, _M, n_planes*_C) + x = sampled_features + + N, M, C = x.shape + x = x.contiguous().view(N*M, C) + + x = self.net(x) + x = x.view(N, M, -1) + rgb = torch.sigmoid(x[..., 1:])*(1 + 2*0.001) - 0.001 # Uses sigmoid clamping from MipNeRF + sigma = x[..., 0:1] + + return {'rgb': rgb, 'sigma': sigma} + + +class TriplaneSynthesizer(nn.Module): + """ + Synthesizer that renders a triplane volume with planes and a camera. + + Reference: + EG3D: https://github.com/NVlabs/eg3d/blob/main/eg3d/training/triplane.py#L19 + """ + + DEFAULT_RENDERING_KWARGS = { + 'ray_start': 'auto', + 'ray_end': 'auto', + 'box_warp': 2., + 'white_back': True, + 'disparity_space_sampling': False, + 'clamp_mode': 'softplus', + 'sampler_bbox_min': -1., + 'sampler_bbox_max': 1., + } + + def __init__(self, triplane_dim: int, samples_per_ray: int): + super().__init__() + + # attributes + self.triplane_dim = triplane_dim + self.rendering_kwargs = { + **self.DEFAULT_RENDERING_KWARGS, + 'depth_resolution': samples_per_ray // 2, + 'depth_resolution_importance': samples_per_ray // 2, + } + + # renderings + self.renderer = ImportanceRenderer() + self.ray_sampler = RaySampler() + + # modules + self.decoder = OSGDecoder(n_features=triplane_dim) + + def forward(self, planes, cameras, render_size: int, crop_size: int, start_x: int, start_y:int): + # planes: (N, 3, D', H', W') + # cameras: (N, M, D_cam) + # render_size: int + assert planes.shape[0] == cameras.shape[0], "Batch size mismatch for planes and cameras" + N, M = cameras.shape[:2] + cam2world_matrix = cameras[..., :16].view(N, M, 4, 4) + intrinsics = cameras[..., 16:25].view(N, M, 3, 3) + + # Create a batch of rays for volume rendering + ray_origins, ray_directions = self.ray_sampler( + cam2world_matrix=cam2world_matrix.reshape(-1, 4, 4), + intrinsics=intrinsics.reshape(-1, 3, 3), + render_size=render_size, + crop_size = crop_size, + start_x = start_x, + start_y = start_y + ) + assert N*M == ray_origins.shape[0], "Batch size mismatch for ray_origins" + assert ray_origins.dim() == 3, "ray_origins should be 3-dimensional" + # Perform volume rendering + rgb_samples, depth_samples, weights_samples = self.renderer( + planes.repeat_interleave(M, dim=0), self.decoder, ray_origins, ray_directions, self.rendering_kwargs, + ) + + # Reshape into 'raw' neural-rendered image + Himg = Wimg = crop_size + rgb_images = rgb_samples.permute(0, 2, 1).reshape(N, M, rgb_samples.shape[-1], Himg, Wimg).contiguous() + depth_images = depth_samples.permute(0, 2, 1).reshape(N, M, 1, Himg, Wimg) + weight_images = weights_samples.permute(0, 2, 1).reshape(N, M, 1, Himg, Wimg) + + return { + 'images_rgb': rgb_images, + 'images_depth': depth_images, + 
'images_weight': weight_images, + } + + def forward_grid(self, planes, grid_size: int, aabb: torch.Tensor = None): + # planes: (N, 3, D', H', W') + # grid_size: int + # aabb: (N, 2, 3) + if aabb is None: + aabb = torch.tensor([ + [self.rendering_kwargs['sampler_bbox_min']] * 3, + [self.rendering_kwargs['sampler_bbox_max']] * 3, + ], device=planes.device, dtype=planes.dtype).unsqueeze(0).repeat(planes.shape[0], 1, 1) + assert planes.shape[0] == aabb.shape[0], "Batch size mismatch for planes and aabb" + N = planes.shape[0] + + # create grid points for triplane query + grid_points = [] + for i in range(N): + grid_points.append(torch.stack(torch.meshgrid( + torch.linspace(aabb[i, 0, 0], aabb[i, 1, 0], grid_size, device=planes.device), + torch.linspace(aabb[i, 0, 1], aabb[i, 1, 1], grid_size, device=planes.device), + torch.linspace(aabb[i, 0, 2], aabb[i, 1, 2], grid_size, device=planes.device), + indexing='ij', + ), dim=-1).reshape(-1, 3)) + cube_grid = torch.stack(grid_points, dim=0).to(planes.device) + + features = self.forward_points(planes, cube_grid) + + # reshape into grid + features = { + k: v.reshape(N, grid_size, grid_size, grid_size, -1) + for k, v in features.items() + } + return features + + def forward_points(self, planes, points: torch.Tensor, chunk_size: int = 2**20): + # planes: (N, 3, D', H', W') + # points: (N, P, 3) + N, P = points.shape[:2] + + # query triplane in chunks + outs = [] + for i in range(0, points.shape[1], chunk_size): + chunk_points = points[:, i:i+chunk_size] + + # query triplane + chunk_out = self.renderer.run_model_activated( + planes=planes, + decoder=self.decoder, + sample_coordinates=chunk_points, + sample_directions=torch.zeros_like(chunk_points), + options=self.rendering_kwargs, + ) + outs.append(chunk_out) + + # concatenate the outputs + point_features = { + k: torch.cat([out[k] for out in outs], dim=1) + for k in outs[0].keys() + } + + sig = point_features['sigma'] + print(sig.mean(), sig.max(), sig.min()) + return point_features diff --git a/lrm/models/rendering/utils/__init__.py b/lrm/models/rendering/utils/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..b433b3b1000654825bbfca2b174b3f04bfedacda --- /dev/null +++ b/lrm/models/rendering/utils/__init__.py @@ -0,0 +1,14 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. +# SPDX-FileCopyrightText: Copyright (c) 2021-2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: LicenseRef-NvidiaProprietary +# +# NVIDIA CORPORATION, its affiliates and licensors retain all intellectual +# property and proprietary rights in and to this material, related +# documentation and any modifications thereto. Any use, reproduction, +# disclosure or distribution of this material and related documentation +# without an express license agreement from NVIDIA CORPORATION or +# its affiliates is strictly prohibited. 
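Besides image rendering, the synthesizer above exposes two query paths: forward_grid evaluates the triplane on a dense voxel grid over the [-1, 1]^3 sampler bounding box, and forward_points evaluates arbitrary 3D points in chunks. A minimal sketch of how they combine into colored-mesh extraction, mirroring the steps LRMInferrer.infer_single performs earlier in this diff; `model` and `planes` are assumed to come from a forward pass of the generator, and the default sizes match the script's defaults:

# Illustrative sketch, not part of the diff: mesh extraction from a triplane.
import mcubes
import numpy as np
import torch
import trimesh

def triplane_to_mesh(model, planes, mesh_size: int = 512, mesh_thres: float = 1.0):
    with torch.no_grad():
        # dense sigma grid over the [-1, 1]^3 sampler bounding box
        grid_out = model.synthesizer.forward_grid(planes=planes, grid_size=mesh_size)
        sigma = grid_out['sigma'].float().squeeze(0).squeeze(-1).cpu().numpy()
        vtx, faces = mcubes.marching_cubes(sigma, mesh_thres)
        vtx = vtx / (mesh_size - 1) * 2 - 1  # voxel indices -> [-1, 1] coordinates
        vtx_tensor = torch.tensor(vtx, dtype=torch.float32, device=planes.device).unsqueeze(0)
        # query per-vertex colors from the same triplane (chunked internally by forward_points)
        rgb = model.synthesizer.forward_points(planes, vtx_tensor)['rgb']
        vtx_colors = (rgb.float().squeeze(0).cpu().numpy() * 255).astype(np.uint8)
    return trimesh.Trimesh(vertices=vtx, faces=faces, vertex_colors=vtx_colors)

The chunking inside forward_points (default 2**20 points per chunk) keeps peak memory bounded when the marching-cubes vertex count is large.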
diff --git a/lrm/models/rendering/utils/__pycache__/__init__.cpython-310.pyc b/lrm/models/rendering/utils/__pycache__/__init__.cpython-310.pyc new file mode 100644 index 0000000000000000000000000000000000000000..dafbaa5915ad722f43ccf61d08dafde1813645b1 Binary files /dev/null and b/lrm/models/rendering/utils/__pycache__/__init__.cpython-310.pyc differ diff --git a/lrm/models/rendering/utils/__pycache__/math_utils.cpython-310.pyc b/lrm/models/rendering/utils/__pycache__/math_utils.cpython-310.pyc new file mode 100644 index 0000000000000000000000000000000000000000..56f52be43ba55fe9c3114f19138b743b3b634f0e Binary files /dev/null and b/lrm/models/rendering/utils/__pycache__/math_utils.cpython-310.pyc differ diff --git a/lrm/models/rendering/utils/__pycache__/ray_marcher.cpython-310.pyc b/lrm/models/rendering/utils/__pycache__/ray_marcher.cpython-310.pyc new file mode 100644 index 0000000000000000000000000000000000000000..f9d185a23b791933262b745305713bf8fc9550f4 Binary files /dev/null and b/lrm/models/rendering/utils/__pycache__/ray_marcher.cpython-310.pyc differ diff --git a/lrm/models/rendering/utils/__pycache__/ray_sampler_part.cpython-310.pyc b/lrm/models/rendering/utils/__pycache__/ray_sampler_part.cpython-310.pyc new file mode 100644 index 0000000000000000000000000000000000000000..254a8f8ce8274d2f7c9e796f52a49d942596631f Binary files /dev/null and b/lrm/models/rendering/utils/__pycache__/ray_sampler_part.cpython-310.pyc differ diff --git a/lrm/models/rendering/utils/__pycache__/renderer.cpython-310.pyc b/lrm/models/rendering/utils/__pycache__/renderer.cpython-310.pyc new file mode 100644 index 0000000000000000000000000000000000000000..6188a0528e0513db931db9f27d8999a13ece3078 Binary files /dev/null and b/lrm/models/rendering/utils/__pycache__/renderer.cpython-310.pyc differ diff --git a/lrm/models/rendering/utils/math_utils.py b/lrm/models/rendering/utils/math_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..94c00e0cc6ff65817c71a336c0cdb694b0ff5b39 --- /dev/null +++ b/lrm/models/rendering/utils/math_utils.py @@ -0,0 +1,123 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. +# MIT License + +# Copyright (c) 2022 Petr Kellnhofer + +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: + +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. + +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. 
+ +import torch + +def transform_vectors(matrix: torch.Tensor, vectors4: torch.Tensor) -> torch.Tensor: + """ + Left-multiplies MxM @ NxM. Returns NxM. + """ + res = torch.matmul(vectors4, matrix.T) + return res + + +def normalize_vecs(vectors: torch.Tensor) -> torch.Tensor: + """ + Normalize vector lengths. + """ + return vectors / (torch.norm(vectors, dim=-1, keepdim=True)) + +def torch_dot(x: torch.Tensor, y: torch.Tensor): + """ + Dot product of two tensors. + """ + return (x * y).sum(-1) + + +def get_ray_limits_box(rays_o: torch.Tensor, rays_d: torch.Tensor, box_side_length): + """ + Author: Petr Kellnhofer + Intersects rays with the [-1, 1] NDC volume. + Returns min and max distance of entry. + Returns -1 for no intersection. + https://www.scratchapixel.com/lessons/3d-basic-rendering/minimal-ray-tracer-rendering-simple-shapes/ray-box-intersection + """ + o_shape = rays_o.shape + rays_o = rays_o.detach().reshape(-1, 3) + rays_d = rays_d.detach().reshape(-1, 3) + + + bb_min = [-1*(box_side_length/2), -1*(box_side_length/2), -1*(box_side_length/2)] + bb_max = [1*(box_side_length/2), 1*(box_side_length/2), 1*(box_side_length/2)] + bounds = torch.tensor([bb_min, bb_max], dtype=rays_o.dtype, device=rays_o.device) + is_valid = torch.ones(rays_o.shape[:-1], dtype=bool, device=rays_o.device) + + # Precompute inverse for stability. + invdir = 1 / rays_d + sign = (invdir < 0).long() + + # Intersect with YZ plane. + tmin = (bounds.index_select(0, sign[..., 0])[..., 0] - rays_o[..., 0]) * invdir[..., 0] + tmax = (bounds.index_select(0, 1 - sign[..., 0])[..., 0] - rays_o[..., 0]) * invdir[..., 0] + + # Intersect with XZ plane. + tymin = (bounds.index_select(0, sign[..., 1])[..., 1] - rays_o[..., 1]) * invdir[..., 1] + tymax = (bounds.index_select(0, 1 - sign[..., 1])[..., 1] - rays_o[..., 1]) * invdir[..., 1] + + # Resolve parallel rays. + is_valid[torch.logical_or(tmin > tymax, tymin > tmax)] = False + + # Use the shortest intersection. + tmin = torch.max(tmin, tymin) + tmax = torch.min(tmax, tymax) + + # Intersect with XY plane. + tzmin = (bounds.index_select(0, sign[..., 2])[..., 2] - rays_o[..., 2]) * invdir[..., 2] + tzmax = (bounds.index_select(0, 1 - sign[..., 2])[..., 2] - rays_o[..., 2]) * invdir[..., 2] + + # Resolve parallel rays. + is_valid[torch.logical_or(tmin > tzmax, tzmin > tmax)] = False + + # Use the shortest intersection. + tmin = torch.max(tmin, tzmin) + tmax = torch.min(tmax, tzmax) + + # Mark invalid. + tmin[torch.logical_not(is_valid)] = -1 + tmax[torch.logical_not(is_valid)] = -2 + + return tmin.reshape(*o_shape[:-1], 1), tmax.reshape(*o_shape[:-1], 1) + + +def linspace(start: torch.Tensor, stop: torch.Tensor, num: int): + """ + Creates a tensor of shape [num, *start.shape] whose values are evenly spaced from start to end, inclusive. + Replicates but the multi-dimensional bahaviour of numpy.linspace in PyTorch. 
+ """ + # create a tensor of 'num' steps from 0 to 1 + steps = torch.arange(num, dtype=torch.float32, device=start.device) / (num - 1) + + # reshape the 'steps' tensor to [-1, *([1]*start.ndim)] to allow for broadcastings + # - using 'steps.reshape([-1, *([1]*start.ndim)])' would be nice here but torchscript + # "cannot statically infer the expected size of a list in this contex", hence the code below + for i in range(start.ndim): + steps = steps.unsqueeze(-1) + + # the output starts at 'start' and increments until 'stop' in each dimension + out = start[None] + steps * (stop - start)[None] + + return out diff --git a/lrm/models/rendering/utils/ray_marcher.py b/lrm/models/rendering/utils/ray_marcher.py new file mode 100644 index 0000000000000000000000000000000000000000..31cb6300b7f5dea91317524e54301b3f167f3311 --- /dev/null +++ b/lrm/models/rendering/utils/ray_marcher.py @@ -0,0 +1,73 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. +# SPDX-FileCopyrightText: Copyright (c) 2021-2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: LicenseRef-NvidiaProprietary +# +# NVIDIA CORPORATION, its affiliates and licensors retain all intellectual +# property and proprietary rights in and to this material, related +# documentation and any modifications thereto. Any use, reproduction, +# disclosure or distribution of this material and related documentation +# without an express license agreement from NVIDIA CORPORATION or +# its affiliates is strictly prohibited. +# +# Modified by Zexin He +# The modifications are subject to the same license as the original. + + +""" +The ray marcher takes the raw output of the implicit representation and uses the volume rendering equation to produce composited colors and depths. +Based off of the implementation in MipNeRF (this one doesn't do any cone tracing though!) 
+""" + +import torch +import torch.nn as nn + + +class MipRayMarcher2(nn.Module): + def __init__(self, activation_factory): + super().__init__() + self.activation_factory = activation_factory + + def run_forward(self, colors, densities, depths, rendering_options): + + deltas = depths[:, :, 1:] - depths[:, :, :-1] + colors_mid = (colors[:, :, :-1] + colors[:, :, 1:]) / 2 + densities_mid = (densities[:, :, :-1] + densities[:, :, 1:]) / 2 + depths_mid = (depths[:, :, :-1] + depths[:, :, 1:]) / 2 + + + + # using factory mode for better usability + densities_mid = self.activation_factory(rendering_options)(densities_mid) + + density_delta = densities_mid * deltas + + alpha = 1 - torch.exp(-density_delta) + + alpha_shifted = torch.cat([torch.ones_like(alpha[:, :, :1]), 1-alpha + 1e-10], -2) + weights = alpha * torch.cumprod(alpha_shifted, -2)[:, :, :-1] + + composite_rgb = torch.sum(weights * colors_mid, -2) + weight_total = weights.sum(2) + composite_depth = torch.sum(weights * depths_mid, -2) / weight_total + + # clip the composite to min/max range of depths + composite_depth = torch.nan_to_num(composite_depth, float('inf')) + composite_depth = torch.clamp(composite_depth, torch.min(depths), torch.max(depths)) + + if rendering_options.get('white_back', False): + composite_rgb = composite_rgb + 1 - weight_total + + # rendered value scale is 0-1, comment out original mipnerf scaling + # composite_rgb = composite_rgb * 2 - 1 # Scale to (-1, 1) + + return composite_rgb, composite_depth, weights + + + def forward(self, colors, densities, depths, rendering_options): + composite_rgb, composite_depth, weights = self.run_forward(colors, densities, depths, rendering_options) + + return composite_rgb, composite_depth, weights diff --git a/lrm/models/rendering/utils/ray_sampler_part.py b/lrm/models/rendering/utils/ray_sampler_part.py new file mode 100644 index 0000000000000000000000000000000000000000..2ba8a9676ac41ae79bb0aaae4e9988fe8c37b547 --- /dev/null +++ b/lrm/models/rendering/utils/ray_sampler_part.py @@ -0,0 +1,94 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. +# SPDX-FileCopyrightText: Copyright (c) 2021-2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: LicenseRef-NvidiaProprietary +# +# NVIDIA CORPORATION, its affiliates and licensors retain all intellectual +# property and proprietary rights in and to this material, related +# documentation and any modifications thereto. Any use, reproduction, +# disclosure or distribution of this material and related documentation +# without an express license agreement from NVIDIA CORPORATION or +# its affiliates is strictly prohibited. +# +# Modified by Zexin He +# The modifications are subject to the same license as the original. + + +""" +The ray sampler is a module that takes in camera matrices and resolution and batches of rays. +Expects cam2world matrices that use the OpenCV camera coordinate system conventions. +""" + +import torch + +class RaySampler(torch.nn.Module): + def __init__(self): + super().__init__() + self.ray_origins_h, self.ray_directions, self.depths, self.image_coords, self.rendering_options = None, None, None, None, None + + + def forward(self, cam2world_matrix, intrinsics, render_size, crop_size, start_x, start_y): + """ + Create batches of rays and return origins and directions. 
+ + cam2world_matrix: (N, 4, 4) + intrinsics: (N, 3, 3) + render_size: int + + ray_origins: (N, M, 3) + ray_dirs: (N, M, 2) + """ + + N, M = cam2world_matrix.shape[0], crop_size**2 + cam_locs_world = cam2world_matrix[:, :3, 3] + fx = intrinsics[:, 0, 0] + fy = intrinsics[:, 1, 1] + cx = intrinsics[:, 0, 2] + cy = intrinsics[:, 1, 2] + sk = intrinsics[:, 0, 1] + + uv = torch.stack(torch.meshgrid( + torch.arange(render_size, dtype=torch.float32, device=cam2world_matrix.device), + torch.arange(render_size, dtype=torch.float32, device=cam2world_matrix.device), + indexing='ij', + )) + if crop_size < render_size: + patch_uv = [] + for i in range(cam2world_matrix.shape[0]): + patch_uv.append(uv.clone()[None, :, start_y:start_y+crop_size, start_x:start_x+crop_size]) + uv = torch.cat(patch_uv, 0) + uv = uv.flip(1).reshape(cam2world_matrix.shape[0], 2, -1).transpose(2, 1) + else: + uv = uv.flip(0).reshape(2, -1).transpose(1, 0) + uv = uv.unsqueeze(0).repeat(cam2world_matrix.shape[0], 1, 1) + # uv = uv.unsqueeze(0).repeat(cam2world_matrix.shape[0], 1, 1) + # uv = uv.flip(1).reshape(cam2world_matrix.shape[0], 2, -1).transpose(2, 1) + x_cam = uv[:, :, 0].view(N, -1) * (1./render_size) + (0.5/render_size) + y_cam = uv[:, :, 1].view(N, -1) * (1./render_size) + (0.5/render_size) + z_cam = torch.ones((N, M), device=cam2world_matrix.device) + + x_lift = (x_cam - cx.unsqueeze(-1) + cy.unsqueeze(-1)*sk.unsqueeze(-1)/fy.unsqueeze(-1) - sk.unsqueeze(-1)*y_cam/fy.unsqueeze(-1)) / fx.unsqueeze(-1) * z_cam + y_lift = (y_cam - cy.unsqueeze(-1)) / fy.unsqueeze(-1) * z_cam + + cam_rel_points = torch.stack((x_lift, y_lift, z_cam, torch.ones_like(z_cam)), dim=-1).float() + + _opencv2blender = torch.tensor([ + [1, 0, 0, 0], + [0, -1, 0, 0], + [0, 0, -1, 0], + [0, 0, 0, 1], + ], dtype=torch.float32, device=cam2world_matrix.device).unsqueeze(0).repeat(N, 1, 1) + + # added float here + cam2world_matrix = torch.bmm(cam2world_matrix.float(), _opencv2blender.float()) + + world_rel_points = torch.bmm(cam2world_matrix.float(), cam_rel_points.permute(0, 2, 1)).permute(0, 2, 1)[:, :, :3] + + ray_dirs = world_rel_points - cam_locs_world[:, None, :] + ray_dirs = torch.nn.functional.normalize(ray_dirs, dim=2) + + ray_origins = cam_locs_world.unsqueeze(1).repeat(1, ray_dirs.shape[1], 1) + return ray_origins, ray_dirs \ No newline at end of file diff --git a/lrm/models/rendering/utils/renderer.py b/lrm/models/rendering/utils/renderer.py new file mode 100644 index 0000000000000000000000000000000000000000..0606e9273d4f26f02fa47e5735da58b4b5d25d9b --- /dev/null +++ b/lrm/models/rendering/utils/renderer.py @@ -0,0 +1,314 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. +# SPDX-FileCopyrightText: Copyright (c) 2021-2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: LicenseRef-NvidiaProprietary +# +# NVIDIA CORPORATION, its affiliates and licensors retain all intellectual +# property and proprietary rights in and to this material, related +# documentation and any modifications thereto. Any use, reproduction, +# disclosure or distribution of this material and related documentation +# without an express license agreement from NVIDIA CORPORATION or +# its affiliates is strictly prohibited. +# +# Modified by Zexin He +# The modifications are subject to the same license as the original. 
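The ray sampler above turns an OpenCV-convention cam2world matrix and normalized pinhole intrinsics into per-pixel ray origins and directions, optionally restricted to a crop window. A small sketch of calling it directly; the pose and intrinsics values are placeholders, since in the pipeline they come from the 25-D camera vector (16 cam2world entries plus 9 intrinsics entries) split by the synthesizer:

# Illustrative sketch, not part of the diff: driving RaySampler directly.
import torch
from lrm.models.rendering.utils.ray_sampler_part import RaySampler

sampler = RaySampler()
cam2world = torch.eye(4).unsqueeze(0)           # (1, 4, 4), OpenCV convention (placeholder pose)
cam2world[0, 2, 3] = -2.0                       # camera translated along -z (placeholder)
intrinsics = torch.tensor([[[0.75, 0.0, 0.5],
                            [0.0, 0.75, 0.5],
                            [0.0, 0.0, 1.0]]])  # normalized pinhole intrinsics (assumed values)
origins, dirs = sampler(cam2world, intrinsics,
                        render_size=64, crop_size=64, start_x=0, start_y=0)
print(origins.shape, dirs.shape)                # both (1, 64*64, 3)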
+ + +""" +The renderer is a module that takes in rays, decides where to sample along each +ray, and computes pixel colors using the volume rendering equation. +""" + +import torch +import torch.nn as nn +import torch.nn.functional as F + +from .ray_marcher import MipRayMarcher2 +from . import math_utils + +def generate_planes(): + """ + Defines planes by the three vectors that form the "axes" of the + plane. Should work with arbitrary number of planes and planes of + arbitrary orientation. + + Bugfix reference: https://github.com/NVlabs/eg3d/issues/67 + """ + return torch.tensor([[[1, 0, 0], + [0, 1, 0], + [0, 0, 1]], + [[1, 0, 0], + [0, 0, 1], + [0, 1, 0]], + [[0, 0, 1], + [0, 1, 0], + [1, 0, 0]]], dtype=torch.float32) + +def project_onto_planes(planes, coordinates): + """ + Does a projection of a 3D point onto a batch of 2D planes, + returning 2D plane coordinates. + + Takes plane axes of shape n_planes, 3, 3 + # Takes coordinates of shape N, M, 3 + # returns projections of shape N*n_planes, M, 2 + """ + N, M, C = coordinates.shape + n_planes, _, _ = planes.shape + coordinates = coordinates.unsqueeze(1).expand(-1, n_planes, -1, -1).reshape(N*n_planes, M, 3) + inv_planes = torch.linalg.inv(planes).unsqueeze(0).expand(N, -1, -1, -1).reshape(N*n_planes, 3, 3) + coordinates = coordinates.to(inv_planes.device) + projections = torch.bmm(coordinates, inv_planes) + return projections[..., :2] + +def sample_from_planes(plane_axes, plane_features, coordinates, mode='bilinear', padding_mode='zeros', box_warp=None): + assert padding_mode == 'zeros' + N, n_planes, C, H, W = plane_features.shape + _, M, _ = coordinates.shape + plane_features = plane_features.view(N*n_planes, C, H, W) + + coordinates = (2/box_warp) * coordinates # add specific box bounds + # half added here + projected_coordinates = project_onto_planes(plane_axes, coordinates).unsqueeze(1) + # removed float from projected_coordinates + output_features = torch.nn.functional.grid_sample(plane_features.float(), projected_coordinates.float(), mode=mode, padding_mode=padding_mode, align_corners=False).permute(0, 3, 2, 1).reshape(N, n_planes, M, C) + return output_features + +def sample_from_3dgrid(grid, coordinates): + """ + Expects coordinates in shape (batch_size, num_points_per_batch, 3) + Expects grid in shape (1, channels, H, W, D) + (Also works if grid has batch size) + Returns sampled features of shape (batch_size, num_points_per_batch, feature_channels) + """ + batch_size, n_coords, n_dims = coordinates.shape + sampled_features = torch.nn.functional.grid_sample(grid.expand(batch_size, -1, -1, -1, -1), + coordinates.reshape(batch_size, 1, 1, -1, n_dims), + mode='bilinear', padding_mode='zeros', align_corners=False) + N, C, H, W, D = sampled_features.shape + sampled_features = sampled_features.permute(0, 4, 3, 2, 1).reshape(N, H*W*D, C) + return sampled_features + +class ImportanceRenderer(torch.nn.Module): + """ + Modified original version to filter out-of-box samples as TensoRF does. 
+ + Reference: + TensoRF: https://github.com/apchenstu/TensoRF/blob/main/models/tensorBase.py#L277 + """ + def __init__(self): + super().__init__() + self.activation_factory = self._build_activation_factory() + self.ray_marcher = MipRayMarcher2(self.activation_factory) + self.plane_axes = generate_planes() + + def _build_activation_factory(self): + def activation_factory(options: dict): + if options['clamp_mode'] == 'softplus': + return lambda x: F.softplus(x - 1) # activation bias of -1 makes things initialize better + else: + assert False, "Renderer only supports `clamp_mode`=`softplus`!" + return activation_factory + + def _forward_pass(self, depths: torch.Tensor, ray_directions: torch.Tensor, ray_origins: torch.Tensor, + planes: torch.Tensor, decoder: nn.Module, rendering_options: dict): + """ + Additional filtering is applied to filter out-of-box samples. + Modifications made by Zexin He. + """ + + # context related variables + batch_size, num_rays, samples_per_ray, _ = depths.shape + device = planes.device + depths = depths.to(device) + ray_directions = ray_directions.to(device) + ray_origins = ray_origins.to(device) + # define sample points with depths + sample_directions = ray_directions.unsqueeze(-2).expand(-1, -1, samples_per_ray, -1).reshape(batch_size, -1, 3) + sample_coordinates = (ray_origins.unsqueeze(-2) + depths * ray_directions.unsqueeze(-2)).reshape(batch_size, -1, 3) + + # filter out-of-box samples + mask_inbox = \ + (rendering_options['sampler_bbox_min'] <= sample_coordinates) & \ + (sample_coordinates <= rendering_options['sampler_bbox_max']) + mask_inbox = mask_inbox.all(-1) + + # forward model according to all samples + _out = self.run_model(planes, decoder, sample_coordinates, sample_directions, rendering_options) + + # set out-of-box samples to zeros(rgb) & -inf(sigma) + SAFE_GUARD = 3 + DATA_TYPE = _out['sigma'].dtype + colors_pass = torch.zeros(batch_size, num_rays * samples_per_ray, 3, device=device, dtype=DATA_TYPE) + densities_pass = torch.nan_to_num(torch.full((batch_size, num_rays * samples_per_ray, 1), -float('inf'), device=device, dtype=DATA_TYPE)) / SAFE_GUARD + colors_pass[mask_inbox], densities_pass[mask_inbox] = _out['rgb'][mask_inbox], _out['sigma'][mask_inbox] + + # reshape back + colors_pass = colors_pass.reshape(batch_size, num_rays, samples_per_ray, colors_pass.shape[-1]) + densities_pass = densities_pass.reshape(batch_size, num_rays, samples_per_ray, densities_pass.shape[-1]) + + return colors_pass, densities_pass + + def forward(self, planes, decoder, ray_origins, ray_directions, rendering_options): + # self.plane_axes = self.plane_axes.to(ray_origins.device) + + if rendering_options['ray_start'] == rendering_options['ray_end'] == 'auto': + ray_start, ray_end = math_utils.get_ray_limits_box(ray_origins, ray_directions, box_side_length=rendering_options['box_warp']) + is_ray_valid = ray_end > ray_start + if torch.any(is_ray_valid).item(): + ray_start[~is_ray_valid] = ray_start[is_ray_valid].min() + ray_end[~is_ray_valid] = ray_start[is_ray_valid].max() + depths_coarse = self.sample_stratified(ray_origins, ray_start, ray_end, rendering_options['depth_resolution'], rendering_options['disparity_space_sampling']) + else: + # Create stratified depth samples + depths_coarse = self.sample_stratified(ray_origins, rendering_options['ray_start'], rendering_options['ray_end'], rendering_options['depth_resolution'], rendering_options['disparity_space_sampling']) + + depths_coarse = depths_coarse.to(planes.device) + + # Coarse Pass + colors_coarse, 
densities_coarse = self._forward_pass( + depths=depths_coarse, ray_directions=ray_directions, ray_origins=ray_origins, + planes=planes, decoder=decoder, rendering_options=rendering_options) + + # Fine Pass + N_importance = rendering_options['depth_resolution_importance'] + if N_importance > 0: + _, _, weights = self.ray_marcher(colors_coarse, densities_coarse, depths_coarse, rendering_options) + + depths_fine = self.sample_importance(depths_coarse, weights, N_importance) + + colors_fine, densities_fine = self._forward_pass( + depths=depths_fine, ray_directions=ray_directions, ray_origins=ray_origins, + planes=planes, decoder=decoder, rendering_options=rendering_options) + + all_depths, all_colors, all_densities = self.unify_samples(depths_coarse, colors_coarse, densities_coarse, + depths_fine, colors_fine, densities_fine) + + # Aggregate + rgb_final, depth_final, weights = self.ray_marcher(all_colors, all_densities, all_depths, rendering_options) + else: + rgb_final, depth_final, weights = self.ray_marcher(colors_coarse, densities_coarse, depths_coarse, rendering_options) + + return rgb_final, depth_final, weights.sum(2) + + def run_model(self, planes, decoder, sample_coordinates, sample_directions, options): + plane_axes = self.plane_axes.to(planes.device) + sampled_features = sample_from_planes(plane_axes, planes, sample_coordinates, padding_mode='zeros', box_warp=options['box_warp']) + + out = decoder(sampled_features, sample_directions) + if options.get('density_noise', 0) > 0: + out['sigma'] += torch.randn_like(out['sigma']) * options['density_noise'] + return out + + def run_model_activated(self, planes, decoder, sample_coordinates, sample_directions, options): + out = self.run_model(planes, decoder, sample_coordinates, sample_directions, options) + out['sigma'] = self.activation_factory(options)(out['sigma']) + return out + + def sort_samples(self, all_depths, all_colors, all_densities): + _, indices = torch.sort(all_depths, dim=-2) + all_depths = torch.gather(all_depths, -2, indices) + all_colors = torch.gather(all_colors, -2, indices.expand(-1, -1, -1, all_colors.shape[-1])) + all_densities = torch.gather(all_densities, -2, indices.expand(-1, -1, -1, 1)) + return all_depths, all_colors, all_densities + + def unify_samples(self, depths1, colors1, densities1, depths2, colors2, densities2): + all_depths = torch.cat([depths1, depths2], dim = -2) + all_colors = torch.cat([colors1, colors2], dim = -2) + all_densities = torch.cat([densities1, densities2], dim = -2) + + _, indices = torch.sort(all_depths, dim=-2) + all_depths = torch.gather(all_depths, -2, indices) + all_colors = torch.gather(all_colors, -2, indices.expand(-1, -1, -1, all_colors.shape[-1])) + all_densities = torch.gather(all_densities, -2, indices.expand(-1, -1, -1, 1)) + + return all_depths, all_colors, all_densities + + def sample_stratified(self, ray_origins, ray_start, ray_end, depth_resolution, disparity_space_sampling=False): + """ + Return depths of approximately uniformly spaced samples along rays. + """ + N, M, _ = ray_origins.shape + if disparity_space_sampling: + depths_coarse = torch.linspace(0, + 1, + depth_resolution, + device=ray_origins.device).reshape(1, 1, depth_resolution, 1).repeat(N, M, 1, 1) + depth_delta = 1/(depth_resolution - 1) + depths_coarse += torch.rand_like(depths_coarse) * depth_delta + depths_coarse = 1./(1./ray_start * (1. 
- depths_coarse) + 1./ray_end * depths_coarse) + else: + if type(ray_start) == torch.Tensor: + depths_coarse = math_utils.linspace(ray_start, ray_end, depth_resolution).permute(1,2,0,3) + depth_delta = (ray_end - ray_start) / (depth_resolution - 1) + depths_coarse += torch.rand_like(depths_coarse) * depth_delta[..., None] + else: + depths_coarse = torch.linspace(ray_start, ray_end, depth_resolution, device=ray_origins.device).reshape(1, 1, depth_resolution, 1).repeat(N, M, 1, 1) + depth_delta = (ray_end - ray_start)/(depth_resolution - 1) + depths_coarse += torch.rand_like(depths_coarse) * depth_delta + + return depths_coarse + + def sample_importance(self, z_vals, weights, N_importance): + """ + Return depths of importance sampled points along rays. See NeRF importance sampling for more. + """ + with torch.no_grad(): + batch_size, num_rays, samples_per_ray, _ = z_vals.shape + + z_vals = z_vals.reshape(batch_size * num_rays, samples_per_ray) + weights = weights.reshape(batch_size * num_rays, -1) # -1 to account for loss of 1 sample in MipRayMarcher + + # smooth weights + weights = torch.nn.functional.max_pool1d(weights.unsqueeze(1).float(), 2, 1, padding=1) + weights = torch.nn.functional.avg_pool1d(weights, 2, 1).squeeze() + weights = weights + 0.01 + + z_vals_mid = 0.5 * (z_vals[: ,:-1] + z_vals[: ,1:]) + importance_z_vals = self.sample_pdf(z_vals_mid, weights[:, 1:-1], + N_importance).detach().reshape(batch_size, num_rays, N_importance, 1) + return importance_z_vals + + def sample_pdf(self, bins, weights, N_importance, det=False, eps=1e-5): + """ + Sample @N_importance samples from @bins with distribution defined by @weights. + Inputs: + bins: (N_rays, N_samples_+1) where N_samples_ is "the number of coarse samples per ray - 2" + weights: (N_rays, N_samples_) + N_importance: the number of samples to draw from the distribution + det: deterministic or not + eps: a small number to prevent division by zero + Outputs: + samples: the sampled samples + """ + N_rays, N_samples_ = weights.shape + weights = weights + eps # prevent division by zero (don't do inplace op!) 
+ pdf = weights / torch.sum(weights, -1, keepdim=True) # (N_rays, N_samples_) + cdf = torch.cumsum(pdf, -1) # (N_rays, N_samples), cumulative distribution function + cdf = torch.cat([torch.zeros_like(cdf[: ,:1]), cdf], -1) # (N_rays, N_samples_+1) + # padded to 0~1 inclusive + + if det: + u = torch.linspace(0, 1, N_importance, device=bins.device) + u = u.expand(N_rays, N_importance) + else: + u = torch.rand(N_rays, N_importance, device=bins.device) + u = u.contiguous() + + inds = torch.searchsorted(cdf, u, right=True) + below = torch.clamp_min(inds-1, 0) + above = torch.clamp_max(inds, N_samples_) + + inds_sampled = torch.stack([below, above], -1).view(N_rays, 2*N_importance) + cdf_g = torch.gather(cdf, 1, inds_sampled).view(N_rays, N_importance, 2) + bins_g = torch.gather(bins, 1, inds_sampled).view(N_rays, N_importance, 2) + + denom = cdf_g[...,1]-cdf_g[...,0] + denom[denomindhw', x) # [3, N, D, H, W] + x = x.contiguous().view(3*N, -1, H, W) # [3*N, D, H, W] + x = self.deconv(x) # [3*N, D', H', W'] + x = x.view(3, N, *x.shape[-3:]) # [3, N, D', H', W'] + x = torch.einsum('indhw->nidhw', x) # [N, 3, D', H', W'] + x = x.contiguous() + + assert self.triplane_high_res == x.shape[-2], \ + f"Output triplane resolution does not match with expected: {x.shape[-2]} vs {self.triplane_high_res}" + assert self.triplane_dim == x.shape[-3], \ + f"Output triplane dimension does not match with expected: {x.shape[-3]} vs {self.triplane_dim}" + + return x diff --git a/modeling.py b/modeling.py new file mode 100644 index 0000000000000000000000000000000000000000..8d43ccd646a94f943fb351a18a1ea2b7b7b2290a --- /dev/null +++ b/modeling.py @@ -0,0 +1,84 @@ + +#### modeling.py +import torch.nn as nn +from transformers import PreTrainedModel, PretrainedConfig +import torch +# import dinowrapper +from lrm.models.encoders.dino_wrapper2 import DinoWrapper +from lrm.models.transformer import TriplaneTransformer +from lrm.models.rendering.synthesizer_part import TriplaneSynthesizer + +class CameraEmbedder(nn.Module): + def __init__(self, raw_dim: int, embed_dim: int): + super().__init__() + self.mlp = nn.Sequential( + nn.Linear(raw_dim, embed_dim), + nn.SiLU(), + nn.Linear(embed_dim, embed_dim), + ) + + def forward(self, x): + return self.mlp(x) + +class LRMGeneratorConfig(PretrainedConfig): + model_type = "lrm_generator" + + def __init__(self, **kwargs): + super().__init__(**kwargs) + self.camera_embed_dim = kwargs.get("camera_embed_dim", 1024) + self.rendering_samples_per_ray = kwargs.get("rendering_samples_per_ray", 128) + self.transformer_dim = kwargs.get("transformer_dim", 1024) + self.transformer_layers = kwargs.get("transformer_layers", 16) + self.transformer_heads = kwargs.get("transformer_heads", 16) + self.triplane_low_res = kwargs.get("triplane_low_res", 32) + self.triplane_high_res = kwargs.get("triplane_high_res", 64) + self.triplane_dim = kwargs.get("triplane_dim", 80) + self.encoder_freeze = kwargs.get("encoder_freeze", False) + self.encoder_model_name = kwargs.get("encoder_model_name", 'facebook/dinov2-base') + self.encoder_feat_dim = kwargs.get("encoder_feat_dim", 768) + +class LRMGenerator(PreTrainedModel): + config_class = LRMGeneratorConfig + + def __init__(self, config: LRMGeneratorConfig): + super().__init__(config) + + self.encoder_feat_dim = config.encoder_feat_dim + self.camera_embed_dim = config.camera_embed_dim + + self.encoder = DinoWrapper( + model_name=config.encoder_model_name, + freeze=config.encoder_freeze, + ) + self.camera_embedder = CameraEmbedder( + raw_dim=12 + 4, 
embed_dim=config.camera_embed_dim, + ) + self.transformer = TriplaneTransformer( + inner_dim=config.transformer_dim, num_layers=config.transformer_layers, num_heads=config.transformer_heads, + image_feat_dim=config.encoder_feat_dim, + camera_embed_dim=config.camera_embed_dim, + triplane_low_res=config.triplane_low_res, triplane_high_res=config.triplane_high_res, triplane_dim=config.triplane_dim, + ) + self.synthesizer = TriplaneSynthesizer( + triplane_dim=config.triplane_dim, samples_per_ray=config.rendering_samples_per_ray, + ) + + def forward(self, image, camera): + assert image.shape[0] == camera.shape[0], "Batch size mismatch" + N = image.shape[0] + + # encode image + image_feats = self.encoder(image) + assert image_feats.shape[-1] == self.encoder_feat_dim, \ + f"Feature dimension mismatch: {image_feats.shape[-1]} vs {self.encoder_feat_dim}" + + # embed camera + camera_embeddings = self.camera_embedder(camera) + assert camera_embeddings.shape[-1] == self.camera_embed_dim, \ + f"Feature dimension mismatch: {camera_embeddings.shape[-1]} vs {self.camera_embed_dim}" + + # transformer generating planes + planes = self.transformer(image_feats, camera_embeddings) + assert planes.shape[0] == N, "Batch size mismatch for planes" + assert planes.shape[1] == 3, "Planes should have 3 channels" + return planes diff --git a/results/40_prompt_images_provided/A DSLR photo of Sydney Opera House.mp4 b/results/40_prompt_images_provided/A DSLR photo of Sydney Opera House.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..8adcb328ab57364cefdd458aa85baa637bc6df47 Binary files /dev/null and b/results/40_prompt_images_provided/A DSLR photo of Sydney Opera House.mp4 differ diff --git a/results/40_prompt_images_provided/A crab, low poly.mp4 b/results/40_prompt_images_provided/A crab, low poly.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..5771e6d9b27b4436f5151b9db1daae0fd9fb3744 Binary files /dev/null and b/results/40_prompt_images_provided/A crab, low poly.mp4 differ diff --git a/results/40_prompt_images_provided/A product photo of a toy tank.mp4 b/results/40_prompt_images_provided/A product photo of a toy tank.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..3138d34535646273108aabb1d299d4d1254c6d0e Binary files /dev/null and b/results/40_prompt_images_provided/A product photo of a toy tank.mp4 differ diff --git a/results/40_prompt_images_provided/A statue of angel, blender.mp4 b/results/40_prompt_images_provided/A statue of angel, blender.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..2611eb2370cd42faefe7a02b6d306c0d8a1e66ea Binary files /dev/null and b/results/40_prompt_images_provided/A statue of angel, blender.mp4 differ diff --git a/results/40_prompt_images_provided/Daenerys Targaryen from game of throne.mp4 b/results/40_prompt_images_provided/Daenerys Targaryen from game of throne.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..48ce5f814b058114fe77ed0462ca1ba250afb394 Binary files /dev/null and b/results/40_prompt_images_provided/Daenerys Targaryen from game of throne.mp4 differ diff --git a/results/40_prompt_images_provided/Darth Vader helmet,g highly detailed.mp4 b/results/40_prompt_images_provided/Darth Vader helmet,g highly detailed.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..2516120dba0faa363772fc7c0591864e38403e39 Binary files /dev/null and b/results/40_prompt_images_provided/Darth Vader helmet,g highly detailed.mp4 differ diff --git 
a/results/40_prompt_images_provided/Fisherman House, cute, cartoon, blender, stylized.mp4 b/results/40_prompt_images_provided/Fisherman House, cute, cartoon, blender, stylized.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..294154b62f170ef56594ec86c6e4d0b060fda0d8 Binary files /dev/null and b/results/40_prompt_images_provided/Fisherman House, cute, cartoon, blender, stylized.mp4 differ diff --git a/results/40_prompt_images_provided/Handpainted watercolor windmill, hand-painted.mp4 b/results/40_prompt_images_provided/Handpainted watercolor windmill, hand-painted.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..44182e80f87a67deea461ba42adec44582b76978 Binary files /dev/null and b/results/40_prompt_images_provided/Handpainted watercolor windmill, hand-painted.mp4 differ diff --git a/results/40_prompt_images_provided/Little italian town, hand-painted style.mp4 b/results/40_prompt_images_provided/Little italian town, hand-painted style.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..9bd5aaef103bb6eff82b6f13357040e36febd03e Binary files /dev/null and b/results/40_prompt_images_provided/Little italian town, hand-painted style.mp4 differ diff --git a/results/40_prompt_images_provided/Mr Bean Cartoon doing a T Pose.mp4 b/results/40_prompt_images_provided/Mr Bean Cartoon doing a T Pose.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..7e4aa8baf05e4206736b4472a3d940c60786e4d2 Binary files /dev/null and b/results/40_prompt_images_provided/Mr Bean Cartoon doing a T Pose.mp4 differ diff --git a/results/40_prompt_images_provided/Pikachu with hat.mp4 b/results/40_prompt_images_provided/Pikachu with hat.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..6db03526136313d4a8552fce8848aab212ca214b Binary files /dev/null and b/results/40_prompt_images_provided/Pikachu with hat.mp4 differ diff --git a/results/40_prompt_images_provided/Samurai koala bear.mp4 b/results/40_prompt_images_provided/Samurai koala bear.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..c42760abd477c8347478dd49eb264ced31fa76d2 Binary files /dev/null and b/results/40_prompt_images_provided/Samurai koala bear.mp4 differ diff --git a/results/40_prompt_images_provided/a DSLR photo of a ghost eating a hamburger.mp4 b/results/40_prompt_images_provided/a DSLR photo of a ghost eating a hamburger.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..02a6055ca33e74e68ac814397ee0eba1e065438c Binary files /dev/null and b/results/40_prompt_images_provided/a DSLR photo of a ghost eating a hamburger.mp4 differ diff --git a/results/40_prompt_images_provided/a DSLR photo of a squirrel playing guitar.mp4 b/results/40_prompt_images_provided/a DSLR photo of a squirrel playing guitar.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..b52f07e9f8eef35e6766bfe30f62b29cb2e5e31c Binary files /dev/null and b/results/40_prompt_images_provided/a DSLR photo of a squirrel playing guitar.mp4 differ diff --git a/results/40_prompt_images_provided/animal skull pile.mp4 b/results/40_prompt_images_provided/animal skull pile.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..51da03dec699558b3ae1c809e40764f359e72dfd Binary files /dev/null and b/results/40_prompt_images_provided/animal skull pile.mp4 differ diff --git a/results/40_prompt_images_provided/army Jacket, 3D scan.mp4 b/results/40_prompt_images_provided/army Jacket, 3D scan.mp4 new file mode 100644 index 
0000000000000000000000000000000000000000..f17bacdc11879b18194bb756309b0f4c73e9b2c2 Binary files /dev/null and b/results/40_prompt_images_provided/army Jacket, 3D scan.mp4 differ diff --git a/results/40_prompt_images_provided/beautiful, intricate butterfly.mp4 b/results/40_prompt_images_provided/beautiful, intricate butterfly.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..16492a9e59feedd32f121b8065891103a7a6ae2e Binary files /dev/null and b/results/40_prompt_images_provided/beautiful, intricate butterfly.mp4 differ diff --git a/results/40_prompt_images_provided/girl riding wolf, cute, cartoon, blender.mp4 b/results/40_prompt_images_provided/girl riding wolf, cute, cartoon, blender.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..a94bdf89d1f89f036f5f3ccdc9e88773abf821f7 Binary files /dev/null and b/results/40_prompt_images_provided/girl riding wolf, cute, cartoon, blender.mp4 differ diff --git a/results/40_prompt_images_provided/mecha vampire girl chibi.mp4 b/results/40_prompt_images_provided/mecha vampire girl chibi.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..ee3d0865f91f158d59819e468565597ba723e5de Binary files /dev/null and b/results/40_prompt_images_provided/mecha vampire girl chibi.mp4 differ diff --git a/results/40_prompt_images_provided/military Mech, future, scifi.mp4 b/results/40_prompt_images_provided/military Mech, future, scifi.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..1f8345e64a6f001b5b79e028fa182f1ee79dcfd7 Binary files /dev/null and b/results/40_prompt_images_provided/military Mech, future, scifi.mp4 differ
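modeling.py above wraps the same generator as a Hugging Face PreTrainedModel with an LRMGeneratorConfig. A minimal sketch of constructing it from the default configuration and running a dummy forward pass; it assumes modeling.py is importable from the working directory and that the facebook/dinov2-base weights can be fetched, and 512 matches the script's default source_size:

# Illustrative sketch, not part of the diff: the HF-style wrapper from modeling.py.
import torch
from modeling import LRMGenerator, LRMGeneratorConfig

config = LRMGeneratorConfig()          # defaults: 80-d triplanes, 16-layer transformer, DINOv2-base encoder
model = LRMGenerator(config).eval()

image = torch.rand(1, 3, 512, 512)     # RGB in [0, 1], pre-sized as DinoWrapper expects
camera = torch.rand(1, 12 + 4)         # flattened source-camera vector (raw_dim = 12 + 4)
with torch.no_grad():
    planes = model(image, camera)
print(planes.shape)                    # (N, 3, triplane_dim, H', W'); (1, 3, 80, 64, 64) with these defaults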