diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md new file mode 100644 index 0000000000000000000000000000000000000000..08b500a221857ec3f451338e80b4a9ab1173a1af --- /dev/null +++ b/CODE_OF_CONDUCT.md @@ -0,0 +1,80 @@ +# Code of Conduct + +## Our Pledge + +In the interest of fostering an open and welcoming environment, we as +contributors and maintainers pledge to make participation in our project and +our community a harassment-free experience for everyone, regardless of age, body +size, disability, ethnicity, sex characteristics, gender identity and expression, +level of experience, education, socio-economic status, nationality, personal +appearance, race, religion, or sexual identity and orientation. + +## Our Standards + +Examples of behavior that contributes to creating a positive environment +include: + +* Using welcoming and inclusive language +* Being respectful of differing viewpoints and experiences +* Gracefully accepting constructive criticism +* Focusing on what is best for the community +* Showing empathy towards other community members + +Examples of unacceptable behavior by participants include: + +* The use of sexualized language or imagery and unwelcome sexual attention or + advances +* Trolling, insulting/derogatory comments, and personal or political attacks +* Public or private harassment +* Publishing others' private information, such as a physical or electronic + address, without explicit permission +* Other conduct which could reasonably be considered inappropriate in a + professional setting + +## Our Responsibilities + +Project maintainers are responsible for clarifying the standards of acceptable +behavior and are expected to take appropriate and fair corrective action in +response to any instances of unacceptable behavior. + +Project maintainers have the right and responsibility to remove, edit, or +reject comments, commits, code, wiki edits, issues, and other contributions +that are not aligned to this Code of Conduct, or to ban temporarily or +permanently any contributor for other behaviors that they deem inappropriate, +threatening, offensive, or harmful. + +## Scope + +This Code of Conduct applies within all project spaces, and it also applies when +an individual is representing the project or its community in public spaces. +Examples of representing a project or community include using an official +project e-mail address, posting via an official social media account, or acting +as an appointed representative at an online or offline event. Representation of +a project may be further defined and clarified by project maintainers. + +This Code of Conduct also applies outside the project spaces when there is a +reasonable belief that an individual's behavior may have a negative impact on +the project or its community. + +## Enforcement + +Instances of abusive, harassing, or otherwise unacceptable behavior may be +reported by contacting the project team at . All +complaints will be reviewed and investigated and will result in a response that +is deemed necessary and appropriate to the circumstances. The project team is +obligated to maintain confidentiality with regard to the reporter of an incident. +Further details of specific enforcement policies may be posted separately. + +Project maintainers who do not follow or enforce the Code of Conduct in good +faith may face temporary or permanent repercussions as determined by other +members of the project's leadership. 
+ +## Attribution + +This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, +available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html + +[homepage]: https://www.contributor-covenant.org + +For answers to common questions about this code of conduct, see +https://www.contributor-covenant.org/faq diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 0000000000000000000000000000000000000000..c88cc4f734d301267f3e7c00f6cfe4baf9a8222c --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,31 @@ +# Contributing to PoseDiffusion +We want to make contributing to this project as easy and transparent as +possible. + +## Pull Requests +We actively welcome your pull requests. + +1. Fork the repo and create your branch from `main`. +2. If you've added code that should be tested, add tests. +3. If you've changed APIs, update the documentation. +4. Ensure the test suite passes. +5. Make sure your code lints. +6. If you haven't already, complete the Contributor License Agreement ("CLA"). + +## Contributor License Agreement ("CLA") +In order to accept your pull request, we need you to submit a CLA. You only need +to do this once to work on any of Facebook's open source projects. + +Complete your CLA here: + +## Issues +We use GitHub issues to track public bugs. Please ensure your description is +clear and has sufficient instructions to be able to reproduce the issue. + +Facebook has a [bounty program](https://www.facebook.com/whitehat/) for the safe +disclosure of security bugs. In those cases, please go through the process +outlined on that page and do not file a public issue. + +## License +By contributing to PoseDiffusion, you agree that your contributions will be licensed +under the LICENSE file in the root directory of this source tree. \ No newline at end of file diff --git a/LICENSE b/LICENSE new file mode 100644 index 0000000000000000000000000000000000000000..e395ca3e2cdebf48a6375a3c1022d10caabba7db --- /dev/null +++ b/LICENSE @@ -0,0 +1,399 @@ +Attribution-NonCommercial 4.0 International + +======================================================================= + +Creative Commons Corporation ("Creative Commons") is not a law firm and +does not provide legal services or legal advice. Distribution of +Creative Commons public licenses does not create a lawyer-client or +other relationship. Creative Commons makes its licenses and related +information available on an "as-is" basis. Creative Commons gives no +warranties regarding its licenses, any material licensed under their +terms and conditions, or any related information. Creative Commons +disclaims all liability for damages resulting from their use to the +fullest extent possible. + +Using Creative Commons Public Licenses + +Creative Commons public licenses provide a standard set of terms and +conditions that creators and other rights holders may use to share +original works of authorship and other material subject to copyright +and certain other rights specified in the public license below. The +following considerations are for informational purposes only, are not +exhaustive, and do not form part of our licenses. + + Considerations for licensors: Our public licenses are + intended for use by those authorized to give the public + permission to use material in ways otherwise restricted by + copyright and certain other rights. Our licenses are + irrevocable. Licensors should read and understand the terms + and conditions of the license they choose before applying it. 
+ Licensors should also secure all rights necessary before + applying our licenses so that the public can reuse the + material as expected. Licensors should clearly mark any + material not subject to the license. This includes other CC- + licensed material, or material used under an exception or + limitation to copyright. More considerations for licensors: + wiki.creativecommons.org/Considerations_for_licensors + + Considerations for the public: By using one of our public + licenses, a licensor grants the public permission to use the + licensed material under specified terms and conditions. If + the licensor's permission is not necessary for any reason--for + example, because of any applicable exception or limitation to + copyright--then that use is not regulated by the license. Our + licenses grant only permissions under copyright and certain + other rights that a licensor has authority to grant. Use of + the licensed material may still be restricted for other + reasons, including because others have copyright or other + rights in the material. A licensor may make special requests, + such as asking that all changes be marked or described. + Although not required by our licenses, you are encouraged to + respect those requests where reasonable. More_considerations + for the public: + wiki.creativecommons.org/Considerations_for_licensees + +======================================================================= + +Creative Commons Attribution-NonCommercial 4.0 International Public +License + +By exercising the Licensed Rights (defined below), You accept and agree +to be bound by the terms and conditions of this Creative Commons +Attribution-NonCommercial 4.0 International Public License ("Public +License"). To the extent this Public License may be interpreted as a +contract, You are granted the Licensed Rights in consideration of Your +acceptance of these terms and conditions, and the Licensor grants You +such rights in consideration of benefits the Licensor receives from +making the Licensed Material available under these terms and +conditions. + +Section 1 -- Definitions. + + a. Adapted Material means material subject to Copyright and Similar + Rights that is derived from or based upon the Licensed Material + and in which the Licensed Material is translated, altered, + arranged, transformed, or otherwise modified in a manner requiring + permission under the Copyright and Similar Rights held by the + Licensor. For purposes of this Public License, where the Licensed + Material is a musical work, performance, or sound recording, + Adapted Material is always produced where the Licensed Material is + synched in timed relation with a moving image. + + b. Adapter's License means the license You apply to Your Copyright + and Similar Rights in Your contributions to Adapted Material in + accordance with the terms and conditions of this Public License. + + c. Copyright and Similar Rights means copyright and/or similar rights + closely related to copyright including, without limitation, + performance, broadcast, sound recording, and Sui Generis Database + Rights, without regard to how the rights are labeled or + categorized. For purposes of this Public License, the rights + specified in Section 2(b)(1)-(2) are not Copyright and Similar + Rights. + d. 
Effective Technological Measures means those measures that, in the + absence of proper authority, may not be circumvented under laws + fulfilling obligations under Article 11 of the WIPO Copyright + Treaty adopted on December 20, 1996, and/or similar international + agreements. + + e. Exceptions and Limitations means fair use, fair dealing, and/or + any other exception or limitation to Copyright and Similar Rights + that applies to Your use of the Licensed Material. + + f. Licensed Material means the artistic or literary work, database, + or other material to which the Licensor applied this Public + License. + + g. Licensed Rights means the rights granted to You subject to the + terms and conditions of this Public License, which are limited to + all Copyright and Similar Rights that apply to Your use of the + Licensed Material and that the Licensor has authority to license. + + h. Licensor means the individual(s) or entity(ies) granting rights + under this Public License. + + i. NonCommercial means not primarily intended for or directed towards + commercial advantage or monetary compensation. For purposes of + this Public License, the exchange of the Licensed Material for + other material subject to Copyright and Similar Rights by digital + file-sharing or similar means is NonCommercial provided there is + no payment of monetary compensation in connection with the + exchange. + + j. Share means to provide material to the public by any means or + process that requires permission under the Licensed Rights, such + as reproduction, public display, public performance, distribution, + dissemination, communication, or importation, and to make material + available to the public including in ways that members of the + public may access the material from a place and at a time + individually chosen by them. + + k. Sui Generis Database Rights means rights other than copyright + resulting from Directive 96/9/EC of the European Parliament and of + the Council of 11 March 1996 on the legal protection of databases, + as amended and/or succeeded, as well as other essentially + equivalent rights anywhere in the world. + + l. You means the individual or entity exercising the Licensed Rights + under this Public License. Your has a corresponding meaning. + +Section 2 -- Scope. + + a. License grant. + + 1. Subject to the terms and conditions of this Public License, + the Licensor hereby grants You a worldwide, royalty-free, + non-sublicensable, non-exclusive, irrevocable license to + exercise the Licensed Rights in the Licensed Material to: + + a. reproduce and Share the Licensed Material, in whole or + in part, for NonCommercial purposes only; and + + b. produce, reproduce, and Share Adapted Material for + NonCommercial purposes only. + + 2. Exceptions and Limitations. For the avoidance of doubt, where + Exceptions and Limitations apply to Your use, this Public + License does not apply, and You do not need to comply with + its terms and conditions. + + 3. Term. The term of this Public License is specified in Section + 6(a). + + 4. Media and formats; technical modifications allowed. The + Licensor authorizes You to exercise the Licensed Rights in + all media and formats whether now known or hereafter created, + and to make technical modifications necessary to do so. 
The + Licensor waives and/or agrees not to assert any right or + authority to forbid You from making technical modifications + necessary to exercise the Licensed Rights, including + technical modifications necessary to circumvent Effective + Technological Measures. For purposes of this Public License, + simply making modifications authorized by this Section 2(a) + (4) never produces Adapted Material. + + 5. Downstream recipients. + + a. Offer from the Licensor -- Licensed Material. Every + recipient of the Licensed Material automatically + receives an offer from the Licensor to exercise the + Licensed Rights under the terms and conditions of this + Public License. + + b. No downstream restrictions. You may not offer or impose + any additional or different terms or conditions on, or + apply any Effective Technological Measures to, the + Licensed Material if doing so restricts exercise of the + Licensed Rights by any recipient of the Licensed + Material. + + 6. No endorsement. Nothing in this Public License constitutes or + may be construed as permission to assert or imply that You + are, or that Your use of the Licensed Material is, connected + with, or sponsored, endorsed, or granted official status by, + the Licensor or others designated to receive attribution as + provided in Section 3(a)(1)(A)(i). + + b. Other rights. + + 1. Moral rights, such as the right of integrity, are not + licensed under this Public License, nor are publicity, + privacy, and/or other similar personality rights; however, to + the extent possible, the Licensor waives and/or agrees not to + assert any such rights held by the Licensor to the limited + extent necessary to allow You to exercise the Licensed + Rights, but not otherwise. + + 2. Patent and trademark rights are not licensed under this + Public License. + + 3. To the extent possible, the Licensor waives any right to + collect royalties from You for the exercise of the Licensed + Rights, whether directly or through a collecting society + under any voluntary or waivable statutory or compulsory + licensing scheme. In all other cases the Licensor expressly + reserves any right to collect such royalties, including when + the Licensed Material is used other than for NonCommercial + purposes. + +Section 3 -- License Conditions. + +Your exercise of the Licensed Rights is expressly made subject to the +following conditions. + + a. Attribution. + + 1. If You Share the Licensed Material (including in modified + form), You must: + + a. retain the following if it is supplied by the Licensor + with the Licensed Material: + + i. identification of the creator(s) of the Licensed + Material and any others designated to receive + attribution, in any reasonable manner requested by + the Licensor (including by pseudonym if + designated); + + ii. a copyright notice; + + iii. a notice that refers to this Public License; + + iv. a notice that refers to the disclaimer of + warranties; + + v. a URI or hyperlink to the Licensed Material to the + extent reasonably practicable; + + b. indicate if You modified the Licensed Material and + retain an indication of any previous modifications; and + + c. indicate the Licensed Material is licensed under this + Public License, and include the text of, or the URI or + hyperlink to, this Public License. + + 2. You may satisfy the conditions in Section 3(a)(1) in any + reasonable manner based on the medium, means, and context in + which You Share the Licensed Material. 
For example, it may be + reasonable to satisfy the conditions by providing a URI or + hyperlink to a resource that includes the required + information. + + 3. If requested by the Licensor, You must remove any of the + information required by Section 3(a)(1)(A) to the extent + reasonably practicable. + + 4. If You Share Adapted Material You produce, the Adapter's + License You apply must not prevent recipients of the Adapted + Material from complying with this Public License. + +Section 4 -- Sui Generis Database Rights. + +Where the Licensed Rights include Sui Generis Database Rights that +apply to Your use of the Licensed Material: + + a. for the avoidance of doubt, Section 2(a)(1) grants You the right + to extract, reuse, reproduce, and Share all or a substantial + portion of the contents of the database for NonCommercial purposes + only; + + b. if You include all or a substantial portion of the database + contents in a database in which You have Sui Generis Database + Rights, then the database in which You have Sui Generis Database + Rights (but not its individual contents) is Adapted Material; and + + c. You must comply with the conditions in Section 3(a) if You Share + all or a substantial portion of the contents of the database. + +For the avoidance of doubt, this Section 4 supplements and does not +replace Your obligations under this Public License where the Licensed +Rights include other Copyright and Similar Rights. + +Section 5 -- Disclaimer of Warranties and Limitation of Liability. + + a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE + EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS + AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF + ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, + IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION, + WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR + PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, + ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT + KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT + ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU. + + b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE + TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, + NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, + INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, + COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR + USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN + ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR + DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR + IN PART, THIS LIMITATION MAY NOT APPLY TO YOU. + + c. The disclaimer of warranties and limitation of liability provided + above shall be interpreted in a manner that, to the extent + possible, most closely approximates an absolute disclaimer and + waiver of all liability. + +Section 6 -- Term and Termination. + + a. This Public License applies for the term of the Copyright and + Similar Rights licensed here. However, if You fail to comply with + this Public License, then Your rights under this Public License + terminate automatically. + + b. Where Your right to use the Licensed Material has terminated under + Section 6(a), it reinstates: + + 1. automatically as of the date the violation is cured, provided + it is cured within 30 days of Your discovery of the + violation; or + + 2. upon express reinstatement by the Licensor. 
+ + For the avoidance of doubt, this Section 6(b) does not affect any + right the Licensor may have to seek remedies for Your violations + of this Public License. + + c. For the avoidance of doubt, the Licensor may also offer the + Licensed Material under separate terms or conditions or stop + distributing the Licensed Material at any time; however, doing so + will not terminate this Public License. + + d. Sections 1, 5, 6, 7, and 8 survive termination of this Public + License. + +Section 7 -- Other Terms and Conditions. + + a. The Licensor shall not be bound by any additional or different + terms or conditions communicated by You unless expressly agreed. + + b. Any arrangements, understandings, or agreements regarding the + Licensed Material not stated herein are separate from and + independent of the terms and conditions of this Public License. + +Section 8 -- Interpretation. + + a. For the avoidance of doubt, this Public License does not, and + shall not be interpreted to, reduce, limit, restrict, or impose + conditions on any use of the Licensed Material that could lawfully + be made without permission under this Public License. + + b. To the extent possible, if any provision of this Public License is + deemed unenforceable, it shall be automatically reformed to the + minimum extent necessary to make it enforceable. If the provision + cannot be reformed, it shall be severed from this Public License + without affecting the enforceability of the remaining terms and + conditions. + + c. No term or condition of this Public License will be waived and no + failure to comply consented to unless expressly agreed to by the + Licensor. + + d. Nothing in this Public License constitutes or may be interpreted + as a limitation upon, or waiver of, any privileges and immunities + that apply to the Licensor or You, including from the legal + processes of any jurisdiction or authority. + +======================================================================= + +Creative Commons is not a party to its public +licenses. Notwithstanding, Creative Commons may elect to apply one of +its public licenses to material it publishes and in those instances +will be considered the “Licensor.” The text of the Creative Commons +public licenses is dedicated to the public domain under the CC0 Public +Domain Dedication. Except for the limited purpose of indicating that +material is shared under a Creative Commons public license or as +otherwise permitted by the Creative Commons policies published at +creativecommons.org/policies, Creative Commons does not authorize the +use of the trademark "Creative Commons" or any other trademark or logo +of Creative Commons without its prior written consent including, +without limitation, in connection with any unauthorized modifications +to any of its public licenses or any other arrangements, +understandings, or agreements concerning use of licensed material. For +the avoidance of doubt, this paragraph does not form part of the +public licenses. + +Creative Commons may be contacted at creativecommons.org. 
\ No newline at end of file diff --git a/README.md b/README.md index bc5f30d6632ac0efdc7be2e9095e9e9579af2e33..08ef81ec7ca35be21752575a0c040a3d64947cad 100644 --- a/README.md +++ b/README.md @@ -1,199 +1,89 @@ ---- -library_name: transformers -tags: [] ---- +# [ECCV 2024] VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models -# Model Card for Model ID +[Project page](https://junlinhan.github.io/projects/vfusion3d.html), [Paper link](https://arxiv.org/abs/2403.12034) - +VFusion3D is a large, feed-forward 3D generative model trained with a small amount of 3D data and a large volume of synthetic multi-view data. It is the first work exploring scalable 3D generative/reconstruction models as a step towards a 3D foundation model. +[VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models](https://junlinhan.github.io/projects/vfusion3d.html)
+[Junlin Han](https://junlinhan.github.io/), [Filippos Kokkinos](https://www.fkokkinos.com/), [Philip Torr](https://www.robots.ox.ac.uk/~phst/)
+GenAI, Meta and TVG, University of Oxford
+European Conference on Computer Vision (ECCV), 2024 -## Model Details +## News -### Model Description +- [25.07.2024] Release weights and inference code for VFusion3D. - +## Results and Comparisons -This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated. +### 3D Generation Results + -- **Developed by:** [More Information Needed] -- **Funded by [optional]:** [More Information Needed] -- **Shared by [optional]:** [More Information Needed] -- **Model type:** [More Information Needed] -- **Language(s) (NLP):** [More Information Needed] -- **License:** [More Information Needed] -- **Finetuned from model [optional]:** [More Information Needed] + -### Model Sources [optional] +### User Study Results + - -- **Repository:** [More Information Needed] -- **Paper [optional]:** [More Information Needed] -- **Demo [optional]:** [More Information Needed] +## Setup -## Uses +### Installation +``` +git clone https://github.com/facebookresearch/vfusion3d +cd vfusion3d +``` - +### Environment +We provide a simple installation script that, by default, sets up a conda environment with Python 3.8.19, PyTorch 2.3, and CUDA 12.1. Similar package versions should also work. -### Direct Use +``` +source install.sh +``` - +## Quick Start -[More Information Needed] +### Pretrained Models -### Downstream Use [optional] +- Model weights are available here [Google Drive](https://drive.google.com/file/d/1b-KKSh9VquJdzmXzZBE4nKbXnbeua42X/view?usp=sharing). Please download it and put it inside ./checkpoints/ - -[More Information Needed] +### Prepare Images +- We put some sample inputs under `assets/40_prompt_images`, which is the 40 MVDream prompt images used in the paper. Results of them are also provided under `results/40_prompt_images_provided`. -### Out-of-Scope Use +### Inference +- Run the inference script to get 3D assets. +- You may specify which form of output to generate by setting the flags `--export_video` and `--export_mesh`. +- Change `--source_path` and `--dump_path` if you want to run it on other image folders. - + ``` + # Example usages + # Render a video + python -m lrm.inferrer --export_video --resume ./checkpoints/vfusion3dckpt + + # Export mesh + python -m lrm.inferrer --export_mesh --resume ./checkpoints/vfusion3dckpt + ``` -[More Information Needed] -## Bias, Risks, and Limitations +## Acknowledgement - +- This inference code of VFusion3D heavily borrows from [OpenLRM](https://github.com/3DTopia/OpenLRM). -[More Information Needed] +## Citation -### Recommendations +If you find this work useful, please cite us: - -Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. +``` +@article{han2024vfusion3d, + title={VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models}, + author={Junlin Han and Filippos Kokkinos and Philip Torr}, + journal={European Conference on Computer Vision (ECCV)}, + year={2024} +} +``` -## How to Get Started with the Model +## License -Use the code below to get started with the model. 
- -[More Information Needed] - -## Training Details - -### Training Data - - - -[More Information Needed] - -### Training Procedure - - - -#### Preprocessing [optional] - -[More Information Needed] - - -#### Training Hyperparameters - -- **Training regime:** [More Information Needed] - -#### Speeds, Sizes, Times [optional] - - - -[More Information Needed] - -## Evaluation - - - -### Testing Data, Factors & Metrics - -#### Testing Data - - - -[More Information Needed] - -#### Factors - - - -[More Information Needed] - -#### Metrics - - - -[More Information Needed] - -### Results - -[More Information Needed] - -#### Summary - - - -## Model Examination [optional] - - - -[More Information Needed] - -## Environmental Impact - - - -Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). - -- **Hardware Type:** [More Information Needed] -- **Hours used:** [More Information Needed] -- **Cloud Provider:** [More Information Needed] -- **Compute Region:** [More Information Needed] -- **Carbon Emitted:** [More Information Needed] - -## Technical Specifications [optional] - -### Model Architecture and Objective - -[More Information Needed] - -### Compute Infrastructure - -[More Information Needed] - -#### Hardware - -[More Information Needed] - -#### Software - -[More Information Needed] - -## Citation [optional] - - - -**BibTeX:** - -[More Information Needed] - -**APA:** - -[More Information Needed] - -## Glossary [optional] - - - -[More Information Needed] - -## More Information [optional] - -[More Information Needed] - -## Model Card Authors [optional] - -[More Information Needed] - -## Model Card Contact - -[More Information Needed] \ No newline at end of file +- The majority of VFusion3D is licensed under CC-BY-NC, however portions of the project are available under separate license terms: OpenLRM as a whole is licensed under the Apache License, Version 2.0, while certain components are covered by NVIDIA's proprietary license. +- The model weights of VFusion3D is also licensed under CC-BY-NC. 
diff --git a/__init__.py b/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ff1adc86a447203216b139815a88c4356b4cab73 --- /dev/null +++ b/__init__.py @@ -0,0 +1 @@ +from .modeling import LRMGenerator, LRMGeneratorConfig diff --git a/assets/40_prompt_images/A 3D scan of AK47, weapon.jpeg b/assets/40_prompt_images/A 3D scan of AK47, weapon.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..30d3c272af90e3b2f15fa11e6942de3e9033bad1 Binary files /dev/null and b/assets/40_prompt_images/A 3D scan of AK47, weapon.jpeg differ diff --git a/assets/40_prompt_images/A DSLR photo of Sydney Opera House.jpg b/assets/40_prompt_images/A DSLR photo of Sydney Opera House.jpg new file mode 100644 index 0000000000000000000000000000000000000000..a9813a060c4aacaa9629cc96007a8caebfa9f95b Binary files /dev/null and b/assets/40_prompt_images/A DSLR photo of Sydney Opera House.jpg differ diff --git a/assets/40_prompt_images/A bald eagle carved out of wood.jpg b/assets/40_prompt_images/A bald eagle carved out of wood.jpg new file mode 100644 index 0000000000000000000000000000000000000000..c399dca0716fe69c0f8983649cf2b2cf5b011cb4 Binary files /dev/null and b/assets/40_prompt_images/A bald eagle carved out of wood.jpg differ diff --git a/assets/40_prompt_images/A bulldog wearing a black pirate hat.jpeg b/assets/40_prompt_images/A bulldog wearing a black pirate hat.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..3bf3cc0ae812bed719c8b4397dd42aed602f7000 Binary files /dev/null and b/assets/40_prompt_images/A bulldog wearing a black pirate hat.jpeg differ diff --git a/assets/40_prompt_images/A crab, low poly.jpg b/assets/40_prompt_images/A crab, low poly.jpg new file mode 100644 index 0000000000000000000000000000000000000000..4e67107ce5c28ad7e9ac73162fe9dad95ec074ee Binary files /dev/null and b/assets/40_prompt_images/A crab, low poly.jpg differ diff --git a/assets/40_prompt_images/A photo of a horse walking.jpeg b/assets/40_prompt_images/A photo of a horse walking.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..86ec571909e0ab82ac1190164bd69fcdd56960d2 Binary files /dev/null and b/assets/40_prompt_images/A photo of a horse walking.jpeg differ diff --git a/assets/40_prompt_images/A pig wearing a backpack.jpeg b/assets/40_prompt_images/A pig wearing a backpack.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..075e8238fd8b467d8fde6c495c3d762e31c952ff Binary files /dev/null and b/assets/40_prompt_images/A pig wearing a backpack.jpeg differ diff --git a/assets/40_prompt_images/A product photo of a toy tank.jpg b/assets/40_prompt_images/A product photo of a toy tank.jpg new file mode 100644 index 0000000000000000000000000000000000000000..67b0fbc039b0e338d16f511f8182475139584b44 Binary files /dev/null and b/assets/40_prompt_images/A product photo of a toy tank.jpg differ diff --git a/assets/40_prompt_images/A see no evil monkey on a kick drum.jpg b/assets/40_prompt_images/A see no evil monkey on a kick drum.jpg new file mode 100644 index 0000000000000000000000000000000000000000..4e8f46a233d98b54ea3d74dd805d57eb0ee4998c Binary files /dev/null and b/assets/40_prompt_images/A see no evil monkey on a kick drum.jpg differ diff --git a/assets/40_prompt_images/A statue of angel, blender.jpg b/assets/40_prompt_images/A statue of angel, blender.jpg new file mode 100644 index 0000000000000000000000000000000000000000..ffa76760f799fd4f11f6aa79e54d18b3c5b49b3d Binary files /dev/null and b/assets/40_prompt_images/A 
statue of angel, blender.jpg differ diff --git a/assets/40_prompt_images/Corgi riding a rocket.jpeg b/assets/40_prompt_images/Corgi riding a rocket.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..c21f3911c84b84f71170e732538a4cd891138d5e Binary files /dev/null and b/assets/40_prompt_images/Corgi riding a rocket.jpeg differ diff --git a/assets/40_prompt_images/Daenerys Targaryen from game of throne.jpg b/assets/40_prompt_images/Daenerys Targaryen from game of throne.jpg new file mode 100644 index 0000000000000000000000000000000000000000..700073d917d2ce80e42ec6212284fa43679ea01a Binary files /dev/null and b/assets/40_prompt_images/Daenerys Targaryen from game of throne.jpg differ diff --git a/assets/40_prompt_images/Darth Vader helmet,g highly detailed.jpg b/assets/40_prompt_images/Darth Vader helmet,g highly detailed.jpg new file mode 100644 index 0000000000000000000000000000000000000000..a55926dc6d315ae8649b450bca32b78f30a93325 Binary files /dev/null and b/assets/40_prompt_images/Darth Vader helmet,g highly detailed.jpg differ diff --git a/assets/40_prompt_images/Dragon armor.jpeg b/assets/40_prompt_images/Dragon armor.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..980390085b60b2d91cb92f5478bef1170ec95623 Binary files /dev/null and b/assets/40_prompt_images/Dragon armor.jpeg differ diff --git a/assets/40_prompt_images/Fisherman House, cute, cartoon, blender, stylized.jpg b/assets/40_prompt_images/Fisherman House, cute, cartoon, blender, stylized.jpg new file mode 100644 index 0000000000000000000000000000000000000000..5382c16b5586ac34e5e4e20121747c1c6f6d11c8 Binary files /dev/null and b/assets/40_prompt_images/Fisherman House, cute, cartoon, blender, stylized.jpg differ diff --git a/assets/40_prompt_images/Flying Dragon, highly detailed, breathing fire.jpeg b/assets/40_prompt_images/Flying Dragon, highly detailed, breathing fire.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..a6dd85b38483f0048fe41eb1828a12f0ccc1edfc Binary files /dev/null and b/assets/40_prompt_images/Flying Dragon, highly detailed, breathing fire.jpeg differ diff --git a/assets/40_prompt_images/Handpainted watercolor windmill, hand-painted.jpg b/assets/40_prompt_images/Handpainted watercolor windmill, hand-painted.jpg new file mode 100644 index 0000000000000000000000000000000000000000..ada1875148b962387133b495baea884198c03217 Binary files /dev/null and b/assets/40_prompt_images/Handpainted watercolor windmill, hand-painted.jpg differ diff --git a/assets/40_prompt_images/Katana.jpeg b/assets/40_prompt_images/Katana.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..6e0518179f7a671144f817b6a891d464c27d5979 Binary files /dev/null and b/assets/40_prompt_images/Katana.jpeg differ diff --git a/assets/40_prompt_images/Little italian town, hand-painted style.jpg b/assets/40_prompt_images/Little italian town, hand-painted style.jpg new file mode 100644 index 0000000000000000000000000000000000000000..e80e9556db92d72ea548d453d705f9d98d561115 Binary files /dev/null and b/assets/40_prompt_images/Little italian town, hand-painted style.jpg differ diff --git a/assets/40_prompt_images/Mr Bean Cartoon doing a T Pose.jpg b/assets/40_prompt_images/Mr Bean Cartoon doing a T Pose.jpg new file mode 100644 index 0000000000000000000000000000000000000000..0c4efed01a6a93336f5b63fa6e34c5f84edf61ee Binary files /dev/null and b/assets/40_prompt_images/Mr Bean Cartoon doing a T Pose.jpg differ diff --git a/assets/40_prompt_images/Pedestal Fan 
(White).jpeg b/assets/40_prompt_images/Pedestal Fan (White).jpeg new file mode 100644 index 0000000000000000000000000000000000000000..0a9e6d9122fed02f81eddcf743e859183cf7a6a2 Binary files /dev/null and b/assets/40_prompt_images/Pedestal Fan (White).jpeg differ diff --git a/assets/40_prompt_images/Pikachu with hat.jpg b/assets/40_prompt_images/Pikachu with hat.jpg new file mode 100644 index 0000000000000000000000000000000000000000..209e659bbb762fc4eead32da3f364f40cc47ee80 Binary files /dev/null and b/assets/40_prompt_images/Pikachu with hat.jpg differ diff --git a/assets/40_prompt_images/Samurai koala bear.jpg b/assets/40_prompt_images/Samurai koala bear.jpg new file mode 100644 index 0000000000000000000000000000000000000000..e8528fcd3ab8db90ab142673db7c2d244a6462bf Binary files /dev/null and b/assets/40_prompt_images/Samurai koala bear.jpg differ diff --git a/assets/40_prompt_images/TRUMP figure.jpg b/assets/40_prompt_images/TRUMP figure.jpg new file mode 100644 index 0000000000000000000000000000000000000000..6506341e7d644ad7be8a8c0d7569e543029a94f2 Binary files /dev/null and b/assets/40_prompt_images/TRUMP figure.jpg differ diff --git a/assets/40_prompt_images/Viking axe, fantasy, weapon, blender, 8k, HD.jpg b/assets/40_prompt_images/Viking axe, fantasy, weapon, blender, 8k, HD.jpg new file mode 100644 index 0000000000000000000000000000000000000000..b386bbe9e771e6763b0f044c87f6d0afb27a5991 Binary files /dev/null and b/assets/40_prompt_images/Viking axe, fantasy, weapon, blender, 8k, HD.jpg differ diff --git a/assets/40_prompt_images/a DSLR photo of a frog wearing a sweater.jpg b/assets/40_prompt_images/a DSLR photo of a frog wearing a sweater.jpg new file mode 100644 index 0000000000000000000000000000000000000000..e7e2da782249bd67b6e638926248ecdbc1edda2a Binary files /dev/null and b/assets/40_prompt_images/a DSLR photo of a frog wearing a sweater.jpg differ diff --git a/assets/40_prompt_images/a DSLR photo of a ghost eating a hamburger.jpg b/assets/40_prompt_images/a DSLR photo of a ghost eating a hamburger.jpg new file mode 100644 index 0000000000000000000000000000000000000000..0b5b72f32bae9b399f4a13ad9de4b9f18fd7d4a0 Binary files /dev/null and b/assets/40_prompt_images/a DSLR photo of a ghost eating a hamburger.jpg differ diff --git a/assets/40_prompt_images/a DSLR photo of a peacock on a surfboard.jpeg b/assets/40_prompt_images/a DSLR photo of a peacock on a surfboard.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..9b3345c1aa637c434f1c03448b79bb367bc5415c Binary files /dev/null and b/assets/40_prompt_images/a DSLR photo of a peacock on a surfboard.jpeg differ diff --git a/assets/40_prompt_images/a DSLR photo of a squirrel playing guitar.jpg b/assets/40_prompt_images/a DSLR photo of a squirrel playing guitar.jpg new file mode 100644 index 0000000000000000000000000000000000000000..edd12b7ec3eb99eb636f3541e3bcae4858de17bd Binary files /dev/null and b/assets/40_prompt_images/a DSLR photo of a squirrel playing guitar.jpg differ diff --git a/assets/40_prompt_images/a DSLR photo of an eggshell broken in two with an adorable chick standing next to it.jpeg b/assets/40_prompt_images/a DSLR photo of an eggshell broken in two with an adorable chick standing next to it.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..3cdc1a155f3f01bcfcab843fc0b85e9586b5d31c Binary files /dev/null and b/assets/40_prompt_images/a DSLR photo of an eggshell broken in two with an adorable chick standing next to it.jpeg differ diff --git a/assets/40_prompt_images/an 
astronaut riding a horse.jpeg b/assets/40_prompt_images/an astronaut riding a horse.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..23af0cf9eec13cca0db613eae90e9ed30b696eef Binary files /dev/null and b/assets/40_prompt_images/an astronaut riding a horse.jpeg differ diff --git a/assets/40_prompt_images/animal skull pile.jpg b/assets/40_prompt_images/animal skull pile.jpg new file mode 100644 index 0000000000000000000000000000000000000000..bebef28be9075ef2f8816bb1228d900b9168fe7a Binary files /dev/null and b/assets/40_prompt_images/animal skull pile.jpg differ diff --git a/assets/40_prompt_images/army Jacket, 3D scan.jpg b/assets/40_prompt_images/army Jacket, 3D scan.jpg new file mode 100644 index 0000000000000000000000000000000000000000..4b6e1c5e8287c2d82abc8502c8cef214f4e13f03 Binary files /dev/null and b/assets/40_prompt_images/army Jacket, 3D scan.jpg differ diff --git a/assets/40_prompt_images/baby yoda in the style of Mormookiee.jpg b/assets/40_prompt_images/baby yoda in the style of Mormookiee.jpg new file mode 100644 index 0000000000000000000000000000000000000000..44b855965e758586ccbb0b0305cc54ca12abdd0e Binary files /dev/null and b/assets/40_prompt_images/baby yoda in the style of Mormookiee.jpg differ diff --git a/assets/40_prompt_images/beautiful, intricate butterfly.jpg b/assets/40_prompt_images/beautiful, intricate butterfly.jpg new file mode 100644 index 0000000000000000000000000000000000000000..035746d20194888b202795ebab053d3b131e830a Binary files /dev/null and b/assets/40_prompt_images/beautiful, intricate butterfly.jpg differ diff --git a/assets/40_prompt_images/girl riding wolf, cute, cartoon, blender.jpg b/assets/40_prompt_images/girl riding wolf, cute, cartoon, blender.jpg new file mode 100644 index 0000000000000000000000000000000000000000..3374d4572e3d6872e1669d9dabf0065ec3fa0b78 Binary files /dev/null and b/assets/40_prompt_images/girl riding wolf, cute, cartoon, blender.jpg differ diff --git a/assets/40_prompt_images/mecha vampire girl chibi.jpg b/assets/40_prompt_images/mecha vampire girl chibi.jpg new file mode 100644 index 0000000000000000000000000000000000000000..661e0f81b7973ce7ed47187b7d038bf6687531f7 Binary files /dev/null and b/assets/40_prompt_images/mecha vampire girl chibi.jpg differ diff --git a/assets/40_prompt_images/military Mech, future, scifi.jpg b/assets/40_prompt_images/military Mech, future, scifi.jpg new file mode 100644 index 0000000000000000000000000000000000000000..d5a709490519af18909bac394244dd9cead33fcf Binary files /dev/null and b/assets/40_prompt_images/military Mech, future, scifi.jpg differ diff --git a/assets/40_prompt_images/motorcycle, scifi, blender.jpeg b/assets/40_prompt_images/motorcycle, scifi, blender.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..5f54a0ede7c417feb56706f2b773aacc54c101b4 Binary files /dev/null and b/assets/40_prompt_images/motorcycle, scifi, blender.jpeg differ diff --git a/assets/40_prompt_images/saber from fate stay night, 3D, girl, anime.jpeg b/assets/40_prompt_images/saber from fate stay night, 3D, girl, anime.jpeg new file mode 100644 index 0000000000000000000000000000000000000000..b449e2c3bc95a18d08d86db8d49d6c4ee2d74434 Binary files /dev/null and b/assets/40_prompt_images/saber from fate stay night, 3D, girl, anime.jpeg differ diff --git a/install.sh b/install.sh new file mode 100644 index 0000000000000000000000000000000000000000..2c54722629fb7d5ce802359113b8632cff1a2aef --- /dev/null +++ b/install.sh @@ -0,0 +1,25 @@ +# Copyright (c) Meta Platforms, Inc. 
and affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + +# This Script Assumes Python 3.8.19, CUDA 12.1. Similar package versions might still work but they are not tested. + +conda deactivate + +# Set environment variables +export ENV_NAME=vfusion3d +export PYTHON_VERSION=3.8.19 +export CUDA_VERSION=12.1 + +# Create a new conda environment and activate it +conda create -n $ENV_NAME python=$PYTHON_VERSION +conda activate $ENV_NAME +conda install pytorch=2.3.0 torchvision==0.18.0 pytorch-cuda=$CUDA_VERSION -c pytorch -c nvidia +pip install transformers +pip install imageio[ffmpeg] +pip install PyMCubes +pip install trimesh +pip install rembg[gpu,cli] +pip install kiui \ No newline at end of file diff --git a/lrm/__init__.py b/lrm/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..75ec8635435eb80f60bbe4cfe48c7c3239b3466e --- /dev/null +++ b/lrm/__init__.py @@ -0,0 +1,5 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. \ No newline at end of file diff --git a/lrm/cam_utils.py b/lrm/cam_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..1f11c7a909dcd2f5afe8c7ff7bffc063f7ffeafd --- /dev/null +++ b/lrm/cam_utils.py @@ -0,0 +1,138 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + + +import torch +import numpy as np +import math + +""" +R: (N, 3, 3) +T: (N, 3) +E: (N, 4, 4) +vector: (N, 3) +""" + + +def compose_extrinsic_R_T(R: torch.Tensor, T: torch.Tensor): + """ + Compose the standard form extrinsic matrix from R and T. + Batched I/O. + """ + RT = torch.cat((R, T.unsqueeze(-1)), dim=-1) + return compose_extrinsic_RT(RT) + + +def compose_extrinsic_RT(RT: torch.Tensor): + """ + Compose the standard form extrinsic matrix from RT. + Batched I/O. + """ + return torch.cat([ + RT, + torch.tensor([[[0, 0, 0, 1]]], dtype=torch.float32).repeat(RT.shape[0], 1, 1).to(RT.device) + ], dim=1) + + +def decompose_extrinsic_R_T(E: torch.Tensor): + """ + Decompose the standard extrinsic matrix into R and T. + Batched I/O. + """ + RT = decompose_extrinsic_RT(E) + return RT[:, :, :3], RT[:, :, 3] + + +def decompose_extrinsic_RT(E: torch.Tensor): + """ + Decompose the standard extrinsic matrix into RT. + Batched I/O. 
+ """ + return E[:, :3, :] + + +def get_normalized_camera_intrinsics(intrinsics: torch.Tensor): + """ + intrinsics: (N, 3, 2), [[fx, fy], [cx, cy], [width, height]] + Return batched fx, fy, cx, cy + """ + fx, fy = intrinsics[:, 0, 0], intrinsics[:, 0, 1] + cx, cy = intrinsics[:, 1, 0], intrinsics[:, 1, 1] + width, height = intrinsics[:, 2, 0], intrinsics[:, 2, 1] + fx, fy = fx / width, fy / height + cx, cy = cx / width, cy / height + return fx, fy, cx, cy + + +def build_camera_principle(RT: torch.Tensor, intrinsics: torch.Tensor): + """ + RT: (N, 3, 4) + intrinsics: (N, 3, 2), [[fx, fy], [cx, cy], [width, height]] + """ + fx, fy, cx, cy = get_normalized_camera_intrinsics(intrinsics) + return torch.cat([ + RT.reshape(-1, 12), + fx.unsqueeze(-1), fy.unsqueeze(-1), cx.unsqueeze(-1), cy.unsqueeze(-1), + ], dim=-1) + + +def build_camera_standard(RT: torch.Tensor, intrinsics: torch.Tensor): + """ + RT: (N, 3, 4) + intrinsics: (N, 3, 2), [[fx, fy], [cx, cy], [width, height]] + """ + E = compose_extrinsic_RT(RT) + fx, fy, cx, cy = get_normalized_camera_intrinsics(intrinsics) + I = torch.stack([ + torch.stack([fx, torch.zeros_like(fx), cx], dim=-1), + torch.stack([torch.zeros_like(fy), fy, cy], dim=-1), + torch.tensor([[0, 0, 1]], dtype=torch.float32, device=RT.device).repeat(RT.shape[0], 1), + ], dim=1) + return torch.cat([ + E.reshape(-1, 16), + I.reshape(-1, 9), + ], dim=-1) + + +def center_looking_at_camera_pose(camera_position: torch.Tensor, look_at: torch.Tensor = None, up_world: torch.Tensor = None): + """ + camera_position: (M, 3) + look_at: (3) + up_world: (3) + return: (M, 3, 4) + """ + # by default, looking at the origin and world up is pos-z + if look_at is None: + look_at = torch.tensor([0, 0, 0], dtype=torch.float32) + if up_world is None: + up_world = torch.tensor([0, 0, 1], dtype=torch.float32) + look_at = look_at.unsqueeze(0).repeat(camera_position.shape[0], 1) + up_world = up_world.unsqueeze(0).repeat(camera_position.shape[0], 1) + + z_axis = camera_position - look_at + z_axis = z_axis / z_axis.norm(dim=-1, keepdim=True) + x_axis = torch.cross(up_world, z_axis) + x_axis = x_axis / x_axis.norm(dim=-1, keepdim=True) + y_axis = torch.cross(z_axis, x_axis) + y_axis = y_axis / y_axis.norm(dim=-1, keepdim=True) + extrinsics = torch.stack([x_axis, y_axis, z_axis, camera_position], dim=-1) + return extrinsics + +def get_surrounding_views(M, radius, elevation): +# convert spherical coordinates (radius, azimuth, elevation) to Cartesian coordinates (x, y, z). 
+ camera_positions = [] + rand_theta= np.random.uniform(0, np.pi/180) + elevation = math.radians(elevation) + for i in range(M): + theta = 2 * math.pi * i / M + rand_theta + x = radius * math.cos(theta) * math.cos(elevation) + y = radius * math.sin(theta) * math.cos(elevation) + z = radius * math.sin(elevation) + camera_positions.append([x, y, z]) + camera_positions = torch.tensor(camera_positions, dtype=torch.float32) + extrinsics = center_looking_at_camera_pose(camera_positions) + + return extrinsics diff --git a/lrm/inferrer.py b/lrm/inferrer.py new file mode 100644 index 0000000000000000000000000000000000000000..b9a0b39ea60b0f33e261feceab2b62ca924508bb --- /dev/null +++ b/lrm/inferrer.py @@ -0,0 +1,232 @@ +import torch +import math +import os +import imageio +import mcubes +import trimesh +import numpy as np +import argparse +from torchvision.utils import save_image +from PIL import Image +import glob +from .models.generator import LRMGenerator # Make sure this import is correct +from .cam_utils import build_camera_principle, build_camera_standard, center_looking_at_camera_pose # Make sure this import is correct +from functools import partial +from rembg import remove, new_session +from kiui.op import recenter +import kiui + +class LRMInferrer: + def __init__(self, model_name: str, resume: str): + print("Initializing LRMInferrer") + self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') + _model_kwargs = {'camera_embed_dim': 1024, 'rendering_samples_per_ray': 128, 'transformer_dim': 1024, 'transformer_layers': 16, 'transformer_heads': 16, 'triplane_low_res': 32, 'triplane_high_res': 64, 'triplane_dim': 80, 'encoder_freeze': False} + + self.model = self._build_model(_model_kwargs).eval().to(self.device) + checkpoint = torch.load(resume, map_location='cpu') + state_dict = checkpoint['model_state_dict'] + self.model.load_state_dict(state_dict) + del checkpoint, state_dict + torch.cuda.empty_cache() + + def __enter__(self): + print("Entering context") + return self + + def __exit__(self, exc_type, exc_val, exc_tb): + print("Exiting context") + if exc_type: + print(f"Exception type: {exc_type}") + print(f"Exception value: {exc_val}") + print(f"Traceback: {exc_tb}") + + def _build_model(self, model_kwargs): + print("Building model") + model = LRMGenerator(**model_kwargs).to(self.device) + print("Loaded model from checkpoint") + return model + + @staticmethod + def get_surrounding_views(M, radius, elevation): + camera_positions = [] + rand_theta = np.random.uniform(0, np.pi/180) + elevation = math.radians(elevation) + for i in range(M): + theta = 2 * math.pi * i / M + rand_theta + x = radius * math.cos(theta) * math.cos(elevation) + y = radius * math.sin(theta) * math.cos(elevation) + z = radius * math.sin(elevation) + camera_positions.append([x, y, z]) + camera_positions = torch.tensor(camera_positions, dtype=torch.float32) + extrinsics = center_looking_at_camera_pose(camera_positions) + return extrinsics + + @staticmethod + def _default_intrinsics(): + fx = fy = 384 + cx = cy = 256 + w = h = 512 + intrinsics = torch.tensor([ + [fx, fy], + [cx, cy], + [w, h], + ], dtype=torch.float32) + return intrinsics + + def _default_source_camera(self, batch_size: int = 1): + dist_to_center = 1.5 + canonical_camera_extrinsics = torch.tensor([[ + [0, 0, 1, 1], + [1, 0, 0, 0], + [0, 1, 0, 0], + ]], dtype=torch.float32) + canonical_camera_intrinsics = self._default_intrinsics().unsqueeze(0) + source_camera = build_camera_principle(canonical_camera_extrinsics, 
canonical_camera_intrinsics) + return source_camera.repeat(batch_size, 1) + + def _default_render_cameras(self, batch_size: int = 1): + render_camera_extrinsics = self.get_surrounding_views(160, 1.5, 0) + render_camera_intrinsics = self._default_intrinsics().unsqueeze(0).repeat(render_camera_extrinsics.shape[0], 1, 1) + render_cameras = build_camera_standard(render_camera_extrinsics, render_camera_intrinsics) + return render_cameras.unsqueeze(0).repeat(batch_size, 1, 1) + + @staticmethod + def images_to_video(images, output_path, fps, verbose=False): + os.makedirs(os.path.dirname(output_path), exist_ok=True) + frames = [] + for i in range(images.shape[0]): + frame = (images[i].permute(1, 2, 0).cpu().numpy() * 255).astype(np.uint8) + assert frame.shape[0] == images.shape[2] and frame.shape[1] == images.shape[3], \ + f"Frame shape mismatch: {frame.shape} vs {images.shape}" + assert frame.min() >= 0 and frame.max() <= 255, \ + f"Frame value out of range: {frame.min()} ~ {frame.max()}" + frames.append(frame) + imageio.mimwrite(output_path, np.stack(frames), fps=fps) + if verbose: + print(f"Saved video to {output_path}") + + def infer_single(self, image: torch.Tensor, render_size: int, mesh_size: int, export_video: bool, export_mesh: bool): + print("infer_single called") + mesh_thres = 1.0 + chunk_size = 2 + batch_size = 1 + + source_camera = self._default_source_camera(batch_size).to(self.device) + render_cameras = self._default_render_cameras(batch_size).to(self.device) + + with torch.no_grad(): + planes = self.model.forward(image, source_camera) + results = {} + + if export_video: + print("Starting export_video") + frames = [] + for i in range(0, render_cameras.shape[1], chunk_size): + print(f"Processing chunk {i} to {i + chunk_size}") + frames.append( + self.model.synthesizer( + planes, + render_cameras[:, i:i+chunk_size], + render_size, + render_size, + 0, + 0 + ) + ) + frames = { + k: torch.cat([r[k] for r in frames], dim=1) + for k in frames[0].keys() + } + results.update({ + 'frames': frames, + }) + print("Finished export_video") + + if export_mesh: + print("Starting export_mesh") + grid_out = self.model.synthesizer.forward_grid( + planes=planes, + grid_size=mesh_size, + ) + vtx, faces = mcubes.marching_cubes(grid_out['sigma'].float().squeeze(0).squeeze(-1).cpu().numpy(), mesh_thres) + vtx = vtx / (mesh_size - 1) * 2 - 1 + vtx_tensor = torch.tensor(vtx, dtype=torch.float32, device=self.device).unsqueeze(0) + vtx_colors = self.model.synthesizer.forward_points(planes, vtx_tensor)['rgb'].float().squeeze(0).cpu().numpy() + vtx_colors = (vtx_colors * 255).astype(np.uint8) + mesh = trimesh.Trimesh(vertices=vtx, faces=faces, vertex_colors=vtx_colors) + results.update({ + 'mesh': mesh, + }) + print("Finished export_mesh") + + return results + + def infer(self, source_image: str, dump_path: str, source_size: int, render_size: int, mesh_size: int, export_video: bool, export_mesh: bool): + print("infer called") + session = new_session("isnet-general-use") + rembg_remove = partial(remove, session=session) + image_name = os.path.basename(source_image) + uid = image_name.split('.')[0] + + image = kiui.read_image(source_image, mode='uint8') + image = rembg_remove(image) + mask = rembg_remove(image, only_mask=True) + image = recenter(image, mask, border_ratio=0.20) + os.makedirs(dump_path, exist_ok=True) + + image = torch.tensor(np.array(image)).permute(2, 0, 1).unsqueeze(0) / 255.0 + if image.shape[1] == 4: + image = image[:, :3, ...] * image[:, 3:, ...] 
+ (1 - image[:, 3:, ...]) + image = torch.nn.functional.interpolate(image, size=(source_size, source_size), mode='bicubic', align_corners=True) + image = torch.clamp(image, 0, 1) + save_image(image, os.path.join(dump_path, f'{uid}.png')) + + results = self.infer_single( + image.cuda(), + render_size=render_size, + mesh_size=mesh_size, + export_video=export_video, + export_mesh=export_mesh, + ) + + if 'frames' in results: + renderings = results['frames'] + for k, v in renderings.items(): + if k == 'images_rgb': + self.images_to_video( + v[0], + os.path.join(dump_path, f'{uid}.mp4'), + fps=40, + ) + print(f"Export video success to {dump_path}") + + if 'mesh' in results: + mesh = results['mesh'] + mesh.export(os.path.join(dump_path, f'{uid}.obj'), 'obj') + +if __name__ == '__main__': + parser = argparse.ArgumentParser() + parser.add_argument('--model_name', type=str, default='lrm-base-obj-v1') + parser.add_argument('--source_path', type=str, default='./assets/cat.png') + parser.add_argument('--dump_path', type=str, default='./results/single_image') + parser.add_argument('--source_size', type=int, default=512) + parser.add_argument('--render_size', type=int, default=384) + parser.add_argument('--mesh_size', type=int, default=512) + parser.add_argument('--export_video', action='store_true') + parser.add_argument('--export_mesh', action='store_true') + parser.add_argument('--resume', type=str, required=True, help='Path to a checkpoint to resume training from') + args = parser.parse_args() + + with LRMInferrer(model_name=args.model_name, resume=args.resume) as inferrer: + with torch.autocast(device_type="cuda", cache_enabled=False, dtype=torch.float32): + print("Start inference for image:", args.source_path) + inferrer.infer( + source_image=args.source_path, + dump_path=args.dump_path, + source_size=args.source_size, + render_size=args.render_size, + mesh_size=args.mesh_size, + export_video=args.export_video, + export_mesh=args.export_mesh, + ) + print("Finished inference for image:", args.source_path) diff --git a/lrm/models/__init__.py b/lrm/models/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..75ec8635435eb80f60bbe4cfe48c7c3239b3466e --- /dev/null +++ b/lrm/models/__init__.py @@ -0,0 +1,5 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. \ No newline at end of file diff --git a/lrm/models/encoders/__init__.py b/lrm/models/encoders/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..75ec8635435eb80f60bbe4cfe48c7c3239b3466e --- /dev/null +++ b/lrm/models/encoders/__init__.py @@ -0,0 +1,5 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. \ No newline at end of file diff --git a/lrm/models/encoders/dino_wrapper2.py b/lrm/models/encoders/dino_wrapper2.py new file mode 100644 index 0000000000000000000000000000000000000000..0930568aaeae919551686c361c1446217f3892e8 --- /dev/null +++ b/lrm/models/encoders/dino_wrapper2.py @@ -0,0 +1,51 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. 
+ + +import torch.nn as nn +from transformers import ViTImageProcessor, ViTModel, AutoImageProcessor, AutoModel, Dinov2Model + +class DinoWrapper(nn.Module): + """ + Dino v1 wrapper using huggingface transformer implementation. + """ + def __init__(self, model_name: str, freeze: bool = True): + super().__init__() + self.model, self.processor = self._build_dino(model_name) + if freeze: + self._freeze() + + def forward(self, image): + # image: [N, C, H, W], on cpu + # RGB image with [0,1] scale and properly sized + inputs = self.processor(images=image.float(), return_tensors="pt", do_rescale=False, do_resize=False).to(self.model.device) + # This resampling of positional embedding uses bicubic interpolation + outputs = self.model(**inputs) + last_hidden_states = outputs.last_hidden_state + return last_hidden_states + + def _freeze(self): + print(f"======== Freezing DinoWrapper ========") + self.model.eval() + for name, param in self.model.named_parameters(): + param.requires_grad = False + + @staticmethod + def _build_dino(model_name: str, proxy_error_retries: int = 3, proxy_error_cooldown: int = 5): + import requests + try: + processor = AutoImageProcessor.from_pretrained('facebook/dinov2-base') + processor.do_center_crop = False + model = AutoModel.from_pretrained('facebook/dinov2-base') + return model, processor + except requests.exceptions.ProxyError as err: + if proxy_error_retries > 0: + print(f"Huggingface ProxyError: Retrying in {proxy_error_cooldown} seconds...") + import time + time.sleep(proxy_error_cooldown) + return DinoWrapper._build_dino(model_name, proxy_error_retries - 1, proxy_error_cooldown) + else: + raise err diff --git a/lrm/models/generator.py b/lrm/models/generator.py new file mode 100644 index 0000000000000000000000000000000000000000..e2bafb574a05ca5f380e8b509fd915faddd40607 --- /dev/null +++ b/lrm/models/generator.py @@ -0,0 +1,87 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + + +import torch.nn as nn + +from .encoders.dino_wrapper2 import DinoWrapper +from .transformer import TriplaneTransformer +from .rendering.synthesizer_part import TriplaneSynthesizer + + +class CameraEmbedder(nn.Module): + """ + Embed camera features to a high-dimensional vector. + + Reference: + DiT: https://github.com/facebookresearch/DiT/blob/main/models.py#L27 + """ + def __init__(self, raw_dim: int, embed_dim: int): + super().__init__() + self.mlp = nn.Sequential( + nn.Linear(raw_dim, embed_dim), + nn.SiLU(), + nn.Linear(embed_dim, embed_dim), + ) + + def forward(self, x): + return self.mlp(x) + + +class LRMGenerator(nn.Module): + """ + Full model of the large reconstruction model. 
+ """ + def __init__(self, camera_embed_dim: int, rendering_samples_per_ray: int, + transformer_dim: int, transformer_layers: int, transformer_heads: int, + triplane_low_res: int, triplane_high_res: int, triplane_dim: int, + encoder_freeze: bool = True, encoder_model_name: str = 'facebook/dinov2-base', encoder_feat_dim: int = 768): + super().__init__() + + # attributes + self.encoder_feat_dim = encoder_feat_dim + self.camera_embed_dim = camera_embed_dim + + # modules + self.encoder = DinoWrapper( + model_name=encoder_model_name, + freeze=encoder_freeze, + ) + self.camera_embedder = CameraEmbedder( + raw_dim=12+4, embed_dim=camera_embed_dim, + ) + self.transformer = TriplaneTransformer( + inner_dim=transformer_dim, num_layers=transformer_layers, num_heads=transformer_heads, + image_feat_dim=encoder_feat_dim, + camera_embed_dim=camera_embed_dim, + triplane_low_res=triplane_low_res, triplane_high_res=triplane_high_res, triplane_dim=triplane_dim, + ) + self.synthesizer = TriplaneSynthesizer( + triplane_dim=triplane_dim, samples_per_ray=rendering_samples_per_ray, + ) + + def forward(self, image, camera): + # image: [N, C_img, H_img, W_img] + # camera: [N, D_cam_raw] + assert image.shape[0] == camera.shape[0], "Batch size mismatch for image and camera" + N = image.shape[0] + + # encode image + image_feats = self.encoder(image) + assert image_feats.shape[-1] == self.encoder_feat_dim, \ + f"Feature dimension mismatch: {image_feats.shape[-1]} vs {self.encoder_feat_dim}" + + # embed camera + camera_embeddings = self.camera_embedder(camera) + assert camera_embeddings.shape[-1] == self.camera_embed_dim, \ + f"Feature dimension mismatch: {camera_embeddings.shape[-1]} vs {self.camera_embed_dim}" + + # transformer generating planes + planes = self.transformer(image_feats, camera_embeddings) + assert planes.shape[0] == N, "Batch size mismatch for planes" + assert planes.shape[1] == 3, "Planes should have 3 channels" + return planes + diff --git a/lrm/models/rendering/__init__.py b/lrm/models/rendering/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..75ec8635435eb80f60bbe4cfe48c7c3239b3466e --- /dev/null +++ b/lrm/models/rendering/__init__.py @@ -0,0 +1,5 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. \ No newline at end of file diff --git a/lrm/models/rendering/synthesizer_part.py b/lrm/models/rendering/synthesizer_part.py new file mode 100644 index 0000000000000000000000000000000000000000..96f1c9b10e7304a09b919d8baee72beb8d1f70c1 --- /dev/null +++ b/lrm/models/rendering/synthesizer_part.py @@ -0,0 +1,194 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. + + +import itertools +import torch +import torch.nn as nn + +from .utils.renderer import ImportanceRenderer +from .utils.ray_sampler_part import RaySampler + + +class OSGDecoder(nn.Module): + """ + Triplane decoder that gives RGB and sigma values from sampled features. + Using ReLU here instead of Softplus in the original implementation. 
+ + Reference: + EG3D: https://github.com/NVlabs/eg3d/blob/main/eg3d/training/triplane.py#L112 + """ + def __init__(self, n_features: int, + hidden_dim: int = 64, num_layers: int = 4, activation: nn.Module = nn.ReLU): + super().__init__() + self.net = nn.Sequential( + nn.Linear(3 * n_features, hidden_dim), + activation(), + *itertools.chain(*[[ + nn.Linear(hidden_dim, hidden_dim), + activation(), + ] for _ in range(num_layers - 2)]), + nn.Linear(hidden_dim, 1 + 3), + ) + # init all bias to zero + for m in self.modules(): + if isinstance(m, nn.Linear): + nn.init.zeros_(m.bias) + + def forward(self, sampled_features, ray_directions): + # Aggregate features by mean + # sampled_features = sampled_features.mean(1) + # Aggregate features by concatenation + _N, n_planes, _M, _C = sampled_features.shape + sampled_features = sampled_features.permute(0, 2, 1, 3).reshape(_N, _M, n_planes*_C) + x = sampled_features + + N, M, C = x.shape + x = x.contiguous().view(N*M, C) + + x = self.net(x) + x = x.view(N, M, -1) + rgb = torch.sigmoid(x[..., 1:])*(1 + 2*0.001) - 0.001 # Uses sigmoid clamping from MipNeRF + sigma = x[..., 0:1] + + return {'rgb': rgb, 'sigma': sigma} + + +class TriplaneSynthesizer(nn.Module): + """ + Synthesizer that renders a triplane volume with planes and a camera. + + Reference: + EG3D: https://github.com/NVlabs/eg3d/blob/main/eg3d/training/triplane.py#L19 + """ + + DEFAULT_RENDERING_KWARGS = { + 'ray_start': 'auto', + 'ray_end': 'auto', + 'box_warp': 2., + 'white_back': True, + 'disparity_space_sampling': False, + 'clamp_mode': 'softplus', + 'sampler_bbox_min': -1., + 'sampler_bbox_max': 1., + } + + def __init__(self, triplane_dim: int, samples_per_ray: int): + super().__init__() + + # attributes + self.triplane_dim = triplane_dim + self.rendering_kwargs = { + **self.DEFAULT_RENDERING_KWARGS, + 'depth_resolution': samples_per_ray // 2, + 'depth_resolution_importance': samples_per_ray // 2, + } + + # renderings + self.renderer = ImportanceRenderer() + self.ray_sampler = RaySampler() + + # modules + self.decoder = OSGDecoder(n_features=triplane_dim) + + def forward(self, planes, cameras, render_size: int, crop_size: int, start_x: int, start_y:int): + # planes: (N, 3, D', H', W') + # cameras: (N, M, D_cam) + # render_size: int + assert planes.shape[0] == cameras.shape[0], "Batch size mismatch for planes and cameras" + N, M = cameras.shape[:2] + cam2world_matrix = cameras[..., :16].view(N, M, 4, 4) + intrinsics = cameras[..., 16:25].view(N, M, 3, 3) + + # Create a batch of rays for volume rendering + ray_origins, ray_directions = self.ray_sampler( + cam2world_matrix=cam2world_matrix.reshape(-1, 4, 4), + intrinsics=intrinsics.reshape(-1, 3, 3), + render_size=render_size, + crop_size = crop_size, + start_x = start_x, + start_y = start_y + ) + assert N*M == ray_origins.shape[0], "Batch size mismatch for ray_origins" + assert ray_origins.dim() == 3, "ray_origins should be 3-dimensional" + # Perform volume rendering + rgb_samples, depth_samples, weights_samples = self.renderer( + planes.repeat_interleave(M, dim=0), self.decoder, ray_origins, ray_directions, self.rendering_kwargs, + ) + + # Reshape into 'raw' neural-rendered image + Himg = Wimg = crop_size + rgb_images = rgb_samples.permute(0, 2, 1).reshape(N, M, rgb_samples.shape[-1], Himg, Wimg).contiguous() + depth_images = depth_samples.permute(0, 2, 1).reshape(N, M, 1, Himg, Wimg) + weight_images = weights_samples.permute(0, 2, 1).reshape(N, M, 1, Himg, Wimg) + + return { + 'images_rgb': rgb_images, + 'images_depth': depth_images, + 
'images_weight': weight_images, + } + + def forward_grid(self, planes, grid_size: int, aabb: torch.Tensor = None): + # planes: (N, 3, D', H', W') + # grid_size: int + # aabb: (N, 2, 3) + if aabb is None: + aabb = torch.tensor([ + [self.rendering_kwargs['sampler_bbox_min']] * 3, + [self.rendering_kwargs['sampler_bbox_max']] * 3, + ], device=planes.device, dtype=planes.dtype).unsqueeze(0).repeat(planes.shape[0], 1, 1) + assert planes.shape[0] == aabb.shape[0], "Batch size mismatch for planes and aabb" + N = planes.shape[0] + + # create grid points for triplane query + grid_points = [] + for i in range(N): + grid_points.append(torch.stack(torch.meshgrid( + torch.linspace(aabb[i, 0, 0], aabb[i, 1, 0], grid_size, device=planes.device), + torch.linspace(aabb[i, 0, 1], aabb[i, 1, 1], grid_size, device=planes.device), + torch.linspace(aabb[i, 0, 2], aabb[i, 1, 2], grid_size, device=planes.device), + indexing='ij', + ), dim=-1).reshape(-1, 3)) + cube_grid = torch.stack(grid_points, dim=0).to(planes.device) + + features = self.forward_points(planes, cube_grid) + + # reshape into grid + features = { + k: v.reshape(N, grid_size, grid_size, grid_size, -1) + for k, v in features.items() + } + return features + + def forward_points(self, planes, points: torch.Tensor, chunk_size: int = 2**20): + # planes: (N, 3, D', H', W') + # points: (N, P, 3) + N, P = points.shape[:2] + + # query triplane in chunks + outs = [] + for i in range(0, points.shape[1], chunk_size): + chunk_points = points[:, i:i+chunk_size] + + # query triplane + chunk_out = self.renderer.run_model_activated( + planes=planes, + decoder=self.decoder, + sample_coordinates=chunk_points, + sample_directions=torch.zeros_like(chunk_points), + options=self.rendering_kwargs, + ) + outs.append(chunk_out) + + # concatenate the outputs + point_features = { + k: torch.cat([out[k] for out in outs], dim=1) + for k in outs[0].keys() + } + + sig = point_features['sigma'] + print(sig.mean(), sig.max(), sig.min()) + return point_features diff --git a/lrm/models/rendering/utils/__init__.py b/lrm/models/rendering/utils/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..b433b3b1000654825bbfca2b174b3f04bfedacda --- /dev/null +++ b/lrm/models/rendering/utils/__init__.py @@ -0,0 +1,14 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. +# SPDX-FileCopyrightText: Copyright (c) 2021-2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: LicenseRef-NvidiaProprietary +# +# NVIDIA CORPORATION, its affiliates and licensors retain all intellectual +# property and proprietary rights in and to this material, related +# documentation and any modifications thereto. Any use, reproduction, +# disclosure or distribution of this material and related documentation +# without an express license agreement from NVIDIA CORPORATION or +# its affiliates is strictly prohibited. 
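Besides image rendering, the synthesizer above exposes two query paths: forward_grid evaluates the triplane on a dense voxel grid over the [-1, 1]^3 sampler bounding box, and forward_points evaluates arbitrary 3D points in chunks. A minimal sketch of how they combine into colored-mesh extraction, mirroring the steps LRMInferrer.infer_single performs earlier in this diff; `model` and `planes` are assumed to come from a forward pass of the generator, and the default sizes match the script's defaults:

# Illustrative sketch, not part of the diff: mesh extraction from a triplane.
import mcubes
import numpy as np
import torch
import trimesh

def triplane_to_mesh(model, planes, mesh_size: int = 512, mesh_thres: float = 1.0):
    with torch.no_grad():
        # dense sigma grid over the [-1, 1]^3 sampler bounding box
        grid_out = model.synthesizer.forward_grid(planes=planes, grid_size=mesh_size)
        sigma = grid_out['sigma'].float().squeeze(0).squeeze(-1).cpu().numpy()
        vtx, faces = mcubes.marching_cubes(sigma, mesh_thres)
        vtx = vtx / (mesh_size - 1) * 2 - 1  # voxel indices -> [-1, 1] coordinates
        vtx_tensor = torch.tensor(vtx, dtype=torch.float32, device=planes.device).unsqueeze(0)
        # query per-vertex colors from the same triplane (chunked internally by forward_points)
        rgb = model.synthesizer.forward_points(planes, vtx_tensor)['rgb']
        vtx_colors = (rgb.float().squeeze(0).cpu().numpy() * 255).astype(np.uint8)
    return trimesh.Trimesh(vertices=vtx, faces=faces, vertex_colors=vtx_colors)

The chunking inside forward_points (default 2**20 points per chunk) keeps peak memory bounded when the marching-cubes vertex count is large.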
diff --git a/lrm/models/rendering/utils/__pycache__/__init__.cpython-310.pyc b/lrm/models/rendering/utils/__pycache__/__init__.cpython-310.pyc new file mode 100644 index 0000000000000000000000000000000000000000..dafbaa5915ad722f43ccf61d08dafde1813645b1 Binary files /dev/null and b/lrm/models/rendering/utils/__pycache__/__init__.cpython-310.pyc differ diff --git a/lrm/models/rendering/utils/__pycache__/math_utils.cpython-310.pyc b/lrm/models/rendering/utils/__pycache__/math_utils.cpython-310.pyc new file mode 100644 index 0000000000000000000000000000000000000000..56f52be43ba55fe9c3114f19138b743b3b634f0e Binary files /dev/null and b/lrm/models/rendering/utils/__pycache__/math_utils.cpython-310.pyc differ diff --git a/lrm/models/rendering/utils/__pycache__/ray_marcher.cpython-310.pyc b/lrm/models/rendering/utils/__pycache__/ray_marcher.cpython-310.pyc new file mode 100644 index 0000000000000000000000000000000000000000..f9d185a23b791933262b745305713bf8fc9550f4 Binary files /dev/null and b/lrm/models/rendering/utils/__pycache__/ray_marcher.cpython-310.pyc differ diff --git a/lrm/models/rendering/utils/__pycache__/ray_sampler_part.cpython-310.pyc b/lrm/models/rendering/utils/__pycache__/ray_sampler_part.cpython-310.pyc new file mode 100644 index 0000000000000000000000000000000000000000..254a8f8ce8274d2f7c9e796f52a49d942596631f Binary files /dev/null and b/lrm/models/rendering/utils/__pycache__/ray_sampler_part.cpython-310.pyc differ diff --git a/lrm/models/rendering/utils/__pycache__/renderer.cpython-310.pyc b/lrm/models/rendering/utils/__pycache__/renderer.cpython-310.pyc new file mode 100644 index 0000000000000000000000000000000000000000..6188a0528e0513db931db9f27d8999a13ece3078 Binary files /dev/null and b/lrm/models/rendering/utils/__pycache__/renderer.cpython-310.pyc differ diff --git a/lrm/models/rendering/utils/math_utils.py b/lrm/models/rendering/utils/math_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..94c00e0cc6ff65817c71a336c0cdb694b0ff5b39 --- /dev/null +++ b/lrm/models/rendering/utils/math_utils.py @@ -0,0 +1,123 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. +# MIT License + +# Copyright (c) 2022 Petr Kellnhofer + +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: + +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. + +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. 
+ +import torch + +def transform_vectors(matrix: torch.Tensor, vectors4: torch.Tensor) -> torch.Tensor: + """ + Left-multiplies MxM @ NxM. Returns NxM. + """ + res = torch.matmul(vectors4, matrix.T) + return res + + +def normalize_vecs(vectors: torch.Tensor) -> torch.Tensor: + """ + Normalize vector lengths. + """ + return vectors / (torch.norm(vectors, dim=-1, keepdim=True)) + +def torch_dot(x: torch.Tensor, y: torch.Tensor): + """ + Dot product of two tensors. + """ + return (x * y).sum(-1) + + +def get_ray_limits_box(rays_o: torch.Tensor, rays_d: torch.Tensor, box_side_length): + """ + Author: Petr Kellnhofer + Intersects rays with the [-1, 1] NDC volume. + Returns min and max distance of entry. + Returns -1 for no intersection. + https://www.scratchapixel.com/lessons/3d-basic-rendering/minimal-ray-tracer-rendering-simple-shapes/ray-box-intersection + """ + o_shape = rays_o.shape + rays_o = rays_o.detach().reshape(-1, 3) + rays_d = rays_d.detach().reshape(-1, 3) + + + bb_min = [-1*(box_side_length/2), -1*(box_side_length/2), -1*(box_side_length/2)] + bb_max = [1*(box_side_length/2), 1*(box_side_length/2), 1*(box_side_length/2)] + bounds = torch.tensor([bb_min, bb_max], dtype=rays_o.dtype, device=rays_o.device) + is_valid = torch.ones(rays_o.shape[:-1], dtype=bool, device=rays_o.device) + + # Precompute inverse for stability. + invdir = 1 / rays_d + sign = (invdir < 0).long() + + # Intersect with YZ plane. + tmin = (bounds.index_select(0, sign[..., 0])[..., 0] - rays_o[..., 0]) * invdir[..., 0] + tmax = (bounds.index_select(0, 1 - sign[..., 0])[..., 0] - rays_o[..., 0]) * invdir[..., 0] + + # Intersect with XZ plane. + tymin = (bounds.index_select(0, sign[..., 1])[..., 1] - rays_o[..., 1]) * invdir[..., 1] + tymax = (bounds.index_select(0, 1 - sign[..., 1])[..., 1] - rays_o[..., 1]) * invdir[..., 1] + + # Resolve parallel rays. + is_valid[torch.logical_or(tmin > tymax, tymin > tmax)] = False + + # Use the shortest intersection. + tmin = torch.max(tmin, tymin) + tmax = torch.min(tmax, tymax) + + # Intersect with XY plane. + tzmin = (bounds.index_select(0, sign[..., 2])[..., 2] - rays_o[..., 2]) * invdir[..., 2] + tzmax = (bounds.index_select(0, 1 - sign[..., 2])[..., 2] - rays_o[..., 2]) * invdir[..., 2] + + # Resolve parallel rays. + is_valid[torch.logical_or(tmin > tzmax, tzmin > tmax)] = False + + # Use the shortest intersection. + tmin = torch.max(tmin, tzmin) + tmax = torch.min(tmax, tzmax) + + # Mark invalid. + tmin[torch.logical_not(is_valid)] = -1 + tmax[torch.logical_not(is_valid)] = -2 + + return tmin.reshape(*o_shape[:-1], 1), tmax.reshape(*o_shape[:-1], 1) + + +def linspace(start: torch.Tensor, stop: torch.Tensor, num: int): + """ + Creates a tensor of shape [num, *start.shape] whose values are evenly spaced from start to end, inclusive. + Replicates but the multi-dimensional bahaviour of numpy.linspace in PyTorch. 
+ """ + # create a tensor of 'num' steps from 0 to 1 + steps = torch.arange(num, dtype=torch.float32, device=start.device) / (num - 1) + + # reshape the 'steps' tensor to [-1, *([1]*start.ndim)] to allow for broadcastings + # - using 'steps.reshape([-1, *([1]*start.ndim)])' would be nice here but torchscript + # "cannot statically infer the expected size of a list in this contex", hence the code below + for i in range(start.ndim): + steps = steps.unsqueeze(-1) + + # the output starts at 'start' and increments until 'stop' in each dimension + out = start[None] + steps * (stop - start)[None] + + return out diff --git a/lrm/models/rendering/utils/ray_marcher.py b/lrm/models/rendering/utils/ray_marcher.py new file mode 100644 index 0000000000000000000000000000000000000000..31cb6300b7f5dea91317524e54301b3f167f3311 --- /dev/null +++ b/lrm/models/rendering/utils/ray_marcher.py @@ -0,0 +1,73 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. +# SPDX-FileCopyrightText: Copyright (c) 2021-2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: LicenseRef-NvidiaProprietary +# +# NVIDIA CORPORATION, its affiliates and licensors retain all intellectual +# property and proprietary rights in and to this material, related +# documentation and any modifications thereto. Any use, reproduction, +# disclosure or distribution of this material and related documentation +# without an express license agreement from NVIDIA CORPORATION or +# its affiliates is strictly prohibited. +# +# Modified by Zexin He +# The modifications are subject to the same license as the original. + + +""" +The ray marcher takes the raw output of the implicit representation and uses the volume rendering equation to produce composited colors and depths. +Based off of the implementation in MipNeRF (this one doesn't do any cone tracing though!) 
+""" + +import torch +import torch.nn as nn + + +class MipRayMarcher2(nn.Module): + def __init__(self, activation_factory): + super().__init__() + self.activation_factory = activation_factory + + def run_forward(self, colors, densities, depths, rendering_options): + + deltas = depths[:, :, 1:] - depths[:, :, :-1] + colors_mid = (colors[:, :, :-1] + colors[:, :, 1:]) / 2 + densities_mid = (densities[:, :, :-1] + densities[:, :, 1:]) / 2 + depths_mid = (depths[:, :, :-1] + depths[:, :, 1:]) / 2 + + + + # using factory mode for better usability + densities_mid = self.activation_factory(rendering_options)(densities_mid) + + density_delta = densities_mid * deltas + + alpha = 1 - torch.exp(-density_delta) + + alpha_shifted = torch.cat([torch.ones_like(alpha[:, :, :1]), 1-alpha + 1e-10], -2) + weights = alpha * torch.cumprod(alpha_shifted, -2)[:, :, :-1] + + composite_rgb = torch.sum(weights * colors_mid, -2) + weight_total = weights.sum(2) + composite_depth = torch.sum(weights * depths_mid, -2) / weight_total + + # clip the composite to min/max range of depths + composite_depth = torch.nan_to_num(composite_depth, float('inf')) + composite_depth = torch.clamp(composite_depth, torch.min(depths), torch.max(depths)) + + if rendering_options.get('white_back', False): + composite_rgb = composite_rgb + 1 - weight_total + + # rendered value scale is 0-1, comment out original mipnerf scaling + # composite_rgb = composite_rgb * 2 - 1 # Scale to (-1, 1) + + return composite_rgb, composite_depth, weights + + + def forward(self, colors, densities, depths, rendering_options): + composite_rgb, composite_depth, weights = self.run_forward(colors, densities, depths, rendering_options) + + return composite_rgb, composite_depth, weights diff --git a/lrm/models/rendering/utils/ray_sampler_part.py b/lrm/models/rendering/utils/ray_sampler_part.py new file mode 100644 index 0000000000000000000000000000000000000000..2ba8a9676ac41ae79bb0aaae4e9988fe8c37b547 --- /dev/null +++ b/lrm/models/rendering/utils/ray_sampler_part.py @@ -0,0 +1,94 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. +# SPDX-FileCopyrightText: Copyright (c) 2021-2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: LicenseRef-NvidiaProprietary +# +# NVIDIA CORPORATION, its affiliates and licensors retain all intellectual +# property and proprietary rights in and to this material, related +# documentation and any modifications thereto. Any use, reproduction, +# disclosure or distribution of this material and related documentation +# without an express license agreement from NVIDIA CORPORATION or +# its affiliates is strictly prohibited. +# +# Modified by Zexin He +# The modifications are subject to the same license as the original. + + +""" +The ray sampler is a module that takes in camera matrices and resolution and batches of rays. +Expects cam2world matrices that use the OpenCV camera coordinate system conventions. +""" + +import torch + +class RaySampler(torch.nn.Module): + def __init__(self): + super().__init__() + self.ray_origins_h, self.ray_directions, self.depths, self.image_coords, self.rendering_options = None, None, None, None, None + + + def forward(self, cam2world_matrix, intrinsics, render_size, crop_size, start_x, start_y): + """ + Create batches of rays and return origins and directions. 
+ + cam2world_matrix: (N, 4, 4) + intrinsics: (N, 3, 3) + render_size: int + + ray_origins: (N, M, 3) + ray_dirs: (N, M, 2) + """ + + N, M = cam2world_matrix.shape[0], crop_size**2 + cam_locs_world = cam2world_matrix[:, :3, 3] + fx = intrinsics[:, 0, 0] + fy = intrinsics[:, 1, 1] + cx = intrinsics[:, 0, 2] + cy = intrinsics[:, 1, 2] + sk = intrinsics[:, 0, 1] + + uv = torch.stack(torch.meshgrid( + torch.arange(render_size, dtype=torch.float32, device=cam2world_matrix.device), + torch.arange(render_size, dtype=torch.float32, device=cam2world_matrix.device), + indexing='ij', + )) + if crop_size < render_size: + patch_uv = [] + for i in range(cam2world_matrix.shape[0]): + patch_uv.append(uv.clone()[None, :, start_y:start_y+crop_size, start_x:start_x+crop_size]) + uv = torch.cat(patch_uv, 0) + uv = uv.flip(1).reshape(cam2world_matrix.shape[0], 2, -1).transpose(2, 1) + else: + uv = uv.flip(0).reshape(2, -1).transpose(1, 0) + uv = uv.unsqueeze(0).repeat(cam2world_matrix.shape[0], 1, 1) + # uv = uv.unsqueeze(0).repeat(cam2world_matrix.shape[0], 1, 1) + # uv = uv.flip(1).reshape(cam2world_matrix.shape[0], 2, -1).transpose(2, 1) + x_cam = uv[:, :, 0].view(N, -1) * (1./render_size) + (0.5/render_size) + y_cam = uv[:, :, 1].view(N, -1) * (1./render_size) + (0.5/render_size) + z_cam = torch.ones((N, M), device=cam2world_matrix.device) + + x_lift = (x_cam - cx.unsqueeze(-1) + cy.unsqueeze(-1)*sk.unsqueeze(-1)/fy.unsqueeze(-1) - sk.unsqueeze(-1)*y_cam/fy.unsqueeze(-1)) / fx.unsqueeze(-1) * z_cam + y_lift = (y_cam - cy.unsqueeze(-1)) / fy.unsqueeze(-1) * z_cam + + cam_rel_points = torch.stack((x_lift, y_lift, z_cam, torch.ones_like(z_cam)), dim=-1).float() + + _opencv2blender = torch.tensor([ + [1, 0, 0, 0], + [0, -1, 0, 0], + [0, 0, -1, 0], + [0, 0, 0, 1], + ], dtype=torch.float32, device=cam2world_matrix.device).unsqueeze(0).repeat(N, 1, 1) + + # added float here + cam2world_matrix = torch.bmm(cam2world_matrix.float(), _opencv2blender.float()) + + world_rel_points = torch.bmm(cam2world_matrix.float(), cam_rel_points.permute(0, 2, 1)).permute(0, 2, 1)[:, :, :3] + + ray_dirs = world_rel_points - cam_locs_world[:, None, :] + ray_dirs = torch.nn.functional.normalize(ray_dirs, dim=2) + + ray_origins = cam_locs_world.unsqueeze(1).repeat(1, ray_dirs.shape[1], 1) + return ray_origins, ray_dirs \ No newline at end of file diff --git a/lrm/models/rendering/utils/renderer.py b/lrm/models/rendering/utils/renderer.py new file mode 100644 index 0000000000000000000000000000000000000000..0606e9273d4f26f02fa47e5735da58b4b5d25d9b --- /dev/null +++ b/lrm/models/rendering/utils/renderer.py @@ -0,0 +1,314 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. +# SPDX-FileCopyrightText: Copyright (c) 2021-2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: LicenseRef-NvidiaProprietary +# +# NVIDIA CORPORATION, its affiliates and licensors retain all intellectual +# property and proprietary rights in and to this material, related +# documentation and any modifications thereto. Any use, reproduction, +# disclosure or distribution of this material and related documentation +# without an express license agreement from NVIDIA CORPORATION or +# its affiliates is strictly prohibited. +# +# Modified by Zexin He +# The modifications are subject to the same license as the original. 
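The ray sampler above turns an OpenCV-convention cam2world matrix and normalized pinhole intrinsics into per-pixel ray origins and directions, optionally restricted to a crop window. A small sketch of calling it directly; the pose and intrinsics values are placeholders, since in the pipeline they come from the 25-D camera vector (16 cam2world entries plus 9 intrinsics entries) split by the synthesizer:

# Illustrative sketch, not part of the diff: driving RaySampler directly.
import torch
from lrm.models.rendering.utils.ray_sampler_part import RaySampler

sampler = RaySampler()
cam2world = torch.eye(4).unsqueeze(0)           # (1, 4, 4), OpenCV convention (placeholder pose)
cam2world[0, 2, 3] = -2.0                       # camera translated along -z (placeholder)
intrinsics = torch.tensor([[[0.75, 0.0, 0.5],
                            [0.0, 0.75, 0.5],
                            [0.0, 0.0, 1.0]]])  # normalized pinhole intrinsics (assumed values)
origins, dirs = sampler(cam2world, intrinsics,
                        render_size=64, crop_size=64, start_x=0, start_y=0)
print(origins.shape, dirs.shape)                # both (1, 64*64, 3)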
+ + +""" +The renderer is a module that takes in rays, decides where to sample along each +ray, and computes pixel colors using the volume rendering equation. +""" + +import torch +import torch.nn as nn +import torch.nn.functional as F + +from .ray_marcher import MipRayMarcher2 +from . import math_utils + +def generate_planes(): + """ + Defines planes by the three vectors that form the "axes" of the + plane. Should work with arbitrary number of planes and planes of + arbitrary orientation. + + Bugfix reference: https://github.com/NVlabs/eg3d/issues/67 + """ + return torch.tensor([[[1, 0, 0], + [0, 1, 0], + [0, 0, 1]], + [[1, 0, 0], + [0, 0, 1], + [0, 1, 0]], + [[0, 0, 1], + [0, 1, 0], + [1, 0, 0]]], dtype=torch.float32) + +def project_onto_planes(planes, coordinates): + """ + Does a projection of a 3D point onto a batch of 2D planes, + returning 2D plane coordinates. + + Takes plane axes of shape n_planes, 3, 3 + # Takes coordinates of shape N, M, 3 + # returns projections of shape N*n_planes, M, 2 + """ + N, M, C = coordinates.shape + n_planes, _, _ = planes.shape + coordinates = coordinates.unsqueeze(1).expand(-1, n_planes, -1, -1).reshape(N*n_planes, M, 3) + inv_planes = torch.linalg.inv(planes).unsqueeze(0).expand(N, -1, -1, -1).reshape(N*n_planes, 3, 3) + coordinates = coordinates.to(inv_planes.device) + projections = torch.bmm(coordinates, inv_planes) + return projections[..., :2] + +def sample_from_planes(plane_axes, plane_features, coordinates, mode='bilinear', padding_mode='zeros', box_warp=None): + assert padding_mode == 'zeros' + N, n_planes, C, H, W = plane_features.shape + _, M, _ = coordinates.shape + plane_features = plane_features.view(N*n_planes, C, H, W) + + coordinates = (2/box_warp) * coordinates # add specific box bounds + # half added here + projected_coordinates = project_onto_planes(plane_axes, coordinates).unsqueeze(1) + # removed float from projected_coordinates + output_features = torch.nn.functional.grid_sample(plane_features.float(), projected_coordinates.float(), mode=mode, padding_mode=padding_mode, align_corners=False).permute(0, 3, 2, 1).reshape(N, n_planes, M, C) + return output_features + +def sample_from_3dgrid(grid, coordinates): + """ + Expects coordinates in shape (batch_size, num_points_per_batch, 3) + Expects grid in shape (1, channels, H, W, D) + (Also works if grid has batch size) + Returns sampled features of shape (batch_size, num_points_per_batch, feature_channels) + """ + batch_size, n_coords, n_dims = coordinates.shape + sampled_features = torch.nn.functional.grid_sample(grid.expand(batch_size, -1, -1, -1, -1), + coordinates.reshape(batch_size, 1, 1, -1, n_dims), + mode='bilinear', padding_mode='zeros', align_corners=False) + N, C, H, W, D = sampled_features.shape + sampled_features = sampled_features.permute(0, 4, 3, 2, 1).reshape(N, H*W*D, C) + return sampled_features + +class ImportanceRenderer(torch.nn.Module): + """ + Modified original version to filter out-of-box samples as TensoRF does. 
+ + Reference: + TensoRF: https://github.com/apchenstu/TensoRF/blob/main/models/tensorBase.py#L277 + """ + def __init__(self): + super().__init__() + self.activation_factory = self._build_activation_factory() + self.ray_marcher = MipRayMarcher2(self.activation_factory) + self.plane_axes = generate_planes() + + def _build_activation_factory(self): + def activation_factory(options: dict): + if options['clamp_mode'] == 'softplus': + return lambda x: F.softplus(x - 1) # activation bias of -1 makes things initialize better + else: + assert False, "Renderer only supports `clamp_mode`=`softplus`!" + return activation_factory + + def _forward_pass(self, depths: torch.Tensor, ray_directions: torch.Tensor, ray_origins: torch.Tensor, + planes: torch.Tensor, decoder: nn.Module, rendering_options: dict): + """ + Additional filtering is applied to filter out-of-box samples. + Modifications made by Zexin He. + """ + + # context related variables + batch_size, num_rays, samples_per_ray, _ = depths.shape + device = planes.device + depths = depths.to(device) + ray_directions = ray_directions.to(device) + ray_origins = ray_origins.to(device) + # define sample points with depths + sample_directions = ray_directions.unsqueeze(-2).expand(-1, -1, samples_per_ray, -1).reshape(batch_size, -1, 3) + sample_coordinates = (ray_origins.unsqueeze(-2) + depths * ray_directions.unsqueeze(-2)).reshape(batch_size, -1, 3) + + # filter out-of-box samples + mask_inbox = \ + (rendering_options['sampler_bbox_min'] <= sample_coordinates) & \ + (sample_coordinates <= rendering_options['sampler_bbox_max']) + mask_inbox = mask_inbox.all(-1) + + # forward model according to all samples + _out = self.run_model(planes, decoder, sample_coordinates, sample_directions, rendering_options) + + # set out-of-box samples to zeros(rgb) & -inf(sigma) + SAFE_GUARD = 3 + DATA_TYPE = _out['sigma'].dtype + colors_pass = torch.zeros(batch_size, num_rays * samples_per_ray, 3, device=device, dtype=DATA_TYPE) + densities_pass = torch.nan_to_num(torch.full((batch_size, num_rays * samples_per_ray, 1), -float('inf'), device=device, dtype=DATA_TYPE)) / SAFE_GUARD + colors_pass[mask_inbox], densities_pass[mask_inbox] = _out['rgb'][mask_inbox], _out['sigma'][mask_inbox] + + # reshape back + colors_pass = colors_pass.reshape(batch_size, num_rays, samples_per_ray, colors_pass.shape[-1]) + densities_pass = densities_pass.reshape(batch_size, num_rays, samples_per_ray, densities_pass.shape[-1]) + + return colors_pass, densities_pass + + def forward(self, planes, decoder, ray_origins, ray_directions, rendering_options): + # self.plane_axes = self.plane_axes.to(ray_origins.device) + + if rendering_options['ray_start'] == rendering_options['ray_end'] == 'auto': + ray_start, ray_end = math_utils.get_ray_limits_box(ray_origins, ray_directions, box_side_length=rendering_options['box_warp']) + is_ray_valid = ray_end > ray_start + if torch.any(is_ray_valid).item(): + ray_start[~is_ray_valid] = ray_start[is_ray_valid].min() + ray_end[~is_ray_valid] = ray_start[is_ray_valid].max() + depths_coarse = self.sample_stratified(ray_origins, ray_start, ray_end, rendering_options['depth_resolution'], rendering_options['disparity_space_sampling']) + else: + # Create stratified depth samples + depths_coarse = self.sample_stratified(ray_origins, rendering_options['ray_start'], rendering_options['ray_end'], rendering_options['depth_resolution'], rendering_options['disparity_space_sampling']) + + depths_coarse = depths_coarse.to(planes.device) + + # Coarse Pass + colors_coarse, 
densities_coarse = self._forward_pass( + depths=depths_coarse, ray_directions=ray_directions, ray_origins=ray_origins, + planes=planes, decoder=decoder, rendering_options=rendering_options) + + # Fine Pass + N_importance = rendering_options['depth_resolution_importance'] + if N_importance > 0: + _, _, weights = self.ray_marcher(colors_coarse, densities_coarse, depths_coarse, rendering_options) + + depths_fine = self.sample_importance(depths_coarse, weights, N_importance) + + colors_fine, densities_fine = self._forward_pass( + depths=depths_fine, ray_directions=ray_directions, ray_origins=ray_origins, + planes=planes, decoder=decoder, rendering_options=rendering_options) + + all_depths, all_colors, all_densities = self.unify_samples(depths_coarse, colors_coarse, densities_coarse, + depths_fine, colors_fine, densities_fine) + + # Aggregate + rgb_final, depth_final, weights = self.ray_marcher(all_colors, all_densities, all_depths, rendering_options) + else: + rgb_final, depth_final, weights = self.ray_marcher(colors_coarse, densities_coarse, depths_coarse, rendering_options) + + return rgb_final, depth_final, weights.sum(2) + + def run_model(self, planes, decoder, sample_coordinates, sample_directions, options): + plane_axes = self.plane_axes.to(planes.device) + sampled_features = sample_from_planes(plane_axes, planes, sample_coordinates, padding_mode='zeros', box_warp=options['box_warp']) + + out = decoder(sampled_features, sample_directions) + if options.get('density_noise', 0) > 0: + out['sigma'] += torch.randn_like(out['sigma']) * options['density_noise'] + return out + + def run_model_activated(self, planes, decoder, sample_coordinates, sample_directions, options): + out = self.run_model(planes, decoder, sample_coordinates, sample_directions, options) + out['sigma'] = self.activation_factory(options)(out['sigma']) + return out + + def sort_samples(self, all_depths, all_colors, all_densities): + _, indices = torch.sort(all_depths, dim=-2) + all_depths = torch.gather(all_depths, -2, indices) + all_colors = torch.gather(all_colors, -2, indices.expand(-1, -1, -1, all_colors.shape[-1])) + all_densities = torch.gather(all_densities, -2, indices.expand(-1, -1, -1, 1)) + return all_depths, all_colors, all_densities + + def unify_samples(self, depths1, colors1, densities1, depths2, colors2, densities2): + all_depths = torch.cat([depths1, depths2], dim = -2) + all_colors = torch.cat([colors1, colors2], dim = -2) + all_densities = torch.cat([densities1, densities2], dim = -2) + + _, indices = torch.sort(all_depths, dim=-2) + all_depths = torch.gather(all_depths, -2, indices) + all_colors = torch.gather(all_colors, -2, indices.expand(-1, -1, -1, all_colors.shape[-1])) + all_densities = torch.gather(all_densities, -2, indices.expand(-1, -1, -1, 1)) + + return all_depths, all_colors, all_densities + + def sample_stratified(self, ray_origins, ray_start, ray_end, depth_resolution, disparity_space_sampling=False): + """ + Return depths of approximately uniformly spaced samples along rays. + """ + N, M, _ = ray_origins.shape + if disparity_space_sampling: + depths_coarse = torch.linspace(0, + 1, + depth_resolution, + device=ray_origins.device).reshape(1, 1, depth_resolution, 1).repeat(N, M, 1, 1) + depth_delta = 1/(depth_resolution - 1) + depths_coarse += torch.rand_like(depths_coarse) * depth_delta + depths_coarse = 1./(1./ray_start * (1. 
- depths_coarse) + 1./ray_end * depths_coarse) + else: + if type(ray_start) == torch.Tensor: + depths_coarse = math_utils.linspace(ray_start, ray_end, depth_resolution).permute(1,2,0,3) + depth_delta = (ray_end - ray_start) / (depth_resolution - 1) + depths_coarse += torch.rand_like(depths_coarse) * depth_delta[..., None] + else: + depths_coarse = torch.linspace(ray_start, ray_end, depth_resolution, device=ray_origins.device).reshape(1, 1, depth_resolution, 1).repeat(N, M, 1, 1) + depth_delta = (ray_end - ray_start)/(depth_resolution - 1) + depths_coarse += torch.rand_like(depths_coarse) * depth_delta + + return depths_coarse + + def sample_importance(self, z_vals, weights, N_importance): + """ + Return depths of importance sampled points along rays. See NeRF importance sampling for more. + """ + with torch.no_grad(): + batch_size, num_rays, samples_per_ray, _ = z_vals.shape + + z_vals = z_vals.reshape(batch_size * num_rays, samples_per_ray) + weights = weights.reshape(batch_size * num_rays, -1) # -1 to account for loss of 1 sample in MipRayMarcher + + # smooth weights + weights = torch.nn.functional.max_pool1d(weights.unsqueeze(1).float(), 2, 1, padding=1) + weights = torch.nn.functional.avg_pool1d(weights, 2, 1).squeeze() + weights = weights + 0.01 + + z_vals_mid = 0.5 * (z_vals[: ,:-1] + z_vals[: ,1:]) + importance_z_vals = self.sample_pdf(z_vals_mid, weights[:, 1:-1], + N_importance).detach().reshape(batch_size, num_rays, N_importance, 1) + return importance_z_vals + + def sample_pdf(self, bins, weights, N_importance, det=False, eps=1e-5): + """ + Sample @N_importance samples from @bins with distribution defined by @weights. + Inputs: + bins: (N_rays, N_samples_+1) where N_samples_ is "the number of coarse samples per ray - 2" + weights: (N_rays, N_samples_) + N_importance: the number of samples to draw from the distribution + det: deterministic or not + eps: a small number to prevent division by zero + Outputs: + samples: the sampled samples + """ + N_rays, N_samples_ = weights.shape + weights = weights + eps # prevent division by zero (don't do inplace op!) 
+ pdf = weights / torch.sum(weights, -1, keepdim=True) # (N_rays, N_samples_) + cdf = torch.cumsum(pdf, -1) # (N_rays, N_samples), cumulative distribution function + cdf = torch.cat([torch.zeros_like(cdf[: ,:1]), cdf], -1) # (N_rays, N_samples_+1) + # padded to 0~1 inclusive + + if det: + u = torch.linspace(0, 1, N_importance, device=bins.device) + u = u.expand(N_rays, N_importance) + else: + u = torch.rand(N_rays, N_importance, device=bins.device) + u = u.contiguous() + + inds = torch.searchsorted(cdf, u, right=True) + below = torch.clamp_min(inds-1, 0) + above = torch.clamp_max(inds, N_samples_) + + inds_sampled = torch.stack([below, above], -1).view(N_rays, 2*N_importance) + cdf_g = torch.gather(cdf, 1, inds_sampled).view(N_rays, N_importance, 2) + bins_g = torch.gather(bins, 1, inds_sampled).view(N_rays, N_importance, 2) + + denom = cdf_g[...,1]-cdf_g[...,0] + denom[denomindhw', x) # [3, N, D, H, W] + x = x.contiguous().view(3*N, -1, H, W) # [3*N, D, H, W] + x = self.deconv(x) # [3*N, D', H', W'] + x = x.view(3, N, *x.shape[-3:]) # [3, N, D', H', W'] + x = torch.einsum('indhw->nidhw', x) # [N, 3, D', H', W'] + x = x.contiguous() + + assert self.triplane_high_res == x.shape[-2], \ + f"Output triplane resolution does not match with expected: {x.shape[-2]} vs {self.triplane_high_res}" + assert self.triplane_dim == x.shape[-3], \ + f"Output triplane dimension does not match with expected: {x.shape[-3]} vs {self.triplane_dim}" + + return x diff --git a/modeling.py b/modeling.py new file mode 100644 index 0000000000000000000000000000000000000000..8d43ccd646a94f943fb351a18a1ea2b7b7b2290a --- /dev/null +++ b/modeling.py @@ -0,0 +1,84 @@ + +#### modeling.py +import torch.nn as nn +from transformers import PreTrainedModel, PretrainedConfig +import torch +# import dinowrapper +from lrm.models.encoders.dino_wrapper2 import DinoWrapper +from lrm.models.transformer import TriplaneTransformer +from lrm.models.rendering.synthesizer_part import TriplaneSynthesizer + +class CameraEmbedder(nn.Module): + def __init__(self, raw_dim: int, embed_dim: int): + super().__init__() + self.mlp = nn.Sequential( + nn.Linear(raw_dim, embed_dim), + nn.SiLU(), + nn.Linear(embed_dim, embed_dim), + ) + + def forward(self, x): + return self.mlp(x) + +class LRMGeneratorConfig(PretrainedConfig): + model_type = "lrm_generator" + + def __init__(self, **kwargs): + super().__init__(**kwargs) + self.camera_embed_dim = kwargs.get("camera_embed_dim", 1024) + self.rendering_samples_per_ray = kwargs.get("rendering_samples_per_ray", 128) + self.transformer_dim = kwargs.get("transformer_dim", 1024) + self.transformer_layers = kwargs.get("transformer_layers", 16) + self.transformer_heads = kwargs.get("transformer_heads", 16) + self.triplane_low_res = kwargs.get("triplane_low_res", 32) + self.triplane_high_res = kwargs.get("triplane_high_res", 64) + self.triplane_dim = kwargs.get("triplane_dim", 80) + self.encoder_freeze = kwargs.get("encoder_freeze", False) + self.encoder_model_name = kwargs.get("encoder_model_name", 'facebook/dinov2-base') + self.encoder_feat_dim = kwargs.get("encoder_feat_dim", 768) + +class LRMGenerator(PreTrainedModel): + config_class = LRMGeneratorConfig + + def __init__(self, config: LRMGeneratorConfig): + super().__init__(config) + + self.encoder_feat_dim = config.encoder_feat_dim + self.camera_embed_dim = config.camera_embed_dim + + self.encoder = DinoWrapper( + model_name=config.encoder_model_name, + freeze=config.encoder_freeze, + ) + self.camera_embedder = CameraEmbedder( + raw_dim=12 + 4, 
embed_dim=config.camera_embed_dim, + ) + self.transformer = TriplaneTransformer( + inner_dim=config.transformer_dim, num_layers=config.transformer_layers, num_heads=config.transformer_heads, + image_feat_dim=config.encoder_feat_dim, + camera_embed_dim=config.camera_embed_dim, + triplane_low_res=config.triplane_low_res, triplane_high_res=config.triplane_high_res, triplane_dim=config.triplane_dim, + ) + self.synthesizer = TriplaneSynthesizer( + triplane_dim=config.triplane_dim, samples_per_ray=config.rendering_samples_per_ray, + ) + + def forward(self, image, camera): + assert image.shape[0] == camera.shape[0], "Batch size mismatch" + N = image.shape[0] + + # encode image + image_feats = self.encoder(image) + assert image_feats.shape[-1] == self.encoder_feat_dim, \ + f"Feature dimension mismatch: {image_feats.shape[-1]} vs {self.encoder_feat_dim}" + + # embed camera + camera_embeddings = self.camera_embedder(camera) + assert camera_embeddings.shape[-1] == self.camera_embed_dim, \ + f"Feature dimension mismatch: {camera_embeddings.shape[-1]} vs {self.camera_embed_dim}" + + # transformer generating planes + planes = self.transformer(image_feats, camera_embeddings) + assert planes.shape[0] == N, "Batch size mismatch for planes" + assert planes.shape[1] == 3, "Planes should have 3 channels" + return planes diff --git a/results/40_prompt_images_provided/A DSLR photo of Sydney Opera House.mp4 b/results/40_prompt_images_provided/A DSLR photo of Sydney Opera House.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..8adcb328ab57364cefdd458aa85baa637bc6df47 Binary files /dev/null and b/results/40_prompt_images_provided/A DSLR photo of Sydney Opera House.mp4 differ diff --git a/results/40_prompt_images_provided/A crab, low poly.mp4 b/results/40_prompt_images_provided/A crab, low poly.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..5771e6d9b27b4436f5151b9db1daae0fd9fb3744 Binary files /dev/null and b/results/40_prompt_images_provided/A crab, low poly.mp4 differ diff --git a/results/40_prompt_images_provided/A product photo of a toy tank.mp4 b/results/40_prompt_images_provided/A product photo of a toy tank.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..3138d34535646273108aabb1d299d4d1254c6d0e Binary files /dev/null and b/results/40_prompt_images_provided/A product photo of a toy tank.mp4 differ diff --git a/results/40_prompt_images_provided/A statue of angel, blender.mp4 b/results/40_prompt_images_provided/A statue of angel, blender.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..2611eb2370cd42faefe7a02b6d306c0d8a1e66ea Binary files /dev/null and b/results/40_prompt_images_provided/A statue of angel, blender.mp4 differ diff --git a/results/40_prompt_images_provided/Daenerys Targaryen from game of throne.mp4 b/results/40_prompt_images_provided/Daenerys Targaryen from game of throne.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..48ce5f814b058114fe77ed0462ca1ba250afb394 Binary files /dev/null and b/results/40_prompt_images_provided/Daenerys Targaryen from game of throne.mp4 differ diff --git a/results/40_prompt_images_provided/Darth Vader helmet,g highly detailed.mp4 b/results/40_prompt_images_provided/Darth Vader helmet,g highly detailed.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..2516120dba0faa363772fc7c0591864e38403e39 Binary files /dev/null and b/results/40_prompt_images_provided/Darth Vader helmet,g highly detailed.mp4 differ diff --git 
a/results/40_prompt_images_provided/Fisherman House, cute, cartoon, blender, stylized.mp4 b/results/40_prompt_images_provided/Fisherman House, cute, cartoon, blender, stylized.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..294154b62f170ef56594ec86c6e4d0b060fda0d8 Binary files /dev/null and b/results/40_prompt_images_provided/Fisherman House, cute, cartoon, blender, stylized.mp4 differ diff --git a/results/40_prompt_images_provided/Handpainted watercolor windmill, hand-painted.mp4 b/results/40_prompt_images_provided/Handpainted watercolor windmill, hand-painted.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..44182e80f87a67deea461ba42adec44582b76978 Binary files /dev/null and b/results/40_prompt_images_provided/Handpainted watercolor windmill, hand-painted.mp4 differ diff --git a/results/40_prompt_images_provided/Little italian town, hand-painted style.mp4 b/results/40_prompt_images_provided/Little italian town, hand-painted style.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..9bd5aaef103bb6eff82b6f13357040e36febd03e Binary files /dev/null and b/results/40_prompt_images_provided/Little italian town, hand-painted style.mp4 differ diff --git a/results/40_prompt_images_provided/Mr Bean Cartoon doing a T Pose.mp4 b/results/40_prompt_images_provided/Mr Bean Cartoon doing a T Pose.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..7e4aa8baf05e4206736b4472a3d940c60786e4d2 Binary files /dev/null and b/results/40_prompt_images_provided/Mr Bean Cartoon doing a T Pose.mp4 differ diff --git a/results/40_prompt_images_provided/Pikachu with hat.mp4 b/results/40_prompt_images_provided/Pikachu with hat.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..6db03526136313d4a8552fce8848aab212ca214b Binary files /dev/null and b/results/40_prompt_images_provided/Pikachu with hat.mp4 differ diff --git a/results/40_prompt_images_provided/Samurai koala bear.mp4 b/results/40_prompt_images_provided/Samurai koala bear.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..c42760abd477c8347478dd49eb264ced31fa76d2 Binary files /dev/null and b/results/40_prompt_images_provided/Samurai koala bear.mp4 differ diff --git a/results/40_prompt_images_provided/a DSLR photo of a ghost eating a hamburger.mp4 b/results/40_prompt_images_provided/a DSLR photo of a ghost eating a hamburger.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..02a6055ca33e74e68ac814397ee0eba1e065438c Binary files /dev/null and b/results/40_prompt_images_provided/a DSLR photo of a ghost eating a hamburger.mp4 differ diff --git a/results/40_prompt_images_provided/a DSLR photo of a squirrel playing guitar.mp4 b/results/40_prompt_images_provided/a DSLR photo of a squirrel playing guitar.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..b52f07e9f8eef35e6766bfe30f62b29cb2e5e31c Binary files /dev/null and b/results/40_prompt_images_provided/a DSLR photo of a squirrel playing guitar.mp4 differ diff --git a/results/40_prompt_images_provided/animal skull pile.mp4 b/results/40_prompt_images_provided/animal skull pile.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..51da03dec699558b3ae1c809e40764f359e72dfd Binary files /dev/null and b/results/40_prompt_images_provided/animal skull pile.mp4 differ diff --git a/results/40_prompt_images_provided/army Jacket, 3D scan.mp4 b/results/40_prompt_images_provided/army Jacket, 3D scan.mp4 new file mode 100644 index 
0000000000000000000000000000000000000000..f17bacdc11879b18194bb756309b0f4c73e9b2c2 Binary files /dev/null and b/results/40_prompt_images_provided/army Jacket, 3D scan.mp4 differ diff --git a/results/40_prompt_images_provided/beautiful, intricate butterfly.mp4 b/results/40_prompt_images_provided/beautiful, intricate butterfly.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..16492a9e59feedd32f121b8065891103a7a6ae2e Binary files /dev/null and b/results/40_prompt_images_provided/beautiful, intricate butterfly.mp4 differ diff --git a/results/40_prompt_images_provided/girl riding wolf, cute, cartoon, blender.mp4 b/results/40_prompt_images_provided/girl riding wolf, cute, cartoon, blender.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..a94bdf89d1f89f036f5f3ccdc9e88773abf821f7 Binary files /dev/null and b/results/40_prompt_images_provided/girl riding wolf, cute, cartoon, blender.mp4 differ diff --git a/results/40_prompt_images_provided/mecha vampire girl chibi.mp4 b/results/40_prompt_images_provided/mecha vampire girl chibi.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..ee3d0865f91f158d59819e468565597ba723e5de Binary files /dev/null and b/results/40_prompt_images_provided/mecha vampire girl chibi.mp4 differ diff --git a/results/40_prompt_images_provided/military Mech, future, scifi.mp4 b/results/40_prompt_images_provided/military Mech, future, scifi.mp4 new file mode 100644 index 0000000000000000000000000000000000000000..1f8345e64a6f001b5b79e028fa182f1ee79dcfd7 Binary files /dev/null and b/results/40_prompt_images_provided/military Mech, future, scifi.mp4 differ
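modeling.py above wraps the same generator as a Hugging Face PreTrainedModel with an LRMGeneratorConfig. A minimal sketch of constructing it from the default configuration and running a dummy forward pass; it assumes modeling.py is importable from the working directory and that the facebook/dinov2-base weights can be fetched, and 512 matches the script's default source_size:

# Illustrative sketch, not part of the diff: the HF-style wrapper from modeling.py.
import torch
from modeling import LRMGenerator, LRMGeneratorConfig

config = LRMGeneratorConfig()          # defaults: 80-d triplanes, 16-layer transformer, DINOv2-base encoder
model = LRMGenerator(config).eval()

image = torch.rand(1, 3, 512, 512)     # RGB in [0, 1], pre-sized as DinoWrapper expects
camera = torch.rand(1, 12 + 4)         # flattened source-camera vector (raw_dim = 12 + 4)
with torch.no_grad():
    planes = model(image, camera)
print(planes.shape)                    # (N, 3, triplane_dim, H', W'); (1, 3, 80, 64, 64) with these defaults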