University of Cambridge Researchers Introduce a Dataset of 50,000 Synthetic and Photorealistic Foot Images along with a Novel AI Library for Foot

The health, fashion, and fitness industries are highly interested in the difficult computer vision problem of reconstructing human body parts in 3D from images. This study tackles the problem of reconstructing a human foot. Accurate foot models are useful for shoe shopping, orthotics, and personal health monitoring, and recovering a 3D foot model from images has become highly attractive as the digital market for these businesses grows. Existing foot reconstruction solutions fall into four types: costly scanning apparatus; reconstruction of noisy point clouds from depth maps or phone-based sensors such as a TrueDepth camera; Structure from Motion (SfM) followed by Multi-View Stereo (MVS); and fitting generative foot models to image silhouettes. 

They conclude that none of these options is adequate for precise scanning in a domestic setting. Most people cannot afford expensive scanning equipment; phone-based sensors are not widely available or user-friendly; and noisy point clouds are challenging to use for downstream tasks such as rendering and measuring. Generative foot models have been low quality and restrictive, and using only silhouettes limits the amount of geometric information that can be recovered from the images, which is especially problematic in a few-view setting. SfM depends on many input views to match dense features between images, and MVS can also produce noisy point clouds. 

The limited availability of paired images and 3D ground-truth data for feet further constrains the performance of these approaches. To address this, researchers from the University of Cambridge present FOUND, or Foot Optimisation using Uncertain Normals for surface Deformation. This algorithm improves upon conventional multi-view reconstruction optimization by exploiting per-pixel surface normals together with their uncertainties. Conveniently, their technique needs only a small number of calibrated input RGB photographs. Rather than relying solely on silhouettes, which carry little geometric information, they use surface normals and keypoints as supplementary cues. To overcome data scarcity, they also release a sizable collection of photorealistic synthetic images paired with ground-truth labels for these cues. 

Their main contributions are outlined below: 

• They release SynFoot, a large-scale synthetic dataset of 50,000 photorealistic foot images with precise silhouette, surface normal, and keypoint labels, to aid research on 3D foot reconstruction. Although obtaining such annotations for real photos requires costly scanning apparatus, their dataset is highly scalable. They demonstrate that, despite being built from only 8 real-world foot scans, the synthetic dataset captures enough variation for downstream tasks to generalize to real images. They also release an evaluation dataset of 474 images of 14 real feet, each paired with a high-resolution 3D scan and ground-truth per-pixel surface normals. Lastly, they release their custom Python library for Blender, which enables efficient creation of large-scale synthetic datasets. 

• They show that an uncertainty-aware surface normal estimation network trained only on their synthetic data, generated from 8 foot scans, generalizes to real in-the-wild foot images. To reduce the domain gap between synthetic and real foot photos, they employ aggressive appearance and perspective augmentation. The network estimates a surface normal and an associated uncertainty at each pixel. The uncertainty is helpful in two ways: first, by thresholding it they obtain precise silhouettes without training a separate network; second, by using it to weight the surface normal loss in their optimization scheme, they gain robustness to inaccurate predictions in some views. 

• They provide an optimization strategy that uses differentiable rendering to fit a generative foot model to a set of calibrated images with predicted surface normals and keypoints. Their pipeline outperforms state-of-the-art photogrammetry for surface reconstruction, is uncertainty-aware, and can reconstruct a watertight mesh from a limited number of views, including data captured on a consumer's cell phone.
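As a loose illustration of this fitting strategy, the sketch below fits two shape parameters of a toy PCA-style model to observed keypoints and uncertainty-weighted surface normals by gradient descent. All names, bases, and weights here are invented for illustration; the actual pipeline optimizes a foot mesh through a differentiable renderer.

```python
import numpy as np

# Toy stand-ins for the real pipeline: a PCA-style generative model whose
# keypoints and surface normals are linear in two shape parameters (beta).
B_kp = np.array([[1., 0.], [0., 1.], [1., 1.], [1., -1.]])  # keypoint basis
B_n = np.array([[2., 0.], [0., 2.]])                        # normal basis

beta_true = np.array([0.7, -0.3])
obs_kp = B_kp @ beta_true          # "detected" keypoints to fit against
obs_n = B_n @ beta_true            # "predicted" surface normals
kappa = np.array([1.0, 0.2])       # per-pixel confidence: the second
                                   # normal prediction is deemed unreliable

beta = np.zeros(2)
lr = 0.05
for _ in range(300):
    r_kp = B_kp @ beta - obs_kp
    r_n = kappa * (B_n @ beta - obs_n)   # uncertainty-weighted residual
    # Analytic gradient of |r_kp|^2 + |r_n|^2 with respect to beta.
    grad = 2 * B_kp.T @ r_kp + 2 * B_n.T @ (kappa * r_n)
    beta -= lr * grad

print(np.round(beta, 3))  # recovers beta_true
```

Down-weighting the normal residual by the per-pixel confidence mirrors how the estimated uncertainty guards the optimization against unreliable views.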

Check out the Paper and Project. All credit for this research goes to the researchers of this project. Also, don’t forget to join our 32k+ ML SubReddit, 40k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

If you like our work, you will love our newsletter.

We are also on Telegram and WhatsApp.


Meet CodeGPT: A New Code Generation Tool Making Waves in the AI Community

A newcomer among AI code-generating tools, CodeGPT is quickly becoming a favourite among programmers. It's an add-on for Visual Studio Code that leverages the GPT-3 language model to produce code, translate languages, write content of various types, and answer queries.

CodeGPT is currently under development, but it has the potential to alter the way that developers code. CodeGPT’s capacity to grasp natural language is one of the features that sets it apart from other AI code-generating tools. This means that, instead of utilizing formal programming terminology, developers can instruct CodeGPT to build code based on descriptions written in natural language. Time savings like these can be substantial, especially for developers learning a new language or framework.

CodeGPT’s ability to produce efficient and idiomatic code is an additional benefit. CodeGPT has this advantage because it has been trained on a large corpus of code from actual projects. This means that CodeGPT is well-versed in the norms and standards of each programming language.

Finally, updates and enhancements to CodeGPT are released often. The CodeGPT team routinely updates the software with new functions and fixes any issues that may arise. This means that CodeGPT is always improving in various tasks, including code generation, language translation, content creation, and question answering.

Application areas for CodeGPT:

CodeGPT can automatically complete unfinished or unclear code snippets. This can be a huge time-saver for engineers, especially when dealing with vast and complicated codebases.

Functions, classes, and even whole programs can be generated with CodeGPT. This can be helpful for quickly producing basic code or for developing novel concepts.

Code refactoring is made easier with the help of CodeGPT, which recommends cleaner, more idiomatic code constructs to programmers. It can also aid programmers in spotting and fixing common security flaws in their code.

For debugging, CodeGPT is a useful tool: it suggests possible causes of errors and offers advice on how to fix them.

Finding bugs: CodeGPT can help developers uncover faults in their code by identifying potential problems and offering tests to check the accuracy of their code.

When used properly, CodeGPT is a potent tool that can improve the speed, efficiency, and quality with which programmers produce code. 

Here’s where you can get CodeGPT: https://marketplace.visualstudio.com/items?itemName=DanielSanMedium.dscodegpt&ssr=false

Mistral can be downloaded and used at https://docs.codegpt.co/docs/tutorial-ai-providers/ollama

"Introducing CodeGPT, running the @MistralAI 7B model locally in VSCode on an M2 MacBook Pro. In my trials, Mistral 7B surpasses Llama 2 and CodeLlama in both speed and performance. Feel like giving it a try? You can try it for free! Just download the CodeGPT extension from the…" — Daniel San (@dani_avila7), November 4, 2023


Phind’s New AI Model Outperforms GPT-4 at Coding, with GPT-3.5-like Speed and 16k Context

In coding and technical problem-solving, a challenge has been the trade-off between speed and accuracy when seeking answers to complex questions. Developers often find themselves in need of quick and reliable assistance.

GPT-4 has often faced the issue of relatively slow response times. The delay in obtaining answers can hinder productivity.

The Phind’s v7 Model matches and surpasses the coding capabilities of GPT-4 but does so with remarkable speed. With a 5x increase in response time, the Phind Model provides high-quality answers to technical questions in just 10 seconds, a significant improvement over the 50-second wait associated with its predecessor.

The Phind Model, now in its 7th generation, is built upon the foundation of CodeLlama-34B fine-tunes, the first models to outperform GPT-4 in HumanEval scores. This new model has been fine-tuned on an impressive 70 billion tokens of high-quality code and reasoning problems. While it achieves a remarkable HumanEval score of 74.7%, it is essential to note that real-world helpfulness often transcends such metrics. Through comprehensive feedback collection and user experiences, the Phind Model has demonstrated its ability to consistently meet or exceed GPT-4’s utility in practical coding scenarios.

One of the standout features of the Phind Model is its speed. By leveraging the power of H100s and the TensorRT-LLM library from NVIDIA, it can process an impressive 100 tokens per second in a single stream, providing swift assistance to users in need. 
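Assuming a typical answer of around 1,000 tokens (our assumption, not a figure from Phind), the quoted throughput is consistent with the ~10-second latency mentioned above:

```python
tokens_per_second = 100   # claimed single-stream throughput
answer_tokens = 1_000     # assumed typical answer length
latency_s = answer_tokens / tokens_per_second
print(latency_s)          # 10.0
```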

Additionally, the Phind Model provides a vast context, supporting up to 16,000 tokens in its responses. Currently, the model permits inputs of up to 12,000 tokens on the website, reserving the remaining 4,000 for web-based results.

While the Phind Model offers substantial benefits, it’s worth acknowledging that it still faces some areas for improvement. One notable challenge is consistency, particularly when handling complex questions. In these cases, the Phind Model may require more generations to arrive at the correct answer than GPT-4.

In conclusion, the Phind Model is a promising solution to the ongoing problem of efficient and reliable coding assistance. It combines superior coding abilities, remarkable speed, and extensive context support, all contributing to its effectiveness in providing real-world help to users. As this model continues to evolve and address its remaining challenges, it has the potential to revolutionize the way technical questions are answered, offering developers and tech enthusiasts a more efficient and productive coding experience.

Phind’s New AI Model Outperforms GPT-4 at Coding, with GPT-3.5-like Speed and 16k Context Read More »

Amazon Researchers Introduce Fortuna: An AI Library for Uncertainty Quantification in Deep Learning

The recent developments in the fields of Artificial Intelligence and Machine Learning have made everyone’s lives easier. With their incredible capabilities, AI and ML are diving into every industry and solving problems. A key component of Machine Learning is predictive uncertainty, which enables the evaluation of the accuracy of model predictions. In order to make sure that the ML systems are reliable and safe, it is important to estimate the uncertainty correctly. 

Overconfidence is a prevalent issue, particularly in the context of deep neural networks. An overconfident model predicts a certain class with a substantially higher likelihood than its actual accuracy warrants. This can affect real-world judgements and actions, which makes it a matter of concern. 

A number of approaches capable of estimating and calibrating uncertainty in ML have been developed, among them Bayesian inference, conformal prediction, and temperature scaling. Although these methods exist, putting them into practice is a challenge: many open-source libraries provide one-off implementations of particular techniques or generic probabilistic programming languages, but a cohesive framework supporting a broad spectrum of the latest methodologies has been lacking. 
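Of the methods just listed, temperature scaling is the simplest to sketch: divide the validation logits by a learned temperature T and pick the T that minimises negative log-likelihood. The grid-search implementation below is a minimal illustration, not Fortuna's API.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, T):
    """Negative log-likelihood of the labels under temperature-scaled logits."""
    probs = softmax(logits / T)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 91)):
    """Grid-search the temperature that minimises validation NLL.
    T > 1 softens overconfident predictions; T = 1 leaves them unchanged."""
    return min(grid, key=lambda T: nll(logits, labels, T))
```

Because the grid contains T = 1, the fitted temperature can never calibrate worse (in NLL) than the uncalibrated model on the validation set.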

To overcome these challenges, a team of researchers has presented Fortuna, an open-source uncertainty quantification library. Modern, scalable techniques are integrated into Fortuna from the literature and are made available to users via a consistent, intuitive interface. Its main objective is to make the application of sophisticated uncertainty quantification methods in regression and classification applications more straightforward.

The team has shared the two primary features of Fortuna that greatly improve deep learning uncertainty quantification.

Calibration techniques: Fortuna supports a number of tools for calibration, one of which is conformal prediction. Conformal prediction can be applied to any pre-trained neural network to produce reliable uncertainty estimates, helping to align the model's confidence scores with the actual accuracy of its predictions. This is extremely helpful, as it enables users to discern between instances in which the model's predictions are dependable and those in which they are not. As an example, the team describes a doctor determining whether an AI system's diagnosis, or a self-driving car's interpretation of its environment, is reliable.

Scalable Bayesian Inference: Fortuna provides scalable Bayesian inference tools in addition to calibration procedures. Deep neural networks that are being trained from the start can be trained using these techniques. A probabilistic method called Bayesian inference enables the incorporation of uncertainty in both the model parameters and the predictions. Users can increase the overall accuracy of Fortuna as well as the model’s ability to quantify uncertainty by implementing scalable Bayesian inference. 
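As a concrete illustration of the conformal prediction idea described above, the following sketch implements split conformal classification: calibrate a score threshold on held-out data, then return every class whose score clears it. This is a generic textbook recipe, not Fortuna's actual interface.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Calibrate a nonconformity threshold on held-out data so that
    prediction sets cover the true label with probability >= 1 - alpha
    (marginally, assuming exchangeable data)."""
    n = len(cal_labels)
    # Nonconformity score: 1 minus the probability given to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    rank = min(int(np.ceil((n + 1) * (1 - alpha))), n)
    return np.sort(scores)[rank - 1]

def prediction_sets(test_probs, threshold):
    """Every class whose nonconformity score falls within the threshold."""
    return [np.where(1.0 - p <= threshold)[0] for p in test_probs]
```

A wide prediction set then flags an input (a borderline diagnosis, an ambiguous road scene) where the model's answer should not be trusted on its own.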

In conclusion, Fortuna offers a consistent framework for measuring and calibrating uncertainty in model predictions, definitely making it a useful addition to the field of Machine Learning. 

Check out the Paper. All credit for this research goes to the researchers of this project.



Core42 and Cerebras Set New Benchmark for Arabic Large Language Models with the Release of Jais 30B

Cerebras and Core42, a G42 company and the UAE-based national-scale enabler for cloud and generative AI, have announced the launch of Jais 30B, the newest and most proficient version of their open-source Arabic Large Language Model (LLM).

Jais 30B is a significant upgrade from its predecessor, Jais 13B, which was released in August 2023. The new model has 30 billion parameters, compared to 13 billion for Jais 13B, and was trained on a substantially larger dataset. This has resulted in significant improvements in language generation, summarization, and Arabic-English translation.

Jais 30B is now on par with monolingual English models and outperforms most open-source models in Foundation Model evaluations. The model is also able to generate longer and more detailed responses in both Arabic and English.

Core42 is committed to responsible and safe AI practices, and the Jais 30B development team has further enhanced its processes and policies to guard against biases and the production of hateful or harmful content by the model.

Jais 30B is available for download on Hugging Face.

Hugging Face foundational model: https://huggingface.co/core42/jais-30b-v1
Hugging Face chat model: https://huggingface.co/core42/jais-30b-chat-v1

The launch of Jais 30B is a major milestone for Core42 and the Arabic-speaking world. The model has the potential to revolutionize the way we communicate, learn, and work in Arabic.


Hugging Face Researchers Introduce Distil-Whisper: A Compact Speech Recognition Model Bridging the Gap in High-Performance, Low-Resource Environments

Hugging Face researchers have tackled the issue of deploying large pre-trained speech recognition models in resource-constrained environments. They accomplished this by creating a substantial open-source dataset through pseudo-labelling. The dataset was then utilised to distil a smaller version of the Whisper model, called Distil-Whisper.

The Whisper speech recognition transformer model was pre-trained on 680,000 hours of noisy internet speech data. It comprises transformer-based encoder and decoder components and achieves competitive results in a zero-shot scenario without fine-tuning. Distil-Whisper is a compact version derived through knowledge distillation using pseudo-labelling. Distil-Whisper upholds the Whisper model’s resilience in challenging acoustic conditions while mitigating hallucination errors in long-form audio. The research introduces a large-scale pseudo-labelling method for speech data, an underexplored yet promising avenue for knowledge distillation. 

Automatic Speech Recognition (ASR) systems have reached human-level accuracy, but the growing size of pre-trained models poses challenges in resource-constrained environments. Whisper, a large pre-trained ASR model, excels across various datasets but is less practical for low-latency deployment. While knowledge distillation has compressed NLP transformer models effectively, its use in speech recognition is underexplored. 

The proposed approach utilises pseudo-labelling to construct a sizable open-source dataset, facilitating knowledge distillation. To ensure training quality, a WER heuristic is employed for selecting optimal pseudo-labels. The knowledge distillation objective involves a combination of Kullback-Leibler divergence and pseudo-label terms, introducing a mean-square error component to align the student’s hidden layer outputs with the teacher’s. This distillation technique is applied to the Whisper model within the Seq2Seq ASR framework, ensuring uniform transcription formatting and offering sequence-level distillation guidance.
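The combined objective described above can be sketched as follows; the weights T, alpha, and beta are illustrative placeholders, not the paper's values.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distil_loss(student_logits, teacher_logits, pseudo_labels,
                student_hidden, teacher_hidden, T=2.0, alpha=0.8, beta=1.0):
    """Kullback-Leibler term on softened distributions, plus cross-entropy
    against the teacher's pseudo-labels, plus a mean-square error term
    aligning the student's hidden-layer outputs with the teacher's."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1).mean()
    n = len(pseudo_labels)
    ce = -np.mean(np.log(softmax(student_logits)[np.arange(n), pseudo_labels] + 1e-12))
    mse = np.mean((student_hidden - teacher_hidden) ** 2)
    return alpha * kl + (1 - alpha) * ce + beta * mse
```

In the full pipeline, pseudo-labels whose word error rate against a reference transcript exceeds a heuristic threshold would be filtered out before ever reaching this loss.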

Distil-Whisper, derived through knowledge distillation, significantly enhances speed and reduces parameter count compared to the original Whisper model while retaining resilience in challenging acoustic conditions. It delivers a 5.8x speedup with a 51% parameter reduction, while performing to within 1% WER of Whisper on out-of-distribution test data in a zero-shot scenario. The distil-medium.en model has a slightly higher WER but offers 6.8x faster inference and 75% model compression. The Whisper model is susceptible to hallucination errors in long-form audio transcription, whereas Distil-Whisper mitigates these errors while maintaining competitive WER performance.

In conclusion, Distil-Whisper is a compact variant of the Whisper model achieved through knowledge distillation. This approach yields remarkable benefits in speed and parameter reduction, with Distil-Whisper being faster and having fewer parameters than the original Whisper model. The distil-medium.en model offers even faster inference and substantial model compression despite a slightly higher WER. 

Future research opportunities in audio domain knowledge distillation and pseudo-labelling for compressing transformer-based models in speech recognition are promising. Investigating the effects of various filtering methods and thresholds on transcription quality and downstream model performance can offer valuable insights for optimising knowledge distillation. Exploring alternative compression techniques, including layer-based methods and using mean-square error terms, may lead to even greater model compression without sacrificing performance. The provision of training code, inference code, and models in this work can be a valuable resource for further research and experimentation in knowledge distillation for speech recognition.

Check out the Paper and Github. All credit for this research goes to the researchers of this project.



40+ Cool AI Tools You Should Check Out (November 2023)

DeepSwap is an AI-based tool for anyone who wants to create convincing deepfake videos and images. It is super easy to create your content by refacing videos, pictures, memes, old movies, GIFs… you name it. The app has no content restrictions, so users can upload material of any kind. Besides, first-time subscribers can get 50% off.

Get stunning professional headshots effortlessly with Aragon. Utilize the latest in A.I. technology to create high-quality headshots of yourself in a snap! Skip the hassle of booking a photography studio or dressing up. Get your photos edited and retouched quickly, not after days. Receive 40 HD photos that will give you an edge in landing your next job.

Boost your advertising and social media game with AdCreative.ai – the ultimate Artificial Intelligence solution. Say goodbye to hours of creative work and hello to high-converting ad and social media posts generated in mere seconds. Maximize your success and minimize your effort with AdCreative.ai today.

Hostinger uses the power of a cutting-edge artificial intelligence engine to create the best AI website builder for all website owners. The builder guides you through the design process, suggesting layouts, color schemes, and content placements tailored to your needs. Embrace the freedom to customize every detail while maintaining responsive design for various devices.

Using artificial intelligence, Otter.AI empowers users with real-time transcriptions of meeting notes that are shareable, searchable, accessible, and secure. Get a meeting assistant that records audio, writes notes, automatically captures slides, and generates summaries.

Notion is aiming to increase its user base through the utilization of its advanced AI technology. Their latest feature, Notion AI, is a robust generative AI tool that assists users with tasks like note summarization, identifying action items in meetings, and creating and modifying text. Notion AI streamlines workflows by automating tedious tasks, providing suggestions, and templates to users, ultimately simplifying and improving the user experience.

Generating meaningful tests for busy devs. With CodiumAI, you get non-trivial tests (and trivial, too!) suggested right inside your IDE, so you can code smart, create more value, and stay confident when you push. With CodiumAI, developers innovate faster and with confidence, saving their time devoted to testing and analyzing code. Code, as you meant it.

Decktopus is an AI-powered presentation tool that simplifies online content creation with more than 100 customizable templates, allowing users to create professional presentations in seconds.

AI is the future, but at SaneBox, AI has been successfully powering email for the past 12 years and counting, saving the average user more than 3 hours a week on inbox management.

Promptpal AI helps users discover the best prompts to get the most out of AI models like ChatGPT.

Quinvio is an AI video creation tool that enables quick video presentations with an intuitive editor, AI assistance for writing, and an option to choose an AI spokesperson.

AskYourPdf is an AI chatbot that helps users interact with PDF documents easily and extract insights.

Supernormal is an AI-powered tool that helps users create meeting notes automatically, saving 5-10 minutes every meeting.

Suggesty is powered by GPT-3 and provides human-like answers to Google searches.

ChatGPT Sidebar is a ChatGPT Chrome extension that can be used on any website to summarize articles, explain concepts, etc.

MarcBot is a chatbot inside Telegram messenger that uses the ChatGPT API, Whisper, and Amazon Polly.

Motion enables users to create chatbots that can engage as well as delight their customers across multiple channels and platforms, all at scale.

Roam Around is an AI tool powered by ChatGPT that helps users to build their travel itineraries.

Beautiful AI presentation software enables users to quickly create beautifully designed, modern slides that are professional-looking and impressive.

Quotify uses AI to identify the most relevant quotes from any text-based PDF, making it a powerful quote-finding tool.

Harvey is an AI legal advisor that helps in contract analysis, litigation, due diligence, etc.

Bearly is an AI-based tool that facilitates faster reading, writing, and content creation.

Scispace is an AI assistant that simplifies reading and understanding complex content, allowing users to highlight confusing text, ask follow-up questions, and search for relevant papers without specifying keywords.

Hints is an AI tool powered by GPT that can be integrated with any software to perform tasks on behalf of the user.

Monday.com is a cloud-based framework that allows users to build software applications and work management tools.

Base64 is a data extraction automation tool that allows users to extract text, photos, and other types of data from all documents.

AI Writer is an AI content creation platform that allows users to generate articles and blog posts within seconds.

Engage is an AI tool that augments users’ comments to engage prospects on Linkedin.

Google Duplex is an AI technology that mimics a human voice and makes phone calls on behalf of a person.

Perplexity is an AI tool that aims to answer questions accurately using large language models.

NVIDIA Canvas is an AI tool that turns simple brushstrokes into realistic landscape images.

Seenapse is a tool that allows users to generate hundreds of divergent and creative ideas.

Murf AI allows users to create studio-like voice overs within minutes.

10Web is an AI-powered WordPress platform that automates website building, hosting, and page speed boosting.

KickResume is an AI tool that allows users to create beautiful resumes quickly.

DimeADozen is an AI tool that allows users to validate their business ideas within seconds.

WavTools allows users to make high-quality music in the browser for free.

Wonder Dynamics is an AI tool that integrates computer-generated (CG) characters into real-life settings through automatic animation, lighting, and composition.

Gen-2 is a multimodal AI tool that generates videos by taking text, images, or video clips as input.

Uizard is an AI tool for designing web and mobile apps within a few minutes.

This is an AI tool that generates a color palette on the basis of an English description.

Rationale is an AI tool that assists business owners, managers, and individuals with tough decisions.

Vizology is an AI tool that provides businesses with AI-generated responses to inquiries about companies, markets, and contextual business intelligence.

PromptPerfect is a prompt optimization tool that helps to bring the most out of AI models like ChatGPT.

Numerous is an AI assistant that allows users to breeze through their busy work in Excel & Google Sheets.

Nolan is a tool that allows users to craft compelling movie scripts.

Play HT is an AI voice generator that allows users to generate realistic text-to-speech voice online.

PromptGPT allows users to improve their ChatGPT output by providing optimized prompts.

This tool allows users to enlarge and enhance their small images automatically.

Timely is an AI-powered time-tracking software that helps users to boost their productivity.

This is an iOS shortcut that replaces Siri with ChatGPT.



It’s Time to define Levels of Autonomy for Digital Workers & AI Agents similar to Self-Driving Vehicles: IDWA kicks off the Process

The rapid development of AI is giving rise to an increasing number of Digital Workers, AI agents, and AI agent platforms that are capable of executing tasks, making decisions, and taking actions on their own. 

In the context of self-driving vehicles, the Society of Automotive Engineers (SAE) has developed a six-level scale for defining the levels of autonomy. This scale ranges from Level 0, where the human driver is in complete control, to Level 5, where the vehicle is fully autonomous and can operate in any environment.

A similar scale could and should be developed for Digital Workers and AI agents. This would help clarify the expectations of users and developers, and potentially define industry standards for faster and more sustainable development of this ecosystem. The IDWA – International Digital Workers Association – will propose a draft of Digital Worker (Digital Employee) Autonomy Levels at its IDWA-Forum conference in Silicon Valley on November 8th.

Some of the key benefits of defining levels of autonomy for AI agents include:

Increased transparency: By making it clear what AI agents can and cannot do, we can help to build trust among users.

Improved safety: By clearly defining the capabilities of AI agents, we can help to ensure that they are used in a safe and responsible manner.

Reduced liability: By establishing clear guidelines for AI development, we can help to reduce the risk of liability for AI developers.

But this is not an easy task; some of the challenges include:

The complexity of AI: AI agents are complex systems that can be difficult to understand and predict. This makes it difficult to define clear boundaries between different levels of autonomy.

The rapid pace of AI development: The field of AI is constantly evolving, which means that any definition of levels of autonomy will need to be updated regularly.

IDWA is taking on this challenge, proposing a draft scale from level 0 (no task automation), through level 4 (autonomous task management), to level 8 (leadership), to begin the process of defining levels of autonomy for Digital Workers and AI agents. 

The IDWA-Forum is produced by Kuzma Frost.

The IDWA is led by David Yang and Michael Engel.


This AI Research Introduces PERF: The Panoramic NeRF Transforming Single Images into Explorable 3D Scenes

NeRF stands for Neural Radiance Fields, a deep learning technique for 3D scene reconstruction and view synthesis from 2D images. It typically requires multiple images of a scene, taken from different viewpoints, to construct an accurate 3D representation. NeRF has inspired extensions and improvements, such as NeRF-W, that aim to make it more efficient, more accurate, and applicable to a wider range of scenarios, including dynamic scenes and real-time applications. NeRF and its variants have had a significant impact on computer vision, computer graphics, and 3D scene reconstruction.

However, if only a single image is available, 3D priors must be incorporated to improve the quality of the 3D reconstruction. Present techniques also limit the field of view, which greatly restricts their scalability to large, real-world 360-degree panoramic scenes. The researchers present PERF, short for Panoramic Neural Radiance Field: a 360-degree novel view synthesis framework that trains a panoramic neural radiance field from a single panorama.

A panoramic image is created by capturing multiple images, often sequentially, and stitching them together into a seamless, wide-angle representation of a landscape, cityscape, or any other scene. The team proposes a collaborative RGBD inpainting method that completes the RGB images and depth maps of invisible regions, using a trained Stable Diffusion model for RGB inpainting and a monocular depth estimator for depth completion, to generate novel appearances and 3D shapes that cannot be seen from the input panorama.

Training a panoramic neural radiance field from a single panorama is a challenging problem due to the lack of 3D information, occlusion by large objects, the coupling of reconstruction and generation, and geometry conflicts between visible and invisible regions during inpainting. To tackle these issues, PERF follows a three-step process: 1) train a single-view NeRF with depth supervision; 2) perform collaborative RGBD inpainting of the region of interest (ROI); and 3) apply progressive inpainting-and-erasing generation.

To optimize the predicted depth map of the ROI and make it consistent with the global panoramic scene, they propose an inpainting-and-erasing method that inpaints invisible regions from a random view and erases conflicting geometry observed from other reference views, yielding better 3D scene completion.
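The progressive inpainting-and-erasing loop can be sketched in miniature. The snippet below is an illustrative toy, not PERF's code: a random "view" is simply a row of a depth map, `inpaint` is a hypothetical stand-in for the Stable Diffusion plus depth-estimator inpainting, and the erase step discards generated depths that conflict with geometry already observed:

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 8, 8

# Global depth map; NaN marks regions invisible from the input panorama.
depth = np.full((H, W), np.nan)
depth[:4, :] = 1.0                    # visible region with depth supervision

def inpaint(patch):
    # Stand-in for RGBD inpainting: fill unknown cells with a noisy guess.
    out = patch.copy()
    mask = np.isnan(out)
    out[mask] = 1.0 + 0.05 * rng.standard_normal(mask.sum())
    return out

for _ in range(10):                    # progressive inpaint-and-erase loop
    r = int(rng.integers(0, H))        # pick a random "view" (here: a row)
    patch = inpaint(depth[r])
    known = ~np.isnan(depth[r])
    # Erase: discard newly generated depths that conflict with reference
    # observations, keeping the already-known geometry instead.
    bad = known & (np.abs(patch - depth[r]) > 0.2)
    patch[bad] = depth[r][bad]
    depth[r] = patch
```

The real method operates on full RGBD renderings from sampled camera poses and uses the NeRF itself as the global scene representation; this toy only mirrors the control flow of inpainting unknown regions while protecting observed ones.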

Researchers experimented on the Replica and PERF-in-the-wild datasets and demonstrate that PERF achieves a new state of the art for single-view panoramic neural radiance fields. They report that PERF can be applied to panorama-to-3D, text-to-3D, and 3D scene stylization tasks, yielding surprising results with several promising applications.

PERF significantly improves the performance of single-shot NeRF but depends heavily on the accuracy of the depth estimator and the Stable Diffusion model. The team says future work will therefore focus on improving the accuracy of both.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.



Brave Introduces Leo: An Artificial Intelligence Assistant that can Help with All Sorts of Tasks Including Real-Time Summaries of Webpages or Videos

In a significant stride towards user privacy and accurate AI interactions, Brave, the renowned browser developer, has unveiled its native AI assistant, Leo, alongside the release of desktop version 1.60. Powered by Meta's Llama 2 as its underlying model, Leo responds to user queries based on the content of the web pages visited, effectively addressing concerns related to AI-generated content.

Leo, an extension of the Brave Search AI Summarizer launched earlier this year, can be accessed directly from the search bar. During the testing phase in August, through the Nightly channel (version 1.59), tens of thousands of developers and users downloaded and evaluated the browser along with Leo, leading to its official integration in Brave version 1.60.

One of Leo’s distinctive features is its commitment to user privacy. Unlike other chatbots, Leo does not collect conversations, track users, or generate responses from thin air. Instead, it relies solely on web content to provide accurate and relevant information.

The free version of Leo is based on the Llama 2 model, a specialized variant of Meta's open-source model. However, Brave has also introduced Leo Premium, a paid service priced at $15 monthly. Leo Premium comes equipped with the Claude Instant model, developed by Anthropic, which emphasizes logical reasoning and code writing. This model offers more structured responses, enhanced execution of instructions, and improved capabilities in math, programming, multilingualism, and question-response interactions.

To further enhance response accuracy, Brave has integrated Anthropic's technology, leveraging Brave's Search API with the latest Claude 2 model. This approach enables retrieval-augmented generation (RAG), resulting in more precise responses and mitigating generative AI's tendency towards hallucination.
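Retrieval-augmented generation, in outline, retrieves documents relevant to a query and places them in the model's prompt, so answers are grounded in fetched content rather than the model's parametric memory alone. The sketch below is purely illustrative: the `search` function is a naive keyword stand-in for a real search API (not Brave's actual API), and the assembled prompt is returned rather than sent to an LLM:

```python
def search(query, corpus, k=2):
    # Naive keyword retrieval standing in for a web search backend:
    # rank documents by how many query words they contain.
    words = query.lower().split()
    scored = sorted(corpus, key=lambda doc: -sum(w in doc.lower() for w in words))
    return scored[:k]

def answer(query, corpus):
    # Ground the reply in retrieved snippets: build a prompt that pairs
    # the fetched context with the user's question.
    context = "\n".join(search(query, corpus))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return prompt  # a real RAG system would send this prompt to the LLM

corpus = [
    "Brave released Leo, a browser-native AI assistant.",
    "NeRF reconstructs 3D scenes from 2D images.",
]
out = answer("What did Brave release?", corpus)
```

Because the model only sees claims present in the retrieved context, it has less room to invent facts, which is the hallucination-mitigation effect described above.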

In terms of safety and privacy, Brave has taken extensive measures. In the free version, Leo’s conversations remain anonymous and private, with no recording of interactions. The data is not utilized for training models, and no account or login is required. Reverse proxy technology ensures all calls pass through anonymous servers, preventing Brave from establishing any correlation between the call and the user’s IP address.

For users opting for the Premium version of Leo, an unlinkable token is issued upon registration to secure the subscription verification process, which means that Brave cannot link usage activities with user purchase information, ensuring complete privacy. Additionally, the user’s email is solely used for subscription verification and cannot be traced back.

Looking ahead, Brave has plans to introduce additional models in the Premium version, along with improvements in network speed limits, conversation quality, and exclusive features for subscribers.

Currently available in desktop version 1.60 of Brave, Leo and Leo Premium are set to launch on Android and iOS in the coming months. This innovative development marks a significant step forward in browser technology and AI integration, reaffirming Brave's commitment to user-centric, privacy-focused innovation.

