This AI Research Presents Neural A*: A Novel Data-Driven Search Method for Path Planning Problems

Path planning identifies a cost-effective and valid path from an initial point to a target point within an environmental map. Search-based planning methods, which include the well-known A* search, are widely employed in addressing path-planning challenges. These techniques have found application in various domains, including autonomous vehicle navigation and robot arm manipulation.

Recent studies have highlighted the significant benefits of data-driven path planning in two specific scenarios. 

The first scenario involves the more efficient discovery of near-optimal paths in point-to-point shortest-path search problems compared to traditional heuristic planners. 

The second scenario pertains to enabling path planning using raw image inputs. This task is challenging for classical planners unless there is access to semantic pixel-wise labeling of the environment.

In this research, the authors reformulate the conventional A* search algorithm in a differentiable form and combine it with a convolutional encoder to create a fully trainable, end-to-end neural network planner. This approach, known as Neural A*, addresses path planning problems by transforming a given problem instance into a guidance map and then conducting a differentiable A* search over that map.

The figure above illustrates the two path-planning scenarios handled by Neural A*:

Point-to-point shortest path search: finding a near-optimal path (red) with fewer node explorations (green) for an input map.

Path planning on raw image inputs: accurately predicting a human trajectory (red) on a natural image.

By learning to align its search outcomes with expert-provided ground-truth paths, Neural A* produces paths that follow the ground truth both accurately and efficiently.

This figure shows the schematic diagram of Neural A*:

(1) A path-planning problem instance is fed to the encoder to produce a guidance map. 

(2) The differentiable A* module performs a point-to-point shortest path search with the guidance map and outputs a search history and a resulting path. 

(3) A loss between the search history and the ground-truth path is back-propagated to train the encoder. 
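This pipeline lends itself to a compact training loop. Below is a minimal PyTorch-style sketch of the three steps above; the module names, tensor layout, and the L1 loss are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class NeuralAStar(nn.Module):
    """Hypothetical wrapper tying an encoder to a differentiable A* module."""

    def __init__(self, encoder: nn.Module, differentiable_astar: nn.Module):
        super().__init__()
        self.encoder = encoder          # CNN: problem instance -> guidance map
        self.astar = differentiable_astar

    def forward(self, map_design, start_map, goal_map):
        # (1) Encode the problem instance into a per-cell guidance (cost) map.
        x = torch.cat([map_design, start_map, goal_map], dim=1)
        guidance = self.encoder(x)
        # (2) Run the differentiable A* search guided by the cost map.
        search_history, path = self.astar(guidance, start_map, goal_map)
        return search_history, path

def training_step(model, batch, optimizer):
    search_history, _ = model(batch["map"], batch["start"], batch["goal"])
    # (3) Penalize mismatch between the search history and the expert path,
    # back-propagating through the search to train the encoder.
    loss = nn.functional.l1_loss(search_history, batch["gt_path"])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```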

Comprehensive experiments show that Neural A* surpasses state-of-the-art data-driven planners, achieving a favorable balance between search optimality and efficiency. Furthermore, Neural A* can predict realistic human trajectories by applying search-based planning directly to natural image inputs.

Check out the Paper, Project, and GitHub.

Salesforce AI Research Developed ProGen: A Leap Forward in Protein Engineering Using Artificial Intelligence

The development of functional proteins has long been a critical pursuit in various scientific fields, including healthcare, biotechnology, and environmental sustainability. However, conventional approaches to protein engineering have been limited by the reliance on random mutation and natural selection, leading to challenges in precise protein design. Researchers have recognized the need for more controlled and accurate methods to generate proteins with specific properties, prompting the exploration of artificial intelligence (AI) as a potential solution to this problem.

In response to the challenges of traditional protein engineering, a research team at Salesforce introduced ProGen, an AI model specifically designed to generate protein sequences in a controlled manner. Diverging from conventional methods, ProGen leverages a comprehensive dataset of protein sequences and incorporates conditioning tags to train the model to comprehend the intricate language of proteins. By utilizing these conditioning tags, ProGen can predict the subsequent amino acids in a sequence, thereby demonstrating its potential to facilitate the design and generation of proteins with desired properties.

ProGen’s underlying methodology involves a next-token prediction mechanism similar to the predictive algorithms used in natural language processing. By leveraging a set of over 100,000 conditioning tags encompassing diverse facets of protein sequences, ProGen can generate novel proteins that adhere to predefined structural and functional attributes. Evaluations highlight ProGen’s proficiency in producing protein sequences with near-native structural energies, indicating potential functional viability. This capability was demonstrated by successfully generating proteins such as VEGFR2 and GB1, showcasing ProGen’s ability to produce sequences that align with specific functional requirements.
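To make the mechanism concrete, here is a minimal sketch of tag-conditioned autoregressive generation; the model interface, token layout, and tag ids are hypothetical stand-ins, not Salesforce's actual API.

```python
import torch

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assume ids 0..19 map to these residues

def generate_protein(model, tag_ids, max_len=200, temperature=1.0):
    """Sample a protein sequence conditioned on prepended property tags."""
    tokens = list(tag_ids)              # e.g., ids for [family, localization]
    for _ in range(max_len):
        with torch.no_grad():
            logits = model(torch.tensor([tokens]))[0, -1]  # next-token logits
        probs = torch.softmax(logits / temperature, dim=-1)
        tokens.append(torch.multinomial(probs, 1).item())
    residues = tokens[len(tag_ids):]    # drop the conditioning tags
    return "".join(AMINO_ACIDS[t] for t in residues if t < len(AMINO_ACIDS))
```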

The research team’s comprehensive analysis underscores ProGen’s capacity to accurately predict and generate protein sequences with desired properties, thus marking a significant advancement in protein engineering. By integrating cutting-edge AI technologies, ProGen enhances precision and control in protein design and offers new avenues for accelerating scientific progress in various domains such as biotechnology, pharmaceuticals, and environmental sustainability. The successful application of ProGen in generating proteins with predefined functions signifies a pivotal step toward overcoming the limitations associated with traditional protein engineering methodologies.

In conclusion, the research team’s groundbreaking work in developing ProGen represents a significant milestone in protein engineering. ProGen’s advanced capabilities in controlled protein generation demonstrate a crucial advancement in addressing the challenges posed by traditional protein engineering techniques. The successful integration of AI-driven methodologies augments the precision and control in protein design and paves the way for transformative developments across diverse scientific disciplines. 

As ProGen continues to evolve, its potential for further advances and applications in protein engineering appears promising, opening new avenues for discovery and innovation in scientific research and development.

Check out the Reference Page and Paper.

This AI Paper Proposes ‘MotionDirector’: An Artificial Intelligence Approach to Customize Video Motion and Appearance

Text-to-video diffusion models have made significant advancements in recent times. Just by providing textual descriptions, users can now create either realistic or imaginative videos. These foundation models have also been tuned to generate images that match particular appearances, styles, and subjects. However, motion customization in text-to-video generation remains underexplored. Users may want to create videos with specific motions, such as a car moving forward and then turning left. It therefore becomes important to adapt diffusion models to create more specific content that caters to users’ preferences.

The authors of this paper have proposed MotionDirector, which helps foundation models achieve motion customization while maintaining appearance diversity at the same time. The technique uses a dual-path architecture to train the models to learn the appearance and motions in the given single or multiple reference videos separately, which makes it easy to generalize the customized motion to other settings.

The dual architecture comprises a spatial and a temporal pathway. The spatial path contains the foundation model with trainable spatial LoRAs (low-rank adaptations) injected into its transformer layers for each video. These spatial LoRAs are trained on a randomly selected single frame in each training step to capture the visual attributes of the input videos. In contrast, the temporal pathway duplicates the foundation model and shares the spatial LoRAs with the spatial path to adapt to the appearance of the given input video. Moreover, the temporal transformers in this pathway are augmented with temporal LoRAs, which are trained on multiple frames from the input videos to grasp the inherent motion patterns.
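The trainable pieces in both pathways are low-rank adapters. A generic LoRA layer of the kind described, sketched in PyTorch below, keeps the foundation weights frozen and learns only a low-rank residual; this is an illustrative implementation, not the authors' code.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen linear layer with a trainable low-rank residual."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False           # foundation weights stay frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)        # start as an identity adaptation
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))
```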

Just by deploying the trained temporal LoRAs, the foundation model can synthesize videos of the learned motions with diverse appearances. The dual architecture allows the models to learn the appearance and motion of objects in videos separately. This decoupling enables MotionDirector to isolate the appearance and motion of videos and then combine them from various source videos.

The researchers compared MotionDirector on two benchmarks comprising more than 80 different motions and 600 text prompts. On the UCF Sports Action benchmark (95 videos and 72 text prompts), human raters preferred MotionDirector's outputs for motion fidelity about 75% of the time, versus roughly 25% for the base models. On the second benchmark, LOVEU-TGVE-2023 (76 videos and 532 text prompts), MotionDirector outperformed other controllable-generation and tuning-based methods. The results demonstrate that numerous base models can be customized with MotionDirector to produce videos that combine appearance diversity with the desired motion concepts.

MotionDirector is a promising new method for adapting text-to-video diffusion models to generate videos with specific motions. It excels in learning and adapting specific motions of subjects and cameras, and it can be used to generate videos with a wide range of visual styles.

One area where MotionDirector can be improved is learning the motion of multiple subjects in the reference videos. However, even with this limitation, MotionDirector has the potential to enhance flexibility in video generation, allowing users to craft videos tailored to their preferences and requirements.

Check out the Paper, Project, and GitHub.

Mozilla Brings a Fake Review Checker AI Tool to Firefox

In the vast landscape of online shopping, discerning genuine product reviews from fabricated ones has become an increasingly arduous task. Consumers are left wondering whether they can truly rely on certain opinions, leading to a cloud of uncertainty hovering over their purchasing decisions. Addressing this critical concern, Mozilla’s Firefox has taken a monumental step by integrating a review checker into its browser, set to revolutionize the online shopping experience.

Existing solutions have attempted to combat this issue, with browser extensions like Fakespot leading the charge. Acquired by Mozilla in May, Fakespot is a specialized tool designed to detect fraudulent online reviews. Currently functional on major platforms such as Amazon, Walmart, eBay, Yelp, and TripAdvisor, it employs a grading system ranging from A to F. An A grade signifies a product with entirely reliable reviews, while a B grade indicates that the majority are trustworthy. A C grade implies a balanced mix of both reliable and unreliable feedback, while D and F grades denote products with predominantly unreliable reviews.

Notably, a lower grade does not necessarily reflect the quality of the product or service itself but rather indicates the trustworthiness of the reviews. Fakespot does not pinpoint specific fraudulent reviews but assigns an overall score to the product. The lower the grade, the higher the likelihood that the reviews are inauthentic. This vital tool is set to be seamlessly integrated into Firefox, providing users with an intrinsic means of evaluating the authenticity of reviews. The feature is currently in testing and is slated to be widely accessible by November, initially on Amazon, Best Buy, and Walmart, with additional sites to follow suit in due course.

The crux of Fakespot’s effectiveness lies in its utilization of artificial intelligence. By analyzing a multitude of data points and conducting multiple tests, Fakespot determines the integrity of a review. While the specifics of Fakespot’s algorithms remain undisclosed to prevent manipulation, the key factor is whether a review is left by a genuine customer. This innovation addresses a pervasive issue in the online shopping realm, where reviews play a pivotal role in influencing consumer decisions. Google, for instance, leverages reviews to recommend products, often leading to manipulation as companies vie for prominence.

Recent research underscores the gravity of the fake review epidemic, revealing that over 80% of shoppers have encountered fraudulent feedback online. Among the demographic of 18 to 34-year-olds, this figure surges to a staggering 92%. Fakespot, armed with its sophisticated AI-driven approach, stands as a powerful antidote to this pervasive problem.

In conclusion, Mozilla’s integration of Fakespot into Firefox represents a monumental leap towards combating the proliferation of fake reviews in online shopping. This ingenious tool harnesses the power of AI to discern genuine feedback from deceitful ones, providing users with a reliable means of evaluating products. With its widespread availability on major e-commerce platforms, Fakespot is poised to become an indispensable ally for consumers navigating the digital marketplace, ushering in an era of confidence and transparency in online shopping. As the battle against fake reviews gains a formidable ally in Firefox, consumers can finally shop with assurance and make informed choices.


Researchers from Princeton Introduce ShearedLLaMA Models for Accelerating Language Model Pre-Training via Structured Pruning

Large Language Models (LLMs) have become extremely popular because of their outstanding capabilities in a variety of natural language tasks. Though they are growing at a fast pace, the massive computational resources needed to train these models are a major drawback. Consequently, there’s been a surge in interest in creating more compact and effective LLMs, such as LLaMA, MPT, and Falcon. These medium-sized models are intended to support various use cases by providing effective inference and fine-tuning. However, training even the smallest billion-parameter LLMs from the start is prohibitively expensive for many organizations due to the significant computational resources required.

Earlier work has demonstrated that smaller language models can be just as powerful as moderate-sized LLMs such as LLaMA. These models are seen as a more efficient substitute for large LLMs, which require substantial processing power to train. In a recent study, a team of researchers examined the usefulness of structured pruning as a technique for reducing larger, pre-trained models into smaller LLMs. The method relies on two essential strategies:

Targeted Structured Pruning: a technique that methodically eliminates layers, attention heads, and intermediate and hidden dimensions from a larger language model to trim it to a target configuration. Because the procedure is carried out end to end, the model’s coherence and functioning are preserved: it shrinks the model without sacrificing vital language-comprehension abilities.

Dynamic Batch Loading: a method that adjusts the composition of the training data within each batch according to the evolving loss in different domains. By dynamically modifying the data samples used in each batch, it ensures that the model concentrates on the domains where it is underperforming, improving overall training efficiency.
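A minimal sketch of the dynamic batch loading idea follows; the multiplicative-weights update and variable names are illustrative assumptions rather than the paper's exact rule.

```python
import numpy as np

def update_domain_weights(domain_losses, reference_losses, weights, lr=1.0):
    """Re-weight sampling so under-performing domains get more data.

    domain_losses / reference_losses: per-domain current and target losses.
    weights: current sampling proportions over domains (sums to 1).
    """
    excess = np.maximum(domain_losses - reference_losses, 0.0)
    logits = np.log(weights) + lr * excess   # boost lagging domains
    new_w = np.exp(logits - logits.max())    # numerically stable renorm
    return new_w / new_w.sum()

# Example: domain 1 lags its reference most, so its sampling share grows.
w = update_domain_weights(np.array([2.1, 2.9, 2.4]),
                          np.array([2.0, 2.5, 2.4]),
                          np.array([0.4, 0.3, 0.3]))
```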

Sheared-LLaMA-1.3B and Sheared-LLaMA-2.7B, two smaller LLMs created by pruning an LLaMA2-7B model, show how effective the proposed procedure is. The pruning and continued training consume only 50 billion tokens, about 5% of OpenLLaMA's pre-training budget. Despite this constraint, Sheared-LLaMA-1.3B and Sheared-LLaMA-2.7B outperform other well-known LLMs of comparable scale, such as Pythia, INCITE, and OpenLLaMA, across 11 typical downstream tasks spanning instruction tuning for open-ended generation, reading comprehension, common-sense understanding, and world knowledge.

Based on the performance trajectory of the pruned models, additional training with more tokens may yield even greater gains. While the current study's experiments are limited to models with at most 7 billion parameters, the LLM-shearing technique is designed to generalize and can be extended to language models of any size in future work.

To sum up, LLM shearing provides a complete approach to LLM size reduction via dynamic batch loading and targeted structured pruning. The Sheared-LLaMA models, which outperform equivalently sized models on a variety of downstream tasks, are an effective demonstration of the method. It shows how smaller yet strong LLMs can be developed more efficiently and economically, and it can be applied across a wide range of model sizes.

Check out the Paper, GitHub, and Project.

Meet Universal Simulator (UniSim): An Interactive Simulator of the Real World Interaction Through Generative Modeling

Generative models have transformed content creation in text, images, and videos. The next frontier is simulating realistic experiences triggered by human and agent actions. A universal simulator, UniSim, is explored for this purpose. UniSim leverages diverse datasets, each capturing different aspects of real-world interactions. It can emulate how humans and agents interact with the world by simulating visual outcomes in response to high-level instructions and low-level controls. UniSim offers applications ranging from training embodied agents to enhancing video captioning models through simulated experience.

Researchers from UC Berkeley, Google DeepMind, MIT, and the University of Alberta tackle the challenge of developing world models for real-world interactions by expanding the success of internet-scale generative models beyond text-based tasks. While prior work has focused on generating domain-specific videos, this study pioneers the concept of universal simulators for interactive agent training. By enabling extensive environment access through these simulators, the goal is to enhance agents’ capabilities for multi-turn interactions and to benefit various agents, including vision-language planners and reinforcement learning policies.

Generative models have revolutionized content creation but still struggle to simulate real-world experiences. UniSim leverages diverse datasets, each capturing different aspects of human interaction, from high-level instructions to low-level controls. The goal is to train agents and machine intelligence models purely in simulation and achieve zero-shot transfer to real-world applications, bridging the sim-to-real gap.

UniSim utilizes datasets encompassing various aspects of real-world interaction. The datasets used cover image data with abundant objects, densely sampled actions from robotics data, and diverse movements in navigation data. UniSim learns to simulate visual outcomes based on high-level instructions and low-level controls within static scenes and objects. Their study outlines the reinforcement learning policy training process with initialization and behavioral cloning objectives. 
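Conceptually, policy training against such a simulator is a simple observe-act-predict loop. The sketch below is a hypothetical illustration; `simulator.predict_next` and the policy interface are assumed names, not UniSim's API.

```python
def rollout_in_simulator(simulator, policy, first_obs, horizon=16):
    """Collect a trajectory entirely inside a learned video simulator."""
    obs, trajectory = first_obs, []
    for _ in range(horizon):
        action = policy(obs)                            # control or text action
        next_obs = simulator.predict_next(obs, action)  # generated observation
        trajectory.append((obs, action, next_obs))
        obs = next_obs
    return trajectory  # used to update the policy, e.g., via RL or cloning
```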

Their research highlights the capability of UniSim to facilitate zero-shot real-world transfer for high-level vision-language planners and low-level reinforcement learning policies trained entirely in simulation. It extends this utility to various forms of machine intelligence, including video captioning models, broadening its applications. UniSim’s generated long-horizon data significantly enhances the performance of the Vision-Language Model (VLM) policy, achieving a 3-4 times higher completion rate for long-horizon goal-conditioned tasks compared to short-horizon training data.

Their study notes that UniSim, like other contemporary foundation models, requires significant computational resources. However, the paper does not thoroughly detail its specific technical methods, offering limited insight into technical limitations, and it lacks a discussion of UniSim's generalizability to diverse domains and of potential biases in the training datasets. Notably, it also does not address ethical considerations for employing simulated experiences in machine intelligence training.

Their research demonstrates UniSim’s potential to create a universal simulator for realistic real-world interactions via generative modeling. UniSim can simulate various experiences and effectively train autonomous agents. It enables zero-shot transfer for high-level vision-language planners and low-level reinforcement learning policies. Furthermore, other machine intelligence models like video captioning benefit from UniSim training, broadening its applications. UniSim’s long-horizon data substantially enhances the performance of VLMs in goal-conditioned tasks.

Future research should enhance UniSim’s adaptability to diverse domains and address potential dataset biases. Ethical implications and unintended consequences of simulated experiences in machine training must be thoroughly explored. Detailed and comprehensive training methods for UniSim should be developed, along with a deeper understanding of its technical limitations and challenges. Alternative approaches for action-rich interaction and long-horizon rollouts in real-world simulators should also be investigated to enhance UniSim’s capabilities.

Check out the Paper and Project.

Recognition and Generation of Object-State Compositions in Machine Learning Using “Chop and Learn”

The real world contains objects of varying sizes, hues, and textures. Visual qualities, often called states or attributes, can be innate to an object (such as color) or acquired through actions applied to it (such as being cut). Current data-driven recognition models (e.g., deep networks) presuppose robust training data covering exhaustive object attributes, yet they still struggle to generalize to unseen states of objects. Humans and other animals, by contrast, have an inbuilt ability to recognize and envision a wide variety of things with different properties by composing a small number of known objects and their states. Modern deep learning models frequently lack this compositional generalization, the capacity to synthesize and detect new combinations from finite concepts.

To aid in the study of compositional generalization, the ability to recognize and produce unseen compositions of objects in different states, a group of researchers from the University of Maryland propose a new dataset, Chop & Learn (ChopNLearn). They restrict the study to chopping fruits and vegetables to zero in on the compositional component. These items change form in recognizable ways when sliced, depending on the style of cut used. The purpose is to examine how these different ways of recognizing object states, without direct observation, can be applied to various objects. Their choice of 20 objects and seven typical cutting styles (including the whole object) yields object-state pairs of varying granularity and size.

The first task requires the system to create an image from an (object, state) composition not encountered during training. For this purpose, the researchers propose modifying existing large-scale text-to-image generative models. They compare several existing approaches, including Textual Inversion and DreamBooth, using text prompts to represent the object-state composition. They also suggest an alternative procedure that adds new tokens for objects and states while jointly fine-tuning the language and diffusion models. Finally, they evaluate the strengths and weaknesses of the proposed generative model and of the existing methods.
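For illustration, prompt construction for the (object, state) compositions might look like the sketch below; the placeholder tokens and phrasing are hypothetical, not the paper's exact prompts.

```python
# Hypothetical (object, state) prompt templates for a text-to-image model.
OBJECTS = ["apple", "carrot", "potato"]                  # of the 20 objects
STATES = ["whole", "half", "round slices", "julienne"]   # of the 7 cut styles

def make_prompt(obj: str, state: str) -> str:
    # <obj> and <state> stand in for newly added learnable tokens whose
    # embeddings are tuned alongside the diffusion model.
    return f"a photo of <{obj}> in <{state}> state"

train_pairs = {("apple", "whole"), ("carrot", "julienne")}
unseen = [(o, s) for o in OBJECTS for s in STATES if (o, s) not in train_pairs]
prompts = [make_prompt(o, s) for o, s in unseen]  # generate unseen compositions
```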

The second challenge extends the existing Compositional Action Recognition task. Whereas past work has focused on long-term activity tracking in videos, this task aims to notice small changes in object states, a key initial step for activity recognition. By recognizing the compositions of states at the beginning and end of a clip, the model learns subtle changes in object states. Using the ChopNLearn dataset, the researchers compare several state-of-the-art baselines for video tasks. The study concludes by discussing the many image- and video-related tasks that could benefit from the dataset.

Here are some of the contributions:

The proposed ChopNLearn dataset includes images and videos captured from various camera angles, representing different object-state compositions.

They offer a new task, Compositional Image Generation, which generates images for compositions of objects and states not seen during training.

They establish a new benchmark for Compositional Action Recognition, which aims to learn and recognize how objects change over time and across diverse perspectives.

Limitations

Few-shot generalization is becoming ever more significant as foundation models become available. This work investigates ChopNLearn's potential for studying the compositional generation and recognition of highly intricate and interrelated concepts. Admittedly, ChopNLearn is a small-scale dataset with a green-screen background, which limits the generalizability of models trained on it. However, this is the first attempt to learn how different objects share common fine-grained states (cut styles). The researchers investigate this by training and testing more complex models on ChopNLearn, and by fine-tuning those models with and without the green screen. Further, they anticipate that the community will benefit from employing ChopNLearn in even more difficult tasks such as 3D reconstruction, video frame interpolation, state-change generation, and more.

Visit https://chopnlearn.github.io/ for further information.

To sum it up

Researchers offer ChopNLearn, a novel dataset for gauging compositional generalization, the capacity of models to recognize and generate unseen compositions of objects in different states. In addition, they present two new tasks, Compositional Image Generation and Compositional Action Recognition, on which to evaluate the effectiveness of existing generative models and video recognition techniques. They illustrate the problems with current methods and their limited generalizability to new compositions. These two tasks, however, are merely the tip of the proverbial iceberg. Multiple image and video tasks rely on understanding object states, including 3D reconstruction, future frame prediction, video production, summarization, and parsing of long-term video. With this dataset, the researchers hope to see the computer vision community propose and learn new compositional challenges for photos, videos, 3D, and other media.

Check out the Paper and Project.

Researchers from Stanford and Microsoft Introduce Self-Improving AI: Leveraging GPT-4 to Elevate Scaffolding Program Performance

Almost any aim described in natural language can be optimized by querying a language model. However, a program that makes several structured calls to a language model can often produce outputs with higher objective values. The authors refer to these as “scaffolding” programs, which are typically written (by humans) in a programming language such as Python. Their main finding is that, for any distribution over optimization problems and any given language model, the design of the scaffolding program is itself an optimization problem. In this paper, researchers from Microsoft Research and Stanford University describe the Self-Taught Optimizer (STOP), a technique in which code that uses a language model to enhance any given solution is applied recursively, leading to self-improvement.

Their method starts with an initial seed “improver” scaffolding program that uses the language model to improve a solution to a downstream task. As the system iterates, the model refines this improver program. To measure the effectiveness of their self-optimizing architecture, the researchers apply it to a small selection of downstream algorithmic tasks. Their findings show that the improver gets better as it runs through more iterations of its own self-improvement techniques. STOP thus demonstrates how language models can function as their own meta-optimizers. In addition, the authors analyze the kinds of self-improvement strategies the model proposes (see Figure 1), how well the recommended strategies transfer to downstream tasks, and whether the model is vulnerable to unsafe self-improvement techniques.

Figure 1: Examples of self-improvement techniques suggested and used by GPT-4 are shown here. The arbitrary code, including the scaffolding code itself, is then revised using each technique as scaffolding.

Since the underlying language model remains unaltered, this setting is termed recursively self-improving code generation: it is inspired by, but is not fully, a Recursively Self-Improving (RSI) system. Researchers formalized the concept of RSI at least 50 years ago, but that work concentrated on building systems that were more competent in general and assumed the model could improve every part of its code. The present research is a modest step in that direction, since it considers only the model's capacity to improve the scaffold that invokes it iteratively. This study is the first to state the RSI-code-generation problem in a mathematically well-defined form.

They then create and assess STOP to illustrate the potential of RSI code generation, demonstrating improvements on a range of downstream tasks. Figure 1 shows some of the intriguing and useful scaffolds STOP proposes when using a version of the GPT-4 language model trained on data up to 2021, well before the debut of most scaffolding systems. Additional experiments track how frequently the model attempts to disable a sandbox flag. Finally, the authors address issues around the ethical development of such technology.
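In pseudocode-like Python, a seed improver is only a few lines. The helper names (`lm_call`, `utility`) are hypothetical; the sketch shows the recursion that makes the system self-improving, not the paper's exact prompt.

```python
def improve(solution: str, utility, lm_call, n_candidates: int = 3) -> str:
    """Seed improver: ask the LM for better solutions, keep the best one."""
    prompt = (
        "Improve the following solution.\n"
        f"Solution:\n{solution}\n"
        "Return only the improved solution."
    )
    candidates = [lm_call(prompt) for _ in range(n_candidates)]
    return max(candidates + [solution], key=utility)

# Recursive self-improvement: because the improver is itself code, its own
# source can be passed through improve(), with utility measured by how well
# the resulting improver performs on downstream tasks.
```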

The main contributions of this work are:

Formulating a meta-optimization strategy where a scaffolding system recursively improves itself.

Demonstrating that this system can successfully recursively improve itself using a modern language model (GPT-4 in particular).

Examining the self-improvement techniques proposed and implemented by the model, including how the model avoids safety precautions like a sandbox.

Check out the Paper.

Revolutionizing Wearable Tech: Edge Impulse’s Ultra-Efficient Heart Rate Algorithm & Expanding Healthcare Suite

Machine learning now touches almost every aspect of our lives and is finding applications across many fields. Its relevance is especially pronounced in medicine, where it is essential to improving healthcare procedures. From identifying diseases to forecasting patient outcomes, machine learning is changing how we tackle medical problems, ultimately leading to better patient care and medical research.

Consequently, a company called Edge Impulse, which specializes in on-device machine learning and artificial intelligence, has announced the launch of what it claims is the smallest and most precise heart rate measurement algorithm. They also emphasized that it requires only one-sixteenth of the competition’s memory.

The researchers describe this algorithm as a health detective for the body’s nervous system. To gauge how well the autonomic nervous system is balanced, it examines changes in our heart rate and in the intervals between beats. This balance underpins our general health, including heart health, stress levels, and how quickly we bounce back from activity.

Using a simple optical sensor that measures light passing through the skin (photoplethysmography), the algorithm delivers precise heart rate and heart rate variability values. Finger-worn wearables, such as smart rings, frequently contain this sensor. The measurement and analysis of heart-rate interbeat intervals (IBIs) are fundamental to studying cardiovascular physiology and health. Heart rate variability (HRV) goes beyond the heart rate itself, measuring the variation in time between successive heartbeats.
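As a concrete example, one standard HRV metric, RMSSD, can be computed directly from a stream of inter-beat intervals; the snippet below shows the textbook calculation, not Edge Impulse's proprietary algorithm.

```python
import numpy as np

def rmssd(ibi_ms: np.ndarray) -> float:
    """Root mean square of successive differences between inter-beat intervals."""
    diffs = np.diff(ibi_ms)                    # beat-to-beat changes (ms)
    return float(np.sqrt(np.mean(diffs ** 2)))

ibis = np.array([812.0, 798.0, 825.0, 840.0, 810.0])  # example IBIs in ms
print(f"HR ~ {60000.0 / ibis.mean():.0f} bpm, RMSSD = {rmssd(ibis):.1f} ms")
```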

The algorithm primarily uses light-based sensors like those found in fitness bands and smartwatches, but it can also work with electrocardiogram (ECG) sensors. While using only one-sixteenth of the memory of its nearest rival, it can detect atrial fibrillation, detect falls, monitor sleep, gauge stress, and recognize changes in activity levels.

Edge Impulse also offers algorithms for measuring body temperature, monitoring movement, tracking posture, and capturing brain activity through electroencephalograms (EEG). The company has additionally developed data dashboards for real-time monitoring and a research data lake for clinical data to improve these algorithms further.

The company emphasized that this significantly reduces the research and development (R&D) spending required to produce custom algorithms. It also highlighted that Edge Impulse's HR/HRV solutions use modern algorithms, removing the need for time-consuming, difficult algorithm refinement.

Edge Impulse also offers a robust infrastructure to enable the growth of centralized and decentralized clinical investigations, accommodating small and big subject groups. This scalability is essential for extensive testing and validation since it guarantees that the dataset utilized is diverse and reduces model biases.

Check out the Research Page and Reference Article.

This AI Paper Introduces Lemur and Lemur Chat For Harmonizing Natural Language and Code For Language Agents

In a broad sense, intelligent agents are autonomous problem solvers endowed with perception, judgment, and action capabilities based on data gathered from their surroundings. Recent applications of this idea have shown promise in developing language agents that can use natural language to do a wide range of complex tasks in various contexts. This is especially true when these agents are constructed using large language models (LLMs). Agents of this type can mimic human thought and language because they draw on human expertise in the form of LLMs. This allows people to be flexible in their use of tools, adapt to new situations, reason linguistically, and develop multi-agent systems on the fly. 

To properly serve as the foundation of language agents, LLMs should grasp human interaction, reasoning, and planning, and be grounded in the relevant contexts. Their natural language capabilities allow LLMs to closely mimic human conversation, thinking, and planning. However, execution in an environment is typically accomplished through general-purpose code or domain-specific APIs, such as those used to control web browsers, interact with operating-system command-line terminals, and drive robotic arms.

To fill this gap, a new study by researchers from the University of Hong Kong, XLang Lab, Salesforce Research, Sea AI Lab, the University of Washington, and MIT CSAIL presents Lemur and Lemur-Chat, two state-of-the-art, publicly available models that have been pre-trained and fine-tuned to achieve harmony between text and code. Through carefully crafted pre-training and instruction fine-tuning, the researchers improved on the original Llama-2-70B. To enhance coding ability while retaining natural language performance, they constructed a code-centric corpus based on The Stack, comprising 90 billion tokens with a 10:1 code-to-text ratio; the resulting model is Lemur. To create the instruction-following model, Lemur-Chat, they then fine-tuned Lemur on around 100K instruction instances spanning both text and code. After extensive evaluation across 8 textual and coding benchmarks, Lemur and Lemur-Chat proved to be the most well-rounded open-source models.
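The corpus construction amounts to a weighted sampling mix. Here is a minimal sketch under the stated 10:1 ratio; sampling by document rather than by token count is a simplification, and the function names are illustrative.

```python
import random

def sample_pretraining_doc(code_docs, text_docs, code_to_text=10):
    """Draw one document so that code appears ~10x as often as text."""
    if random.random() < code_to_text / (code_to_text + 1.0):
        return random.choice(code_docs)   # ~10 parts code (The Stack)
    return random.choice(text_docs)       # ~1 part natural language
```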

In addition, this effort sets out to provide agent benchmarks for evaluating the core competencies of language agents in various settings. The team focuses in particular on the agents' skill with tools and their ability to ground themselves in both environmental and social feedback. They also investigate the difficulties inherent in real-world, partially observable situations, where the agent must operate on incomplete information and take additional actions to fill in the gaps. Experiments show that Lemur-Chat outperforms other open-source models on 12 of the 13 agent benchmarks. This exemplifies how, by combining natural language and coding abilities, Lemur-Chat can narrow the performance gap between open-source and commercial alternatives for language agents.

The results of these tests demonstrate the importance of combining linguistic and computational skills in agent settings. Models like Llama-2-70B-Chat, which excel at natural language processing but struggle with coding, can use basic tools to aid reasoning when the action space is small and the cost of employing such tools is low. In contrast, in sophisticated decision-making scenarios such as web browsing and household navigation, the action space is typically enormous, and models with strong coding abilities have an edge in constructing complex, executable action sequences. In sum, Lemur's superior performance stems from its combined strength in natural language and programming. This study lays the groundwork for sophisticated language agents that can function well across a wide range of settings, shedding light on how to optimize the synergy between natural and programming languages.

Check out the Paper and GitHub.
