<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://selimbin.github.io/feed.xml" rel="self" type="application/atom+xml"/><link href="https://selimbin.github.io/" rel="alternate" type="text/html" hreflang="en"/><updated>2025-07-13T21:40:58+00:00</updated><id>https://selimbin.github.io/feed.xml</id><title type="html">blank</title><subtitle>A website showcasing some of my work and achievements. </subtitle><entry><title type="html">AudioLDM &amp;amp; MusicGen for Voice-to-Music Generation</title><link href="https://selimbin.github.io/blog/2025/MusicGen-blog/" rel="alternate" type="text/html" title="AudioLDM &amp;amp; MusicGen for Voice-to-Music Generation"/><published>2025-07-05T14:00:00+00:00</published><updated>2025-07-05T14:00:00+00:00</updated><id>https://selimbin.github.io/blog/2025/MusicGen-blog</id><content type="html" xml:base="https://selimbin.github.io/blog/2025/MusicGen-blog/"><![CDATA[<p>Welcome to this blog post on <strong>generating instrumental music from vocal input</strong>, a novel and intuitive interface for musical expression. This post presents and summarizes work done on extending two prominent models—<strong>AudioLDM</strong> and <strong>MusicGen (AudioCraft)</strong>—to perform <strong>voice-to-music generation</strong> with optional multi-modal conditioning.</p> <h2 id="introduction"><strong>Introduction</strong></h2> <p>Audio synthesis has recently seen breakthroughs via <strong>latent diffusion models</strong> and <strong>autoregressive transformers</strong>. Among these, <strong>AudioLDM</strong> and <strong>MusicGen</strong> are state-of-the-art frameworks in the realm of controllable sound generation.</p> <p>However, despite these advances, using the <strong>human voice</strong> as a control modality remains underexplored. 
Voice—through singing or humming—provides a natural way for non-musicians to guide music generation.</p> <p>In this project:</p> <ul> <li>We extend <strong>AudioLDM</strong> to support <strong>audio-based conditioning</strong> (voice input).</li> <li>We fine-tune <strong>MusicGen</strong> to generate music from <strong>melody and genre prompts</strong>.</li> <li>We build a custom <strong>vocal-instrumental dataset</strong> with 416 paired tracks across 42 artists and multiple languages.</li> <li>We evaluate both models and show that <strong>multi-modal conditioning</strong> leads to better generation quality.</li> </ul> <h2 id="background"><strong>Background</strong></h2> <h3 id="audioldm">AudioLDM</h3> <p>AudioLDM is a <strong>latent diffusion model</strong> designed for text-to-audio generation. It works in Mel spectrogram space and includes:</p> <ul> <li>A <strong>VAE</strong> for encoding and decoding spectrograms</li> <li>The <strong>CLAP</strong> encoder for extracting semantic information from text/audio</li> <li>A <strong>UNet</strong>-based diffusion model to generate latent audio</li> </ul> <h4 id="audioldm-pipeline">AudioLDM Pipeline</h4> <ol> <li>Encode text using CLAP → embedding</li> <li>Encode input audio into latent Mel via VAE</li> <li>Generate latent audio with diffusion model (conditioned on embedding)</li> <li>Decode Mel → waveform using HiFi-GAN</li> </ol> <h3 id="musicgen-audiocraft">MusicGen (AudioCraft)</h3> <p>MusicGen, from Meta’s <strong>AudioCraft</strong>, is an <strong>autoregressive Transformer</strong> for high-fidelity music generation using:</p> <ul> <li>Discrete audio tokens from <strong>EnCodec</strong></li> <li>Melody conditioning using <strong>chroma pitch features</strong></li> <li>Optional <strong>text conditioning</strong> for genre/style control</li> </ul> <h4 id="musicgen-pipeline">MusicGen Pipeline</h4> <ol> <li>Extract chroma features from melody (e.g., vocal input)</li> <li>Encode text prompt (T5 encoder)</li> 
<li>Transformer generates EnCodec tokens</li> <li>Decode tokens into waveform (32kHz audio)</li> </ol> <h4 id="applications">Applications</h4> <ul> <li><strong>Voice-to-instrument</strong> synthesis</li> <li><strong>Prompt-based music composition</strong></li> <li><strong>Genre transfer</strong></li> <li><strong>Melody harmonization</strong></li> </ul> <h2 id="dataset"><strong>Dataset</strong></h2> <p>To train both models, we created a dataset of:</p> <ul> <li><strong>416 tracks</strong> across <strong>42 artists</strong></li> <li>Paired <strong>vocals</strong> and <strong>instrumentals</strong></li> <li>Metadata: genre, BPM, key, mood, artist name</li> <li><strong>Languages</strong>: English (69%), Arabic (19%), French (12%)</li> </ul> <p>Tracks were segmented for training:</p> <ul> <li>30s chunks for <strong>MusicGen</strong></li> <li>20s chunks for <strong>AudioLDM</strong></li> </ul> <h2 id="methodology"><strong>Methodology</strong></h2> <h3 id="extending-audioldm">Extending AudioLDM</h3> <p>Original AudioLDM only supported <strong>text conditioning</strong>. 
We extended it to support <strong>multi-modal conditioning</strong> by:</p> <ul> <li>Using both <strong>CLAP text and audio encoders</strong></li> <li>Concatenating embeddings for unified context</li> <li>Injecting context via FiLM layers or direct input to UNet</li> </ul> <p>This allows generation from:</p> <ul> <li>Voice only</li> <li>Voice + text prompt (e.g., genre)</li> <li>Text only</li> </ul> <h3 id="fine-tuning-musicgen">Fine-tuning MusicGen</h3> <p>We fine-tuned the <strong>1.5B parameter MusicGen-Melody model</strong> in two stages:</p> <ol> <li><strong>Voice-to-music</strong>: using isolated vocals to guide melody</li> <li><strong>Voice + text</strong>: adding genre-specific text prompts (e.g., “Disco music for input vocals”)</li> </ol> <h2 id="experimental-setup"><strong>Experimental Setup</strong></h2> <p>Training was done on an <strong>NVIDIA A40 (48 GB VRAM)</strong> GPU.</p> <h3 id="musicgen-fine-tuning">MusicGen Fine-Tuning</h3> <ul> <li>Model: MusicGen-Melody (1.5B)</li> <li>Batch size: 2</li> <li>Epochs: 48</li> <li>Duration: ~48 hours</li> </ul> <h3 id="audioldm-fine-tuning">AudioLDM Fine-Tuning</h3> <ul> <li>Model: AudioLDM-s (330M)</li> <li>Trained for 10k steps / 12 hrs per variant</li> <li>Variants: <ul> <li>Voice only</li> <li>Voice + text</li> <li>Refined text prompts (to match MusicGen prompt format)</li> </ul> </li> </ul> <h2 id="results"><strong>Results</strong></h2> <h3 id="musicgen-output">MusicGen Output</h3> <p>MusicGen showed:</p> <ul> <li>Strong alignment to both <strong>melody and genre prompts</strong></li> <li>Stylistic consistency across genres</li> <li>Clearer, more coherent outputs than the pretrained baseline</li> </ul> <h3 id="audioldm-output">AudioLDM Output</h3> <ul> <li><strong>Voice-only</strong>: poor output, often noisy or incoherent</li> <li><strong>Voice + text</strong>: major improvement</li> <li><strong>Refined prompts</strong>: better genre alignment and clarity</li> </ul> <p>While improved, AudioLDM still 
underperforms compared to MusicGen, especially in musical coherence and fidelity.</p> <h3 id="-voice-to-music-generation-results">🎧 Voice-to-Music Generation Results</h3> <ul id="test-cases" class="tab" data-tab="bfedf07e-14f2-41e9-b205-6e3cf4908c76" data-name="test-cases"> <li class="active" id="test-cases-test-1--pop"> <a href="#">Test 1: Pop </a> </li> <li id="test-cases-test-2a--pop-prompt"> <a href="#">Test 2a: Pop Prompt </a> </li> <li id="test-cases-test-2b--disco-prompt"> <a href="#">Test 2b: Disco Prompt </a> </li> <li id="test-cases-test-3--disco"> <a href="#">Test 3: Disco </a> </li> <li id="test-cases-test-4--rock"> <a href="#">Test 4: Rock </a> </li> </ul> <ul class="tab-content" id="bfedf07e-14f2-41e9-b205-6e3cf4908c76" data-name="test-cases"> <li class="active"> <p><strong>Input Vocal</strong></p> <audio controls="" src="/assets/audio/test1_input_2.wav"></audio> <p><strong>Ground Truth Instrumental</strong></p> <audio controls="" src="/assets/audio/test1_gt_2.wav"></audio> <p><strong>Genre Prompt:</strong> <code class="language-plaintext highlighter-rouge">Pop</code></p> <p><strong>MusicGen Pretrained Output</strong></p> <audio controls="" src="/assets/audio/test1_pretrained_2.wav"></audio> <p><strong>MusicGen Fine-tuned Output</strong></p> <audio controls="" src="/assets/audio/test1_musicgen_ft_2.wav"></audio> <p><strong>AudioLDM Fine-tuned Output</strong></p> <audio controls="" src="/assets/audio/test1_audioldm_ft_2.wav"></audio> </li> <li> <p><strong>Input Vocal</strong></p> <audio controls="" src="/assets/audio/test2_input.wav"></audio> <p><strong>Ground Truth Instrumental</strong></p> <audio controls="" src="/assets/audio/test2_gt.wav"></audio> <p><strong>Genre Prompt:</strong> <code class="language-plaintext highlighter-rouge">Pop</code></p> <p><strong>MusicGen Fine-tuned Output</strong></p> <audio controls="" src="/assets/audio/test2_musicgen_pop.wav"></audio> <p><strong>AudioLDM Fine-tuned Output</strong></p> <audio controls="" 
src="/assets/audio/test2_audioldm_pop.wav"></audio> </li> <li> <p><strong>Input Vocal</strong></p> <audio controls="" src="/assets/audio/test2_input.wav"></audio> <p><strong>Ground Truth Instrumental</strong></p> <audio controls="" src="/assets/audio/test2_gt.wav"></audio> <p><strong>Genre Prompt:</strong> <code class="language-plaintext highlighter-rouge">Disco</code></p> <p><strong>MusicGen Fine-tuned Output</strong></p> <audio controls="" src="/assets/audio/test2_musicgen_disco.wav"></audio> <p><strong>AudioLDM Fine-tuned Output</strong></p> <audio controls="" src="/assets/audio/test2_audioldm_disco.wav"></audio> </li> <li> <p><strong>Input Vocal</strong></p> <audio controls="" src="/assets/audio/test3_input.wav"></audio> <p><strong>Ground Truth Instrumental</strong></p> <audio controls="" src="/assets/audio/test3_gt.wav"></audio> <p><strong>Genre Prompt:</strong> <code class="language-plaintext highlighter-rouge">Disco</code></p> <p><strong>MusicGen Pretrained Output</strong></p> <audio controls="" src="/assets/audio/test3_pretrained.wav"></audio> <p><strong>MusicGen Fine-tuned Output</strong></p> <audio controls="" src="/assets/audio/test3_musicgen_ft.wav"></audio> <p><strong>AudioLDM Fine-tuned Output</strong></p> <audio controls="" src="/assets/audio/test3_audioldm_ft.wav"></audio> </li> <li> <p><strong>Input Vocal</strong></p> <audio controls="" src="/assets/audio/test4_input.wav"></audio> <p><strong>Ground Truth Instrumental</strong></p> <audio controls="" src="/assets/audio/test4_gt.wav"></audio> <p><strong>Genre Prompt:</strong> <code class="language-plaintext highlighter-rouge">Rock</code></p> <p><strong>MusicGen Pretrained Output</strong></p> <audio controls="" src="/assets/audio/test4_pretrained.wav"></audio> <p><strong>MusicGen Fine-tuned Output</strong></p> <audio controls="" src="/assets/audio/test4_musicgen_ft.wav"></audio> <p><strong>AudioLDM Fine-tuned Output</strong></p> <audio controls="" src="/assets/audio/test4_audioldm_ft.wav"></audio> 
</li> </ul> <h3 id="-cultural-generalizability-unclean-vocal-inputs">🌍 Cultural Generalizability: Unclean Vocal Inputs</h3> <p>To evaluate how well the models generalize across <strong>languages</strong>, <strong>accents</strong>, and <strong>noisy vocal inputs</strong>, we tested on vocal tracks from different cultural backgrounds using known songs in <strong>Arabic</strong>, <strong>French</strong>, <strong>Egyptian Arabic</strong>, and <strong>English</strong>.</p> <p>Each case uses the original unclean vocals and generates instrumental output via the fine-tuned models.</p> <ul id="cultural-generalizability" class="tab" data-tab="4d99e0e3-a175-4f48-b566-087f78edd039" data-name="cultural-generalizability"> <li class="active" id="cultural-generalizability-arabic------------"> <a href="#">Arabic — كفّك إنتَ </a> </li> <li id="cultural-generalizability-french---la-vie-en-rose"> <a href="#">French — La Vie en Rose </a> </li> <li id="cultural-generalizability-egyptian---cairokee"> <a href="#">Egyptian — CairoKee </a> </li> <li id="cultural-generalizability-english---sweet-caroline"> <a href="#">English — Sweet Caroline </a> </li> </ul> <ul class="tab-content" id="4d99e0e3-a175-4f48-b566-087f78edd039" data-name="cultural-generalizability"> <li class="active"> <p><strong>Culture:</strong> Arabic<br/> <strong>Song:</strong> <em>Kefak enta (كفّك إنتَ)</em></p> <p><strong>Input Vocal:</strong></p> <audio controls="" style="width: 100%;"> <source src="/assets/audio/culture_arabic_input.wav" type="audio/wav"/> </audio> <p><strong>Generated Instrumental Output:</strong></p> <audio controls="" style="width: 100%;"> <source src="/assets/audio/culture_arabic_output.wav" type="audio/wav"/> </audio> </li> <li> <p><strong>Culture:</strong> French<br/> <strong>Song:</strong> <em>La Vie en Rose — Édith Piaf</em></p> <p><strong>Input Vocal:</strong></p> <audio controls="" style="width: 100%;"> <source src="/assets/audio/culture_french_input.wav" type="audio/wav"/> </audio> 
<p><strong>Generated Instrumental Output:</strong></p> <audio controls="" style="width: 100%;"> <source src="/assets/audio/culture_french_output.wav" type="audio/wav"/> </audio> </li> <li> <p><strong>Culture:</strong> Egyptian Arabic<br/> <strong>Song:</strong> <em>Cairokee – James Dean</em></p> <p><strong>Input Vocal:</strong></p> <audio controls="" style="width: 100%;"> <source src="/assets/audio/culture_egypt_input.wav" type="audio/wav"/> </audio> <p><strong>Generated Instrumental Output:</strong></p> <audio controls="" style="width: 100%;"> <source src="/assets/audio/culture_egypt_output.wav" type="audio/wav"/> </audio> </li> <li> <p><strong>Culture:</strong> English<br/> <strong>Song:</strong> <em>Sweet Caroline</em></p> <p><strong>Input Vocal:</strong></p> <audio controls="" style="width: 100%;"> <source src="/assets/audio/culture_english_input.wav" type="audio/wav"/> </audio> <p><strong>Generated Instrumental Output:</strong></p> <audio controls="" style="width: 100%;"> <source src="/assets/audio/culture_english_output.wav" type="audio/wav"/> </audio> </li> </ul> <h2 id="-evaluation-results">🔍 Evaluation Results</h2> <p>We conducted a comprehensive evaluation of our models using both <strong>qualitative</strong> and <strong>quantitative</strong> methods. The goal was to compare the performance of our fine-tuned models—<strong>MusicGen Fine-tuned</strong> and <strong>AudioLDM Fine-tuned</strong>—against the <strong>pretrained MusicGen</strong> baseline in generating instrumental music from vocal input.</p> <hr/> <h3 id="qualitative-evaluation-user-listening-survey">Qualitative Evaluation: User Listening Survey</h3> <p>To assess perceptual quality and alignment, we conducted a user study with <strong>10 participants</strong>, each comparing outputs from all three models across <strong>4 different songs</strong>. 
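</p> <p>As a side note on how such preferences can be aggregated: when ties or multiple picks are allowed, per-question win rates need not sum to 100%. A minimal tallying sketch (the ballots and scheme below are hypothetical, not the study's actual procedure):</p>

```python
from collections import Counter

def win_rates(ballots):
    """Per-question win rate: the share of ballots on which each model was
    (possibly jointly) preferred. Ties credit every tied model, so the
    rates can sum to more than 100%."""
    wins = Counter()
    for preferred in ballots:          # one set of preferred model(s) per ballot
        for model in preferred:
            wins[model] += 1
    return {m: 100.0 * n / len(ballots) for m, n in wins.items()}

# Hypothetical ballots: each entry is the set of models a participant preferred.
ballots = [{"musicgen_ft"}, {"musicgen_ft", "audioldm_ft"}, {"pretrained"}]
print(win_rates(ballots))
```
<p>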
Participants answered <strong>12 questions</strong> covering:</p> <ul> <li>Vocal alignment</li> <li>Genre fit</li> <li>Overall audio quality</li> </ul> <h4 id="win-rate-comparison-per-question-basis">Win Rate Comparison (Per Question Basis)</h4> <table> <thead> <tr> <th>Model</th> <th>Win Rate (%)</th> </tr> </thead> <tbody> <tr> <td>MusicGen Fine-tuned</td> <td><strong>58.57%</strong></td> </tr> <tr> <td>MusicGen Pretrained</td> <td>43.33%</td> </tr> <tr> <td>AudioLDM Fine-tuned</td> <td>40.00%</td> </tr> </tbody> </table> <blockquote> <p>ℹ️ While MusicGen Fine-tuned led in preference, the relatively close win rates highlight the complexity of modeling user expectations and stylistic alignment in music generation.</p> </blockquote> <h4 id="-genre-preference-bar-chart-simplified">📊 Genre Preference Bar Chart (Simplified)</h4> <p>Across four test tracks, participants selected preferred outputs based on genre fit:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Track   | MusicGen Fine-tuned | AudioLDM Fine-tuned
--------|----------------------|---------------------
Test 1  | 7 votes              | 3 votes
Test 2  | 5 votes              | 5 votes
Test 3  | 5 votes              | 5 votes
Test 4  | 5 votes              | 5 votes
</code></pre></div></div> <p>🎵 Interpretation: While MusicGen was generally preferred for melodic coherence, both models showed similar strength in capturing genre cues.</p> <hr/> <h3 id="quantitative-evaluation">Quantitative Evaluation</h3> <p>We used two established metrics to evaluate realism and prompt consistency:</p> <h4 id="clap-score-text-audio-alignment">CLAP Score (Text-Audio Alignment)</h4> <p>CLAP (Contrastive Language-Audio Pretraining) measures similarity between the generated audio and the text prompt.</p> <table> <thead> <tr> <th>Model</th> <th>CLAP Score ↑</th> </tr> </thead> <tbody> <tr> <td>MusicGen Fine-tuned</td> <td><strong>0.180</strong></td> </tr> <tr> <td>MusicGen Pretrained</td> <td><strong>0.180</strong></td> </tr> <tr> <td>AudioLDM Fine-tuned</td> <td>0.117</td> </tr> </tbody> </table> <p><strong>Higher is better.</strong> MusicGen clearly excels at maintaining alignment with semantic prompts, while AudioLDM showed weaker consistency, despite its improvements from multi-modal fine-tuning.</p> <h4 id="fréchet-audio-distance-fad">Fréchet Audio Distance (FAD)</h4> <p>FAD assesses the realism of generated audio by comparing the statistical distribution of embeddings against real instrumentals.</p> <table> <thead> <tr> <th>Model</th> <th>FAD Score ↓</th> </tr> </thead> <tbody> <tr> <td>MusicGen Pretrained</td> <td><strong>10.64</strong></td> </tr> <tr> <td>MusicGen Fine-tuned</td> <td>10.70</td> </tr> <tr> <td>AudioLDM Fine-tuned</td> <td><strong>9.48</strong></td> </tr> </tbody> </table> <p><strong>Lower is better.</strong> Interestingly, AudioLDM Fine-tuned achieved the lowest FAD score, indicating that it generates more acoustically realistic audio—even if semantically weaker. 
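</p> <p>For reference, FAD reduces to the Fréchet distance between Gaussians fitted to embeddings of real and generated audio (the original FAD formulation uses VGGish embeddings). A minimal numpy sketch of that distance, assuming the embedding matrices are already computed:</p>

```python
import numpy as np

def _sqrtm_psd(mat):
    """Matrix square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)        # guard tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T

def frechet_audio_distance(real_emb, gen_emb):
    """Fréchet distance between Gaussians fitted to two embedding sets
    (rows = audio clips, columns = embedding dimensions)."""
    mu_r, mu_g = real_emb.mean(axis=0), gen_emb.mean(axis=0)
    cov_r = np.cov(real_emb, rowvar=False)
    cov_g = np.cov(gen_emb, rowvar=False)
    s = _sqrtm_psd(cov_r)
    covmean = _sqrtm_psd(s @ cov_g @ s)    # its trace equals Tr((C_r C_g)^(1/2))
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))
```

<p>Identical embedding distributions give a score of zero; shifting the generated embeddings away from the real ones increases the distance quadratically.</p> <p>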
This suggests that AudioLDM captures low-level audio features well.</p> <hr/> <h3 id="summary-of-findings">Summary of Findings</h3> <ul> <li><strong>MusicGen Fine-tuned</strong> was preferred in qualitative tests and matched baselines in CLAP score.</li> <li><strong>AudioLDM Fine-tuned</strong> produced more realistic audio per FAD but lagged in semantic alignment.</li> <li>Combining <strong>voice + text conditioning</strong> yields stronger results than using audio-only inputs.</li> <li>While MusicGen appears better suited for structured, genre-aware music generation, AudioLDM benefits from its latent-domain realism and could be enhanced further with architectural tuning.</li> </ul> <h2 id="conclusion"><strong>Conclusion</strong></h2> <p>This work proposes a <strong>voice-guided music generation</strong> framework by extending two powerful audio generation models. Key contributions:</p> <ul> <li>A <strong>custom dataset</strong> with paired vocals/instrumentals and rich metadata</li> <li><strong>Multi-modal conditioning</strong> for both AudioLDM and MusicGen</li> <li><strong>Transformer-based generation</strong> (MusicGen) outperforms diffusion-based generation (AudioLDM) in quality</li> </ul> <h3 id="takeaways">Takeaways</h3> <ul> <li><strong>Voice + text prompts</strong> offer the best control and realism</li> <li><strong>MusicGen</strong> is better suited for voice-to-instrument tasks today</li> <li><strong>AudioLDM</strong> can improve with further architecture tuning</li> </ul> <h3 id="future-work">Future Work</h3> <ul> <li>Larger datasets with studio-quality separation</li> <li>Conditioning on <strong>chord progressions</strong>, <strong>lyrics</strong>, or <strong>emotions</strong></li> <li>Real-time applications in music apps or web tools</li> </ul> <h2 id="references"><strong>References</strong></h2> <ul> <li><a href="https://arxiv.org/abs/2301.12503">AudioLDM</a></li> <li><a href="https://arxiv.org/abs/2306.05284">MusicGen (AudioCraft)</a></li> <li><a 
href="https://arxiv.org/abs/2301.12661">CLAP: Contrastive Language-Audio Pretraining</a></li> <li><a href="https://arxiv.org/abs/2210.13438">EnCodec</a></li> <li><a href="https://github.com/LAION-AI/CLAP">HTSAT-CLAP Implementation</a></li> </ul> <hr/>]]></content><author><name></name></author><category term="blog-posts"/><category term="deep-learning"/><category term="music-generation"/><category term="diffusion-models"/><category term="transformers"/><summary type="html"><![CDATA[A deep learning blog post on extending AudioLDM and MusicGen for voice-guided multi-modal music generation]]></summary></entry><entry><title type="html">CasCast</title><link href="https://selimbin.github.io/blog/2024/CasCast-blog/" rel="alternate" type="text/html" title="CasCast"/><published>2024-08-05T12:30:13+00:00</published><updated>2024-08-05T12:30:13+00:00</updated><id>https://selimbin.github.io/blog/2024/CasCast-blog</id><content type="html" xml:base="https://selimbin.github.io/blog/2024/CasCast-blog/"><![CDATA[<p>Welcome to my blog post summarizing and presenting the paper “<a href="https://arxiv.org/abs/2402.04290">CasCast</a>: Skillful High-resolution Precipitation Nowcasting via Cascaded Modelling”. The paper will be presented in this blog post as it showcases a novel approach with great promise.</p> <h2 id="introduction"><strong>Introduction</strong></h2> <p>The CasCast paper represents an advancement in the field of meteorological forecasting, particularly in the accurate prediction of precipitation using high-resolution radar data. The paper specifically addresses the challenges faced in nowcasting, which is the prediction of weather conditions for a short period, usually up to two hours ahead. Accurate weather forecasts for the immediate future are of critical importance for disaster management and for various social sectors. 
This paper aims to provide a robust solution to improve prediction accuracy, especially for extreme weather events.</p> <h2 id="motivation"><strong>Motivation</strong></h2> <p>Around the world, extreme weather events cause significant damage every year. One of the most destructive consequences of these events is flooding, which results from high amounts of precipitation. Weather forecasting is essential for handling and planning around disasters, and it affects many sectors (transportation, event planning, etc.). Precipitation events involve multiple scales of atmospheric systems, making accurate predictions challenging. Furthermore, most current methods struggle with short-term forecasting (nowcasting), defined here as forecasting events that will occur within the next two hours. These predictions are essential for emergency management and disaster mitigation, and the forecast precipitation data can be used to give real-time warnings to impacted communities.</p> <h2 id="some-information-to-know-beforehand"><strong>Some information to know beforehand</strong></h2> <p>Previous research in this field has faced multiple problems. First of all, precipitation events involve multiple scales of atmospheric systems, making accurate predictions challenging. These systems are influenced by mesoscale precipitation systems as well as small-scale systems.</p> <p>Previous research also faced challenges in predicting extreme precipitation events, which occur at small scales. 
This matters because, over the past 50 years, extreme-precipitation events have caused more than 1 million deaths and economic losses beyond US$ 2.8 trillion.</p> <h3 id="precipitation-systems">Precipitation Systems</h3> <p>Mesoscale precipitation systems evolve over spatial ranges of tens to hundreds of kilometers and time scales of several hours, driven and constrained by relatively stable large-scale circulation.</p> <p>Small-scale systems evolve within a range of a few kilometers and operate on time scales of minutes; they are influenced by local processes such as heating, surface features, and other physical factors, which introduce stochasticity and unpredictability into the system's behavior.</p> <h3 id="models-used">Models Used</h3> <p>Another problem is that each type of short-term forecasting model, whether deterministic or probabilistic, has its limitations. Deterministic models are unable to capture the fine-grained detail of precipitation patterns, while probabilistic models are unable to capture large-scale movements.</p> <p>Deterministic models aim to predict the overall motion of mid-scale precipitation systems with a single-value forecast, but they often lack detail and appear blurry because they average out the randomness of small-scale systems.</p> <p>Probabilistic models, on the other hand, sample from various latent variables to represent the randomness of future weather, capturing small-scale phenomena better. However, they struggle with accurately forecasting the large-scale, predictable distribution of precipitation.</p> <p>In summary, current models still face challenges in simultaneously predicting both mesoscale and small-scale systems.</p> <h2 id="deep-learning-approach"><strong>Deep Learning Approach</strong></h2> <h3 id="mapping-the-problem">Mapping The Problem</h3> <p>In order to approach this problem, its structure must be well-defined. 
Using multiple inputs, experts are attempting to predict weather conditions with deep learning models.</p> <p>The inputs used are generally radar data (high-resolution radar echo images) as well as a variety of atmospheric variables, such as temperature, humidity, or wind patterns. This is data from the past, from time 0 to T, where T is the current time step (the present); the data covers T time steps.</p> <p>Below is an example of a high-resolution radar echo image that can be used as input.</p> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/radar_echo_input-480.webp 480w,/assets/img/radar_echo_input-800.webp 800w,/assets/img/radar_echo_input-1400.webp 1400w," sizes="95vw" type="image/webp"/> <img src="/assets/img/radar_echo_input.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" data-zoomable="" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> </div> </div> <div class="caption"> Example of a High Resolution Radar Echo Image </div> <p>The desired outputs are accurate precipitation maps of the affected areas; below is an example of a desired precipitation map.</p> <p>In this image, areas of precipitation are colored from lightest to heaviest precipitation in the following progression: green, yellow, orange, red, and pink. 
The darker the shade within each color, the higher the precipitation.</p> <p>It is also important to note that the pink areas are areas of extreme precipitation.</p> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/output_sample_1-480.webp 480w,/assets/img/output_sample_1-800.webp 800w,/assets/img/output_sample_1-1400.webp 1400w," sizes="95vw" type="image/webp"/> <img src="/assets/img/output_sample_1.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" data-zoomable="" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> </div> </div> <div class="caption"> Output Precipitation Map from CasCast </div> <h3 id="loss-function">Loss Function</h3> <p>Multiple loss functions can be used to train the models, including:</p> <ul> <li><strong>Mean Squared Error (MSE)</strong>: Can be used for training deterministic models. It measures the average squared difference between the estimated values and what is observed. This loss helps minimize the forecast error in terms of the general precipitation distribution.</li> <li><strong>Noise Prediction Loss</strong>: Can be used in probabilistic models where a diffusion process is involved. 
This loss function helps in refining the generation of local weather phenomena, focusing on the specifics that the deterministic model might miss.</li> <li><strong>Hybrid Loss</strong>: Different loss functions can be combined to train complex, multi-part models.</li> </ul> <h2 id="what-loss-functions-and-scoring-functions-are-used-"><strong>What loss functions and scoring functions are used?</strong></h2> <h3 id="loss-functions">Loss Functions</h3> <h4 id="mean-squared-error-loss">Mean Squared Error Loss</h4> <p>In the deterministic component, the mean squared error loss is used.</p> \[L_{\text{MSE}} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2\] <p>where \(y_i\) is the observed value and \(\hat{y}_i\) is the predicted value. MSE is used to minimize the average squared differences between the predicted and observed values; this ensures that the model captures the general precipitation patterns accurately.</p> <h4 id="noise-prediction-loss">Noise Prediction Loss</h4> <p>One of the loss functions used is the Noise Prediction Loss, which was presented by <a href="https://arxiv.org/pdf/2006.11239">Ho et al.</a> in 2020:</p> \[L_{\theta_p} = \mathbb{E}_{\epsilon, k} \left[ \| \epsilon - \epsilon_{\theta_p}(z_k, k, z_{\text{cond}}) \|_2^2 \right]\] <p>where:</p> <ul> <li>\(\epsilon\) is the true noise added to the data during the forward diffusion process.</li> <li>\(k\) is the current time step in the diffusion process.</li> <li>\(z_k\) is the latent variable at time step \(k\).</li> <li>\(\epsilon_{\theta_p}(z_k, k, z_{\text{cond}})\) is the noise predicted by the model.</li> <li>\(z_{\text{cond}}\) represents the conditional information, including the latent representations of the initial radar observations and deterministic model outputs.</li> </ul> <p>The noise prediction loss is utilized in the probabilistic component, particularly within the diffusion model framework. 
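</p> <p>To make the objective concrete, here is a toy numpy sketch of one step of the forward diffusion process and the resulting loss; the network \(\epsilon_{\theta_p}\) is replaced by a random stub, and the latent size and schedule value are made up for illustration:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def noise_prediction_loss(eps_true, eps_pred):
    """Monte-Carlo estimate of E[||eps - eps_theta(z_k, k, z_cond)||^2]."""
    return float(np.mean((eps_true - eps_pred) ** 2))

z0 = rng.standard_normal((4, 16))          # clean latent (toy batch of 4)
alpha_bar_k = 0.7                          # cumulative noise-schedule value at step k
eps = rng.standard_normal(z0.shape)        # true noise added at this step
z_k = np.sqrt(alpha_bar_k) * z0 + np.sqrt(1.0 - alpha_bar_k) * eps  # noised latent

eps_pred = rng.standard_normal(z0.shape)   # stand-in for eps_theta(z_k, k, z_cond)
loss = noise_prediction_loss(eps, eps_pred)
```

<p>A real implementation averages this loss over randomly sampled steps \(k\) and conditions the predictor on \(z_{\text{cond}}\).</p> <p>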
The objective of the noise prediction loss is to train the model to accurately predict the noise ϵ added at each step of the diffusion process. By minimizing this loss, the model learns to reverse the noise addition, effectively denoising the data to retrieve the original high-resolution precipitation patterns.</p> <h4 id="hybrid-loss">Hybrid Loss</h4> <p>By integrating both losses, the hybrid loss ensures that the model captures both broad precipitation patterns (mesoscale) and fine-grained details (small-scale), balancing deterministic accuracy and probabilistic realism.</p> <h3 id="scoring-formulas">Scoring Formulas</h3> <p>In the paper, four different scoring methods are used to compare the results from different models: Critical Success Index (CSI), Heidke Skill Score (HSS), Continuous Ranked Probability Score (CRPS), and Structural Similarity Index Measure (SSIM).</p> <p>Some of these scoring methods can be viewed in detail below.</p> <details><summary>Critical Success Index (CSI)</summary> <p><br/> It measures the accuracy of binary event forecasts, particularly useful for precipitation where the event is the occurrence of rainfall above a certain threshold (precipitation above a particular intensity).</p> \[\text{CSI} = \frac{\text{TP}}{\text{TP} + \text{FP} + \text{FN}}\] <p>Here, TP, FP, and FN denote the true positives, false positives, and false negatives of the thresholded event; correct negatives do not enter the score.</p> </details> <p><br/> </p> <details><summary>Heidke Skill Score (HSS)</summary> <p><br/> Evaluates the accuracy of forecasts relative to random chance, considering the number of correct predictions.</p> \[\text{HSS} = \frac{2(\text{TP} \cdot \text{TN} - \text{FP} \cdot \text{FN})}{(\text{TP} + \text{FN})(\text{FN} + \text{TN}) + (\text{TP} + \text{FP})(\text{FP} + \text{TN})}\] <p>HSS is used to evaluate the overall skill of the model compared to random chance, providing a balanced measure of 
accuracy.</p> </details> <p><br/> </p> <details><summary>Continuous Ranked Probability Score (CRPS)</summary> <p><br/> Measures the accuracy of probabilistic forecasts by comparing the predicted cumulative distribution function (CDF) to the observed outcome.</p> \[\text{CRPS}(F, y) = \int_{-\infty}^{\infty} \left[ F(x) - \mathbf{1}(x \geq y) \right]^2 dx\] <p>where \(F(x)\) is the CDF of the forecasted distribution at value \(x\), and \(\mathbf{1}(x \geq y)\) is the indicator function that equals 1 if \(x \geq y\) and 0 otherwise.</p> <p>CRPS evaluates the probabilistic predictions, ensuring that the predicted distributions align well with the actual observed values.</p> </details> <p><br/></p> <details><summary>Structural Similarity Index Measure (SSIM)</summary> <p><br/> Measures the perceived quality of the predictions compared to the ground truth, considering luminance, contrast, and structure.</p> <p>SSIM is calculated using a sliding window approach over the images, typically involving:</p> \[\text{SSIM}(x, y) = \frac{(2 \mu_x \mu_y + c_1)(2 \sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}\] <p>where \(\mu_x\) and \(\mu_y\) are the means of the two images \(x\) and \(y\), \(\sigma_x^2\) and \(\sigma_y^2\) are the variances, \(\sigma_{xy}\) is the covariance, and \(c_1\) and \(c_2\) are constants to stabilize the division.</p> <p>SSIM is used to assess the visual similarity between predicted precipitation patterns and the actual observations, providing a measure of structural accuracy.</p> </details> <p><br/></p> <h2 id="cascast-model"><strong>CasCast Model</strong></h2> <p>CasCast is a deep learning model designed for high-resolution precipitation nowcasting, which tackles the challenge of accurately predicting precipitation in the short term using radar data. 
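</p> <p>Before detailing the model, the two categorical scores from the previous section can be sketched in code: both are simple functions of a contingency table obtained by thresholding the precipitation fields (the threshold and fields below are illustrative):</p>

```python
import numpy as np

def contingency(pred, obs, threshold):
    """Binarize precipitation fields at an intensity threshold and count
    true positives, false positives, false negatives, true negatives."""
    p, o = pred >= threshold, obs >= threshold
    return (int(np.sum(p & o)), int(np.sum(p & ~o)),
            int(np.sum(~p & o)), int(np.sum(~p & ~o)))

def csi(tp, fp, fn):
    """Critical Success Index: hits over hits + false alarms + misses."""
    return tp / (tp + fp + fn)

def hss(tp, fp, fn, tn):
    """Heidke Skill Score: forecast skill relative to random chance."""
    num = 2.0 * (tp * tn - fp * fn)
    den = (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    return num / den
```

<p>A perfect forecast scores 1 on both; CSI ignores correct negatives, while HSS uses the full table.</p> <p>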
This is a novel model that incorporates a cascaded architecture.</p> <h3 id="cascaded-architecture">Cascaded Architecture</h3> <p>It is structured into two main components: a deterministic model and a probabilistic model, which work in tandem. This cascaded approach allows the model to effectively handle the complexities of precipitation systems operating at different scales, which was a challenge for previous models.</p> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/cascaded_model1-480.webp 480w,/assets/img/cascaded_model1-800.webp 800w,/assets/img/cascaded_model1-1400.webp 1400w," sizes="95vw" type="image/webp"/> <img src="/assets/img/cascaded_model1.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" data-zoomable="" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> </div> </div> <div class="caption"> CasCast Model Design </div> <h3 id="deterministic-model">Deterministic Model</h3> <p>This first part of the model can incorporate conventional neural network architectures such as CNNs, RNNs, or Transformers, trained to minimize mean squared error (MSE) loss.</p> <p>This part of the model is responsible for predicting the mesoscale aspects of precipitation (larger, more predictable patterns).</p> <p>This allows the model to capture the broad movements in weather patterns.
The output of this model provides a solid foundation for the second component of the architecture: the probabilistic model.</p> <p>Seen here is the deterministic part of the model architecture.</p> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/deterministic-480.webp 480w,/assets/img/deterministic-800.webp 800w,/assets/img/deterministic-1400.webp 1400w," sizes="95vw" type="image/webp"/> <img src="/assets/img/deterministic.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" data-zoomable="" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> </div> </div> <div class="caption"> CasCast Deterministic Component Architecture </div> <h3 id="probabilistic-model">Probabilistic Model</h3> <p>Using the output of the deterministic model, the probabilistic model generates the fine-grained details and local variations within the precipitation pattern. It aims to model the stochasticity inherent in meteorological systems, which is particularly useful for capturing the nuances of extreme weather events.</p> <p>In CasCast the probabilistic model is a frame-wise-guided diffusion transformer. This is a generative model that simulates the process of adding and removing noise to generate detailed predictions.
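<p>(As a rough illustration of this add-noise / remove-noise training objective, here is a minimal NumPy sketch of the closed-form forward noising step and the ε-prediction MSE loss described earlier; the function names and toy shapes are illustrative, not the paper's implementation:)</p>

```python
import numpy as np

def forward_noise(x0, t, betas, rng):
    # q(x_t | x_0): jump straight to diffusion step t using the closed form
    # x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return x_t, eps

def noise_prediction_loss(eps_pred, eps):
    # MSE between the network's predicted noise and the true noise
    return float(np.mean((eps_pred - eps) ** 2))
```

<p>A denoising network trained to minimize this loss can then be run in reverse, step by step, to turn pure noise into a detailed precipitation field.</p>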
This part is crucial for enhancing the resolution and accuracy of predictions at a localized scale.</p> <p>Seen below is the probabilistic part of the model architecture.</p> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/probabilistic-480.webp 480w,/assets/img/probabilistic-800.webp 800w,/assets/img/probabilistic-1400.webp 1400w," sizes="95vw" type="image/webp"/> <img src="/assets/img/probabilistic.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" data-zoomable="" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> </div> </div> <div class="caption"> CasCast Probabilistic Component Architecture </div> <h3 id="overall-architecture">Overall Architecture</h3> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/overall_architecture-480.webp 480w,/assets/img/overall_architecture-800.webp 800w,/assets/img/overall_architecture-1400.webp 1400w," sizes="95vw" type="image/webp"/> <img src="/assets/img/overall_architecture.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" data-zoomable="" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> </div> </div> <div class="caption"> CasCast Architecture </div> <p>As shown here, the model has two parts, a deterministic and a probabilistic one (shown on the left side).</p> <p>The deterministic model takes the input and generates mesoscale precipitation. All this data is then combined and given to the CasFormer.</p> <p>The diffusion denoising process generates the fine-grained details using the input data and denoising the image.</p> <p>Diffusion models learn the reverse of the process that gradually noises data \(x_0\) into Gaussian noise.</p> <p>CasFormer has a frame-wise encoding stage and a sequence-wise decoding stage.
The frame-wise encoding provides better-matched conditions for each frame-wise latent vector, reducing the complexity of the denoising conditioned by a sequence of blurry predictions.</p> <p>Sequence-wise decoding utilizes the sequence features from the sequence aggregator to ensure the spatiotemporal consistency of precipitation nowcasting.</p> <p>This frame-wise guidance in the diffusion transformer ensures a frame-to-frame correspondence between blurry predictions and latent vectors, resulting in better optimization for the generation of small-scale patterns.</p> <h2 id="results"><strong>Results</strong></h2> <p>The results obtained from the CasCast model show that it has strong capabilities in high-resolution precipitation nowcasting, particularly in handling complex weather events.</p> <h3 id="datasets-used">Datasets Used</h3> <p>The CasCast model was trained and tested on three radar precipitation datasets to evaluate its performance and robustness in different geographic and climatic conditions. These datasets were:</p> <ul> <li>The <strong>SEVIR</strong> dataset, comprising weather radar observations mostly from the United States. It features a spatial resolution of 1 km and covers a large geographic area.</li> <li>The <strong>HKO-7</strong> dataset, which comes from the Hong Kong Observatory and is primarily used for studying the regional weather conditions around Hong Kong. It contains radar CAPPI (Constant Altitude Plan Position Indicator) reflectivity images, which are critical for analyzing rainfall and storm patterns at an altitude of 2 km. The dataset has a resolution of 480x480 pixels, covering a 512 km x 512 km area around Hong Kong.</li> <li>The <strong>MeteoNet</strong> dataset from Meteo France. This dataset includes comprehensive weather data from different regions of France. The data is recorded with a spatial resolution of approximately 0.01 degrees.
For practical applications, a portion of 400x400 pixels from the top-left corner is often used to ensure quality and consistency.</li> </ul> <h3 id="comparative-results">Comparative Results</h3> <p>CasCast has demonstrated substantial improvements over existing models, especially in predicting extreme weather events. The model surpasses baseline models by up to 91.8% in regional extreme-precipitation nowcasting. This leap in performance showcases the efficacy of the cascaded modeling approach and the strength of CasCast.</p> <p>Measured with metrics like the Critical Success Index (<strong>CSI</strong>), Heidke Skill Score (<strong>HSS</strong>), and Continuous Ranked Probability Score (<strong>CRPS</strong>), the model shows high performance.</p> <p>The model had the highest <strong>CSI</strong> scores. Improvements in <strong>CSI</strong>, particularly at finer scales, indicate that the model can more accurately detect where and how intense precipitation events will occur, matching the actual observed data more closely than previous models.</p> <p>Furthermore, CasCast had the highest <strong>HSS</strong> score when compared to other models.
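<p>(For concreteness, the CSI and HSS formulas shown earlier reduce to a few lines of code over a contingency table built by thresholding the predicted and observed radar fields; a minimal sketch, with illustrative function names:)</p>

```python
import numpy as np

def contingency(pred, obs, thresh):
    # Binarize precipitation fields at an intensity threshold,
    # then count hits, false alarms, misses, and correct negatives.
    p, o = pred >= thresh, obs >= thresh
    tp = int(np.sum(p & o))    # event forecast, event observed
    fp = int(np.sum(p & ~o))   # false alarm
    fn = int(np.sum(~p & o))   # miss
    tn = int(np.sum(~p & ~o))  # correct rejection
    return tp, fp, fn, tn

def csi(tp, fp, fn):
    # Critical Success Index: TP / (TP + FP + FN)
    return tp / (tp + fp + fn)

def hss(tp, fp, fn, tn):
    # Heidke Skill Score: skill relative to random chance
    num = 2 * (tp * tn - fp * fn)
    den = (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    return num / den
```

<p>A perfect forecast yields CSI = 1 and HSS = 1; random forecasts drive HSS toward 0.</p>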
Improved <strong>HSS</strong> scores suggest that CasCast is better at distinguishing between events and non-events, which is crucial for preventing false alarms—a common issue in weather prediction.</p> <p>Finally, CasCast has the lowest <strong>CRPS</strong> scores among the models tested; lower <strong>CRPS</strong> scores indicate that the probabilistic forecasts of CasCast closely resemble the actual outcomes, suggesting high predictive reliability and better uncertainty estimation in forecasts.</p> <p>The results of these comparative tests can be seen below.</p> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/DataComSamples-480.webp 480w,/assets/img/DataComSamples-800.webp 800w,/assets/img/DataComSamples-1400.webp 1400w," sizes="95vw" type="image/webp"/> <img src="/assets/img/DataComSamples.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" data-zoomable="" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> </div> </div> <div class="caption"> Comparative Results between multiple models using multiple datasets </div> <h3 id="deterministic-model-selection">Deterministic Model Selection</h3> <p>Multiple deterministic models can be used as the first component of the CasCast architecture. To assess how well the cascaded strategy in CasCast works, it was tested on the SEVIR dataset: the probabilistic generation model alone was compared against versions paired with different deterministic models, as shown in the table and figure just below.</p> <p>The results were that using either the probabilistic or deterministic model alone led to problems like pixel mismatches or poor predictions of small-scale patterns and extreme regional values. However, results improved significantly when ConvLSTM, SimVP, and EarthFormer were used as the deterministic parts of CasCast.
The CSI-219-POOL16 metric improved by 49.54%, 57.69%, and 91.83% with these models, respectively. This shows that the cascaded approach improves regional extreme-precipitation nowcasting.</p> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/DeterministicPartSelection-480.webp 480w,/assets/img/DeterministicPartSelection-800.webp 800w,/assets/img/DeterministicPartSelection-1400.webp 1400w," sizes="95vw" type="image/webp"/> <img src="/assets/img/DeterministicPartSelection.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" data-zoomable="" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> </div> </div> <div class="caption"> Replacement of the deterministic part of CasCast on SEVIR dataset </div> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/DetermModelComparFigure-480.webp 480w,/assets/img/DetermModelComparFigure-800.webp 800w,/assets/img/DetermModelComparFigure-1400.webp 1400w," sizes="95vw" type="image/webp"/> <img src="/assets/img/DetermModelComparFigure.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" data-zoomable="" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> </div> </div> <div class="caption"> Frame-wise CSI-M (left) and CSI-219 (right) results when using different Deterministic Models </div> <p>In short, a better deterministic model leads to better overall performance in cascaded modeling.
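<p>(The cascade itself can be summarized in a few lines of code. The sketch below uses deliberately toy stand-ins, a temporal mean for the deterministic part and a trivial drift toward the coarse field for the diffusion part, purely to show the data flow; these are not the actual CasCast components:)</p>

```python
import numpy as np

def deterministic_part(radar_seq):
    # Stand-in mesoscale predictor: a temporal mean of the input frames.
    # (In CasCast this would be a trained model such as ConvLSTM,
    # SimVP, or EarthFormer, optimized with MSE loss.)
    return radar_seq.mean(axis=0)

def probabilistic_part(coarse, steps, rng):
    # Stand-in for the diffusion transformer: start from noise and
    # iteratively "denoise" toward the coarse prediction. A real
    # diffusion model would run a learned reverse process that adds
    # plausible small-scale detail instead of merely converging.
    x = rng.standard_normal(coarse.shape)
    for _ in range(steps):
        x = x + 0.5 * (coarse - x)
    return x

def cascast_nowcast(radar_seq, steps=50, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    coarse = deterministic_part(radar_seq)         # mesoscale pass
    return probabilistic_part(coarse, steps, rng)  # small-scale refinement
```

<p>The key design point survives even in this toy form: the probabilistic stage never predicts from scratch; it is always conditioned on the deterministic stage's coarse output.</p>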
This highlights the value of combining both probabilistic and deterministic models for more accurate predictions.</p> <h3 id="visual-comparisons">Visual Comparisons</h3> <p>Comparisons with outputs from other models highlight CasCast’s superior performance in capturing both macro- and micro-precipitation dynamics without issues like blurring or oversimplification that can be seen in other forecasting models. The results demonstrate that the model can distinguish between different precipitation intensities and spatial distributions with high fidelity.</p> <p>Some output image comparisons can be seen below.</p> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/cascast_results_2-480.webp 480w,/assets/img/cascast_results_2-800.webp 800w,/assets/img/cascast_results_2-1400.webp 1400w," sizes="95vw" type="image/webp"/> <img src="/assets/img/cascast_results_2.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" data-zoomable="" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> </div> </div> <div class="caption"> Sample Model Output Image </div> <p>The animation below shows an important distinction between CasCast and other models. As can be seen below, CasCast is the model best able to match the ground truth.</p> <p>The clear advantage of CasCast is that it is the only model that is able to detect the areas of extreme precipitation (pink areas).</p> <div class="col-sm mt-3 mt-md-0"> <figure> <video src="/assets/img/high_precip.mp4" class="img-fluid rounded z-depth-1" width="auto" height="auto" autoplay="" controls=""/> </figure> </div> <div class="caption"> Video Comparing Results using Different Models </div> <h3 id="computational-efficiency">Computational Efficiency</h3> <p>Despite the high-resolution outputs, CasCast efficiently manages computational resources, making it suitable for real-time operational use.
This efficiency is very important for practical deployment in meteorological stations and emergency management systems, where timely predictions are essential.</p> <h2 id="analysis"><strong>Analysis</strong></h2> <p>The CasCast model presents several strengths and limitations. Below is an overview based on its described capabilities and performance:</p> <h3 id="pros">Pros</h3> <ul> <li> <p><strong>Enhanced Prediction Accuracy for Extreme Events:</strong> CasCast excels in predicting extreme precipitation events. The model surpasses baseline models in accuracy for regional extreme-precipitation nowcasting, making it a valuable tool for sectors that rely heavily on accurate weather predictions, such as agriculture, transportation, and public safety, as well as disaster management and emergency services.</p> </li> <li> <p><strong>Efficient Computational Performance:</strong> CasCast also uses computational resources efficiently. This efficiency makes it feasible for real-time applications.</p> </li> </ul> <h3 id="cons">Cons</h3> <ul> <li> <p><strong>Complexity in Training and Implementation:</strong> The dual-model structure of CasCast adds complexity in training and model tuning. This complexity could pose challenges in terms of the time and resources needed for model training and optimization.</p> </li> <li> <p><strong>Generalizability Across Different Regions:</strong> One deficiency is that CasCast needs to be retrained for different geographic regions and conditions to maintain its accuracy and effectiveness. This requirement limits its applicability in global settings without retraining.</p> </li> </ul> <h2 id="conclusion"><strong>Conclusion</strong></h2> <h3 id="future-work">Future Work</h3> <p>The authors of this paper propose to further explore incorporating additional data sources, like wind and temperature data, to train the model.</p> <p>Another aspect that needs further research is multi-region training.
The goal would be to train the model to work on any region, or to design a unified model that can work on multiple datasets.</p> <h3 id="what-should-you-remember">What should you remember?</h3> <p>So, what should you remember from this paper?</p> <ul> <li>First, it proposes a dual-model approach: CasCast combines deterministic and probabilistic models to achieve high-resolution forecasts.</li> <li>CasCast can enhance disaster preparedness by providing more detailed and reliable forecasts.</li> <li>Finally, CasCast is more efficient. It manages computational resources efficiently, making it suitable for real-time applications.</li> </ul> <h2 id="references"><strong>References</strong></h2> <ul> <li><a href="https://arxiv.org/abs/2402.04290">CasCast: Skillful High-resolution Precipitation Nowcasting via Cascaded Modelling</a></li> <li><a href="https://arxiv.org/pdf/2006.11239">Ho, J., Jain, A., and Abbeel, P. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.</a></li> <li>Chen, Zhang Chunze, Liu Jingtian, and Zeng (2019). Generative Adversarial Networks Capabilities for Super-Resolution Reconstruction of Weather Radar Echo Images. Atmosphere, 10, 555. <a href="https://www.researchgate.net/publication/335862244_Generative_Adversarial_Networks_Capabilities_for_Super-Resolution_Reconstruction_of_Weather_Radar_Echo_Images">10.3390/atmos10090555</a></li> <li>All images, tables, and figures in this post can be found in the references above.</li> </ul> ]]></content><author><name></name></author><category term="blog-posts"/><category term="deep-learning"/><category term="weather-forecasting"/><summary type="html"><![CDATA[A blog describing skillful high-resolution precipitation nowcasting via cascaded modeling (CasCast)]]></summary></entry></feed>