<h3>Automated insights on Medium articles with GenAI and Ruby!</h3>A few months ago my manager asked me to ramp up on GenAI. That was one of the best work days of my life! Google offers a lot of toys to play with, and it’s hard not to have fun. I’m fortunate enough to have a job I really love! But enough of my dramatic style, since the <a href="https://cloud.google.com/vertex-ai/docs/generative-ai/start/quickstarts/api-quickstart">Vertex AI</a> PaLM API seems to think I’m a bit too Italian, look:<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*wnUNrC0-Mg0j5Hef0kJXqQ.png" /><figcaption>“Riccardo has a flair for the dramatic”: the PaLM API seems to have a sense of humour!</figcaption></figure>This article shows how to crawl someone’s Medium page and use GenAI to build a JSON file that mixes the crawled information with new information that Vertex AI’s text-bison was able to infer for us.<h4>The code</h4>TL;DR: if you want to delve into the code, it’s here: <a href="https://github.com/palladius/genai-googlecloud-scripts/tree/main/03-ruby-medium-article-slurper">https://github.com/palladius/genai-googlecloud-scripts/tree/main/03-ruby-medium-article-slurper</a><figure><img alt="" src="https://cdn-images-1.medium.com/max/949/0*Nq7afPB21pYiBnx5" /><figcaption>A Ruby crawler to find insights on Medium Articles</figcaption></figure>My script is simple: you give it a Medium handle (mine is “palladiusbonton”) and it will do four things:<ol><li>Parse the XML RSS feed from Medium (thanks for making it curl-able!), which provides a list of the latest (10?) articles by that person. [<a href="https://github.com/palladius/genai-googlecloud-scripts/blob/main/03-ruby-medium-article-slurper/inputs/medium-feed.palladiusbonton.xml">example</a>]</li><li>Extract some significant fields (title, body, date, keywords, ..) 
[<a href="https://github.com/palladius/genai-googlecloud-scripts/blob/main/03-ruby-medium-article-slurper/inputs/medium-latest-articles.palladiusbonton.txt">example</a>]</li><li>Paste those at the end of a convoluted <a href="https://github.com/palladius/genai-googlecloud-scripts/blob/main/03-ruby-medium-article-slurper/main.rb#L34-L93">GenAI prompt</a>.</li><li>Call the Vertex AI PaLM API to get an answer. [<a href="https://github.com/palladius/genai-googlecloud-scripts/blob/main/03-ruby-medium-article-slurper/outputs/medium-latest-articles.palladiusbonton.txt.json">sample</a>]</li></ol>What am I trying to prove? I’m trying to use an LLM for a few things:<ul><li>Data scraping. I can also do this in a deterministic way (with <a href="https://nokogiri.org/index.html">nokogiri</a>).</li><li>Classification. I’m really playing with this a lot. I ask the system to rate articles from 1 to 10, and to tell whether they’re Google Cloud articles or not. I also ask it to infer the author’s nationality and favorite languages.</li><li>Summarization. I get a small summary of each article, which is super useful.</li></ul><h4>Two prompts: Text and JSON</h4>This morning, my prompt looked like this:<pre>Prompt = &lt;&lt;-END_OF_PROMPT
Provide a summary for each of the following articles.
* Please write about the topics, the style, and rate the article from 1 to 10 in terms of accuracy or professionalism.
* Please also tell me, for each article, whether it talks about Google Cloud.
* Can you guess the nationality of the person (or geographic context of the article itself) writing all of these articles?
* If you can find any typos or visible mistakes, please write them down.
--
#{Medium content will be pasted here}
END_OF_PROMPT</pre>My colleague Marc C showed me that GenAI can do better: it can write JSON for you, with very little guidance! 
In fact, it’s probably better than a human at closing parentheses and double quotes :) So after 3–4 hours of tinkering I got to this version:<pre>### PROMPT HISTORY
# 1.6 16nov23 Removed typos from articles.
# 1.5 16nov23 Added movie.
# 1.4 16nov23 Moved from TXT to JSON!
PromptInJson = &lt;&lt;-END_OF_PROMPT
You are an avid article reader and summarizer. I&#39;m going to provide a list of articles for a single person and ask you to do this:
1. For each article, I&#39;m going to ask a number of per-article questions
2. Overall, I&#39;m going to ask questions about the author.
I&#39;m going to provide a JSON structure for the questions I ask. If you don&#39;t know some answer, feel free to leave NULL/empty values.
1. Per-article:
* Please write about the topics, the style, and rate the article from 1 to 10 in terms of accuracy or professionalism.
* Please also tell me, for each article, whether it talks about Google Cloud.
* For each article, capture the original title and please produce a short 300-500-character summary.
* What existing movie or book would this article remind you the most of? Try a guess, use your fantasy.
2. Overall (author):
* Extract name and surname
* Can you guess the nationality of the person (or geographic context of the article itself) writing all of these articles?
* Please describe this author style. Is it professional or more personal? Terse or verbose? ..
* Does this author prefer a certain language? In which language are their code snippets (if any)?
* If you can find any typos or recurring mistakes in any article, please write them here.
Please provide the output in a `JSON` file as an array of answer per article, like this:
{
  &quot;prompt_version&quot;: &quot;1.6a&quot;, // do NOT change this, take verbatim
  &quot;author_name&quot;: &quot;&quot;, // name and surname of the author
  &quot;author_nationality&quot;: &quot;&quot;, // nationality here
  &quot;author_style&quot;: &quot;&quot;, // overall author style: is it professional or more personal? Terse or verbose? ..
  &quot;author_favorite_languages&quot;: &quot;&quot;, // which languages does the author use? Pascal? C++? Python? Java? Use comma separated for the list.
  &quot;typos&quot;: [{ // array of mistakes or typos
    &quot;current&quot;: &quot;xxx&quot;, // typo or mistake
    &quot;correct&quot;: &quot;yyy&quot;, //
  }],
  &quot;articles_feedback&quot;: [
    // article 1
    {
      &quot;title&quot;: &quot;&quot;, // This should be the ORIGINAL article title, you should be able to extract it from the TITLE XML part, like &quot;&lt;title&gt;&lt;![CDATA[What is toilet papers right side?]]&gt;&lt;/title&gt;&quot;
      &quot;summary&quot;: &quot;...&quot;, // This should be the article summary produced by you.
      &quot;publication_date&quot;: &quot;&quot;, // This should be provided to you in input
      &quot;accuracy&quot;: XXX, // Integer 1 to 10
      &quot;is_gcp&quot;: false, // boolean, true or false
      &quot;movie_or_book&quot;: &quot;&quot; // string, a book or film this article content reminds you of.
    },
    // Article 2, and so on..
  ]
}
Make **ABSOLUTELY SURE** the result is valid JSON or I&#39;ll have to drop the result.
Here are the articles:
--
END_OF_PROMPT</pre>I’ve been playing around with the structure a lot: what pertains to the single article, and what to the global part (author identity / style)?<h4>Hey, we want some action!</h4>Sure, here you are! 
Let’s look step by step at the output from the Ruby crawler and comment on it:<figure><img alt="" src="https://cdn-images-1.medium.com/max/949/0*k25ajaAGq6ruaS2O" /><figcaption>Typical Ruby crawler, in Nokogiri</figcaption></figure><h4>Phase 1: download the XML</h4>First, curl this XML: <a href="https://medium.com/feed/@palladiusbonton">https://medium.com/feed/@palladiusbonton</a>. The result looks like this:<figure><img alt="" src="https://cdn-images-1.medium.com/max/608/1*1TOI6AooifViF0eP-0YbkA.png" /><figcaption>Zooming on a single article (XML RSS Feed)</figcaption></figure>So far so good.<h4>Phase 2: parse the XML with Nokogiri</h4>For each item, I scrape the things I care about. I could probably have the LLM infer them automatically, but it’s little effort, so why not! It’s just about finding the fields you care about:<ul><li>Iterate through each item.</li><li>Per item, extract what you want: title, dc:creator, link, pubDate, ..</li></ul>The heavy lifting is done by <a href="https://nokogiri.org/index.html">nokogiri</a>, which is a real 💎 gem among Ruby gems; no, it’s not a repetition, it’s just a double gem. 
Pythonists and Javascriptists, I wish you had it in your language too 😜<pre># Code: https://github.com/palladius/genai-googlecloud-scripts/blob/main/03-ruby-medium-article-slurper/main.rb#L120-L141
# Needs: require &#39;nokogiri&#39; and require &#39;action_view&#39; (for full_sanitizer).
# NOTE: File#writeln is not in Ruby&#39;s standard library; this snippet assumes a
# helper defined elsewhere in main.rb (File#puts is the stdlib equivalent).
File.open(genai_input_filename, &#39;w&#39;) do |file| # file.write(&quot;your text&quot;) }
  ## Version 2: Scrape more important metadata
  docSM.xpath(&quot;//item&quot;).each_with_index do |node,ix| # Article
    file.writeln &quot;\n====== Article #{ix+1} =====&quot;
    title = node.xpath(&quot;title&quot;).inner_text
    creator = node.xpath(&quot;dc:creator&quot;).inner_text
    url = node.xpath(&quot;link&quot;).inner_text
    pubDate = node.xpath(&quot;pubDate&quot;).inner_text
    # there&#39;s many, eg: [&quot;cycling&quot;, &quot;google-forms&quot;, &quot;data-studio&quot;, &quot;pivot&quot;, &quot;google-sheets&quot;]
    categories = node.xpath(&quot;category&quot;).map{|c| c.inner_text}
    # Strip the HTML tags and keep only the text of the item
    article_content = ActionView::Base.full_sanitizer.sanitize(node.inner_text)
    file.writeln &quot;* Title: &#39;#{title}&#39;&quot;
    file.writeln &quot;* Author: &#39;#{creator}&#39;&quot;
    file.writeln &quot;* URL: &#39;#{url}&#39;&quot;
    file.writeln &quot;* PublicationDate: &#39;#{pubDate}&#39;&quot;
    file.writeln &quot;* Categories: #{categories.join(&#39;, &#39;)}&quot;
    file.writeln &quot;&quot;
    file.writeln article_content
  end
end</pre>The result of this part of the code looks like this:<pre>====== Article 3 =====
* Title: &#39;Migrate GCP projects across organizations, the gcloud way&#39;
* Author: &#39;Riccardo Carlesso&#39;
* URL: &#39;https://medium.com/google-cloud/how-to-migrate-projects-across-organizations-c7e254ab90af?source=rss-b5293b96912f------2&#39;
* PublicationDate: &#39;Tue, 18 Apr 2023 13:16:26 GMT&#39;
* Categories: gcp-security-operations, google-cloud-platform, migration

[...]
Nel mezzo del cammin di nostra vita, mi ritrovai per una selva oscura, ché la diritta via era smarrita”— Dante Alighieri(*), Divine Comedy(*) the Italian version of Shakespeare, </pre><h4>Phase 3: get the LLM to infer information and spark some humour</h4>If this were a deterministic program we could just end here: I have all the info, and I’m basically translating from XML to JSON. Not majorly useful. I’ll use GenAI to add some color here:<ul><li>Infer author nationality, favorite languages and writing style.</li><li>Per article, write a small summary, infer accuracy and whether it’s a GCP article or not. There you go, we have a rudimentary QA tester and a classifier! wOOOt! 😮</li></ul>For my buddy Romin I get this:<pre>{ &quot;prompt_version&quot;: &quot;1.6b&quot;, &quot;author_name&quot;: &quot;Romin Irani&quot;, &quot;author_nationality&quot;: null, &quot;author_style&quot;: &quot;verbose, professional&quot;, &quot;author_favorite_languages&quot;: &quot;Python, Java&quot;, &quot;typos&quot;: [ { &quot;current&quot;: &quot;criterias&quot;, &quot;correct&quot;: &quot;criteria&quot; } ], &quot;articles_feedback&quot;: [ { &quot;title&quot;: &quot;Integrating langchain4j and PaLM 2 Chat Bison Model&quot;, &quot;summary&quot;: &quot;The article describes how to integrate Langchain4j and PaLM 2 Chat Bison Model. It provides detailed instructions on how to set up the environment, create a Google Cloud Function, and deploy the model. 
The article also includes a code sample and a link to the GitHub repository.&quot;, &quot;url&quot;: &quot;https://medium.com/google-cloud/integrating-langchain4j-and-palm-2-chat-bison-model-a684cefd67af?source=rss-802a4d428d95------2&quot;, &quot;publication_date&quot;: &quot;Mon, 06 Nov 2023 11:01:02 GMT&quot;, &quot;accuracy&quot;: 9, &quot;is_gcp&quot;: true, &quot;movie_or_book&quot;: null }, { &quot;title&quot;: &quot;Google Cloud Platform Technology Nuggets \u2014 October 15\u201331, 2023 Edition&quot;, &quot;summary&quot;: &quot;The article provides a summary of Google Cloud Platform Technology Nuggets for the period of October 15-31, 2023. It covers topics such as infrastructure, containers and Kubernetes, identity and security, networking, machine learning, storage, databases and data analytics, and developers and practitioners.&quot;, &quot;url&quot;: &quot;https://medium.com/google-cloud/google-cloud-platform-technology-nuggets-october-15-31-2023-edition-4d5ea0689e30?source=rss-802a4d428d95------2&quot;, &quot;publication_date&quot;: &quot;Tue, 31 Oct 2023 10:22:07 GMT&quot;, &quot;accuracy&quot;: 8, &quot;is_gcp&quot;: true, &quot;movie_or_book&quot;: null }, { &quot;title&quot;: &quot;Develop a FlutterFlow App powered by Vertex AI PaLM 2 Integration&quot;, &quot;summary&quot;: &quot;The article describes how to integrate FlutterFlow with Vertex AI PaLM 2. It provides a step-by-step guide on how to set up the environment, create a Google Cloud Function, and deploy the model. 
The article also includes a code sample and a link to the GitHub repository.&quot;, &quot;url&quot;: &quot;https://medium.com/google-cloud/flutterflow-and-vertex-ai-palm-2-integration-14c137e83053?source=rss-802a4d428d95------2&quot;, &quot;publication_date&quot;: &quot;Thu, 26 Oct 2023 09:37:11 GMT&quot;, &quot;accuracy&quot;: 8, &quot;is_gcp&quot;: true, &quot;movie_or_book&quot;: null } ] }</pre>For my buddy Guillaume I get this:<pre>{ &quot;prompt_version&quot;: &quot;1.6b&quot;, &quot;author_name&quot;: &quot;Guillaume Laforge&quot;, &quot;author_nationality&quot;: &quot;French&quot;, &quot;author_style&quot;: &quot;Guillaume Laforge&#39;s writing style can be described as professional, informative, and engaging. He often writes about technology, open-source software, and programming.&quot;, &quot;author_favorite_languages&quot;: &quot;Java, Python&quot;, &quot;articles_feedback&quot;: [ { &quot;title&quot;: &quot;Tech Watch #4 \u2014 October, 27, 2023&quot;, &quot;summary&quot;: &quot;The article provides a summary of the latest developments in the field of artificial intelligence (AI). It covers topics such as the use of LLMs in vector embeddings, the scheduling of PostgreSQL tasks with pg_cron, and the creation of maps with Protomaps. The article also includes links to the original sources for further reading.&quot;, &quot;url&quot;: &quot;https://glaforge.medium.com/tech-watch-4-october-27-2023-d48a1449eeb0?source=rss-431147437aeb------2&quot;, &quot;publication_date&quot;: &quot;Fri, 27 Oct 2023 15:04:58 GMT&quot;, &quot;accuracy&quot;: 8, &quot;is_gcp&quot;: false, &quot;movie_or_book&quot;: &quot;The Matrix&quot; }, { &quot;title&quot;: &quot;Tech Watch #3 \u2014 October, 20, 2023&quot;, &quot;summary&quot;: &quot;The article provides a summary of the latest developments in the field of technology. It covers topics such as the use of Groovy in 2023, the state of WebAssembly in 2023, and the use of large language models (LLMs) to solve logic problems. 
The article also includes links to the original sources for further reading.&quot;, &quot;url&quot;: &quot;https://glaforge.medium.com/tech-watch-3-october-20-2023-11a70245017d?source=rss-431147437aeb------2&quot;, &quot;publication_date&quot;: &quot;Fri, 20 Oct 2023 19:49:34 GMT&quot;, &quot;accuracy&quot;: 9, &quot;is_gcp&quot;: false, &quot;movie_or_book&quot;: &quot;2001: A Space Odyssey&quot; }, { &quot;title&quot;: &quot;Client-side consumption of a rate-limited API in Java&quot;, &quot;summary&quot;: &quot;The article discusses different approaches for consuming rate-limited APIs in Java. It covers topics such as using exponential backoff and jitter, scheduled execution, and using the Bucket4J library. The article also includes code examples for each approach.&quot;, &quot;url&quot;: &quot;https://glaforge.medium.com/client-side-consumption-of-a-rate-limited-api-in-java-9fbf08673791?source=rss-431147437aeb------2&quot;, &quot;publication_date&quot;: &quot;Mon, 02 Oct 2023 00:00:53 GMT&quot;, &quot;accuracy&quot;: 10, &quot;is_gcp&quot;: false, &quot;movie_or_book&quot;: &quot;The Imitation Game&quot; }, { &quot;title&quot;: &quot;Tech Watch #1 \u2014 Sept 29, 2023&quot;, &quot;summary&quot;: &quot;The article provides a summary of the latest developments in the field of technology. It covers topics such as observability-driven development for LLMs, rebuilding LLM documentation chatbots, container security, and the future of databases. 
The article also includes links to the original sources for further reading.&quot;, &quot;url&quot;: &quot;https://glaforge.medium.com/tech-watch-1-sept-29-2023-2ac0a3a5016c?source=rss-431147437aeb------2&quot;, &quot;publication_date&quot;: &quot;Fri, 29 Sep 2023 00:00:11 GMT&quot;, &quot;accuracy&quot;: 7, &quot;is_gcp&quot;: false, &quot;movie_or_book&quot;: &quot;Minority Report&quot; } ] }</pre>For myself I get something like this:<pre>{ &quot;prompt_version&quot;: &quot;1.6b&quot;, &quot;author_name&quot;: &quot;Riccardo Carlesso&quot;, &quot;author_nationality&quot;: &quot;Italian&quot;, &quot;author_style&quot;: &quot;Verbose, uses personal anecdotes.&quot;, &quot;author_favorite_languages&quot;: &quot;Ruby&quot;, &quot;articles_feedback&quot;: [ { &quot;title&quot;: &quot;What is toilet paper\u2019s right side?&quot;, &quot;summary&quot;: &quot;The author discusses the question of which side of toilet paper is the \u201cright\u201d side. They explore the different opinions on this topic and provide their own thoughts on the matter. Ultimately, they conclude that there is no definitive answer to this question and that it is up to each individual to decide which side they prefer.&quot;, &quot;url&quot;: &quot;https://medium.com/@palladiusbonton/what-is-toilet-papers-right-side-8da0504d6d0b?source=rss-b5293b96912f------2&quot;, &quot;publication_date&quot;: &quot;Tue, 08 Aug 2023 16:37:20 GMT&quot;, &quot;accuracy&quot;: 8, &quot;is_gcp&quot;: false, &quot;movie_or_book&quot;: null }, { &quot;title&quot;: &quot;Spaghetti Bolognese don\u2019t exist!!1!&quot;, &quot;summary&quot;: &quot;The author discusses the common misconception that spaghetti Bolognese is an Italian dish. They explain that this dish is actually not from Italy and that it is not considered to be a traditional Italian dish. 
They also provide some tips on how to make a more authentic Italian pasta dish.&quot;, &quot;url&quot;: &quot;https://medium.com/@palladiusbonton/spaghetti-bolognese-dont-exist-1-2088d85909dd?source=rss-b5293b96912f------2&quot;, &quot;publication_date&quot;: &quot;Fri, 21 Apr 2023 16:09:23 GMT&quot;, &quot;accuracy&quot;: 9, &quot;is_gcp&quot;: false, &quot;movie_or_book&quot;: null }, { &quot;title&quot;: &quot;Migrate GCP projects across organizations, the gcloud way&quot;, &quot;summary&quot;: &quot;The author provides a detailed guide on how to migrate GCP projects across organizations using the gcloud command-line tool. They cover everything from setting up the necessary permissions to executing the migration. This article is a valuable resource for anyone who needs to migrate GCP projects across organizations.&quot;, &quot;url&quot;: &quot;https://medium.com/google-cloud/how-to-migrate-projects-across-organizations-c7e254ab90af?source=rss-b5293b96912f------2&quot;, &quot;publication_date&quot;: &quot;Tue, 18 Apr 2023 13:16:26 GMT&quot;, &quot;accuracy&quot;: 10, &quot;is_gcp&quot;: true, &quot;movie_or_book&quot;: null }]}</pre><h4>Lessons learnt</h4>Today I learnt a few things:<ul><li>For the first time, the token limit was visible to me. The PaLM API’s text-bison model has a 32k-token limit, and what I didn’t know is that it seems to be shared between input and output. If I increase the input size, the output size shrinks (still to be confirmed; for the moment it’s just a hunch). For this reason I reduced my input from 32k to 22k. To see the token count, Google gives you a <a href="https://cloud.google.com/vertex-ai/docs/generative-ai/get-token-count?hl=en">nice API to calculate it</a> (thanks Guillaume). 
You can see this very well in the JSON returned by the API (note that the two totalTokens values here, 549 + 7643, add up to exactly the maximum, 8192):</li></ul><pre>&quot;metadata&quot;: { &quot;tokenMetadata&quot;: { &quot;outputTokenCount&quot;: { &quot;totalTokens&quot;: 549, &quot;totalBillableCharacters&quot;: 1312 }, &quot;inputTokenCount&quot;: { &quot;totalBillableCharacters&quot;: 20713, &quot;totalTokens&quot;: 7643 } } }</pre><ul><li>Prompting is a (long) fine-tuning feedback loop: you try something out, and after a few answers you realize it doesn’t work, so you try to ‘bribe’ your model by saying “please do something, as it’s very important to me”. Example: the movie or book was always empty; it’s probably a stretch for an API invocation at temperature 0.1. So I changed “What existing movie or book would this article remind you the most of? Try a guess, use your fantasy” by adding “Please do NOT leave this null! It’s just for fun. yet its very important to me”. Note that this doesn’t fix it: the output becomes “None” instead of null, which is fun. But read on…</li><li>Temperature is an important parameter. When asked to come up with a movie title for my articles, the model would refuse until I raised the temperature from 0.1 to 0.3. Now I get a curious result: my films are finally there! Wait, Ratatouille, seriously?</li></ul><pre>{ &quot;prompt_version&quot;: &quot;1.6b&quot;, &quot;author_name&quot;: &quot;Riccardo Carlesso&quot;, &quot;author_nationality&quot;: &quot;Italian&quot;, &quot;author_style&quot;: &quot;Verbose, uses humor and personal anecdotes. 
Seems to prefer Ruby on Rails.&quot;, &quot;author_favorite_languages&quot;: &quot;Ruby&quot;, &quot;typos&quot;: [ { &quot;current&quot;: &quot;cis-centralis /pendens&quot;, &quot;correct&quot;: &quot;cis-centralis/pendens&quot; }, { &quot;current&quot;: &quot;trans-centralis/mur\u00e0lis&quot;, &quot;correct&quot;: &quot;trans-centralis/muralis&quot; }, { &quot;current&quot;: &quot;spaghetti Bolognese don\u2019t&quot;, &quot;correct&quot;: &quot;Spaghetti Bolognese doesn&#39;t&quot; } ], &quot;articles_feedback&quot;: [ { &quot;title&quot;: &quot;What is toilet paper\u2019s right side?&quot;, &quot;summary&quot;: &quot;The author discusses the great \&quot;toilet paper orientation debate\&quot; and shares their own experiences and opinions on the matter, ultimately concluding that there is no one right answer.&quot;, &quot;url&quot;: &quot;https://medium.com/@palladiusbonton/what-is-toilet-papers-right-side-8da0504d6d0b?source=rss-b5293b96912f------2&quot;, &quot;publication_date&quot;: &quot;Tue, 08 Aug 2023 16:37:20 GMT&quot;, &quot;accuracy&quot;: 8, &quot;is_gcp&quot;: false, &quot;movie_or_book&quot;: &quot;The Big Lebowski&quot; }, { &quot;title&quot;: &quot;Spaghetti Bolognese don\u2019t exist!!1!&quot;, &quot;summary&quot;: &quot;The author argues that the popular dish \&quot;Spaghetti Bolognese\&quot; does not exist in Italy and is considered an \&quot;imaginary dish\&quot; by Italians. 
They explain that the traditional Italian dish is called \&quot;rag\u00f9 alla bolognese\&quot; and is typically served with tagliatelle or other types of pasta, not spaghetti.&quot;, &quot;url&quot;: &quot;https://medium.com/@palladiusbonton/spaghetti-bolognese-dont-exist-1-2088d85909dd?source=rss-b5293b96912f------2&quot;, &quot;publication_date&quot;: &quot;Fri, 21 Apr 2023 16:09:23 GMT&quot;, &quot;accuracy&quot;: 9, &quot;is_gcp&quot;: false, &quot;movie_or_book&quot;: &quot;Ratatouille&quot; }, { &quot;title&quot;: &quot;Migrate GCP projects across organizations, the gcloud way&quot;, &quot;summary&quot;: &quot;The author provides a detailed guide on how to migrate GCP projects across organizations using the gcloud command-line tool. They cover topics such as identifying the current state of the projects, managing IAM permissions, and handling special cases.&quot;, &quot;url&quot;: &quot;https://medium.com/google-cloud/how-to-migrate-projects-across-organizations-c7e254ab90af?source=rss-b5293b96912f------2&quot;, &quot;publication_date&quot;: &quot;Tue, 18 Apr 2023 13:16:26 GMT&quot;, &quot;accuracy&quot;: 10, &quot;is_gcp&quot;: true, &quot;movie_or_book&quot;: &quot;The Matrix&quot; } ] }</pre><ul><li>Finally, asking an LLM to create JSON or YAML can really speed up your development: you can create <a href="https://api.rubyonrails.org/v3.1/classes/ActiveRecord/Fixtures.html">fixtures</a> for your blog or app, or you can just use a program to further process an imperfect, half-processed entity.</li></ul><h3>Conclusions</h3>LLMs are a really powerful tool for reading large quantities of text, summarizing them and classifying them based on your tastes. They can provide structured output (eg JSON), which you can in turn parse and use to populate a DB and an app. 
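Here is a minimal sketch of that parse-and-reuse step, assuming the model reply may carry some extra text around the JSON. The helper names `extract_json` and `gcp_articles_by_accuracy`, as well as the sample reply, are made up for illustration and are not taken from the repository; only the field names (`articles_feedback`, `is_gcp`, `accuracy`, `title`) come from the prompt above.

```ruby
require "json"

# Hypothetical helper (not from the repository): extract the outermost JSON
# object from a model reply that may carry extra text before or after it.
def extract_json(reply)
  first = reply.index("{")
  last = reply.rindex("}")
  return nil if first.nil? || last.nil? || last < first

  JSON.parse(reply[first..last])
rescue JSON::ParserError
  nil # invalid JSON: the caller can retry the prompt or drop the result
end

# Hypothetical helper: keep the Google Cloud articles, best-rated first.
def gcp_articles_by_accuracy(insights)
  insights.fetch("articles_feedback", [])
          .select { |article| article["is_gcp"] == true }
          .sort_by { |article| -article["accuracy"].to_i }
end

# Made-up reply, shaped like the JSON structure requested in the prompt.
reply = <<~REPLY
  Sure! Here is the JSON you asked for:
  {"author_name": "Jane Doe", "articles_feedback": [
    {"title": "A", "accuracy": 7, "is_gcp": true},
    {"title": "B", "accuracy": 9, "is_gcp": false},
    {"title": "C", "accuracy": 10, "is_gcp": true}
  ]}
REPLY

insights = extract_json(reply)
if insights
  titles = gcp_articles_by_accuracy(insights).map { |article| article["title"] }
  puts titles.inspect # ["C", "A"]
else
  puts "Model reply was not valid JSON"
end
```

Returning `nil` on a parse failure keeps the "drop the result" decision in the caller's hands, which matches the warning at the end of the prompt.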
This way, a recommendation engine for your favourite articles (eg GCP articles, sorted by date or accuracy) becomes easy and fun to build!<h4>Next Steps</h4>How could I extend this project?<figure><img alt="" src="https://cdn-images-1.medium.com/max/949/0*uCKwk5OiCddNrioH" /></figure><ol><li>Add a workflow, possibly with <a href="https://cloud.google.com/workflows">Cloud Workflows</a>. Iterate until I’m happy with the quality of the output JSON.</li><li>Use pre-vetted JSON to populate an easy-peasy Node.js app, and run it on <a href="https://cloud.google.com/run">Cloud Run</a>.</li><li>Change the code to create the “Morning list of articles for Riccardo to read”, by pulling A LOT of articles and doing queries by keyword (in this demo it’s ‘GCP’ but it could also be ‘Pistacchio’ or ‘Politics’).</li><li>sed s/keyword/embeddings/g to make it able to do semantic search.</li></ol><hr><a href="https://blog.devops.dev/parse-medium-articles-with-genai-and-add-some-fun-02fe9d30475a">Insights on Medium articles with GenAI and Ruby!</a> was originally published in <a href="https://blog.devops.dev">DevOps.dev</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.


--- !ruby/object:Feedjira::Parser::RSSEntry
title: Insights on Medium articles with GenAI and Ruby!
url: https://blog.devops.dev/parse-medium-articles-with-genai-and-add-some-fun-02fe9d30475a?source=rss-b5293b96912f------2
author: Riccardo Carlesso
categories:
- vertex-ai
- google-cloud-platform
- medium
- ruby
- genai
published: 2023-11-16 13:54:40.000000000 Z
entry_id: !ruby/object:Feedjira::Parser::GloballyUniqueIdentifier
 is_perma_link: 'false'
 guid: https://medium.com/p/02fe9d30475a
carlessian_info:
 news_filer_version: 2
 newspaper: Riccardo Carlesso - Medium
 macro_region: Blogs
rss_fields:
- title
- url
- author
- categories
- published
- entry_id
- content
content: "<h3>Automated insights on Medium articles with GenAI and Ruby!</h3>A
 few months ago my manager asked me to ramp up on GenAI. That was one of the best
 work days of my life! Google offers a lot of toys to play with, and it’s hard not
 to have fun. I’m fortunate enough to have a job I really love! But enough with my
 dramatic style, since <a href=\"https://cloud.google.com/vertex-ai/docs/generative-ai/start/quickstarts/api-quickstart\">Vertex
 AI </a>PaLM API seems to think I’m a bit too Italian, look:<figure><img
 alt=\"\" src=\"https://cdn-images-1.medium.com/max/1024/1*wnUNrC0-Mg0j5Hef0kJXqQ.png\"
 /><figcaption>“Riccardo has a flair for the dramatic”: Palm API
 seems to have a sense of humour!</figcaption></figure>This article will illustrate
 how to crawl someone’s Medium page, and use GenAI to build a JSON file which has
 a mix of the crawled information and some new information which Vertex AI’s text-bison
 was able to infer for us.<h4>The code</h4>TL;DR If you want
 to delve into the code: <a href=\"https://github.com/palladius/genai-googlecloud-scripts/tree/main/03-ruby-medium-article-slurper\">https://github.com/palladius/genai-googlecloud-scripts/tree/main/03-ruby-medium-article-slurper</a><figure><img
 alt=\"\" src=\"https://cdn-images-1.medium.com/max/949/0*Nq7afPB21pYiBnx5\" /><figcaption>A
 Ruby crawler to find insights on Medium Articles</figcaption></figure>My
 script is simple: you give it a Medium
 handle (mine is “palladiusbonton”) and it will do four things:<ol><li>Parse
 the XML RSS feed from Medium (thanks for making it curl-able!) which provides
 a list of the latest (10?) articles by that person. [<a href=\"https://github.com/palladius/genai-googlecloud-scripts/blob/main/03-ruby-medium-article-slurper/inputs/medium-feed.palladiusbonton.xml\">example</a>]</li><li>Extract
 some significant fields (title, body, date, keywords, ..) [<a href=\"https://github.com/palladius/genai-googlecloud-scripts/blob/main/03-ruby-medium-article-slurper/inputs/medium-latest-articles.palladiusbonton.txt\">example</a>]</li><li>Paste
 those at the end of a convoluted <a href=\"https://github.com/palladius/genai-googlecloud-scripts/blob/main/03-ruby-medium-article-slurper/main.rb#L34-L93\">GenAI prompt</a>.</li><li>Call
 the Vertex AI Palm API to get an answer. [<a href=\"https://github.com/palladius/genai-googlecloud-scripts/blob/main/03-ruby-medium-article-slurper/outputs/medium-latest-articles.palladiusbonton.txt.json\">sample</a>]</li></ol>What
 am I trying to prove? I’m trying to use an LLM for a few things:<ul><li>Data
 Scraping. I can do this also in a deterministic way (with <a href=\"https://nokogiri.org/index.html\">nokogiri</a>).</li><li>Classification.
 I’m really playing with this a lot. I ask the system to rate articles
 from 1 to 10, and to tell whether they’re Google Cloud articles or not. I also ask
 it to infer author’s nationality and favorite languages.</li><li>Summarization.
 I get a small summary of each article, which is super useful.</li></ul><h4>Two prompts:
 Text and JSON</h4>This morning, my prompt looked like this:<pre>Prompt =
 &lt;&lt;-END_OF_PROMPT Provide a summary for each of the following articles. *
 Please write about the topics, the style, and rate the article from 1 to 10 in terms
 of accuracy or professionalism. * Please also tell me, for each article, whether
 it talks about Google Cloud. * Can you guess the nationality of the person (or
 geographic context of the article itself) writing all of these articles? * If
 you can find any typos or visible mistakes, please write them down. -- #{Medium
 content will be pasted here} END_OF_PROMPT</pre>My colleague Marc C
 showed me that GenAI can do better: it can write JSON for you, with very little
 guidance! In fact, it’s probably better than a human at closing parentheses and double
 quotes :) So after 3–4 hours of tinkering I got to this version:<pre> ###
 PROMPT HISTORY # 1.6 16nov23 Removed typos from articles. # 1.5 16nov23 Added
 movie. # 1.4 16nov23 Moved from TXT to JSON! PromptInJson = &lt;&lt;-END_OF_PROMPT You
 are an avid article reader and summarizer. I&#39;m going to provide a list of articles
 for a single person and ask you to do this: 1. For each article, I&#39;m
 going to ask a number of per-article questions 2. Overall, I&#39;m going to ask
 questions about the author. I&#39;m going to provide a JSON structure for
 the questions I ask. If you don&#39;t know some answer, feel free to leave NULL/empty
 values. 1. Per-article: * Please write about the topics, the style,
 and rate the article from 1 to 10 in terms of accuracy or professionalism. *
 Please also tell me, for each article, whether it talks about Google Cloud. *
 For each article, capture the original title and please produce a short 300-500-character
 summary. * What existing movie or book would this article remind you the most
 of? Try a guess, use your fantasy. 2. Overall (author): * Extract
 name and surname * Can you guess the nationality of the person (or geographic
 context of the article itself) writing all of these articles? * Please describe
 this author style. Is it professional or more personal? Terse or verbose? .. *
 Does this author prefer a certain language? In which language are their code snippets
 (if any)? * If you can find any typos or recurring mistakes in any article, please
 write them here. Please provide the output in a `JSON` file as an array of
 answer per article, like this: { &quot;prompt_version&quot;: &quot;1.6a&quot;,
 // do NOT change this, take verbatim &quot;author_name&quot;: &quot;&quot;,
 // name and surname of the author &quot;author_nationality&quot;: &quot;&quot;,
 // nationality here &quot;author_style&quot;: &quot;&quot;, // overall author
 style: is it professional or more personal? Terse or verbose? .. &quot;author_favorite_languages&quot;:
 &quot;&quot;, // which languages does the author use? Pascal? C++? Python? Java?
 Usa comma separated for the list. &quot;typos&quot;: [{ // array of mistakes
 or typos &quot;current&quot;: &quot;xxx&quot;, // typo or mistake 
 \ &quot;correct&quot;: &quot;yyy&quot;, // }], &quot;articles_feedback&quot;:
 [ // article 1 { &quot;title&quot;: &quot;&quot;,
 \ // This should be the ORIGINAL article title, you should be able to extract
 it from the TITLE XML part, like &quot;&lt;title&gt;&lt;![CDATA[What is toilet papers
 right side?]]&gt;&lt;/title&gt;&quot; &quot;summary&quot;: &quot;...&quot;,
 \ // This should be the article summary produced by you. &quot;publication_date&quot;:
 &quot;&quot; // This should be provided to you in input &quot;accuracy&quot;:
 XXX, // Integer 1 to 10 &quot;is_gcp&quot;: false, // boolean,
 true of false &quot;movie_or_book&quot;: &quot;&quot; // string, a
 book or film this article content reminds you of. ] }, 
 \ // Article 2, and so on.. ] } Make **ABSOLUTELY SURE**
 the result is valid JSON or I&#39;ll have to drop the result. Here are the
 articles: -- END_OF_PROMPT</pre>I’ve been playing around with the
structure a lot: what pertains to each single article, and what belongs to the global part (author identity / style)?<h4>Hey, we want some action!</h4>Sure, here you are! Let’s look step by step at the output from the Ruby crawler and comment on it:<figure><img alt="" src="https://cdn-images-1.medium.com/max/949/0*k25ajaAGq6ruaS2O" /><figcaption>Typical Ruby crawler, in Nokogiri</figcaption></figure><h4>Phase 1: download the XML.</h4>First, curl this XML: <a href="https://medium.com/feed/@palladiusbonton">https://medium.com/feed/@palladiusbonton</a>.
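If you would rather stay in Ruby than shell out to curl, the download step can be sketched with the standard library alone. This is only a minimal sketch: the helper names medium_feed_url and fetch_medium_feed are hypothetical (they are not from the repository), and it assumes the feed answers directly with a 2xx status, since Net::HTTP.get_response does not follow redirects.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build the RSS feed URL for a Medium handle
# (accepts the handle with or without the leading "@").
def medium_feed_url(handle)
  "https://medium.com/feed/@#{handle.delete_prefix('@')}"
end

# Hypothetical helper: download the feed and return the raw XML string.
# Raises if the server does not answer with a 2xx status.
def fetch_medium_feed(handle)
  response = Net::HTTP.get_response(URI(medium_feed_url(handle)))
  raise "Feed download failed: HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  response.body
end
```

Calling fetch_medium_feed('palladiusbonton') would return the XML as a string, ready to be handed to Nokogiri in the next phase.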
The result looks like this:<figure><img alt="" src="https://cdn-images-1.medium.com/max/608/1*1TOI6AooifViF0eP-0YbkA.png" /><figcaption>Zooming on a single article (XML RSS Feed)</figcaption></figure>That’s the XML; so far so good.<h4>Phase 2: parse the XML with Nokogiri</h4>For each item, I scrape the things I care about. I could probably have the LLM infer them automatically, but it’s little effort, so why not! It’s just about finding the fields you care about:<ul><li>Iterate through each item.</li><li>Per item, extract what you want: title, dc:creator, link, pubDate, ..</li></ul>The heavy lifting is done by <a href="https://nokogiri.org/index.html">nokogiri</a>,
which is a real 💎 gem among Ruby gems; no, it’s not a
repetition, it’s just a double gem. Pythonists and Javascriptists, I wish you had
it in your language too 😜<pre># Code: https://github.com/palladius/genai-googlecloud-scripts/blob/main/03-ruby-medium-article-slurper/main.rb#L120-L141
# Excerpt: assumes nokogiri and action_view are required, and that docSM
# (the parsed feed) and genai_input_filename are defined earlier in the script.
# NOTE: writeln is not a core File method; it is assumed to be a small helper
# that writes a string plus a newline (file.puts is the stock equivalent).
File.open(genai_input_filename, &#39;w&#39;) do |file|
  ## Version 2: Scrape more important metadata
  docSM.xpath(&quot;//item&quot;).each_with_index do |node, ix| # Article
    file.writeln &quot;\n====== Article #{ix+1} =====&quot;
    title = node.xpath(&quot;title&quot;).inner_text
    creator = node.xpath(&quot;dc:creator&quot;).inner_text
    url = node.xpath(&quot;link&quot;).inner_text
    pubDate = node.xpath(&quot;pubDate&quot;).inner_text
    # there can be many categories, e.g. [&quot;cycling&quot;, &quot;google-forms&quot;, &quot;data-studio&quot;, &quot;pivot&quot;, &quot;google-sheets&quot;]
    categories = node.xpath(&quot;category&quot;).map { |c| c.inner_text }
    # strip the HTML tags, keeping only the article text
    article_content = ActionView::Base.full_sanitizer.sanitize(node.inner_text)
    file.writeln &quot;* Title: &#39;#{title}&#39;&quot;
    file.writeln &quot;* Author: &#39;#{creator}&#39;&quot;
    file.writeln &quot;* URL: &#39;#{url}&#39;&quot;
    file.writeln &quot;* PublicationDate: &#39;#{pubDate}&#39;&quot;
    file.writeln &quot;* Categories: #{categories.join(&#39;, &#39;)}&quot;
    file.writeln &quot;&quot;
    file.writeln article_content
  end
end</pre>The result of this part of the code looks like this:<pre>====== Article 3 =====
* Title: &#39;Migrate GCP projects across organizations, the gcloud way&#39;
* Author: &#39;Riccardo Carlesso&#39;
* URL: &#39;https://medium.com/google-cloud/how-to-migrate-projects-across-organizations-c7e254ab90af?source=rss-b5293b96912f------2&#39;
* PublicationDate: &#39;Tue, 18 Apr 2023 13:16:26 GMT&#39;
* Categories: gcp-security-operations, google-cloud-platform, migration

[...]

Nel mezzo del cammin di nostra vita, mi ritrovai per una selva oscura, ché la diritta via era smarrita”— Dante Alighieri(*), Divine Comedy(*) the Italian version of Shakespeare,
</pre><h4>Phase 3: get the LLM to infer information and spark some humour</h4>If
this were a deterministic program, we could just end here: I have all the info, and I’m basically translating from XML to JSON. Not majorly useful. I’ll use GenAI to add some color here:<ul><li>Infer the author’s nationality, favorite languages and writing style.</li><li>Per article, write a small summary, infer accuracy and whether it’s a GCP article or not. There you go, we have a rudimentary QA tester and a classifier! wOOOt! 😮</li></ul>For
my buddy Romin I get this:<pre>{
  &quot;prompt_version&quot;: &quot;1.6b&quot;,
  &quot;author_name&quot;: &quot;Romin Irani&quot;,
  &quot;author_nationality&quot;: null,
  &quot;author_style&quot;: &quot;verbose, professional&quot;,
  &quot;author_favorite_languages&quot;: &quot;Python, Java&quot;,
  &quot;typos&quot;: [
    {
      &quot;current&quot;: &quot;criterias&quot;,
      &quot;correct&quot;: &quot;criteria&quot;
    }
  ],
  &quot;articles_feedback&quot;: [
    {
      &quot;title&quot;: &quot;Integrating langchain4j and PaLM 2 Chat Bison Model&quot;,
      &quot;summary&quot;: &quot;The article describes how to integrate Langchain4j and PaLM 2 Chat Bison Model. It provides detailed instructions on how to set up the environment, create a Google Cloud Function, and deploy the model. The article also includes a code sample and a link to the GitHub repository.&quot;,
      &quot;url&quot;: &quot;https://medium.com/google-cloud/integrating-langchain4j-and-palm-2-chat-bison-model-a684cefd67af?source=rss-802a4d428d95------2&quot;,
      &quot;publication_date&quot;: &quot;Mon, 06 Nov 2023 11:01:02 GMT&quot;,
      &quot;accuracy&quot;: 9,
      &quot;is_gcp&quot;: true,
      &quot;movie_or_book&quot;: null
    },
    {
      &quot;title&quot;: &quot;Google Cloud Platform Technology Nuggets \u2014 October 15\u201331, 2023 Edition&quot;,
      &quot;summary&quot;: &quot;The article provides a summary of Google Cloud Platform Technology Nuggets for the period of October 15-31, 2023. It covers topics such as infrastructure, containers and Kubernetes, identity and security, networking, machine learning, storage, databases and data analytics, and developers and practitioners.&quot;,
      &quot;url&quot;: &quot;https://medium.com/google-cloud/google-cloud-platform-technology-nuggets-october-15-31-2023-edition-4d5ea0689e30?source=rss-802a4d428d95------2&quot;,
      &quot;publication_date&quot;: &quot;Tue, 31 Oct 2023 10:22:07 GMT&quot;,
      &quot;accuracy&quot;: 8,
      &quot;is_gcp&quot;: true,
      &quot;movie_or_book&quot;: null
    },
    {
      &quot;title&quot;: &quot;Develop a FlutterFlow App powered by Vertex AI PaLM 2 Integration&quot;,
      &quot;summary&quot;: &quot;The article describes how to integrate FlutterFlow with Vertex AI PaLM 2. It provides a step-by-step guide on how to set up the environment, create a Google Cloud Function, and deploy the model. The article also includes a code sample and a link to the GitHub repository.&quot;,
      &quot;url&quot;: &quot;https://medium.com/google-cloud/flutterflow-and-vertex-ai-palm-2-integration-14c137e83053?source=rss-802a4d428d95------2&quot;,
      &quot;publication_date&quot;: &quot;Thu, 26 Oct 2023 09:37:11 GMT&quot;,
      &quot;accuracy&quot;: 8,
      &quot;is_gcp&quot;: true,
      &quot;movie_or_book&quot;: null
    }
  ]
}</pre>For
my buddy Guillaume I get this:<pre>{
  &quot;prompt_version&quot;: &quot;1.6b&quot;,
  &quot;author_name&quot;: &quot;Guillaume Laforge&quot;,
  &quot;author_nationality&quot;: &quot;French&quot;,
  &quot;author_style&quot;: &quot;Guillaume Laforge&#39;s writing style can be described as professional, informative, and engaging. He often writes about technology, open-source software, and programming.&quot;,
  &quot;author_favorite_languages&quot;: &quot;Java, Python&quot;,
  &quot;articles_feedback&quot;: [
    {
      &quot;title&quot;: &quot;Tech Watch #4 \u2014 October, 27, 2023&quot;,
      &quot;summary&quot;: &quot;The article provides a summary of the latest developments in the field of artificial intelligence (AI). It covers topics such as the use of LLMs in vector embeddings, the scheduling of PostgreSQL tasks with pg_cron, and the creation of maps with Protomaps. The article also includes links to the original sources for further reading.&quot;,
      &quot;url&quot;: &quot;https://glaforge.medium.com/tech-watch-4-october-27-2023-d48a1449eeb0?source=rss-431147437aeb------2&quot;,
      &quot;publication_date&quot;: &quot;Fri, 27 Oct 2023 15:04:58 GMT&quot;,
      &quot;accuracy&quot;: 8,
      &quot;is_gcp&quot;: false,
      &quot;movie_or_book&quot;: &quot;The Matrix&quot;
    },
    {
      &quot;title&quot;: &quot;Tech Watch #3 \u2014 October, 20, 2023&quot;,
      &quot;summary&quot;: &quot;The article provides a summary of the latest developments in the field of technology. It covers topics such as the use of Groovy in 2023, the state of WebAssembly in 2023, and the use of large language models (LLMs) to solve logic problems. The article also includes links to the original sources for further reading.&quot;,
      &quot;url&quot;: &quot;https://glaforge.medium.com/tech-watch-3-october-20-2023-11a70245017d?source=rss-431147437aeb------2&quot;,
      &quot;publication_date&quot;: &quot;Fri, 20 Oct 2023 19:49:34 GMT&quot;,
      &quot;accuracy&quot;: 9,
      &quot;is_gcp&quot;: false,
      &quot;movie_or_book&quot;: &quot;2001: A Space Odyssey&quot;
    },
    {
      &quot;title&quot;: &quot;Client-side consumption of a rate-limited API in Java&quot;,
      &quot;summary&quot;: &quot;The article discusses different approaches for consuming rate-limited APIs in Java. It covers topics such as using exponential backoff and jitter, scheduled execution, and using the Bucket4J library. The article also includes code examples for each approach.&quot;,
      &quot;url&quot;: &quot;https://glaforge.medium.com/client-side-consumption-of-a-rate-limited-api-in-java-9fbf08673791?source=rss-431147437aeb------2&quot;,
      &quot;publication_date&quot;: &quot;Mon, 02 Oct 2023 00:00:53 GMT&quot;,
      &quot;accuracy&quot;: 10,
      &quot;is_gcp&quot;: false,
      &quot;movie_or_book&quot;: &quot;The Imitation Game&quot;
    },
    {
      &quot;title&quot;: &quot;Tech Watch #1 \u2014 Sept 29, 2023&quot;,
      &quot;summary&quot;: &quot;The article provides a summary of the latest developments in the field of technology. It covers topics such as observability-driven development for LLMs, rebuilding LLM documentation chatbots, container security, and the future of databases. The article also includes links to the original sources for further reading.&quot;,
      &quot;url&quot;: &quot;https://glaforge.medium.com/tech-watch-1-sept-29-2023-2ac0a3a5016c?source=rss-431147437aeb------2&quot;,
      &quot;publication_date&quot;: &quot;Fri, 29 Sep 2023 00:00:11 GMT&quot;,
      &quot;accuracy&quot;: 7,
      &quot;is_gcp&quot;: false,
      &quot;movie_or_book&quot;: &quot;Minority Report&quot;
    }
  ]
}</pre>For myself I get something like this:<pre>{
  &quot;prompt_version&quot;: &quot;1.6b&quot;,
  &quot;author_name&quot;: &quot;Riccardo Carlesso&quot;,
  &quot;author_nationality&quot;: &quot;Italian&quot;,
  &quot;author_style&quot;: &quot;Verbose, uses personal anecdotes.&quot;,
  &quot;author_favorite_languages&quot;: &quot;Ruby&quot;,
  &quot;articles_feedback&quot;: [
    {
      &quot;title&quot;: &quot;What is toilet paper\u2019s right side?&quot;,
      &quot;summary&quot;: &quot;The author discusses the question of which side of toilet paper is the \u201cright\u201d side. They explore the different opinions on this topic and provide their own thoughts on the matter. Ultimately, they conclude that there is no definitive answer to this question and that it is up to each individual to decide which side they prefer.&quot;,
      &quot;url&quot;: &quot;https://medium.com/@palladiusbonton/what-is-toilet-papers-right-side-8da0504d6d0b?source=rss-b5293b96912f------2&quot;,
      &quot;publication_date&quot;: &quot;Tue, 08 Aug 2023 16:37:20 GMT&quot;,
      &quot;accuracy&quot;: 8,
      &quot;is_gcp&quot;: false,
      &quot;movie_or_book&quot;: null
    },
    {
      &quot;title&quot;: &quot;Spaghetti Bolognese don\u2019t exist!!1!&quot;,
      &quot;summary&quot;: &quot;The author discusses the common misconception that spaghetti Bolognese is an Italian dish. They explain that this dish is actually not from Italy and that it is not considered to be a traditional Italian dish. They also provide some tips on how to make a more authentic Italian pasta dish.&quot;,
      &quot;url&quot;: &quot;https://medium.com/@palladiusbonton/spaghetti-bolognese-dont-exist-1-2088d85909dd?source=rss-b5293b96912f------2&quot;,
      &quot;publication_date&quot;: &quot;Fri, 21 Apr 2023 16:09:23 GMT&quot;,
      &quot;accuracy&quot;: 9,
      &quot;is_gcp&quot;: false,
      &quot;movie_or_book&quot;: null
    },
    {
      &quot;title&quot;: &quot;Migrate GCP projects across organizations, the gcloud way&quot;,
      &quot;summary&quot;: &quot;The author provides a detailed guide on how to migrate GCP projects across organizations using the gcloud command-line tool. They cover everything from setting up the necessary permissions to executing the migration. This article is a valuable resource for anyone who needs to migrate GCP projects across organizations.&quot;,
      &quot;url&quot;: &quot;https://medium.com/google-cloud/how-to-migrate-projects-across-organizations-c7e254ab90af?source=rss-b5293b96912f------2&quot;,
      &quot;publication_date&quot;: &quot;Tue, 18 Apr 2023 13:16:26 GMT&quot;,
      &quot;accuracy&quot;: 10,
      &quot;is_gcp&quot;: true,
      &quot;movie_or_book&quot;: null
    }
  ]
}</pre><h4>Lessons learnt</h4>Today
I learnt a few things:<ul><li>For the first time, the token limitation was visible to me. Palm API’s text-bison model has a 32k-token limit, and what I didn’t know is that it seems to be shared between input and output. If I increase the input size, the room left for the output shrinks (still to be confirmed; for the moment it’s just a hunch). For this reason I reduced my input from 32k to 22k. To see the token count, Google gives you a <a href="https://cloud.google.com/vertex-ai/docs/generative-ai/get-token-count?hl=en">nice API to calculate it</a> (thanks Guillaume). You can see this very well from the JSON returned by the API (note that the two totalTokens values here, 549 + 7643, add up to exactly the maximum, 8192):</li></ul><pre>&quot;metadata&quot;: {
  &quot;tokenMetadata&quot;: {
    &quot;outputTokenCount&quot;: {
      &quot;totalTokens&quot;: 549,
      &quot;totalBillableCharacters&quot;: 1312
    },
    &quot;inputTokenCount&quot;: {
      &quot;totalBillableCharacters&quot;: 20713,
      &quot;totalTokens&quot;: 7643
    }
  }
}</pre><ul><li>Prompting
is a (long) iterative feedback loop: you try something out, and after a few answers you realize it doesn’t work, so you try to ‘bribe’ your model by telling it to “please do something as it’s very important to you”. Example: the movie or book is always empty; it’s probably a stretch for an API invocation with a temperature of 0.1. So I changed “What existing movie or book would this article remind you the most of? Try a guess, use your fantasy” by adding “Please do NOT leave this null! It’s just for fun. yet its very important to me”. Note that this doesn’t fix it: the output becomes “None” instead of null, which is fun. But read on…</li><li>Temperature is an important parameter. When asked to come up with a movie title for my articles, the model would refuse, until I raised the temperature from 0.1 to 0.3. Now I get a curious result: my films are finally there! Wait, Ratatouille,
seriously?</li></ul><pre>{
  &quot;prompt_version&quot;: &quot;1.6b&quot;,
  &quot;author_name&quot;: &quot;Riccardo Carlesso&quot;,
  &quot;author_nationality&quot;: &quot;Italian&quot;,
  &quot;author_style&quot;: &quot;Verbose, uses humor and personal anecdotes. Seems to prefer Ruby on Rails.&quot;,
  &quot;author_favorite_languages&quot;: &quot;Ruby&quot;,
  &quot;typos&quot;: [
    {
      &quot;current&quot;: &quot;cis-centralis /pendens&quot;,
      &quot;correct&quot;: &quot;cis-centralis/pendens&quot;
    },
    {
      &quot;current&quot;: &quot;trans-centralis/mur\u00e0lis&quot;,
      &quot;correct&quot;: &quot;trans-centralis/muralis&quot;
    },
    {
      &quot;current&quot;: &quot;spaghetti Bolognese don\u2019t&quot;,
      &quot;correct&quot;: &quot;Spaghetti Bolognese doesn&#39;t&quot;
    }
  ],
  &quot;articles_feedback&quot;: [
    {
      &quot;title&quot;: &quot;What is toilet paper\u2019s right side?&quot;,
      &quot;summary&quot;: &quot;The author discusses the great \&quot;toilet paper orientation debate\&quot; and shares their own experiences and opinions on the matter, ultimately concluding that there is no one right answer.&quot;,
      &quot;url&quot;: &quot;https://medium.com/@palladiusbonton/what-is-toilet-papers-right-side-8da0504d6d0b?source=rss-b5293b96912f------2&quot;,
      &quot;publication_date&quot;: &quot;Tue, 08 Aug 2023 16:37:20 GMT&quot;,
      &quot;accuracy&quot;: 8,
      &quot;is_gcp&quot;: false,
      &quot;movie_or_book&quot;: &quot;The Big Lebowski&quot;
    },
    {
      &quot;title&quot;: &quot;Spaghetti Bolognese don\u2019t exist!!1!&quot;,
      &quot;summary&quot;: &quot;The author argues that the popular dish \&quot;Spaghetti Bolognese\&quot; does not exist in Italy and is considered an \&quot;imaginary dish\&quot; by Italians. They explain that the traditional Italian dish is called \&quot;rag\u00f9 alla bolognese\&quot; and is typically served with tagliatelle or other types of pasta, not spaghetti.&quot;,
      &quot;url&quot;: &quot;https://medium.com/@palladiusbonton/spaghetti-bolognese-dont-exist-1-2088d85909dd?source=rss-b5293b96912f------2&quot;,
      &quot;publication_date&quot;: &quot;Fri, 21 Apr 2023 16:09:23 GMT&quot;,
      &quot;accuracy&quot;: 9,
      &quot;is_gcp&quot;: false,
      &quot;movie_or_book&quot;: &quot;Ratatouille&quot;
    },
    {
      &quot;title&quot;: &quot;Migrate GCP projects across organizations, the gcloud way&quot;,
      &quot;summary&quot;: &quot;The author provides a detailed guide on how to migrate GCP projects across organizations using the gcloud command-line tool. They cover topics such as identifying the current state of the projects, managing IAM permissions, and handling special cases.&quot;,
      &quot;url&quot;: &quot;https://medium.com/google-cloud/how-to-migrate-projects-across-organizations-c7e254ab90af?source=rss-b5293b96912f------2&quot;,
      &quot;publication_date&quot;: &quot;Tue, 18 Apr 2023 13:16:26 GMT&quot;,
      &quot;accuracy&quot;: 10,
      &quot;is_gcp&quot;: true,
      &quot;movie_or_book&quot;: &quot;The Matrix&quot;
    }
  ]
}</pre><ul><li>Finally,
asking an LLM to create JSON or YAML can really speed up your development: you can create <a href="https://api.rubyonrails.org/v3.1/classes/ActiveRecord/Fixtures.html">fixtures</a> for your blog or app, or you can just use a computer to further process an imperfect, half-processed entity.</li></ul><h3>Conclusions</h3>LLMs are a really powerful tool for reading large quantities of text, summarizing them and classifying them based on your tastes. They can provide structured output (e.g. JSON), which you can in turn parse and use to populate a DB and an app. This way, a recommendation engine for your favourite articles (e.g. GCP articles, sorted by date or accuracy) becomes
easy and fun to build!<h4>Next Steps</h4>How could I extend this project?<figure><img alt="" src="https://cdn-images-1.medium.com/max/949/0*uCKwk5OiCddNrioH" /></figure><ol><li>Add a workflow, possibly with <a href="https://cloud.google.com/workflows">Cloud Workflows</a>. Iterate until I’m happy with the quality of the output JSON.</li><li>Use pre-vetted JSON to populate an easy-peasy Node.js app, and run it on <a href="https://cloud.google.com/run">Cloud Run</a>.</li><li>Change the code to create the “Morning list of articles for Riccardo to read”, by pulling A LOT of articles and querying by keyword (in this demo it’s ‘GCP’, but it could also be ‘Pistacchio’ or ‘Politics’).</li><li>sed s/keyword/embeddings/g
to make it able to do semantic search.</li></ol><hr><a href="https://blog.devops.dev/parse-medium-articles-with-genai-and-add-some-fun-02fe9d30475a">Insights on Medium articles with GenAI and Ruby!</a> was originally published in <a href="https://blog.devops.dev">DevOps.dev</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.
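As a rough illustration of the recommendation idea from the Conclusions, here is a minimal Ruby sketch that parses the model’s answer, drops it when it is not valid JSON (as the prompt threatens to do), and ranks the Google Cloud articles. The function names parse_llm_json and gcp_reading_list are hypothetical and not from the repository; the field names come from the JSON outputs shown earlier, and the sketch assumes every article carries an RFC 2822 publication_date like the ones in those outputs.

```ruby
require 'json'
require 'time'

# Hypothetical helper: parse the model's answer.
# Returns nil (i.e. "drop the result") when the text is not valid JSON.
def parse_llm_json(raw)
  JSON.parse(raw)
rescue JSON::ParserError
  nil
end

# Hypothetical helper: keep only the Google Cloud articles,
# highest accuracy first, and newest first when accuracy ties.
def gcp_reading_list(insights)
  insights.fetch('articles_feedback', [])
          .select { |article| article['is_gcp'] }
          .sort_by do |article|
            [-article['accuracy'].to_i, -Time.rfc2822(article['publication_date']).to_i]
          end
end
```

Sorting on a negated [accuracy, timestamp] pair keeps the ranking in a single pass; swapping the two elements would give a newest-first list instead.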
