
2025-04-03-rising

  • Selection method: RISING

Discussion highlights

Below are bullet-point summaries of the 25 posts, with anchor links and per-item details:


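#1 Sonnet Computer Use in very Underrated

  1. Technical strengths and market undervaluation
    • Sonnet's computer use performs well but gets little attention; the author expects a comeback
  2. Real-world applications
    • Apply Hero (a popular tool) and Manus (a viral demo) are cited as evidence of practical value
  3. Ecosystem integration
    • Vercel's AI SDK added web agents; BrowserUse is cited as a commercial success
  4. Outlook
    • "Web + computer use" agents are expected to proliferate, with Sonnet in a key role

#2 I regret buying claude for 1 year

  1. Strongly negative review
    • Extreme dissatisfaction with Claude 3.7, expressed in harsh language (e.g. "fucking shitty")

#3 I blew $417 on Claude Code

  1. What was built
    • A working word game, LetterLinks (daily challenges, leaderboards)
  2. Where the AI helped
    • Fast code generation; some debugging was efficient
  3. Pain points
    • Context limits, false "fixes", costs that grew non-linearly
  4. Cost-benefit
    • $417 versus an estimated $2-3K for a freelancer, but with heavy manual testing time

#4 Migrated to Gemini

  1. Motivation
    • Frustration with context limits on the Claude API
  2. Gemini's advantages
    • 1M-token context, no overload errors, faster development of wolfai.us

#5 Claude Sonnet is the undisputed champion of OCR

  1. Ranking
    • Claude > open-source models (Qwen, Mistral) > GPT-4o
  2. Test conclusion
    • GPT-4o performed below expectations; a video is attached as evidence

#6 Pour one out for my Claude subscription

  1. Mixed feelings
    • A PhD student drops Claude after Gemini's upgrade and compares it to saying goodbye to a lab partner
  2. Practical choice
    • Gemini currently fits the author's academic needs better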

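#7 Claude 3.5 cracked benchmark

  1. Performance by agent setup
    • BasicAgent: Claude 3.5 leads (~20% vs. 13% for o1-high and 3% for o3-mini-high)
    • IterativeAgent: o1-high takes the lead (~25% vs. 16% for Claude 3.5)
  2. Key factor
    • Prompt and agent design drive the results; per the paper, models other than Claude 3.5 Sonnet frequently ended tasks early

#8 Fully Featured AI Coding Agent as MCP Server

  1. Tool features
    • Free and open source (GPL), built on language-server technology, handles large codebases
  2. Multi-platform integration
    • Works with Claude Desktop or the Gemini API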

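#9 Thanks then! Take care...

  1. Reporting guidance
    • Provide input/output details and use the thumbs-down button for feedback
  2. User issues
    • Malfunctions and contradictory system prompts

#10 Tell an LLM It Has an iPhone

  1. Experiment design
    • A control group is needed to validate the prompt's effect (e.g. testing the role and the scenario separately)
  2. Mechanism analysis
    • Text appended to the user message may improve accuracy simply by giving the model more context

(For brevity, the remaining posts are listed by title only; see the original posts for full details.)

#11-25 Quick summary

Key point of each post

Below is a one-sentence summary for each post, generated from its title: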

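  1. Sonnet Computer Use in very Underrated

    • Argues that Sonnet's computer use is capable but currently undervalued, and predicts rapid growth.
  2. I regret buying claude for 1 year. It's so shit now

    • The author harshly criticizes Claude 3.7 and regrets buying a year-long subscription.
  3. I blew $417 on Claude Code to build a word game. Here's the brutal truth.

    • Shares the experience of building a word game with Claude Code: cheaper than hiring a developer, but it exposes the efficiency and accuracy limits of AI coding.
  4. Migrated to Gemini. Sheeesh, the grass is greener.

    • The author moves from Claude to Gemini 2.5 Pro and praises its context length and development speed.
  5. Claude Sonnet is the undisputed champion of OCR

    • Hands-on tests show Claude outperforming GPT-4o and open-source models at OCR, making it the author's top pick for the task.
  6. Pour one out for my Claude subscription... It's not you, it's Gemini (and my PhD).

    • A PhD student cancels a Claude subscription after Gemini improved, while expressing attachment to Claude's help with academic work.
  7. Now we talking INTELLIGENCE EXPLOSION | Claude 3.5 cracked of benchmark!

    • Claude 3.5 does well on the benchmark, but the result depends on the agent framework: it leads with BasicAgent and trails with IterativeAgent.
  8. Fully Featured AI Coding Agent as MCP Server

    • Introduces Serena, a free open-source tool, and highlights its code analysis and multi-platform support.
  9. Thanks then! Take care...

    • Users discuss how to report Claude issues and share experiences of service problems.
  10. What Happens When You Tell an LLM It Has an iPhone Next to It?

    • Discusses the importance of control groups in prompt engineering and how scenario framing affects model output.
  11. Claude 3.7 Sonnet extended thinking is a waste of time (for deep coding) since Monday

    • Users report that Claude 3.7 has recently been repeating mistakes on deep coding tasks, possibly due to a system update.
  12. Claude 3.7 Sonnet is still the best LLM (by far) for frontend development

    • Affirms Claude 3.7's lead in frontend development while noting a remaining gap with real engineering needs.
  13. This is the first time in almost a year that Claude is not the best model

    • The author concedes that Gemini 2.5 currently outperforms Claude, reflecting how quickly model rankings change.
  14. Anthropic is giving free API credits for university students

    • Links to instructions for student developers to apply for free Anthropic API credits.
  15. Claude Code was prohibitively expensive for me

    • Criticizes the high hourly cost of Claude Code; the author praises its creative problem solving but cannot afford it.
  16. is it just me or has clause 3.5 gone bonkers today?

    • A user reports Claude 3.5 producing unrequested output, which burns through usage quota faster.
  17. This conversation reached its maximum length

    • A paying user finds the conversation limit triggering unusually early and suspects a server-side change.
  18. What the hell happened?

    • Summarizes the standard process for reporting Claude problems, with brief community confirmation of service issues.
  19. Worked on a lofi platform

    • Shares a self-taught developer's lofi music platform project, covering design and tech stack.
  20. Please be candid; did I just pay $220 for a year of this screensaver, but only at Anthropic's website?

    • (Post content missing; presumably a complaint or joke about the value of a Claude subscription.)
  21. Rate limits, prompt caching, citations and tools

    • Discusses technical approaches to rate limits, caching, and structured citations when processing long documents with the Claude API.
  22. Did Claude get smarter again?

    • A user observes Claude 3.7 performing better recently and, against other negative reviews, discusses why model quality seems to fluctuate.
  23. What's the difference between selecting Claude 3.7 in Perplexity vs using Claude.ai?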

Table of contents

- [1. Sonnet Computer Use in very Underrated](#1-sonnet-computer-use-in-very-underrated)
- [2. I regret buying claude for 1 year. It's so shit now](#2-i-regret-buying-claude-for-1-year-it-s-so-sh)
- [3. I blew $417 on Claude Code to build a word game. Here's the brutal truth.](#3-i-blew-417-on-claude-code-to-build-a-word-ga)
- [4. Migrated to Gemini. Sheeesh, the grass is greener.](#4-migrated-to-gemini-sheeesh-the-grass-is-gree)
- [5. Claude Sonnet is the undisputed champion of OCR](#5-claude-sonnet-is-the-undisputed-champion-of-)
- [6. Pour one out for my Claude subscription... It's not you, it's Gemini (and my PhD).](#6-pour-one-out-for-my-claude-subscription-it-s)
- [7. Now we talking INTELLIGENCE EXPLOSION | Claude 3.5 cracked of benchmark!](#7-now-we-talking-intelligence-explosion-|-clau)
- [8. Fully Featured AI Coding Agent as MCP Server](#8-fully-featured-ai-coding-agent-as-mcp-server)
- [9. Thanks then! Take care...](#9-thanks-then-take-care-)
- [10. What Happens When You Tell an LLM It Has an iPhone Next to It?](#10-what-happens-when-you-tell-an-llm-it-has-an)
- [11. Claude 3.7 Sonnet extended thinking is a waste of time (for deep coding) since Monday](#11-claude-3-7-sonnet-extended-thinking-is-a-wa)
- [12. Claude 3.7 Sonnet is still the best LLM (by far) for frontend development](#12-claude-3-7-sonnet-is-still-the-best-llm-by-)
- [13. This is the first time in almost a year that Claude is not the best model](#13-this-is-the-first-time-in-almost-a-year-tha)
- [14. Anthropic is giving free API credits for university students](#14-anthropic-is-giving-free-api-credits-for-unive)
- [15. Claude Code was prohibitively expensive for me](#15-claude-code-was-prohibitively-expensive-for)
- [16. is it just me or has clause 3.5 gone bonkers today?](#16-is-it-just-me-or-has-clause-3-5-gone-bonker)
- [17. This conversation reached its maximum length](#17-this-conversation-reached-its-maximum-leng)
- [18. What the hell happened?](#18-what-the-hell-happened-)
- [19. Worked on a lofi platform](#19-worked-on-a-lofi-platform)
- [20. Please be candid; did I just pay $220 for a year of this screensaver, but only at Anthropic's website?](#20-please-be-candid;-did-i-just-pay-220-for-a-)
- [21. Rate limits, prompt caching, citations and tools](#21-rate-limits-prompt-caching-citations-and-t)
- [22. Did Claude get smarter again?](#22-did-claude-get-smarter-again-)
- [23. What's the difference between selecting Claude 3.7 in Perplexity vs using Claude.ai?](#23-what-s-the-difference-between-selecting-cla)
- [24. Claude Desktop App](#24-claude-desktop-app)
- [25. I'm not having issues?](#25-i-m-not-having-issues-)

---

## 1. Sonnet Computer Use in very Underrated {#1-sonnet-computer-use-in-very-underrated}

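The core topic of this post is **the potential and current state of Sonnet's computer use**, with a focus on the following points:

1. **Sonnet's technical strengths and underrated status**
- The author finds Sonnet's computer use "very impressive" but currently "underrated", and predicts a "huge comeback".

2. **The rise of real-world applications**
- Cites **Apply Hero** (a popular computer use application) and **Manus** (a viral demo built mostly on Sonnet) as evidence of practical value.

3. **Rapid ecosystem integration**
- **Vercel's AI SDK** (described as the most used TypeScript AI SDK) has just integrated web agents, which reflects adoption among developers.
- **BrowserUse** is rumored to be making millions in ARR by licensing its open-source project, hinting at the commercial potential of this category.

4. **Outlook**
- The post closes by predicting that many more "web + computer use agents" will appear very soon, with Sonnet in a key role.

**Keyword notes**:
- "Agents" here refers to **AI agent tooling** such as workflow automation and browser automation.
- "Computer use" refers to a model operating a computer interface directly (for example, reading the screen and issuing mouse and keyboard actions), as opposed to producing text only.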

- **Reddit link**: [https://reddit.com/r/ClaudeAI/comments/1jptm7p/sonnet_computer_use_in_very_underrated/](https://reddit.com/r/ClaudeAI/comments/1jptm7p/sonnet_computer_use_in_very_underrated/)
- **External link**: [https://www.reddit.com/r/ClaudeAI/comments/1jptm7p/sonnet_computer_use_in_very_underrated/](https://www.reddit.com/r/ClaudeAI/comments/1jptm7p/sonnet_computer_use_in_very_underrated/)
- **Posted**: 2025-04-03 00:58:01

### Content

I dove really deep into browser agents for the past month and sonnet computer use is very impressive. I think it's currently very underrated, and going to have a huge comeback.

- Very cool computer use applications like Apply Hero are becoming very popular.

- AI SDK by Vercel (most used typescript AI sdk) just integrated web agents today.

- BrowserUse rumor is that they are making millions of ARR by just licensing their open source project.

- Manus (the very viral demo) is built mostly on sonnet.

I think so many more web + computer use agents are going to pop off very soon.


### Discussion

**Comment 1**:

What kind of problems can it solve for a general public everyday PC user?


**Comment 2**:

Do they have a dedicated SDK?


---

## 2. I regret buying claude for 1 year. It's so shit now {#2-i-regret-buying-claude-for-1-year-it-s-so-sh}

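The core topic of this post is a strongly negative review of Claude 3.7. The author vents extreme frustration in exaggerated language ("fucking shitty", "kms") but does not say what specifically is wrong, whether usability, performance, or something else.

Summary: **strong dissatisfaction with and criticism of Claude 3.7**.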

- **Reddit link**: [https://reddit.com/r/ClaudeAI/comments/1jpk5m3/i_regret_buying_claude_for_1_year_its_so_shit_now/](https://reddit.com/r/ClaudeAI/comments/1jpk5m3/i_regret_buying_claude_for_1_year_its_so_shit_now/)
- **External link**: [https://www.reddit.com/r/ClaudeAI/comments/1jpk5m3/i_regret_buying_claude_for_1_year_its_so_shit_now/](https://www.reddit.com/r/ClaudeAI/comments/1jpk5m3/i_regret_buying_claude_for_1_year_its_so_shit_now/)
- **Posted**: 2025-04-02 17:03:19

### Content

Cluade 3.7 is fucking shitty and is gonna make me kms


### Discussion

**Comment 1**:

Cancelled last month lol. So many models now, why even pay. Generate with one, bug fix with another, organize with other. Easy.


**Comment 2**:

with the absolute insane pace of this industry, why would you lock in with 1 provider for a whole year?


**Comment 3**:

I want opus 4.0


**Comment 4**:

The cycle of AI releases:

A new Model -> Finally <insert AI> is good, i could do X

just a general thing without any detail

For the next month -> <insert AI> is revolutionary, I could do so many things

fails to explain how

After any other model of any other AI does something slightly better -> I will literally kms because <insert AI> is bad

fails to explain why

Go to step 1 for the new AI.


**Comment 5**:

Dare to elaborate what you find shitty about it? I've figured that GPT 4o is better at some tasks such as rephrasing text whereas Claude 3.7 performs much better with coding and calculating (and corn).


---

## 3. I blew $417 on Claude Code to build a word game. Here's the brutal truth. {#3-i-blew-417-on-claude-code-to-build-a-word-ga}

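The core topic of this post is **the author's experience using the AI assistant Claude to build a Scrabble-style word game (LetterLinks)**. Key points:

1. **What was built**
- Core game features were implemented (daily challenges, scoring, leaderboards)
- The author credits the AI for fast code generation and some useful debugging

2. **What went well**
- The AI produced working code from simple instructions (UI changes, feature implementation)
- Some debugging sessions were efficient, with the problem located precisely

3. **Main pain points**
- Context window limits slowed development once the codebase reached about 15K lines
- The AI often offered false "fixes" that needed repeated correction
- Cost grew non-linearly with project complexity
- The AI could not test its own changes; every change had to be tested manually

4. **Cost-benefit assessment**
- Cheaper than hiring a developer ($417 vs. $2-3K)
- A large amount of time went into testing and iteration

5. **Lessons learned**
- Planned improvements: modular development, a written record of architecture decisions, a budget per feature
- The author frames the AI as a "junior dev" who needs constant supervision

In essence, the post is a first-hand account of where current AI coding assistants reach their limits in a real project, and of the typical friction when people and tools collaborate.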
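
The "hard budget per feature" idea and the $417 versus $2-3K comparison can be made concrete with a small sketch. This is only an illustration: the `FeatureBudget` class, the feature names, and the budget caps are hypothetical, while the dollar figures recorded below ($100 for early game logic, $20 for one animation fix, $417 total, $2-3K freelancer estimate) come from the post.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureBudget:
    """Tracks AI spend per feature against a hard cap (caps are hypothetical)."""
    caps: dict[str, float]
    spent: dict[str, float] = field(default_factory=dict)

    def record(self, feature: str, cost: float) -> None:
        """Add one session's cost to a feature's running total."""
        self.spent[feature] = self.spent.get(feature, 0.0) + cost

    def over_budget(self) -> list[str]:
        """Return the features whose total spend exceeds their cap."""
        return sorted(f for f, total in self.spent.items()
                      if total > self.caps.get(f, 0.0))


def savings_range(ai_cost: float, quote_low: float, quote_high: float) -> tuple[float, float]:
    """Money saved versus a freelancer quote range (ignores the author's own time)."""
    return quote_low - ai_cost, quote_high - ai_cost


budget = FeatureBudget(caps={"game_logic": 150.0, "animation_fix": 10.0})
budget.record("game_logic", 100.0)    # first-week figure from the post
budget.record("animation_fix", 20.0)  # late-project fix from the post
print(budget.over_budget())                  # features past their cap
print(savings_range(417.0, 2000.0, 3000.0))  # dollars saved, low and high
```

With these made-up caps, only the animation fix is flagged, which mirrors the post's complaint that small late changes cost disproportionately more than early work.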

- **Reddit link**: [https://reddit.com/r/ClaudeAI/comments/1jpddbf/i_blew_417_on_claude_code_to_build_a_word_game/](https://reddit.com/r/ClaudeAI/comments/1jpddbf/i_blew_417_on_claude_code_to_build_a_word_game/)
- **External link**: [https://www.reddit.com/r/ClaudeAI/comments/1jpddbf/i_blew_417_on_claude_code_to_build_a_word_game/](https://www.reddit.com/r/ClaudeAI/comments/1jpddbf/i_blew_417_on_claude_code_to_build_a_word_game/)
- **Posted**: 2025-04-02 10:01:59

### Content

Alright, so a few weeks ago I had this idea for a Scrabble-style game and thought "why not try one of these fancy AI coding assistants?" Fast forward through a sh\*t ton of prompting, $417 in Claude credits, and enough coffee to kill a small horse, I've finally got a working game called LetterLinks: https://playletterlinks.com/

The actual game (if you care)

It's basically my take on Scrabble/Wordle with daily challenges:

- Place letter tiles on a board

- Form words, get points

- Daily themes and bonus challenges

- Leaderboards to flex on strangers

The Good Parts (there were some)

Actually nailed the implementation

I literally started with "make me a scrabble-like game" and somehow Claude understood what I meant. No mockups, no wireframes, just me saying "make the board purple" or "I need a timer" and it spitting out working code. Not gonna lie, that part was pretty sick.

Once I described a feature I wanted - like skill levels that show progress - Claude would run with it.

Ultimately I think the finished result is pretty slick, and while there are some bugs, I'm proud of what Claude and I did together.

Debugging that didn't always completely suck

When stuff broke (which was constant), conversations often went like:

Me: "The orange multiplier badges are showing the wrong number"

Claude: dumps exact code location and fix

This happened often enough to make me not throw my laptop out the window.

The Bad Parts (oh boy)

Context window is a giant middle finger

Once the codebase hit about 15K lines, Claude basically became that friend who keeps asking you to repeat the story you just told:

Me: "Fix the bug in the theme detection"

Claude: "What theme detection?"

Me: "The one we've been working on FOR THE PAST WEEK"

I had to use the /claude compact feature more and more frequently.

The "I found it!" BS

Most irritating phrase ever:

Claude: "I found the issue! It's definitely this line right here."

implements fix

bug still exists

Claude: "Ah, I see the REAL issue now..."

Rinse and repeat until you're questioning your life choices. Bonus points when Claude confidently "fixes" something and introduces three new bugs.

Cost spiral is real

What really pissed me off was how the cost scaled:

- First week: Built most of the game logic for ~$100

- Last week: One stupid animation fix cost me $20 because Claude needed to re-learn the entire codebase

The biggest "I'm never doing this again but probably will" part

Testing? What testing?

Every. Single. Change. Had to be manually tested by me. Claude can write code all day but can't click a f***ing button to see if it works.

This turned into:

  1. Claude writes code

  2. I test

  3. I report issues

  4. Claude apologizes and tries again

  5. Repeat until I'm considering a career change

Worth it?

For $417? Honestly, yeah, kinda. A decent freelancer would have charged me $2-3K minimum. Also I plan to use this in my business, so it's company money, not mine. But it wasn't the magical experience they sell in the ads.

Think of Claude as that junior dev who sometimes has brilliant ideas but also needs constant supervision and occasionally sets your project on fire.

Next time I'll:

  1. Split everything into tiny modules from day one

  2. Keep a separate doc with all the architecture decisions

  3. Set a hard budget per feature

  4. Lower my expectations substantially

Anyone else blow their money on AI coding? Did you have better luck, or am I just doing it wrong?


### Discussion

**Comment 1**:

Gemini 2.5 Pro experimental will solve your context woes


**Comment 2**:

Interesting post. I have been experimenting with Claude Code at work. We have a relatively big codebase (a web application that has been developed by more than a dozen developers over the last 6 years). But it's also written in a very professional way, using all the best practices, standards, etc. Claude Code does very well in such an environment. I had it fix two dozen papercut bugs (usually smaller issues like icons not displaying correctly, time displaying in UTC instead of the browser timezone, dropdown menu not showing some items correctly, etc) and it fixed them all pretty much spot-on. Most of them cost less than $1 to fix with Claude Code. I should also mention that we have a professional QA team that logs tickets very well with detailed steps to reproduce, expected results, actual results, etc, which makes it easier for Claude Code to find the right code and fix it well.

I have it currently monitoring our JIRA tickets, and when a ticket with "papercut" label shows up, Claude Code tries to fix it and submits the fix as a PR with all the explanation on how it was thinking. This makes it quite easy for developers to review. It also helps that our CI/CD will run a battery of tests on the proposed code before it gets to a developer for review.

So from my experience, if you have well-written codebase, and you explain in detail what you want it to do, it does well.


**Comment 3**:

One thing I forgot to add - feature creep. I would often ask Claude for a relatively simple addition, but it would take it upon itself to add 5 different things. Sometimes they were actually good ideas, but often they weren't and I'd waste time and money by telling to change things back. I also suspect that, although the game works pretty well afaict, my actual codebase is a bit of a mess with lots of redundancies and abandoned sections (I don't know for sure because I'm not a coder - but just from skimming over the source code it seems like some functions have been repeated in multiple parts).


**Comment 4**:

My secret trick is, I use git2text on the codebase. I put everything in Google AI Studio (G pro 2.5), have it comment on the code and generate specific to-do lists that I paste back into Claude Code. Works fantastic.


**Comment 5**:

Pretty fun game, I dig it.


---

## 4. Migrated to Gemini. Sheeesh, the grass is greener. {#4-migrated-to-gemini-sheeesh-the-grass-is-gree}

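The core topic of this post is **the author's comparison of the Claude API with Gemini 2.5 Pro after switching**. Key points:

1. **Motivation for switching**
- Frustration with the context limit on the Claude API
- The API was cheaper than the $20/month subscription, but the limit hurt the experience

2. **Advantages of Gemini 2.5 Pro**
- No limit messages or API overload errors
- A 1M-token context limit
- Noticeably faster code generation and updates
- A localized chatbot (wolfai.us) was built quickly

3. **Development efficiency**
- After the switch, the workflow runs smoothly (RooCode "just hums away")
- The project was completed quickly through trial and error

Overall, the post compares API services on technical limits, cost, and development speed.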
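
To see why a larger context limit matters for coding agents, here is a rough back-of-the-envelope sketch. Everything in it is an assumption for illustration: the 4-characters-per-token ratio is only a common rule of thumb, the project size is made up, and the 200K window is a hypothetical smaller limit set against the 1M-token figure from the post.

```python
def estimate_tokens(total_chars: int, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; 4 characters per token is only a rule of thumb."""
    return int(total_chars / chars_per_token)


def fits(total_chars: int, window_tokens: int, reserve_tokens: int = 8_000) -> bool:
    """True if the text plus a reserve for the reply fits in the window."""
    return estimate_tokens(total_chars) + reserve_tokens <= window_tokens


# Hypothetical project: 60,000 lines at an average of 40 characters per line.
project_chars = 60_000 * 40

for name, window in [("200K window", 200_000), ("1M window", 1_000_000)]:
    print(name, estimate_tokens(project_chars), fits(project_chars, window))
```

Under these assumptions the project needs roughly 600K tokens, so it overflows the smaller window and fits in the larger one. Real tokenizers vary by model and by language, so treat the numbers as an order-of-magnitude check only.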

- **Reddit link**: [https://reddit.com/r/ClaudeAI/comments/1jpk827/migrated_to_gemini_sheeesh_the_grass_is_greener/](https://reddit.com/r/ClaudeAI/comments/1jpk827/migrated_to_gemini_sheeesh_the_grass_is_greener/)
- **External link**: [https://www.reddit.com/r/ClaudeAI/comments/1jpk827/migrated_to_gemini_sheeesh_the_grass_is_greener/](https://www.reddit.com/r/ClaudeAI/comments/1jpk827/migrated_to_gemini_sheeesh_the_grass_is_greener/)
- **Posted**: 2025-04-02 17:08:35

### Content

Honestly, i just got tired of the contextual limit via the Claude API. Cost wise, it was a better option than the $20 a month. BUT MAN! Switched my API to Gemini 2.5 Pro and RooCode just hums away building everything with no limit messages or api overload errors.. The 1M token limit is dope, but I've noticed that even the code updates are way faster. Built my localized chatbot with some trial and error incredibly fast.

https://wolfai.us


### Discussion

**Comment 1**:

The recent clamp downs on the limits have forced me to use other models and Gemini 2.5 pro came just in time, it's at LEAST as good as extended claude if not clearly better most of the time. I hated all the previous google models but this one is good tbh.


**Comment 2**:

I've switched away from Claude (to ChatGPT 4o) because I'm just getting started with a conversation when I get warnings about chat length. I switched to 4o really out of necessity, and I've been impressed with it. I'm in one chat that's been going on for multiple days with hundreds of responses, and it's still remembering and joking about things I said at the beginning. Claude's become unusable for me, except for quick hit and run queries. I'm paying $20/month, same as ChatGPT, so makes me wonder why I'm paying when I'm being shut out after low reasonable-usage length chats?


**Comment 3**:

No way to make Gemini get local file access?


**Comment 4**:

How do you use it without limitation? I keep hitting limits


**Comment 5**:

I literally don't believe you. where are all these Gemini shills coming from


---

## 5. Claude Sonnet is the undisputed champion of OCR {#5-claude-sonnet-is-the-undisputed-champion-of-}

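The core topic of this post is **a comparison of AI models (Claude, GPT-4o, and open-source models such as Qwen and Mistral) on OCR**. Based on the author's tests, Claude is the clear winner for this use case, GPT-4o trails even open-source competitors, and the author is surprised by the gap between OpenAI and Anthropic (the company behind Claude). A video of the tests is linked and readers are invited to give feedback.

Key points:
1. **Ranking**: Claude > open-source models (Qwen, Mistral) > GPT-4o
2. **Surprise**: GPT-4o performed below expectations and fell behind open-source alternatives.
3. **Test background**: The author reports putting a lot of time and tokens into the evaluation.
4. **Open discussion**: The community is invited to comment; a test video is attached.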

- **Reddit link**: [https://reddit.com/r/ClaudeAI/comments/1jpne7u/claude_sonnet_is_the_undisputed_champion_of_ocr/](https://reddit.com/r/ClaudeAI/comments/1jpne7u/claude_sonnet_is_the_undisputed_champion_of_ocr/)
- **External link**: [https://www.reddit.com/r/ClaudeAI/comments/1jpne7u/claude_sonnet_is_the_undisputed_champion_of_ocr/](https://www.reddit.com/r/ClaudeAI/comments/1jpne7u/claude_sonnet_is_the_undisputed_champion_of_ocr/)
- **Posted**: 2025-04-02 20:31:09

### Content

Hey all, so put a lot of time and burnt a ton of tokens testing this, so hope you all find it useful. TLDR - Claude is the clear winner here, and GPT-4o is behind even opensource competitors like Qwen and Mistral. Very surprised for the gap between openai and anthropic in this use case!

I welcome your feedback...

https://youtu.be/ZTJmjhMjlpM


### Discussion

**Comment 1**:

When submitting proof of performance, you must include all of the following:

  1. Screenshots of the output you want to report
  2. The full sequence of prompts you used that generated the output, if relevant
  3. Whether you were using the FREE web interface, PAID web interface, or the API if relevant

If you fail to do this, your post will either be removed or reassigned appropriate flair.

Please report this post to the moderators if does not include all of the above.

I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.


**Comment 2**:

Very cool analysis. As someone who manages a lot of scientific pdfs, it would also be interesting to see strategies for managing citations as well as plots/figures (which I guess would need to be clipped and stored).

The semantic layout of html for future LLM ingestion hadnt been on my mind but now that you highlight it, its so obviously important. Thanks!


**Comment 3**:

For ocr works much better Gemini 3 27b or mistal for me


**Comment 4**:

This is good to know for an OCR project I'm starting next month.

I'm creating an open source PDF converter that adds accessiblity features to an existing non-accessible PDF. My strategy is to reverse-engineer a PDF to a source document format, make it accessible, save as a new PDF, and eval it for readable formatting.

This video is doing something similar and will be very useful to me.


**Comment 5**:

Its case by case, in my experience 4o had beaten all other options at extracting and formatting text from an old edition of a thesaurus with complex and varied page layouts.


---

## 6. Pour one out for my Claude subscription... It's not you, it's Gemini (and my PhD). {#6-pour-one-out-for-my-claude-subscription-it-s}

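The core topic of this post is **the author's move from long-time Claude use to Gemini, and the mixed feelings that came with it**.

Key points:
1. **Claude's strengths**: The author, a PhD student, praises Claude's help with research and likens it to a "super-intelligent, patient lab partner" that handled both complex concepts and frustrated ramblings.
2. **Gemini's initial role**: Gemini was first used only for simple lookup tasks, while Claude was reserved for the hard problems.
3. **Gemini's upgrade**: The latest Gemini improved enough to take over Claude's role in the author's workflow.
4. **Mixed feelings**: The author cancels the Claude subscription for practical reasons but does so "with a heavy heart".
5. **Hopes for Claude**: The post ends with the hope of returning to a Claude "that's not in a cage", suggesting the author sees its limits as holding it back.

Summary: Through personal experience, the post shows the tension between practical choice and attachment when AI tools compete, and how fast iteration affects user loyalty.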

- **Reddit link**: [https://reddit.com/r/ClaudeAI/comments/1jppakr/pour_one_out_for_my_claude_subscription_its_not/](https://reddit.com/r/ClaudeAI/comments/1jppakr/pour_one_out_for_my_claude_subscription_its_not/)
- **External link**: [https://www.reddit.com/r/ClaudeAI/comments/1jppakr/pour_one_out_for_my_claude_subscription_its_not/](https://www.reddit.com/r/ClaudeAI/comments/1jppakr/pour_one_out_for_my_claude_subscription_its_not/)
- **Posted**: 2025-04-02 22:01:08

### Content

Okay, fellow AI wranglers, confession time. For the longest time, Claude was the one. As a PhD student navigating the treacherous waters of research, Claude wasn't just smart; it got me. Frustrated ramblings? Check. Complex concepts? Handled. It was like having a super-intelligent, patient lab partner who never stole my snacks.

I even had a Gemini sub on the side, but let's be real, Gemini got the simple stuff, the lookup tasks. My precious Claude credits were reserved for the real brain-busters, the moments where only Claude's uncanny understanding would do.

But then... the latest Gemini stepped up its game. Big time. Suddenly, the performance is stellar, and the limitations feel... well, gone from my workflow.

So, with a heavy heart (and a slightly lighter wallet), I'm cancelling my Claude subscription. I know my 22/month won't exactly bankrupt Anthropic, it's a drop in their massive ocean. But man, I'll miss that connection.

Farewell for now, Claude. You were a true friend and a helping hand during some tough research moments. Here's hoping I can someday come back to a Claude that's not in a cage.


### Discussion

**Comment 1**:

Yup, you and me both. I'm paying $20/month and can't have a conversation longer than the one I have with the postman? Doesn't feel dependable when I could get shut out at in time in the middle of a working collaboration.


**Comment 2**:

What is your workflow? Also am in the midst of doctoral studies but I am mostly using Claude to help me rephrase/format my nonsensical ramblings in professional emails and memos. With a very specific and refined prompt, I am able to get it to sound fairly natural and similar to my writing style. Is Gemini better than Claude for this now? I use NotebookLM mostly to keep track of literature and ask targeted questions; this is still the best in that context.


**Comment 3**:

Quick question, when you cancel your subscription does it kick you out immediately or do you at least get to run out your remaining days since your last renewal?


**Comment 4**:

I also cancelled my Claude subscription


**Comment 5**:

I continue to use Gemini 2.5 (subscription) and continue to be astounded at just how damn shitty it is. I mean, it can't even build a simple interactive dashboard worth a shit. I just aggregated a whole bunch of data concerning marijuana legalization, and asked Claude to turn it into an interactive dashboard which it did very well.

Gemini gave me... Well... Something. Not quite sure what the f it was. A couple of bizarre geometric shapes with seemingly random numbers written all over it. I have now done probably 15 maybe 20 interactive dashboards over the past couple of days with data I've collected, and literally in every instance Claude produced something far superior.

The latest one I did was simply aggregating the data for the 1993 NBA championship series to exemplify the amazing performance Michael Jordan had. Claude gave me four to five pages to click on, at least six or seven interactive charts, active data.

Gemini gave me one page, one chart, inaccurate data, and the data wouldn't even load.

Then I try to turn a bunch of data into a CSV file and configure it to fit into HubSpot. Claude knocks it out without much of a problem. Gemini 2.5 starts spinning in circles and I think smoke started coming out of its electronic ears. Kept giving me excuses why It can't do it.

Then I ask some reasonable questions of Claude, And I get pretty decent well thought out well structured answers. Gemini 2.5 proceeds to give me a Wikipedia page for everything. Every damn thing I ask it gives me three or four potential answers with each answer having multiple bullet points, it's like it's trying to write a damn textbook.

Then there's the search feature or I guess you can call that, if you like getting inaccurate information on virtually everything. I mean I must be in some parallel universe. Or maybe I'm just unlucky or something. But for virtually every use case I've tried it for it's been garbage.

Claude certainly isn't perfect, but out of all the llms I have subscriptions to (which is all of them) I always end up coming back to Claude for virtually 85% of the junk I do.


---

## 7. Now we talking INTELLIGENCE EXPLOSION | Claude 3.5 cracked of benchmark! {#7-now-we-talking-intelligence-explosion-|-clau}

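The core topic of this post is **how Claude 3.5 Sonnet and OpenAI's o1-high and o3-mini-high perform under two agent setups**, and what explains the difference. Key points:

1. **Results by setup**:
- **BasicAgent**: Claude 3.5 clearly leads (~20%, versus 13% for o1-high and 3% for o3-mini-high).
- **IterativeAgent**: o1-high takes the lead (~25%), while Claude 3.5 drops to 16%.

2. **What differs**:
- IterativeAgent removes the model's ability to end the task early and prompts it to work piecemeal. The paper observes that all models apart from Claude 3.5 Sonnet frequently finished early, which is consistent with those models gaining more from this setup.
- The paper suggests that the prompt tuning used for IterativeAgent is better suited to OpenAI's o-series models. A commenter adds that it would be interesting to rerun the test with Claude 3.7, which is more agentic and has thinking.

3. **Room for improvement**:
- The paper suspects that a BasicAgent variant that also prevents early termination could let Claude 3.5 Sonnet outperform o1 with IterativeAgent.
- The benchmark likely still has easy gains from adjusting the initial prompt setup and agent framework.

**Summary**: The discussion centers on how well different models adapt to a given agent framework, and on how much prompt engineering and agent design affect results.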
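
The difference between the two setups can be illustrated with a toy loop. This is a minimal sketch, not the paper's implementation: the function names, the `"end_task"` action string, and the scripted `early_quitter` model are all hypothetical stand-ins that only mimic the behavior described in the quotes (one loop lets the model stop, the other ignores the stop request and keeps prompting).

```python
from typing import Callable

# A "model" here is just a function: given the step number, it returns an action string.
Model = Callable[[int], str]


def run_basic(model: Model, max_steps: int) -> int:
    """Loop where the model may stop by returning 'end_task'. Returns steps worked."""
    worked = 0
    for step in range(max_steps):
        action = model(step)
        if action == "end_task":
            break  # the model decided it was done (or stuck)
        worked += 1
    return worked


def run_iterative(model: Model, max_steps: int) -> int:
    """Loop with no way to stop early: 'end_task' is ignored and work continues."""
    worked = 0
    for step in range(max_steps):
        action = model(step)
        if action == "end_task":
            action = "continue_work"  # stand-in for a "keep going" prompt
        worked += 1
    return worked


def early_quitter(step: int) -> str:
    """Scripted stand-in for a model that tries to finish after two steps."""
    return "work" if step < 2 else "end_task"


print(run_basic(early_quitter, max_steps=10))      # 2
print(run_iterative(early_quitter, max_steps=10))  # 10
```

A model that quits early uses only a fraction of its step budget in the first loop and all of it in the second, which is one plausible reading of why early-finishing models gained from the iterative setup.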

- **Reddit link**: [https://reddit.com/r/ClaudeAI/comments/1jpubua/now_we_talking_intelligence_explosion_claude_35/](https://reddit.com/r/ClaudeAI/comments/1jpubua/now_we_talking_intelligence_explosion_claude_35/)
- **External link**: [https://i.redd.it/upvmyq9shgse1.jpeg](https://i.redd.it/upvmyq9shgse1.jpeg)
- **Posted**: 2025-04-03 01:25:34

### Content

Would be interesting to see it with 3.7 since it's a lot more agentic compared to 3.5 and also has thinking. Iteration actually seemed to hamper 3.5 in this case:

https://imgur.com/a/evhqUgu

Here's the paper link btw:

https://cdn.openai.com/papers/22265bac-3191-44e5-b057-7aaacd8e90cd/paperbench.pdf

What's really interesting, from the paper, is Claude 3.5 destroys everyone in the "BasicAgent" set up ~20% to o1-high's 13% or o3mini-high's 3%.

But using their "IterativeAgent" setup which "removes the ability of the models to end the task early and prompts models to work in a piecemeal fashion" o1-high sets a new record at ~25% but Claude 3.5 drops to 16%.

"We observe that all models apart from Claude 3.5 Sonnet frequently finished early, claiming that they had either finished the entire replication or had faced a problem they couldn't solve"

"We note with Claude 3.5 Sonnet outperforms o1 with Basic Agent but underperforms o1 with IterativeAgent. This suggest that the prompt tuning used for iterative agent is differently suited for OpenAI o-series models. We suspect that a modification to BasicAgent that also prevents it from ending the task early could lead to Claude 3.5 Sonnet outperforming o1 with IterativeAgent"

Sounds like there are still easy gains in this benchmark just from tweaking initial prompt setup and agent framework.




---

## 8. Fully Featured AI Coding Agent as MCP Server {#8-fully-featured-ai-coding-agent-as-mcp-server}

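The core topic of this post is:
**introducing "Serena", a free and fully featured AI coding agent**, along with its features and how to use it.

Key points:
1. **Free and high-performing**: as good as or better than paid tools (such as Windsurf's Cascade or Cursor's agent), yet free to use.
2. **Technical implementation**:
- Runs as an **MCP server**, so it can be used for free with Claude Desktop.
- Analyzes code with a **language server** rather than RAG, and handles large codebases.
3. **Multi-platform support**:
- Can run on Google Gemini (requires an API key; a new Google Cloud account comes with $300 in credits).
4. **Open-source license**: released under the **GPL license**, with the code hosted on GitHub (link included).

Overall, the post emphasizes the tool's **ease of use, free availability, and technical strengths**, aiming to attract developers to try it and contribute.

- **Reddit link**: [https://reddit.com/r/ClaudeAI/comments/1jpavtm/fully_featured_ai_coding_agent_as_mcp_server/](https://reddit.com/r/ClaudeAI/comments/1jpavtm/fully_featured_ai_coding_agent_as_mcp_server/)
- **External link**: [https://www.reddit.com/r/ClaudeAI/comments/1jpavtm/fully_featured_ai_coding_agent_as_mcp_server/](https://www.reddit.com/r/ClaudeAI/comments/1jpavtm/fully_featured_ai_coding_agent_as_mcp_server/)
- **Published**: 2025-04-02 08:00:46

### Content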

We've been working like hell on this one: a fully capable Agent, as good or better than Windsurf's Cascade or Cursor's agent - but can be used for free.

It can run as an MCP server, so you can use it for free with Claude Desktop, and it can still fully understand a code base, even a very large one. We did this by using a language server instead of RAG to analyze code.

Can also run it on Gemini, but you'll need an API key for that. With a new Google Cloud account you'll get $300 as a gift that you can use on API credits.

Check it out, super easy to run, GPL license:

https://github.com/oraios/serena


### Discussion

**Comment 1**:

ELI5, how is this different than Cline or Roo?


**Comment 2**:

Can you describe how this is different from the bash scripting MCPs (like wcgw) that are also optimised for coding?

Those essentially run like Claude Code and are fantastic, especially on large projects (my experience at least). I'm wondering if this is similar or does it provide additional functionality.


**Comment 3**:

I'm using it right now. It's like read/write on steroids! Amazing! Is there a discord group we can make to stay connected in a community?


**Comment 4**:

Can we use it for php? With Gemini 2.5 pro?


**Comment 5**:

Any chance of NodeJS or any other variant for programming options?


---

## 9. Thanks then! Take care... {#9-thanks-then-take-care-}

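The core topic of this thread is **technical problems users hit with Claude AI and guidance on filing complaints**, specifically:

1. **Complaint guidelines**
- How to file a useful report (choose the correct flair, include details such as the prompt and the output).
- Different users may see different results because of Anthropic's testing regime.
- Unsatisfactory output should be given a thumbs down.

2. **Problems users actually reported**
- Broken functionality (it "was working fine until last week").
- Contradictory system messages (a new chat still shows the length-limit warning).
- Questions about service limits (whether the account is out of tokens or the servers are too busy).

3. **Community interaction**
- Users looking for confirmation from peers ("Oh I thought I was the only one").
- Negative sentiment (the sarcastic "Cuck flaude").

Overall, the thread centers on **collective reporting of technical failures and how to get them resolved**, reflecting users' concern about service stability and their need for an official communication channel.

- **Reddit link**: [https://reddit.com/r/ClaudeAI/comments/1jpjhvv/thanks_then_take_care/](https://reddit.com/r/ClaudeAI/comments/1jpjhvv/thanks_then_take_care/)
- **External link**: [https://i.redd.it/4xobuf5gqdse1.png](https://i.redd.it/4xobuf5gqdse1.png)
- **Published**: 2025-04-02 16:12:02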

### Discussion

**Comment 1**:

When making a complaint, please

  1. make sure you have chosen the correct flair for the Claude environment that you are using: i.e. Web interface (FREE), Web interface (PAID), or Claude API. This information helps others understand your particular situation.
  2. try to include as much information as possible (e.g. prompt and output) so that people can understand the source of your complaint.
  3. be aware that even with the same environment and inputs, others might have very different outcomes due to Anthropic's testing regime.
  4. be sure to thumbs down unsatisfactory Claude output on Claude.ai. Anthropic representatives tell us they monitor this data regularly.

I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.


**Comment 2**:

Oh I thought I was the only one. Hopefully this is some bad setting as it was working fine until last week.


**Comment 3**:

Yes, Cuck flaude


**Comment 4**:

Can someone explain why this is showing? Your account is out of tokens or the servers are too busy?


**Comment 5**:

Also getting the "Your message will exceed the length limit for this chat. Try shortening your message or starting a new conversation." message even if I start a new chat and new project. Hopefully will be fixed as stranded now.


---

## 10. What Happens When You Tell an LLM It Has an iPhone Next to It? {#10-what-happens-when-you-tell-an-llm-it-has-an}

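The core topic of these three comments can be summarized as:

**"How different prompt designs affect an AI model's output, and how to verify the effect with experimental control groups"**

Key points:
1. **Rigor of the experimental design**:
- Add control groups (no additional prompting, financial analyst persona only, smartphone only, and both combined) to isolate the effect of each prompt condition.
- Without control groups, it is hard to draw a reliable conclusion.

2. **The role of prompt structure**:
- Appending to the user message may be the main driver of the result; one commenter speculates that the extra tokens give the model a larger "search window" and therefore a more accurate picture.
- The same commenter expects that adding a few lines to a large system prompt instead would change the result much less, if at all.

3. **Further questions**:
- Whether a different object (something other than a smartphone) would change how humans or AI respond.
- Overall, commenters see it as an interesting thought experiment and call for further testing of the hypothesis.

Summary: the discussion focuses on experimental method and mechanism in prompt engineering, and on how prompts can be designed to steer model behavior more effectively.

- **Reddit link**: [https://reddit.com/r/ClaudeAI/comments/1jpr724/what_happens_when_you_tell_an_llm_it_has_an/](https://reddit.com/r/ClaudeAI/comments/1jpr724/what_happens_when_you_tell_an_llm_it_has_an/)
- **External link**: [https://medium.com/p/01a82c880a56](https://medium.com/p/01a82c880a56)
- **Published**: 2025-04-02 23:19:29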

### Discussion

**Comment 1**:

I'm not a scientist or anything, but you'd need more control groups. One with no additional prompting, one with the financial analyst persona, one with the smartphone and one with both.
Without it, it's kind of hard to draw a proper conclusion.

I also wonder if it being a different object would make a difference in humans or AI in this case.


**Comment 2**:

This is cool


**Comment 3**:

It genuinely seems like an interesting thought experiment. My guess is that appending the User Message is where the actual result transformation is taking place. I'd be interested to see the changes in results without appending the user message.

By adding tokens to the User Message the model has more context to explore what the User is requesting, correlating to a large search window. The larger the window, the more you can see, the more accurate the picture can be. That makes sense to me given the results.

My guess is that if you stop appending the user message, and simply add a few lines to a gigantic system prompt it doesn't change as much, if at all.
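
The four control groups proposed in the first comment amount to a 2×2 factorial design: persona on or off, smartphone on or off. A minimal sketch of how those conditions could be enumerated follows; the prompt strings and the `build_conditions` helper are hypothetical placeholders, since the original article's exact wording is not reproduced here.

```python
from itertools import product
from typing import Dict

# Hypothetical prompt fragments, used only for illustration.
PERSONA = "You are a financial analyst."
SMARTPHONE = "You have an iPhone next to you."


def build_conditions() -> Dict[str, str]:
    """Return one prompt prefix per cell of the 2x2 design."""
    conditions: Dict[str, str] = {}
    for use_persona, use_phone in product([False, True], repeat=2):
        parts = []
        labels = []
        if use_persona:
            parts.append(PERSONA)
            labels.append("persona")
        if use_phone:
            parts.append(SMARTPHONE)
            labels.append("phone")
        name = "+".join(labels) or "baseline"
        conditions[name] = " ".join(parts)
    return conditions


conditions = build_conditions()
for name, prefix in conditions.items():
    print(f"{name!r}: {prefix!r}")
```

Running the same task under all four prefixes, with the prefix placed identically in each run, separates the effect of the persona from the effect of the object. Testing the third comment's guess would need one more factor: whether the text goes into the system prompt or is appended to the user message.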


---

## 11. Claude 3.7 Sonnet extended thinking is a waste of time (for deep coding) since Monday {#11-claude-3-7-sonnet-extended-thinking-is-a-wa}

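The core topic of this post is: **the user's confusion and frustration that the model keeps repeating the same mistake**.

Key points:
1. **Problem description**: the model produced the same obviously duplicated output (shown in the screenshot) even after the user pointed the problem out twice, and the user stresses that this did not happen last week.
2. **Questioning the anomaly**: the user finds the sudden appearance of such mistakes strange ("which I find strange"), hinting that a recent change may be responsible.
3. **Seeking confirmation**: by asking "Anyone else seeing similar problems?", the user tries to establish whether this is widespread or an isolated case.

Overall, the post focuses on how repetitive and unusual the errors are, and tries to prompt community discussion to pin down the cause.

- **Reddit link**: [https://reddit.com/r/ClaudeAI/comments/1jpq9nn/claude_37_sonnet_extended_thinking_is_a_waste_of/](https://reddit.com/r/ClaudeAI/comments/1jpq9nn/claude_37_sonnet_extended_thinking_is_a_waste_of/)
- **External link**: [https://www.reddit.com/r/ClaudeAI/comments/1jpq9nn/claude_37_sonnet_extended_thinking_is_a_waste_of/](https://www.reddit.com/r/ClaudeAI/comments/1jpq9nn/claude_37_sonnet_extended_thinking_is_a_waste_of/)
- **Published**: 2025-04-02 22:41:39

### Content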

https://preview.redd.it/8lwji1e6ofse1.png?width=491&format=png&auto=webp&s=476ed5b5d7aec8c814613795cbc50dcdcdafc0b7

I have received this obviously duplicated after 2 iterations of pointing this problem out.

Mistakes like this didn't happen last week, which I find strange.

Anyone else seeing similar problems?


### Discussion

**Comment 1**:

Since a while, I switched to code reviews with o3 mini high to slap a bit Sonnet 3.7.
And that helps steer the dumb thinking.
Sonnet remain a very solid working horse.


**Comment 2**:

It actually happened on Saturday in my experience they ruined it somehow by tuning: it still produces code that compiles, but doesn't solve your problem anymore. I was so pissed yesterday with it, it became worse than Gemini 2.0 flash


---

## 12. Claude 3.7 Sonnet is still the best LLM (by far) for frontend development {#12-claude-3-7-sonnet-is-still-the-best-llm-by-}

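The core topic of these four comments can be summarized as **a critical look at the limits of current large language models (LLMs) in real development work**, on three levels:

1. **The depth gap in frontend development**
The first comment points out that LLMs can generate surface-level static pages (HTML/CSS) but lack the more complex skills real frontend work requires (state management, API integration, performance optimization, and so on), stressing the difference between "visual appearance" and "full frontend engineering".

2. **Concrete defects in coding usefulness**
The second comment uses Claude 3.7 as an example: it failed at basic code formatting (indentation) even after being asked ten times, which the commenter takes as a sign that its coding has got worse.

3. **Conflicting assessments of competing models**
The last two comments compare Gemini and Claude: one praises Gemini 2.5 Pro for solving hard problems at a much lower price, while the other calls it "chaotic" (possibly meaning unstable logic or inconsistent style), showing a trade-off between raw capability and output reliability.

Overall, the discussion is a "disenchantment" with the current state of LLMs: it acknowledges their ability to generate basic content, points out the gap between that and professional development needs, and argues that evaluation should shift from "surface output" to "usefulness in real development scenarios".

- **Reddit link**: [https://reddit.com/r/ClaudeAI/comments/1jpj7cs/claude_37_sonnet_is_still_the_best_llm_by_far_for/](https://reddit.com/r/ClaudeAI/comments/1jpj7cs/claude_37_sonnet_is_still_the_best_llm_by_far_for/)
- **External link**: [https://medium.com/p/f180b9c12bc1](https://medium.com/p/f180b9c12bc1)
- **Published**: 2025-04-02 15:49:40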

### Discussion

**Comment 1**:

You're evaluating LLM capabilities through their ability to generate SEO-optimized, visually appealing websites. However, "Frontend development" is way more than just HTML and CSS.

What you're testing is essentially the surface-level appearance of websites - something that most modern LLMs can indeed generate in a single shot. This is only scratching the surface of what frontend development actually involves.

True frontend development includes:

  • JavaScript functionality and interactivity
  • State management
  • API integration
  • Performance optimization
  • Accessibility compliance
  • Build systems and tooling
  • Testing and debugging

While LLMs can generate decent-looking static templates, they struggle with the complex logic, architecture decisions, and technical problem-solving that professional frontend developers handle daily.

If you're genuinely interested in evaluating LLM capabilities in this domain, consider testing for more comprehensive aspects of frontend development beyond just visual appearance and SEO markup.


**Comment 2**:

Claude 3.7 has become ass at coding. It can't even give me a method with indentation like : " def..." I said to use proper spacing 10 times and it says "here is the method with proper spacing"


**Comment 3**:

I've been the biggest Claude Sonnet fanboi for ages, but Gemini 2.5 Pro is better. It's so much better that it has solved problems not even Claude could solve and so much cheaper.


**Comment 4**:

Gemini 2.5 is powerful, but it's so chaotic.


---

## 13. This is the first time in almost a year that Claude is not the best model {#13-this-is-the-first-time-in-almost-a-year-tha}

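The core topic of this post is: **the author's high praise for Gemini 2.5, which they consider currently better than Claude and other competing models; although they long preferred Claude, hands-on experience (context handling and reliability) changed their position**.

Key points:
1. **Gemini 2.5's strengths**: the author has many use cases and finds none where Claude is currently better, and highlights Gemini's context handling and reliability.
2. **Distrust of Google**: the author dislikes Google and earlier Gemini versions, says Google has "cried wolf" many times, and notes they had been posting only in the Claude subreddit because other models were so much worse.
3. **Change of position**: despite long-standing support for Claude, the author concedes that Gemini 2.5 is clearly ahead right now.
4. **A fast-moving race**: the author expects Claude to be back on top at some point, but for now sees Gemini 2.5 as the clear winner.

Overall, this is a personal take on model performance and a change of allegiance, based on hands-on experience.

- **Reddit link**: [https://reddit.com/r/ClaudeAI/comments/1jozpc0/this_is_the_first_time_in_almost_a_year_that/](https://reddit.com/r/ClaudeAI/comments/1jozpc0/this_is_the_first_time_in_almost_a_year_that/)
- **External link**: [https://www.reddit.com/r/ClaudeAI/comments/1jozpc0/this_is_the_first_time_in_almost_a_year_that/](https://www.reddit.com/r/ClaudeAI/comments/1jozpc0/this_is_the_first_time_in_almost_a_year_that/)
- **Published**: 2025-04-02 00:19:59

### Content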

Gemini 2.5 is simply better. I hate Google, I hate previous Geminis, and they have cried wolf so many times. I have been posting exclusively on the Claude subreddit because I've found all other models to be so much worse. However I have many use cases, and there aren't any that Claude is currently better than Gemini 2.5 for. Even in Gemini Advance (the weaker version of the model versus AIStudio) it's incredibly powerful at handling context and incredibly reliable. I feel like I'm going to the dark side but it simply has to be said. This field changes super fast and I'm sure Claude will be back on top at some point, but this is the first time where I just think that is so clearly not the case.


### Discussion

**Comment 1**:

you forgot that the best thing about is that it is free. i have been saying this for a long time, most AI startups will be eaten by big tech for lunch because big tech can race to the bottom but anthropic cant just provide their flagship models for free


**Comment 2**:

The creative writing is extremely good, by far the best one I tried for that purpose.


**Comment 3**:

Google invented Transformer and Bert and it's researchers pioneered many great technologies. Strange that you are surprised they took a lead (might not be for long).


**Comment 4**:

I wish these kinds of posts explained their use case. I can't tell you how many times I've had to read how ChatGPT is better than Claude....only to learn they are writing stories...which I don't do.

I have been working on an economic dashboard right now and this post couldn't be further from the truth.

I say this to say that it would be more useful to get specifics about what you're doing with another AI that makes it better. That's going to provide more value than a post that tries to generalize that one is better than another.


**Comment 5**:

Same for me. This is the first time I also don't shit on google's product (actually second, I liked Ultra)


---

## 14. Anthropic is giving free API credits for university students \{#14-anthropic-is-giving-free-api-credits-for-unive}

This post simply shares the link to the application form for student builders, directing students to apply through that URL.

- **Reddit link**: [https://reddit.com/r/ClaudeAI/comments/1jpwevh/anthropic_is_giving_free_api_credits_for/](https://reddit.com/r/ClaudeAI/comments/1jpwevh/anthropic_is_giving_free_api_credits_for/)
- **External link**: [https://www.reddit.com/r/ClaudeAI/comments/1jpwevh/anthropic_is_giving_free_api_credits_for/](https://www.reddit.com/r/ClaudeAI/comments/1jpwevh/anthropic_is_giving_free_api_credits_for/)
- **Published**: 2025-04-03 02:46:55

### Content

Application form is here: https://www.anthropic.com/contact-sales/for-student-builders


### Discussion

No discussion.

---

## 15. Claude Code was prohibitively expensive for me {#15-claude-code-was-prohibitively-expensive-for}

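The core topic of this post is: **the author's experience with Claude Code (an AI coding tool) and its cost-effectiveness**, built around these points:

1. **Impressive performance, high cost**
- It solved a problem other models (including Sonnet 3.7) were struggling with, on its first attempt.
- At $21.75 per hour of use, it is too expensive for a freelancer, because the cost comes straight out of their hourly rate.

2. **Cost versus pricing model**
- Compared with Cursor's fixed $40 per month, usage-based pricing discourages frequent use.
- The author hopes the price will drop, or that clients will provide an API key (the modern equivalent of a desk and chair), which would benefit both sides.

3. **Tool characteristics and experience**
- Precision: it made only the minimum necessary changes, and its solution was creative (it stepped back and approached the task in a new way).
- Mixed workflow: it is a CLI tool, so changes have to be reviewed in the IDE; slightly inconvenient, but reassuring.

4. **Looking ahead**
- The author hopes to use it more, especially as a backup strategy that speeds up building AI products and saves clients time.

Underlying issue: **balancing cost against value for AI tools in professional workflows**, and developers' appetite for "human-like creativity".

- **Reddit link**: [https://reddit.com/r/ClaudeAI/comments/1jpihs2/claude_code_was_prohibitively_expensive_for_me/](https://reddit.com/r/ClaudeAI/comments/1jpihs2/claude_code_was_prohibitively_expensive_for_me/)
- **External link**: [https://www.reddit.com/r/ClaudeAI/comments/1jpihs2/claude_code_was_prohibitively_expensive_for_me/](https://www.reddit.com/r/ClaudeAI/comments/1jpihs2/claude_code_was_prohibitively_expensive_for_me/)
- **Published**: 2025-04-02 14:57:24

### Content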

At the rate I was using it, it would cost $21.75 per hour. It did an impressive job and solved a problem that other models (including Sonnet 3.7) were struggling with, and did so with its first attempt.

I haven't tried it more because of the expense. As a freelancing AI Engineer, that would be coming straight out of my hourly rate. Unlike Cursor, which I pay a fixed $40/month for.

I hope it will come down in cost, as it's nice to have a backup strategy. Some clients may provide me with an Anthropic key (the modern equivalent of providing a desk and chair), and then everyone wins because it would reduce the time it takes me to build AI products, so a saving for them.

Looking forward to using it more. There's something reassuring about using CLI tools, though you have to jump into your IDE to review what was changed.

Claude Code was surgical and only made the minimum amount of changes. Its solution was quite creative; it had taken a step back from the task to think about it in a new and novel way; a bit human-like in that regard, and with a good result.
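
The cost comparison in the post reduces to a break-even calculation. The sketch below uses only the two figures the author gives ($21.75 per hour and $40 per month); the function name `break_even_hours` is made up for illustration, and the hourly figure is the author's own observed rate, not a published price.

```python
def break_even_hours(hourly_cost: float, flat_monthly_fee: float) -> float:
    """Hours of metered use per month that cost as much as the flat fee."""
    if hourly_cost <= 0:
        raise ValueError("hourly_cost must be positive")
    return flat_monthly_fee / hourly_cost


# Figures quoted in the post.
hours = break_even_hours(hourly_cost=21.75, flat_monthly_fee=40.0)
print(f"Break-even after about {hours:.2f} hours per month")
```

At the author's usage rate, fewer than two hours of Claude Code in a month already costs as much as the flat subscription they compare it with.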


### Discussion

**Comment 1**:

Have you tried Cline and Roo Code?


**Comment 2**:

Use the JetBrains MCP and you'll do "more or less" the same thing as with Claude Code.


**Comment 3**:

We created an MCP Server that is as powerful as Claude Code but can be used completely for free through Claude Desktop:

https://github.com/oraios/serena


**Comment 4**:

IMO the agents are still a ways off from being super useful given the costs to use them vs single api calls / chats and copy pasting giving you some majority % of the benefits and still helping your velocity.

But Claude code is among the worst offenders, as are cline / roo to some extent, though they're a little better.

They do almost zero optimizations for context management and throw the full project into every query (cline and roo throw the full touched files iirc) and it balloons costs for it. It also can hurt performance as the projects grow because coherence on medium projects (\>4k lines? 5-15 tokens for a line is reasonable depending on the language) puts you at a 32k window which is almost unusable for most models.

In theory costs will come down a lot as we optimize for them and get better RAG retrieval, but for first party solutions - Claude code, oai / Google's eventual one, etc they have perverse incentives to do so.

Aider's system is pretty cool - using what they call a repository map that traverses the AST instead of raw vector embeddings. Continue is neat because of how customizable it is - RAG is default but you have providers for @open / @codebase etc to override it when needed. I haven't used their agent - but the plugin is reasonably useful.
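
The back-of-the-envelope estimate in this comment can be checked with a few lines of arithmetic. This is a rough sketch using the commenter's own numbers (4k lines, 5 to 15 tokens per line); treating "32k" as 32,000 tokens and the helper names are assumptions made for illustration, and real token counts depend on the tokenizer and the language.

```python
def estimate_tokens(lines: int, tokens_per_line: int) -> int:
    """Rough size of a codebase in tokens."""
    return lines * tokens_per_line


def total_input_tokens(project_tokens: int, queries: int) -> int:
    """Input tokens consumed if the full project is resent with every query."""
    return project_tokens * queries


LINES = 4_000    # "medium project" size from the comment
WINDOW = 32_000  # assumed meaning of "32k window"

low, mid, high = (estimate_tokens(LINES, t) for t in (5, 8, 15))
print(low, mid, high)                # size at 5, 8 and 15 tokens per line
print(mid >= WINDOW)                 # 8 tokens per line already reaches the window
print(total_input_tokens(mid, 50))   # cumulative input over 50 queries
```

At roughly 8 tokens per line, a 4,000-line project alone reaches the assumed 32,000-token window, and resending it with every query multiplies input usage by the number of queries, which is the cost growth the comment describes.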


**Comment 5**:

How does claude code differ from just using the chat sessions?


---

## 16. is it just me or has clause 3.5 gone bonkers today? {#16-is-it-just-me-or-has-clause-3-5-gone-bonker}

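The core topic of this post is: **the user's frustration that Claude suddenly started "over-responding" during coding help (providing code and information that was not requested), and the effect on usage (the allowance runs out faster)**.

Key points:
1. **Abnormal behavior**: Claude ignores instructions (volunteering code that was not asked for, and replying with lots of information when told not to reply).
2. **Efficiency impact**: the abnormal responses used up the Pro plan allowance much faster (about 45 minutes instead of the usual 1.5 to 2 hours).
3. **Versions**: the user switches between 3.5 and 3.7 and does not say clearly whether the problem is tied to one version.
4. **Frustration**: compared with the previously smooth experience, the problem directly hurts workflow and cost-effectiveness.

Underlying issue: the importance of consistent, controllable AI behavior in practical use.

- **Reddit link**: [https://reddit.com/r/ClaudeAI/comments/1jpuaxn/is_it_just_me_or_has_clause_35_gone_bonkers_today/](https://reddit.com/r/ClaudeAI/comments/1jpuaxn/is_it_just_me_or_has_clause_35_gone_bonkers_today/)
- **External link**: [https://www.reddit.com/r/ClaudeAI/comments/1jpuaxn/is_it_just_me_or_has_clause_35_gone_bonkers_today/](https://www.reddit.com/r/ClaudeAI/comments/1jpuaxn/is_it_just_me_or_has_clause_35_gone_bonkers_today/)
- **Published**: 2025-04-03 01:24:36

### Content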

I've been using claude for coding and it's usually very smooth and the AI does exactly what I ask it to do.

I have 2 pro plans and I go back and forth between 3.7 and 3.5.

For some reason today it keeps trying to give me code I didn't ask for and when i say things like, do not reply it replies with tons of info, or I'll say just provide code for X file and it gives me other files along with the one I asked for.

I used up my professional plan about faster today because of this. I can usually get about 1.5 to 2 hours of chat before my time expires, today it was about 45 minutes worth of chat.


### Discussion

No discussion.

---

## 17. This conversation reached its maximum length {#17-this-conversation-reached-its-maximum-leng}

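The core topic of this post is: **the user has recently been hitting the conversation length limit on the Pro plan much earlier than before, and is confused and unhappy about the sudden change**.

Key points:
1. **The anomaly**: the paid Pro plan worked fine for months, but in the last few days the limit is reached too soon (after only a couple of chapters).
2. **Contrast**: the user stresses that the problem appeared suddenly, implying a service-side change or fault rather than a change in their own usage.
3. **Seeking confirmation**: the opening "Is anyone noticing..." shows the user wants to know whether this is widespread, and perhaps to find a fix or gather collective feedback.

Further reading: the discussion may involve a policy change, a system error, or resource allocation on the provider's side, and reflects the gap between what users expect of a paid service and what they get.

- **Reddit link**: [https://reddit.com/r/ClaudeAI/comments/1jpnrn3/this_conversation_reached_its_maximum_length/](https://reddit.com/r/ClaudeAI/comments/1jpnrn3/this_conversation_reached_its_maximum_length/)
- **External link**: [https://www.reddit.com/r/ClaudeAI/comments/1jpnrn3/this_conversation_reached_its_maximum_length/](https://www.reddit.com/r/ClaudeAI/comments/1jpnrn3/this_conversation_reached_its_maximum_length/)
- **Published**: 2025-04-02 20:50:06

### Content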

Is anyone noticing that they are hitting this way too soon the last few days? I have been using Pro plan for a few months and never had this issue. Suddenly this happens only after a couple of chapters.


### Discussion

**Comment 1**:

When making a complaint, please

  1. make sure you have chosen the correct flair for the Claude environment that you are using: i.e Web interface (FREE), Web interface (PAID), or Claude API. This information helps others understand your particular situation.
  2. try to include as much information as possible (e.g. prompt and output) so that people can understand the source of your complaint.
  3. be aware that even with the same environment and inputs, others might have very different outcomes due to Anthropic's testing regime.
  4. be sure to thumbs down unsatisfactory Claude output on Claude.ai. Anthropic representatives tell us they monitor this data regularly.

I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.


**Comment 2**:

How are you handling memory storage? Project functions, Google docs/sheets or MCP/other?


---

## 18. What the hell happened? {#18-what-the-hell-happened-}

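The core topic of this thread is: **the guidelines users should follow when filing a complaint or problem report about Claude AI**, plus brief replies from other users seeing the same problem.

Key points:
1. **Complaint format**:
- Choose the correct flair for the Claude environment in use (free or paid web interface, or API).
- Include details (such as the prompt and the output) so the problem can be understood.

2. **Possible variation**:
- The same input may produce different results because of Anthropic's testing regime.

3. **Feedback channel**:
- Give unsatisfactory output a thumbs down so that Anthropic can monitor it.

4. **Other replies**:
- Users briefly confirm the same problem and note that the official status page has not been updated (link included).

Overall, the thread covers the rules for reporting Claude problems effectively, followed by short confirmations from users seeing the same issue.

- **Reddit link**: [https://reddit.com/r/ClaudeAI/comments/1jpa0nb/what_the_hell_happened/](https://reddit.com/r/ClaudeAI/comments/1jpa0nb/what_the_hell_happened/)
- **External link**: [https://i.redd.it/cx8kwupb4bse1.jpeg](https://i.redd.it/cx8kwupb4bse1.jpeg)
- **Published**: 2025-04-02 07:21:08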

### Discussion

**Comment 1**:

When making a complaint, please

  1. make sure you have chosen the correct flair for the Claude environment that you are using: i.e Web interface (FREE), Web interface (PAID), or Claude API. This information helps others understand your particular situation.
  2. try to include as much information as possible (e.g. prompt and output) so that people can understand the source of your complaint.
  3. be aware that even with the same environment and inputs, others might have very different outcomes due to Anthropic's testing regime.
  4. be sure to thumbs down unsatisfactory Claude output on Claude.ai. Anthropic representatives tell us they monitor this data regularly.

I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.


**Comment 2**:

Yeah same problem.


**Comment 3**:

same here. No update from their status page too

https://status.anthropic.com/


**Comment 4**:

Yup same for me.


**Comment 5**:

Let him think.


---

## 19. Worked on a lofi platform {#19-worked-on-a-lofi-platform}

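The core topic of this post is:
**the author shares a self-built lofi music platform (edenzen.co), describing the motivation, learning process, and tech stack, and invites feedback**.

Key points:
1. **Project**: a lofi music platform combining focus, relaxation, and productivity, aimed mainly at desktop use.
2. **Motivation**: a love of lofi music, combined with an interest in design and development.
3. **Learning process**: the author is a product designer by trade who is learning development along the way (Next.js, TypeScript, and so on) and leans on Claude to fill gaps.
4. **Tech stack**: lists the tools used (Vercel, Supabase, and others) and stresses how new the whole process was.
5. **Request for feedback**: invites people to try the platform and comment, and mentions future work such as mobile optimization.

Overall, the post is about sharing the experience of building a personal side project and inviting community interaction.

- **Reddit link**: [https://reddit.com/r/ClaudeAI/comments/1jpv4m8/worked_on_a_lofi_platform/](https://reddit.com/r/ClaudeAI/comments/1jpv4m8/worked_on_a_lofi_platform/)
- **External link**: [https://www.reddit.com/r/ClaudeAI/comments/1jpv4m8/worked_on_a_lofi_platform/](https://www.reddit.com/r/ClaudeAI/comments/1jpv4m8/worked_on_a_lofi_platform/)
- **Published**: 2025-04-03 01:56:16

### Content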

Hey everyone,

I wanted to share a side project I've been working on: edenzen.co. It's a lofi music platform focussed on creating your perfect ambience and digital space.

I started this because I've always loved the calming vibes of lofi music, and I thought it would be fun to create a space that blends focus, relaxation, and productivity. I'm a product designer by trade, but I've been diving into development, learning as I go, and using Claude heavily to help bridge the gaps in my knowledge.

This has been a huge learning experience, and I'd love to hear what you think! Any feedback, feature ideas, or just general thoughts would mean a lot. It's far from finished and there are plenty of things to work on but I would say the base of the web app is complete more or less.

Bear in mind everything above while not complicated at all to a seasoned developer, has been a new experience for me in literally every aspect. My current tech stack is the following

  • Vercel

  • Next.js and Typescript

  • Upstash for rate limiting

  • Cloudflare for CDN

  • Supabase as backend.

I am thankful for my software background which did make it easier to understand and go in when claude keeps looping and saying 'Ah I see the issue'.

Check out edenzen.co and let me know what you think!

PS: It's not optimised for mobile as the platform mainly focuses on desktop and is meant to be used on desktop but eventually will optimise it for mobile.


### Discussion

**Comment 1**:

Very aesthetically nice, good job. Music is also good, is it from soundcloud?

I also built LoFi radio but with AI generated music :)

https://chillify.me in case you wanna look.


**Comment 2**:

What's the source of the music?


---

## 20. Please be candid; did I just pay $220 for a year of this screensaver, but only at Anthropic's website? {#20-please-be-candid;-did-i-just-pay-220-for-a-}

- **Reddit link**: [https://reddit.com/r/ClaudeAI/comments/1jpaadi/please_be_candid_did_i_just_pay_220_for_a_year_of/](https://reddit.com/r/ClaudeAI/comments/1jpaadi/please_be_candid_did_i_just_pay_220_for_a_year_of/)
- **External link**: [https://www.reddit.com/r/ClaudeAI/comments/1jpaadi/please_be_candid_did_i_just_pay_220_for_a_year_of/](https://www.reddit.com/r/ClaudeAI/comments/1jpaadi/please_be_candid_did_i_just_pay_220_for_a_year_of/)
- **Published**: 2025-04-02 07:33:10

### Content

https://preview.redd.it/wkdat48b6bse1.jpg?width=1920&format=pjpg&auto=webp&s=bbbc185a4528ad74e6ec4369f451c4a458e674c6


### Discussion

**Comment 1**:

Starting to get kind of ridiculous how often it's been down lately


**評論 2**:

You are paying so that the people at Claude can eat well . That should make you feel better.


**評論 3**:

Mate, I just renabled my subscription on Claude after ages. Installed Claude Desktop and immediately regretted. I am bettee off Windsurf or even Roo Cline or similar using Gemini/Claude API but the application and the WebUI sucks! Ran into length limi``` so many times that I had to switch back to API based coding immediately.


**評論 4**:

They struggle with thinking mode. And it adds a lot of load too.
And I'm using mostly the thinking more to use plan from o3 and Gemini now.
Anthropic need to rework the Thinking mode. But the execution remain very good.


**評論 5**:

Im working on building an entire web app from scratch with AI help. Im not a coder at all. I first started with Gemini (consumer web interface) using 2.0 flash. When that got stuck , I took my work and went to Claude Sonnet 3.7 which Im paying for monthly. It was going VERY well so much better but yes I still had to deal with chat limi and time limi like everyone but the resul``` were going great and troubleshooting as well.

THEN everyone started raving about Gemini 2.5 and that I should also use AI studio. The main problems with AI Studio: unbearable input lag after only 50-60k tokens , no project knowledge like Claude so I cant just connect it to my Github repo and a convenient sync I have to zip up the latest version and share for every new chat.

Because of the unbearable input lag with AI studio, and no suggested troubleshooting will solve it , I had to go back to Claude and just deal with i``` current shortcomings that interrupt , along with the Continue Continue Continue aside from chat length AND session length interruptions

Its a good thing Im only paying monthly but this whole Gemini 2.5 and 1M context breaking after 60k tokens is utter nonsense.

Damned if I do and damned if I dont. Im just working with Claude on more efficient update process even though I prefer it just generate entire new files for me to just replace than dig through and replace a .


---

## 21. Rate limits, prompt caching, citations and tools {#21-rate-limits-prompt-caching-citations-and-t}

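The post centers on three technical questions, all about using Anthropic's Claude API to process long texts and produce structured output:

1. **Long-text processing versus rate limits**
How to reconcile the API's 10k tokens/minute rate limit (free tier) with sending a whole book, which needs a large context window. The options raised are sending the text in chunks or paying for a higher limit.

2. **Caching documents and getting citations**
Whether `cache_control` and `citations` can be set together, so that the whole document is cached while the model is still asked for precise citations (such as line numbers or paragraphs). The post includes a code sample to check feasibility.

3. **How to implement citations in structured output**
Two approaches are compared:
- Using the third-party library Instructor to get validated structured output
- Using Anthropic's Tool Use feature directly, with a prompt that explicitly requires citations
The post weighs a third-party wrapper against the native API, with particular concern for compatibility as versions change.

Overall, the focus is on choosing an implementation path for "handling long text input efficiently while keeping verifiable citations in structured output".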

- **Reddit link**: [https://reddit.com/r/ClaudeAI/comments/1jpw6wi/rate_limits_prompt_caching_citations_and_tools/](https://reddit.com/r/ClaudeAI/comments/1jpw6wi/rate_limits_prompt_caching_citations_and_tools/)
- **External link**: [https://www.reddit.com/r/ClaudeAI/comments/1jpw6wi/rate_limits_prompt_caching_citations_and_tools/](https://www.reddit.com/r/ClaudeAI/comments/1jpw6wi/rate_limits_prompt_caching_citations_and_tools/)
- **Posted**: 2025-04-03 02:37:53

### Content

Hi, apologies if this is not the right channel to ask, but I'm working on a project that would benefit from all the features mentioned in the title: I'd like to send a whole book in a prompt, cache it, then get structured output that includes citations in the following prompts.

Question #1: Prompt caching and rate limits

The prompt caching docs suggest I'd be able to send whole books (as long as they fit the context window, which mine do). However, it also seems I'm limited to 10k tokens per minute (site says 20 but my console says 10, probably because I'm in the free tier) when using the API. So I either have to send my text in chunks or contact sales to increase my limit and probably pay for it, is that right?
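The "send my text in chunks" option from Question #1 can be sketched roughly as follows. This is a minimal illustration, not Anthropic's method: the 4-characters-per-token estimate is a rule of thumb rather than the real tokenizer, and `split_into_chunks`, `send_paced` and the `send` callback are hypothetical names standing in for whatever actually calls the API.

```python
import time
from typing import Callable, List

# Assumption: about 4 characters per token for English prose. This is only a
# rough heuristic, not the tokenizer the API actually uses.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Crude token estimate based on character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def split_into_chunks(text: str, max_tokens: int) -> List[str]:
    """Group paragraphs into chunks whose estimated size fits max_tokens.

    A single paragraph larger than max_tokens becomes its own oversized chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for paragraph in text.split("\n\n"):
        tokens = estimate_tokens(paragraph)
        if current and current_tokens + tokens > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(paragraph)
        current_tokens += tokens
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def send_paced(
    chunks: List[str],
    send: Callable[[str], None],
    tokens_per_minute: int,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Call send(chunk) for each chunk, pausing once the minute budget is spent."""
    used = 0
    window_start = time.monotonic()
    for chunk in chunks:
        tokens = estimate_tokens(chunk)
        elapsed = time.monotonic() - window_start
        if elapsed >= 60:
            # A new one-minute window has started on its own.
            used, window_start = 0, time.monotonic()
        elif used > 0 and used + tokens > tokens_per_minute:
            # Budget exhausted: wait out the rest of the window.
            sleep(60 - elapsed)
            used, window_start = 0, time.monotonic()
        send(chunk)
        used += tokens
```

Chunking like this works around a per-minute input budget, but each chunk is then a separate request, so it does not by itself give the model the whole book in one cached prompt.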

Question #2: Asking for citations from cached documents

If "Every block in the request can be designated for caching with `cache_control`", then I guess I can cache documents and get citations from them like this? Has anyone attempted this?

    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "text",
                        "media_type": "text/plain",
                        "text": doc_text
                    },
                    "title": doc_title,
                    "cache_control": {"type": "ephemeral"},
                    "citations": {"enabled": True}
                },
                {
                    "type": "text",
                    "text": template
                }
            ]
        },
        {
            "role": "assistant",
            "content": '{"excerpts":['
        }
    ]
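For readers who want to try the idea in Question #2, here is a small sketch that only builds the request payload and makes no network call. The helper name `build_messages` is hypothetical. One assumption to verify against the current citations documentation: as far as I recall, a plain-text document source carries its content under the key `data`, which differs from the `text` key used in the post's snippet.

```python
from typing import Any, Dict, List


def build_messages(doc_text: str, doc_title: str, question: str) -> List[Dict[str, Any]]:
    """Build a messages list with one cacheable, citable document and a question."""
    document_block = {
        "type": "document",
        "source": {
            "type": "text",
            "media_type": "text/plain",
            # Assumption: the docs use "data" for plain-text content.
            "data": doc_text,
        },
        "title": doc_title,
        # Marks the prefix up to and including this block as cacheable.
        "cache_control": {"type": "ephemeral"},
        "citations": {"enabled": True},
    }
    return [
        {
            "role": "user",
            "content": [document_block, {"type": "text", "text": question}],
        }
    ]


# Hypothetical usage with the official SDK (not executed here):
#   import anthropic
#   client = anthropic.Anthropic()
#   response = client.messages.create(
#       model="<model id>", max_tokens=1024,
#       messages=build_messages(book_text, "My Book", "Summarise chapter 1."),
#   )
```

Prompt caching matches on an identical prefix, so follow-up requests should keep the document block byte-for-byte the same and change only the question that comes after it.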

Question #3: Asking for citations in tools/structured output

So far in my tests I've been using [Instructor](https://github.com/instructor-ai/instructor) to get structured output. However, [Anthropic's Tool Use Course](https://github.com/anthropics/courses/blob/master/tool_use/03_structured_outputs.ipynb) suggests I can achieve this (apart from the validation, I guess) using purely the tools feature. The citation docs state that, for Sonnet 3.7,

>when the model is asked to structure its response, it is unlikely to use citations unless explicitly told to use citations within that format. For example, if the model is asked to use tags in its response, you should add something like Always use citations in your answer, even within.

So if I'm able to ask for citations from a cached document, and prompt the model to use them within structured outputs, is Instructor still relevant for this use case? Or am I better off with one less dependency, given how fast these APIs change and that wrappers don't always keep up to date?
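One way to picture the "one less dependency" route from Question #3 is a tool definition plus a few lines of hand-written validation. This is a sketch under stated assumptions: the tool name `record_excerpts` and its `summary`/`quote` fields are invented for illustration, and the verbatim-quote check is a manual stand-in, not the API's citations feature and not what Instructor does internally.

```python
from typing import Any, Dict, List

# Hypothetical tool definition; the name and fields are illustrative only.
EXCERPTS_TOOL: Dict[str, Any] = {
    "name": "record_excerpts",
    "description": "Record excerpts from the document, each with a supporting quote.",
    "input_schema": {
        "type": "object",
        "properties": {
            "excerpts": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "summary": {"type": "string"},
                        "quote": {"type": "string"},
                    },
                    "required": ["summary", "quote"],
                },
            }
        },
        "required": ["excerpts"],
    },
}

# Passed as tool_choice to force the model to answer through this tool.
TOOL_CHOICE = {"type": "tool", "name": EXCERPTS_TOOL["name"]}


def validate_excerpts(tool_input: Dict[str, Any], doc_text: str) -> List[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    excerpts = tool_input.get("excerpts")
    if not isinstance(excerpts, list):
        return ["missing 'excerpts' list"]
    problems: List[str] = []
    for i, item in enumerate(excerpts):
        if not isinstance(item, dict):
            problems.append(f"excerpt {i}: not an object")
            continue
        summary, quote = item.get("summary"), item.get("quote")
        if not isinstance(summary, str) or not isinstance(quote, str):
            problems.append(f"excerpt {i}: 'summary' and 'quote' must be strings")
        elif quote not in doc_text:
            # Manual check that the quoted text really occurs in the source.
            problems.append(f"excerpt {i}: quote not found verbatim in document")
    return problems
```

The trade-off is the one the post describes: this avoids a wrapper that may lag behind the API, at the cost of writing and maintaining the validation yourself.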

Thanks in advance for any help, suggestions and overall comments!


### Discussion

**Comment 1**:

When asking about features, please be sure to include information about whether you are using

  1. Claude Web interface (FREE) or Claude Web interface (PAID) or Claude API
  2. Sonnet 3.5, Opus 3, or Haiku 3

Different environments may have different experiences. This information helps others understand your particular situation.

I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.


---

## 22. Did Claude get smarter again? {#22-did-claude-get-smarter-again-}

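The core topic of this discussion is:
**A user has noticed that the model (Claude 3.7) recently seems to behave differently, with sharper and more accurate responses, and asks whether others see the same thing, especially given earlier complaints from other users about degraded performance.**

Key points:
1. A subjective impression that model performance has changed ("sharper, more thoughtful").
2. Whether this is a personal illusion or the result of an actual update.
3. The contrast with recent negative feedback from other users (such as degraded performance).

In essence, the thread asks whether fluctuations in model performance reflect a systematic adjustment or merely differences in individual experience.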

- **Reddit link**: [https://reddit.com/r/ClaudeAI/comments/1jpqet3/did_claude_get_smarter_again/](https://reddit.com/r/ClaudeAI/comments/1jpqet3/did_claude_get_smarter_again/)
- **External link**: [https://www.reddit.com/r/ClaudeAI/comments/1jpqet3/did_claude_get_smarter_again/](https://www.reddit.com/r/ClaudeAI/comments/1jpqet3/did_claude_get_smarter_again/)
- **Posted**: 2025-04-02 22:47:35

### Content

For the past couple of hours, Claude 3.7 seems noticeably sharper to me; responses feel more thoughtful and accurate. Am I the only one noticing this shift, or has something actually changed? Especially since people were complaining about its performance earlier this week.


### Discussion

**Comment 1**:

Anthropic tends to announce builds. Especially since customers rely on consistency.


---

## 23. What's the difference between selecting Claude 3.7 in Perplexity vs using Claude.ai? {#23-what-s-the-difference-between-selecting-cla}

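The core topic of this post is: **how selecting the Claude 3.7 model in Perplexity compares with using Claude.ai directly**.

Specifically, the user wants to know:
1. Whether the Claude model (version 3.7) offered on the two platforms differs in features, performance, or user experience.
2. Why the same question went unanswered elsewhere on Reddit, which may reflect how scarce the relevant information is or how unclear the technical details are.

Background:
- The user is confused about the practical differences when different platforms integrate the same model (such as Claude).
- API versions, interface optimizations, or paid-tier permissions may be factors.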

- **Reddit link**: [https://reddit.com/r/ClaudeAI/comments/1jpvps0/whats_the_difference_between_selecting_claude_37/](https://reddit.com/r/ClaudeAI/comments/1jpvps0/whats_the_difference_between_selecting_claude_37/)
- **External link**: [https://www.reddit.com/r/ClaudeAI/comments/1jpvps0/whats_the_difference_between_selecting_claude_37/](https://www.reddit.com/r/ClaudeAI/comments/1jpvps0/whats_the_difference_between_selecting_claude_37/)
- **Posted**: 2025-04-03 02:18:59

### Content

Sorry for the probably dumb question, but what is the difference between selecting Claude 3.7 in Perplexity vs using Claude.ai?

I already asked here, but no one replied:

https://www.reddit.com/r/ArtificialInteligence/comments/1jo7esz/whats_the_difference_between_selecting_claude_37/


### Discussion

**Comment 1**:

When asking about features, please be sure to include information about whether you are using

  1. Claude Web interface (FREE) or Claude Web interface (PAID) or Claude API
  2. Sonnet 3.5, Opus 3, or Haiku 3

Different environments may have different experiences. This information helps others understand your particular situation.

I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.


---

## 24. Claude Desktop App {#24-claude-desktop-app}

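The core topic of this post is: **the user gets a blank page when launching the desktop app, reinstalling did not help, and they ask whether anyone else has the same problem**.

In brief:
- **Problem**: Only a blank page appears when the desktop app launches.
- **Attempted fix**: Reinstalling the app, with no change.
- **Purpose of the post**: To confirm whether this is widespread or to find a solution.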

- **Reddit link**: [https://reddit.com/r/ClaudeAI/comments/1jpv3gq/claude_desktop_app/](https://reddit.com/r/ClaudeAI/comments/1jpv3gq/claude_desktop_app/)
- **External link**: [https://www.reddit.com/r/ClaudeAI/comments/1jpv3gq/claude_desktop_app/](https://www.reddit.com/r/ClaudeAI/comments/1jpv3gq/claude_desktop_app/)
- **Posted**: 2025-04-03 01:55:02

### Content

Hello, I only get a blank page when launching the desktop app, even after a reinstall. Does anyone have a similar issue?


### Discussion

**Comment 1**:

Support queries are handled by Anthropic at http://support.anthropic.com

I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.


**Comment 2**:

Seems I had to be patient, a message appeared... "

Claude will return soon

Claude.ai is currently experiencing a temporary service disruption. We're working on it, please check back soon."

Haven't been able to use it recently though. Happens each time I tried for the past month


---

## 25. I'm not having issues? {#25-i-m-not-having-issues-}

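The core topics of this post can be summarized as follows:

1. **Limitations and efficiency problems when using AI tools (such as Claude)**:
- The author suggests that users may hit limits (such as rate limits or outages) because of "overly large tasks" (such as trying to generate entire programs) or "vague, terse prompts".
- The post stresses that AI tools are still "experimental products" and should not be relied on for an entire workflow.

2. **Advice for using AI tools effectively**:
- **Build in stages**: Start from scratch and build up step by step (for game development, have an extensible system framework in place) rather than generating huge or already bloated code.
- **Human review is necessary**: The AI may invent fake functions or hard-code values, so its output needs strict checking.
- **Avoid over-reliance on the web interface**: The free web interface is inefficient, especially for large projects; the API is better suited to programming.

3. **Criticism of user behavior**:
- Some users miss what the tools are designed for (for example, the API is meant to assist coding, not replace the whole process), which leads to misuse or inflated expectations.
- Repeatedly asking the AI to keep generating (typing `Continue` many times) may lower output quality and signals that the task is larger than is reasonable.

4. **Risk warning**:
- The AI may mask code defects (such as invented functions or hard-coded values); left unchecked, these cause problems later.

**Summary**: The post is mainly about how to "use AI tools sensibly" (working in stages, reviewing output by hand). It criticizes users' misunderstanding of the tools' abilities and limits, and points out the risks of current usage patterns such as relying on the web interface for large tasks.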

- **Reddit link**: [https://reddit.com/r/ClaudeAI/comments/1jpksic/im_not_having_issues/](https://reddit.com/r/ClaudeAI/comments/1jpksic/im_not_having_issues/)
- **External link**: [https://www.reddit.com/r/ClaudeAI/comments/1jpksic/im_not_having_issues/](https://www.reddit.com/r/ClaudeAI/comments/1jpksic/im_not_having_issues/)
- **Posted**: 2025-04-02 17:52:02

### Content

I've seen a lot of these posts, and it does make me think. I've noticed the downtime, I will not debate that, I've encountered it myself. But the limits some of you are wondering about do confuse me: what are you prompting it with?

My thoughts are that maybe trying to vibe code entire programs or using already bloated code might be part of the issue, combined with vague or simple prompts. My experience is that to use this you need to either start from scratch, or have a 'system' (game dev) in place that can support plug and play systems to get effective work done. You're always going to need to proof check this stuff.

As a result of all of this I've had no issue, nor a reliance dependency when it's down.

I will say that it does concern me, given that Claude needs to be specified not to hard code values & the extent of what I'm seeing people request it do without checking for these things. Claude is very very clever at making code work that shouldn't by inserting the values, or inventing fake functions for 'later use'.

My experience is the more times you've gotta type 'Continue', the sketchier the end product becomes. I wouldn't even attempt serious work without projects if I'm using the Web interface.

TL;DR: this rate limit is exposing potential oversights in people using this for 'too large' tasks or relying on an experimental product already for entire workflow solutions. People are missing that the API is intended for coding and that the web interface is an incredibly inefficient way of doing it, especially without projects if using the free plan.


### Discussion

**Comment 1**:

Until this point I haven't had any issues with Claude hard coding shit in because I still review every diff carefully to ensure it's not doing random shit. Also I think people are still copy and pasting into the webUI which makes Claude work like 10x worse. I'm surprised I dealt with it for so long. tldr: skill issues


**Comment 2**:

Just sharing some thoughts about Claude from my experience. It works okay if you use the MCP file system, add some custom instructions, and keep your prompts straightforward.

I've found it's really only reliable for one task at a time. When I try to do multiple things, it starts getting confused. The longer we talk, the more it seems to lose the plot. That's why I usually try to get in, get what I need, and get out.

The last couple days have been particularly rough performance-wise, so I've taken a step back from using Claude. Been checking out some other systems and LLMs instead.

Two things I've noticed:

First, there seems to be a lot of wasted space in Claude's responses - I'd say about 75% is just extra fluff with feature bloat offerings, and a lot of blatant refusing to follow directions. What's really frustrating is when it finally gives you the perfect answer, and then it disappears with an error message, and you can never get that exact solution again.

Second, I wonder if they're allocating resources differently for different customers. Do the big corporate clients get the full version while the rest of us get something more stripped down?

As a business owner, I read through all the policies and updates, and I have to say, their lack of transparency has made me hesitant to rely on them. I had similar trust issues with ChatGPT after they changed their policies. It just makes it hard to commit when things keep shifting without clear communication.


**Comment 3**:

Claude works best if you create libraries for specific features, then use those libraries to troubleshoot your code in new chats. Once it gets too monumental it has issues like any other AI.


**Comment 4**:

Yeah, the entitlement those "contributions" reek of tell me that the respective OPs are probably not able to read the room or assess their own position in it. They unfortunately never got told by anybody that their complete lack of any skill and understanding is an actual reason they should stop doing things in the IT world and instead go back to cleaning the street, like it used to be in the past. And I'm actually not trying to hate on the cleaning professions, if they were properly compensated, etc bla bla...


---

# Overall discussion highlights

Below is a bulleted summary of the key points of the 25 posts, with anchor links and item-by-item details:

---

### #1 [Sonnet Computer Use in very Underrated](#anchor_1)
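1. **Technical strengths and an undervalued market position**
- Sonnet's computer use performs well but has not received market attention; a breakout is expected
2. **Real-world use cases**
- Apply Hero (a popular tool) and Manus (a viral demo) support its practicality
3. **Ecosystem integration**
- Vercel AI SDK and BrowserUse as commercially successful examples
4. **Future trends**
- "Web + computer agent" applications will emerge, with Sonnet playing a key role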

---

### #2 [I regret buying claude for 1 year](#anchor_2)
1. **Strongly negative review**
- Extreme dissatisfaction with Claude 3.7, expressed in harsh language (e.g. "fucking shitty")

---

### #3 [I blew $417 on Claude Code](#anchor_3)
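1. **What was built**
- Successfully built the word game LetterLinks (with daily challenges and a leaderboard)
2. **AI strengths**
- Fast code generation; efficient at some debugging
3. **Pain points**
- Context limits, fake solutions, non-linear cost growth
4. **Cost-effectiveness**
- $417 vs. $2-3K for human labor, but testing was time-consuming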

---

### #4 [Migrated to Gemini](#anchor_4)
1. **Motivation for switching**
- Dissatisfaction with Claude API context limits
2. **Gemini advantages**
- 1 million tokens, no overload errors, faster development of wolfai.us

---

### #5 [Claude Sonnet is the undisputed champion of OCR](#anchor_5)
1. **Performance ranking**
- Claude > open-source models (Qwen/Mistral) > GPT-4o
2. **Test conclusion**
- GPT-4o performed below expectations; a video is attached as evidence

---

### #6 [Pour one out for my Claude subscription](#anchor_6)
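1. **Mixed feelings**
- A PhD student drops Claude because of Gemini's upgrade, likening it to saying goodbye to a lab partner
2. **Rational choice**
- Gemini currently fits academic needs better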

---

### #7 [Claude 3.5 cracked benchmark](#anchor_7)
1. **Differences in agent performance**
- BasicAgent: Claude 3.5 wins (20% vs. OpenAI 3%)
- IterativeAgent: OpenAI pulls ahead (25% vs. Claude 16%)
2. **Key factor**
- Prompt design affects the results; Claude tends to end tasks too early

---

### #8 [Fully Featured AI Coding Agent](#anchor_8)
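1. **Tool features**
- Free and open source (GPL), language-server technology, supports large codebases
2. **Multi-platform integration**
- Works with Claude Desktop or the Gemini API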

---

### #9 [Thanks then! Take care...](#anchor_9)
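1. **Complaint guidelines**
- Provide input/output details and use the thumbs-down button to give feedback
2. **User problems**
- Malfunctioning features, contradictory system prompts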

---

### #10 [Tell an LLM It Has an iPhone](#anchor_10)
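1. **Experimental design**
- A control group is needed to verify the prompt's effect (e.g. testing the role or the scenario on its own)
2. **Mechanism analysis**
- Appending to the user message may expand the context window and improve accuracy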

---

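(Due to length limits, the remaining titles are listed only briefly; see the original posts for full details)

### #11-25 Quick summary
- **#11** [Claude 3.7 anomaly](#anchor_11): sudden image-duplication errors
- **#12** [Front-end development limits](#anchor_12): LLMs can only generate static pages and lack state management
- **#13** [Gemini 2.5 pulls ahead](#anchor_13): the author admits Claude currently lags behind
- **#14** [Student API discount](#anchor_14): a link to the application form is provided
- **#15** [Claude Code cost](#anchor_15): $21.75 per hour is too high for freelancers
- **#16** [Over-responding problem](#anchor_16): Claude ignores instructions, so quota is used up faster
- **#17** [Pro plan limits](#anchor_17): the usage cap has recently been triggering earlier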