2013年3月8日 星期五

.NET Debugging Demos Lab 2: Crash - Review

文章原始出處:.NET Debugging Demos Lab 2: Crash - Review

Reproduce the problem

  1. 開啟http://127.0.0.1/Reviews.aspx頁面
  2. Click Refresh按鈕後,將會crash w3wp.exe。Note:如果你有Visual Studio,它可能會引起Debugg視窗,選擇no即可。
Examine the eventlogs
  1. 開啟事件檢視器(所有程式→系統管理工具),因為OS跟IIS的不同Log可能會有些差異,在這之中,會有一個系統事件看起來如下圖....
    • Q:看到了那些事件?
    • A:在我的機器上(Win7,原作者是Win2k3, .NET 2.0 SP1),我看到了上述的系統紀錄以及兩則應用程式的紀錄
    • 在Report中,問題簽章中可以看到是w3wp.exe crash,原因是System.NullReferenceException,而在Error event中更可以看到,System.NullReferenceException發生於 Review.Finalize(),這兩則訊息讓我們不用真的去Debugg就可以知道答案,但有些IIS/OS/Framework 你並不會取得Error event log。
    • Q:結束代碼'0xe0434f4d'(在系統事件中)指的是?
    • A:文章的解釋為結束代碼'0xe0434f4d'代表SQLException, NullReferenceException or a FileNotFoundException.,但是在我的系統事件中沒有,只有事件識別碼5011,代表WAS發生錯誤。
    • Q:從事件紀錄當中找到甚麼原因導致Crash
    • A:從第二則應用程式紀錄中,就可以知道原因了。在 Review.Finalize()時,發生了非預期的NullReferenceException。

Get a memory dump

  1. 再次開啟http://127.0.0.1/Reviews.aspx頁面,但先不點選refresh
  2. 用工作管理員開啟cmd.exe 同時移至Debugger tool 目錄cd C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x64 並執行 Adplus -crash -pn w3wp.exe -o D:\dumps
    • Q:會出現一個新的視窗,那是甚麼?
    • A:這代表Debugger,一種跟cmd.exe很像的視窗,就是沒有UI的WinDbg。
    • Q:這個Debugger在等甚麼?Hint:在WinDbg中可看一下ADplus/crash mode 的說明文件
    • A:從下表可知在ADplus/crash mode 下,evens跟Exception的關係,以及發生時的處理方式。縮寫  L=Log entry in debuglog, T=Record Time, S=Log stack, M=Mini dump, F=Full dump, E=log in the eventlog。也可以藉由cfg控制每個事件的處理方式。(從ADplus_report.txt內可以看到)
    • Code Exception/Event Explanation/Note 1st Chance 2nd Chance
      av Access Violation An attempt to access to uninitialized memory.
      This is also thrown for .net NullReferenceExceptions
      LTSM LTSFE
      ch Invalid Handle An attempt to access data using an invalid handle. Eg. window, file or other native handle. LTSM LTSFE
      ii Illegal Instruction An attempt is made to execute an invalid instruction. LTSM LTSFE
      dz Integer divide Integer divide by zero. Even in .net. LTSM LTSFE
      c000008e Floating divide Floating point divide by zero. LTSM LTSFE
      iov Integer overflow An attempt to store more than 32/64 bits in an integer. LTSM LTSFE
      lsq Invalid Lock Sequence An attempt is made to execute an invalid lock sequence. LTSM LTSFE
      sov Stack overflow The memory used for the stak (method pointers, parameters, local vars etc.) exceeds the amount allocated for the stack. Typically caused by recursion. LTSM LTSFE
      eh C++ EH exception C++ EH Exception LTS LTSFE
      * Unknown exception Any exception not specifically defined in this list LTS LTSFE
      clr .NET Exception Any .net exception - LTSFE
      bpe Breakpoint A break point is hit or the debugger window is shut down. LTSME -
      wkd Wake Debugger Wake debugger. LTSME -
      ld Dll Load A dll is loaded in the process L L
      ud Dll Unload A dll is unloaded from the process L L
      epr Exit Process The process exits. LTSFE -
      aph Application Hang The Windows OS has determined that the process is no longer responding. LTSM LTSFE
      cce Ctrl+C Console App Ctrl+C or Ctrl+Break is passed to a console app. LTSM LTSFE
      dm Data Misaligned A unit of data is read from or written to an address that is not a multiple of the data size. LTSM LTSFE
      gp Guard page violation An attempt to access data within a guard page. LTSM LTSFE
      ip In Page IO Error I/O Error LTSM LTSFE
      isc Invalid System Call Invalid System Call LTSM LTSFE
      sbo Stack Buffer overflow Stack buffer overflow LTSM LTSFE
  3. 藉由點擊refresh重現剛剛的事件
    • Q:在Dump資料夾中,出現了那些檔案?
    • A:將會出現以下檔案:
    • ADPlus_report.txt
    • DebuggerScript.txt
      • (原著:<DIR> CDBScripts)
    • ADPlus_log_2054_2013-03-08_10-08-19-545.log
      • (原著:PID-4628__W3WP.EXE__Date_02-11-2008__Time_13-14-0808.log)
    • Process_List.txt
      • (我沒有)
    • MINIDUMP_FirstChance_av_AccessViolation_w3wp.exe__2054_2013-03-08_10-10-06-360_1b48.dmp
      • 原著:PID-4628__W3WP.EXE__1st_chance_AccessViolation__mini_17a4_2008-02-11_13-14-19-358_1214.dmp
    • MINIDUMP_FirstChance_av_AccessViolation_w3wp.exe__2054_2013-03-08_10-10-07-231_1b48.dmp
      • 原著:PID-4628__W3WP.EXE__1st_chance_Process_Shut_Down__full_17a4_2008-02-11_13-14-31-889_1214.dmp
      • 我有兩個AccessViolation而沒有chance_Process_Shut_Down
    • FULLDUMP_SecondChance_clr_NET_CLR_w3wp.exe__2054_2013-03-08_10-10-07-940_1b48.dmp
      • 原著:PID-4628__W3WP.EXE__2nd_chance_NET_CLR__full_17a4_2008-02-11_13-14-20-093_1214.dmp
    • Apart from this we have a minidump for a 1st chance access violation (probably our NullReferenceException), and then a 2nd chance CLR exception (the same nullreference exception, but now 2nd chance since it wasn't handled), followed by a process shutdown dump.  You can see in which order they occurred by looking at the timestamps on them.(很難翻,先放著)

Open the dump in windbg

  1. 開啟WinDbg並載入FULLDUMP_SecondChance_clr_NET_CLR_w3wp.exe__2054_2013-03-08_10-10-07-940_1b48.dmp。這個dump比FirstChance先出現。
    • Note: 當出現Exception時,你有機會在Try Catch 區塊中擷取。當Exception第一次出現時,它就成為了1st chance exception而並非無法挽救(如果有try catch區塊)。但是如果沒有任何例外控制,將會引發2nd chance exception (unhandled exception) 而且任何 2nd chance exceptions 將會中止程序。
  2. 設定Symbol Path跟 Load SOS(忘了請查閱.NET Debugging Demos Lab 1: Hang - Review)
    • 在Crash dump中,一開始的Thread 就是產生Exception的Thread
    • Q:當你打開Dump時,是哪個Thread?Hint:從指令列可以看出。
    • A:一開始的Thread id是20,如果執行!threads 從輸出中即可發現它就是Finalizer thread
      • 從 !threads 的輸出中也可以看到發生Eceeption的Address 00000000ffc76d80

Examine the callstacks and the exception

  1. 檢視callstacks
    • kb 2000
    • !clrstack
    • Q:Thread的格式是?
    • A:從kb的結果可以知道,這個Thread是一個finalizer thread 但是!clrstack並沒有辦法給更多其他的資訊,因為我們處在Exception Handlecode裡。
    • Q:Thread在幹嘛?
    • A:Thread嘗試的去呼叫某個物件的finalizer,在呼叫的過程中引發了1st exception而現在引發了2ec exception
  2. 藉由執行 !pe 檢視exception(Note: !pe/!PrintException 將會印出這stack中的exception內容)
    • Q:是甚麼Exception?
    • A:A NullReferenceException 
    • Note:在某些狀況下,Exception將會被rethrown,此時stacktrace可能無法得知該Exception的詳細內容;若能找到原始的Exception(1 st)才能得更多的資訊。
  3. 藉由執行 !dso 查看Stack中的Object去找到原始的Exception的位址
    • Q:原始位置為何?
    • Hint:查看剛剛用 !pe 從rethrown exception查出的Address(00000000ffc76d80) 跟你現在查出的物件Address進行比對,將會發現有多重exception,有一些Address與rethrown exception相同,而最底部的就是最原始的exception(會有一個不同address的)
    • A:唯一擁有不同Address的Exception為000000013fc7fa40
  4. 藉由!pe <original exception address>  印出原始Exception的資訊

    • Q:在哪個method時發生exception?
    • A:在Review.Finalize()
    • Q:哪個物件?
    • A:Review
    • Q:正常來說從ASP.NET所拋出的Exception皆藉由global exception handler 跟error page顯示給使用者知道,為什麼裡不會呢?而且還引發Crash?
    • A:ASP.NET的global exception handler只handle來自執行request 時。但在這個範例裡,exception是來自於finalizer ,也就是說跟user沒有關係,所以就沒有讓看到error page的必要。
    • In 1.1. and if you use legacy exception handling in 2.0, exceptions on the finalizer are non-fatal, i.e. they will not crash the process, instead the finalizer would just stop finalizing the current object.  This is very dangerous since this means that you may not process code that releases native objects and thus you may end up with a memory leak that is very hard to troubleshoot or other resource leaks like connection leaks etc.  All unhandled exceptions on the finalizer thread should be investigated and taken seriously.(第一句難翻.....擱置)

Examine the code for verification

1. 打開 Review.cs並找到在Review class中的 destructor/finalizer
~Review()
    {
        if (quote.ToString() != string.Empty)
        {
            quote = null;
        }
    }
  • Q:哪一列指令或者哪個方法將會引發Exception
  • A:唯一有可能會引發NullRef 的是當quote = null 呼叫quote.ToString()
也可以藉由檢視拆解後的function,試著更明確的指出是哪個funcation引發Exception
執行 !u <IP shown in the exceptionstack>(IP從!pe <original exception address> 的結果中找到→000007FF0020195A )

藉由上圖及Review.cs中的methods,可以發現幾件事情。
1.找到了000007FF0020195A 的位置
2.在000007FF0020195A 位置之前我們將r11設給 ptr [rbx+8],這就是當我們使用r11時所引發的nullreference exception。rbx 是用來hold住Pointer(這個case指的是Review object),所以問題在於[rbx+8]。

執行!do <object address>Dumping out the Review object (可以從 !dso 中找到它的位置→000000017fcb8ac0 ),我們可以發現在offset 8的位置他的值為null(0000000000000000 )
 3.  如果這還不足以證明,也至少可以看大概是哪個地方引發的exception。
以下太難翻.....
 We have an instruction to move "" into edx, i.e. an empty string (string.Empty), and right after that there is a string compare operation which matches up to if(quote.ToString() != string.Empty)

Btw, the reason that quote was null is because ClearReview() sets it to null... The proper way to resolve this issue would have been to not have a finalizer (since we don't really need one given that we don't have to clean up any native resources) or at the very least we should have called GC.SuppressFinalize() in the ClearReview() method.